Thread: The more things change department

  1. #1
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868

    The more things change department

    I took up programming many years ago (in BASIC), because we needed to quickly update a data fiile, in the office. The database program took about 18 hours, just to do one part of that process.

    Last week, a computer chess forum member complained that his chess game database had been searching and removing his duplicate games, for 11 days, and still wasn't done (on an i7 based PC). He had added a second file of games, and many of those games were already in his db.

    After exporting all the records in text mode, a bit of C programming removed the duplicates quite nicely: (throughput limited by I/O speed).

    Another C success story.
    Attached Images Attached Images The more things change department-runningjoshmsdatabase-png 
    Last edited by Adak; 10-12-2011 at 10:30 AM.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    11 days!?

    I detect the icy grip of a naive bubble sort somewhere in the mix.

    Because 11 days down to 5 minutes is like 200,000 times quicker. You're only going to get that with a change of algorithm, not a change of language.

    For comparison, you should implement your C code in BASIC, using the known algorithms of the standard library (where appropriate).
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    The longer run was made using the database program from ChessBase, and yep, 11 days, and still not finished.

    One big speed up, was to reconstruct the games as long (very long) strings, moving the PGN tagged info to the very end, and then using system sort -- you can't beat it for string sorting, to sort all those games as strings. Finding dupes is then a piece of cake, and this is where I also believe the database programs fall short.

    I don't have a version of BASIC anymore, but it would have also done a fine job with this. It's a DIY program success story, really.

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Adak View Post
    The longer run was made using the database program from ChessBase, and yep, 11 days, and still not finished.

    One big speed up, was to reconstruct the games as long (very long) strings, moving the PGN tagged info to the very end, and then using system sort -- you can't beat it for string sorting, to sort all those games as strings. Finding dupes is then a piece of cake, and this is where I also believe the database programs fall short.

    I don't have a version of BASIC anymore, but it would have also done a fine job with this. It's a DIY program success story, really.
    FWIW... the system("sort...") call is actually qsort() which is part of the standard C library... And yes it's probably a little faster to use qsort() than the system call.

    But what a nifty success story... 11 days and ticking brought down to 6 minutes... Your friend must have been amazed.

  5. #5
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by CommonTater View Post
    FWIW... the system("sort...") call is actually qsort() which is part of the standard C library... And yes it's probably a little faster to use qsort() than the system call.

    But what a nifty success story... 11 days and ticking brought down to 6 minutes... Your friend must have been amazed.
    The Windows system sort appears to be a multi-merge sort (if I understand their cryptic description correctly), that balances out the merging passes. It has guaranteed resources and low level micro code, that our sorting programs can only dream about.

    My optimized Quicksort's leave qsort() in the dust, and sysem(sort...), leaves my optimized Quicksorts, far behind.

    I ran across this first, in Windows2000, and it's been true ever since ( Vista not checked). I've tested it with millions of Sudoku grids, and billions of chess positions (in FEN strings), and there is simply no comparison of their sorting times. Try it and see.

    I haven't tried it with int's or floats, because I like my Quicksort!

    One change for Windows 7 system sorting is that Windows now sees any digit in the string, as a NUMBER, not a char. That's a helluva change, imo, and when you make up the maximum size of the record to sort for system sort syntax, you have to add extra size, to accommodate that change.

    Non programmers are always amazed by the difference. "How could this little console program beat my $$$ database program? It's an epiphany for them.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Sales department salary range generator
    By erm3r in forum C++ Programming
    Replies: 12
    Last Post: 09-09-2011, 07:34 AM
  2. 2 More Things??
    By dolfaniss in forum Windows Programming
    Replies: 7
    Last Post: 02-20-2011, 09:29 PM
  3. Replies: 0
    Last Post: 05-21-2005, 09:42 PM
  4. i need help with a few things
    By kwm32 in forum A Brief History of Cprogramming.com
    Replies: 3
    Last Post: 03-25-2004, 05:36 PM
  5. Few things...
    By LouDu in forum A Brief History of Cprogramming.com
    Replies: 10
    Last Post: 03-30-2003, 12:47 AM