Thread: What's best for mass data access?

  1. #1
    Registered User
    Join Date
    Aug 2005
    Posts
    128

    What's best for mass data access?

    I'm rewriting a program that runs once a year. Every time it runs, it takes 6-8 hours. In short, the application manipulates a table that contains ~350,000 rows and 8 columns. The data in each column is validated/scrubbed, and at the end of this long process a few text files are produced.

    The current process runs in MS Access.

    I can re-write it in VB 6.0, VB.NET, or C++.

    I was initially considering C++ and loading that huge table into a vector of structs. Then I got to thinking about memory requirements. The Access DB is around 200,000 KB, 80% of which is that table.
    I think I've just fallen in love with vectors, so it sounded cool. My basic idea was to move as much data into memory as possible and make as few table calls as possible. True, I could move records in chunks, a few thousand at a time...
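    For a rough feel of the vector-of-structs idea and its memory cost, here is a minimal sketch. The `Row` layout (eight text fields), `estimate_bytes`, `make_table`, and the average field size are all assumptions made for illustration; the real column types are not given in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row layout: the post only says the table has 8 columns,
// so each column is modelled here as a text field.
struct Row {
    std::array<std::string, 8> columns;
};

// Rough footprint estimate for row_count rows, assuming every field holds
// about avg_field_bytes of text on the heap (an assumption, not a measurement).
// Short strings may be stored inside the std::string object itself, so this
// tends to overestimate.
std::size_t estimate_bytes(std::size_t row_count, std::size_t avg_field_bytes) {
    return row_count * (sizeof(Row) + 8 * avg_field_bytes);
}

// Creates an empty table with room for row_count rows, so that filling it
// later does not trigger repeated reallocations.
std::vector<Row> make_table(std::size_t row_count) {
    assert(row_count > 0);
    std::vector<Row> table;
    table.reserve(row_count);
    return table;
}
```

    Calling `estimate_bytes(350000, avg)` with a guessed average field size gives a ballpark to compare against the roughly 160,000 KB the table occupies inside the Access file. The exact figure depends on `sizeof(Row)`, which varies between standard library implementations.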

    So, should I do this in C++ or in an easier VB language?

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > I'm rewriting a program that runs once a year. Every time it runs, it takes 6-8 hours.
    Why are you bothering with this at all?
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    aoeuhtns
    Join Date
    Jul 2005
    Posts
    581
    As Salem points out, bothering with this is pointless. How much time will it take you to write this? The computer takes only 6-8 hours and it costs far less than minimum wage.


    Unless that is some crazy computation being performed, I bet you that a correctly written program could process the data in less than one minute.

    But consider this. Suppose that computers double their real speed every year, and that this pattern continues forever. Then the total amount of time the program spends executing, if it's run every year for eternity (assuming hardware upgrades), is 8 + 4 + 2 + 1 + 0.5 + ... = 16 hours. What's so bad about a program that spends no more than sixteen hours executing? (Change the numbers to the correct values and the answer's different, but the point is the same. Ignore the physical limitations that processors and memory access are hitting, because it's more fun to be naively optimistic. :-) )
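    Spelled out, the total is a geometric series with first term 8 and ratio 1/2, under the (admittedly naive) assumption that the run time halves exactly once per year:

```latex
\[
  \sum_{k=0}^{\infty} 8 \left(\tfrac{1}{2}\right)^{k}
  = \frac{8}{1 - \tfrac{1}{2}}
  = 16 \text{ hours}
\]
```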

  4. #4
    the hat of redundancy hat nvoigt
    Join Date
    Aug 2001
    Location
    Hannover, Germany
    Posts
    3,130
    First, consider what has been said. Is it worthwhile to rewrite it? If this application runs in the background, undisturbed, and no one else is involved... don't bother changing anything.

    Where I work, it definitely is. No one cares what it costs in developer time; only the runtime of the program matters. Why? Even once a year, a 6-8 hour systemwide downtime is just that: a day of productivity lost, multiplied by the number of employees. You don't need many employees to justify spending the mythical man-month of developer time right there.

    If it is indeed that important, order a state-of-the-art system. If it costs $500 more than a normal one, that's cheap compared to your cost. Get 2 GB of RAM, a fast processor, and a fast disk or good network access, wherever your data is coming from. Load all your data into RAM and process it there. To be honest, 350K rows doesn't sound like much. We process millions of rows in database tables, and that does take 6-8 hours. But that's millions, many more than a single PC could hold in RAM. Then again, it is also a real database and not MS Access.

    Your layout in RAM depends on your application, but if you need fast access... load it all into RAM and process it there. If it took 6-8 hours with MS Access, you should be looking at 0.5-2 hours with a C++ program. "Less than a minute" is too optimistic. You won't even read the data out of that MS Access stuff in less than a minute.
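    As a rough illustration of "process it in RAM", here is a minimal sketch of a scrub pass over an already-loaded table followed by a text export. The `Record` type, the whitespace-trimming rule, and the tab-separated output format are all assumptions; the thread does not say what the actual validation rules or file formats are, and loading from Access is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: one table row held as plain text fields.
using Record = std::vector<std::string>;

// Example scrub rule (an assumption): strip leading and trailing spaces/tabs.
std::string trim(const std::string& s) {
    const char* ws = " \t";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Scrubs every field of every record in place; no database calls happen here.
void scrub_all(std::vector<Record>& table) {
    for (Record& rec : table)
        for (std::string& field : rec)
            field = trim(field);
}

// Writes the table as tab-separated text, one record per line.
void write_text(const std::vector<Record>& table, std::ostream& out) {
    assert(out.good());  // the caller must pass a usable stream
    for (const Record& rec : table) {
        for (std::size_t i = 0; i < rec.size(); ++i) {
            if (i > 0) out << '\t';
            out << rec[i];
        }
        out << '\n';
    }
}

// Convenience wrapper that returns the exported text as a string.
std::string to_text(const std::vector<Record>& table) {
    std::ostringstream out;
    write_text(table, out);
    return out.str();
}
```

    The point of the layout is that the database is touched only twice, once to load and once (if at all) to store, while the per-field work runs over memory. Passing a `std::ofstream` to `write_text` would produce one of the output files.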
    hth
    -nv

    She was so Blonde, she spent 20 minutes looking at the orange juice can because it said "Concentrate."

    When in doubt, read the FAQ.
    Then ask a smart question.

  5. #5
    Registered User
    Join Date
    Aug 2005
    Posts
    128
    Thanks for the info. I'm rewriting it for two reasons. First, it runs on my PC and locks up everything until it's done. Second, I currently have a very light workload and I was looking for something to do. I do understand the whole "is it worth it" question. I might play around with a vector that size just to see if it's doable in this situation.
