I wrote an embedded system that captures data. Originally, we wanted to have one file per day, but that became a problem for the web interface on the embedded machine, so I had to break the captured data up at the hour level.
There is a way to bring the data from the embedded machine to a desktop computer. I elected to make a *hopefully* small data store on the local computer (until we release the server-side database with a web front-end) so that the history of the information can be retained and reviewed locally without needing the USB storage unit.
The question: I'd like to import the data as quickly as possible (ideally as fast as, or faster than, a plain copy), but at the same time I need to sort the data into files that may already exist. My approach so far is:
1) Look at the file name. If the file name is unchanged from the embedded system, the format is "%05i%i%02i%02i%02i.%s": Unit Number, Year, Month, Day, Hour, extension.
a) From here, I grab all the files with the same unit, year, month, and day (there should be 24 of them--duh) and load them into one array.
b) I then grab the destination file, if it already exists.
c) Next, I merge the two files into one list, updating any record that may have more information.
d) Sort the list via qsort().
e) Rewrite the data to disk.
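To make step 1 concrete, here is roughly how I pick the fields back out of the file name. The struct and function names are just for illustration, and I'm assuming a four-digit year on the parse side:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the fields encoded in a capture file name
   of the form UUUUUYYYYMMDDHH.ext. */
typedef struct {
    int unit, year, month, day, hour;
    char ext[16];
} CaptureName;

/* Returns 1 on success, 0 if the name does not match the pattern.
   The field widths mirror the "%05i%i%02i%02i%02i.%s" writer,
   assuming the year is written with four digits. */
int parse_capture_name(const char *name, CaptureName *out)
{
    if (sscanf(name, "%5d%4d%2d%2d%2d.%15s",
               &out->unit, &out->year, &out->month,
               &out->day, &out->hour, out->ext) != 6)
        return 0;
    return 1;
}
```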
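Steps c through e look roughly like this. The record layout (a timestamp key plus a count of how much information the record carries) is made up for the sketch; the real records have more fields:

```c
#include <stdlib.h>

/* Hypothetical record layout -- stand-in for the real captured record. */
typedef struct {
    long timestamp;   /* sort key */
    int  n_fields;    /* how much information this record carries */
    char data[32];
} Record;

static int cmp_record(const void *a, const void *b)
{
    long ta = ((const Record *)a)->timestamp;
    long tb = ((const Record *)b)->timestamp;
    return (ta > tb) - (ta < tb);
}

/* Merge `new_recs` into `dst` (capacity `cap`, current count *n).
   A record with the same timestamp replaces the existing one only if
   it carries more information; otherwise new records are appended.
   Finally the whole list is sorted with qsort(). */
void merge_and_sort(Record *dst, size_t *n, size_t cap,
                    const Record *new_recs, size_t n_new)
{
    size_t i, j;
    for (i = 0; i < n_new; i++) {
        for (j = 0; j < *n; j++) {
            if (dst[j].timestamp == new_recs[i].timestamp) {
                if (new_recs[i].n_fields > dst[j].n_fields)
                    dst[j] = new_recs[i];   /* update with richer record */
                break;
            }
        }
        if (j == *n && *n < cap)
            dst[(*n)++] = new_recs[i];      /* append new record */
    }
    qsort(dst, *n, sizeof(Record), cmp_record);
}
```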
2) Someone out there will rename one of these files. To handle this, I'll have to process the files one at a time.
3) I have also made a directory import: the user can select a directory that (supposedly) contains these file types, and I walk the entire directory tree and import all the data sequentially.
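The directory import is currently shaped like this. I've written the sketch with POSIX opendir()/readdir() for brevity; the real code uses the Win32 FindFirstFile()/FindNextFile() equivalents, and import_file() is a stub for the per-file import of step 1:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Stub for the real per-file import (parse, merge, rewrite). */
static void import_file(const char *path) { (void)path; }

/* Walk `dir` recursively, importing every regular file found.
   Returns the number of files imported. */
int import_tree(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *ent;
    char path[1024];
    int count = 0;

    if (!d)
        return 0;
    while ((ent = readdir(d)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
        DIR *sub = opendir(path);  /* crude "is it a directory?" test;
                                      real code checks file attributes */
        if (sub) {
            closedir(sub);
            count += import_tree(path);   /* recurse into subdirectory */
        } else {
            import_file(path);            /* sequential per-file import */
            count++;
        }
    }
    closedir(d);
    return count;
}
```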
Would it be better (keeping in mind that the target platform is Windows 2000 and up) to "fork" each recursive call of the directory-import function? I would have to handle shared memory to get the answer back about which file name was created (we plan to display the LATEST information in the program, so I need to know which file to show). I'm not sure how one handles that on Windows, but I could figure it out without much trouble. The main thing I don't know, and would rather not find out AFTER spending the time to do all of the above, is whether this would gain me anything, or whether I'm still bottlenecked at the device.
Or, to simplify: if I "fork" on Windows and then attempt to read data from a USB storage device in parallel, will I gain ANY time if the amount of data is large (upwards of 1 GB)?