Thread: Basic file handling - again!

  1. #1
    Registered User
    Join Date
    Sep 2011
    Posts
    11

    Basic file handling - again!

    Hi Folks,

    I recently asked a question on how to find files matching a given filespec. CommonTater helpfully pointed me to the API functions FindFirstFile, FindNextFile and FindClose, which answered the question as asked.

    However, as noted in the API documentation, "The order in which the search returns the files, such as alphabetical order, is not guaranteed, and is dependent on the file system. You cannot depend on any specific ordering behavior. If the data must be sorted, you must do the ordering yourself after obtaining all the results." Under NTFS, the files appear to be alphabetical, but as noted, this is not guaranteed.

    My follow-up question is on sorting (which seems to be a fairly common question on Google!). There do not appear to be any other native functions to sort the files returned?

    I realise that I can read the returned filenames into an array and sort them in C (or try to!), but it would appear to be far easier to run a DOS shell command, parse the output of "dir /o:n" and store the sorted list directly. I'm interested in whether there is much benefit in writing a sort algorithm in C when I can use the features of the OS?
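
    For the array-and-sort route mentioned above, here is a minimal sketch in standard C. It assumes the names have already been collected into an array of strings; sort_names and compare_names are hypothetical helper names. The comparison is plain case-sensitive strcmp order, which may differ from what "dir /o:n" gives for mixed-case names.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of char* elements.
   qsort hands over pointers to the elements, so each argument is really a char**. */
static int compare_names(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;
    return strcmp(*left, *right);   /* byte-wise, case-sensitive order */
}

/* Sort `count` filename pointers in place (hypothetical helper name). */
void sort_names(char **names, size_t count)
{
    qsort(names, count, sizeof names[0], compare_names);
}
```

    Calling sort_names(names, count) on an array holding "ccc.txt", "aaa.txt", "bbb.txt" leaves it as "aaa.txt", "bbb.txt", "ccc.txt". The library qsort does the work, so no sort algorithm needs to be written by hand.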

    I guess that there is probably a speed penalty of doing it through a shell command, but for what are likely to be a few files (<100) then this is probably not significant?

    regards
    Dave

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Just the opposite.

    Windows has special micro-code and guaranteed resources for sorting, and there is no program that can compete with it on a Windows 2000 or better system.

    So:
    first (if you want), you can write the directory listing to a file:

    system("dir *.* >dir.txt");

    Then to sort it:

    system("sort dir.txt >dirSorted.txt");

    and your sorted list of files goes into dirSorted.txt (or whatever filename you want to use).

    Open a terminal window (DOS or text window are other names for it), and type: dir /? |more

    and you'll get a list of all the optional settings for dir - like not including directories in the list, etc. You may even be able to make the sorted file list directly by using one of the options, I'm not sure.

    But I've tested Windows sort function, and it ROCKS! Absolutely blows away an optimized Quicksort (which is the best general purpose sorter I've found).

    These options seem like what you'd want:

    dir *.* /B /O:N >dirSorted.txt

    /B = bare format, no file size, etc. Just the name
    /O:N = Order the output, sorted by Name
    Other sorts are available using /O (date, size, etc.)
    Last edited by Adak; 09-29-2011 at 05:00 AM.

  3. #3
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    There's probably no native way to sort them, because you are only getting one at a time. If you are keeping track of them somehow, just sort them as you insert them into whatever you are sorting as.

    A system call is going to do the same thing for you, in just a different way. You either force the OS to generate a sorted list, and then you read the list one line at a time, or you have it give you the files one at a time and sort them as you get them. The only thing you are really gaining with a system call is speed of implementation. But once it's been implemented, you don't gain anything from it.


    Quzah.
    Hope is the first step on the road to disappointment.

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    FWIW... in tens of thousands of accesses to files both on local systems and over various LANs ... I have *never* seen a list of files come up in anything but alphabetical order.

    Even if they did, in console mode, it's not much trouble to make a linked list of the WIN32_FIND_DATA structs and insert them to create an ordered list as you go... very fast, easily done.
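
    A rough sketch of the insert-as-you-go idea, in portable C. It is an illustration under assumptions: each node stores only a name where a real program would keep the whole WIN32_FIND_DATA, and name_node, insert_sorted and free_list are made-up names.

```c
#include <stdlib.h>
#include <string.h>

/* One list node; `name` stands in for the WIN32_FIND_DATA a real program would store. */
struct name_node {
    struct name_node *next;
    char name[260];                  /* 260 is the value of MAX_PATH on Windows */
};

/* Insert a copy of `name` so the list stays in ascending strcmp order.
   Returns 0 on success, -1 if the name is too long or allocation fails. */
int insert_sorted(struct name_node **head, const char *name)
{
    struct name_node *node;
    struct name_node **link = head;

    if (strlen(name) >= sizeof node->name)
        return -1;
    node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    strcpy(node->name, name);

    /* Walk forward until the next node sorts after the new name. */
    while (*link != NULL && strcmp((*link)->name, name) <= 0)
        link = &(*link)->next;

    node->next = *link;
    *link = node;
    return 0;
}

/* Release every node in the list. */
void free_list(struct name_node *head)
{
    while (head != NULL) {
        struct name_node *next = head->next;
        free(head);
        head = next;
    }
}
```

    Each insertion walks the list from the front, so the total work grows with the square of the file count. For a hundred or so files that should not matter.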

    Of course if you're working GUI mode, it's easy as pie to make a self-sorting listbox with the filenames and link pointers to the data structs on the LPARAM of the list. The list will sort the structs for you!

    Going the system() route is silly... it was never intended to be used as a substitute for API calls and for a certainty the API functions are orders of magnitude faster. Besides that, you end up with a list of names... losing all other information (size, dates, attributes, etc.) as you go.


    Now if you want to talk random order retrieval... let's talk registry enumerations...
    Last edited by CommonTater; 09-29-2011 at 05:37 AM.

  5. #5
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    The dir options I posted above, will give ONLY the filenames, and no other information.

    Newer versions of Windows will sort the filenames alphabetically by name, in limited testing on Win7. Older versions did not always do that, however. (Thus the sort option).

    Are you saying that using the API and a linked list for this, is as easy as using a one line system call, and then reading the results back from the file using fgets()?

    Using system calls is not as elegant as using the API, but in this case, it seems much easier (it would be for me, at least), and appropriate. I agree there is extra memory and a time penalty for shelling out to the system. Well worth it, imo.

  6. #6
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Adak View Post
    The dir options I posted above, will give ONLY the filenames, and no other information.
    And one of the big reasons for enumerating a directory is to check sizes and attributes (Read only, Archive, etc.)

    Newer versions of Windows will sort the filenames alphabetically by name, in limited testing on Win7. Older versions did not always do that, however. (Thus the sort option).
    Actually, I've not seen an out of order listing since before Win2000 ... so unless our friend is using an antique computer, it shouldn't be a problem.


    Are you saying that using the API and a linked list for this, is as easy as using a one line system call, and then reading the results back from the file using fgets()?
    By the time you parse the strings (or file), move them into an array, etc. Code wise it's about a tie.


    Using system calls is not as elegant as using the API, but in this case, it seems much easier (it would be for me, at least), and appropriate. I agree there is extra memory and a time penalty for shelling out to the system. Well worth it, imo.
    WIN32_FIND_DATA structure

    Not when you realize what the API call gives you...

  7. #7
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by CommonTater View Post
    And one of the big reasons for enumerating a directory is to check sizes and attributes (Read only, Archive, etc.)
    File attributes and info are the "dir" command default. You have to use options to remove them.

    Actually, I've not seen an out of order listing since before Win2000 ... so unless our friend is using an antique computer, it shouldn't be a problem.
    The OP mentioned the files were sometimes out of order, but usually they were in order.

    By the time you parse the strings (or file), move them into an array, etc. Code wise it's about a tie.
    No parsing is necessary, whatsoever. One filename, per line. Use fgets() in a while loop - it's a few simple lines of code.

    For me, it would be 50 x faster than working with the API and a linked list.

    WIN32_FIND_DATA structure

    Not when you realize what the API call gives you...
    I'm woeful when it comes to using the API, and don't like using linked lists, either. They're both elegant and a certifiable PITA. <rofl>

  8. #8
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    For the OP, you can do it the way Adak suggested; it isn't necessarily bad, but if you are going to do anything fancy with the data, you may as well build a list or whatever data structure is appropriate for your needs as you go.

    FWIW... in tens of thousands of accesses to files both on local systems and over various LANs ... I have *never* seen a list of files come up in anything but alphabetical order.
    I seriously doubt this statement.

    There are only a few file systems in the wild that store directory entries in any human order. (Entries are often made in simple "first available" order within the context of the file system.)

    In other words, `FindFirstFile' and kin work within the order the file system returns; they don't do anything fancy.

    Feel free to try a "FAT32" experiment:

    Code:
    create a new directory
    create a new file "aaa.txt" in that directory
    create a new file "bbb.txt" in that directory
    create a new file "ccc.txt" in that directory
    list the directory contents with the `FindFirstFile' API
    rename the file "bbb.txt" to "ddd.txt"
    list the directory contents with the `FindFirstFile' API
    unless our friend is using an antique computer, it shouldn't be a problem.
    This statement is simply idiotic. The age of the computer or the technology doesn't matter. The nature of the file system matters. For example, you may very well connect to a directory through "Samba" exporting a "FS" that stores entries in "inode" order.

    Soma

  9. #9
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by phantomotap View Post
    This statement is simply idiotic.
    I agree, it is.

    (Back to my ignore list you go...)

  10. #10
    Registered User
    Join Date
    Sep 2011
    Posts
    11
    Hi again folks,

    as the original poster, I am enjoying the thread - thanks very much for the replies! - I'm glad that it has stimulated some debate and I don't appear to be wasting people's time (at least not too much).

    Picking up on one comment, I never actually said, and didn't mean to suggest, that "the files were sometimes out of order, but usually they were in order". I am developing on a (relatively) recent PC with WindowsXP Pro and NTFS. All of the file lists that I have seen so far have been in alphabetical order, but the comment in the API description of the FindFirstFile function got me to thinking that I should try to make sure that the returned order was as I expected, rather than hoping that the behaviour would be repeatable - particularly as I might end up installing the program on a FAT32 system.

    Adak's first reply got me thinking that the shell command would work as well as anything else - although I was not sure why I should write to a file rather than capture the output in a pipe? Once I've processed the files, I'll not need to see the filenames again - they will be renamed anyway.
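
    On the pipe question: one possible sketch, assuming a console program, reads the command's output directly instead of going through a temporary file. Microsoft's C runtime calls the functions _popen and _pclose, and the macros below map the POSIX names onto them. read_command_lines and LINE_LEN are hypothetical names chosen for this example.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L  /* expose popen/pclose on POSIX systems */
#endif
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#define popen  _popen            /* Microsoft's CRT uses a leading underscore */
#define pclose _pclose
#endif

#define LINE_LEN 260             /* assumed per-line limit, same value as MAX_PATH */

/* Run `command` and store up to `max_lines` lines of its output, with the
   line ending stripped. Returns the number of lines stored, or -1 if the
   command could not be started. Lines longer than LINE_LEN - 1 characters
   would be split across entries. */
int read_command_lines(const char *command, char lines[][LINE_LEN], int max_lines)
{
    FILE *pipe = popen(command, "r");
    int count = 0;

    if (pipe == NULL)
        return -1;
    while (count < max_lines && fgets(lines[count], LINE_LEN, pipe) != NULL) {
        lines[count][strcspn(lines[count], "\r\n")] = '\0';
        count++;
    }
    pclose(pipe);
    return count;
}
```

    With a declaration such as char lines[100][LINE_LEN], a call like read_command_lines("dir /B /O:N", lines, 100) would fill the array with the sorted names, and no dir.txt is left behind to clean up.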

    It seems like there isn't a downside to going the shell route (probably even an upside) and given that it sounds like writing a linked list function (even if I could) is way more work than is really required,

    regards
    Dave

  11. #11
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int main(void) {
       FILE *fp;
       size_t len;
       char buffer[BUFSIZ];
       //If on Windows, open a terminal
       //window, enter dir /? for more option info
       //Note: dir.txt is created in this directory, so it shows up in its own listing
       system("dir /B /O:N >dir.txt"); 
       
       fp = fopen("dir.txt", "r");
       if(!fp) {
          perror("Error: File not opened");
          return EXIT_FAILURE;
       }
       while((fgets(buffer, sizeof(buffer), fp))!= NULL) {
          len = strlen(buffer);
          if(len > 0 && buffer[len-1]=='\n')
             buffer[len-1]='\0'; //remove the newline char fgets keeps
          printf("%s\n", buffer); //buffer now holds just the filename
       }
       fclose(fp);
       printf("\n\n");
       return 0;
    }

  12. #12
    Registered User
    Join Date
    Sep 2011
    Posts
    11
    Hi Adak!

    thanks a lot for the code! - That's much neater than I would have managed - even down to the best use of "dir" for this job - I've used DOS for years and had not realised the /b switch was there - I'd have spent ages stripping out the garbage from the full dir listing!

    regards
    Dave

  13. #13
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by stevendt View Post
    Adak's first reply got me thinking that the shell command would work as well as anything else - although I was not sure why I should write to a file rather than capture the output in a pipe? Once I've processed the files, I'll not need to see the filenames again - they will be renamed anyway.

    It seems like there isn't a downside to going the shell route (probably even an upside) and given that it sounds like writing a linked list function (even if I could) is way more work than is really required,

    regards
    Dave
    With all respect to my friend Adak... he's wrong about this one...

    Stop and think about your application... If you are simply renaming files you can deal with them one at a time... no need for lists, no need for arrays and certainly no need for system calls... It's painfully easy code...

    Code:
    #include <windows.h>
    #include <stdio.h>
    
    // assumes a non-UNICODE (ANSI) build, so FindFirstFile takes char strings
    BOOL DoDirectoryStuff(char *WildCards)
    {  WIN32_FIND_DATA fd;
       HANDLE fh;
    
       fh = FindFirstFile(WildCards,&fd);
       if (fh == INVALID_HANDLE_VALUE)
         return FALSE;
       do 
         {  // whatever you gotta do with the file/struct goes here for example :
    
            printf("%s  \t %lu \n",fd.cFileName,(unsigned long)fd.nFileSizeLow);
    
         }
        while(FindNextFile(fh,&fd));
        FindClose(fh);
        return TRUE;
    }
    That's it... operate directly on the struct... for example... use CopyFile() to mass copy files... use list boxes to display them... whatever. All done from a very short loop with a single struct.

    This is ultimately far easier than the system() route and orders of magnitude faster... Trust me... this I know from experience.
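
    If the order does matter, the same loop can fill an array that is sorted before processing, which keeps the size information that a bare name listing loses. The following is only a sketch under assumptions: file_entry, sort_entries, collect_entries and ENTRY_NAME_LEN are hypothetical names, the comparison ignores case for ASCII letters only, and the Windows-only part is guarded so the sorting code compiles anywhere.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

#define ENTRY_NAME_LEN 260           /* same value as MAX_PATH */

/* The fields kept from each WIN32_FIND_DATA (hypothetical struct for this sketch). */
struct file_entry {
    char name[ENTRY_NAME_LEN];
    unsigned long size_low;          /* corresponds to nFileSizeLow */
};

/* qsort comparator: order entries by name, ignoring case for ASCII letters. */
static int compare_entries(const void *a, const void *b)
{
    const struct file_entry *left = a;
    const struct file_entry *right = b;
    const unsigned char *x = (const unsigned char *)left->name;
    const unsigned char *y = (const unsigned char *)right->name;

    while (*x != '\0' && tolower(*x) == tolower(*y)) {
        x++;
        y++;
    }
    return tolower(*x) - tolower(*y);
}

/* Sort `count` entries in place by name. */
void sort_entries(struct file_entry *entries, size_t count)
{
    qsort(entries, count, sizeof entries[0], compare_entries);
}

#ifdef _WIN32
/* Fill `entries` with up to `max` files matching `pattern`; returns the count stored. */
size_t collect_entries(const char *pattern, struct file_entry *entries, size_t max)
{
    WIN32_FIND_DATAA fd;
    HANDLE fh = FindFirstFileA(pattern, &fd);
    size_t count = 0;

    if (fh == INVALID_HANDLE_VALUE)
        return 0;
    do {
        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
            continue;                /* skip ".", ".." and subdirectories */
        if (count < max) {
            strncpy(entries[count].name, fd.cFileName, ENTRY_NAME_LEN - 1);
            entries[count].name[ENTRY_NAME_LEN - 1] = '\0';
            entries[count].size_low = fd.nFileSizeLow;
            count++;
        }
    } while (FindNextFileA(fh, &fd));
    FindClose(fh);
    return count;
}
#endif
```

    On Windows, a call to collect_entries followed by sort_entries gives a name-ordered array regardless of what order the file system returned.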
    Last edited by CommonTater; 09-30-2011 at 03:15 AM. Reason: clean up typos

  14. #14
    Registered User
    Join Date
    Sep 2011
    Posts
    11
    Hi //Tater,

    Thanks for the info. I'm doing a bit more than just renaming the files though.

    I am checking for files that match the given filespec (of the form YYYYMMDD.CSV) and then opening them in turn, extracting about a dozen variables, storing the variables to a database then renaming the processed files as ".txt" for archiving. Each file contains sets of timestamped values that I need to store in the DB.

    To prevent duplicate writes to the database, I don't write the data if it is earlier than the last entry in the database - that's why I need the files in the correct order. They **should** be written in the correct order, and FindFirstFile will **probably** return them in the order that I want, at least on NTFS, but not necessarily on FAT32, but the intent here was to make sure that I was processing the files in the correct name order,
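
    Because the names have the fixed-width form YYYYMMDD.CSV, sorting them as text also puts them in date order. A small sketch of checking that a name really has that form and pulling the date out of it follows; date_from_name is a hypothetical helper, and it only checks the shape of the name, not whether the month and day are in range.

```c
#include <ctype.h>
#include <string.h>

/* Return the date encoded in a name of the form YYYYMMDD.CSV as the number
   YYYYMMDD, or -1 if the name does not have that shape. The extension is
   matched case-insensitively; the digits are not range-checked. */
long date_from_name(const char *name)
{
    long date = 0;
    int i;

    if (strlen(name) != 12)          /* 8 digits + '.' + 3-letter extension */
        return -1;
    for (i = 0; i < 8; i++) {
        if (!isdigit((unsigned char)name[i]))
            return -1;
        date = date * 10 + (name[i] - '0');
    }
    if (name[8] != '.' ||
        toupper((unsigned char)name[9])  != 'C' ||
        toupper((unsigned char)name[10]) != 'S' ||
        toupper((unsigned char)name[11]) != 'V')
        return -1;
    return date;
}
```

    For example, date_from_name("20110929.CSV") gives 20110929, while an already renamed "20110929.txt" gives -1 and can be skipped.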

    regards
    Dave

  15. #15
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    You are still only ever doing something to one file at a time, and you aren't comparing them to each other, so all you really need to do is just use FindNextFile until you are done. I assume you have some actual notation in your act of renaming that lets you know not to process it. So just look at the name and ignore it if you are done with it.


    Quzah.
    Hope is the first step on the road to disappointment.
