Thread: Listing specific files in a directory

  1. #1
    Registered User
    Join Date
    Jan 2008
    Location
    Oxford, England
    Posts
    27

    Listing specific files in a directory

    The GNU C Library manual has some example code for listing directory files. According to the documentation, the scandir function lets the user supply a selection function that is called for each directory entry, and an entry is selected only if that function returns something other than zero. In the code example the selection function simply returns 1, so all files are returned.
    I am interested in finding files with particular names and, following some suggestions elsewhere on this forum, I was able to get some code working that uses regex.h to match filenames. What is not clear is how to modify the "one" function to compare directory entries with a regular expression. What should be passed to the function, and how would one match the relevant parts of the dirent struct against the regex?
    Thanks for any suggestions.

  2. #2
    Salem (and the hat of int overfl)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
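    Well, scandir() passes your function a pointer to a struct dirent, which contains the filename (the d_name member) amongst other things, so that is the string to hand to regexec().

    Where the struct also has a d_type member (glibc and the BSDs provide one, though POSIX doesn't require it), you can use it to tell, for example, whether you have a directory or a file.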
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User vijoeyz
    Join Date
    Jan 2008
    Posts
    3
    This program works fine with GCC, but I believe it won't compile if the -ansi switch is used.

    Thanks,
    Vijay Zanvar

  4. #4
    Registered User
    Thanks for the suggestions.
    Though still confused by pointers, I seem to have got it working thus:

    Code:
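    #include <dirent.h>
    #include <regex.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int check (const struct dirent *regcheck)
    {
    	int rc;
    	int match;
    	regex_t * myregex = calloc(1, sizeof(regex_t));
    	if (NULL == myregex)
    	{
    		return 0;	// out of memory: don't select the entry
    	}
    	// Compile the regular expression
    	rc = regcomp( myregex, "^.*\\.zip$", REG_EXTENDED | REG_NOSUB );
    	if (0 != rc)
    	{
    		free(myregex);
    		return 0;	// invalid pattern: don't select the entry
    	}

    	if (0 == regexec(myregex, regcheck->d_name, 0, NULL, 0))
    	{
    		printf("Match: %s\n",regcheck->d_name);
    		match = 1;
    	}
    	else
    	{
    		printf("Non-match: %s\n",regcheck->d_name);
    		match = 0;
    	}
    	// Release the compiled pattern and the struct on every path
    	regfree(myregex);
    	free(myregex);
    	return match;
    }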
    Called by:

    Code:
    scandir(dir, &eps, check, alphasort);
    As suggested it doesn't compile with the -ansi flag.

  5. #5
    Salem (and the hat of int overfl)
    Well, you could avoid the calloc()/free() pair with
    Code:
    static int check (const struct dirent *regcheck)
    {
    	int rc;
    	int match;
    	regex_t myregex = { 0 };
    
    	// Compile the regular expression
    	rc = regcomp( &myregex, "^.*\\.zip$", REG_EXTENDED | REG_NOSUB );
    	if (0 != rc)
    		return 0;
    
    	if (0 == regexec(&myregex,regcheck->d_name, 0, 0, 0))
    	{
    		printf("Match: %s\n",regcheck->d_name);
    		match = 1;
    	}
    	else
    	{
    		printf("Non-match: %s\n",regcheck->d_name);
    		match = 0;
    	}
    	regfree(&myregex);	// free what regcomp() allocated internally
    	return match;
    }
    You could further improve the performance by compiling the pattern with regcomp() just once and keeping the result in a global, rather than recompiling it for every entry.
    Also consider strstr() as a simpler way of matching one fixed string, bearing in mind that it finds the string anywhere in the name, not only at the end.

    > As suggested it doesn't compile with the -ansi flag.
    Yes, you're calling non-ANSI functions: scandir(), alphasort() and the regex functions come from POSIX, not from ANSI/ISO C.
    Not that it's a good or bad thing, but it's nice to know where the boundaries are sometimes.

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    You may also want to skip entries that are directories, e.g.
    Code:
    static int check (const struct dirent *regcheck)
    {
    	int rc;
    	int match;
    	regex_t myregex = { 0 };
    
    	// d_type holds a single DT_* value, not a set of bit flags,
    	// so test it with == rather than &
    	if (regcheck->d_type == DT_DIR)
    		return 0;
    
    	// Compile the regular expression
    	rc = regcomp( &myregex, "^.*\\.zip$", REG_EXTENDED | REG_NOSUB );
    	if (0 != rc)
    		return 0;
    
    	if (0 == regexec(&myregex,regcheck->d_name, 0, 0, 0))
    	{
    		printf("Match: %s\n",regcheck->d_name);
    		match = 1;
    	}
    	else
    	{
    		printf("Non-match: %s\n",regcheck->d_name);
    		match = 0;
    	}
    	regfree(&myregex);	// release what regcomp() allocated
    	return match;
    }
    Unless of course you actually want "*.zip" directory names to be listed as well.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  7. #7
    Registered User
    Thanks for the suggestions. I am reminded that I ought to take another look at bitwise operators.
    Just one more question:

    Code:
    regex_t myregex = { 0 };
    Why does this get around the need to allocate memory?

  8. #8
    Kernel hacker
    It doesn't "get round the need to allocate memory", but it does "get round the need to EXPLICITLY allocate memory". The memory is automatically allocated by the compiler on the stack.

    And the " = { 0 }" part is only there to ensure that it's filled with zeros, rather than containing whatever "rubbish" happens to be left on the stack from some previous function call.

    As Salem points out, this is a typical example where you may actually want to use a global/static variable. You could use something like this:

    Code:
    	static regex_t myregex = { 0 };
    	static int re_initialized = 0;
    	int rc;
    
    	if (!re_initialized) {
    	   // Compile the regular expression only on the first call
    	   rc = regcomp( &myregex, "^.*\\.zip$", REG_EXTENDED | REG_NOSUB );
    	   if (0 != rc)
    	      return 0;	// invalid pattern: select nothing
    	   re_initialized = 1;
    	}


  9. #9
    Registered User
    Thanks again. I did, of course, mean "get around" in the sense of "not having to do it myself."
    I should also have asked why this variable is apparently assigned as an array rather than:

    Code:
    static regex_t myregex = 0;
    As it is I have changed the code so there's only one compilation of the regex, and I will add the other suggestions shortly.

  10. #10
    Kernel hacker
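    regex_t is a struct (POSIX defines it as a structure type containing at least a size_t member called re_nsub), so it needs a brace-enclosed initializer:
    Code:
     = { 0 }
    That sets the first member to 0, and every member not listed explicitly is zeroed as well. A plain "= 0" is not a valid initializer for a struct, so it won't compile.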


  11. #11
    Registered User
    Great - thanks once more.

  12. #12
    Registered User
    I tried replacing the explicit memory allocation with the:

    Code:
    regex_t * myregex = { 0 };
    ...and though it compiled I got a segmentation fault when I ran it. However, it works as required with the calloc so that's not a problem.
    Last edited by knirirr; 01-29-2008 at 03:13 AM.

  13. #13
    Kernel hacker
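    You don't want a pointer to a regex_t if you don't allocate: either you have a pointer and allocate memory for it to point at, or you have a plain regex_t and no allocation. "regex_t * myregex = { 0 };" just gives you a NULL pointer, so regcomp() writes through NULL and crashes. In the non-pointer case, pass the ADDRESS of myregex, i.e. &myregex.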


  14. #14
    Registered User
    Strangely, I can't get it to run unless I use the explicit allocation.

  15. #15
    Kernel hacker
    Post your code (the one that doesn't work), and I'm sure we can sort it out. Unfortunately, there are far more ways for code not to work than ways for it to work, and even my plastic ball can't see what's wrong with your code.

