Scandir segfault


  1. #1
    Registered User
    Join Date
    Jan 2008
    Location
    Oxford, England
    Posts
    27

    Scandir segfault

    I have a program that reads directories and assembles a file describing the data within - the purpose is to create a list of files on a server so that these can be entered into a database to keep track of the data. Strangely, it has started segfaulting when I try to run it on one particular directory. The directory structure is as follows:

    Code:
    /data/group1
    /data/group1/file1
    /data/group1/file2
    ...
    /data/groupN/file1
    ...
    On the dodgy directory it fails on group1/file2, at this point in the code:

    Code:
    struct dirent **epsz; // array of directory names
    char subpath[256] = {0}; // name of subdirectory (e.g. group1) goes here
    
    p = scandir(subpath, &epsz, zipcheck, alphasort); // zipcheck returns only zip files in the directory
    if (p >= 0)
    { 
      int mc;
      for (mc = 0; mc < p; mc++)
      { 
         char filename[128] = {0};
         strcpy(filename, epsz[mc]->d_name); // accessing epsz[1]->d_name segfaults. There are 8 files in the directory.
      }
    }
    The error can be tracked to that point by the crude method of debugging with printf. If I run it through gdb instead I get:


    Code:
    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000400e6c in main ()
    (gdb) backtrace
    #0  0x0000000000400e6c in main ()
    (gdb) x 0x0000000000400e6c
    0x400e6c <main+1172>:   0x48008b48
    ...which doesn't mean much to me. If anyone can suggest how to investigate further I'd be interested to hear suggestions - thanks.

  2. #2
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,592
    Well that would depend on how your callback functions allocate your epsz array.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  3. #3
    Registered User
    Join Date
    Jan 2008
    Location
    Oxford, England
    Posts
    27
    Quote Originally Posted by Salem View Post
    Well that would depend on how your callback functions allocate your epsz array.
    Removing the call to zipcheck seems to work, though it's not clear how I can fix things and keep that in.

    Code:
    static int zipcheck (const struct dirent *dir)
    { 
      if (0 == regexec(zipfinder,dir->d_name, 0, 0, 0))
      {
        return 1;
      }
      else
      {
        return 0;
      }
    }
    It's also not clear to me why this code has no trouble with larger directories containing similar files, even on the same server.
    Last edited by knirirr; 07-26-2008 at 11:12 AM. Reason: Further clarification

  4. #4
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,592
    Well, something needs to be calling malloc or realloc to allocate space for the variable number of filenames you're finding.

    Without seeing that code, there's no way to tell how you're making a mess of it.

  5. #5
    a_capitalist_story
    Join Date
    Dec 2007
    Posts
    2,651
    scandir allocates the memory for the struct dirent ** that it's passed, in this case epsz. Make sure as you go through the loop you're freeing both the internal pointers (epsz[mc]) and their container (epsz) prior to reusing epsz.

    zipfinder has been properly allocated/compiled through the use of regcomp, I assume?
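
    The ownership rule above can be sketched roughly as follows. This is a minimal illustration, not the poster's code: ends_with_zip and list_zip_files are hypothetical names, and the plain suffix test stands in for the regex-based zipcheck filter.

```c
#define _POSIX_C_SOURCE 200809L /* scandir and alphasort are POSIX.1-2008 */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical filter: keep only names that end in ".zip". */
static int ends_with_zip(const struct dirent *entry)
{
    size_t len = strlen(entry->d_name);
    return len >= 4 && strcmp(entry->d_name + len - 4, ".zip") == 0;
}

/* Hypothetical helper: prints the matching names in dir.
 * Returns the number of matches, or -1 if scandir fails. */
static int list_zip_files(const char *dir)
{
    struct dirent **entries = NULL;
    int i, count;

    assert(dir != NULL);
    count = scandir(dir, &entries, ends_with_zip, alphasort);
    if (count < 0)
        return -1; /* on failure there is nothing for the caller to free */

    for (i = 0; i < count; i++) {
        printf("%s\n", entries[i]->d_name);
        free(entries[i]); /* each entry was allocated by scandir */
    }
    free(entries);        /* then free the array that held them */
    return count;
}
```

    Calling list_zip_files once per subdirectory means every scandir result is released before the next call reuses the pointer.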

  6. #6
    Registered User
    Join Date
    Jan 2008
    Location
    Oxford, England
    Posts
    27
    Quote Originally Posted by rags_to_riches View Post
    Make sure as you go through the loop you're freeing both the internal pointers (epsz[mc]) and their container (epsz) prior to reusing epsz.
    I thought that I had done that, but evidently not properly.

    zipfinder has been properly allocated/compiled through the use of regcomp, I assume?
    Indeed so.
    In case anyone is willing to look at my awful code and indicate the main fault(s) I've put it here. N.B. I would normally use Perl or Ruby, but the volume of data is so much that it's too slow.

  7. #7
    a_capitalist_story
    Join Date
    Dec 2007
    Posts
    2,651
    You have the memory leaks I suspected, related to not freeing the pointers to the contents of the containers (both epsd and epsz). These should be freed with each iteration of their respective loop. The container itself should be freed when you are done, which it appears you do properly. That doesn't appear to be the problem you're experiencing.

    You are using dynamically-allocated regex_t structs. Usually what you would do is declare them on the stack, pass the address to regcomp, which allocates them appropriately, and then pass the address of each to regexec. Then, at the end, pass the address to regfree to free the memory properly:
    Code:
    regex_t dirfinder;
    regcomp(&dirfinder, ...);
    regexec(&dirfinder, ...);
    ...
    regfree(&dirfinder); // when done with compiled regex
    One thing I noticed is that your scandir filters don't check whether a directory entry is the kind of entry you're looking for. For example, the dircheck filter tests whether the name matches the regex, but it applies that test to every entry. If the directory you're scanning happens to contain a regular file whose name matches what you expect a directory to be named, the filter returns true (1). You don't want that, because the subsequent scandir for zip files on that path will fail.
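
    One possible way to add that check is to look at the entry type inside the filter. The sketch below assumes a libc that provides the d_type member and the DT_* constants (glibc and the BSDs do, but this is an extension, not plain POSIX). The names dircheck_sketch and name_looks_like_group are hypothetical, and the prefix test is only a stand-in for the regexec call.

```c
#define _DEFAULT_SOURCE /* exposes the DT_* constants on glibc */
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Hypothetical name test standing in for the regexec() call. */
static int name_looks_like_group(const char *name)
{
    return strncmp(name, "group", 5) == 0;
}

/* Filter sketch: accept an entry only if it is a directory
 * (or of unknown type) and its name matches. */
static int dircheck_sketch(const struct dirent *entry)
{
    assert(entry != NULL);
    /* Some filesystems report DT_UNKNOWN; let those through so the
     * caller can confirm the type later with stat(). */
    if (entry->d_type != DT_DIR && entry->d_type != DT_UNKNOWN)
        return 0;
    return name_looks_like_group(entry->d_name);
}
```

    Note that this rejects symbolic links to directories (DT_LNK); whether that is acceptable depends on how the data tree is laid out.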

    I would use fewer "magic numbers" and instead include <limits.h> and use PATH_MAX for your path-containing character arrays.

    The buffer for the popen of the md5sum program is rather short at 64 characters. It should be at least PATH_MAX characters to contain the program name and the path to the file. Overrunning the end of this array may be the cause of your crash.

    Same for your filepath array. You set aside 256 bytes for subpath in the outer loop, then in the inner loop you only set aside 128 bytes for subpath and the filename.

    Some of the changes I would make:
    Code:
    #include <limits.h> // for PATH_MAX
    ...
    const char md5_command[] = "md5sum";
    
    struct dirent **epsz;
    // No need to allocate and make a copy of the d_name member here.
    char subpath[PATH_MAX] = {0};
    snprintf(subpath, PATH_MAX, "%s/%s", dir, epsd[cnt]->d_name);
    ...
    
    for (mc = 0; mc < p; mc++)
    {
        // now that we finally have the filename in "filename"
        // and the dirname in "dirname" we can start writing the
        // "xml" file.
    
        // No need to allocate and make a copy of the d_name member here.
        char fullpath[PATH_MAX] = { 0 };
        snprintf(fullpath, PATH_MAX, "%s/%s", subpath, epsz[mc]->d_name);
        ...
        printf("\t\t<name>%s</name>\n",epsz[mc]->d_name);
    
        int bytes = 0;
        char md5sum[64] = {0}; // For alignment purposes
        ...
        char cmd[PATH_MAX + sizeof(md5_command) + 1] = { 0 };
        snprintf(cmd, sizeof(cmd), "%s %s", md5_command, fullpath);
        FILE *fd=popen(cmd,"r");
        if(fd)
        {
            fgets(md5sum, 33, fd);
            pclose(fd);
        }
        ...
    For bonus points you can check the results of snprintf and if anything's wonky, act appropriately.
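
    A rough sketch of what that check might look like: snprintf returns the number of characters it would have written, so a result at or above the buffer size means the output was truncated. The helper name join_path is hypothetical and not part of the code under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "dir/name" into dst.
 * Returns 0 on success, -1 on an encoding error or truncation. */
static int join_path(char *dst, size_t dst_size,
                     const char *dir, const char *name)
{
    int written;

    assert(dst != NULL && dst_size > 0);
    written = snprintf(dst, dst_size, "%s/%s", dir, name);
    if (written < 0 || (size_t)written >= dst_size)
        return -1; /* negative: error; >= size: result did not fit */
    return 0;
}
```

    On a -1 return the caller could skip that entry and report it rather than pass a truncated path on to popen.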

    Hope this helps!

  8. #8
    Registered User
    Join Date
    Jan 2008
    Location
    Oxford, England
    Posts
    27
    Quote Originally Posted by rags_to_riches View Post
    Hope this helps!
    Indeed - most useful, thanks.
    With the exception of "hello world" &c. this was the first C program I wrote, and I still don't fully understand pointers, so I would expect it to contain various faults of this sort.
