Thread: Recursive DIR problems

  1. #16
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    The short answer is that it limits scalability and maintainability of your function - and of programs that call it - in the long run. As an example, the approach will not work in multi-threaded programs, if two threads attempt to use your function at once.
    O_o

    Indeed. This is an issue known as "reentrancy".

    As another example, which doesn't revolve around multiple threads, consider a function which simply used a function that is itself not reentrant.

    Consider a mutually recursive set of alternating functions (a trivial parser) which each internally calls `strtok' without manually adjusting the pointer passed to `strtok': the adjust offset of the first value stored by `strtok', the `str' pointer, will be used incorrectly by the nested calls thanks to those calls passing null to `strtok' instead of the a pointer to the string the alternate sides of the parser is responsible for processing. Such a problem happens even if `strtok' is thread-safe because the parser is being called from a single thread.

    I realize this is a complicated example, but the point here is to say that functions with reentrancy problems have a tendency to push problems uphill.

    If you write a utility function with a reentrancy problem, such as using a `static' because it is easier, you can't easily use a problematic function as part of the implementation of higher level functions without causing reentrancy problems in those higher level functions.

    Soma

  2. #17
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by nime View Post
    This is very simple but efficient and potrable code for recursive directory scanning for c files which will work in most "normal" cases...
    What on earth makes you think it is more portable than nftw()? Both nftw() and opendir() et al. are standardized in POSIX.1-2001. Granted, on some SVr4 systems you'd have to use ftw() instead, but POSIX.1-2008 obsoletes ftw(), which makes nftw() better for future use.

    Your code will get confused if directories or files are renamed, created, or deleted (in the same directory or in a parent directory) while scanning. In most cases, those do not bother nftw() at all.

    nftw() even detects loops (circular symlinks), and should report each file only once (even if renamed). Your code assumes that FILENAME_MAX is the maximum length for a path, but in reality, it's just the maximum length for one file name, i.e. one path component.

    Why don't you try to compile and run the following program using gcc in Windows, and see for yourself? I don't have or use Windows myself, but if opendir() works -- it's not in microsoft's C libraries, so you are using something else anyway --, then nftw() should work too.
    Code:
    #define  _POSIX_C_SOURCE 200809L
    #define  _XOPEN_SOURCE   500
    #include <stdlib.h>
    #include <ftw.h>
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    
    int check(const char *const path, const struct stat *const info, int const typeflag, struct FTW *const ftwbuf)
    {
        /* Only consider regular files */
        if (typeflag == FTW_F) {
            const char *const name = path + ftwbuf->base;
            const int         depth = ftwbuf->level;
            const char *const extension = strrchr(name, '.');
    
            /* Only consider files with extension .c */
            if (extension && !strcmp(extension, ".c")) {
    
                /* Print depth, size in kilobytes (units of 1024 bytes), and full path. */
                printf("%3d %9.0fk %s\n", depth, (double)info->st_size / 1024.0, path);
    
            }
        }
    
        return 0;
    }
    
    int main(int argc, char *argv[])
    {
        if (argc != 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help") || !strcmp(argv[1], "/?")) {
            fprintf(stderr, "\n");
            fprintf(stderr, "Usage: %s [ -h | --help | /? ]\n", argv[0]);
            fprintf(stderr, "       %s DIRECTORY\n", argv[0]);
            fprintf(stderr, "\n");
            fprintf(stderr, "This program will list all .c files under DIRECTORY.\n");
            fprintf(stderr, "\n");
            return EXIT_FAILURE;
        }
    
        if (nftw(argv[1], check, FOPEN_MAX - 1, 0)) {
            fprintf(stderr, "Aborted (%s).\n", strerror(errno));
            return EXIT_FAILURE;
        }
    
        return EXIT_SUCCESS;
    }

  3. #18
    Registered User
    Join Date
    Mar 2013
    Posts
    63
    No MinGw ftw.h !

    James

  4. #19
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Quote Originally Posted by rcgldr View Post
    you can pass dir_level as a parameter.
    Forgot to mention, if you don't want to change the external interface, you can use an external function (list_dir()) to call an internal function (list_dir_level()).

    Code:
    static void list_dir_level(char *dir_name, int dir_level)
    {
    /* ... */
                        list_dir_level(abs_filename, dir_level+1);
    /* ... */
    }
    
    static void list_dir(char *dir_name)
    {
        list_dir_level(dir_name, 0);
    }
    
    int main(...)
    {
    /* ... */
                        list_dir(dirname);
    /* ... */
    }
    As to usage of static variables, as mentioned earlier in this thread, using static variables in a function is a problem if you're considering calling that function from multiple threads in a multi-threaded program. I'm not sure if there are any other situations where a static variable can cause a problem. For a windows dynamic link library, each process that uses the dynamic link library get's it's own virtual address space, sharing only the code, so static variables wouldn't be a problem in that case. I don't know how posix / linux / unix implement dynamic link libraries.

    On the other hand, you need static and/or global variables, such as mutexes and/or semaphores and/or shared data protected by mutexes and/or semaphores, in order to communicate between threads in a multi-threaded program, but this wouldn't apply to a function like your list directory function.

  5. #20
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by jcfuller View Post
    No MinGw ftw.h !
    Oh man, I'd never have guessed that. It's such a basic operation, really; it makes no sense to have to reinvent the wheel every single time.

    Let's see.

    I could easily write a safe file-tree-walker for Linux, using *at() functions (for thread-safety and independence from "current working directory", using one file descriptor per descended depth); say, interface being
    Code:
    struct file_info {
        size_t pathmax; /* Total size allocated */
        size_t dlen; /* Length of directory part, name starts at path + dlen */
        size_t nlen; /* Length of name part, total length is dlen + nlen */
        char *path; /* Full path to target file */
        int level; /* 0 for listed files/directories, 1 for files in listed directories, and so on */
        int  type; /* Synthetic: file/directory/symlink/... ? */
        int  attributes; /* Synthetic: readonly/temporary/hidden/archive/... ? */
        off_t  filesize; /* Size in bytes if applicable */
        double accessed; /* Time of last access, in seconds since Epoch (1970) */
        double modified; /* Time of last modification, in seconds since Epoch (1970) */
    };
    
    int tree_walk(const char *const list[],
                  int (*file_fn)(const struct file_info *, void *const),
                  int (*dir_fn)(const struct file_info *, void *const),
                  int (*other_fn)(const struct file_info *, void *const),
                  void *const custom_data);
    While 'double' is not the optimal type to represent times as seconds since Epoch (1 Jan 1970, 00:00:00) on any system, it is easy to convert to other, system-specific types. Also, it is very easy to select files based on last access or modification, as double can represent all 52-bit integers exactly, and that is enough for 142 years at microsecond resolution, or 142000 years at millisecond resolution.

    Similarly, some of the fields would have to be synthetic, and some of the fields omitted (like owner, group, and mode for Linux), but all common tasks should be easy to implement using what is listed above.

    Also note that instead of a single directory, the function takes a NULL-terminated array of pointers to directory and/or file names that will be considered. The idea is that each of the callback functions gets called exactly once for each item.

    Since for Linux (and *BSDs and other Unix and Unix-like system) there already exists nftw() and fts(), it does not make any sense to write that for Linux alone. However, if someone is interested in writing the same for Windows, we could collaborate, to create a safe, robust, portable file tree walker function. Drop me a PM if you're willing to tackle the Windoes side; maybe we can start a new thread, and push the results for inclusion in a FAQ or something.

  6. #21
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by rcgldr View Post
    As to usage of static variables, as mentioned earlier in this thread, using static variables in a function is a problem if you're considering calling that function from multiple threads in a multi-threaded program. I'm not sure if there are any other situations where a static variable can cause a problem.
    It also would not be guaranteed safe in any program that installs a non-default interrupt handler. That is another class of program where reentrancy is important.

    Multithreading is more common use case that gives problems in modern developments though. And since C11 has specifically included multithreading support, that is likely to be more important practically for C developers in the future. (Same for C++11).

    Quote Originally Posted by rcgldr View Post
    For a windows dynamic link library, each process that uses the dynamic link library get's it's own virtual address space, sharing only the code, so static variables wouldn't be a problem in that case.
    That is the common way of creating DLLs, yes. It is not the only way. Shared memory (which is accessible across multiple address spaces) is fairly important in some applications, and is often supported using particular types of DLLs.

    Threads in a program do share the process address space - in fact, that is often the point of using threads rather than concurrent processes - unless they are specifically designed to make use of thread local storage. Auto variables are usually thread-local (by default). Static variables are typically shared across threads.

    Quote Originally Posted by rcgldr View Post
    I don't know how posix / linux / unix implement dynamic link libraries.
    The specifics are a little different than for windows DLLs, but the same possibilities exist.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Problems with recursive structure
    By heinz55 in forum C Programming
    Replies: 11
    Last Post: 09-29-2012, 08:42 AM
  2. Converting recursive function to tail recursive
    By ajacobs365 in forum C Programming
    Replies: 1
    Last Post: 10-30-2011, 08:15 AM
  3. merge sort: recursive is fasfter than non-recursive
    By rio_cat in forum C Programming
    Replies: 8
    Last Post: 12-04-2006, 12:52 AM
  4. recursive function problems
    By jomns in forum C++ Programming
    Replies: 6
    Last Post: 01-16-2004, 11:04 AM
  5. Recursive Solution to Any Maze And Stack Overflow Problems
    By PunkyBunny300 in forum C Programming
    Replies: 14
    Last Post: 12-14-2002, 07:00 PM

Tags for this Thread