Thread: C file-manipulation in Windows

  1. #1
    Registered User
    Join Date
    May 2021
    Posts
    19

    C file-manipulation in Windows

    I'm a retired firmware designer who spent far too many years programming mainly in C, with a bit of ASM and C++ here and there. But I always developed for a microcontroller target. Now I want to write a fast-as-hell file checksummer to run as a Windows console application, to verify my 150,000 music files before I back them up. This involves accessing Windows file-access calls and APIs, of which I know nothing. I know they exist, and I know what I want, but I don't know where to look for it.

    Where do I look to find the libraries (?) I need. Should I use Visual Studio, or are there better (free) alternatives? All hints and tips appreciated. Thanks.

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    About File Management - Win32 apps | Microsoft Docs
    Any windows compiler will have access to this API.

    > Now I want to write a fast-as-hell file checksummer
    Bear in mind that it doesn't matter how fast your checksum is, the elephant in the room is the bandwidth of your disk.
    Using the platform specific ReadFile instead of the standard fread() is not going to magically transform the performance of your program from hours to seconds.

    > to verify my 150,000 music files before I back them up.
    Perhaps you also desire some kind of error correction in addition to error detection.
    Parchive - Wikipedia
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    Visual studio community edition would be a reasonable choice.

    Here is a breadcrumb on reading and writing files using the Windows API:

    Opening a File for Reading or Writing - Win32 apps | Microsoft Docs
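
    As a rough illustration of what that page covers, here is a minimal sketch that reads a file in chunks with CreateFileA and ReadFile and adds up its bytes. The helper name sum_file, the 64 KiB buffer and the FILE_FLAG_SEQUENTIAL_SCAN hint are my own assumptions rather than anything the linked page prescribes, and the non-Windows branch is only there so the sketch also builds elsewhere.

```c
#include <stddef.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: adds up every byte of the file at `path`.
   Returns 0 on success, -1 if the file cannot be opened or read. */
int sum_file(const char *path, unsigned long long *sum_out) {
    static unsigned char buf[64 * 1024]; /* shared buffer: not thread-safe */
    unsigned long long sum = 0;
#ifdef _WIN32
    /* The ...A variant takes plain char strings even in a UNICODE build. */
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    for (;;) {
        DWORD got = 0;
        if (!ReadFile(h, buf, (DWORD)sizeof buf, &got, NULL)) {
            CloseHandle(h);
            return -1;
        }
        if (got == 0)
            break; /* end of file */
        for (DWORD i = 0; i < got; ++i)
            sum += buf[i];
    }
    CloseHandle(h);
#else
    /* Portable fallback using the standard library. */
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t got;
    while ((got = fread(buf, 1, sizeof buf, f)) != 0)
        for (size_t i = 0; i < got; ++i)
            sum += buf[i];
    int failed = ferror(f);
    fclose(f);
    if (failed)
        return -1;
#endif
    *sum_out = sum;
    return 0;
}
```

    Calling sum_file("track.mp3", &total) fills total on success. Timing this against a plain fread loop on the same files is an easy way to check whether the platform call makes any measurable difference on a given disk.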

  4. #4
    Registered User
    Join Date
    May 2012
    Posts
    505
    Quote Originally Posted by Pattern-chaser View Post
    I'm a retired firmware designer who spent far too many years programming mainly in C, with a bit of ASM and C++ here and there. But I always developed for a microcontroller target. Now I want to write a fast-as-hell file checksummer to run as a Windows console application, to verify my 150,000 music files before I back them up. This involves accessing Windows file-access calls and APIs, of which I know nothing. I know they exist, and I know what I want, but I don't know where to look for it.

    Where do I look to find the libraries (?) I need. Should I use Visual Studio, or are there better (free) alternatives? All hints and tips appreciated. Thanks.
    The days when you had to pay for a compiler are over. Visual Studio Community is free, and it includes a highly capable C++ (and therefore C) compiler.

    Here's the old Windows file API

    Fileapi.h header - Win32 apps | Microsoft Docs

    Whilst it says Win32, the same API is used by 64-bit programs. You'll need FindFirstFile and FindNextFile to enumerate the directories. If you have a tree of directories with audio files in them, you'll have to test each entry to see whether it is a directory and, if so, recurse into it.
    I'm the author of MiniBasic: How to write a script interpreter and Basic Algorithms
    Visit my website for lots of associated C programming resources.
    https://github.com/MalcolmMcLean


  5. #5
    Registered User
    Join Date
    Dec 2017
    Posts
    1,626
    Reading the files is the likely bottleneck of your program. I don't see how you can speed that up much, since each file has to be read through once. I would try something simple like the code below. Let it run overnight and time it. You can just print out the start and end clock times, as I did below, since you don't need better than one-second resolution. Test it on a thousand files first, then multiply by 150 for an estimate. If a thousand files take 2 minutes, 150,000 will take about 5 hours. It could easily be two, three or four times faster than that. It probably depends mostly on the disk speed.
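
    The extrapolation is plain proportional scaling. As a tiny sketch (estimate_seconds is a hypothetical helper, and it assumes the sample files are about the same average size as the rest of the collection):

```c
/* Hypothetical helper: scale the time of a sample run up to the whole
   collection. Returns -1.0 if the sample size is not positive. */
double estimate_seconds(double sample_seconds, long sample_files,
                        long total_files) {
    if (sample_files <= 0)
        return -1.0;
    return sample_seconds * ((double)total_files / (double)sample_files);
}
```

    With a 120-second sample of 1000 files and 150,000 files in total, this gives 18,000 seconds, which is the 5 hours quoted.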

    The program below simply adds up all the bytes of all the mp3 files. Obviously you need to put in your own calculation. Unfortunately, I couldn't test the Windows version since I don't use Windows. I wrote and lightly tested the Linux version, which I include at the end.
    Code:
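    #define _CRT_SECURE_NO_WARNINGS // let MSVC accept fopen/asctime/localtime
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
     
    #define CAPACITY (5*1024*1024)
    #define TESTING   1 // if non-zero, stop after TESTLIMIT files processed
    #define TESTLIMIT 1000
     
    // Add up every byte of one file (placeholder for a real checksum).
    size_t check(const char *filename) {
        static unsigned char buf[CAPACITY];
        FILE *f = fopen(filename, "rb");
        if (!f) return 0; // skip files that can't be opened
        size_t sum = 0;
        for (size_t size; (size = fread(buf, 1, CAPACITY, f)) != 0; )
            for (const unsigned char *b = buf; size-- > 0; ++b)
                sum += *b;
        fclose(f);
        return sum;
    }
     
    size_t procdir(const char *dirname) {
    #if TESTING
        static int count = 0;
        if (count >= TESTLIMIT) return 0;
    #endif
        size_t sum = 0;
        // The ...A functions take char strings even in a UNICODE build.
        if (!SetCurrentDirectoryA(dirname)) return 0;
        WIN32_FIND_DATAA data;
        // Search "*" rather than "*.mp3", or subdirectories are never found.
        HANDLE hfind = FindFirstFileA("*", &data);
        if (hfind != INVALID_HANDLE_VALUE) {
            do {
                const char *name = data.cFileName;
                if (data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
                    // Skip "." and ".." to avoid endless recursion.
                    if (strcmp(name, ".") != 0 && strcmp(name, "..") != 0)
                        sum += procdir(name);
                }
                else {
                    const char *ext = strrchr(name, '.'); // find last .
                    if (!ext || _stricmp(ext, ".mp3") != 0) continue;
                    sum += check(name);
    #if TESTING
                    //printf("%s\n", name); // don't print if timing
                    if (++count >= TESTLIMIT) break;
    #endif
                }
            } while (FindNextFileA(hfind, &data));
            FindClose(hfind);
        }
        SetCurrentDirectoryA("..");
        return sum;
    }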
     
    int main() {
        time_t t;
        time(&t);
        printf("Start: %s", asctime(localtime(&t))); // asctime adds '\n'
     
        size_t sum = procdir(".");
        printf("%zu\n", sum);
     
        time(&t);
        printf("End  : %s", asctime(localtime(&t)));
        return 0;
    }
    Code:
    #define _DEFAULT_SOURCE 1
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <sys/types.h>
    #include <dirent.h>
    #include <unistd.h>
     
    #define CAPACITY (5*1024*1024)
    #define TESTING   1 // if non-zero, stop after TESTLIMIT files processed
    #define TESTLIMIT 10
     
    // Add up every byte of one file (placeholder for a real checksum).
    size_t check(const char *filename) {
        static unsigned char buf[CAPACITY];
        FILE *f = fopen(filename, "rb");
        if (!f) return 0; // skip files that can't be opened
        size_t sum = 0;
        for (size_t size; (size = fread(buf, 1, CAPACITY, f)) != 0; )
            for (const unsigned char *b = buf; size-- > 0; ++b)
                sum += *b;
        fclose(f);
        return sum;
    }
     
    size_t procdir(const char *dirname) {
    #if TESTING
        static int count = 0;
        if (count >= TESTLIMIT) return 0;
    #endif
        size_t sum = 0;
        if (chdir(dirname) != 0) return 0; // can't enter: skip this directory
        DIR *dir = opendir(".");
        if (!dir) printf("Can't open dir\n");
        if (dir) {
            struct dirent *file;
            while ((file = readdir(dir))) {
                const char *p = strrchr(file->d_name, '.'); // find last .
                if (p && !p[1]) continue; // skip names ending in . (covers "." and "..")
                if (file->d_type == DT_DIR)
                    sum += procdir(file->d_name); // include the subdirectory's total
                else if (p && strcmp(p, ".mp3") == 0) {
                    sum += check(file->d_name);
    #if TESTING
                    //printf("%s\n", file->d_name); // don't print if timing
                    if (++count >= TESTLIMIT) break;
    #endif
                }
            }
            closedir(dir);
        }
        chdir("..");
        return sum;
    }
     
    int main() {
        time_t t;
        time(&t);
        printf("Start: %s", asctime(localtime(&t))); // asctime adds '\n'
     
        size_t sum = procdir(".");
        printf("%zu\n", sum);
     
        time(&t);
        printf("End  : %s", asctime(localtime(&t)));
        return 0;
    }
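
    The byte sum in check() is only a placeholder. If a CRC-32 turns out to be what is wanted, a minimal table-driven sketch could look like the following. It assumes the common reflected polynomial 0xEDB88320 (the one used by zip and PNG); the names crc32_init and crc32_update are my own. Pass crc = 0 for the first buffer of a file and the previous result for each following buffer.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];
static int crc_table_ready = 0;

/* Build the 256-entry lookup table for the reflected polynomial. */
static void crc32_init(void) {
    for (uint32_t n = 0; n < 256; ++n) {
        uint32_t c = n;
        for (int k = 0; k < 8; ++k)
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc_table[n] = c;
    }
    crc_table_ready = 1;
}

/* Fold `len` bytes of `buf` into a running CRC-32.
   The lazy table setup is not thread-safe. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len) {
    if (!crc_table_ready)
        crc32_init();
    crc ^= 0xFFFFFFFFu; /* undo the final inversion of the previous call */
    for (size_t i = 0; i < len; ++i)
        crc = crc_table[(crc ^ buf[i]) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

    Inside check(), the inner byte loop would become a single call such as crc = crc32_update(crc, buf, size), with check() returning the CRC instead of the sum.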
    A little inaccuracy saves tons of explanation. - H.H. Munro

  6. #6
    Registered User Sir Galahad
    Join Date
    Nov 2016
    Location
    The Round Table
    Posts
    277
    Just out of curiosity, I wonder if this compiles and runs ok on Windows?

    Code:
    #ifdef __linux__
    #include <dirent.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #elif _WIN32
    #include <dir.h>
    #include <direct.h>
    #include <stat.h>
    #define getcwd _getcwd
    #define chdir _chdir
    #define lstat _stat
    #else
    #error Unsupported platform!
    #endif
    
    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>
    
    char* filesystem_current(void) {
      size_t size = 16;
      char* buffer = malloc(size);
      while (getcwd(buffer, size) == NULL) {
        size <<= 1;
        buffer = realloc(buffer, size);
      }
      return buffer;
    }
    
    typedef bool (*filesystem_process_path)(const char* pathName, void* userData);
    
    bool filesystem_process_path_NOP_(const char* ignored, void* unused) {
      return true;
    }
    
    bool filesystem_process_recurse_(const char* pathName,
                                     void* userData,
                                     filesystem_process_path userEnterDirectory,
                                     filesystem_process_path userProcessFile,
                                     filesystem_process_path userLeaveDirectory) {
      struct stat info = {0};
      if (lstat(pathName, &info) != 0)
        return false;
      if (S_ISREG(info.st_mode))
        return userProcessFile(pathName, userData);
      else if (!S_ISDIR(info.st_mode))
        return true; /* Ignore */
      DIR* directory = opendir(pathName);
      if (!directory)
        return false;
      chdir(pathName);
      bool success = userEnterDirectory(pathName, userData);
      while (success) {
        struct dirent* next = readdir(directory);
        if (next == NULL)
          break;
        char* path = next->d_name;
        if (!strcmp(path, ".") || !strcmp(path, ".."))
          continue;
        success = filesystem_process_recurse_(path, userData, userEnterDirectory,
                                              userProcessFile, userLeaveDirectory);
      }
      userLeaveDirectory(pathName, userData);
      closedir(directory);
      chdir("..");
      return success;
    }
    
    bool filesystem_process_(const char* pathName,
                             void* userData,
                             filesystem_process_path userEnterDirectory,
                             filesystem_process_path userProcessFile,
                             filesystem_process_path userLeaveDirectory) {
      if (!userEnterDirectory)
        userEnterDirectory = filesystem_process_path_NOP_;
      if (!userProcessFile)
        userProcessFile = filesystem_process_path_NOP_;
      if (!userLeaveDirectory)
        userLeaveDirectory = filesystem_process_path_NOP_;
      char* saved = filesystem_current();
      bool success =
          filesystem_process_recurse_(pathName, userData, userEnterDirectory,
                                      userProcessFile, userLeaveDirectory);
      chdir(saved);
      free(saved);
      return success;
    }
    
    #define filesystem_process(pathName, userData, userEnterDirectory, \
                               userProcessFile, userLeaveDirectory)    \
      filesystem_process_(pathName, (void*)userData,                   \
                          (filesystem_process_path)userEnterDirectory, \
                          (filesystem_process_path)userProcessFile,    \
                          (filesystem_process_path)userLeaveDirectory)
    
    #define filesystem_process_files(pathName, userData, userProcessFile) \
      filesystem_process(pathName, userData, NULL, userProcessFile, NULL)
    
    #define filesystem_process_files_before(pathName, userData, userProcessFile, \
                                            userLeaveDirectory)                  \
      filesystem_process(pathName, userData, NULL, userProcessFile,              \
                         userLeaveDirectory)
    
    #define filesystem_process_files_after(pathName, userData, userProcessFile,   \
                                           userEnterDirectory)                    \
      filesystem_process(pathName, userData, userEnterDirectory, userProcessFile, \
                         NULL)
    
    #define filesystem_process_directories(pathName, userData, userEnterDirectory, \
                                           userLeaveDirectory)                     \
      filesystem_process(pathName, userData, userEnterDirectory, NULL,             \
                         userLeaveDirectory)
    
    /*
     Sample usage
    */
    
    #include <stdio.h>
    #include <stdlib.h>
    
    bool print_file(char* path, size_t* total) {
      char* current = filesystem_current();
      printf("%s [%s]\n", path, current);
      free(current);
      ++(*total);
      return true;
    }
    
    void process(const char* path) {
      printf("Processing `%s`...\n", path);
      size_t count = 0;
      bool ok = filesystem_process_files(path, &count, print_file);
      printf("filesystem_process_files: ");
      if (!ok)
        puts("FAILED");
      else
        printf("success (%zu files encountered)\n", count);
    }
    
    int main(int argc, char** argv) {
      if (argc < 2)
        process(".");
      for (;;) {
        char* path = *(++argv);
        if (!path)
          break;
        process(path);
      }
    }

  7. #7
    Registered User
    Join Date
    Sep 2020
    Posts
    150
    No, Visual Studio doesn't have dir.h.

    stat.h is in sys/stat.h

  8. #8
    Registered User Sir Galahad
    Join Date
    Nov 2016
    Location
    The Round Table
    Posts
    277
    Quote Originally Posted by thmm View Post
    No, Visual Studio doesn't have dir.h.

    stat.h is in sys/stat.h
    Ah, thanks! It looks like dirent.h may not be available on all Windows machines by default either, so a drop-in header may still be required (this one looks like a pretty good implementation).

    I just like things to be as portable as possible. Here's the corrected code:

    Code:
    #ifdef _WIN32
    #include <direct.h>
    #define getcwd _getcwd
    #define chdir _chdir
    #define lstat _stat
    #else
    #include <unistd.h>
    #endif
    
    #include <dirent.h>
    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/stat.h>
    
    char* filesystem_current(void) {
      size_t size = 8;
      char* buffer = malloc(size);
      while (getcwd(buffer, size) == NULL) {
        size <<= 1;
        buffer = realloc(buffer, size);
      }
      return buffer;
    }
    
    typedef bool (*filesystem_process_path)(const char* pathName, void* userData);
    
    bool filesystem_process_path_NOP_(const char* ignored, void* unused) {
      return true;
    }
    
    bool filesystem_process_recurse_(const char* pathName,
                                     void* userData,
                                     filesystem_process_path userEnterDirectory,
                                     filesystem_process_path userProcessFile,
                                     filesystem_process_path userLeaveDirectory) {
      struct stat info = {0};
      if (lstat(pathName, &info) != 0)
        return false;
      if (S_ISREG(info.st_mode))
        return userProcessFile(pathName, userData);
      else if (!S_ISDIR(info.st_mode))
        return true; /* Ignore */
      DIR* directory = opendir(pathName);
      if (!directory)
        return false;
      chdir(pathName);
      bool success = userEnterDirectory(pathName, userData);
      while (success) {
        struct dirent* next = readdir(directory);
        if (next == NULL)
          break;
        char* path = next->d_name;
        if (!strcmp(path, ".") || !strcmp(path, ".."))
          continue;
        success = filesystem_process_recurse_(path, userData, userEnterDirectory,
                                              userProcessFile, userLeaveDirectory);
      }
      userLeaveDirectory(pathName, userData);
      closedir(directory);
      chdir("..");
      return success;
    }
    
    bool filesystem_process_(const char* pathName,
                             void* userData,
                             filesystem_process_path userEnterDirectory,
                             filesystem_process_path userProcessFile,
                             filesystem_process_path userLeaveDirectory) {
      if (!userEnterDirectory)
        userEnterDirectory = filesystem_process_path_NOP_;
      if (!userProcessFile)
        userProcessFile = filesystem_process_path_NOP_;
      if (!userLeaveDirectory)
        userLeaveDirectory = filesystem_process_path_NOP_;
      char* saved = filesystem_current();
      bool success =
          filesystem_process_recurse_(pathName, userData, userEnterDirectory,
                                      userProcessFile, userLeaveDirectory);
      chdir(saved);
      free(saved);
      return success;
    }
    
    #define filesystem_process(pathName, userData, userEnterDirectory, \
                               userProcessFile, userLeaveDirectory)    \
      filesystem_process_(pathName, (void*)userData,                   \
                          (filesystem_process_path)userEnterDirectory, \
                          (filesystem_process_path)userProcessFile,    \
                          (filesystem_process_path)userLeaveDirectory)
    
    #define filesystem_process_files(pathName, userData, userProcessFile) \
      filesystem_process(pathName, userData, NULL, userProcessFile, NULL)
    
    #define filesystem_process_files_before(pathName, userData, userProcessFile, \
                                            userLeaveDirectory)                  \
      filesystem_process(pathName, userData, NULL, userProcessFile,              \
                         userLeaveDirectory)
    
    #define filesystem_process_files_after(pathName, userData, userProcessFile,   \
                                           userEnterDirectory)                    \
      filesystem_process(pathName, userData, userEnterDirectory, userProcessFile, \
                         NULL)
    
    #define filesystem_process_directories_then(                       \
        pathName, userData, userEnterDirectory, userLeaveDirectory)    \
      filesystem_process(pathName, userData, userEnterDirectory, NULL, \
                         userLeaveDirectory)
    
    #define filesystem_process_directories(pathName, userData, userEnterDirectory) \
      filesystem_process(pathName, userData, userEnterDirectory, NULL, NULL)
    
    /*
     Sample usage
    */
    
    #include <stdio.h>
    #include <stdlib.h>
    
    bool print_file(char* path, size_t* total) {
      char* current = filesystem_current();
      printf("%s [%s]\n", path, current);
      free(current);
      ++(*total);
      return true;
    }
    
    void process(const char* path) {
      printf("Processing `%s`...\n", path);
      size_t count = 0;
      bool ok = filesystem_process_files(path, &count, print_file);
      printf("filesystem_process_files: ");
      if (!ok)
        puts("FAILURE");
      else
        printf("success (%zu files encountered)\n", count);
    }
    
    int main(int argc, char** argv) {
      if (argc < 2)
        process(".");
      for (;;) {
        char* path = *(++argv);
        if (!path)
          break;
        process(path);
      }
    }

  9. #9
    Registered User
    Join Date
    Sep 2020
    Posts
    150
    It works with the dirent header from above link.
    Just 1 warning: main.c(42,28): warning C4133: 'function': incompatible types - from 'stat *' to '_stat64i32 *'

  10. #10
    Registered User Sir Galahad
    Join Date
    Nov 2016
    Location
    The Round Table
    Posts
    277
    Quote Originally Posted by thmm View Post
    It works with the dirent header from above link.
    Just 1 warning: main.c(42,28): warning C4133: 'function': incompatible types - from 'stat *' to '_stat64i32 *'
    I honestly don't understand the warning. I'm tempted to blame the WIN32 stat.h implementation (M$ loves to break APIs).
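
    One plausible reading of that warning: with "#define lstat _stat" the call becomes _stat(pathName, &info), but info is declared as struct stat, while Microsoft's _stat is declared to take a struct _stat (which the headers map to _stat64i32). The two appear to have the same layout but are distinct types, hence C4133. Below is a minimal sketch that sidesteps it by pairing each function with its own structure type. The names path_info, path_stat, PATH_IS_DIR and is_directory are hypothetical, and the non-Windows branch uses stat rather than lstat, so it follows symbolic links.

```c
#include <sys/types.h>
#include <sys/stat.h>

#ifdef _WIN32
typedef struct _stat path_info;            /* the type _stat() expects */
#define path_stat(p, i) _stat((p), (i))
#define PATH_IS_DIR(m)  (((m) & _S_IFMT) == _S_IFDIR)
#else
typedef struct stat path_info;
#define path_stat(p, i) stat((p), (i))
#define PATH_IS_DIR(m)  S_ISDIR(m)
#endif

/* Returns 1 for a directory, 0 for anything else,
   -1 if the path cannot be examined. */
int is_directory(const char *path) {
    path_info info;
    if (path_stat(path, &info) != 0)
        return -1;
    return PATH_IS_DIR(info.st_mode) ? 1 : 0;
}
```

    The same pairing could be applied in filesystem_process_recurse_ by declaring info as path_info instead of struct stat.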

  11. #11
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    If you want to speed things up, investigate how to make the most of all the resources involved.

    You want to keep many CPU cores busy calculating checksums while keeping a reasonable number of I/Os in flight and making the most of the memory available to you.

    You will most likely need a multi-threaded program, with one thread walking the directory tree and queueing up files for the other threads to checksum.
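
    A minimal sketch of the queue at the centre of such a design, assuming one producer (the directory walker) and several workers. All the names here (work_queue, queue_push and so on) are hypothetical. On Windows it uses an SRW lock and a condition variable, elsewhere pthreads; thread creation and the checksum workers themselves are left out.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
typedef struct { SRWLOCK lock; CONDITION_VARIABLE changed; } q_sync;
#define Q_LOCK(s)   AcquireSRWLockExclusive(&(s)->lock)
#define Q_UNLOCK(s) ReleaseSRWLockExclusive(&(s)->lock)
#define Q_WAIT(s)   SleepConditionVariableSRW(&(s)->changed, &(s)->lock, INFINITE, 0)
#define Q_WAKE(s)   WakeAllConditionVariable(&(s)->changed)
#else
#include <pthread.h>
typedef struct { pthread_mutex_t lock; pthread_cond_t changed; } q_sync;
#define Q_LOCK(s)   pthread_mutex_lock(&(s)->lock)
#define Q_UNLOCK(s) pthread_mutex_unlock(&(s)->lock)
#define Q_WAIT(s)   pthread_cond_wait(&(s)->changed, &(s)->lock)
#define Q_WAKE(s)   pthread_cond_broadcast(&(s)->changed)
#endif

#define QUEUE_CAP 256

typedef struct {
    q_sync sync;
    const char *items[QUEUE_CAP]; /* ring buffer of path pointers (not copied) */
    size_t head, count;
    int closed;
} work_queue;

void queue_init(work_queue *q) {
#ifdef _WIN32
    InitializeSRWLock(&q->sync.lock);
    InitializeConditionVariable(&q->sync.changed);
#else
    pthread_mutex_init(&q->sync.lock, NULL);
    pthread_cond_init(&q->sync.changed, NULL);
#endif
    q->head = q->count = 0;
    q->closed = 0;
}

/* Producer side: blocks while the queue is full.
   Returns 1 on success, 0 if the queue has been closed. */
int queue_push(work_queue *q, const char *path) {
    Q_LOCK(&q->sync);
    while (q->count == QUEUE_CAP && !q->closed)
        Q_WAIT(&q->sync);
    int ok = !q->closed;
    if (ok) {
        q->items[(q->head + q->count) % QUEUE_CAP] = path;
        q->count++;
        Q_WAKE(&q->sync);
    }
    Q_UNLOCK(&q->sync);
    return ok;
}

/* Worker side: blocks while the queue is empty.
   Returns NULL once the queue is closed and drained. */
const char *queue_pop(work_queue *q) {
    const char *path = NULL;
    Q_LOCK(&q->sync);
    while (q->count == 0 && !q->closed)
        Q_WAIT(&q->sync);
    if (q->count > 0) {
        path = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        Q_WAKE(&q->sync);
    }
    Q_UNLOCK(&q->sync);
    return path;
}

/* Called by the producer when the directory walk is finished. */
void queue_close(work_queue *q) {
    Q_LOCK(&q->sync);
    q->closed = 1;
    Q_WAKE(&q->sync);
    Q_UNLOCK(&q->sync);
}
```

    Each worker would loop on queue_pop until it returns NULL. Because the queue stores pointers only, the walker has to keep each path string alive (for example, a heap copy that the worker frees) until it has been checksummed. Whether extra threads help at all depends on whether the disk or the CPU is the limit.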

  12. #12
    Registered User
    Join Date
    May 2021
    Posts
    19
    Thanks everyone, for taking the trouble to reply!

    No, I don't want to try to include error-correction too. In the end, usable and useful error correction takes as much space as multiple copies of the files do. [I keep one master collection and two full copies on different discs.] But I would like to be able to detect file corruption before I back up the corrupted data on top of the good! [It was doing that that stimulated my interest in checksumming and file verification.]

    One detail of the implementation intrigues me, though. Some files in Windows seem to have tags, reflecting sampling rate or composer for music files, and so on. I am wondering how these tags can be written and read programmatically under Windows? My thought is that I could use a tag to store the checksum for the file. This would be very handy, save loads of files containing the checksums, and make it easier to associate each file with its own checksum. This will, in turn, simplify the detection of new (un-checksummed) files in a folder.

    So, a 'checksum' tag for each file? How is it done, does anyone know? TIA!

  13. #13
    Registered User
    Join Date
    Dec 2017
    Posts
    1,626
    What kind of checksum are you looking to calculate?
    A little inaccuracy saves tons of explanation. - H.H. Munro

  14. #14
    Registered User Sir Galahad
    Join Date
    Nov 2016
    Location
    The Round Table
    Posts
    277
    Quote Originally Posted by Pattern-chaser View Post
    Thanks everyone, for taking the trouble to reply!

    No, I don't want to try to include error-correction too. In the end, usable and useful error correction takes as much space as multiple copies of the files do. [I keep one master collection and two full copies on different discs.] But I would like to be able to detect file corruption before I back up the corrupted data on top of the good! [It was doing that that stimulated my interest in checksumming and file verification.]

    One detail of the implementation intrigues me, though. Some files in Windows seem to have tags, reflecting sampling rate or composer for music files, and so on. I am wondering how these tags can be written and read programmatically under Windows? My thought is that I could use a tag to store the checksum for the file. This would be very handy, save loads of files containing the checksums, and make it easier to associate each file with its own checksum. This will, in turn, simplify the detection of new (un-checksummed) files in a folder.

    So, a 'checksum' tag for each file? How is it done, does anyone know? TIA!


    What you're talking about doesn't seem to be applicable to regular files. Image files carry metadata that can be extracted, which is what Windows is doing there.

    That said, there may be other ways to do this. A cursory search on StackOverflow yielded an interesting possibility using OLE. (Scroll down to the 4th answer.)

  15. #15
    Registered User
    Join Date
    Dec 2017
    Posts
    1,626
    Quote Originally Posted by Sir Galahad View Post
    What you're talking about doesn't seem to be applicable to regular files. Image files carry metadata that can be extracted, which is what Windows is doing there.
    MP3 files can have ID3 (v1 and v2) tags, which contain metadata.
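
    For reference, a minimal sketch of how those tags can be located. An ID3v2 tag starts the file with a 10-byte header ("ID3", two version bytes, a flags byte, then a 4-byte "syncsafe" size holding 7 bits per byte), and an ID3v1 tag is the last 128 bytes of the file, starting with "TAG". The function names are hypothetical, and the sketch only finds the tags; it does not parse their frames.

```c
#include <stddef.h>
#include <string.h>

/* Returns the total length in bytes of an ID3v2 tag at the start of `buf`
   (header included), or 0 if `buf` does not begin with a valid header. */
size_t id3v2_tag_size(const unsigned char *buf, size_t len) {
    if (len < 10 || memcmp(buf, "ID3", 3) != 0)
        return 0;
    if (buf[3] == 0xFF || buf[4] == 0xFF)
        return 0; /* version bytes are never 0xFF */
    if ((buf[6] | buf[7] | buf[8] | buf[9]) & 0x80)
        return 0; /* size bytes must have their top bit clear */
    size_t body = ((size_t)buf[6] << 21) | ((size_t)buf[7] << 14) |
                  ((size_t)buf[8] << 7)  |  (size_t)buf[9];
    size_t total = body + 10;     /* the size field excludes the header */
    if (buf[3] == 4 && (buf[5] & 0x10))
        total += 10;              /* ID3v2.4 optional footer */
    return total;
}

/* `tail` must point at the final 128 bytes of the file. */
int has_id3v1_tag(const unsigned char *tail) {
    return memcmp(tail, "TAG", 3) == 0;
}
```

    Knowing these offsets would let a checksummer cover only the audio data between the tags, so that editing a tag does not look like corruption. Note that storing the checksum inside a tag changes the file itself, so the checksum would have to exclude the tag region.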
    A little inaccuracy saves tons of explanation. - H.H. Munro
