Thread: Comparing 2 directories

    Nov 2005

    I am trying to write a program that compares the files in two directories. I am having trouble even planning it, so I'll just ask my questions directly:

    - How would I iterate through the directories?
    - What if there are sub-directories within each directory?
    - How would I do the comparison?

    I'd want the program to start from the first file in dir A and look for the same file in dir B. If the file is there, compare the two and output the result; if it isn't in dir B, print a message saying so. This should happen for every file and folder within A.

    To elaborate on the third question: I've got to use some sort of encryption tool to do the comparison, so I picked MD5. But I don't know how to use it in C++. For example, how would I apply an MD5 checksum to the contents of a binary file?

    Btw, the files in the folders are mixed (e.g. txt, exe, dat, doc, html).

    Any help is greatly appreciated.

    Jan 2005
    If you have two sets of directory names, then C++ algorithms like std::set_intersection can easily tell you which names are in both sets (and std::set_difference which names are in only one of them). So I might have a class that represents the files and directories and that can be sorted by name (case-sensitively or not, depending on your platform). I would put an entry for each file and directory in the selected directory into a set. Then you could find all those that match by name with std::set_intersection and continue from there.

    Different file types (directories, text files and binary files, for example) could have different derived classes that do different kinds of comparison. Once you have a match by name, you compare the two entries. Directories could recursively hold their own contents, which can be compared in the same way if that's how you are doing it.
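
    The name-matching step above could be sketched roughly like this. The NameComparison struct and compare_names function are hypothetical names made up for illustration, and plain std::string names stand in for the file/directory class described above. It assumes case-sensitive names.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: entry names grouped by where they were found.
struct NameComparison {
    std::vector<std::string> in_both;
    std::vector<std::string> only_in_a;
    std::vector<std::string> only_in_b;
};

// std::set keeps its elements sorted, which is what the set algorithms require.
NameComparison compare_names(const std::set<std::string>& a,
                             const std::set<std::string>& b) {
    NameComparison result;
    // Names present in both directories: candidates for a content comparison.
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(result.in_both));
    // Names present in A but missing from B.
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(result.only_in_a));
    // Names present in B but missing from A.
    std::set_difference(b.begin(), b.end(), a.begin(), a.end(),
                        std::back_inserter(result.only_in_b));
    return result;
}
```

    Everything in in_both would then go on to the content comparison, and the other two lists are the "not found" cases to print.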

    Oct 2005
    Portland, Or
    What you are referring to is a checksum (a hash), not encryption. A hash is a fixed-size value computed from the input data. It is not guaranteed to be unique, but changing even one letter of the input usually produces a vastly different value.

    So you would just feed the file's data into your MD5 function. The only catch is that the bytes must be fed in the same order for both files; otherwise the hashes will differ even when the contents are the same.
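
    As a rough sketch of feeding a file's bytes into a hash in order: the code below opens the file in binary mode and reads it in chunks. Note that it does NOT compute MD5. It uses 64-bit FNV-1a as a small stand-in so the example needs no external library, and hash_file is a hypothetical name. With a real MD5 library you would pass each chunk to that library's update call instead of running the inner loop.

```cpp
#include <cstdint>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>

// Stand-in hash (64-bit FNV-1a), NOT MD5. It only illustrates streaming
// a file's bytes, in order, into a hash function.
std::uint64_t hash_file(const std::string& path) {
    // Binary mode matters: text mode may translate line endings on some platforms.
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    std::uint64_t hash = 14695981039346656037ULL;  // FNV-1a offset basis
    char buffer[4096];
    // read() fails on the last, partial chunk, so also check gcount().
    while (in.read(buffer, sizeof buffer) || in.gcount() > 0) {
        const std::streamsize n = in.gcount();
        for (std::streamsize i = 0; i < n; ++i) {
            hash ^= static_cast<unsigned char>(buffer[i]);
            hash *= 1099511628211ULL;  // FNV-1a prime
        }
    }
    return hash;
}
```

    Because the file is read as raw bytes, it makes no difference whether it is a txt, exe, dat or doc file.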

    Nov 2005
    Thanks Daved and Wraith.

    "I would put an entry for each file and directory in the selected directory into a set."
    That's a good idea, but to do that, how would I obtain the names of the files and folders inside the parent directory?

    Also, I've decided to use CSHA1 for the comparison instead of MD5 checksums (there seems to be better documentation online for it); from all the stuff on the 'net, I still can't figure out how to use MD5's library...

    Mar 2006
    You'll need some directory handling functions. Standard C++ doesn't provide any as of this writing, but POSIX does: <dirent.h> declares opendir, readdir and closedir. (<dir.h> is a compiler-specific header, not part of any standard.) Or use Boost.Filesystem, or whatever you manage to come up with for your platform.
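
    For what it's worth, here is a minimal sketch of the recursive listing. It assumes a C++17 compiler, where the Boost.Filesystem design is available as std::filesystem; list_tree is a hypothetical name. Paths are stored relative to the root so the listings of dir A and dir B can be compared directly.

```cpp
#include <filesystem>
#include <set>
#include <string>

namespace fs = std::filesystem;

// Collects every file and sub-directory under root, as paths relative to
// root with '/' separators, so two trees yield directly comparable names.
std::set<std::string> list_tree(const fs::path& root) {
    std::set<std::string> entries;
    // recursive_directory_iterator descends into sub-directories on its own.
    for (const fs::directory_entry& entry :
         fs::recursive_directory_iterator(root)) {
        entries.insert(entry.path().lexically_relative(root).generic_string());
    }
    return entries;
}
```

    Calling list_tree on each of the two directories gives two sorted sets of names that can be matched against each other.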

    Also, a hash like MD5 is only a heuristic. Even if two files hash equal, you should really compare them byte for byte anyway, just in case. After all, MD5 is only 128 bits. Whirlpool is 512 bits and has been puh-retty hard to collide so far.
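
    A minimal sketch of that follow-up byte-for-byte check, assuming both files can be opened for reading (same_contents is a hypothetical name):

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <stdexcept>
#include <string>

// Returns true only if both files hold exactly the same bytes.
bool same_contents(const std::string& path_a, const std::string& path_b) {
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) throw std::runtime_error("cannot open input file");

    std::istreambuf_iterator<char> a_it(a), b_it(b), end;
    // Walk both streams in lockstep and stop at the first mismatch.
    while (a_it != end && b_it != end) {
        if (*a_it != *b_it) return false;
        ++a_it;
        ++b_it;
    }
    // Equal only if both files ended at the same point.
    return a_it == end && b_it == end;
}
```

    Comparing file sizes first would be a cheap way to skip this loop for files that obviously differ.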

