Thread: how to find common data in 3 files without rewind

  1. #1
    Banned
    Join Date
    Oct 2008
    Posts
    1,535

    how to find common data in 3 files without rewind

    I have 3 files that contain data about people.
    In each file, the lines are indexed by social security number.
    How can I find the people who appear in all three files?
    I am not allowed to use rewind.

    What's the algorithm for solving it?

  2. #2
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by transgalactic2 View Post
    i am not allowed to use rewind??
    Phew!! Maybe...are the files too big to just load them all into memory?

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    If all the files are in sorted order, you can read one record from each file. If they are not the same, then read more from whichever file has the lowest index number. If they are the same, do whatever you have to do when it's a 3-way match. Repeat until any file is empty.
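
    The steps above can be sketched in C roughly as follows. This is a minimal sketch, not code from the thread. It assumes each line begins with a fixed-width key (so that strcmp order matches the sort order of the files), that lines are shorter than 256 characters, and that a key appears at most once per file. The names next_key, print_common and KEY_LEN are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KEY_LEN 32   /* assumed upper bound on key length, incl. '\0' */

/* Read lines from f until one starts with a key; copy the key out.
   Returns 1 on success, 0 at end of file. Assumes lines < 256 chars. */
static int next_key(FILE *f, char *key)
{
    char line[256];

    while (fgets(line, sizeof line, f) != NULL)
        if (sscanf(line, "%31s", key) == 1)
            return 1;
    return 0;
}

/* Write every key found in all three sorted streams to out.
   Each stream is read once, front to back. Returns the match count. */
int print_common(FILE *f1, FILE *f2, FILE *f3, FILE *out)
{
    FILE *in[3] = { f1, f2, f3 };
    char key[3][KEY_LEN];
    int found = 0;

    assert(f1 && f2 && f3 && out);

    /* Start with one record from each file. */
    for (int i = 0; i < 3; i++)
        if (!next_key(in[i], key[i]))
            return 0;

    for (;;) {
        int lo = 0, hi = 0;

        /* Locate the streams holding the lowest and highest keys. */
        for (int i = 1; i < 3; i++) {
            if (strcmp(key[i], key[lo]) < 0) lo = i;
            if (strcmp(key[i], key[hi]) > 0) hi = i;
        }

        /* Lowest == highest means all three keys are equal. */
        if (strcmp(key[lo], key[hi]) == 0) {
            fprintf(out, "%s\n", key[lo]);
            found++;
        }

        /* Advance only the stream with the lowest key. */
        if (!next_key(in[lo], key[lo]))
            return found;
    }
}
```

    A caller would open the three files with fopen and pass them in together with stdout. Because only the stream holding the lowest key is advanced, no file is ever read backwards, so rewind is not needed.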

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #4
    Banned
    Join Date
    Oct 2008
    Posts
    1,535
    If I apply your tactic and check one row at a time
    (the first row in file 1, then the first row in file 2, then the first row in file 3)
    and compare the data in each row,
    it will not work for this case:
    005 appears in all the files, but by your method it will not show that 005 appears in more than one file.
    ??
    file1:
    000
    001
    002
    003
    005

    file2:
    003
    004
    005

    file3:
    005
    006
    007

  5. #5
    Banned
    Join Date
    Oct 2008
    Posts
    1,535
    First row check: comparing 000, 003, 005.
    Second row check: comparing 001, 004, 006.
    etc.

    So by your method it will show that 005 doesn't appear
    in more than one file.

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Then you did not implement what I told you to implement. Show us what you've written.

    If you read from the files you have:
    file 1:
    000
    file 2:
    003
    file 3:
    005
    Next time, you read from file1, keeping the other values.
    You would continue reading file1 until it gets to 003, which gives 003, 003 and 005. It's still not a 3-way match, so you read another from file1 (or file2 - it doesn't really matter which): 005. The 003 from file2 is now the lowest, so we read from there: 004. We now have 005, 004 and 005. Still no match. Read again from file2: 005, 005 and 005. MATCH! Since the next read from file1 or file2 is EOF, we can stop.

    --
    Mats

  7. #7
    Banned
    Join Date
    Oct 2008
    Posts
    1,535
    OK, I understand: I need to pick the highest and read from the two lower indexes until I get an equal or bigger one, in which case
    we need to pick the next biggest and read from it,
    etc.
    So in my example:
    in the first check 005 is the biggest and the others are 003 and 000; I check if any are equal.
    I stay with 005 on file 3 and progress the others to 004 and 001; I check if any are equal.
    I stay with 005 on file 3 and progress the others to 005 and 002; I check if any are equal.
    But next I get EOF on file2 while there is 003 on file1,
    so we missed a match of 003 between file 1 and file 2.

    ??

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Hash the SSN and use it to index a table of counters. Each time you see an SSN, bump its counter. At the end, any counter which == 3 corresponds to an SSN that was in all three files.
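
    One way to picture the counter table, as a rough sketch only: the code below uses a fixed-size open-addressing table with linear probing and a djb2-style hash, all of which are assumptions and not something the post specifies. The names slot, bump, tally_file, KEY_LEN and TABLE_SIZE are hypothetical. It also assumes a key appears at most once per file, since otherwise a counter can reach 3 without the key being in all three files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KEY_LEN    16     /* assumed: keys are shorter than this */
#define TABLE_SIZE 1024   /* assumed: more slots than distinct keys */

struct slot {
    char key[KEY_LEN];    /* empty string marks an unused slot */
    int  count;
};

/* djb2-style string hash. */
static unsigned long hash_key(const char *s)
{
    unsigned long h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Add one to the counter for key, inserting it if needed.
   table must start zero-filled. Returns the new count, or -1 if the
   key is empty, too long, or the table is full. */
int bump(struct slot *table, const char *key)
{
    size_t i;

    assert(table != NULL && key != NULL);
    if (key[0] == '\0' || strlen(key) >= KEY_LEN)
        return -1;

    i = hash_key(key) % TABLE_SIZE;
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        struct slot *s = &table[i];

        if (s->key[0] == '\0')
            strcpy(s->key, key);       /* claim the empty slot */
        if (strcmp(s->key, key) == 0)
            return ++s->count;
        i = (i + 1) % TABLE_SIZE;      /* collision: try the next slot */
    }
    return -1;
}

/* Bump the counter for the leading key on every line of f. */
void tally_file(struct slot *table, FILE *f)
{
    char line[256], key[KEY_LEN];

    while (fgets(line, sizeof line, f) != NULL)
        if (sscanf(line, "%15s", key) == 1)
            bump(table, key);
}
```

    After calling tally_file once for each of the three files, a loop over the table can report every slot whose count is 3. Each file is read once from start to finish and the files do not need to be sorted, but memory use grows with the number of distinct keys.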

  9. #9
    Banned
    Join Date
    Oct 2008
    Posts
    1,535
    What's an SSN?
    What's a hash?

  10. #10
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    SSN is "Social Security Number". Hash is a mechanism to make a checksum/unique(ish) number from a text string or similar. I don't actually believe this is necessary.

    As to the other post:
    Matching between two files is the same as matching between three:
    Read from the one that is lowest in the set. If you find the same number in two files, do whatever you are supposed to do, then go on and read new data from the file(s) with the lowest value.

    --
    Mats

  11. #11
    Banned
    Join Date
    Oct 2008
    Posts
    1,535
    I showed that by progressing the other two simultaneously
    I miss matches.

    So I need to progress the lowest each time?

  12. #12
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by transgalactic2 View Post
    I showed that by progressing the other two simultaneously
    I miss matches.

    So I need to progress the lowest each time?
    Yes, you must only progress the lowest one.

    --
    Mats

  13. #13
    Banned
    Join Date
    Oct 2008
    Posts
    1,535
    I tried to find the smallest using these two comparisons,
    but there are still too many "if" combinations.
    How can I shorten the process of finding the smallest?
    Code:
    if (strcmp(f1, f2) > 0)
    {
        if (strcmp(f2, f3) > 0)
        {
            //smallest is f3
        }
    }
    Last edited by transgalactic2; 03-27-2009 at 12:33 PM.

  14. #14
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Why not do it the way 974562348 people (note: I made that number up) have done it in "minimum" code posted on the internet? Find the smaller of one and two, and then check that answer against three.
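
    That suggestion could look something like this in C. It is only a sketch: the function name index_of_min3 and the idea of passing the three current keys as an array are assumptions made for illustration. It returns an index so the caller knows which file to read from next.

```c
#include <assert.h>
#include <string.h>

/* Return the index (0, 1 or 2) of the smallest string in key[],
   using two strcmp calls. Ties go to the lower index. */
int index_of_min3(const char *key[3])
{
    int lo;

    assert(key[0] && key[1] && key[2]);

    lo = (strcmp(key[1], key[0]) < 0) ? 1 : 0;   /* smaller of first two */
    if (strcmp(key[2], key[lo]) < 0)             /* then compare with third */
        lo = 2;
    return lo;
}
```

    With the current keys from f1, f2 and f3 stored in one array, this replaces the nested "if" cases with two comparisons.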

  15. #15
    Banned
    Join Date
    Oct 2008
    Posts
    1,535
    But you see, I don't have an option like "minimum" with strcmp; there is only
    >0 or <0,

    and it depends on many cases,
    so again I have 6 "if" cases.
    How can I shorten it?
    Last edited by transgalactic2; 03-27-2009 at 12:55 PM.

    Last Post: 04-29-2002, 06:51 PM