how to find common data in 3 files without rewind..

**~~transgalactic2~~** · 03-27-2009

I have 3 files which
contain data about people.
In each file the lines are indexed by the social security number.
So how do I find out which people appear in all three files?
i am not allowed to use rewind??

What's the algorithm for solving it?

**MK27** · 03-27-2009

Originally Posted by transgalactic2

i am not allowed to use rewind??

Phew!! Maybe...are the files too big to just load them all into memory?

**matsp** · 03-27-2009

If all the files are in sorted order, you can read one record from each file. If they are not the same, then read more from whichever file has the lowest index number. If they are the same, do whatever you have to do when it's a 3-way match. Repeat until any file is empty.

--
Mats
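
The procedure described in the post above can be sketched in C roughly as follows. This is only an illustration, not code from the thread: it assumes each file holds one key per line, sorted in `strcmp` order (for example zero-padded numbers of equal width), and the names `next_key`, `common_in_three` and `KEY_LEN` are made up for the example.

```c
#include <stdio.h>
#include <string.h>

#define KEY_LEN 32  /* assumed maximum key length, including the terminator */

/* Read the next key (one whitespace-delimited token) from fp into key.
   Returns 1 on success, 0 at end of file. */
static int next_key(FILE *fp, char key[KEY_LEN])
{
    return fscanf(fp, "%31s", key) == 1;
}

/* Print every key present in all three sorted files; returns the match count.
   Each file is read once, front to back, so rewind() is never needed. */
int common_in_three(FILE *f1, FILE *f2, FILE *f3, FILE *out)
{
    char k1[KEY_LEN], k2[KEY_LEN], k3[KEY_LEN];
    int matches = 0;

    /* Prime the loop with one record from each file. */
    if (!next_key(f1, k1) || !next_key(f2, k2) || !next_key(f3, k3))
        return 0;

    for (;;) {
        int c12 = strcmp(k1, k2);
        int c23 = strcmp(k2, k3);
        int ok;

        if (c12 == 0 && c23 == 0) {
            /* Three-way match: report it, then advance all three files. */
            fprintf(out, "%s\n", k1);
            matches++;
            ok = next_key(f1, k1) && next_key(f2, k2) && next_key(f3, k3);
        } else if (c12 <= 0 && strcmp(k1, k3) <= 0) {
            ok = next_key(f1, k1);      /* file 1 holds the lowest key */
        } else if (c23 <= 0) {
            ok = next_key(f2, k2);      /* file 2 holds the lowest key */
        } else {
            ok = next_key(f3, k3);      /* file 3 holds the lowest key */
        }
        if (!ok)
            break;                      /* a file ran out: no more 3-way matches */
    }
    return matches;
}
```

With three files containing 000 001 002 003 005, 003 004 005 and 005 006 007, this prints 005 and returns 1.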

**~~transgalactic2~~** · 03-27-2009

If I apply your tactic and check one row number at a time,
the first row in file 1, then the first row in file 2 and the first row in file 3,
and compare the data of each row,
it will not work for this case:
005 appears in all the files, but by your method it will not show that 005 appears in more than one file
??
file1:
000
001
002
003
005

file2:
003
004
005

file3:
005
006
007

**~~transgalactic2~~** · 03-27-2009

first row check comparing 000 003 005
second row check comparing 001 004 006
etc..

so by your method it will show that 005 doesn't appear
in more than one file

**matsp** · 03-27-2009

Then you did not implement what I told you to implement. Show us what you've written.

If you read from the files you have:
file 1:
000
file 2:
003
file 3:
005
Next time you read file1, keeping the other values.
You would continue reading file1 until it gets to 003. Now file1 and file2 both hold 003 and file3 holds 005, so it's still no three-way match, and you read another from file1 (or file2 - doesn't really matter): 005. 003 from file2 is now the lowest, so we read another from there too: 004. We now have 005 and 004 and 005. Still no match. file2 is the lowest again, so read again from file2: 005 005 and 005 MATCH! Since the next read from file1 or file2 is EOF, we can stop.

--
Mats

**~~transgalactic2~~** · 03-27-2009

OK, I understand: I need to pick the highest and read from the lower two indexes till I get an equal or bigger one, in which case
I need to pick the next biggest and read from it
etc..
So in my example:
in the first check 005 is the biggest, the others are 003 and 000; I check if there is an equal
I stay with 005 on file 3 and progress the others: 004 and 001; I check if there is an equal
I stay with 005 on file 3 and progress the others: 005 and 002; I check if there is an equal
but next I get EOF on file2 while there is 003 on file1
so we missed a match of 003 between file 1 and file 2

??

**brewbuck** · 03-27-2009

Hash the SSN, index a table of counters, each time you see an SSN bump the counter, at the end, any counter which == 3 corresponds to an SSN that was in all three files.
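
One way the counter-table idea might look in C is sketched below. It is an illustration under stated assumptions, not code from the thread: every SSN is assumed to appear at most once per file (otherwise a counter of 3 does not prove the number was in all three files), and `TABLE_SIZE`, `KEY_LEN`, `hash_key`, `find_slot`, `count_keys` and `print_seen_three_times` are hypothetical names chosen for the example. The hash function is a common djb2-style string hash with linear probing for collisions.

```c
#include <stdio.h>
#include <string.h>

#define KEY_LEN    32     /* assumed maximum key length, including the terminator */
#define TABLE_SIZE 4096   /* hypothetical capacity; must exceed the number of distinct keys */

struct slot {
    char key[KEY_LEN];    /* an empty string marks an unused slot */
    int  count;           /* how many times the key has been seen */
};

static struct slot table[TABLE_SIZE];   /* zero-initialised: all slots unused */

/* djb2-style string hash, reduced to a table index. */
static unsigned long hash_key(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Find the slot for key, claiming an empty one if the key is new
   (linear probing). Returns NULL if the table is full. */
static struct slot *find_slot(const char *key)
{
    unsigned long start = hash_key(key);
    for (unsigned long probes = 0; probes < TABLE_SIZE; probes++) {
        struct slot *s = &table[(start + probes) % TABLE_SIZE];
        if (s->key[0] == '\0') {
            strcpy(s->key, key);        /* key is shorter than KEY_LEN, see count_keys */
            return s;
        }
        if (strcmp(s->key, key) == 0)
            return s;
    }
    return NULL;
}

/* Bump the counter of every key read from fp.
   Returns 0 on success, -1 if the table filled up. */
int count_keys(FILE *fp)
{
    char key[KEY_LEN];
    while (fscanf(fp, "%31s", key) == 1) {
        struct slot *s = find_slot(key);
        if (s == NULL)
            return -1;
        s->count++;
    }
    return 0;
}

/* Print every key whose counter reached 3; returns how many there were. */
int print_seen_three_times(FILE *out)
{
    int n = 0;
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (table[i].key[0] != '\0' && table[i].count == 3) {
            fprintf(out, "%s\n", table[i].key);
            n++;
        }
    }
    return n;
}
```

Calling `count_keys` once per file and then `print_seen_three_times` reads each file a single time and does not need the files to be sorted, at the cost of keeping every distinct key in memory. The matches come out in table order, not in sorted order.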

**~~transgalactic2~~** · 03-27-2009

What's an SSN?
What's a hash?

**matsp** · 03-27-2009

SSN is "Social Security Number". Hash is a mechanism to make a checksum/unique(ish) number from a text string or similar. I don't actually believe this is necessary.

As to the other post:
Matching between two files is the same as matching between three:
Read from the one that is lowest in the set. If you find the same number in two files, do whatever you are supposed to do, then go on and read new data from the file(s) with the lowest value.

--
Mats
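
If matches between only two of the files matter as well, the same "read from the lowest" idea might be sketched like this. It is an illustration only: the files are assumed to hold one key per line in `strcmp` order, and `report_shared`, `NFILES`, `KEY_LEN` and the `min_files` parameter are invented for the example.

```c
#include <stdio.h>
#include <string.h>

#define KEY_LEN 32  /* assumed maximum key length, including the terminator */
#define NFILES  3

/* Print each key that occurs in at least min_files of the sorted files,
   followed by the number of files holding it.
   Returns the number of keys printed. */
int report_shared(FILE *files[NFILES], int min_files, FILE *out)
{
    char key[NFILES][KEY_LEN];
    int  live[NFILES];          /* 1 while file i still has a current key */
    int  reported = 0;

    /* Prime the loop with one record from each file. */
    for (int i = 0; i < NFILES; i++)
        live[i] = fscanf(files[i], "%31s", key[i]) == 1;

    for (;;) {
        /* Find the lowest current key among the files not yet exhausted. */
        int low = -1;
        for (int i = 0; i < NFILES; i++)
            if (live[i] && (low < 0 || strcmp(key[i], key[low]) < 0))
                low = i;
        if (low < 0)
            break;              /* every file is exhausted */

        /* Copy the lowest key before advancing overwrites it. */
        char lowest[KEY_LEN];
        strcpy(lowest, key[low]);

        /* Count the files holding that key and advance exactly those files. */
        int holders = 0;
        for (int i = 0; i < NFILES; i++) {
            if (live[i] && strcmp(key[i], lowest) == 0) {
                holders++;
                live[i] = fscanf(files[i], "%31s", key[i]) == 1;
            }
        }
        if (holders >= min_files) {
            fprintf(out, "%s %d\n", lowest, holders);
            reported++;
        }
    }
    return reported;
}
```

With the sample files from this thread and `min_files` set to 2, this prints `003 2` and `005 3`. Only the files holding the lowest key are advanced, which is why the 003 shared by file1 and file2 is not skipped.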

**~~transgalactic2~~** · 03-27-2009

i showed that by progressing the other two simultaneously
i miss matches.

so i need to progress the lowest each time??

**matsp** · 03-27-2009

Originally Posted by transgalactic2

i showed that by progressing the other two simultaneously
i miss matches.

so i need to progress the lowest each time??

Yes, you must only progress the lowest one.

--
Mats

**~~transgalactic2~~** · 03-27-2009

I tried to find the smallest using these two comparisons,
but those "if" combinations are still too many.
How can I shorten the process of finding the smallest??

Code:

    if (strcmp(f1, f2) > 0)
    {
        if (strcmp(f2, f3) > 0)
        {
            // smallest is f3
        }
    }

**tabstop** · 03-27-2009

Why not do it the way 974562348 people (note: I made that number up) have done it in "minimum" code posted on the internets? Find the smallest between one and two, and then check the answer to that against three.
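
A minimal sketch of that two-step minimum, assuming `f1`, `f2` and `f3` are the NUL-terminated strings currently held from each file; the helper names `min_str` and `min3_str` are made up for the example.

```c
#include <string.h>

/* Return whichever of a and b sorts first according to strcmp (a on ties). */
static const char *min_str(const char *a, const char *b)
{
    return strcmp(a, b) <= 0 ? a : b;
}

/* Smallest of three strings: the minimum of the first two,
   then that result compared against the third. */
const char *min3_str(const char *f1, const char *f2, const char *f3)
{
    return min_str(min_str(f1, f2), f3);
}
```

The returned pointer is one of the three arguments, so comparing it with `f1`, `f2` and `f3` tells which file to read from next. That takes two `strcmp` calls instead of a tree of nested `if` cases.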

**~~transgalactic2~~** · 03-27-2009

but you see that I don't have such an option as minimum with strcmp; there is only > 0 or < 0

and it depends on many cases
so again i have 6 if cases
how to shorten it?
