compare and delete data from file

**moby** · 02-06-2005

Hi,

My program scans a file called Num for specific data and, once it finds that data, writes it to a new file (Num_output). The only problem is that the data it finds can be duplicated in the Num file. How can I disregard the duplicates?

Sample of the data that's in the Num file:

Nov 8 11:30:50 numbers(0x506885b00018e9b)
Nov 8 11:30:50 numbers(0x500a4250df03ec6)
Nov 8 11:30:50 numbers(0x5071bed00234795)
Nov 8 11:30:50 numbers(0x5006a770028b917)
Nov 8 11:30:50 numbers(0x50324ae0ffaa320)
Nov 8 11:30:50 numbers(0x50096c50df03e84)
Nov 8 11:30:50 numbers(0x505da9c00014e82)
Nov 8 11:30:50 numbers(0x50067e80ff31864)
Nov 8 11:30:50 numbers(0x506dd3000233f61)
Nov 8 11:30:50 numbers(0x50655040ffe8ffb)
Nov 8 11:30:50 numbers(0x505c8030ffe7f5b)
Nov 8 11:30:50 numbers(0x506a12800019d10)
Nov 8 11:30:50 numbers(0x506528e0ffe8fb9)
Nov 8 11:30:50 numbers(0x5066d020ffe958b)
Nov 8 11:30:50 numbers(0x505da9900014e82)
Nov 8 11:30:50 numbers(0x503c9070fe59228)
Nov 8 11:30:50 numbers(0x50611e70ffe885a)
Nov 8 11:30:50 numbers(0x503c9070fe59228)
Nov 8 11:30:50 numbers(0x50611e70ffe885a)

Here is my code:

Code:

<#include <stdio.h>>
<#include <stdlib.h>>
<#include <string.h>>


<int main(int argc, char *argv[])>

<{>
<FILE *fin, *fout;>
<char line[500], number[1024], searchfor[] = "numbers(", *x;
int offset;>


<fin = fopen( "Num.txt", "r" );>
<fout= fopen("Num_output.txt", "w");>

<offset = strlen(searchfor);>



<while( !feof( fin ) )>

<{>


<fgets( line, 500, fin );>
<x = strstr( line, searchfor );>


<if(x==NULL){>
<continue;>
<} >

<x = strstr(line, searchfor ) + offset;>

<strcpy(number, x);>

<strchr(number,')')[0] = '\0';>



<fputs(number,fout);>
<fputs("\n", fout);>

<}>


<fclose(fin);>
<fclose(fout);>

<return 0;>
<}>

Sample of my Num_output file; you'll see that it contains the duplicates it finds:

0x506885b00018e9b
0x500a4250df03ec6
0x5071bed00234795
0x5006a770028b917
0x50324ae0ffaa320
0x50096c50df03e84
0x505da9c00014e82
0x50067e80ff31864
0x506dd3000233f61
0x50655040ffe8ffb
0x505c8030ffe7f5b
0x506a12800019d10
0x506528e0ffe8fb9
0x5066d020ffe958b
0x505da9900014e82
0x503c9070fe59228
0x50611e70ffe885a
0x503c9070fe59228
0x50611e70ffe885a

Cheers,
Moby

**Salem** · 02-06-2005

OK, so what weirdness added <> around almost every line of code you posted?

**moby** · 02-06-2005

It wouldn't let me post the thread unless I added the [code] and [/code] tags to my code.

But as you can see, what's between the <> is C code; just disregard the <> signs.

moby

**quzah** · 02-06-2005

So why didn't you just add code tags, dumb ........? Good luck getting help now.

Quzah.

**moby** · 02-06-2005

Well, manners are obviously not in your vocabulary, Shazam or Quasar or whatever the hell your name is! And if it means getting this kind of help from arrogant egomaniacs such as yourself, then I'm totally fine without YOUR help!

**Hunter2** · 02-06-2005

Don't mind Quzah, he's a little caustic at times but highly respected by most CProg members.

I suggest you re-post your code like so:
[code]
(your program's code, without < > around every line)
[/code]

As things stand, your code is highly unreadable.

**moby** · 02-06-2005

Hello Hunter,

I appreciate your kind reply!

I'm sorry for the unreadable code I posted; it was a newbie error!

I will do as you suggested and make my code readable.

Code:

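    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        FILE *fin, *fout;
        char line[500], number[1024], searchfor[] = "numbers(", *x, *end;
        size_t offset;

        fin = fopen("Num.txt", "r");
        fout = fopen("Num_output.txt", "w");
        if (fin == NULL || fout == NULL) {
            perror("fopen");
            return EXIT_FAILURE;
        }

        offset = strlen(searchfor);

        /* Loop on fgets() itself; testing feof() before reading
           can process the last line twice. */
        while (fgets(line, sizeof line, fin) != NULL) {
            x = strstr(line, searchfor);
            if (x == NULL) {
                continue;
            }

            /* Copy everything after "numbers(" and cut it at the closing ')'. */
            strcpy(number, x + offset);
            end = strchr(number, ')');
            if (end == NULL) {
                continue;   /* malformed line: no closing parenthesis */
            }
            *end = '\0';

            fputs(number, fout);
            fputs("\n", fout);
        }

        fclose(fin);
        fclose(fout);

        return 0;
    }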

Kind Regards,
Moby

**anonytmouse** · 02-06-2005

Welcome to the forums!

How many records do you expect in your data set? Is performance important? There are a couple of ways to detect duplicates in a data set. The first is for every record, check every other record for a match. Obviously, for all but the smallest data sets, this can be very inefficient. The other way is to sort the data set. Then you can just loop through the records checking if there are two or more subsequent records that are the same.
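As a rough illustration of the first method, here is a minimal sketch that remembers every value already written and scans that list before accepting a new one. The names `seen_list`, `seen_contains` and `seen_add`, and the limits `MAX_RECORDS` and `MAX_LEN`, are made up for this example and are not part of the posted program.

```c
#include <stdio.h>
#include <string.h>

#define MAX_RECORDS 2048   /* assumed upper bound on distinct records */
#define MAX_LEN     64     /* assumed maximum length of one value */

/* Hypothetical container for the values seen so far. */
struct seen_list {
    char   values[MAX_RECORDS][MAX_LEN];
    size_t count;
};

/* Returns 1 if value is already stored, 0 otherwise (linear scan). */
int seen_contains(const struct seen_list *list, const char *value)
{
    size_t i;

    for (i = 0; i < list->count; i++) {
        if (strcmp(list->values[i], value) == 0)
            return 1;
    }
    return 0;
}

/*
 * Stores value if it is new.
 * Returns 1 if it was added, 0 if it is a duplicate,
 * too long to store, or the list is full.
 */
int seen_add(struct seen_list *list, const char *value)
{
    if (strlen(value) >= MAX_LEN || list->count >= MAX_RECORDS)
        return 0;
    if (seen_contains(list, value))
        return 0;
    strcpy(list->values[list->count], value);
    list->count++;
    return 1;
}
```

In the read loop, the two `fputs()` calls would then run only when `seen_add(&seen, number)` returns 1. This keeps the output in the original order, at the cost of one scan of the list per record.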

The simplest method would be to add all your records to an array, call qsort() to sort them, run through the array to detect duplicates and finally output the array to file. A little more efficient approach may be to keep the records sorted as you read them in. If you have too many records to keep in memory (say more than a million or so on a modern workstation), then the task becomes more complicated.
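The qsort() approach could look roughly like the sketch below, assuming the records have already been read into a fixed-width array of strings. `sort_and_dedupe`, `compare_records` and `MAX_LEN` are names invented for this example.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 64   /* assumed maximum length of one record */

/* Comparison callback for qsort(): each element is a char[MAX_LEN] string. */
int compare_records(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/*
 * Sorts records in place, then squeezes out duplicates.
 * Returns the number of unique records left at the front of the array.
 */
size_t sort_and_dedupe(char records[][MAX_LEN], size_t count)
{
    size_t read, write;

    if (count == 0)
        return 0;

    qsort(records, count, sizeof records[0], compare_records);

    /* After sorting, equal records are adjacent: keep one of each run. */
    write = 1;
    for (read = 1; read < count; read++) {
        if (strcmp(records[read], records[write - 1]) != 0) {
            if (write != read)
                strcpy(records[write], records[read]);
            write++;
        }
    }
    return write;
}
```

After the call, the first `n` entries (where `n` is the return value) can be written to the output file. Note that the output comes out sorted rather than in the order the records appeared in the input.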

**moby** · 02-07-2005

Hello Anonytmouse,

The data set would contain at most around 2000 records. Yes, performance would be an important factor as well.

Kind Regards,
Moby

**Salem** · 02-07-2005

Well if you're on Unix / Linux, your existing program will do just fine if you run it like this (once you provide the filenames via argv and not hard code them).

myprog Num.txt | sort -u > Num_output.txt

The -u switch to sort removes duplicates.
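For the pipeline above to work, the program has to take the input filename from the command line and write to standard output instead of Num_output.txt. A minimal sketch of that change, assuming the same "numbers(" format as the sample data (the function names `extract_numbers` and `extract_file_to_stdout` are hypothetical):

```c
#include <stdio.h>
#include <string.h>

/*
 * Copies the text between "numbers(" and the next ')' on each line
 * of in to out, one value per line.
 * Returns the number of values written.
 */
int extract_numbers(FILE *in, FILE *out)
{
    static const char searchfor[] = "numbers(";
    char line[500];
    int written = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        char *start = strstr(line, searchfor);
        char *end;

        if (start == NULL)
            continue;
        start += strlen(searchfor);
        end = strchr(start, ')');
        if (end == NULL)
            continue;          /* no closing parenthesis: skip the line */
        *end = '\0';
        fprintf(out, "%s\n", start);
        written++;
    }
    return written;
}

/*
 * Opens the file named by path and writes its values to stdout.
 * Returns 0 on success, 1 if the file cannot be opened.
 */
int extract_file_to_stdout(const char *path)
{
    FILE *in = fopen(path, "r");

    if (in == NULL) {
        perror(path);
        return 1;
    }
    extract_numbers(in, stdout);
    fclose(in);
    return 0;
}
```

main() would then check that `argc` is 2 and return `extract_file_to_stdout(argv[1])`, leaving the duplicate removal to sort -u.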
