Thread: compare and delete data from file

  1. #1
    Registered User
    Join Date
    Nov 2004
    Posts
    5

    compare and delete data from file

    Hi,

    My program scans a file called Num for specific data and, once it finds that data, writes it to a new file (Num_output). The only problem is that the data it finds can be duplicated in the Num file. How can I disregard the duplicates?


    Sample of the data that's in the Num file:


    Nov 8 11:30:50 numbers(0x506885b00018e9b)
    Nov 8 11:30:50 numbers(0x500a4250df03ec6)
    Nov 8 11:30:50 numbers(0x5071bed00234795)
    Nov 8 11:30:50 numbers(0x5006a770028b917)
    Nov 8 11:30:50 numbers(0x50324ae0ffaa320)
    Nov 8 11:30:50 numbers(0x50096c50df03e84)
    Nov 8 11:30:50 numbers(0x505da9c00014e82)
    Nov 8 11:30:50 numbers(0x50067e80ff31864)
    Nov 8 11:30:50 numbers(0x506dd3000233f61)
    Nov 8 11:30:50 numbers(0x50655040ffe8ffb)
    Nov 8 11:30:50 numbers(0x505c8030ffe7f5b)
    Nov 8 11:30:50 numbers(0x506a12800019d10)
    Nov 8 11:30:50 numbers(0x506528e0ffe8fb9)
    Nov 8 11:30:50 numbers(0x5066d020ffe958b)
    Nov 8 11:30:50 numbers(0x505da9900014e82)
    Nov 8 11:30:50 numbers(0x503c9070fe59228)
    Nov 8 11:30:50 numbers(0x50611e70ffe885a)
    Nov 8 11:30:50 numbers(0x503c9070fe59228)
    Nov 8 11:30:50 numbers(0x50611e70ffe885a)


    Here is my code:

    Code:
    <#include <stdio.h>>
    <#include <stdlib.h>>
    <#include <string.h>>
    
    
    <int main(int argc, char *argv[])>
    
    <{>
    <FILE *fin, *fout;>
    <char line[500], number[1024], searchfor[] = "numbers(", *x;
    int offset;>
    
    
    <fin = fopen( "Num.txt", "r" );>
    <fout= fopen("Num_output.txt", "w");>
    
    <offset = strlen(searchfor);>
    
    
    
    <while( !feof( fin ) )>
    
    <{>
    
    
    <fgets( line, 500, fin );>
    <x = strstr( line, searchfor );>
    
    
    <if(x==NULL){>
    <continue;>
    <} >
    
    <x = strstr(line, searchfor ) + offset;>
    
    <strcpy(number, x);>
    
    <strchr(number,')')[0] = '\0';>
    
    
    
    <fputs(number,fout);>
    <fputs("\n", fout);>
    
    <}>
    
    
    <fclose(fin);>
    <fclose(fout);>
    
    <return 0;>
    <}>

    Sample of my Num_output file; you'll see that it contains duplicates:

    0x506885b00018e9b
    0x500a4250df03ec6
    0x5071bed00234795
    0x5006a770028b917
    0x50324ae0ffaa320
    0x50096c50df03e84
    0x505da9c00014e82
    0x50067e80ff31864
    0x506dd3000233f61
    0x50655040ffe8ffb
    0x505c8030ffe7f5b
    0x506a12800019d10
    0x506528e0ffe8fb9
    0x5066d020ffe958b
    0x505da9900014e82
    0x503c9070fe59228
    0x50611e70ffe885a
    0x503c9070fe59228
    0x50611e70ffe885a


    Cheers,
    Moby

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    OK, so what weirdness added <> around almost every line of code you posted?
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Nov 2004
    Posts
    5
    It wouldn't let me post the thread unless I added the [code] and [/code] tags to my code.

    But as you can see, what's between the <> is C code; just disregard the <> signs.

    moby

  4. #4
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    So why didn't you just add code tags, dumb ........? Good luck getting help now.

    Quzah.
    Hope is the first step on the road to disappointment.

  5. #5
    Registered User
    Join Date
    Nov 2004
    Posts
    5
    Well, manners are obviously not in your vocabulary, Shazam or Quasar or whatever the hell your name is! And if it means getting this kind of help from arrogant egomaniacs such as yourself, then I'm totally fine without YOUR help!

  6. #6
    Carnivore ('-'v) Hunter2's Avatar
    Join Date
    May 2002
    Posts
    2,879
    Don't mind Quzah, he's a little caustic at times but highly respected by most CProg members.

    I suggest you re-post your code like so:
    [code]
    (your program's code, without < > around every line)
    [/code]

    As things stand, your code is highly unreadable.
    Last edited by Hunter2; 02-06-2005 at 04:35 PM.
    Just Google It. √

    (\ /)
    ( . .)
    c(")(") This is bunny. Copy and paste bunny into your signature to help him gain world domination.

  7. #7
    Registered User
    Join Date
    Nov 2004
    Posts
    5
    Hello Hunter,

    I appreciate your kind reply, and I'm sorry for the unreadable code I posted; it was a newbie error!

    I will do as you suggested and make my code readable.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        FILE *fin, *fout;
        char line[500], searchfor[] = "numbers(", *x, *end;

        fin = fopen("Num.txt", "r");
        fout = fopen("Num_output.txt", "w");
        if (fin == NULL || fout == NULL) {
            perror("fopen");
            return EXIT_FAILURE;
        }

        /* Loop on fgets() itself; testing feof() first can process the last line twice. */
        while (fgets(line, sizeof line, fin) != NULL) {
            x = strstr(line, searchfor);
            if (x == NULL) {
                continue;
            }
            x += strlen(searchfor);     /* skip past "numbers(" */

            end = strchr(x, ')');
            if (end == NULL) {
                continue;               /* no closing parenthesis on this line */
            }
            *end = '\0';

            /* Duplicates are still written out here; that is the open question. */
            fputs(x, fout);
            fputs("\n", fout);
        }

        fclose(fin);
        fclose(fout);

        return 0;
    }

    Kind Regards,
    Moby

  8. #8
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    Welcome to the forums!

    How many records do you expect in your data set? Is performance important? There are a couple of ways to detect duplicates in a data set. The first is to check each record against every other record for a match. Obviously, for all but the smallest data sets, this can be very inefficient. The other way is to sort the data set. Then you can just loop through the records checking whether two or more consecutive records are the same.
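
    The first approach (compare each new record against everything kept so far) can be sketched roughly as below. This is only an illustration: MAX_RECORDS, MAX_LEN, already_seen() and add_unique() are made-up names and limits, not anything from the original program. It keeps records in their original order, at the cost of a linear search per record.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RECORDS 2048  /* assumed cap on stored records (hypothetical) */
#define MAX_LEN     64    /* assumed maximum length of one record, including '\0' */

/* Hypothetical helper: returns 1 if s is already stored in seen[0..count-1]. */
static int already_seen(char seen[][MAX_LEN], size_t count, const char *s)
{
    size_t i;

    assert(seen != NULL && s != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(seen[i], s) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: stores s if it has not been seen before.
 * Returns 1 if stored, 0 if it is a duplicate, too long, or the table is full. */
static int add_unique(char seen[][MAX_LEN], size_t *count, const char *s)
{
    assert(count != NULL && s != NULL);
    if (strlen(s) >= MAX_LEN || *count >= MAX_RECORDS)
        return 0;
    if (already_seen(seen, *count, s))
        return 0;
    strcpy(seen[*count], s);
    (*count)++;
    return 1;
}
```

    In the reading loop, the record would be written to the output file only when add_unique() returns 1.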

    The simplest method would be to add all your records to an array, call qsort() to sort them, run through the array to detect duplicates and finally output the array to file. A slightly more efficient approach may be to keep the records sorted as you read them in. If you have too many records to keep in memory (say more than a million or so on a modern workstation), then the task becomes more complicated.
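
    A minimal sketch of the qsort() route, assuming the records have already been collected into a fixed-width array of strings. MAX_LEN, compare_records() and sort_and_dedup() are hypothetical names chosen for the example. Note that the output ends up sorted rather than in file order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 64  /* assumed maximum length of one record, including '\0' */

/* Comparison callback for qsort(): each element is a char[MAX_LEN] string. */
static int compare_records(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* Hypothetical helper: sorts the records in place, then squeezes out
 * adjacent duplicates. Returns the number of unique records, which
 * occupy records[0..result-1] afterwards. */
static size_t sort_and_dedup(char records[][MAX_LEN], size_t count)
{
    size_t read, write;

    assert(records != NULL);
    if (count == 0)
        return 0;

    qsort(records, count, MAX_LEN, compare_records);

    write = 1;  /* records[0] is always kept */
    for (read = 1; read < count; read++) {
        /* After sorting, equal records sit next to each other. */
        if (strcmp(records[read], records[write - 1]) != 0) {
            if (write != read)
                strcpy(records[write], records[read]);
            write++;
        }
    }
    return write;
}
```

    After the call, writing out the first n records (where n is the return value) gives the file without duplicates.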

  9. #9
    Registered User
    Join Date
    Nov 2004
    Posts
    5
    Hello Anonytmouse,

    The data set would contain around 2000 records at most. Yes, performance would be an important factor as well.

    Kind Regards,
    Moby

  10. #10
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Well, if you're on Unix / Linux, your existing program will do just fine if you run it like this (once you take the input filename from argv instead of hard-coding it, and write the results to stdout so they can be piped).

    myprog Num.txt | sort -u > Num_output.txt

    The -u switch to sort removes duplicates.
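
    For the pipeline above to work, the program has to print to stdout rather than to a hard-coded file. One possible shape for that, as a sketch only: extract_numbers() is a hypothetical helper that takes the input and output streams as parameters, so a main() could open argv[1] and pass stdout as the output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the text between "numbers(" and ")" from each
 * line of in to out, one value per line. Returns how many values were written. */
static int extract_numbers(FILE *in, FILE *out)
{
    static const char searchfor[] = "numbers(";
    char line[500];
    int written = 0;

    assert(in != NULL && out != NULL);

    while (fgets(line, sizeof line, in) != NULL) {
        char *start = strstr(line, searchfor);
        char *end;

        if (start == NULL)
            continue;               /* line has no "numbers(" marker */
        start += strlen(searchfor);

        end = strchr(start, ')');
        if (end == NULL)
            continue;               /* malformed line: no closing parenthesis */
        *end = '\0';

        fprintf(out, "%s\n", start);
        written++;
    }
    return written;
}
```

    A main() built on this would fopen() the name given in argv[1], call extract_numbers(fin, stdout), and leave duplicate removal to sort -u. Keep in mind that sort -u also reorders the output.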
