Thread: Binary File-how do you find duplicates?

  1. #1
    Registered User
    Join Date: Oct 2012
    Location: Philippines
    Posts: 7

    Binary File-how do you find duplicates?

    Hi guys! I'm a newbie here. I've been working on my program, and I want to know how to find duplicates in a binary file. For example, the user enters an account number in the format xx-xxx. If that account number is already in the file, the program should tell the user. What should I use to find duplicates? fread()?

    Code:
    void Input(FILE *fp, struct information info, char set[]){
         char gen;
         fp=fopen("account.dat","a+b");
         if(strlen(info.id)==0)   /* no ';' here, or the block below always runs */
         {
              system("cls");
              printf("Enter Account ID(xx-xxx): ");
              scanf("%6s",info.id);
              if (info.id[2]!='-' || strspn(info.id,set)!=6)
              {
                   system("cls");
                   printf("Invalid ID format! ");
                   getch();
              }

              /* if(strcmp(info.id, ??)==0)
                 What should I do here to find duplicates? Loop with fread()? */

              fflush(stdin);
              printf("Enter the First Name: ");
              gets(info.fname);
              printf("Enter the Middle Name: ");
              gets(info.mname);
              printf("Enter the Last Name: ");
              gets(info.lname);
              printf("Enter the Address: ");
              gets(info.address);
              printf("Enter gender specification[M/F]: ");
              gen=toupper(getche());
              info.gender=gen;

              fwrite(&info, sizeof(struct information), 1, fp);
         }

         fclose(fp);
    }

  2. #2
    Registered User
    Join Date: Jan 2009
    Posts: 1,485
    You will need to compare the account you are about to add with all existing accounts. There is no way around that in principle, but there are many ways to do it in practice. Scanning the entire file for the account number would be slow for a large file. Keeping all accounts in memory is another approach; in that case, storing them in something like a hash table would be fast and would guarantee that no duplicates exist. Just make the account number the key.
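    To illustrate the in-memory approach, here is a minimal sketch of a hash set of account IDs. It assumes IDs are short strings like "12-345" (at most 6 characters) and uses a fixed-size table with linear probing; the names id_set, id_set_add and TABLE_SIZE are hypothetical, not part of the original program. You would load every ID from account.dat into the set at startup, then call id_set_add for each new ID the user types.

```c
#include <string.h>

#define ID_LEN 7          /* "xx-xxx" plus the terminating '\0' (assumed format) */
#define TABLE_SIZE 1024   /* hypothetical fixed capacity; must exceed the number of accounts */

/* A small open-addressing hash set of account IDs kept in memory.
   An empty string in a slot marks that slot as unused. */
struct id_set {
    char slots[TABLE_SIZE][ID_LEN];
    size_t count;
};

/* djb2 string hash */
static unsigned long hash_id(const char *id)
{
    unsigned long h = 5381;
    while (*id)
        h = h * 33 + (unsigned char)*id++;
    return h;
}

void id_set_init(struct id_set *set)
{
    memset(set, 0, sizeof *set);
}

/* Returns 1 if id was added, 0 if it was already present (a duplicate),
   -1 if id is empty/too long or the table is full. */
int id_set_add(struct id_set *set, const char *id)
{
    size_t len = strlen(id);
    size_t i, probes;

    if (len == 0 || len >= ID_LEN)
        return -1;

    i = hash_id(id) % TABLE_SIZE;
    for (probes = 0; probes < TABLE_SIZE; probes++) {
        if (set->slots[i][0] == '\0') {          /* empty slot: id not present */
            memcpy(set->slots[i], id, len + 1);
            set->count++;
            return 1;
        }
        if (strcmp(set->slots[i], id) == 0)      /* same key already stored */
            return 0;
        i = (i + 1) % TABLE_SIZE;                /* linear probing */
    }
    return -1;
}
```

    A return value of 0 from id_set_add means the ID is a duplicate. The fixed TABLE_SIZE is a simplification; a real program would need a table comfortably larger than the number of accounts, or one that grows.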

  3. #3
    Registered User
    Join Date: Nov 2010
    Location: Long Beach, CA
    Posts: 5,909
    Like subsonics said, you basically have to read every account record in the file and compare the account number for each one.
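    A minimal sketch of that scan, assuming a hypothetical struct account with a 7-byte id field (the real struct information is not shown in the thread). It rewinds the file, reads one whole record at a time with fread, and compares each stored ID with the one the user entered.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record layout: the thread never shows the real
   struct information, only that it has an id field. */
struct account {
    char id[7];       /* "xx-xxx" + '\0' */
    char name[32];
};

/* Returns 1 if a record with this id exists in fp, 0 if not,
   -1 on a read error. Always scans from the first record. */
int account_exists(FILE *fp, const char *id)
{
    struct account rec;

    rewind(fp);                                   /* go back to the first record */
    while (fread(&rec, sizeof rec, 1, fp) == 1) { /* one whole record per call */
        if (strncmp(rec.id, id, sizeof rec.id) == 0)
            return 1;
    }
    return ferror(fp) ? -1 : 0;                   /* loop ended on EOF or error */
}
```

    This only works if every record is written with the same struct and sizeof, so the file is a plain array of fixed-size records.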

    A few other things:

    1. gets is bad; avoid it at all costs (it was removed from the language entirely in C11). Read this: FAQ > Why gets() is bad / Buffer Overflows - Cprogramming.com. Use fgets with stdin to read a line of input.
    2. You need to check the return value of fopen. If fp is NULL, print an error and don't try to continue: the file could not be opened, and carrying on will give you garbage results or a crash.
    3. Don't do fflush(stdin); it results in undefined behavior. Read these links: FAQ > Why fflush(stdin) is wrong - Cprogramming.com and FAQ > Flush the input buffer - Cprogramming.com.
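    For points 1 and 3, a common replacement is a small helper around fgets that strips the newline and, if the line was too long for the buffer, reads and discards the rest of that line (which is usually what people want fflush(stdin) to do). For point 2, the fopen check can live in its own function. Both are sketches; read_line and open_accounts are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from in into buf (at most size-1 chars), removes the
   trailing newline, and discards any extra characters left on that line
   so the next read starts fresh. Returns 1 on success, 0 on EOF/error. */
int read_line(char *buf, size_t size, FILE *in)
{
    size_t len;

    if (fgets(buf, (int)size, in) == NULL)
        return 0;

    len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';            /* whole line fit: just drop the newline */
    } else {
        int c;                      /* line too long: skip the rest of it */
        while ((c = fgetc(in)) != '\n' && c != EOF)
            ;
    }
    return 1;
}

/* Opens the account file and reports why if that fails.
   The caller must stop when this returns NULL. */
FILE *open_accounts(const char *path)
{
    FILE *fp = fopen(path, "a+b");
    if (fp == NULL)
        perror(path);               /* prints path plus the reason */
    return fp;
}
```

    With stdin, read_line(info.fname, sizeof info.fname, stdin) would stand in for gets(info.fname), assuming fname is a char array.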

  4. #4
    oogabooga
    Join Date: Jan 2008
    Posts: 2,808
    If the file is sorted by account number, you can do a binary search on it. But keeping it sorted makes adding records more difficult. One solution is to have a separate key file (mapping keys to record numbers) that is kept in order while the main data file stays unordered. This is particularly useful if the records are large.
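    A sketch of the lookup side of that idea, assuming a hypothetical key file of fixed-size struct key_entry records (an ID plus the record number in the data file) kept sorted in strcmp order. It uses fseek and fread to jump straight to the middle entry instead of reading the whole file. It also assumes fseek with SEEK_END works on the binary stream, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical key-file entry: account id plus the record number of the
   full record in the unordered data file. Entries are sorted by id. */
struct key_entry {
    char id[7];
    long recno;
};

/* Binary search in the sorted key file. Returns the record number for
   id, or -1 if id is not present (or on an I/O error). */
long find_record(FILE *keys, const char *id)
{
    long lo = 0, hi, size;
    struct key_entry e;

    if (fseek(keys, 0L, SEEK_END) != 0 || (size = ftell(keys)) < 0)
        return -1;
    hi = size / (long)sizeof e - 1;   /* index of the last entry */

    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        int cmp;

        if (fseek(keys, mid * (long)sizeof e, SEEK_SET) != 0 ||
            fread(&e, sizeof e, 1, keys) != 1)
            return -1;
        cmp = strcmp(id, e.id);
        if (cmp == 0)
            return e.recno;           /* found: this id already exists */
        if (cmp < 0)
            hi = mid - 1;             /* id sorts before the middle entry */
        else
            lo = mid + 1;             /* id sorts after the middle entry */
    }
    return -1;
}
```

    Adding an account then means inserting a new entry at the right place in the key file (shifting the later entries down), while the data record itself is simply appended.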

  5. #5
    Registered User
    Join Date: Sep 2006
    Posts: 8,868
    Welcome, wacky_jay!

  6. #6
    TEIAM - problem solved
    Join Date: Apr 2012
    Location: Melbourne Australia
    Posts: 1,907
    Hey there

    First, get the input from the user.

    Read each previous record (one at a time, using a loop) and compare it with the input from the user.

    If there aren't any matches, append the new record to the end of the file; otherwise, print something like "Error: Duplicate Entry Found".

    Another thing: 'fp' is opened and closed within this function. Consider removing 'fp' from the function's parameter list and making it a local variable instead.
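    Putting those steps together, here is a sketch with fp as a local variable. It assumes the same kind of hypothetical struct account layout (the real struct information is not shown) and a hypothetical add_account function that reports whether the record was added, rejected as a duplicate, or failed on I/O. Opening with "a+b" allows reading from anywhere in the file while every write goes to the end.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record layout; the real struct information isn't shown. */
struct account {
    char id[7];       /* "xx-xxx" + '\0' */
    char name[32];
};

enum add_result { ADD_OK, ADD_DUPLICATE, ADD_IO_ERROR };

/* Opens the file itself (fp is local, not a parameter), scans every
   existing record for rec->id, and appends rec only if none matches. */
enum add_result add_account(const char *path, const struct account *rec)
{
    struct account cur;
    enum add_result result = ADD_OK;
    FILE *fp = fopen(path, "a+b");    /* read anywhere, write at the end */

    if (fp == NULL) {                 /* always check fopen */
        perror(path);
        return ADD_IO_ERROR;
    }

    rewind(fp);
    while (fread(&cur, sizeof cur, 1, fp) == 1) {
        if (strcmp(cur.id, rec->id) == 0) {
            result = ADD_DUPLICATE;   /* caller reports the duplicate */
            break;
        }
    }
    if (result == ADD_OK && ferror(fp))
        result = ADD_IO_ERROR;

    if (result == ADD_OK) {
        fseek(fp, 0L, SEEK_END);      /* positioning call between reading and writing */
        if (fwrite(rec, sizeof *rec, 1, fp) != 1)
            result = ADD_IO_ERROR;
    }

    if (fclose(fp) != 0 && result == ADD_OK)
        result = ADD_IO_ERROR;
    return result;
}
```

    The caller would print something like "Error: Duplicate Entry Found" when add_account returns ADD_DUPLICATE.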

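    Do not ignore the advice in post #3 (use fgets instead of gets, check the return value of fopen, and don't use fflush(stdin)).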

  7. #7
    Registered User
    Join Date: Oct 2012
    Location: Philippines
    Posts: 7
    Okay, thanks. I got it.

