Thread: Best route to the result

  1. #1
    Registered User
    Join Date
    Sep 2010
    Posts
    16

    Best route to the result

    I have a 100 MB file containing 10 million entries, and I need to append 3 characters to the end of each line.

    I know I can do this in DOS.... but way too slow....

    Does anyone have an idea how long this would take to complete in C..... Is C the best programming language for this?

    I have read I can use strcat to do this, but is this the best route for dealing with such large volumes....?

    Any help appreciated....

  2. #2
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Does anyone have an idea how long this would take to complete in C
    If you already know C, this should take about 5-10 minutes for you to write the code.

    Is C the best program language for this?
    I guess. For such a simple application, I would say that the "best" language is the one you feel most comfortable with.

    I have read I can use strcat to do this, but is this the best route for dealing with such large volumes....?
    There's nothing wrong with strcat in this situation as long as you know your buffer is big enough to accommodate the string you are appending.
    bit∙hub [bit-huhb] n. A source and destination for information.

  3. #3
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Strings in any language are not the fastest way to work with letters. The fastest way would be simply to use blocks of a char array, and forget mucking about with the end-of-string char, strlen(), and the like.

    These things are for convenience's sake, and are some of the slower parts of the C language. C is certainly one of the fastest languages on the planet - if not the fastest - but string handling is rather slow in any language.

    Can you post up a sample of the file, say 50 lines or so, and what you need to append onto the lines, as well? What is possible, and fastest, depends on the specific details of the data.

    By "DOS", you mean using a bat file, right? This is on a Windows system then?
    Last edited by Adak; 09-22-2010 at 01:12 PM.

  4. #4
    Making mistakes
    Join Date
    Dec 2008
    Posts
    476
    Depends mostly on IO speed. Appending three characters to a line (even if it's ten million lines) shouldn't be too slow. Read one line at a time, strcat the characters and write.
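
A minimal sketch of the read, strcat, write loop described above. The function name `append_suffix` and the limits `LINE_MAX_LEN` and `SUFFIX_MAX` are made up for illustration. It assumes every input line is shorter than `LINE_MAX_LEN` characters; a longer line would be split by fgets and get the suffix in the middle.

```c
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 256  /* assumed upper bound on input line length */
#define SUFFIX_MAX   14   /* assumed upper bound on suffix length */

/* Copies each line of `in` to `out` with `suffix` added before the newline.
   Returns the number of lines written, or -1 on error. */
long append_suffix(FILE *in, FILE *out, const char *suffix)
{
    /* spare room for the suffix, a newline and the terminator */
    char line[LINE_MAX_LEN + SUFFIX_MAX + 2];
    long count = 0;

    if (strlen(suffix) > SUFFIX_MAX)
        return -1;  /* strcat below would overflow the buffer */

    while (fgets(line, LINE_MAX_LEN, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';  /* strip the line ending */
        strcat(line, suffix);                /* fits: length checked above */
        strcat(line, "\n");
        if (fputs(line, out) == EOF)
            return -1;
        ++count;
    }
    return count;
}
```

Called with two streams from fopen and a suffix such as "XXX", this turns AAAAA into AAAAAXXX. The up-front length check is what makes strcat acceptable here.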

  5. #5
    Registered User
    Join Date
    Sep 2010
    Posts
    16
    @Adak, thanks for helping...

    1st txt file contains 10million lines of

    AAAAA
    AAAAB
    AAAAC
    AAAAD
    etc

    2nd file contains 5 3 character codes
    XXX
    YYY
    ZZZ

    output required is 3 new text files containing
    FILE1
    XXXAAAAA
    XXXAAAAB
    XXXAAAAC
    etc

    FILE2
    YYYAAAAA
    YYYAAAAB
    YYYAAAAC
    etc

    So in effect I need to append 3 characters to the front of each line in a file of 10 million lines, in a reasonably efficient manner...

    I can do this in my sleep in DOS (yes Windows) but output is limited to 200 per minute, hoping C will complete task in an hour or 2?

    Thanks for any help...

  6. #6
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Only 200 per minute with a bat file? You're in for a VERY pleasant surprise with C.

    About the system you'll be running this on:

    cpu is?:

    Amount of memory?:

    Your compiler is?:

    Your operating system is Windows XP?

  7. #7
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Write the prefix to the new file
    Read a line from the existing file
    Write the line from the existing file to the new file

    Repeat.

    By the way.... you are not "appending", you are "prepending"...
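
The steps above could be sketched roughly like this. The name `prepend_prefix` is hypothetical, and the sketch assumes lines shorter than 255 characters. It reads the line before writing the prefix, a small reordering of the steps so that no stray prefix is written at end of file.

```c
#include <stdio.h>

/* Copies each line of `in` to `out` with `prefix` written in front of it.
   Returns the number of lines written, or -1 on a write error. */
long prepend_prefix(FILE *in, FILE *out, const char *prefix)
{
    char line[256];  /* assumed to be long enough for any input line */
    long count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        /* prefix first, then the untouched line (newline included) */
        if (fputs(prefix, out) == EOF || fputs(line, out) == EOF)
            return -1;
        ++count;
    }
    return count;
}
```

No string concatenation is needed at all, because the prefix and the line are simply written one after the other.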

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    I don't see how anything is going to be much faster than this:

    Code:
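    const char *suffix = "foo";
    int ch; /* int rather than char, so EOF can be told apart from data */
    while ((ch = fgetc(input)) != EOF)
    {
        if (ch == '\n')
            fwrite(suffix, 1, 3, output); /* emit the suffix just before each newline */
        fputc(ch, output);
    }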
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  9. #9
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    As it turns out, this should be fast, all around. It appears the 5 letters per row data, is a 5 letter permutation list. Since I didn't have enough sample data to work with, I generated my own list, up through AZZZZ - that took just a couple seconds.

    Which made me think that maybe re-generating the entire permutation file, WITH the extra chars we're trying to prepend, would take less time than coding up a prepending program with the file handling the OP wanted, etc.
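
As a rough illustration of the regenerate-it idea, here is a sketch that writes the five-letter strings AAAAA, AAAAB, ... with a prefix already attached. It is not the generator used for the timings in this post. The names `next_word` and `generate_with_prefix` and the `limit` parameter are invented for the example, and it assumes an ASCII-like character set where 'A' to 'Z' are contiguous.

```c
#include <stdio.h>
#include <string.h>

#define WORD_LEN 5

/* Advances `word` to the next string in AAAAA..ZZZZZ order, like an
   odometer. Returns 0 once it wraps past ZZZZZ, 1 otherwise. */
static int next_word(char *word)
{
    int i;
    for (i = WORD_LEN - 1; i >= 0; --i) {
        if (word[i] < 'Z') {
            ++word[i];
            return 1;
        }
        word[i] = 'A';  /* carry into the position to the left */
    }
    return 0;
}

/* Writes lines of the form prefix+word to `out`, stopping after `limit`
   lines (0 means no limit). Returns the number of lines written. */
unsigned long generate_with_prefix(FILE *out, const char *prefix,
                                   unsigned long limit)
{
    char word[WORD_LEN + 1];
    unsigned long count = 0;

    memset(word, 'A', WORD_LEN);
    word[WORD_LEN] = '\0';
    do {
        fprintf(out, "%s%s\n", prefix, word);
        ++count;
    } while ((limit == 0 || count < limit) && next_word(word));
    return count;
}
```

With a limit of 0 and five letters this would write 26^5 = 11,881,376 lines, which is in line with the "10 million entries" mentioned at the start of the thread.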

    In any case, this is only slightly tested for accuracy, and not optimized, or tested against other algorithms or data structures. I avoided using strcat, because I thought it would be a slow down, and wasn't necessary. It's fast enough, imo - about 2.2 Million records prepended, in about 2.5 seconds.

    As Brafil mentioned earlier in the thread, its running speed is largely bound by the IO throughput.

    Code:
    /* 
    prepends 3 char's from perms3.txt file, (which has 5 rows of char's), 
    each row having 3 char's and a newline we don't use. These are 
    written out into the front of each row before the 5 chars 
    (6 counting the newline), in the perms6.txt file, are written out
    to five sequentially numbered files, (allperm1.txt - allperm5.txt).
    
    It's not optimized, or tested against other algorithms, but it's fast. 
    Try it and C. ;)
    
    */
    
    #include <stdio.h>
    
    typedef struct {
      char char3[3];
      char newline;
    }record;
    
    int main(void) {
      FILE *fpin3, *fpin6, *fpout;
      int i;
      const char *filename6="perms6.txt"; //reminder to take the newline char #6
      const char *filename3="perms3.txt";//leave the newline behind
      char fileOut[]="allperm1.txt";
      record rec;
      char char6[6];
      unsigned long int count = 0;
    
      fpin3 = fopen(filename3, "rt");
      fpin6 = fopen(filename6, "rt");
      if(fpin3 == NULL || fpin6 == NULL) {
        printf("\nError opening input files");
        return 1;
      }
      if((fpout =fopen(fileOut, "wb"))== NULL) {
        printf("\nError opening output file");
        return 1;
      }
      printf("\n\n\n");
      for(i=0;i<5;i++) {
        if(fread(&rec, sizeof(rec), 1, fpin3) != 1)  //stop if no prefix row is left
          break;
        
        while(fread(char6, 6, 1, fpin6) >0) {
          fwrite(rec.char3, 3, 1, fpout);     //fpout or stdout (for debug)
          fwrite(char6, 6, 1, fpout);         //ditto
          ++count;
          //getch();
        }
        if(fileOut[7]=='5')
          break;
        rewind(fpin6);
        fclose(fpout);
        printf("\n closing file %s", fileOut);
        fileOut[7]++; //increment the file number in the name
        printf("\n opening file %s", fileOut);
        if((fpout =fopen(fileOut, "wb"))== NULL) {
          printf("\nError opening output file");
          return 1;
        }
      }
      fclose(fpin3);  //fcloseall() is non-standard, so close each file explicitly
      fclose(fpin6);
      fclose(fpout);
      printf("\n\n %lu\n\t\t\t    press enter when ready", count);
      i=getchar();
      return 0;
    }

  10. #10
    Registered User
    Join Date
    Sep 2010
    Posts
    16
    @Adak and Brewbuck.....

    Thanks, great help. I'm trialling Adak's version at the mo, and it is massively quicker than I thought it would be.

    I take your point about generating the whole file from within the program. I hadn't really considered that approach, and it is well beyond my skills at the moment.... I suspect maybe one for me to try in the future!!

    Also going to amend code to have an append version as I think this will be useful for me shortly.

    Thanks again....

  11. #11
    Registered User
    Join Date
    Sep 2010
    Posts
    16
    @Adak...

    Got it working to append rather than prepend....

    The program/terminal requires a key to be pressed to exit the terminal window. Is there a way round that? I would like the program to run, create the files, then exit.

    I have searched, looked at break, return and exit but none seem to work?

    Thanks again...

  12. #12
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Remove or REM out the i=getchar() line of code. Second from the last line. By REM I mean "REMark" by putting either a // in front of the first letter of the line, or by surrounding the line, like so:

    /* i=getchar(); */

    How long is the longest number of permutations you need? 8 char's is still pretty quick, but around 15, it REALLY begins taking up more time than you'd probably like.

    May I ask what you want these permutations for?

  13. #13
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I'm not able to tell, frankly. My program would need some adjusting to handle creating 1000 files, since that exceeds the range of the single char that it uses for creating its file names.
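
One possible adjustment, sketched under the assumption that a C99 `snprintf` is available: build each file name from a number instead of incrementing one digit character. The helper name `make_output_name` is invented for this example; the "allperm" stem is taken from the program above.

```c
#include <stdio.h>

/* Builds "allperm<n>.txt" in `buf`. Returns 0 on success, or -1 if the
   name does not fit in `size` bytes. */
int make_output_name(char *buf, size_t size, unsigned n)
{
    int len = snprintf(buf, size, "allperm%u.txt", n);
    return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```

A loop over n from 1 to 1000 could call this before each fopen, so the file count is no longer tied to the digits '1' through '9'.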

    If the data would have to be accessed via a CD drive, I'd see that as a big slow-down. Same with slow network drives, if you have them.

    I guess I'd vote for programming it dynamically. Especially if you go to a distributed computer project. Sounds like a deal breaker to ask people to take on 1,000 data files, unless absolutely essential.

    Just watched a video on YouTube showing how this hack is done using Aircrack. I know the video was edited to save time, but wow, that looked very quick.

    Everyone with a wireless network and relying on WAP password for their security, should take a look and see what alternatives there are for better security on their Wifi setup.
    Last edited by Adak; 09-23-2010 at 08:45 PM.
