Thread: fastest way of processing a file

  1. #16
    Just kidding.... fnoyan's Avatar
    Join Date
    Jun 2003
    Location
    Still in the egg
    Posts
    275
    Thanks for all the replies...

    I used the zStrrmv() function from iMac and implemented the changes recommended by Salem. The import time is almost the same (about 50 mins).

    I will follow ledov's advice to partially see the time consumed for each part of the import process. It turns out sqlite3 API calls takes time (since they are called frequently for each line).

  2. #17
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I've got a suggestion for you.

    Let's ignore the sqlite3 API calls for now. Take these as "fixed" for the purpose of improving the rest of the program.

    Make a dummy input file, that requires say 5 -15 minutes to process, but excludes the sqlite3 API calls.

    Post a link to that dummy input file, along with the format you want it to be in upon completion, for optimal results.

    Then go off and enjoy the holidays. When you return after Christmas, I'll bet some Arctic Elf will have the input portion of the program streamlined.

    'Tis the Season.

  3. #18
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Agreed. You need to profile the thing to work out where the time is being spent. One way to essentially do that is to comment out a certain part and see how much faster it becomes.
    So yes please comment out the sqllite calls and time that.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #19
    Registered User
    Join Date
    May 2012
    Posts
    505
    Go to my website and take my csv file parser. It might be fast enough.
    I'm the author of MiniBasic: How to write a script interpreter and Basic Algorithms
    Visit my website for lots of associated C programming resources.
    https://github.com/MalcolmMcLean


  5. #20
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    After one week, we still have *zero* lines of an input file example. Also, we have no example of the final format needed for the sqlite insertion, either.

    fnoyan, can you post this info?

  6. #21
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    If, as some of us suspect, the SQL is the elephant in the room, then try this.
    Code:
    void tryit ( ) {
      const char *word = "TestElement";
    
      //!! pick some numbers which represent your data
      const int numLines = 1000000, numFields = 100;
    
      for ( int line = 0 ; line < numLines ; line++ ) {
        for ( int field = 0 ; field < numFields ; field++ ) {
          sqlite3_bind_text(stmt, 1, word, -1, SQLITE_STATIC);
        }
        if (sqlite3_step(stmt)!=SQLITE_DONE)
            return sqlite3_errcode(db);
    
        sqlite3_reset(stmt);
      }
    }
    If this still takes an hour, then you're wasting your effort saving a couple of minutes on improving (although there is some very low hanging fruit to be picked).

    Your efforts might be better employed looking for more efficient use of SQL to begin with.
    Importing CSV would be so common that there should be an efficient way of doing it.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #22
    Just kidding.... fnoyan's Avatar
    Join Date
    Jun 2003
    Location
    Still in the egg
    Posts
    275
    @Adak,
    Actually, there is no "final format". Input file is CSV and process is "importing it" into DB. Below is first 4 lines of one example file. You can duplicate last two lines as much as you want (the file I am importing has 20500 lines, not huge!)
    Code:
    BSCName,BIDIRECTION,RNCID,CELLID,NCELLRNCID,NCELLID,CIOOFFSET,CELLSFORBIDDEN1A,CELLSFORBIDDEN1B,SIB11IND,IDLEQOFFSET1SN,IDLEQOFFSET2SN,SIB12IND,CONNQOFFSET1SN,CONNQOFFSET2SN,TPENALTYHCSRESELECT,TEMPOFFSET1,TEMPOFFSET2,NPRIOFLAG,NPRIO,MBMSNCELLIND
    BSC Name,Add Bidirection Neighboring Cell Or Not,RNC ID,Cell ID,RNC ID of a neighboring cell,Neighboring cell ID,Neighboring cell oriented CIO[dB],Affect 1A threshold flag,Affect 1B threshold flag,SIB11 Indicator,IdleQoffset1sn[dB],IdleQoffset2sn[dB],SIB12 Indicator,ConnQoffset1sn[dB],ConnQoffset2sn[dB],HCS Cell Reselect Penalty Timer[s],HCS Cell Reselect TempOffset1[dB],HCS Cell Reselect TempOffset2[dB],The flag of the priority,The priority of neighbor cell,MBMS neighboring cell indicator
    O3D1_PXXX00, ,2840,12662,2340,12671,0,AFFECT,AFFECT,TRUE,0,0,FALSE,,,D0,,,FALSE,,TRUE
    O3D1_PXXX00, ,2840,12662,2340,12759,0,AFFECT,AFFECT,TRUE,0,0,FALSE,,,D0,,,FALSE,,TRUE
    @Salem,
    I was thinking so that importing CSV is so common but since sqlite's concept is to keeping things simple, there is no such function (there is one provided with sqlite command line tool, which is open source).
    Yes, you are right, first thing is to try a "simple" import to see sqlite performance.

  8. #23
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I was experimenting with fnprintf's code on the previous page, and a 20,500 line data file, as the OP suggested.

    Most assuredly, there is no input bottleneck with this or other reasonable code. fnprintf's program (as tweaked a bit to get input from a a file), handles it in less than half a second.

    Code:
    //fnprintf's code 
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
     
    int main (void)
    {
       char *p, csv[100]; 
       char rmc[] = ".:;`~*\\?=)(/&%+^'!\"|}][{$ #@\t";
       char pstr[30][100];
       char ostr[30][100];
       int i=0,c=0;
       clock_t timer;
       FILE *fpIn,*fpOut;
              
       fpIn=fopen("data20500.txt", "r");
       fpOut=fopen("dataOut20500.txt","w");
         
       if(!fpIn ) { // || !fpOut) {
          printf("Error opening file\n");
          return 1;
       }
       timer=clock();
    
       while(fgets(csv, sizeof(csv),fpIn)) {
          i=0;
          p = strtok (csv, ",");
          while (p)
          {
             strcpy (pstr[i], p);
             i++;
             p = strtok (NULL, ",");
          }
          c = i;
        
          for (i = 0; i < c; i++)
          {
             p = strtok (pstr[i], rmc);
             strcpy (ostr[i], "");
             while (p)
             {
                strcat (ostr[i], p);
                strcat (ostr[i], " ");
                p = strtok (NULL, rmc);
             }
          }
          for (i = 0; i < c; i++)
          {
             fprintf (fpOut,"%d: %s\n", i, ostr[i]);
          }
       }  
       timer=clock()-timer;
       printf("Elapsed time: %f seconds\n",(double)timer/CLOCKS_PER_SEC);
     
       fclose(fpIn);
       fclose(fpOut);
    	
       return 0;
    }
    Last edited by Adak; 12-20-2012 at 10:59 AM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. File processing: How to read a part of a text file?
    By HardOverdrive in forum C Programming
    Replies: 4
    Last Post: 12-05-2011, 12:33 PM
  2. processing file?
    By ripspinner in forum Windows Programming
    Replies: 4
    Last Post: 12-13-2009, 10:38 PM
  3. Problem with Creating File for File Processing
    By Dampecram in forum C Programming
    Replies: 2
    Last Post: 12-07-2008, 01:26 AM
  4. What's the fastest way to write a file?
    By pingpangpang in forum C++ Programming
    Replies: 12
    Last Post: 06-08-2007, 11:58 AM
  5. Reading a File - FASTEST WAY POSSIBLE??
    By Mastadex in forum C++ Programming
    Replies: 15
    Last Post: 06-19-2004, 03:19 PM