help with my mIRC log file stat generator.


  1. #1
    TravisS
    Guest

    help with my mIRC log file stat generator.

    For about the last month I've been working on a stat generator for mIRC log files, similar to mIRC Stats, but of course free.

    I have it working well, but it is very, very slow. A log file under 1 meg takes nearly a minute to scan. A 9 meg log file takes about 10 minutes, and the 17 meg file has never completed (I didn't try running that one, a friend did).

    Right now I'm reading the file in line by line using fgets. I then break each line into words and reconstruct the line from the rebuilt words (rebuilding does things such as stripping the +%@ symbols). I can then run comparisons against either the entire line or individual words from that line.

    This is where it goes slow... It has to re-scan and re-build the log file every time it runs a new analysis. If you are testing 10 different things, it must read the entire log 10 separate times, re-building it word for word every time. Because of the way I have it set up, which is very accurate, I cannot run all 10 stats in one pass of the log file without the source code getting so confusing that I would get lost.
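
    One way to keep a single pass readable is to give each statistic its own small function and loop over a table of them for every line. The sketch below is only an illustration, not the program's actual design: the struct, the two counters, and the function names are all hypothetical, and the matched phrases are taken from the code posted later in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-statistic state: each analysis owns its own counter. */
struct stat_pass {
    const char *name;                                        /* label for the report */
    void (*on_line)(struct stat_pass *self, const char *line);
    long count;                                              /* running total */
};

/* Counts kick events. */
void count_kicks(struct stat_pass *self, const char *line)
{
    if (strstr(line, "was kicked by") != NULL)
        self->count++;
}

/* Counts nick changes. */
void count_nick_changes(struct stat_pass *self, const char *line)
{
    if (strstr(line, "is now known as") != NULL)
        self->count++;
}

/* Hands one already rebuilt line to every registered analysis. */
void dispatch_line(struct stat_pass *passes, size_t npasses, const char *line)
{
    size_t i;

    assert(passes != NULL && line != NULL);
    for (i = 0; i < npasses; i++)
        passes[i].on_line(&passes[i], line);
}

/* Reads the log once; every analysis sees every line. */
void scan_log(FILE *log, struct stat_pass *passes, size_t npasses)
{
    char buf[1024];

    while (fgets(buf, sizeof buf, log) != NULL)
        dispatch_line(passes, npasses, buf);
}
```

    Adding an eleventh stat then means adding one function and one table entry, and the file is still read only once. Whether this fits depends on how much state the real analyses share.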

    So, is there a quicker, easier way of doing things? I was thinking of reading the entire log file into a linked list. Each node would hold one entire re-built line, and each link after it would be the next line of the log file. This would essentially build up the entire log file in memory, making the multiple passes much quicker.
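
    The linked-list idea could look roughly like the sketch below. All names here are made up for illustration. Each node stores a copy sized to its line rather than a fixed 1024-byte buffer, so memory use is about the size of the file plus a pointer and allocator bookkeeping per line; the bytes field tracks the text portion so it can be measured.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One node per log line. */
struct line_node {
    struct line_node *next;
    char *text;                 /* heap copy of the line, including its '\n' */
};

struct line_list {
    struct line_node *head;
    struct line_node *tail;
    size_t count;               /* number of lines stored */
    size_t bytes;               /* text bytes stored, excluding node overhead */
};

/* Appends a copy of line; returns 0 on success, -1 if out of memory. */
int list_append(struct line_list *list, const char *line)
{
    struct line_node *node;
    size_t len;

    assert(list != NULL && line != NULL);
    len = strlen(line) + 1;
    node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->text = malloc(len);
    if (node->text == NULL) {
        free(node);
        return -1;
    }
    memcpy(node->text, line, len);
    node->next = NULL;
    if (list->tail != NULL)
        list->tail->next = node;
    else
        list->head = node;
    list->tail = node;
    list->count++;
    list->bytes += len;
    return 0;
}

/* Loads every line of fp; lines longer than the buffer arrive in pieces. */
int list_load(struct line_list *list, FILE *fp)
{
    char buf[1024];

    while (fgets(buf, sizeof buf, fp) != NULL)
        if (list_append(list, buf) != 0)
            return -1;
    return 0;
}

/* Releases every node and resets the list to empty. */
void list_free(struct line_list *list)
{
    struct line_node *node = list->head;

    while (node != NULL) {
        struct line_node *next = node->next;
        free(node->text);
        free(node);
        node = next;
    }
    list->head = list->tail = NULL;
    list->count = list->bytes = 0;
}
```

    Each analysis would then walk from head to tail instead of re-reading and re-building the file.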

    But if I read the entire 17 meg log file into memory, wouldn't it take up 17+ megs of memory? And 17 megs is only half a month's worth of logs; it could easily be 35+ megs by the end of the month. Even with the high-end systems of today I can't afford to waste that much memory (and I have 512 megs).

    So, is there a better way to do this? I know it's possible, mIRC Stats can scan the log file, and do it quite quickly, but I don't have the slightest clue how they did it.

    (If you want to see the source, let me know. It's over 20 pages so I didn't want to post it here, and I don't think unregistered people can upload files here.)

  2. #2
    Prelude
    Code Goddess
    Join Date
    Sep 2001
    Posts
    9,796
    Well, starting simple, have you considered parsing the original file into a copy? The original remains unchanged and you can make multiple passes on it, reading each line, marking it with parse values and then saving the modified line to the copy. In this way you should be able to run all 10 scans by only copying the file once and not risk much accuracy depending on what kind of stats you are looking for. I can't help much more without a good idea of what you're looking for in the IRC logs.

    -Prelude
    My best code is written with the delete key.

  3. #3
    TravisS
    Guest
    Hehe, why start simple?

    That would be a very good idea. One that kinda passed by me. I know I've thought of it, but only briefly. That would probably help greatly though, because then it would only have to re-build the log once, and every time after that would be a simple read in.
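
    A rough sketch of that rebuild-once approach, under some assumptions: clean_line is a hypothetical stand-in for the real rebuild step (here it only drops < and >), and the function names are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the expensive rebuild: removes every '<' and '>' in place. */
void clean_line(char *line)
{
    char *src = line, *dst = line;

    assert(line != NULL);
    for (; *src != '\0'; src++)
        if (*src != '<' && *src != '>')
            *dst++ = *src;
    *dst = '\0';
}

/* Pass 0: rebuild each raw line once and save it to the cleaned copy.
   Returns the number of fgets reads written, or -1 on a write error. */
long rebuild_once(FILE *raw, FILE *cleaned)
{
    char buf[1024];
    long lines = 0;

    while (fgets(buf, sizeof buf, raw) != NULL) {
        clean_line(buf);
        if (fputs(buf, cleaned) == EOF)
            return -1;
        lines++;
    }
    return lines;
}

/* Any later pass: a plain read of the cleaned copy with no rebuilding.
   This example pass counts the lines that contain needle. */
long count_matching(FILE *cleaned, const char *needle)
{
    char buf[1024];
    long hits = 0;

    rewind(cleaned);    /* also makes it legal to read after the earlier writes */
    while (fgets(buf, sizeof buf, cleaned) != NULL)
        if (strstr(buf, needle) != NULL)
            hits++;
    return hits;
}
```

    The original log is never modified. The cleaned copy can be a temporary file opened with tmpfile() or fopen(), and each of the 10 stats becomes one cheap count_matching-style pass over it.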


    Here's what I'm doing when I build the words:

    Code:
    int buildWords(WORDS* w, char* buf)
    {
      char *token, holdster[1024], reBuf[1024];
      int position=0, i, x;
    
      strcpy(reBuf, buf);
    
    
      token = strtok( buf, " " );
      while( token != NULL )
      {
        //While there are tokens in buf
        strcpy(w->ords[position], token);
        //Get next token:
        token = strtok( NULL, " ");
        position++;
      }
          
    
      //get rid of symbols and re-build the line (buf)
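      //strip the < > around the nick in place
      if(w->ords[1][0]=='<')
      {
        i=0;
        //stop at '>' or at the end of the word, so a word without '>' cannot run past its buffer
        while(w->ords[1][i+1]!='>' && w->ords[1][i+1]!='\0')
        {
          w->ords[1][i]=w->ords[1][i+1];
          i++;
        }
        w->ords[1][i]='\0';
      }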
      //strip one leading status symbol (+ % @); only the first character is
      //tested, so a word such as "user@host" is no longer cut by mistake
      if(w->ords[1][0]=='+'||w->ords[1][0]=='%'||w->ords[1][0]=='@')
      {
        i=0;
        while(w->ords[1][i]!='\0')
        {
          w->ords[1][i]=w->ords[1][i+1];
          i++;
        }
        w->ords[1][i]='\0';
      }
      if(w->ords[2][0]=='+'||w->ords[2][0]=='%'||w->ords[2][0]=='@')
      {
        i=0;
        while(w->ords[2][i]!='\0')
        {
          w->ords[2][i]=w->ords[2][i+1];
          i++;
        }
        w->ords[2][i]='\0';
      }
    
    
      //change nicknames, look for either in group file or changed in log file
      //////////////////////////////////////////////////////////////////////////
    
      //if a nick was changed, keep track of it
      if(strstr(reBuf,"is now known as"))
      {
        for(i=0, x=1; i<totalGroupNames/2; i++)
        {
          if(strstr(w->ords[2], changeNick[i]) || strstr(w->ords[2], groups[x]))
          {
            fprintf(grouping, "Name change: %s was found and turned into ", changeNick[i]);
    
            x=0;
            while(w->ords[7][x]!='\0')
              x++;
            if(x>2)
              w->ords[7][x-2]='\0';
    
            strcpy(changeNick[i], w->ords[7]);
            fprintf(grouping, "%s\n", changeNick[i]);
            break; //match handled, stop searching
          }
          x+=2;
        }
      }
      for(i=1, x=0; i<totalGroupNames; i+=2)
      {
        if(strstr(w->ords[1], groups[i]) || strstr(w->ords[1], changeNick[x]))
        {
          fprintf(grouping, "Nick grouping: %s was turned into", w->ords[1]);
          strcpy(w->ords[1], groups[i-1]);
          fprintf(grouping, " %s\n", w->ords[1]);
        }
        x++;
      }
    
      for(i=1, x=0; i<totalGroupNames; i+=2)
      {
        if((strstr(w->ords[2], groups[i]) || strstr( w->ords[2], changeNick[x])) && !strcmp(w->ords[1], "*"))
        {
          fprintf(grouping, "Nick grouping: %s was turned into", w->ords[2]);
          strcpy(w->ords[2], groups[i-1]);
          fprintf(grouping, " %s\n", w->ords[2]);      
        }
        x++;
      }
      //////////////////////////////////////////////////////////////////////////
    
      //Select changes, for certain modes
      //////////////////////////////////////////////////////////////////////////
      if(strstr(reBuf, "sets mode:") && !strcmp(w->ords[1], "***"))
      {
        for(i=1; i<totalGroupNames; i+=2)
          if(strstr(w->ords[1], groups[i]))
            strcpy(w->ords[1], groups[i-1]);
      }
    
      if(strstr(reBuf, "was kicked by") && !strcmp(w->ords[1], "***"))
      {
        for(i=1; i<totalGroupNames; i+=2)
          if(strstr(w->ords[6], groups[i]))
            strcpy(w->ords[6], groups[i-1]);
      }
            
    
      strcpy(buf, w->ords[0]);
      for(i=1; i<position; i++)
      {
        sprintf(holdster, " %s", w->ords[i]);
        strcat(buf, holdster);
      }
    
      return(position);
    }
    w->ords is simply an array of strings, one per word. I have it set up as a structure because of earlier things I was doing with the program, but now it's not necessary, I just haven't seen a reason to change it back.

    buf is the buffer from reading in the file, this is just what's held in each line direct from the log.

    That's basically what I'm doing for re-building the line though. Getting rid of the < > symbols in the names (needed because of action /me statements, where there are no brackets around the name), getting rid of the op and voice symbols, then doing some stuff with nickname grouping.
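
    The bracket and symbol stripping described above could also be pulled into one small helper. This is a sketch of an alternative, not the posted code: strip_nick_decor is a hypothetical name, it only removes brackets when the word both starts with < and ends with >, and it only looks at the first character for +, % or @.

```c
#include <assert.h>
#include <string.h>

/* Strips a surrounding < > pair, then one leading status symbol (+, %, @).
   Works in place; the result is never longer than the input. */
void strip_nick_decor(char *nick)
{
    size_t len;

    assert(nick != NULL);
    len = strlen(nick);

    /* "<@Travis>" becomes "@Travis" */
    if (len >= 2 && nick[0] == '<' && nick[len - 1] == '>') {
        memmove(nick, nick + 1, len - 2);
        nick[len - 2] = '\0';
        len -= 2;
    }

    /* "@Travis" becomes "Travis"; moving len bytes from nick + 1 copies the terminator too */
    if (len > 0 && strchr("+%@", nick[0]) != NULL)
        memmove(nick, nick + 1, len);
}
```

    memmove is used because the source and destination overlap, which strcpy does not allow.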

    It works pretty well, but is there something that should maybe be changed?

    Oh, and thanks for the help and idea. I'm gonna add the temp-file sometime tonight, that will probably help, though I don't expect too much of a speed gain.

  4. #4
    TravisS
    Guest

    Wow, I'll eat my words on that

    Times dropped from 48 seconds to 15 flat on a 727 KB file, and from many, many minutes (20+) to 3:30 on a 16.2 meg file.

    That's just awesome
