Thread: Reading a Log File in Reverse

  1. #1
    Registered User
    Join Date
    Feb 2005
    Posts
    9

    Reading a Log File in Reverse

    Greetings. I'm a first-time poster but a long-time anonymous coward on this board.

    I'm a bit stuck on this rather simple task. My current program analyzes log files, and it needs to read them one line at a time. These files can range from a few megabytes to several gigabytes in size. I can't read an entire file into memory, so I have to scan in lines from the end of the file and work my way up until I've passed a certain date range (the date handling itself is beyond the scope of this thread).

    My problem is that I know how to read a file in the standard way -- beginning to end -- but how is this done in reverse?

    Here is a snippet of where I am. It outputs nothing though. My thoughts were that it should scan from the end of the file backwards but it appears to advance the pointer when reading a character.
    Code:
    ifstream infile(logfilepath);
    string   strLine;
    char     szBuf;
    
    ifstream::pos_type posBeg = infile.tellg();
    
    infile.seekg(0, ios_base::end);
    
    ifstream::pos_type posCur = infile.tellg();
    
    while (posCur != posBeg)
    {
        cout << infile.get() << endl;
            
        posCur -= 2;
        infile.seekg(posCur);
    }
    
    infile.close();
    I've done this sort of thing many times with Perl, PHP, Python, and Bash but I'm baffled by how mysterious it is to do it in C++. I've tried Googling for some approaches but they all lead to reading the entire file into an STL container and using a reverse iterator, which is out of the question with such large log files in this case.

    Any help would be greatly appreciated. It's been many years since I've done any large projects with C or C++ so I'm a bit rusty.

    This program is targeted for Linux, so it has to avoid any Win32/MFC-specific calls.

  2. #2
    Confused Magos's Avatar
    Join Date
    Sep 2001
    Location
    Sweden
    Posts
    3,145
    There is no built-in way to read in reverse. It's probably not the finest or most efficient solution, but you could traverse the file once, byte by byte (yes, it will be slow), and find the locations of all line breaks (\n, \r or whatever). Store these in a list/vector. Then traverse the list backwards and use std::ifstream::seekg() to go to each line and read it.
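    A minimal sketch of that index-then-seek idea, assuming C++11, '\n' line endings, and a file that does not change between the two passes. The names indexLineStarts and forEachLineReversed are made up for illustration, and the forward pass uses std::getline rather than a literal byte-by-byte loop. The visitor returns false to stop early.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: one forward pass that records the offset at which
// each line starts. Assumes '\n' line endings.
std::vector<std::streamoff> indexLineStarts(std::ifstream& in)
{
    assert(in.is_open());
    std::vector<std::streamoff> starts;
    in.clear();
    in.seekg(0, std::ios::beg);
    std::streamoff offset = 0;
    std::string line;
    while (std::getline(in, line)) {
        starts.push_back(offset);
        offset += static_cast<std::streamoff>(line.size()) + 1; // +1 for '\n'
    }
    in.clear(); // the loop ends with eofbit/failbit set; reset for later seeks
    return starts;
}

// Hypothetical helper: visits the lines last-to-first by seeking to each
// recorded offset. visit(line) returns false to stop early.
template <typename Visitor>
void forEachLineReversed(const std::string& path, Visitor visit)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return;
    const std::vector<std::streamoff> starts = indexLineStarts(in);
    std::string line;
    for (std::size_t i = starts.size(); i-- > 0; ) {
        in.seekg(starts[i], std::ios::beg);
        std::getline(in, line);
        in.clear(); // the last line may hit EOF if it has no trailing '\n'
        if (!visit(line)) break;
    }
}
```

    The index costs one offset per line, so it stays far smaller than the file, but the whole file still has to be read once before the first line can be returned.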
    MagosX.com

    Give a man a fish and you feed him for a day.
    Teach a man to fish and you feed him for a lifetime.

  3. #3
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Here's a little program I threw together. It'll print out the file in reverse. Using this as a base should get you started.
    Code:
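    #include <iostream>
    #include <fstream>
    using namespace std;
    int main()
    {
      ifstream in("test.txt");
      in.seekg(0, ios::beg);
      ifstream::pos_type begin = in.tellg();
      in.seekg(0, ios::end);
      if ( !in || in.tellg() == begin )
        return 0;  // could not open the file, or it is empty
    
      // Start on the last character; peeking at the end position itself yields EOF.
      in.seekg(-1, ios::cur);
      while ( in.tellg() != begin )
      {
        // peek() reads the current character without advancing the position.
        cout<<static_cast<char>(in.peek());
        in.seekg(-1, ios::cur);
      }
      // The loop stops on the first character without printing it.
      cout<<static_cast<char>(in.peek())<<endl;
    }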
    One important piece of info: Until I put
    Code:
      in.seekg(0, ios::beg);
    in, the loop never terminated.

  4. #4
    Registered User
    Join Date
    Feb 2005
    Posts
    9
    So store the ifstream::pos_type of each end-of-line character from the file, then reread the file starting from the last stored ifstream::pos_type (which would mark the last line, assuming the log file doesn't contain a blank line at the end) and read until a newline character is found.

    This would work, but I'm not sure about the aspect of reading the file twice. It's possible that while reading the file scanning for newline characters, the daemon writing to it could append more lines and thus when the second read of the file is performed it won't include the new lines. (Can't do any file locking)

    Also there's the issue of "when to stop reading lines." It'll operate on date ranges. So when the lines are read in, it will need to look at the first few characters to see if the line is within range -- in the form 'Feb 21 15:30:32' -- and stop processing more lines as soon as it goes beyond the range. So doing the first scan through the file looking for newline characters sounds like it will record every newline found (possibly tens of thousands) and only look at the dates on the second pass.
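    The stop condition can be sketched as follows, assuming every line starts with a timestamp like 'Feb 21 15:30:32', the log is in chronological order, and it stays within one calendar year (this format carries no year). timestampKey, classify and Verdict are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: packs "Feb 21 15:30:32" into the sortable number
// 221153032 (MMDDhhmmss). Returns -1 if the line has no such prefix.
long long timestampKey(const std::string& line)
{
    static const char kMonths[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    char mon[4] = {0};
    int day = 0, hour = 0, minute = 0, second = 0;
    if (std::sscanf(line.c_str(), "%3s %d %d:%d:%d",
                    mon, &day, &hour, &minute, &second) != 5)
        return -1;
    const char* hit = std::strlen(mon) == 3 ? std::strstr(kMonths, mon) : 0;
    if (!hit || (hit - kMonths) % 3 != 0)
        return -1; // not a month abbreviation
    if (day < 1 || day > 31 || hour < 0 || hour > 23 ||
        minute < 0 || minute > 59 || second < 0 || second > 60)
        return -1;
    const long long month = (hit - kMonths) / 3 + 1;
    return (((month * 100 + day) * 100 + hour) * 100 + minute) * 100 + second;
}

enum Verdict { Skip, Keep, Stop };

// Decision for one line while walking the file from the end towards the start.
Verdict classify(const std::string& line, long long from, long long to)
{
    assert(from <= to);
    const long long key = timestampKey(line);
    if (key < 0)    return Skip; // no timestamp, e.g. a continuation line
    if (key > to)   return Skip; // newer than the range: keep walking back
    if (key < from) return Stop; // older than the range: nothing earlier matches
    return Keep;
}
```

    With a check like this, the reverse reader can stop at the first Stop verdict instead of touching the rest of the file.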

  5. #5
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    From what you have explained, I would suggest a different approach. It sounds like you are trying to process live log files. If this is the case, I would recommend processing the lines as they come in and then writing the data to separate files. Otherwise, how would you ever know where the "end" of the file is if it's constantly being written to? Also, having to go past data that is no longer pertinent is a waste of effort.
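    One way to read "processing them as they come" is a follow loop in the spirit of tail -f. The sketch below is only an illustration under assumptions: readNewLines is a made-up helper, the caller decides how often to poll, and re-seeking after EOF is assumed to make the stream pick up appended data (true of common Linux standard libraries, not guaranteed by the standard).

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: appends every complete line written since the last
// call to 'out' and returns how many were added. A final line without '\n'
// is treated as still being written and is left for the next call.
std::size_t readNewLines(std::ifstream& in, std::vector<std::string>& out)
{
    assert(in.is_open());
    std::size_t count = 0;
    std::string line;
    std::streampos resume = in.tellg(); // where the next call should start
    while (std::getline(in, line)) {
        if (in.eof()) break;            // hit EOF before '\n': partial line
        out.push_back(line);
        ++count;
        resume = in.tellg();
    }
    in.clear();        // drop eofbit/failbit so the stream can be reused
    in.seekg(resume);  // rewind over any partial line
    return count;
}
```

    A caller would invoke readNewLines in a loop, sleeping between calls, and write whatever it extracts from each line to the separate files mentioned above.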

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Read the file into a vector of strings, then iterate through the vector backwards.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Registered User
    Join Date
    Aug 2004
    Location
    San Diego, CA
    Posts
    313
    Quote Originally Posted by Salem
    Read the file into a vector of strings, then iterate through the vector backwards.
    He already said he couldn't do that because the file size was too large for memory to handle (gigabytes of data).

  8. #8
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    So create an index to go alongside the text file, then use the index to read the file backwards
    http://cboard.cprogramming.com/showt...ht=fseek+ftell

  9. #9
    Registered User
    Join Date
    Feb 2005
    Posts
    9
    Thanks Thantos. Your example was exactly what I was looking for. I've spent entirely too much time stuck on this little piece of the project. Here is the code snippet for others who may find themselves in my shoes:
    Code:
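    // Needs <fstream>, <string>, and <algorithm> (for std::reverse).
    std::ifstream infile(m_szLogfile);
    
    std::string strLine;
    char buf;
    
    infile.seekg(0, std::ios::beg);
    std::ifstream::pos_type posBeg = infile.tellg();
    
    // Start on the last character; this seek fails if the file is empty.
    infile.seekg(-1, std::ios::end);
    
    // Checking the stream state keeps the loop from spinning forever
    // when the file is empty or could not be opened.
    while (infile && infile.tellg() != posBeg)
    {
        // peek() reads the current character without moving the position.
        buf = static_cast <char>(infile.peek());
    
        if (buf != '\n')
        {
            strLine += buf;
        }
        else
        {
            // Characters were collected back to front, so flip them.
            std::reverse(strLine.begin(), strLine.end());
    
            // Do something interesting with the line.
    
            strLine.clear();
        }
    
        infile.seekg(-1, std::ios::cur);
    }
    
    if (infile)
    {
        // The loop stops on the first character of the file without reading it.
        strLine += static_cast <char>(infile.peek());
        std::reverse(strLine.begin(), strLine.end());
    
        // Do something interesting with the line.
    }
    
    infile.close();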
    Using this technique, I can examine each line as it's being read (starting from the end of the file) and choose to stop processing the file as soon as I meet some criteria.
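    For multi-gigabyte files, one seek per character can get expensive. A variation on the same idea, not taken from this thread, is to read fixed-size blocks from the end and split each block on '\n'. The sketch below assumes C++11 and '\n' line endings; forEachLineFromEnd and the 64 KB default block size are made up for illustration. The visitor returns false to stop early.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: calls visit(line) for each line, last line first,
// reading the file in blocks from the end. Stops when visit returns false.
template <typename Visitor>
void forEachLineFromEnd(const std::string& path, Visitor visit,
                        std::size_t chunkSize = 64 * 1024)
{
    assert(chunkSize > 0);
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return;

    in.seekg(0, std::ios::end);
    std::streamoff pos = in.tellg();   // everything before pos is still unread
    if (pos <= 0) return;              // empty file or tellg failure

    std::vector<char> chunk(chunkSize);
    std::string tail;                  // line fragment whose start is further back
    bool firstChunk = true;

    while (pos > 0) {
        const std::streamoff n = std::min<std::streamoff>(pos, chunkSize);
        pos -= n;
        in.seekg(pos, std::ios::beg);
        if (!in.read(chunk.data(), n)) return;  // short read: give up

        std::string data(chunk.data(), static_cast<std::size_t>(n));
        data += tail;
        if (firstChunk) {
            // Drop the file's final '\n' so it does not yield an empty line.
            if (data[data.size() - 1] == '\n') data.erase(data.size() - 1);
            firstChunk = false;
        }

        // Peel complete lines off the back of the buffer.
        std::string::size_type nl;
        while ((nl = data.rfind('\n')) != std::string::npos) {
            if (!visit(data.substr(nl + 1))) return;
            data.erase(nl);
        }
        tail.swap(data);               // what is left starts in an earlier block
    }
    visit(tail);                       // first line of the file
}
```

    Memory use is bounded by the block size plus the longest line, and the early-stop behaviour is the same as in the snippet above.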

    Thanks everyone for the other great input.

    I'm curious, and this may be OT, but I've always used std::ios_base for end and beg. When I wrote the above code using std::ios_base instead of std::ios, it didn't work. What is the main difference between std::ios::beg and std::ios_base::beg?
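    For what it's worth, std::ios is a typedef for std::basic_ios<char>, which derives from std::ios_base, so both spellings name the same inherited constants. A small check, assuming C++11 (checkSeekConstants is a made-up name):

```cpp
#include <cassert>
#include <ios>
#include <type_traits>

// std::ios inherits beg/cur/end from std::ios_base, so the types match...
static_assert(std::is_base_of<std::ios_base, std::ios>::value,
              "std::ios derives from std::ios_base");
static_assert(std::is_same<decltype(std::ios::beg),
                           decltype(std::ios_base::beg)>::value,
              "both spellings have the same type");

// ...and so do the values.
void checkSeekConstants()
{
    assert(std::ios::beg == std::ios_base::beg);
    assert(std::ios::cur == std::ios_base::cur);
    assert(std::ios::end == std::ios_base::end);
}
```

    So the two spellings should behave identically in seekg(), and the failure most likely came from some other difference between the two versions of the code.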
