Thread: Retrieving a certain line from a large text file

  1. #1
    Registered User (joined Apr 2012, 7 posts)

    Retrieving a certain line from a large text file

    I'm writing a custom search engine, and I'm in the final stages of getting it to work. However, I'm working with a very large document collection, so the files weigh in at hundreds of megabytes and loading everything into RAM is not an option.

    The internal processing of my engine tells me precisely on which line of the text documents I can find the information I need, but I can't figure out how to extract a specific line.

    Could anyone give me advice with this?

    Thanks.

  2. #2
    Registered User (joined Jan 2009, 1,485 posts)
    You can use something like fgets and keep a count of the lines read. It would be much more efficient, however, if you knew the byte offset at which the line starts; then you could use fseek instead.
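    A minimal sketch of the fgets-and-count approach, for illustration only. The function name read_nth_line, the 0-based line numbering and the choice to strip the trailing newline are assumptions, not anything from this thread. It treats a line as finished only when fgets returns a chunk ending in '\n', so lines longer than the buffer are still counted correctly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy line number `target` (counted from 0, an assumption; shift by one
 * if your indexer counts from 1) of fp into buf, without its '\n'.
 * Returns 0 on success, -1 if the file has fewer lines than that.
 * Reads the whole file up to the target line, so it is O(file size). */
int read_nth_line(FILE *fp, unsigned long target, char *buf, size_t size)
{
    unsigned long line = 0;

    assert(fp != NULL && buf != NULL && size > 1);
    rewind(fp);                             /* always count from the start */

    while (fgets(buf, (int)size, fp) != NULL) {
        size_t len = strlen(buf);
        int complete = (len > 0 && buf[len - 1] == '\n');

        if (line == target) {
            if (complete) {
                buf[len - 1] = '\0';        /* drop the newline */
            } else {
                /* line longer than buf (or last line with no '\n'):
                 * keep the first size-1 bytes, skip the rest of the line */
                int c;
                while ((c = fgetc(fp)) != EOF && c != '\n')
                    ;
            }
            return 0;
        }
        if (complete)                       /* only a '\n' ends a line; */
            line++;                         /* partial chunks don't count */
    }
    return -1;
}
```

    Every lookup rereads the file from the beginning, so for many lookups into a file of hundreds of megabytes the fseek approach is the better fit.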

  3. #3
    Registered User (joined Apr 2012, 7 posts)
    Ahh, fseek would be impossible, as I'm looking through a postings list, which is a mountain of data in the form of:

    0 1 2 3 4 5
    1 4 7
    0 1 2
    1 5 8 11

    and so on (the numbers here are just made up for illustration). Unless there's another way of finding the offset.
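    Once a line like the ones above has been retrieved, it still has to be split into numbers. A possible sketch using strtol, assuming the format really is whitespace-separated non-negative decimal integers as in the sample; parse_postings is a hypothetical helper name.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse one postings-list line, e.g. "1 5 8 11\n", into out[].
 * Assumes the format from the sample above: non-negative decimal
 * integers separated by whitespace. Returns how many values were stored,
 * or -1 if the line is malformed or holds more than `max` values. */
long parse_postings(const char *line, long *out, size_t max)
{
    const char *p = line;
    size_t n = 0;

    assert(line != NULL && (out != NULL || max == 0));
    for (;;) {
        char *end;
        long v;

        errno = 0;
        v = strtol(p, &end, 10);           /* skips leading whitespace itself */
        if (end == p)
            break;                         /* no further number on this line */
        if (errno == ERANGE || v < 0 || n == max)
            return -1;                     /* overflow, negative, or out[] full */
        out[n++] = v;
        p = end;
    }
    while (*p != '\0') {                   /* only whitespace may remain */
        if (!isspace((unsigned char)*p))
            return -1;
        p++;
    }
    return (long)n;
}
```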

  4. #4
    Registered User (joined Jan 2009, 1,485 posts)
    It would not be impossible if you knew the byte offset instead of the line number.

    Quote:
    The internal processing of my engine tells me precisely what line of the text documents I can find the information I need
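    A rough sketch of the lookup side, assuming the byte offset has already been obtained from ftell on the same file (how to get it is the open question at this point in the thread); read_line_at is a hypothetical name. For text-mode streams the C standard only guarantees fseek with SEEK_SET to positions previously returned by ftell, which is why the offset should come from ftell rather than from counting characters yourself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Jump straight to a line whose starting byte offset is already known
 * (a value previously obtained from ftell on this stream) and copy it
 * into buf without its '\n'. Returns 0 on success, -1 if the seek or
 * the read fails. A line longer than buf is truncated to size-1 bytes. */
int read_line_at(FILE *fp, long offset, char *buf, size_t size)
{
    assert(fp != NULL && buf != NULL && size > 1);
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    if (fgets(buf, (int)size, fp) == NULL)
        return -1;                      /* offset at or past end of file */
    buf[strcspn(buf, "\n")] = '\0';     /* strip the newline, if any */
    return 0;
}
```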

  5. #5
    Registered User (joined Apr 2012, 7 posts)
    Hmm. I'm not sure how I'd be able to do that, given only a line number. I think that'd require substantial rewriting of my indexer.

  6. #6
    Salem, "and the hat of int overfl" (joined Aug 2001, location: The edge of the known universe, 39,661 posts)
    If your indexer is storing line numbers, then all you need is another parallel array storing the seek positions.

    Code:
    long pos;
    char buff[BUFSIZ];
    size_t lineNo = 0;
    while ( (pos = ftell(fp)) >= 0 && fgets(buff, sizeof buff, fp) != NULL ) {
        seekPositions[lineNo] = pos;   // byte offset where this line starts
        // do stuff with buff and lineNo
        lineNo++;
    }
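    When you want to retrieve a line, it's just fseek(fp, seekPositions[lineNo], SEEK_SET) followed by an fgets to read the line. (This assumes every line fits in buff; a longer line would come back from fgets in several pieces, each getting its own entry.)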
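    A self-contained sketch of the parallel-array idea, assuming 0-based line numbers and hypothetical function names build_line_index and get_line. Compared with the fixed-buffer loop in the post, it grows the offset array dynamically and records only one entry per line even when a line is longer than the read buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One pass over the file records, for every line, the byte offset where it
 * starts. Offsets come from ftell, so they are valid fseek targets even on
 * text-mode streams. A long line arrives as several fgets chunks, but only
 * a chunk that follows a '\n' starts a new entry.
 * On success returns 0 and stores a malloc'd array (caller frees) in
 * *offsets and its length in *count; returns -1 on error. */
int build_line_index(FILE *fp, long **offsets, size_t *count)
{
    char chunk[4096];
    size_t n = 0, cap = 1024;
    long *pos = malloc(cap * sizeof *pos);
    int at_line_start = 1;            /* next chunk begins a new line */
    long here;

    assert(fp != NULL && offsets != NULL && count != NULL);
    if (pos == NULL)
        return -1;
    rewind(fp);
    while ((here = ftell(fp)) >= 0 && fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t len = strlen(chunk);

        if (at_line_start) {
            if (n == cap) {           /* grow the array geometrically */
                long *bigger = realloc(pos, 2 * cap * sizeof *pos);
                if (bigger == NULL) {
                    free(pos);
                    return -1;
                }
                pos = bigger;
                cap *= 2;
            }
            pos[n++] = here;
        }
        at_line_start = (len > 0 && chunk[len - 1] == '\n');
    }
    if (here < 0 || ferror(fp)) {
        free(pos);
        return -1;
    }
    *offsets = pos;
    *count = n;
    return 0;
}

/* Fetch line `lineNo` (0-based) through the index: one fseek, one fgets.
 * Copies it into buf without its '\n'; returns 0, or -1 on failure. */
int get_line(FILE *fp, const long *offsets, size_t count, size_t lineNo,
             char *buf, size_t size)
{
    assert(fp != NULL && offsets != NULL && buf != NULL && size > 1);
    if (lineNo >= count || fseek(fp, offsets[lineNo], SEEK_SET) != 0)
        return -1;
    if (fgets(buf, (int)size, fp) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

    For a collection this size the index only needs to be built once. One option, not something discussed in the thread, is to write the array to a separate file with fwrite and read it back with fread on later runs, which would leave the indexer's line-number output unchanged.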
