One HUUUUUUUUUUUUGE text file


  1. #1
    Registered User
    Join Date
    May 2003
    Posts
    44

    One HUUUUUUUUUUUUGE text file

    Hi all,

    Got a slight problem...

    I've got a MASSIVE text file (300 - 400 megs) and there are a couple of things I want to do with it... Firstly, I'd like to get the number of lines in it; an approximate count would be okay as long as it's not too far off. I don't want to use getline() as I've tried that and it takes forever moving through every single line! There was a program I once had, called splitfile or something, that estimated the number of lines in a text file in about 3-4 seconds.

  2. #2
    Registered User
    Join Date
    May 2003
    Posts
    44
    Thanks! I didn't think of it from that point!!! *feels stupid*

  3. #3
    Pursuing knowledge confuted's Avatar
    Join Date
    Jun 2002
    Posts
    1,916
    Count the number of characters in each of several lines (assuming that they are all of roughly equal length) and take the average. Divide the total file size by that number.
    Away.
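
The sampling idea above could be sketched roughly like this. It is only an illustration: `estimate_line_count` is a made-up name, the sample is taken from the start of the file, and the result is only as good as the assumption that the sampled lines are typical. The file size comes from `fseek`/`ftell`, which works on common platforms but is limited to what a `long` can hold.

```c
#include <assert.h>
#include <stdio.h>

/* Estimate the line count of `path` from its first `sample_lines` lines.
 * Returns -1 on error. Hypothetical helper, not code from the thread. */
long estimate_line_count(const char *path, long sample_lines)
{
    assert(path != NULL && sample_lines > 0);

    FILE *fp = fopen(path, "rb");  /* binary mode so byte counts match the file size */
    if (fp == NULL)
        return -1;

    long bytes = 0;        /* bytes consumed by the sampled lines */
    long lines = 0;        /* complete lines seen in the sample */
    int c, last = '\n';    /* last byte read; '\n' so an empty file counts as 0 lines */

    while (lines < sample_lines && (c = fgetc(fp)) != EOF) {
        bytes++;
        last = c;
        if (c == '\n')
            lines++;
    }

    if (lines < sample_lines) {
        /* Reached end of file (or an error) before the sample filled up,
         * so the whole file was read and the count is exact. */
        int failed = ferror(fp);
        fclose(fp);
        if (failed)
            return -1;
        return lines + (last != '\n');  /* count an unterminated final line */
    }

    /* Total size in bytes (assumes the size fits in a long). */
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    long size = ftell(fp);
    fclose(fp);
    if (size < 0)
        return -1;

    /* lines / bytes is the sampled line density; scale it to the whole file. */
    return (long)((double)size * (double)lines / (double)bytes + 0.5);
}
```

Something like `estimate_line_count("big.txt", 100)` only reads the first hundred lines, so it returns almost immediately whatever the file size.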

  4. #4
    Been here, done that.
    Join Date
    May 2003
    Posts
    1,161

    Not an actual solution...

    Sort the entire file by character and count all the characters from 0x00 thru 0x20 :-)

  5. #5
    Registered User GoodStuff's Avatar
    Join Date
    Jan 2003
    Posts
    65
    Not sure if this approach would help you, but:

    Count all the newlines in the file.

    To minimize the time it takes to read from disk, read into a very large buffer before walking through it with a pointer. That way the file is read in chunks, rather than going through an entire seek-read-check cycle for each char.

    Sequential reads from disk are very fast. It's seek time that slows down HD access.

    Gus
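
A minimal sketch of the chunked approach described above, for anyone who wants to try it. The function name `count_newlines` and the 1 MiB `CHUNK_SIZE` are assumptions made for the example; the post does not give a buffer size, so treat the value as a starting point to experiment with.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE (1u << 20)  /* 1 MiB; an arbitrary choice, not a tuned value */

/* Count the '\n' bytes in `path` by reading it in large chunks.
 * Returns -1 on error. Hypothetical helper, not code from the thread. */
long count_newlines(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");  /* binary mode: no newline translation */
    if (fp == NULL)
        return -1;

    char *buf = malloc(CHUNK_SIZE);
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }

    long count = 0;
    size_t n;
    while ((n = fread(buf, 1, CHUNK_SIZE, fp)) > 0) {
        /* Walk the chunk with a pointer, as the post suggests. */
        const char *end = buf + n;
        for (const char *p = buf; p < end; p++) {
            if (*p == '\n')
                count++;
        }
    }

    int failed = ferror(fp);  /* distinguish a read error from end of file */
    free(buf);
    fclose(fp);
    return failed ? -1 : count;
}
```

Note that this counts newline characters, so a final line without a trailing newline is not included in the total.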

  6. #6
    Registered User
    Join Date
    May 2003
    Posts
    44
    WaltP:

    Somehow that may not get me too far


    blackrat364

    Sorry, but that's not making sense at all....


    Gus:

    Your reply certainly sounds promising, although I'm not 100% certain how big a buffer would be needed... Is there a recommended size?

  7. #7
    Registered User
    Join Date
    May 2003
    Posts
    44
    Ah right! I see now. Thanks very much!

    Uni

  8. #8
    Been here, done that.
    Join Date
    May 2003
    Posts
    1,161
    WaltP:

    Somehow that may not get me too far
    Certainly won't... :-)

    blackrat364

    Sorry, but that's not making sense at all....

    The idea is to estimate the whole file by looking at a few lines and assuming the number of characters you see per line is the average for all lines. Divide the file size by that average to get the approximate line count.

    Gus:

    Your reply certainly sounds promising, although I'm not 100% certain how big a buffer would be needed... Is there a recommended size?
    The best size would be the size of a disk cluster, the unit in which your hard disk and operating system transfer data. You'll have to query your system to find out what that size is.

    Walt
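
One possible way to query the system, assuming a POSIX platform: `stat()` fills in `st_blksize`, which POSIX describes as the preferred I/O block size for the file. That is not guaranteed to equal the disk cluster size, and Windows would need a different call, so take this as a rough sketch. `preferred_block_size` is a made-up name.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>  /* POSIX only: stat(), struct stat */

/* Return the preferred I/O block size the OS reports for `path`,
 * or -1 on error. Hypothetical helper; assumes a POSIX system. */
long preferred_block_size(const char *path)
{
    assert(path != NULL);

    struct stat st;
    if (stat(path, &st) != 0)
        return -1;

    /* st_blksize is a hint from the OS, not necessarily the cluster size. */
    return (long)st.st_blksize;
}
```

The result (or a multiple of it) could then be used as the buffer size for the reads.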

  9. #9
    Registered User
    Join Date
    May 2003
    Posts
    44
    Blimey!!! The response is fantastic on this forum.

    Thanks for clearing things up...

    Salem, that works an absolute treat!!!


    THANKS!
    Uni
