Thread: Slurping a file all at once/line by line

  1. #1
    Registered User
    Join Date
    Dec 2002
    Posts
    56

    Slurping a file all at once/line by line

    I'm implementing a "grep"-type thing for Windows (partly for the experience; I know it's been done). Right now, I have it store an entire line of a file into a char *-style string and see if that matches the appropriate regular expression:

    Code:
    		const char charsPerLine = 1024;
    		// theFile is defined as an ifstream object
    		for( int lineNum = 1; !theFile.eof(); lineNum++ ) {
    
    			char * theLine = new char[ charsPerLine + 1 ];
    			theFile.getline( theLine, charsPerLine );
    
    			match_results results;
    
    			if( thePat.match( theLine, results ) )
    				cout << lineNum << ": " << theLine << "\n";
    
    			delete [] theLine;
    
    
    		}
    (Don't worry about the regex-specific functions, I know they are working.) My question is that this seems wasteful, and I'm looking for an alternative solution. I've defined it to allocate 1024 characters when most lines are less than 80. But it seems like a bad idea to use any finite value. Is there a way to determine the length of a line, or just keep reading from the file until you get to a '\n' (without worrying about the number of characters)?

    Alternatively, is there a way to read the whole file in at once ("slurp" it, in perl terms)? I guess I could use ifstream.read() but I would have to know how many bytes to read...TIA for advice.
    Last edited by roktsyntst; 03-02-2003 at 09:01 PM.

  2. #2
    Just because ygfperson's Avatar
    Join Date
    Jan 2002
    Posts
    2,490
    Code:
    const char charsPerLine = 1024;
    This won't work as intended: a char typically holds values only up to 255 if unsigned (127 if signed), so 1024 doesn't fit.
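    A quick way to check whether a value survives being stored in a char, as a minimal sketch. The helper name fitsInChar is made up for illustration, and the 8-bit figures assume a typical platform:

```cpp
#include <limits>

// Hypothetical helper: true when `value` can be stored in a plain char
// without being changed by the conversion.
bool fitsInChar(int value) {
    return value >= std::numeric_limits<char>::min()
        && value <= std::numeric_limits<char>::max();
}
```

    On a platform with 8-bit chars, fitsInChar(80) holds and fitsInChar(1024) does not, which is why the buffer size belongs in an int (or std::size_t).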

    I would not worry about using up memory. It's more wasteful and time consuming to read from the hard disk over and over.

    However, there is a solution...
    Code:
    #include <string>
    
    std::string temp;
    getline(file_in,temp);
    That will input everything up to a newline. (I'm not sure if the function's ifstream&,string or vice versa)
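    Putting that into the original loop might look roughly like the sketch below. It assumes std::regex as a stand-in for the original poster's regex class (which isn't shown in the thread), and the function name grepLines is invented for the example. Testing the result of getline in the loop condition also sidesteps the extra iteration that a !theFile.eof() test can cause at the end of the file.

```cpp
#include <cstddef>
#include <istream>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (line number, line text) for every line of
// `in` that contains a match for `pattern`. std::regex is an assumption;
// the thread's own regex library is not shown.
std::vector<std::pair<std::size_t, std::string>>
grepLines(std::istream& in, const std::regex& pattern) {
    std::vector<std::pair<std::size_t, std::string>> hits;
    std::string line;          // grows as needed, so no fixed line length
    std::size_t lineNum = 0;
    // The loop ends as soon as getline fails to read another line.
    while (std::getline(in, line)) {
        ++lineNum;
        if (std::regex_search(line, pattern)) {
            hits.emplace_back(lineNum, line);
        }
    }
    return hits;
}
```

    Any std::istream works here, so the same function can be handed an std::ifstream for a real file or an std::istringstream while testing.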

  3. #3
    Just a Member ammar's Avatar
    Join Date
    Jun 2002
    Posts
    953
    That will input everything up to a newline. (I'm not sure if the function's ifstream&,string or vice versa)
    I would do it like this:
    Code:
    file_in.getline(temp);  // caution: the standard member getline takes (char* buffer, streamsize count), not a std::string

  4. #4
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    This may have no relevance to the question, but what if the file doesn't have a newline character? Also, I don't use *nix a lot, so I'm not incredibly familiar with the grep command, but what if you were looking for an expression containing a newline character? Seems like it would be kind of tough to find it if you're delimiting by newlines.
    Alternatively, is there a way to read the whole file in at once ("slurp" it, in perl terms)? I guess I could use ifstream.read() but I would have to know how many bytes to read...TIA for advice.
    This information would also interest me. How does one check the size of a file at run-time? Personally, I think I'd go with checking 80 characters at a time, making sure to check to see if the expression exists in an overlap of two different 80 character buffers. More complicated, but more usable.

  5. #5
    Registered User
    Join Date
    Mar 2002
    Posts
    1,595
    getline is an overloaded function.

    Code:
    char buffer[124];
    string STLstring;
    ifstream fin("filename.ext");
    
    // version 1: member function, reads into a char array
    fin.getline(buffer, 123);
    
    // version 2: free function, reads into an STL string
    getline(fin, STLstring);

    Both versions take an optional third argument, the delimiter, which defaults to the newline char but can be any valid char you want. Reading stops at the delimiter or, for the char array version, once the specified size is reached, whichever comes first. getline() extracts and discards the delimiting char (the char that causes input into the string to stop), and the char array version adds a terminating null character automatically.

    The char array that holds the input data (explicit with C-style strings, managed internally by an STL string) can be as big as you want, as long as there is adequate memory wherever (stack vs. heap) you have it allocated.

    An unsigned char typically holds values from 0 to 255 (ASCII itself only covers 0-127), but you can have as many chars per line as you like, if you define a line as all chars from the starting point to the first newline char. If you define a line as however many chars fit across the screen, then the number of chars in a line depends on the size of the screen and the font used to display them, which will be OS and program specific.

    To determine how many bytes are in a file, look up seekg() and tellg() (or fseek() and ftell() in C). I think seeking to the end of the file and asking for the current position will do it for you.

  6. #6
    Registered User
    Join Date
    Dec 2002
    Posts
    56
    Originally posted by ygfperson
    Code:
    const char charsPerLine = 1024;
    This won't work as intended: a char typically holds values only up to 255 if unsigned (127 if signed), so 1024 doesn't fit.
    Sorry, you're right. That was supposed to read "const int charsPerLine = 1024;". I had that in my code, but it's way separated from the rest of the relevant code so I typed it in the post without copying and pa--you know what, it's just easier to say I'm stupid.

    However, there is a solution...
    Code:
    #include <string>
    
    std::string temp;
    getline(file_in,temp);
    Thanks (and to those who also answered with similar solutions)! I was avoiding STL strings because as I understand it, the char*'s were faster and I was going for speed. But this seems like a tradeoff worth making. Thanks again for the help.

    From piano:
    This may have no relevance to the question, but what if the file doesn't have a newline character? Also, I don't use *nix a lot, so I'm not incredibly familiar with the grep command, but what if you were looking for an expression containing a newline character? Seems like it would be kind of tough to find it if you're delimiting by newlines.
    That's why I used such a long line length, because many files output by other programs don't bother to put newlines very often. That's also why I was looking to "slurp" the file in at once.

    I'm not sure if the *nix grep matches patterns over newline characters or not (I think it may). I know that in perl (whose regex rules I'm most familiar with) the '.' metacharacter matches everything except newlines, which makes me think that most of the time, people aren't expecting a pattern to match over more than one line unless they explicitly insert a newline. So long story short, I'm not that worried about it, because I think I'll be the only one using this thing anyway.
    Last edited by roktsyntst; 03-03-2003 at 06:32 PM.

  7. #7
    Registered User
    Join Date
    Aug 2002
    Location
    Hermosa Beach, CA
    Posts
    446
    >This information would also interest me. How does one check
    > the size of a file at run-time? Personally, I think I'd go with
    > checking 80 characters at a time, making sure to check to see if
    > the expression exists in an overlap of two different 80
    > character buffers. More complicated, but more usable.

    You can use fstat on Unix-like systems (or _fstat or GetFileSize on Windows) to get the size of the file. Then you should be able to use the iostream read() function to get the whole file in one chunk. Although I'm not sure what this buys you, since iostreams are buffered anyway, i.e. they pull data from disk in larger blocks even if you only ask for one line at a time. I'd guess they use a fixed-size buffer rather than holding the whole file in memory, but I don't know the details for sure.
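    Reading the whole file in one chunk can be sketched with standard C++ alone, as below. This swaps the platform calls (fstat, _fstat, GetFileSize) for opening the stream positioned at the end and asking tellg() for the size, so treat it as one possible approach rather than the only one. The name slurpFile is invented for the example.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the entire contents of the file at `path`.
// Binary mode keeps the byte count and the data read in agreement.
std::string slurpFile(const std::string& path) {
    // std::ios::ate opens the file with the read position at the end.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open " + path);

    const std::streamoff size = in.tellg();
    if (size < 0) throw std::runtime_error("cannot determine size of " + path);

    std::string data(static_cast<std::size_t>(size), '\0');
    in.seekg(0, std::ios::beg);                 // rewind before reading
    if (size > 0 && !in.read(&data[0], static_cast<std::streamsize>(size))) {
        throw std::runtime_error("short read on " + path);
    }
    return data;
}
```

    The returned string can then be searched as a whole, which also allows patterns that span newlines.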
