Thread: reading text

  1. #16
    The larch
    Join Date
    May 2006
    Posts
    3,573
    Also note that the third method fills an entire string with the data, while the other two keep overwriting the previous line/word.
    That is indeed a major problem with the test. How about all test functions actually contain the contents of the file in a string in the end?
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  2. #17
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Quote: Originally Posted by anon
    That is indeed a major problem with the test. How about all test functions actually contain the contents of the file in a string in the end?
    I only wanted to test the read speed without any other calls getting in the way. I can try a few more things when I get home.

  3. #18
    The larch
    Join Date
    May 2006
    Posts
    3,573
    Quote: Originally Posted by cpjust
    I only wanted to test the read speed without any other calls getting in the way. I can try a few more things when I get home.
    But the real use case would be completely different than in the test?

    I guess it actually makes more sense to time programs that actually do something. If it turns out that file input is really slowing down the program, you should always be able to rewrite that part and compare the performance in real use cases.

  4. #19
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Quote: Originally Posted by anon
    But the real use case would be completely different than in the test?

    I guess it actually makes more sense to time programs that actually do something. If it turns out that file input is really slowing down the program, you should always be able to rewrite that part and compare the performance in real use cases.
    But then to make it a fair test, wouldn't I also need to change Test3 to split up the one large string into individual lines?

  5. #20
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    You could. Or you could rewrite the first two to concatenate the strings as they are read in. It still won't be a completely fair test, but it will be closer.

  6. #21
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Quote: Originally Posted by Daved
    Would you mind testing read() and rdbuf()?
    I'm not sure what you mean about rdbuf(); I've never used it. I did add Test4() using istream::read(), though, and every test now stores what it reads in one string and prints the final string size:

    Code:
    #include <iostream>
    #include <iterator>
    #include <fstream>
    #include <string>
    #include <cstdio>
    #include <ctime>
    
    using namespace std;
    
    
    void Test1( ifstream&  file )
    {
    	string line;
    	string fileData;
    	// Test the result of getline() itself rather than eof(), so a failed read ends the loop.
    	while ( getline( file, line ) )
    	{
    		fileData += line;
    	}
    	cout << "Test1 Bytes Read: " << fileData.size() << endl;
    }
    
    void Test2( ifstream&  file )
    {
    	string line;
    	string fileData;
    	while ( file >> line )
    	{
    		fileData += line;
    	}
    	cout << "Test2 Bytes Read: " << fileData.size() << endl;
    }
    
    void Test3( ifstream&  file )
    {
    	string fileData( (istreambuf_iterator<char>( file )),
    					  istreambuf_iterator<char>() );
    	cout << "Test3 Bytes Read: " << fileData.size() << endl;
    }
    
    void Test4( ifstream&  file )
    {
    	char block[BUFSIZ + 1];
    	string fileData;
    	while ( file.read( block, BUFSIZ ) )
    	{
    		fileData += block;
    	}
    	cout << "Test4 Bytes Read: " << fileData.size() << endl;
    }
    
    typedef void (*TestFunc)( ifstream& );
    
    clock_t TimeFunc( TestFunc  func, const char*  filename )
    {
    	ifstream file( filename );
    	clock_t start = clock();
    	func( file );
    	clock_t end = clock();
    	return (end - start);
    }
    
    int main()
    {
    	const char* filename = "E:/Test_10MB.txt";	// 10,240,000 bytes
    	for ( int i = 0; i < 1; ++i )
    	{
    		clock_t time1 = TimeFunc( &Test1, filename );
    		clock_t time2 = TimeFunc( &Test2, filename );
    		clock_t time3 = TimeFunc( &Test3, filename );
    		clock_t time4 = TimeFunc( &Test4, filename );
    
    		cout << endl << "Func1() time is: " << time1
    			 << endl << "Func2() time is: " << time2
    			 << endl << "Func3() time is: " << time3
    			 << endl << "Func4() time is: " << time4 << endl;
    	}
    
    	return 0;
    }
    Here's the new output:
    Code:
    Test1 Bytes Read: 10040000
    Test2 Bytes Read: 9040000
    Test3 Bytes Read: 10140000
    Test4 Bytes Read: 10139648
    
    Func1() time is: 1140
    Func2() time is: 3485
    Func3() time is: 1296
    Func4() time is: 375
    Surprisingly, none of the 4 functions reports the right number of bytes (10,240,000). Does anyone know why that is? I know the >> operator doesn't extract whitespace and getline() doesn't extract the newline, but I would have expected the last two functions to store the right number of bytes.

  7. #22
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    Code:
    #include <sstream>
    
    ...
    
    void Test5( ifstream&  file )
    {
    	ostringstream ostr;
    	ostr << file.rdbuf();
    	cout << "Test5 Bytes Read: " << ostr.str().size() << endl;
    }
    Of course, there is no local string here; I don't know whether adding one would add overhead.

    You'd also want to add a null terminator to the block variable in test4.
    Last edited by Daved; 01-24-2008 at 07:37 PM.
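
    If a local string is wanted so that Test5 lines up with the other tests, one option is to copy ostr.str() into one before measuring its size. A minimal sketch, not the poster's code: ReadAllViaRdbuf is an illustrative name that does not appear in the thread, and it takes any istream so it can be tried on an in-memory stream as well as an ifstream.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (the name is not from the thread): drain whatever is
// left in the stream's buffer into an ostringstream, then copy it into a
// local string, which is the extra step the post above wonders about.
std::string ReadAllViaRdbuf( std::istream& in )
{
	std::ostringstream ostr;
	ostr << in.rdbuf();                 // copies the rest of the stream buffer
	std::string fileData = ostr.str();  // str() returns a copy of the contents
	return fileData;
}
```

    Called as ReadAllViaRdbuf( file ) inside a Test5-style function, fileData.size() could then be printed the same way as in the other tests. Whether the extra copy is measurable would have to be timed.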

  8. #23
    Jack of many languages
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Man, look how fast Test4 is. WOO-HOO!!

  9. #24
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    OK, now with Test5, here are the results:
    Code:
    Test1 Bytes Read: 10040000
    Test2 Bytes Read: 9040000
    Test3 Bytes Read: 10140000
    Test4 Bytes Read: 10139648
    Test5 Bytes Read: 10140000
    
    Func1() time is: 1250
    Func2() time is: 3171
    Func3() time is: 1157
    Func4() time is: 343
    Func5() time is: 704
    I still don't know why none of them reports the correct size.

  10. #25
    Cat without Hat
    Join Date
    Apr 2003
    Posts
    8,895
    How many lines does the file have?
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  11. #26
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    If it's a Windows file, every newline is stored as CR+LF in the file but shows up as LF only in the data inside the program [unless you open the file in binary mode].

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.
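
    The difference can be checked directly by opening the file in binary mode and counting the CR+LF pairs. A rough sketch; ReadRaw and CountCrLf are made-up names, not anything from the thread. On Windows, the size seen in text mode would be expected to equal the raw size minus the pair count.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read the raw bytes of a file.
// ios::binary turns off newline translation, so CR+LF stays as two bytes.
std::string ReadRaw( const char* filename )
{
	std::ifstream file( filename, std::ios::in | std::ios::binary );
	return std::string( (std::istreambuf_iterator<char>( file )),
						 std::istreambuf_iterator<char>() );
}

// Hypothetical helper: count how many CR+LF pairs the raw data contains.
std::size_t CountCrLf( const std::string& data )
{
	std::size_t count = 0;
	for ( std::size_t i = 0; i + 1 < data.size(); ++i )
	{
		if ( data[i] == '\r' && data[i + 1] == '\n' )
			++count;
	}
	return count;
}
```

    For the test file, ReadRaw( filename ).size() should report the full size on disk, and CountCrLf() on that data gives the number of bytes that text mode hides.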

  12. #27
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Quote: Originally Posted by CornedBee
    How many lines does the file have?
    I don't feel like counting them, but 100,000 sounds about right (10240000 - 10140000).

  13. #28
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote: Originally Posted by cpjust
    I don't feel like counting them, but 100,000 sounds about right (10240000 - 10140000).
    You mean you haven't got "wc" or an editor that is capable of showing you line numbers [for large files]?

  14. #29
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Quote: Originally Posted by matsp
    You mean you haven't got "wc" or an editor that is capable of showing you line numbers [for large files]?

    I don't know what "wc" is, but I just opened it in VC++ (I'm surprised it didn't barf opening a 10MB file), and when I went to the bottom it showed line number 100001.
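
    That line count fits the sizes reported earlier. A small sketch of the arithmetic, assuming that line number 100001 means the file contains 100,000 line breaks (the last line being empty) and that each one is stored as CR+LF; the function names are illustrative only.

```cpp
// Figures reported in the thread.
const long fileBytes  = 10240000;  // size of the file on disk
const long lineBreaks = 100000;    // assumption: line 100001 is the empty line after the last break

// Text mode turns each CR+LF into a single LF: one byte fewer per line break.
long TextModeBytes()
{
	return fileBytes - lineBreaks;
}

// getline() also discards the LF itself: one more byte fewer per line break.
long GetlineBytes()
{
	return TextModeBytes() - lineBreaks;
}
```

    TextModeBytes() gives 10,140,000, the size printed by Test3 and Test5, and GetlineBytes() gives 10,040,000, the size printed by Test1. Test2's further shortfall of 1,000,000 bytes would be consistent with other whitespace (spaces or tabs) that >> skips, though the thread does not show the file's contents.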

  15. #30
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    So Test4 still isn't quite right because of the null terminator issue, and Test3 and Test5 are correct (assuming you want newline conversion).

    I'm still interested to see if you can get Test4 to give the same size output as Test3 and Test5. How do you tell how many bytes read() actually read? You need to write a null terminator at that index every time (or at least the last time; the other times you can set block[BUFSIZ] to null).

    Last Post: 02-15-2002, 10:28 AM