Thread: size() in loops...

  1. #1
    ... kermit's Avatar
    Join Date
    Jan 2003
    Posts
    1,534

    size() in loops...

    Code:
    string str("Hello, world");
    
      for(string::size_type idx = 0; idx != str.size(); ++idx) {
           cout << str[idx] << endl;
    }
    In C, if you want to do something like that, you would use strlen(), but generally not in the loop - but instead, once before the loop, storing the value in a local variable. What about the above though - is that the equivalent of calling strlen() a bunch of times, and would it be better to call str.size() once and store it in a variable? Or does C++ do things differently?

    ~/

  2. #2
    VA National Guard The Brain's Avatar
    Join Date
    May 2004
    Location
    Manassas, VA USA
    Posts
    903
    The string class member function size( ) returns the number of characters in a string (not including the \0 null terminator) The cstring function strlen( ) also returns the same value. So, when using string class objects, use size( ) and when using cstrings, use strlen( ).

    Also.. making a function call as a loop condition = inefficient.. try to determine the function return value before entering the loop:

    Code:
    string str("Hello, world");
    
      for(string::size_type idx = 0,  size = str.size(); idx != size;  ++idx) {
           cout << str[idx] << endl;
    }
    • "Problem Solving C++, The Object of Programming" -Walter Savitch
    • "Data Structures and Other Objects using C++" -Walter Savitch
    • "Assembly Language for Intel-Based Computers" -Kip Irvine
    • "Programming Windows, 5th edition" -Charles Petzold
    • "Visual C++ MFC Programming by Example" -John E. Swanke
    • "Network Programming Windows" -Jones/Ohlund
    • "Sams Teach Yourself Game Programming in 24 Hours" -Michael Morrison
    • "Mathmatics for 3D Game Programming & Computer Graphics" -Eric Lengyel

  3. #3
    Registered User
    Join Date
    Feb 2003
    Posts
    596
    Quote Originally Posted by The Brain
    The string class member function size( ) returns the number of characters in a string (not including the \0 null terminator) The cstring function strlen( ) also returns the same value.
    No. Normally a C++ string does not have a null (\0) terminator. But if it did, the '\0' would be included in the number of characters returned by size(). For example:
    Code:
    #include <iostream>
    #include <string>
    using namespace std;
    
    int main()
    {
        string s = "hello";
        cout << s << "world" << endl;
        cout << s.size() << endl;
        s = s + '\0';
        cout << s << "world" << endl;
        cout << s.size() << endl;
        s = s + '\0';
        cout << s << "world" << endl;
        cout << s.size() << endl;
    
        return 0;
    }
    outputs:
    Code:
    helloworld
    5
    helloworld
    6
    helloworld
    7
    Press any key to continue

    kermit: did you find
    Code:
    for(string::size_type idx = 0; idx != str.size(); ++idx)
    in a textbook? It does seem inefficient, but i wonder if a compiler would fix it.

  4. #4
    Registered User
    Join Date
    Nov 2002
    Posts
    491
    I doubt the compiler is smart enouhg to realize str.size() does not change the object. Storing it in a variable first is probably the best bet. Some compilers might have hints in their STL implemenation that say it does not mutate the object but I wouldn't count on it. And given C++'s complete lack of pure functionalism, the compiler cannot say much about that statment.

  5. #5
    ... kermit's Avatar
    Join Date
    Jan 2003
    Posts
    1,534
    Quote Originally Posted by R.Stiltskin
    kermit: did you find
    Code:
    for(string::size_type idx = 0; idx != str.size(); ++idx)
    in a textbook? It does seem inefficient, but i wonder if a compiler would fix it.
    Yeah - C++ Primer - Fourth Edition.

  6. #6
    Registered User
    Join Date
    Feb 2003
    Posts
    596
    I have the Third edition; couldn't find it. What section is it in in your book? I'd like to see the context in which they used it.

    Anyway, I still suspect that the compiler optimizes that away. I've obviously never written a compiler, but it seems like the compiler can easily determine that s.size() never changes in that block so it can store the value once in a register the same as it would for a constant value.

    I wrote this simple program to test it, and the execution times (in a MSVC 6.0 Release build) of the two methods are essentially the same. However, in a Debug build (with 1 less zero in number of trials), the loop using the str_size constant runs about 10-15% faster. Maybe the explanation is that the Debug builds are unoptimized.
    Code:
    #include <iostream>
    #include <time.h>
    using namespace std;
    
    
    int main(){
    	static int trials = 100000000;
    	int i,j,str_size;
    	time_t start_t, end_t;
    	char ch;
    	string s="Hello World";
    
    	time(&start_t);
    	for (j=0; j < trials; j++)
    		for (i=0; i < s.size(); i++)
    			ch = s[i];
    	
    	time(&end_t);
    	cout << "The s.size() version took " << end_t-start_t << " seconds.\n";
    
    
    	time(&start_t);
    	for (j=0; j < trials; j++)
    		for (i=0,str_size=s.size(); i < str_size; i++)
    			ch = s[i];
    	time(&end_t);
    	cout << "The str_size version took " << end_t- start_t << " seconds.\n";
    
    	return 0;
    }

  7. #7
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    You all do realize that you are making the assumption that size() has to use a loop to determine the length of the string. This may very well be false. When I wrote a string class I used a member variable to keep track of the length. So my size() function was literally just a return statement and it was inlined.

  8. #8
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    In practice, the size() call will be inlined and simply return a member variable, making it fine for use in a loop.

    However, the STL documentation suggests that this is not guaranteed:
    STL Container

    Complexity guarantees

    size() is linear in the container's size. [10] max_size() and empty() are amortized constant time. If you are testing whether a container is empty, you should always write c.empty() instead of c.size() == 0. The two expressions are equivalent, but the former may be much faster.

    [10] For many containers, such as vector and deque, size is O(1). This satisfies the requirement that it be O(N)
    STL basic_string

    Note that the C++ standard does not specify the complexity of basic_string operations. In this implementation, basic_string has performance characteristics very similar to those of vector: access to a single character is O(1), while copy and concatenation are O(N).

  9. #9
    Registered User
    Join Date
    Feb 2003
    Posts
    596
    Quote Originally Posted by Thantos
    You all do realize that you are making the assumption that size() has to use a loop to determine the length of the string. This may very well be false. When I wrote a string class I used a member variable to keep track of the length. So my size() function was literally just a return statement and it was inlined.
    Good point, but even just a return statement, at the machine level, is a function call involving multiple memory accesses. Looping based on values already loaded in registers can be done with a single machine instruction.

    Anyway, I don't think inlining would explain why the two methods run at noticeably different speeds after a debug (unoptimized) compilation, but at roughly equal speeds after a release compilation.

  10. #10
    Registered User
    Join Date
    Aug 2005
    Posts
    1,267
    Quote Originally Posted by R.Stiltskin
    Anyway, I don't think inlining would explain why the two methods run at noticeably different speeds after a debug (unoptimized) compilation, but at roughly equal speeds after a release compilation.
    if you have the compiler generate assembly language for release and debug builds you will see that debug build generates a function call for s.size() while release build does not.

  11. #11
    Registered User
    Join Date
    Feb 2003
    Posts
    596
    Quote Originally Posted by Ancient Dragon
    if you have the compiler generate assembly language for release and debug builds you will see that debug build generates a function call for s.size() while release build does not.
    Yes, I see that, but I don't understand enough of that assembly code to figure out where the release version is picking up the value of s.size().

  12. #12
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    size() is can be inlined. So its basically going stright to the object and getting the value of the member variable that contains the size. This is just as fast as reading a local variable from memory.

  13. #13
    Registered User
    Join Date
    Feb 2003
    Posts
    596
    Quote Originally Posted by Thantos
    size() is can be inlined. So its basically going stright to the object and getting the value of the member variable that contains the size. This is just as fast as reading a local variable from memory.
    Then that would explain the speed difference I observed: it was inlined in the release version but not in the debug version.

    And I guess the authors used size() in the for-loop test for readability, and assumed (even if it's not guaranteed) that the compiler would inline it.

  14. #14
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    You write code to make it readible and working. If you then need to speed it up you profile it, find the slow spots and work on them, profile again, see if it did any good, repeat. strlen() was a 100% O(n) so you would be justified in using an O(1) option while you were writing it. However size() isn't always O(n) and isn't always O(1) so you write the clearer way first and see what the affect really is.

  15. #15
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    Then that would explain the speed difference I observed: it was inlined in the release version but not in the debug version.
    Yes. This is the optimized assembly that MSVC7 produces:
    Code:
    	mov	eax, DWORD PTR _str$[esp+60]   // eax = str.size_member_variable
    	inc	ebp                            // ebp(idx)++
    	cmp	ebp, eax                       // if (ebp != eax)
    	jne	SHORT $L8528                   // goto $L8528
    And I guess the authors used size() in the for-loop test for readability, and assumed (even if it's not guaranteed) that the compiler would inline it.
    I think it is reasonable to use size() in the loop. There is a such a large body of code that does this, that any STL implementation with an O(n) size()/length() would be considered a real dog.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. size of array
    By goran00 in forum C Programming
    Replies: 38
    Last Post: 04-02-2008, 09:57 AM
  2. Heapsort
    By xENGINEERx in forum C Programming
    Replies: 2
    Last Post: 03-30-2008, 07:17 PM
  3. Invalid conversion from 'void*' to 'BYTE' help
    By bikr692002 in forum C++ Programming
    Replies: 9
    Last Post: 02-22-2006, 11:27 AM
  4. An exercise in optimization
    By Prelude in forum Contests Board
    Replies: 10
    Last Post: 04-29-2005, 03:06 PM