Thread: Just say NO to strlen.

  1. #1
    anonytmouse | Yes, my avatar is stolen | Join Date: Dec 2002 | Posts: 2,544

    Just say NO to strlen.

    The strlen function is expensive. For every call it must loop over every character until a nul terminator is found. Therefore, it is very unwise to call it more often than needed. This is bad code:
    Code:
    for ( int ix = 0; ix < strlen(a_str); ix++)
    {
         a_str[ix] = tolower( (unsigned char) a_str[ix] );
    }
    Let's consider a string 1000 characters in length. strlen will be called 1001 times and loop over 1000 characters each time. That is over one million wasted iterations. If the tolower call and assignment take the equivalent of 10 iterations, the correct version costs about 10,000 iterations while this one costs about 1,011,000, so the operation takes roughly a hundred times as long: a massive 10,000% or so of the time it would take if it were written correctly.

    Computers may be much faster than they were ten years ago (although sometimes I doubt that), but a hundredfold performance hit is still unacceptable. Do we really want C/C++ code running slower than the VB, PHP, Perl, Python, JavaScript, punch card and little old lady with typewriter versions combined?

    Given how often strlen turns up in loop conditions, and the performance wipeout that results, one would assume that it is very hard to write a loop condition without it. On the contrary, Dr Watson! Writing a loop condition without strlen is actually quite achievable, even for the most experienced coder.

    All that is required is to replace the use of strlen with:
    Code:
    for ( int ix = 0; a_str[ix] != '\0'; ix++)
    {
         a_str[ix] = tolower( (unsigned char) a_str[ix] );
    }
    or the slightly less efficient:
    Code:
    size_t len = strlen(a_str);
    for ( size_t ix = 0; ix < len; ix++)
    {
         a_str[ix] = tolower( (unsigned char) a_str[ix] );
    }
    See, it is possible! If you can't remember, feel free to bookmark this page (typically Ctrl+D) and copy (Ctrl+C) and paste (Ctrl+V).

    If peer pressure is pushing you towards excessive use of strlen, JUST SAY NO*.


    *strlen may be permitted for medical, artistic, electoral, hunting or driving purposes in some states. Please consult your local strlen provider.

  2. #2
    Govtcheez | Mayor of Awesometown | Join Date: Aug 2001 | Location: MI | Posts: 8,823
    > Writing a loop condition without strlen is actually quite achievable, even for the most experienced coder.

    Well, I would hope an experienced coder could write it.

  3. #3
    Registered /usr | Join Date: Aug 2001 | Location: Newport, South Wales, UK | Posts: 1,273
    I think it's reasonably well known, if you've been round the block a few times, not to use functions in loop conditions.

    Might be one for the FAQ though, just in case someone wonders why their simple program takes minutes instead of seconds to work.

  4. #4
    Fordy | &TH of undefined behavior | Join Date: Aug 2001 | Posts: 5,793
    It's good practice to pay attention to what goes in the "test" of a loop like that. The same can be said of calling a container's length() function on every iteration (though the actual overhead may be smaller, depending on how the container is written).

    Quote Originally Posted by anonytmouse
    If peer pressure is pushing you towards excessive use of strlen, JUST SAY NO*.


    *strlen may be permitted for medical, artistic, electoral, hunting or driving purposes in some states. Please consult your local strlen provider.


  5. #5
    Thantos | & the hat of GPL slaying | Join Date: Sep 2001 | Posts: 5,681
    Code:
    for ( int ix = 0; ix < strlen(a_str); ix++)
    {
         a_str[ix] = tolower( (unsigned char) a_str[ix] );
    }
    Depending on the compiler, this may not be as bad as you think. Since the body of the loop doesn't change the length, it's very possible for the compiler to modify the code so that the function is called once and its return value is saved and used in the condition.

    Code:
    for ( int ix = 0; a_str[ix] != '\0'; ix++)
    {
         a_str[ix] = tolower( (unsigned char) a_str[ix] );
    }
    This is inefficient also. Every time through the loop it has to add ix * sizeof(char) to a_str to get the address. Now, since sizeof(char) == 1, it might be optimized in that regard. The compiler might optimize it to:
    Code:
    for ( char *tempptr = a_str; *tempptr != '\0'; tempptr++)
    {
         *tempptr = tolower( (unsigned char) *tempptr );
    }

  6. #6
    CornedBee | Cat without Hat | Join Date: Apr 2003 | Posts: 8,895
    Quote Originally Posted by Thantos
    Depending on the compiler, this may not be as bad as you think. Since the body of the loop doesn't change the length, it's very possible for the compiler to modify the code so that the function is called once and its return value is saved and used in the condition.
    Or not. Given that the compiler might not know what toupper does, it can't perform this optimization. What if toupper returns '\0'? Suddenly the string is a lot shorter.
    (Mind you, the '\0' test method fails just the same, but that isn't relevant.)


    This is inefficient also. Every time through the loop it has to add ix * sizeof(char) to a_str to get the address.
    Questionable. On x86, at least, such offset-addressing can be done in a single instruction.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  7. #7
    Thantos | & the hat of GPL slaying | Join Date: Sep 2001 | Posts: 5,681
    Considering that you are calling toupper() on the same location you are putting the result into, there is no way for toupper() to return '\0' before reaching the actual null terminator. I would expect your better compilers to know this and be able to optimize it.

    Questionable. On x86, at least, such offset-addressing can be done in a single instruction.
    On a char array it should be possible, but for other arrays I don't think so, though it's been a while since I looked at all the opcodes.

    Edit: Yeah, I remember one now that lets you use a scale factor for the offset. While it is one instruction, it's not nearly as fast an instruction as, say, adding a fixed value.
    Last edited by Thantos; 02-10-2005 at 09:54 AM.

  8. #8
    Hunter2 | Carnivore ('-'v) | Join Date: May 2002 | Posts: 2,879
    >>I would expect your better compilers to know this and be able to optimize it.
    Possibly, but it's hardly a 'good' assumption to make. I really don't know how smart compilers are these days, but this assumes that the compiler will (a) recognize the function is named toupper(), (b) find out whether it's a user-defined function or not, (c) recognize the function strlen(), (d) find out whether it's a user-defined function or not, (e) recognize the fact that given the above conditions it is safe to optimize the construct, and (f) know how to optimize it.

    Given that compilers are smarter than we are these days, this is probably a reasonable assumption to make, but it's always safest just to write your code properly in the first place. After all, not every compiler is omniscient.
    Just Google It. √

    (\ /)
    ( . .)
    c(")(") This is bunny. Copy and paste bunny into your signature to help him gain world domination.

  9. #9
    Sang-drax | S | Join Date: May 2002 | Location: Göteborg, Sweden | Posts: 2,072
    On my system strlen is not exactly implemented as a for-loop. It is implemented with a special string instruction.
    Nevertheless, your point is valid.
    Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling

  10. #10
    CornedBee | Cat without Hat | Join Date: Apr 2003 | Posts: 8,895
    And the compiler would also need to know that it's not our intention to break the moment toupper returns '\0', which would happen with repeated strlen calls. In other words, the optimization would change the observable behaviour of the program beyond speed changes, and this is not tolerable.

  11. #11
    Thantos | & the hat of GPL slaying | Join Date: Sep 2001 | Posts: 5,681
    CornedBee, please tell me in what case toupper() returns '\0' when the input isn't '\0', because I looked at the man page and it made no mention of that return value.

  12. #12
    CornedBee | Cat without Hat | Join Date: Apr 2003 | Posts: 8,895
    It doesn't. But unless the compiler has been programmed to know this (why should it?) or is capable of looking into every single one of the locale-dependent translation lookup tables, the compiler can't know that.

  13. #13
    Registered User | Join Date: Jan 2002 | Location: Vancouver | Posts: 2,212
    strlen killed my cpu and also my family

  14. #14
    Thantos | & the hat of GPL slaying | Join Date: Sep 2001 | Posts: 5,681
    But unless the compiler has been programmed to know this (why should it?)
    Maybe because toupper() and strlen() are defined by the implementation. Also, regardless of the locale, there is only one way for toupper() to return '\0', so it really wouldn't be that much of a stretch for them to know the length of the string doesn't change during that loop.

    Possibly, but it's hardly a 'good' assumption to make. I really don't know how smart compilers are these days, but this assumes that the compiler will (a) recognize the function is named toupper(), (b) find out whether it's a user-defined function or not, (c) recognize the function strlen(), (d) find out whether it's a user-defined function or not, (e) recognize the fact that given the above conditions it is safe to optimize the construct, and (f) know how to optimize it.
    Well, you could argue that it doesn't really have to check for (b) and (d), because you aren't allowed to create user-defined functions with those names. Doing so would be undefined behaviour.

  15. #15
    major_small | Registered User | Join Date: May 2003 | Posts: 2,787
    Quote Originally Posted by anonytmouse
    Code:
    for ( int ix = 0; ix < strlen(a_str); ix++)
    {
         a_str[ix] = tolower( (unsigned char) a_str[ix] );
    }
    I myself am guilty of this, but only in code written in haste as an example to show somebody something... the only reason I write it like that in the first place is because I had it pounded into my head by some bad teacher in high school...

    hi, my name is Major_Small, and I'm an addict that has started the short path to recovery.


    anonytmouse: have you written a tip about it?
    Join us in our Unofficial Cprog IRC channel
    Server: irc.phoenixradio.org
    Channel: #Tech


    Team Cprog Folding@Home: Team #43476
    Download it Here
    Detailed Stats Here
    More Detailed Stats
    52 Members so far, are YOU a member?
    Current team score: 1223226 (ranked 374 of 45152)

    The CBoard team is doing better than 99.16% of the other teams
    Top 5 Members: Xterria(518175), pianorain(118517), Bennet(64957), JaWiB(55610), alphaoide(44374)

    Last Updated on: Wed, 30 Aug, 2006 @ 2:30 PM EDT
