simple string transformation function

**Aisthesis** · 02-28-2010

In Prata, C++ Primer Plus, ch. 16 (STL, etc.) one of the exercises (ex. 3, p. 948) is to write a function converting a string to upper case. This is obviously not hard, but since my solution differs from his in the answers, I wondered which one you guys would view as superior:

My solution:

Code:

void str_toupper(std::string& s)
{
	std::transform(s.begin(), s.end(), s.begin(), std::toupper);
}

My code does require the <algorithm> header, which his doesn't:

Code:

void ToUpper(string& str)
{
     for (int i = 0; i < str.size(); i++)
          str[i] = toupper(str[i]);
}

Which version is better? Is one or the other going to be faster in handling a string of near maximum size, for example? Or is it just 6 of one, half-dozen of the other?

In terms of lines of code, I get 1 line shorter inside the function but do have to #include <algorithm> for an extra line at the top, so I view that as a tie unless the inclusion of algorithm adds unnecessary overhead...

**MK27** · 02-28-2010

I don't know, but there is one glaring inefficiency in the second one:

Code:

 for (int i = 0; i < str.size(); i++)

Now str.size() will be called every iteration (and it is probably the most expensive op in this loop, too). It should be:

Code:

int len = str.size();
for (int i = 0; i < len; i++)

Originally Posted by Aisthesis

In terms of lines of code, I get 1 line shorter inside the function but do have to #include <algorithm> for an extra line at the top, so I view that as a tie unless the inclusion of algorithm adds unnecessary overhead...

Lines of code is not a very significant measurement of anything besides style. Being concise and tidy is good, but methinks code size can easily (tho not necessarily) be inversely related to code efficiency* -- e.g., by adding a line or two above, the efficiency is improved.

Including the library (algorithm) is only a cost in terms of mem usage, I think. I would also guess that using a "high level", generic/abstract routine (such as algo lib functions) can never be more optimal than custom low level code, such as this variation:

Code:

void ToUpper(string& str)
{
    int len = str.size();
    for (int i = 0; i < len; i++)
        if (str[i] > 96 && str[i] < 123) str[i] -= 32;
}

Still more code, still greater efficiency? But I maybe stand to be corrected...eg, the compiler may actually make up for most of this.

*especially if shrinking the code size is over prioritized in your mind.
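
Whether the compiler makes up the difference is easier to measure than to guess. Below is a rough timing sketch, assuming a C++11 compiler with `<chrono>`. The names `upper_loop`, `upper_transform` and `time_us` are made up for illustration, and no results are claimed here, since they will vary with compiler, optimization flags and string length.

```cpp
#include <algorithm>
#include <cctype>
#include <chrono>
#include <string>

// Explicit loop with size() read once in the loop initialization.
void upper_loop(std::string& str)
{
    for (std::string::size_type i = 0, len = str.size(); i < len; ++i)
        str[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(str[i])));
}

// std::transform version; the lambda selects the <cctype> toupper explicitly.
void upper_transform(std::string& s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}

// Hypothetical helper: runs f on a fresh copy of `input` `repeats` times and
// returns the total elapsed time in microseconds. The copy itself is not timed.
template <typename F>
long long time_us(F f, const std::string& input, int repeats)
{
    using clock = std::chrono::steady_clock;
    long long total = 0;
    for (int r = 0; r < repeats; ++r) {
        std::string copy = input;
        auto start = clock::now();
        f(copy);
        auto stop = clock::now();
        // Read one character through a volatile so the result is used.
        volatile char sink = copy.empty() ? '\0' : copy[0];
        (void)sink;
        total += std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    }
    return total;
}
```

Calling something like `time_us(upper_loop, std::string(10000000, 'q'), 20)` and the same with `upper_transform`, in an optimized build, gives two numbers to compare. Treat small differences as noise.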

**laserlight** · 02-28-2010

Originally Posted by Aisthesis

Which version is better?

This depends on your judging criteria.

Personally, I prefer your version as the named algorithm gives a better description at a glance of what is being done than an explicit loop. On the other hand, I am still rather unsure of how stuff like locales come into play when you just pass std::toupper to std::transform.
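
On the locale point: there are two functions named `std::toupper`, the one-argument version from `<cctype>` and a two-argument template from `<locale>`, so passing the bare name to `std::transform` can fail to compile if both are visible. The `<cctype>` version also has undefined behavior for negative `char` values. Here is a minimal sketch of one common workaround, assuming a C++11 compiler. The name `str_toupper_safe` is invented for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical variant of str_toupper. The lambda calls the one-argument
// <cctype> toupper explicitly, and taking the character as unsigned char
// keeps the argument in the range that function accepts.
void str_toupper_safe(std::string& s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}
```

This still uses the current C locale for the conversion. It only removes the overload ambiguity and the negative-value problem.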

Originally Posted by MK27

there is one glaring inefficiency in the second one: (...) Now str.size() will be called every iteration (and it is probably the most expensive op in this loop, too).

It is probably not that bad though, since size() runs in constant time. But yeah, it might as well be pulled out of the loop.

Originally Posted by MK27

I would also guess that using a "high level", generic/abstract routine (such as algo lib functions) can never be more optimal than custom low level code, such as this variation:

That is true, if you are able to take advantage of facts about your data/problem domain that generic algorithms are unaware of. In this case you are making a reasonable assumption concerning the values of the given letters in the character set. It might be a little more readable as:

Code:

void ToUpper(std::string& str)
{
    for (std::string::size_type i = 0, len = str.size(); i < len; ++i) 
        if (str[i] >= 'a' && str[i] <= 'z')
            str[i] -= 'a' - 'A';
}
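
The assumption here is that 'a' to 'z' and 'A' to 'Z' are each contiguous, which holds for ASCII but not for EBCDIC, for example. Here is a sketch of how that assumption could be written into the code, assuming C++11 `static_assert`. The name `ToUpperAscii` is only illustrative.

```cpp
#include <string>

// Compile-time sanity checks of the character-set assumption. A span of 25
// is a necessary condition for the 26 letters being contiguous; it holds
// for ASCII and fails for EBCDIC.
static_assert('z' - 'a' == 25, "lowercase letters are not contiguous");
static_assert('Z' - 'A' == 25, "uppercase letters are not contiguous");

// Same loop as above; characters outside 'a'..'z' are left untouched,
// and the locale is ignored.
void ToUpperAscii(std::string& str)
{
    for (std::string::size_type i = 0, len = str.size(); i < len; ++i)
        if (str[i] >= 'a' && str[i] <= 'z')
            str[i] -= 'a' - 'A';
}
```

Unlike `std::toupper`, this never converts locale-specific letters, which may or may not be what you want.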

**Aisthesis** · 02-28-2010

tx for the helpful comments!

My summary: Mine has some merit if you want to go high-level. Optimal is presumably laserlight's revision of MK27's solution, as it has the virtues of pulling size out of the loop (elegantly putting it in the initialization), going lower than toupper like MK27 and clarifying what's going on with the ASCII.

Perhaps even one additional wrinkle:

Code:

void ToUpper(std::string& str)
{
    const char DIFF = 'a' - 'A';
    for (std::string::size_type i = 0, len = str.size(); i < len; ++i) 
        if (str[i] >= 'a' && str[i] <= 'z')
            str[i] -= DIFF;
}
