std::vector<char> vs. std::string

12-14-2006
drrngrvy

I'm in a situation where I could save copying some memory if I could use a std::vector. The question is, what's the difference between, for instance, a std::vector<char> and a std::string? As far as time/space efficiency, is there any?
12-14-2006
whiteflags

Vector, I surmise, is an array implementation built for efficiency and safety as opposed to the C variety. It's not an incredibly far-fetched idea to use a vector of chars, because that is sorta what happens in C, but there is a heuristic you can use based on the fact that std::string has lots of member functions to do string work.

Almost always use a string unless you have a very good reason for not doing so. Like, using a vector<char> for your grades isn't such a terrible idea: a letter grade isn't really much of a string and you probably won't need string functions.

As for a performance boost... well, even if there was one, not everything in programming concerns performance.
12-14-2006
Cat

The major differences:

1. std::string has a huge number of string-related functions which make it easy to manipulate strings.

2. std::vector, on the other hand, is guaranteed to be contiguous in memory -- that is, &data[x + 1] == &data[x] + 1 (the two addresses are sizeof(data[x]) bytes apart). As of C++03, std::string has NO guarantee that it is contiguous in memory.

So, for example, say you're using an API call that fills a character buffer. You'd need to use the vector, not the string.
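
A minimal sketch of that buffer-filling case. fill_buffer is a hypothetical stand-in for whatever C-style API you are calling (it is not a real library function), and the buffer size of 64 is arbitrary:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical C-style API (an assumption for illustration): writes at most
// `size` bytes, including the terminating '\0', into `buffer` and returns the
// number of characters written, not counting the terminator.
std::size_t fill_buffer(char* buffer, std::size_t size) {
    const char message[] = "hello";
    if (size == 0) return 0;
    std::size_t n = sizeof(message) - 1;   // length without the '\0'
    if (n > size - 1) n = size - 1;        // truncate to fit the buffer
    std::memcpy(buffer, message, n);
    buffer[n] = '\0';
    return n;
}

// Let the API write straight into a vector<char>, then build a string from
// the characters it produced.
std::string read_message() {
    std::vector<char> buffer(64);          // 64 zero-initialized chars
    // &buffer[0] points at a contiguous block of buffer.size() chars.
    std::size_t written = fill_buffer(&buffer[0], buffer.size());
    return std::string(&buffer[0], written);
}
```

The vector owns the storage, so nothing has to be freed by hand, and the only copy made is the final one into the std::string.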
12-15-2006
Mario F.

> As far as time/space efficiency, is there any?

There are a few, I reckon. But they only become important when you try to use a vector as a string, or a string as a vector.

Another difference, besides the ones already mentioned, concerns how memory is reserved for vector and string. A string doesn't necessarily retain its capacity() after an assignment, while a vector must (provided, of course, that the assignment doesn't exceed the current capacity). I'm not sure why strings behave this way, but that is what the standard specifies. It's worth keeping in mind.

But it all boils down to not trying to use one type as the other. Meanwhile, both string and vector constructors allow for an easy and quick construction of a vector<char> from a string and the reverse.
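
For what it's worth, the conversions mentioned above can be sketched with the iterator-range constructors that both classes provide. The helper names to_vector and chars_to_string are made up for this example:

```cpp
#include <string>
#include <vector>

// Copy the characters of a string into a vector<char>
// (no terminating '\0' is added).
std::vector<char> to_vector(const std::string& s) {
    return std::vector<char>(s.begin(), s.end());
}

// Copy the characters of a vector<char> into a string.
std::string chars_to_string(const std::vector<char>& v) {
    return std::string(v.begin(), v.end());
}
```

Both directions copy the characters, so this is convenient rather than free.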
12-15-2006
CornedBee

Vectors are extremely rigid in their implementation. They have to be a contiguous block of memory, which means that all implementations look more or less the same.

Strings are far, far more flexible. They give fewer guarantees but potentially better performance.
12-15-2006
Rashakil Fol

One example of a difference is the std::string library generally used with gcc, which uses copy-on-write.
12-15-2006
grumpy

> Originally posted by Rashakil Fol:
> One example of a difference is the std::string library generally used with gcc, which uses copy-on-write.

One trade-off of such implementations is that they are not thread safe (or, at least, it is fairly difficult to ensure thread safety in situations where strings are shared between threads).
12-15-2006
CornedBee

GNU's is indeed not fully threadsafe.

The language currently does not quite allow for a fully threadsafe CoW std::string implementation. The sad thing is that I think they expected it would be possible, and it only turned out it wasn't after C++ was standardized.

However, another very popular std::string implementation technique is the small string buffer. Basically, you make a buffer of, say, 16 characters in the std::string object itself. Only if the string grows longer than that do you have to allocate dynamically.
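
A rough sketch of the small string buffer idea, just to show the layout. SmallString is a made-up class, not how any particular standard library does it, and it is fixed at construction (no growing, no copying) to keep things short:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical illustration of a small string buffer. Text that fits in the
// inline buffer (15 characters plus the '\0') lives inside the object itself;
// anything longer is allocated on the heap.
class SmallString {
public:
    static constexpr std::size_t kInlineCapacity = 16;

    explicit SmallString(const char* text)
        : size_(std::strlen(text)), heap_(nullptr) {
        if (size_ + 1 > kInlineCapacity) {
            heap_ = new char[size_ + 1];          // too long: allocate dynamically
        }
        std::memcpy(data(), text, size_ + 1);     // copy including the '\0'
    }

    ~SmallString() { delete[] heap_; }            // deleting nullptr is a no-op

    // Copying is left out of this sketch.
    SmallString(const SmallString&) = delete;
    SmallString& operator=(const SmallString&) = delete;

    const char* c_str() const { return heap_ ? heap_ : inline_; }
    std::size_t size() const { return size_; }
    bool uses_heap() const { return heap_ != nullptr; }

private:
    char* data() { return heap_ ? heap_ : inline_; }

    std::size_t size_;
    char* heap_;                                  // null while the text fits inline
    char inline_[kInlineCapacity];
};
```

Short strings never touch the allocator, which is where the speed comes from. The price is a bigger object, since every SmallString carries the 16-byte buffer whether it uses it or not.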
12-15-2006
Mario F.

Is it because of a possible copy-on-write implementation that the standard doesn't require std::string to retain capacity() over assignments?
12-15-2006
CornedBee

That would probably be one of the reasons, yeah. But generally all such restrictions limit the flexibility of the implementation.
12-15-2006
Mario F.

*nods* I always wondered about that one. Thanks.
12-15-2006
grumpy

People may wish to look at this article by Herb Sutter, on this and similar issues. Essentially the message is that various types of optimisations that work well in a single-threaded scenario become either costly or incorrect in a multithreaded scenario.