Thread: std::vector<string>

  1. #1
    Registered User
    Join Date
    Dec 2007
    Posts
    385

    std::vector<string>

    I have a general question about std::vector<string>.

    If you declare something like the code below, the vector gets 1 million elements.
    Suppose I run the program with just this declaration, without filling the elements with any strings.
    Does this take up any RAM just because I have declared this number of elements, or do I have to fill them before they take up RAM?
    Also, how many MB do 1000000 (1 million) elements take up?

    Code:
    std::vector<string> vec(1000000);

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Suppose I run the program with just this declaration, without filling the elements with any strings.
    Actually, you have a million default initialised std::string objects in that vector.

    Does this take up any RAM just because I have declared this number of elements, or do I have to fill them before they take up RAM?
    Also, how many MB do 1000000 (1 million) elements take up?
    It depends on how much space a default initialised std::string object takes up.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User
    Join Date
    Dec 2007
    Posts
    385
    Yes, that's a good question :) I am not sure about that. I believe a default initialised element is initialised as "".

    Suppose you declare an element like this:

    Code:
    std::vector<string> vec(1) ;
    vec[0] = "a";
    Does this mean 1 byte, and if so, could an empty element perhaps take up less?

  4. #4
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    "a" is 2 bytes and a string object also keeps track of length, so I would say it takes up at least 2 + 4 + 4 = 10 bytes of memory.
    An empty string takes up 8 or 9 bytes I would say. But it really depends on the implementation. I don't think there's any mention in the C++ standard as to what size the object should be.
    (Size [unsigned integer, 4 bytes] + string [char pointer, 4 bytes] + string data [chars, 0 bytes+]).
    So if we build on this, 9 bytes for an empty string = 9 * 1 000 000 bytes = ~8.5 MB.
    (This does not take into account vector overhead for each element, if any.)
    Last edited by Elysia; 03-23-2008 at 11:39 AM.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  5. #5
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Does this mean 1 byte, and if so, could an empty element perhaps take up less?
    I think that is implementation dependent since the C++ Standard does not specify the return value of capacity() for a default constructed std::string.

  6. #6
    Registered User
    Join Date
    Dec 2007
    Posts
    385
    Okay, I see. That gives me a better picture. I was always a bit unsure what the sizes really were. About 8.5 MB is not that much anyway if you have about 2 vectors like this in your program and 2 GB of RAM in the computer.

  7. #7
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Just remember that as the strings grow in size, so will the memory they consume. And this was only a guess. They can take much more than this. It's all up to the implementation. You can monitor how much memory your program is using anyway, so you would see if it uses a lot of RAM when running it.

  8. #8
    Registered User
    Join Date
    Dec 2007
    Posts
    385
    Yes, that is true: the larger the strings, the more memory, etc. I will run some tests and see what happens.

  9. #9
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by Elysia View Post
    "a" is 2 bytes and a string object also keeps track of length, so I would say it takes up at least 2 + 4 + 4 = 10 bytes of memory.
    An empty string takes up 8 or 9 bytes I would say. But it really depends on the implementation. I don't think there's any mention in the C++ standard as to what size the object should be.
    (Size [unsigned integer, 4 bytes] + string [char pointer, 4 bytes] + string data [chars, 0 bytes+]).
    So if we build on this, 9 bytes for an empty string = 9 * 1 000 000 bytes = ~8.5 MB.
    (This does not take into account vector overhead for each element, if any.)
    In reality on Windows it is probably much more than that.
    4 bytes for the pointer to the string on the heap,
    4 bytes for the size,
    Then there's the string on the heap, which as you know requires at least 2 bytes. Your system probably won't dynamically allocate any less than 8 bytes at a time from the heap, so that's another 8 bytes.
    So, 16 bytes per string = 16000000 bytes, or over 15.2MB.

    (This is assuming that the implementation doesn't use shared COW strings for small strings, which modern compilers don't seem to do any more, or so I've read)
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  10. #10
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Maybe so. MSDN doesn't say if there's a "minimum allocation" for HeapAlloc which Microsoft's implementation of new uses.
    So there's no easy way of guessing. The easiest way is, once again, trying it for yourself.

  11. #11
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by Elysia View Post
    "a" is 2 bytes
    To nitpick: A string like "a" need only take up one byte. The string class need not store the terminating null character, until the c_str() method is called.

    The literal string "a" is 2 bytes. string("a") results in a one byte string. string("a",2) results in a 2 byte string with the second character being null. Similarly "a"[1] is valid, but string("a")[1] is not.
    Last edited by King Mir; 03-23-2008 at 02:24 PM.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  12. #12
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Whether or not 'std::string("a")' results in consumption of only one byte and 'std::string("a")[1]' is valid, or rather sound, depends on the implementation. Some implementations, for example, use an allocation strategy such that the size is always a multiple of 16. Some implementations, targeted at code requiring lots of small strings, use the stack for small literals. Further, an implementation is free to terminate the data with a null on every write.

    Also, it is extremely unlikely that 'std::string("a")' will ever result in consumption of only one byte. I know of no implementation that doesn't take the terminating null into account when allocating memory or additional memory. It is mechanically unsound. (Using the logically constant 'std::string::c_str()' method would require an allocation, a deallocation, a copy and a set.)

    Soma

  13. #13
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by phantomotap View Post
    Also, it is extremely unlikely that 'std::string("a")' will ever result in consumption of only one byte. I know of no implementation that doesn't take the terminating null into account when allocating memory or additional memory. It is mechanically unsound. (Using the logically constant 'std::string::c_str()' method would require an allocation, a deallocation, a copy and a set.)
    You're actually arguing quality of implementation issues, not what the standard allows. The standard requires particular observable behaviours; it does not specifically mandate issues related to quality of implementation (eg performance, efficiency).

    Since std::strings can contain embedded NULLs (i.e. more than one zero-valued byte) it is a fair call that an implementation of std::string will handle terminating bytes differently than one would expect assuming C-style strings. There is also nothing (except, again, quality of implementation concerns) stopping the c_str() method being implemented with allocation/deallocation, copy, and set. This is one reason why changing a std::string invalidates, as far as the standard is concerned, any value previously returned by c_str(), any iterators, etc.

  14. #14
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    No. I am not "arguing quality of implementation issues". I'm actually explaining the issues--both what is standard and what you can expect. Obviously, you don't understand. Is it my fault? Or is it yours? Did I fail to explain it? Or are you ignorant?

    The standard requires particular observable behaviours;

    Except where it doesn't--"implementation defined".

    it does not specifically mandate issues related to quality of implementation (eg performance, efficiency).
    Wrong. The standard demands specific performance and specific efficiency in many cases. Indeed, it is rare that it doesn't specify the required performance and efficiency characteristics.

    Since std::strings can contain embedded NULLs (i.e. more than one zero-valued byte) it is a fair call that an implementation of std::string will handle terminating bytes differently than one would expect assuming C-style strings.
    Wrong. The C and C++ standard actually mesh very well regarding this behavior. A conforming implementation of 'std::basic_string' may allow embedded nulls--and possibly embedded elements of every other domain value. However, a conforming implementation must terminate the data returned by 'std::basic_string::c_str()' with a 'charT()' which is obtained through the target type as associated with traits of either 'char' or 'wchar_t'. For either of these the result must be equivalent to the associated C standard string terminator where specified. In practice this means that 'std::basic_string' implementations may contain embedded terminators, regardless of value, but that invalidates the results of 'std::basic_string::c_str()'. (That the terminator is virtually always null is just a bonus.)

    It is your fault if your implementation gives you incorrect results from using the interface wrong. Such behavior doesn't invalidate the implementation or relate to the standard in any way. The standard can't force you to use the interface correctly.

    There is also nothing (except, again, quality of implementation concerns) stopping the c_str() method being implemented with allocation/deallocation, copy, and set
    Technically correct, but again flawed. The 'std::basic_string::c_str()' and 'std::basic_string::data()' methods are constant methods. The value returned by these methods must remain valid until a subsequent call to a non-constant method is made. So, such an implementation is logically flawed and mechanically unsound even if conforming in this way because the data returned would have to be cached and later released by the instance of 'std::basic_string'.

    this is one reason why changing a std::string invalidates, as far as the standard is concerned, any value previously returned by c_str(), any iterators, etc etc.
    Correct. If you mutate 'std::basic_string' in any way most of the previously reported state is considered invalid. Again, the 'std::basic_string::c_str()' and 'std::basic_string::data()' methods are defined as constant methods and are absolutely logically constant operations.

    Soma

  15. #15
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    We're not discussing implementation-defined behaviours here. You were making statements about how things (in your opinion) should be implemented, and implying those as absolute requirements -- and are continuing to do so.
