Thread: Construct a std::string from char*

  1. #1
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853

    Construct a std::string from char*

    Is there a way to make a std::string from a char* without copying it? If I understand correctly using the = operator will copy a c-string to a std::string. Using std::string constructor will again do the same. Is there a way to assign the internal char array of std::string to a char* or char[]?

    I am just looking for a way to use a std::string when a C function asks for a char* to write to. If it only needs to read, this is solved with string.c_str(). But I am kind of confused about how to have it write into a std::string instead of a char*.

    I don't want the copy just because of the added time needed. A memset() hack could be available, but it would depend on how std::string is implemented (where its internal char array is located)
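
For reference, the copying behaviour described in the question can be checked directly. This is a minimal sketch; the function names `constructor_copies` and `copy_then_mutate_source` are made up for illustration and are not part of any library.

```cpp
#include <string>

// Returns true if the std::string built from `raw` stores its characters
// somewhere other than `raw` itself, i.e. the constructor made a copy.
bool constructor_copies(const char* raw) {
    std::string s(raw);
    return static_cast<const void*>(s.data()) != static_cast<const void*>(raw);
}

// Builds a string from a caller-owned buffer, then edits the buffer.
// The string keeps the old contents because it owns separate storage.
std::string copy_then_mutate_source() {
    char raw[] = "hello";
    std::string s(raw);
    raw[0] = 'J';      // only the caller's array changes
    return s;          // still "hello"
}
```

Both the constructor and operator= behave this way: the string never adopts the caller's pointer, which is why the thread turns to writing into the string's own buffer instead.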

  2. #2
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Like you said, it depends on how std::string is implemented, which breaks encapsulation.

    C++ doesn't provide such a facility because, for example, an implementation may not use an array at all.

    Also, STL containers like strings and vectors use and manage their own dynamic memory. They won't be able to use just any pointer you pass in. What if it needs more space later on?

  3. #3
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    string.data()

  4. #4
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    Quote Originally Posted by whiteflags View Post
    string.data()
    Yup, that in combination with const_cast works fine. But you can also use c_str(), I guess, which would append a '\0'. I always thought c_str() created a copy, but I guess they rely on the const to keep it from being corrupted.

    Treating a std::string as a char* is useful when you have to work with functions that take a char*. Instead of mixing C strings and std::string, I find it a better idea to use only std::string and use the above method to pass it to functions that will write into it through a pointer.
    Would you consider doing:
    Code:
    std::string str;
    ....
    recv(const_cast<char*>(str.data()), str.size(), 0);
    a bad idea?
    There is no chance that you won't have memory, so I can't seem to find any problem. So you get rid of C strings and use std::string. The best would be to have a recv() function that accepts a std::string& and optionally a length parameter, still ensuring you won't write more than size() chars.
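
The wrapper wished for in the post above might look roughly like this. It is only a sketch under stated assumptions: `fake_recv` is a hypothetical stand-in for a C API such as recv() (no real socket is involved), `recv_into` is an invented name, and the code assumes C++11 or later, where std::string storage is contiguous and may be written through `&out[0]`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C API such as recv(): writes at most `len`
// bytes into `buf` and returns how many bytes it actually wrote.
std::size_t fake_recv(char* buf, std::size_t len) {
    static const char payload[] = "hello from the wire";
    const std::size_t n = std::min(len, sizeof(payload) - 1);
    std::memcpy(buf, payload, n);
    return n;
}

// Fills `out` with up to `max_len` bytes. Assumes C++11 or later, where the
// string's storage is contiguous and &out[0] may be written through.
std::size_t recv_into(std::string& out, std::size_t max_len) {
    out.resize(max_len);                 // make max_len characters writable
    if (max_len == 0) return 0;          // nothing to read
    const std::size_t n = fake_recv(&out[0], out.size());
    out.resize(n);                       // shrink to the bytes actually received
    return n;
}
```

Resizing before the call and again afterwards keeps size() in step with what the C function wrote, so the string's bookkeeping is never bypassed.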

  5. #5
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Yep, either I live with a cast or copy stuff around.

    Wouldn't use c_str() instead of data() though -- to be clear, the reference should only say that data() "returns a pointer to the first element". Exactly what we want. Other than that, I can't really defend my preference.

  6. #6
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by C_ntua View Post
    Yup, that in combination with const_cast works fine. But you can also use c_str(), I guess, which would append a '\0'. I always thought c_str() created a copy, but I guess they rely on the const to keep it from being corrupted.
    The reason you need a const_cast is that string::data() returns a pointer to const: the intent is that it will not be used to modify the string's data.

    c_str() potentially returns a copy. That is implementation defined.

    Quote Originally Posted by C_ntua View Post
    Would you consider doing:
    Code:
    std::string str;
    ....
    recv(const_cast<char*>(str.data()), str.size(), 0);
    a bad idea?
    Yes, I would, for the reason I mentioned above. Also, as cyberfish said, there is no guarantee that std::string uses a contiguous array of char. Your technique assumes it does, and will break if you use an implementation of the standard library that does not do that.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  7. #7
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    You absolutely cannot do such a thing in C++03. GCC's std::string, for example, uses reference counting, so you could be overwriting the memory that is shared between different strings.

    std::string's memory MUST be contiguous, that's actually guaranteed by implication in C++03, and explicitly in C++0x. But you must never modify the memory returned by data().

    You may, however, pass &s[0] as a char* and modify from there.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law
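
The `&s[0]` approach suggested above can be sketched as follows. The routine `c_style_upper` is a hypothetical C-style function invented for the example, and the sketch assumes C++11 or later, where contiguity is guaranteed explicitly.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical C-style routine that edits a caller-supplied buffer in place.
void c_style_upper(char* buf, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i)
        buf[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(buf[i])));
}

// Passes the string's own storage to the C routine through &s[0].
// No const_cast is needed because non-const operator[] returns char&.
void upper_in_place(std::string& s) {
    if (!s.empty())
        c_style_upper(&s[0], s.size());   // never touch more than size() characters
}
```

Going through the non-const operator[] rather than a const_cast of data() matters on a reference-counted implementation, because the non-const accessor gives the string a chance to take a private copy of a shared buffer before handing out a writable reference.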

  8. #8
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by CornedBee
    std::string's memory MUST be contiguous, that's actually guaranteed by implication in C++03, and explicitly in C++0x.
    Actually, it is not guaranteed by implication in C++03 due to a defect in the standard that will be fixed in the next version (as you noted). Therefore, whether you can "pass &s[0] as a char* and modify from there" in C++03 is not guaranteed (unless it has been fixed in a defect report, but I have not checked, and it does not matter in practice now anyway).
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  9. #9
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by CornedBee View Post
    std::string's memory MUST be contiguous, that's actually guaranteed by implication in C++03, and explicitly in C++0x. But you must never modify the memory returned by data().
    Strings in the C++ standard (ratified 1998 as opposed to later proposed revisions) are only required to conform with requirements of a Sequence. A Sequence - according to Section 23.1.1 - may be contiguous, but need not be (a linked list would also comply with requirements of a Sequence).

    Interestingly, both data() and c_str() are specifically allowed (but not required) to invalidate any of a basic_string's iterators. This suggests both methods (among other things) have freedom to affect the internal data representation of the string.

  10. #10
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    The string can (and probably does) also keep additional information about the string to speed it up.

    For example, the length (for an O(1) .size()). If you change the length (make it shorter) of the string itself by direct memory manipulation, you are breaking the object's invariants because the length won't be updated accordingly, and obviously bad things will happen.

    I imagine, for a paranoid implementation, it can also keep an incrementally updated hash or something, and similarly bad things will happen.

    Or for an implementation where some operations (eg. find()) need to be very fast, they could use additional data structures to complement the array, and manipulating memory directly will also break that, and bad things will happen.

    In short, bad things will happen.
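
The length invariant mentioned above is easy to see in a small experiment. This is a sketch with invented names (`Lengths`, `truncate_behind_the_strings_back`, `resync_length`), assuming C++11 or later.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

struct Lengths {
    std::size_t cpp_size;   // what std::string::size() reports
    std::size_t c_length;   // what strlen() sees through c_str()
};

// Writes a '\0' into the middle of the buffer, the way a C function might
// when it stores a shorter, null-terminated result.
Lengths truncate_behind_the_strings_back() {
    std::string s = "hello";
    s[2] = '\0';                        // the stored length is NOT updated
    return Lengths{ s.size(), std::strlen(s.c_str()) };
}

// The fix: tell the string its new length explicitly.
std::size_t resync_length() {
    std::string s = "hello";
    s[2] = '\0';
    s.resize(std::strlen(s.c_str()));   // size() now matches the C view
    return s.size();
}
```

In the first function the string still believes it holds five characters even though C code sees two, which is exactly the kind of mismatch the post warns about.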

  11. #11
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    Quote Originally Posted by CornedBee View Post
    You absolutely cannot do such a thing in C++03. GCC's std::string, for example, uses reference counting, so you could be overwriting the memory that is shared between different strings.
    Can you elaborate on that? I don't get what you mean. Since it returns a char*, it would point to a contiguous memory location, otherwise there is no sense in returning a pointer. That contiguous block would hold size() bytes with the same values as the std::string.
    In other words, since you can read them, why can't you write to them?

    My point is that, the way I understand it, you will be writing to a memory location that is reserved to hold the contents of the std::string. If data() makes a copy, then you will be writing to the copy, which would make the method useless. But I don't see how you risk changing something you shouldn't change. The size() of the std::string, for example, cannot be stored in the bytes you are writing. The const char* returned is also internal, which means it has a specific location. Otherwise, if calling data() twice allocated memory somewhere and then freed it to allocate somewhere else, that would cause the pointer from the first data() call to point somewhere invalid. So I am guessing that the value returned from data() is always the same.

  12. #12
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    They can definitely not be the same.

    The pointer returned by data() is only valid until next time the string is changed (calling a non-const function).

    The memory is allocated on the heap. When it runs out of space, it will allocate another bigger chunk, and move the string there. The pointer will change.
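
The pointer change described above can be observed with a short test. This is a sketch with an invented function name; it relies on the behaviour of mainstream standard library implementations, which allocate the new buffer before releasing the old one.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Returns true if growing the string past its capacity moved the buffer.
bool buffer_moves_on_growth() {
    std::string s = "abc";
    // Save the address as an integer so we never touch a dangling pointer.
    const std::uintptr_t before = reinterpret_cast<std::uintptr_t>(s.data());
    const std::size_t old_capacity = s.capacity();

    s.append(old_capacity + 1, 'x');     // forces a larger allocation

    const std::uintptr_t after = reinterpret_cast<std::uintptr_t>(s.data());
    return s.capacity() > old_capacity && after != before;
}
```

Any char* obtained before the append would now refer to the old location, so it must be fetched again after every operation that can grow the string.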

  13. #13
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    Yes, of course, you are right about that.
    What if you called reserve() first to make sure it won't re-allocate memory?
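
As a rough check of the reserve() idea, the sketch below (invented function name, behaviour as observed on mainstream implementations) confirms that appends which fit within the reserved capacity leave the buffer in place.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Returns true if appends that fit within the reserved capacity left the
// buffer where it was.
bool reserve_keeps_buffer_stable(std::size_t n) {
    std::string s;
    s.reserve(n);
    const std::uintptr_t before = reinterpret_cast<std::uintptr_t>(s.data());
    s.append(n, 'x');                    // fits: capacity() >= n after reserve(n)
    const std::uintptr_t after = reinterpret_cast<std::uintptr_t>(s.data());
    return before == after && s.size() == n;
}
```

Note that reserve() only deals with reallocation. Characters between size() and capacity() are not part of the string, so a C function that needs N writable characters still calls for resize(N), not just reserve(N).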

  14. #14
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    See my post #10.

    From GCC's header -
    Code:
          struct _Rep_base
          {
    	size_type		_M_length;
    	size_type		_M_capacity;
    	_Atomic_word		_M_refcount;
          };
    There's definitely more than just the array in the string.

  15. #15
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    So

    Code:
       *  A string looks like this:
       *
       *  @code
       *                                        [_Rep]
       *                                        _M_length
       *   [basic_string<char_type>]            _M_capacity
       *   _M_dataplus                          _M_refcount
       *   _M_p ---------------->               unnamed array of char_type
       *  @endcode
       *
       *  Where the _M_p points to the first character in the string
        ........................
       *  The reason you want _M_data pointing to the character array and
       *  not the _Rep is so that the debugger can see the string
       *  contents.
    and later on
    Code:
    data() const { return _M_data(); }
    ............
    _M_data() const  { return  _M_dataplus._M_p; }
    _CharT* _M_p; // The actual data.
    So I am guessing that data() returns a pointer to the first character of the string, always. That is after everything else. Since you can read those data, you can as well change them.

    If you do this
    Code:
    char* buf = const_cast<char*>(str.data());
    you should have buf = str._M_p. If the string re-allocates memory it would do something like
    Code:
    str._M_p = realloc(....);
    so wouldn't
    Code:
    *buf = 'a';
    still modify the first character of the string??
    If you later do
    Code:
    str += "baa";
    it will reallocate memory and copy the values. Again won't buf be pointing at _M_p, thus at the internal buffer?

    I am reading about reference counting, so I get what CornedBee meant. I guess it becomes troublesome.
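
The reference-counting hazard raised earlier in the thread can be illustrated with a toy class. This is explicitly NOT libstdc++'s implementation: `CowString` and the two helper functions are hypothetical, and a `std::shared_ptr<std::vector<char>>` stands in for the shared, reference-counted buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Toy copy-on-write string, for illustration only.
class CowString {
public:
    explicit CowString(const char* text)
        : rep_(std::make_shared<std::vector<char>>(text, text + std::strlen(text))) {}

    // Read-only access: never unshares the buffer, like data().
    const char* data() const { return rep_->data(); }

    // Writable access: take a private copy first if the buffer is shared.
    char& operator[](std::size_t i) {
        if (rep_.use_count() > 1)
            rep_ = std::make_shared<std::vector<char>>(*rep_);
        return (*rep_)[i];
    }

    std::string str() const { return std::string(rep_->begin(), rep_->end()); }

private:
    std::shared_ptr<std::vector<char>> rep_;   // copies of CowString share this
};

// Writing through a const_cast of data() skips the unshare step,
// so the write shows up in every string that shares the buffer.
std::string write_via_const_cast() {
    CowString a("hello");
    CowString b = a;                       // b shares a's buffer
    const_cast<char*>(a.data())[0] = 'J';  // changes b as well
    return b.str();
}

// Writing through operator[] unshares first, so b is left alone.
std::string write_via_index() {
    CowString a("hello");
    CowString b = a;
    a[0] = 'J';                            // a gets its own buffer first
    return b.str();
}
```

In this toy model the first function returns "Jello" for a string that was never written to, while the second returns "hello". That is the sense in which a const_cast on data() can overwrite memory shared between different strings.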
