Thread: Modifying a string literal.

  1. #1
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629

    Modifying a string literal.

    Hi all, hope you had a good christmas
    Em yeah so String literals, my lecturer mentioned this, you can't modify a string literal.

    I guess he meant something like this:
    Code:
               char *string = "Hello World" ;
    That would be unmodifiable because string is not just a pointer to the H somewhere in memory; that memory is also static... so using strcat on it is wrong.
    On what other occasions can you not modify a string?
    Also, if I had
    Code:
     
            string s = "Hello" + "World";            // why is that not possible? Joining 2 string literals?
    // whereas this
            string str = "Hello" + string("World");  // or vice versa, is correct...
                                                     // I know it's something to do with pointers?
    and also how come when I do this
    Code:
                printf("%p", &"Hello World") ;
    displays an actual address..which means "Hello World" is stored in memory before it is displayed to the screen or in my case before the address is displayed to the screen..
    but
    Code:
                     printf("%p", &1); // fails to compile?
    Thanks.
    You ended that sentence with a preposition...Bastard!

  2. #2
    Nasal Demon Xupicor
    Join Date
    Sep 2010
    Location
    Poland
    Posts
    179
    Isn't a string literal of type "const char[]" and not "char[]"?
    Code:
    std::string s = "hi" + "there"; // bad - const char[] + const char[] :)
    //error: invalid operands of types 'const char [6]' and 'const char [6]' to binary 'operator+'
    Whereas there is an overloaded std::string operator+(const char* cstr, const std::string& str), which works as you would expect.

    You can't do this either:
    Code:
    const char* operator+(const char* a, const char* b);
    // error: 'const char* operator+(const char*, const char*)' must have an argument of class or enumerated type
    If you're interested in what the standard has to say about literals, it's in section 2.13. Sorry for the long paste:
    2.13.4 String literals [lex.string]
    string-literal:
    "s-char-sequence(opt)"
    L"s-char-sequence(opt)"
    s-char-sequence:
    s-char
    s-char-sequence s-char
    s-char:
    any member of the source character set except
    the double-quote ", backslash \, or new-line character
    escape-sequence
    universal-character-name
    1 A string literal is a sequence of characters (as defined in 2.13.2) surrounded by double quotes, optionally
    beginning with the letter L, as in "..." or L"...". A string literal that does not begin with L is an ordinary
    string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n
    const char” and static storage duration (3.7), where n is the size of the string as defined below, and is
    initialized with the given characters. A string literal that begins with L, such as L"asdf", is a wide string
    literal. A wide string literal has type “array of n const wchar_t” and has static storage duration, where
    n is the size of the string as defined below, and is initialized with the given characters.
    2 Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined.
    The effect of attempting to modify a string literal is undefined.
    3 In translation phase 6 (2.1), adjacent narrow string literals are concatenated and adjacent wide string literals
    are concatenated. If a narrow string literal token is adjacent to a wide string literal token, the behavior is
    undefined. Characters in concatenated strings are kept distinct. [Example:
    "\xA" "B"
    contains the two characters '\xA' and 'B' after concatenation (and not the single hexadecimal character
    '\xAB'). ]
    4 After any necessary concatenation, in translation phase 7 (2.1), '\0' is appended to every string literal so
    that programs that scan a string can find its end.
    5 Escape sequences and universal-character-names in string literals have the same meaning as in character literals
    (2.13.2), except that the single quote ' is representable either by itself or by the escape sequence \',
    and the double quote " shall be preceded by a \. In a narrow string literal, a universal-character-name may
    map to more than one char element due to multibyte encoding. The size of a wide string literal is the total
    number of escape sequences, universal-character-names, and other characters, plus one for the terminating
    L'\0'. The size of a narrow string literal is the total number of escape sequences and other characters,
    plus at least one for the multibyte encoding of each universal-character-name, plus one for the terminating
    '\0'.

    2.13.1 Integer literals [lex.icon]
    integer-literal:
    decimal-literal integer-suffix(opt)
    octal-literal integer-suffix(opt)
    hexadecimal-literal integer-suffix(opt)
    decimal-literal:
    nonzero-digit
    decimal-literal digit
    octal-literal:
    0
    octal-literal octal-digit
    hexadecimal-literal:
    0x hexadecimal-digit
    0X hexadecimal-digit
    hexadecimal-literal hexadecimal-digit
    nonzero-digit: one of
    1 2 3 4 5 6 7 8 9
    octal-digit: one of
    0 1 2 3 4 5 6 7
    hexadecimal-digit: one of
    0 1 2 3 4 5 6 7 8 9
    a b c d e f
    A B C D E F
    integer-suffix:
    unsigned-suffix long-suffix(opt)
    long-suffix unsigned-suffix(opt)
    unsigned-suffix: one of
    u U
    long-suffix: one of
    l L
    Footnote 21) The term “literal” generally designates, in this International Standard, those tokens that are called “constants” in ISO C.
    1 An integer literal is a sequence of digits that has no period or exponent part. An integer literal may have a
    prefix that specifies its base and a suffix that specifies its type. The lexically first digit of the sequence of
    digits is the most significant. A decimal integer literal (base ten) begins with a digit other than 0 and consists
    of a sequence of decimal digits. An octal integer literal (base eight) begins with the digit 0 and consists
    of a sequence of octal digits. A hexadecimal integer literal (base sixteen) begins with 0x or 0X and
    consists of a sequence of hexadecimal digits, which include the decimal digits and the letters a through f
    and A through F with decimal values ten through fifteen. [Example: the number twelve can be written 12,
    014, or 0XC. ]
    2 The type of an integer literal depends on its form, value, and suffix. If it is decimal and has no suffix, it has
    the first of these types in which its value can be represented: int, long int; if the value cannot be represented
    as a long int, the behavior is undefined. If it is octal or hexadecimal and has no suffix, it has the
    first of these types in which its value can be represented: int, unsigned int, long int, unsigned
    long int. If it is suffixed by u or U, its type is the first of these types in which its value can be represented:
    unsigned int, unsigned long int. If it is suffixed by l or L, its type is the first of these
    types in which its value can be represented: long int, unsigned long int. If it is suffixed by ul,
    lu, uL, Lu, Ul, lU, UL, or LU, its type is unsigned long int.
    3 A program is ill-formed if one of its translation units contains an integer literal that cannot be represented
    by any of the allowed types.

  3. #3
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    em, I'm still lost haha.
    especially by the standard stuff.
    I do understand some of what it is saying,
    but i have never heard of any L string or L"..." whatever it is!

    so this is wrong
    "Hello" + "World"
    because I am adding two const strings?

  4. #4
    Nasal Demon Xupicor
    Join Date
    Sep 2010
    Location
    Poland
    Posts
    179
    It's because you are trying to add "const char[] " to another "const char[]" (c-strings). Isn't that what the error is saying?
    Just think about it, if you could do that, why would anybody care about writing strcat() function?
    You could think about it as adding two const pointers to char - you don't get a new combined c-string as a result, and actually adding pointers together makes little sense.

    And yeah, the standard can confuse beginners (and experienced programmers alike). Maybe I shouldn't have posted that... Still, you learned something new, so maybe it wasn't that bad after all.

    Here, this may do a better job of showing what I mean:
    Why can't you add two string literals? - C++ Forums

  5. #5
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    Oh, so basically the strings are converted to pointers (I assume pointers to their first characters) and then added together, which doesn't work...

    I wonder how the overloaded '+' handles concatenating a pointer and a string object?

    and could you please help with the print question?
    Code:
            printf("%p" , &"Hello") ;
    it prints an address, why is the string "Hello" bothered to be stored in memory at all?
    Ty!

  6. #6
    Nasal Demon Xupicor
    Join Date
    Sep 2010
    Location
    Poland
    Posts
    179
    Quote Originally Posted by Eman View Post
    I wonder how the overloaded '+' handles concatenating a pointer and a string object?
    It's probably something like this:
    Code:
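    // Sketch only: the real overloads are templates in namespace std.
    std::string operator+(const std::string& str, const char* cstr) {
        std::string tmp(cstr);  // build a temporary std::string from the C string
        return str + tmp;       // reuse the string + string overload
    }
    std::string operator+(const char* cstr, const std::string& str) {
        std::string tmp(cstr);
        return tmp + str;
    }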
    Quote Originally Posted by Eman View Post
    and could you please help with the print question?
    Code:
            printf("%p" , &"Hello") ;
    it prints an address, why is the string "Hello" bothered to be stored in memory at all?
    Ty!
    Hm, and where do you think the string literal should be stored?

  7. #7
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    mmmn I wondered if it was something like that..but I don't think you can store a
    string of type char* to a string object...

    I don't know, but I assume the string "Hello World" does have an address it must be an address to where it is stored, so I assume the string is stored in a temporary loc and then memory is freed after the printf statement....?

  8. #8
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by Eman View Post
    mmmn I wondered if it was something like that..but I don't think you can store a
    string of type char* to a string object...
    you absolutely can. one of the constructors REQUIRED by the C++ standard takes a const char* as a parameter.

  9. #9
    Registered User hk_mp5kpdw
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    Quote Originally Posted by Eman View Post
    mmmn I wondered if it was something like that..but I don't think you can store a
    string of type char* to a string object...
    There are many overloaded versions of the std::string constructor. Some of those take const char* arguments which basically take a string literal and make a std::string out of it.

    This constructor does such a thing:
    Code:
    std::string foo("Hello World");
    ...it takes the string literal "Hello World" and creates/constructs a std::string object from that.

    In the overloaded operator+ examples provided, the const char* argument is made into a temporary std::string object and this temporary string is then concatenated using another overloaded version of operator+ which takes two separate std::string objects and squishes them together. A copy of the temporary combined string is then returned to the caller. That allows such things as this to happen:
    Code:
    std::string foo1 = "Hello " + std::string("World");  // Add string literal to a std::string
    std::string foo2 = std::string("Hello ") + "World";  // Add std::string to a string literal
    The temporary combined string is used to initialize foo1/2 in the above sample. The temporary is then thrown away after being used in the initialization.



    Quote Originally Posted by Eman View Post
    I don't know, but I assume the string "Hello World" does have an address it must be an address to where it is stored, so I assume the string is stored in a temporary loc and then memory is freed after the printf statement....?
    String literals are stored in the program's executable and, when the program runs, loaded into memory that is typically marked in some manner as read-only (this makes it dangerous to attempt to directly modify string literals). Because they are stored in memory they do have an address, and they persist in memory throughout the lifetime of the program. The memory is freed/cleared by the OS once the program is finished. The string literal is temporary only in the sense that it exists at that memory location for as long as the program is running. It is not constructed/built solely for the purpose of the printf and then freed immediately after.

    Sometimes the compiler will see the same string literal being used in many places throughout the program and decide to keep only a single instance of that literal in memory, since all of them contain the same data. All code referring to that string literal would then point to the same address. This saves space in the executable and can sometimes be controlled through compiler options. This can be seen here:
    "Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
    -Christopher Hitchens

  10. #10
    Registered User
    Join Date
    Aug 2010
    Location
    Poland
    Posts
    733
    Quote Originally Posted by Elkvis View Post
    you absolutely can. one of the constructors REQUIRED by the C++ standard takes a const char* as a parameter.
    And obviously char* can be implicitly converted to const char*.

  11. #11
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    Quote Originally Posted by Elkvis View Post
    you absolutely can. one of the constructors REQUIRED by the C++ standard takes a const char* as a parameter.
    :S ok my bad then.
    I had problems with it when I was trying to convert a binary C string into longitude and latitude floats, but it works now... I must have done something wrong then...

    that is perfectly clear then..
    what about the printf() question? Thanks

  12. #12
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    wow sorry i didn't see all the post, I was busy replying to Elkvis...

    @hk_m....
    what was that thing you used in the cout statement?

    I understand now why modifying a string literal such as that is dangerous. I thought that after the printf("%s", "Hello World") statement the memory would be freed, but why does the compiler not see a reason to optimize if it compiles the whole code and sees that there is only one printf to print Hello World out?

    so if strings like this
    Code:
          char *str = "Hello World";
          char str[] = "Hello World";
           // is this a string literal? After all, it is modifiable
          char s[100] = "Hello World";  // modifiable because I can still add
                                        // 100 - strlen(s) - 1 characters
    
               //     and this
      
             string str = "Hello";  // is it a string literal? It can still be modified using the +

  13. #13
    Registered User
    Join Date
    Aug 2010
    Location
    Poland
    Posts
    733
    Only in the first case can your string not be modified, because str is a POINTER to a string literal.
    In the second case str is not a pointer but an array containing a copy of "Hello World". The same holds in the third case, where the array's size is explicitly specified.
    In the fourth case a std::string instance is initialized from a const char* (a constructor will be invoked). std::string makes its own copy of the given string; a different class that does not copy would still point to the literal.

    If you pass #1 to printf, you pass the value of the str pointer. If you pass #2 or #3, you pass the address of the first element. For #4 you need the c_str() member function (it passes a pointer to the internal buffer, which you must not modify).

  14. #14
    and the Hat of Guessing tabstop
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by Eman View Post
    I understand now why modifying a string literal such as that is dangerous. I thought that after the printf("%s", "Hello World") statement the memory would be freed,
    The compiler is never allowed to give your memory back to the system. Same thing with (say) a const int that you declare.

    String literal means "thing you have in your program in double quotes". So all the instances of "Hello World" in your program are string literals. str is a pointer, and you have it set (at the moment) to point to a string literal, so where it points is not writable memory. The other two are arrays. The string literal is used as an initializer for those arrays, but they each are their own chunk of memory and you can do with them what you will.

  15. #15
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    Quote Originally Posted by tabstop View Post
    str is a pointer, and you have it set (at the moment) to point to a string literal, so where it points is not writable memory. The other two are arrays. The string literal is used as an initializer for those arrays, but they each are their own chunk of memory and you can do with them what you will.
    The first array, however, char str[] = "Hello World", does not have any spare room: the compiler implicitly sets the length of the array to fit the string plus the null character, so any attempt to do a strcat or '+' on it should result in a seg fault, should it not...?

    But for the second array, the way you explained it makes sense: I can still add strings to the empty space as long as I stay within bounds and remember to leave a byte for the null character.
