Thread: Modifying a string literal.

  1. #1
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629

    Modifying a string literal.

    Hi all, hope you had a good Christmas.
    So, string literals: my lecturer mentioned that you can't modify a string literal.

    I guess he meant something like this:
    Code:
               char *string = "Hello World" ;
    That would be unmodifiable because string is not only a pointer to the 'H' somewhere in memory, but that memory is also static, so using strcat on it is wrong.
    On what other occasions can you not modify a string?
    Also, if I had
    Code:
            string s = "Hello" + "World" ; // why is that not possible? Joining 2 string literals?
            // whereas this
            string str = "Hello" + string("World") ; // or vice versa, is correct. I know it has
                                                     // something to do with pointers?
    And also, how come when I do this
    Code:
                printf("%p", &"Hello World") ;
    it displays an actual address? That means "Hello World" is stored in memory before it is displayed on the screen (or, in my case, before its address is displayed).
    But
    Code:
                     printf("%p", &1) ; // fails to work?
    Thanks.
    You ended that sentence with a preposition...Bastard!

  2. #2
    Nasal Demon Xupicor's Avatar
    Join Date
    Sep 2010
    Location
    Poland
    Posts
    179
    Isn't a string literal of type "const char[]" and not "char[]"?
    Code:
    std::string s = "hi" + "there"; // bad - const char[] + const char[] :)
    //error: invalid operands of types 'const char [3]' and 'const char [6]' to binary 'operator+'
    Whereas there is an overloaded std::string operator+(const char* cstr, const std::string& str), which works as you would expect.

    You can't do this either:
    Code:
    const char* operator+(const char* a, const char* b);
    // error: 'const char* operator+(const char*, const char*)' must have an argument of class or enumerated type
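    For comparison, here is a minimal sketch of the ways of joining two literals that do compile. The function names are made up for illustration; only std::string and its operator+ come from the standard library.

```cpp
#include <string>

// Adjacent string literals are merged by the compiler (translation phase 6),
// so no operator+ is involved at all.
inline std::string join_adjacent() {
    return "Hello" " World";
}

// As soon as the left operand is a std::string, the overloaded operator+ applies.
inline std::string join_string_first() {
    return std::string("Hello") + " World";
}

// The literal may also be on the left, as long as the right side is a std::string.
inline std::string join_literal_first() {
    return "Hello" + std::string(" World");
}
```

    All three return the same text. The point is only that at least one operand of + has to be a class type such as std::string.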
    If you're interested in what the standard has to say about literals, it's in section 2.13. Sorry for the long paste:
    2.13.4 String literals [lex.string]
    string-literal:
    "s-char-sequence(opt)"
    L"s-char-sequence(opt)"
    s-char-sequence:
    s-char
    s-char-sequence s-char
    s-char:
    any member of the source character set except
    the double-quote ", backslash \, or new-line character
    escape-sequence
    universal-character-name
    1 A string literal is a sequence of characters (as defined in 2.13.2) surrounded by double quotes, optionally
    beginning with the letter L, as in "..." or L"...". A string literal that does not begin with L is an ordinary
    string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n
    const char” and static storage duration (3.7), where n is the size of the string as defined below, and is
    initialized with the given characters. A string literal that begins with L, such as L"asdf", is a wide string
    literal. A wide string literal has type “array of n const wchar_t” and has static storage duration, where
    n is the size of the string as defined below, and is initialized with the given characters.
    2 Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined.
    The effect of attempting to modify a string literal is undefined.
    3 In translation phase 6 (2.1), adjacent narrow string literals are concatenated and adjacent wide string literals
    are concatenated. If a narrow string literal token is adjacent to a wide string literal token, the behavior is
    undefined. Characters in concatenated strings are kept distinct. [Example:
    "\xA" "B"
    contains the two characters ’\xA’ and ’B’ after concatenation (and not the single hexadecimal character
    ’\xAB’). ]
    4 After any necessary concatenation, in translation phase 7 (2.1), ’\0’ is appended to every string literal so
    that programs that scan a string can find its end.
    5 Escape sequences and universal-character-names in string literals have the same meaning as in character literals
    (2.13.2), except that the single quote ’ is representable either by itself or by the escape sequence \’,
    and the double quote " shall be preceded by a \. In a narrow string literal, a universal-character-name may
    map to more than one char element due to multibyte encoding. The size of a wide string literal is the total
    number of escape sequences, universal-character-names, and other characters, plus one for the terminating
    L’\0’. The size of a narrow string literal is the total number of escape sequences and other characters,
    plus at least one for the multibyte encoding of each universal-character-name, plus one for the terminating
    ’\0’.

    2.13.1 Integer literals [lex.icon]
    integer-literal:
    decimal-literal integer-suffix(opt)
    octal-literal integer-suffix(opt)
    hexadecimal-literal integer-suffix(opt)
    decimal-literal:
    nonzero-digit
    decimal-literal digit
    octal-literal:
    0
    octal-literal octal-digit
    hexadecimal-literal:
    0x hexadecimal-digit
    0X hexadecimal-digit
    hexadecimal-literal hexadecimal-digit
    nonzero-digit: one of
    1 2 3 4 5 6 7 8 9
    octal-digit: one of
    0 1 2 3 4 5 6 7
    [Footnote 21: The term “literal” generally designates, in this International Standard, those tokens that are called “constants” in ISO C.]
    hexadecimal-digit: one of
    0 1 2 3 4 5 6 7 8 9
    a b c d e f
    A B C D E F
    integer-suffix:
    unsigned-suffix long-suffix(opt)
    long-suffix unsigned-suffix(opt)
    unsigned-suffix: one of
    u U
    long-suffix: one of
    l L
    1 An integer literal is a sequence of digits that has no period or exponent part. An integer literal may have a
    prefix that specifies its base and a suffix that specifies its type. The lexically first digit of the sequence of
    digits is the most significant. A decimal integer literal (base ten) begins with a digit other than 0 and consists
    of a sequence of decimal digits. An octal integer literal (base eight) begins with the digit 0 and consists
    of a sequence of octal digits. A hexadecimal integer literal (base sixteen) begins with 0x or 0X and
    consists of a sequence of hexadecimal digits, which include the decimal digits and the letters a through f
    and A through F with decimal values ten through fifteen. [Example: the number twelve can be written 12,
    014, or 0XC. ]
    2 The type of an integer literal depends on its form, value, and suffix. If it is decimal and has no suffix, it has
    the first of these types in which its value can be represented: int, long int; if the value cannot be represented
    as a long int, the behavior is undefined. If it is octal or hexadecimal and has no suffix, it has the
    first of these types in which its value can be represented: int, unsigned int, long int, unsigned
    long int. If it is suffixed by u or U, its type is the first of these types in which its value can be represented:
    unsigned int, unsigned long int. If it is suffixed by l or L, its type is the first of these
    types in which its value can be represented: long int, unsigned long int. If it is suffixed by ul,
    lu, uL, Lu, Ul, lU, UL, or LU, its type is unsigned long int.
    3 A program is ill-formed if one of its translation units contains an integer literal that cannot be represented
    by any of the allowed types.
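    To tie the quoted wording back to code, here is a small sketch that checks a few of those statements at compile time. It assumes a C++11 or later compiler (for static_assert and constexpr), and literal_length is a hypothetical helper, not something from the standard library.

```cpp
#include <cstddef>

// sizeof counts the terminating '\0' that is appended in translation phase 7.
static_assert(sizeof("Hello") == 6, "5 characters plus the terminator");

// Adjacent literals are concatenated first, then a single '\0' is appended.
static_assert(sizeof("Hello" " World") == 12, "11 characters plus the terminator");

// The standard's own example: "\xA" "B" stays two characters, not '\xAB'.
static_assert(sizeof("\xA" "B") == 3, "two characters plus the terminator");

// Twelve written in decimal, octal and hexadecimal.
static_assert(12 == 014 && 014 == 0XC, "same value, different bases");

// Hypothetical helper: number of characters in a literal, excluding the '\0'.
// It only accepts arrays of const char, which is what a string literal is.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) {
    return N - 1;
}
```

    For example, literal_length("Hello") evaluates to 5, one less than sizeof("Hello").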

  3. #3
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    Em, I'm still lost, haha,
    especially by the standard stuff.
    I do understand some of what it is saying,
    but I have never heard of any L string or L"..." or whatever it is!

    So this is wrong
    "Hello" + "World"
    because I am adding two const strings?

  4. #4
    Nasal Demon Xupicor's Avatar
    Join Date
    Sep 2010
    Location
    Poland
    Posts
    179
    It's because you are trying to add a "const char[]" to another "const char[]" (C-strings). Isn't that what the error is saying?
    Just think about it: if you could do that, why would anybody bother writing the strcat() function?
    You can think of it as adding two const pointers to char: you don't get a new combined C-string as a result, and adding two pointers together isn't a meaningful operation (the language doesn't allow it).

    And yeah, the standard can confuse beginners (and experienced programmers alike). Maybe I shouldn't have posted that... Still, you learned something new, so maybe it wasn't that bad after all.

    This thread may do a better job of showing what I mean:
    Why can't you add two string literals? - C++ Forums

  5. #5
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    Oh, so basically the strings are converted to pointers (I assume pointers to the first character) and then added together, which doesn't work...

    I wonder how the overloaded '+' handles concatenating a pointer with a string object?

    And could you please help with the printf question?
    Code:
            printf("%p" , &"Hello") ;
    It prints an address. Why does the string "Hello" need to be stored in memory at all?
    Ty!

  6. #6
    Nasal Demon Xupicor's Avatar
    Join Date
    Sep 2010
    Location
    Poland
    Posts
    179
    Quote Originally Posted by Eman View Post
    I wonder how the overloaded '+' handle concatenating pointers to a string object?
    It's probably something like this:
    Code:
    // Convert the C-string to a std::string, then reuse string + string.
    std::string operator+(const std::string& str, const char* cstr) {
        std::string tmp(cstr);
        return str + tmp;
    }
    std::string operator+(const char* cstr, const std::string& str) {
        std::string tmp(cstr);
        return tmp + str;
    }
    Quote Originally Posted by Eman View Post
    and could you please help with the print question?
    Code:
            printf("%p" , &"Hello") ;
    it prints an address, why is the string "Hello" bothered to be stored in memory at all?
    Ty!
    Hm, and where do you think the string literal should be stored?
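    As a rough illustration of that point, and of why &1 from the first post is rejected: per the paragraph quoted earlier, a string literal is an array with static storage duration, so it is an lvalue with an address, whereas an integer literal is not an lvalue. The function names below are hypothetical.

```cpp
#include <cstdio>

// A string literal is an array object with static storage duration, so its
// address can be taken. The result has type `const char (*)[6]`; %p expects
// a pointer to void, hence the conversion.
inline const void* address_of_hello() {
    return static_cast<const void*>(&"Hello");
}

// An integer literal such as 1 is not an lvalue, so `&1` does not compile.
// Giving the value a name creates an object whose address can be taken.
inline const void* address_of_one() {
    static const int one = 1;
    return static_cast<const void*>(&one);
}

// Prints both addresses; the actual values differ from run to run and
// from system to system.
inline void print_addresses() {
    std::printf("%p\n", address_of_hello());
    std::printf("%p\n", address_of_one());
}
```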

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Eman
    A different class which does not do that? I don't understand..there will be cases where the string class would not copy the string? and why should it point?? it is an object not a pointer to a literal... ?
    What kmdv is saying is that you could write another string class that does not perform such copying. Such a custom string object could contain a pointer that merely pointed to the string literal's first character. I do not recommend that you pursue this avenue of thought until you are far more skilled in understanding the basics of the std::string class and how to use it.

    Quote Originally Posted by Eman
    So I am just wondering "You should not modify a string literal..."
    What does that statement mean?
    It means that you should not attempt to assign to any of the characters in a string literal, including the null terminator.
    Last edited by laserlight; 12-27-2010 at 12:41 PM.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  8. #8
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    Oh right... haha, I doubt I could even do what kmdv is thinking of, even if I wanted to :P


    I tried this:
    Code:
           string str ="Hello" ; 
            printf("%s", str) ;
    It compiled, I guess because I am using a C++ compiler, so it didn't give any errors.
    But what it printed was utter nonsense.
    I had to use
    str.c_str(), which converts str to a C string? I didn't know printf only accepts pointers to strings...
    Last edited by Eman; 12-27-2010 at 12:47 PM.

  9. #9
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Eman
    I had to use
    str.c_str()...which converts str to a c string? I didn't know printfs receive only pointers to strings....
    The %s format specifier expects a corresponding argument that is a pointer to the first character of a null terminated string.
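
    A minimal sketch of that rule in practice, assuming a C++11 compiler for std::snprintf; format_greeting is a made-up helper used only to keep the example testable.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats a greeting into `buffer` using a C-style format string.
// %s needs a pointer to the first character of a null-terminated string,
// which std::string::c_str() provides. Passing `name` itself would be
// undefined behaviour, because a std::string object is not such a pointer.
inline int format_greeting(char* buffer, std::size_t size, const std::string& name) {
    return std::snprintf(buffer, size, "Hello, %s!", name.c_str());
}
```

    The return value is the number of characters that would be written, not counting the terminating null character.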

    Quote Originally Posted by Eman
    haha i am gobsmacked, if it is illegal why not give a seg fault? lol
    Because it results in undefined behaviour. One characteristic of undefined behaviour is that the code may appear to work perfectly fine until the time when you really need it to work (e.g., when demonstrating the program to the customer in front of the boss), upon which it inevitably fails, whether by crashing or by producing incorrect results.

  10. #10
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    The C++ standard states that attempting to modify a string literal results in undefined behaviour. Attempting to modify a string literal means attempting to assign to any of its characters, including the null character. What else do you not understand?

    Trying to come up with code that demonstrates why this rule exists has some merit, but the problem is that this is implementation dependent. The main point is that the contents of a string literal may be stored in a location that is read-only. kmdv's most recent point is that as an optimisation, repeated instances of the same constant may be reduced to a single instance.
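
    The sharing of identical constants can be observed, though not relied upon, with a sketch like the following. The function names are hypothetical, and the first function may legitimately return either true or false depending on the compiler and its settings.

```cpp
#include <cstring>

// Whether these two literals occupy the same storage is up to the
// implementation, so this may return true on one compiler and false on another.
inline bool literals_share_storage() {
    const char* first = "constant";
    const char* second = "constant";
    return first == second;
}

// Comparing the characters, by contrast, always gives the same answer.
inline bool literals_have_same_text() {
    return std::strcmp("constant", "constant") == 0;
}
```

    If the two literals do share storage, a write through one pointer would also change what the other pointer sees, which is one reason an implementation is allowed to assume literals never change.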

  11. #11
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    Quote Originally Posted by laserlight View Post
    The C++ standard states that attempting to modify a string literal results in undefined behaviour. Attempting to modify a string literal means attempting to assign to any of its characters, including the null character. What else do you not understand?
    Yes, OK, I understand: writing new values over existing data in a string is string modification. But why should we not modify it? It should be safe as long as we do not go out of bounds.

    Even if I had a pointer to a string like this
    Code:
       char *s = "String" ;

          while(*s!='\0')
          {
              *s = 'H' ;
               s++ ;
          }
    that should rewrite the string contents to all 'H'...
    It didn't go out of bounds or access out-of-bounds memory.
    The rule "You should not modify a string" seems too restrictive then. I mean, the programmer should know he/she shouldn't go out of bounds or something...

  12. #12
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Eman
    But I don't think that is the outcome you want, because you want to prove to me that a write to a string literal will not always give me a seg fault..
    Just accept that as true.

    Quote Originally Posted by Eman
    if Borland doesnt work should I try Dev C++ then? the last of the compiler
    Given enough time and motivation, I can implement a C++ compiler to do precisely that ("a write to a string literal will not always give me a seg fault" (or more generally, crash the program)), and yet conform to the C++ standard. But that would be pointless.

    Quote Originally Posted by Eman
    So a pointer to a string is a literal. But an array of characters or a string object is not a literal.
    No. A pointer to a string is not a string literal. An array of characters might be a string literal since all string literals are arrays of characters. A std::string object is not a string literal.

    Quote Originally Posted by Eman
    But why should we not modify it? It should be safe as long as the string does not go out of bounds.
    It is not safe because an implementation may place the contents of the string literal in memory that is read-only, or do various other things that assume that string literals are immutable. In other words, any write to a string literal is out of bounds, regardless of where you are writing.
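
    For contrast, a sketch of the same loop written so that it is well defined: the literal is used only to initialise an array, and the loop then modifies the array's own copy of the characters. The function name overwrite_with_h is made up for this example.

```cpp
#include <string>

// `text` is an array initialised from the literal: the characters are copied
// into storage that belongs to this function, so writing to them is fine.
inline std::string overwrite_with_h() {
    char text[] = "String";
    for (char* p = text; *p != '\0'; ++p) {
        *p = 'H';
    }
    return std::string(text);
}
```

    The only change from the pointer version is `char text[]` instead of `char *s`, which is what moves the characters out of the literal and into modifiable memory.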
    Last edited by laserlight; 12-27-2010 at 01:17 PM.

  13. #13
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    Quote Originally Posted by laserlight View Post
    No. A pointer to a string is not a string literal. An array of characters might be a string literal since all string literals are arrays of characters.
    If all string literals are arrays of characters, then
    why is str:

    Code:
         string str = "Hello" ;
    not a string literal? It stores an array of characters after all. And yet I know str is an object of type string.

    How do I tell the two apart? What is a string literal?

  14. #14
    Registered User
    Join Date
    Aug 2010
    Location
    Poland
    Posts
    733
    Ok, I think you should get back to the first reply and read everything again. It has been explained two if not more times here.

    Character Sequences
    How to char* = "string literal" appropri... - C++ Forums

  15. #15
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Eman View Post
    if all string literals are an array of characters then..
    why is Str:

    Code:
         string str = "Hello" ;
    not a string literal? It stores an array of characters after all. And yet I know str is an object of type String..

    How do I differentiate both, what a string literal is?
    Because "a string literal is a sequence of characters surrounded by double quotes". More generally, you are committing the converse error: if x is a string literal, then x is an array of characters, but if x is an array of characters, it does not follow that x is a string literal. Actually, worse still: you are not even claiming that every array of characters is a string literal. You are claiming that every object that stores an array of characters is a string literal.
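
    One way to see the distinction in code is sketched below, assuming a C++11 or later compiler for decltype and static_assert. The function names are hypothetical; in each case "Hello" is the only string literal, and everything else is an object initialised from it.

```cpp
#include <string>
#include <type_traits>

// The literal itself: an lvalue of type "array of 6 const char".
static_assert(std::is_same<decltype("Hello"), const char (&)[6]>::value,
              "a string literal is a const char array");

// A pointer initialised from the literal: it points at the literal,
// but the pointer itself is not a literal.
inline const char* pointer_to_literal() {
    return "Hello";
}

// An array initialised from the literal: a separate, modifiable copy.
inline char first_of_array_copy() {
    char copy[] = "Hello";
    copy[0] = 'J';  // fine, `copy` is not a string literal
    return copy[0];
}

// A std::string initialised from the literal: an object that owns its own copy.
inline std::string string_copy() {
    std::string str = "Hello";
    str[0] = 'J';   // also fine, this changes the copy held by `str`
    return str;
}
```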
