Thread: Assigning a pointer to a string literal - undefined behavior?

  1. #1
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658

    Assigning a pointer to a string literal - undefined behavior?

    In another forum, this example code fragment was stated as being an example of undefined behavior. My understanding is that a literal string exists from program start to program termination, so I don't see the issue, even though the literal string is probably in a different part of memory.

    Code:
    /* ... */
        const char *pstr = "example";
    /* or even */
        char *pstr = "example";
    /* as long as no attempt is made to modify the data pointed to by pstr, */
    /* unless pstr is later changed to point to a stack or heap based string */
    Last edited by rcgldr; 09-13-2013 at 04:59 PM.

  2. #2
    Programming Wraith GReaper's Avatar
    Join Date
    Apr 2009
    Location
    Greece
    Posts
    2,738
    It's undefined because a string literal is "const char*", and may be stored in read-only memory.
    You see:
    Code:
    char *pstr = "example";
    is different to:
    Code:
    char pstr[] = "example";
    The latter copies the string "example" into a new, modifiable array "pstr", while the former is merely a pointer to the original string literal.
    Devoted my life to programming...

  3. #3
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    The code you've shown doesn't have undefined behaviour (although the definition of pstr on line 4 is an anachronism).

    Any code that attempts to modify the data pointed to by pstr (assuming the definition on line 4) such as
    Code:
       pstr[0] = '1';
    will result in undefined behaviour for the reasons that GReaper mentioned.

    The definition on line 2 is fine. Any attempt to modify the data pointed to by that version of pstr will result in a compilation error. Which is actually what you want if you try to modify that data.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  4. #4
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    In the future, please point/link us to whatever external references you mention (i.e. the other forum).
    EDIT: Upon first reading, I thought I would need more context. Turns out I didn't, but more info/context rarely hurts.

    Technically, those are not assignments, but initializations (since you declare the variable and give it an initial value in one line). On their own, neither of those statements results in undefined behavior.

    This should give you a good idea: Question 1.32 of the comp.lang.c FAQ
    Last edited by anduril462; 09-13-2013 at 05:29 PM.

  5. #5
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Quote: Originally posted by anduril462
    In the future, please point/link us to whatever external references you mention (i.e. the other forum).
    Here is a link to the specific post within a thread:

    C: "warning assignment makes integer from pointer without a cast"

  6. #6
    Stoned Witch Barney McGrew's Avatar
    Join Date
    Oct 2012
    Location
    astaylea
    Posts
    420
    Quote: Originally posted by GReaper
    It's undefined because a string literal is "const char*"
    The string literal "example" is a 'char[8]', not a 'const char *'.

  7. #7
    Programming Wraith GReaper's Avatar
    Join Date
    Apr 2009
    Location
    Greece
    Posts
    2,738
    Quote: Originally posted by Barney McGrew
    The string literal "example" is a 'char[8]', not a 'const char *'.
    It really makes you wonder why they didn't define it as const and be done with it...

  8. #8
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote: Originally posted by GReaper
    It really makes you wonder why they didn't define it as const and be done with it...
    The reason is historical. Early versions of C did not support the concept of "const" (or the const keyword for that matter). Then, because of lobbying by lazy programmers, a rule was added which made the construct
    Code:
        char *p = "Hello";
         /*   pointer operations on p */
    valid as an alternative to
    Code:
        char pstr[] = "Hello";
        char *p = pstr;
         /*   pointer operations on p which are invalid on pstr */
    When the const keyword was eventually introduced (formally in the 1989 C standard, but there were some earlier compilers that supported it as an extension) there was enough legacy code reliant on this feature that removing it was not considered an option.

    The standardisation committee can only add or remove features like this based on a vote. The removal of this feature has yet to come up as enough of a priority to be voted on (let alone have enough people involved willing to vote it out, since a LOT of clients of members on the committee have large code bases relying on this feature, and will apply pressure to keep it).


    Such is the politics of standardisation. Bear in mind the old saying "A camel is a horse designed by committee".

  9. #9
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    From the C89 standard, section 3.1.4:

    A character string literal has static storage duration and type "array of char", and is initialized with the given characters. A wide string literal has static storage duration and type "array of wchar_t", and is initialized with the wide characters corresponding to the given multibyte characters. ...

    Identical string literals of either form need not be distinct. If the program attempts to modify a string literal of either form, the behavior is undefined.


    So it would seem that only an attempt to modify a string literal is undefined, not the use of a pointer to read one. The type "array of char" or "array of wchar_t" would be the same regardless of where the string literal was stored (ROM, RAM, code section of a program, ... ).

  10. #10
    Registered User
    Join Date
    Mar 2010
    Posts
    583
    I agree with your conclusion rcgldr.

    There's an even clearer bit in the C11 draft standard (might well appear in earlier versions too):

    6.7.9 Initialization (N1570, Committee Draft, April 12, 2011)
    ...
    32 EXAMPLE 8 The declaration
    Code:
    char s[] = "abc", t[3] = "abc";
    defines "plain" char array objects s and t whose elements are initialized with character string literals.
    This declaration is identical to
    Code:
    char s[] = { 'a', 'b', 'c', '\0' },
    t[] = { 'a', 'b', 'c' };
    The contents of the arrays are modifiable. On the other hand, the declaration
    Code:
    char *p = "abc";
    defines p with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
    There's nothing illegal about either statement in your original post. The second is bad practice because a compiler might let you try to modify the literal through it, whereas it should stop you if you use const char *. It's fairly common to see plain char * even in new code, for a number of possible reasons:
    • Perhaps because people get confused between const * char s (constant pointer, modifiable chars) and const char *c (modifiable pointer, constant chars) and decide not to confuse themselves.
    • More likely because people really learn a language by using it, and you're never forced to use const out of necessity.
    • Whenever const is taught, it's nearly invariably followed by explaining that you can modify a const through a non-const pointer. Which leaves the programmer thinking "const isn't very useful".
    • Laziness.
    • Using char* to create modifiable strings on a compiler where it's explicitly legal or just happens to work (e.g. see the z/VM V6R2.0 Information Center (April 2012), or old gcc with -fwritable-strings, an option removed in gcc 4). This is a Bad Thing: why not just use a legal method instead?


    Be careful about assuming that an unmodifiable constant is "in ROM" in the sense of a physical type of memory. It might be. More likely under an OS, it's in RAM but marked as not writable, and the OS will kill the program if it tries to write there (e.g. with a segfault). Or it could be in modifiable RAM, but merged with another string. E.g. "there" could point to halfway through "hello there", so if you modified one you'd modify the other. It could permanently corrupt the data section of the executable.
    Last edited by smokeyangel; 09-14-2013 at 02:55 PM.

  11. #11
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    Quote: Originally posted by smokeyangel
    • perhaps because people get confused between const * char s (constant pointer, modifiable chars) and const char *c (modifiable pointer, constant chars) and decide not to confuse themselves.
    You mean between char * const s and const char *c (or, equivalently, char const *c).
    The cost of software maintenance increases with the square of the programmer's creativity. - Robert D. Bliss

  12. #12
    Registered User
    Join Date
    Mar 2010
    Posts
    583
    Indeed I do, thanks.
