Thread: string literals vs c-strings

  1. #1
    Registered User axon's Avatar
    Join Date
    Feb 2003
    Posts
    2,572

    string literals vs c-strings

    hello all,

    I'm in the final four weeks of my first c++ class, and as of yet we have not touched string literals. We have, about a month ago, gone over cstrings. I picked up a book that was recommended in this forum, "Accelerated C++" and they use string literals right from the start of the book.

    Can someone tell me when and where one uses a string literal and where a cstring? Does a string literal end with a NULL? Which type is preferred, and can they be used interchangeably?

    thanks,

    axon

    some entropy with that sink? entropysink.com

    there are two cardinal sins from which all others spring: Impatience and Laziness. - franz kafka

  2. #2
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    A cstring (C string) is simply a char array that contains an element with value \0.

    This creates an array, size 7, and puts the name hammer into it:
    >>char name[7] = "hammer";

    The name hammer is obviously 6 characters long; the seventh byte is used to hold the \0. When you initialize an array this way, the compiler adds the \0 for you, so you don't need to do it yourself.

    You can also do this, which creates the variable with no default content, then copies in the name at a later date:

    >>char name[7];
    >>strcpy(name, "hammer");

    Just remember to ensure there is enough room in the array to hold what you're putting in it.

    A string literal is any piece of text in your code that is contained within quotes, like this: "hammer". It actually turns out to be a char array containing data, terminated with \0. This is like the example above, but with subtle differences:

    - You cannot (safely) modify a string literal at run time.
    - String literals have a static storage class.

    You could use one like this:

    >>const char *p = "hammer";

    Here we have created a pointer to a string literal.

    There are also C++ strings, accessed via header <string>. These are created like so:

    >>std::string s;

    C++ strings are much more versatile and safer than C strings for general text storage.

    There's a couple of samples in the FAQ, for example this one shows a C string and C++ string in use.
    When all else fails, read the instructions.
    If you're posting code, use code tags: [code] /* insert code here */ [/code]

  3. #3
    Registered User
    Join Date
    Apr 2003
    Posts
    2,663
    You've got some terminology problems. There are:

    1)string literals: "Hello world."

    2)cstyle strings which end in \0:

    char text[]="Hello World.";

    The array will be sized to accommodate a \0, which is automatically added to the end of the string literal.

    3)string types: you need to include <string>

  4. #4
    jasondoucette.com JasonD's Avatar
    Join Date
    Mar 2003
    Posts
    278
    String literals are never explained very well. You have to think of a string literal as a place in memory in which the string (including the NULL character) is stored that cannot be changed. A similar type of memory is allocated (and initialized at the start of the program) for each string literal that you use.

    Now, when you use the string literal in your program, the compiler thinks of it as a pointer to that string literal. Take a look at this thread, which describes one person's problem with using string literals and gives a decent explanation of what went wrong - you should learn enough there to fully understand what is going on:
    char* types and cstrings thread on devshed

  5. #5
    Registered User
    Join Date
    Apr 2003
    Posts
    2,663
    "String literals are never explained very well. You have to think of a string literal as a place in memory in which the string (including the NULL character) is stored that cannot be changed."

    Thanks JasonD. I didn't know a string literal was also stored with a \0.

  6. #6
    jasondoucette.com JasonD's Avatar
    Join Date
    Mar 2003
    Posts
    278
    7stud, your program example shows off what string literals are fairly well, just by analyzing the code.

    To think about it further, consider this:
    Code:
    char *s = "Hello World!";
    cout << s;
    cout << "Hello World!";
    As far as the compiler is concerned, you just passed the SAME pointer to the null-terminated string "Hello World!" both times. In fact, the first statement initializes s to equal the pointer to wherever this string is stored - it does NOT allocate any memory for it. The memory for string literals is allocated and initialized at the start of the program, before any of your code is run. So, let's say that p points to the string literal "Hello World!", then:
    Code:
    char *s = "Hello World!";
    char *s = p;
    The above two lines produce the exact same results.

  7. #7
    Registered User
    Join Date
    Apr 2003
    Posts
    2,663
    Thanks for the great explanation.

  8. #8
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    >>You've got some terminology problems.
    Was that aimed at my post, or axon's?

    Originally posted by 7stud
    Thanks JasonD. I didn't know a string literal was also stored with a \0.


    Originally posted by Hammer
    A string literal is any piece of text in your code that is contained within quotes, like this: "hammer". It actually turns out to be a char array containing data, terminated with \0. This is like the example above, but with subtle differences
    Originally posted by JasonD
    char *s = "Hello World!";
    char *s = p;

    The above two lines produce the exact same results.
    They don't necessarily produce the same results. The combining of string literals that are the same into one string literal is implementation defined, so it is possible that in your first line of code, s isn't assigned the same memory address as p. A minor point, but one to keep in mind.

  9. #9
    Registered User
    Join Date
    Apr 2003
    Posts
    2,663
    "Was that aimed at my post"

    No, but I'll fight you anyway.
    So, let's say that p points to the string literal "Hello World!", then:

    char *s = "Hello World!";
    char *s = p;

    The above two lines produce the exact same results.

    "They don't necessarily produce the same results. The combining of string literals that are the same into one string literal is implementation defined,"
    I don't understand that because I don't see how string literals are being combined.

  10. #10
    jasondoucette.com JasonD's Avatar
    Join Date
    Mar 2003
    Posts
    278
    Originally posted by Hammer
    They don't necessarily produce the same results. The combining of string literals that are the same into one string literal is implementation defined, so it is possible that in your first line of code, s isn't assigned the same memory address as p. A minor point, but one to keep in mind.
    I see... so it could, in fact, be the same thing as having two different string literals, each stored in their own location in memory (even though they are the same string)? I guess a really stupid compiler could do that - thanks for the input. I guess you should never assume that the pointers would be the same. All I was really attempting to explain was the fact that when you refer to a string literal, you could just replace that with a pointer to where it is stored every time, since this is all the compiler does. Many people never grasp this concept because it is never taught.

  11. #11
    jasondoucette.com JasonD's Avatar
    Join Date
    Mar 2003
    Posts
    278
    Originally posted by 7stud
    "Was that aimed at my post"

    No, but I'll fight you anyway.

    I don't understand that because I don't see how string literals are being combined.
    I didn't understand him at first, either. But he means this:
    Code:
    char *p1 = "Hello World!";
    char *p2 = "Hello World!";
    Some compilers do not check whether it is the SAME string, and therefore store the string twice in memory (each copy with a \0 at the end), so p1 != p2. MOST compilers are smart, and will store it only once, and therefore p1 == p2 (which is exactly why string literals are const, since if you change it, you may be changing the string that some other char* variable points to in some other place in the code. This is why you should get a run-time error if you attempt to dereference p1 or p2, and change any characters.)

  12. #12
    Registered User
    Join Date
    Jan 2003
    Posts
    311
    Originally posted by JasonD
    which is exactly why string literals are const, since if you change it, you may be changing the string that some other char* variable points to in some other place in the code. This is why you should get a run-time error if you attempt to dereference p1 or p2, and change any characters.
    I think you mean "if there were any justice in the world, this is how they would work". You also have been a good boy and not tried to mess with the literals, and so have not noticed the gawdawful truth. Literals pre-date const by many years; it would simply have broken far too much code to make them const as god intended.
    strcat("Big"," Trouble");
    will be happily accepted by any compiler on the planet, though if you listen carefully you can hear the compiler chuckle quietly to itself.

    argv of int main(int argc, char *argv[]) fame also lives in a very nasty neighborhood, even though we get no warning except perhaps a footnote to an addendum to the manual discussing how to count Bolivian pennies.

    Note also that literals are allowed to live in portions of memory that are marked read-only, such as code pages. The problem with modifying them goes beyond changing someone else's string. You will most likely GPF/seg fault; if you don't, you can unexpectedly self-modify code, which is very nasty. Another popular location is wherever vtables are stored.

  13. #13
    jasondoucette.com JasonD's Avatar
    Join Date
    Mar 2003
    Posts
    278
    Originally posted by grib
    You also have been a good boy and not tried to mess with the literals, and so have not noticed the gawdawful truth. Literals pre-date const by many years; it would simply have broken far too much code to make them const as god intended.
    strcat("Big"," Trouble");
    will be happily accepted by any compiler on the planet
    Actually, I have not been a 'good boy'. I understand that literals pre-date const, which is why compilers provide backwards compatibility to allow string literals (which should be const char*) to be passed into functions that require just char* (i.e. no const). This is why I stated you would get a run-time error if you attempted to modify a string literal (and not a compile-time error). Experimentation helps understanding a lot - never just assume what you read is correct.

    However, I did assume that the string literals would always be stored in a read-only portion of memory (on systems that allow such things), which implies that you will always get a run-time error when attempting to modify them. You seem to say that this is not the case, but perhaps you meant this only for systems that do not provide read-only memory. Please confirm.

  14. #14
    Registered User
    Join Date
    Jan 2003
    Posts
    311
    Originally posted by JasonD

    However, I did assume that the string literals would always be stored in a read-only portion of memory (on systems that allow such things), which implies that you will always get a run-time error when attempting to modify them. You seem to say that this is not the case, but perhaps you meant this only for systems that do not provide read-only memory. Please confirm.
    I don't have the standard here with me, but I am fairly certain that this is an implementation-defined thing, just like folding duplicates. Given that it's easier to put the strings in read-only memory, and that it makes things faster, I would expect this, but I have seen compilers do some crazy things. Turbo C++ in the old days used to have a clever hack in its math lib, even for protected-mode programs. In those coal-powered days many people had 386s with no math co-processor. Rather than write each operation twice and go through an indirect pointer, every math operation modified itself to the preferred version. This meant that you may have had a writeable code page depending on whether or not you called sqrt.

    I am a big fan of experimentation, but getting a GPF is a lot like having your airbag go off. I am fairly certain that strcat("a","b") invokes undefined behavior, most likely an illegal attempted write to read-only memory, but this could also mean silently writing a '\0' over the first byte of foo's vtable, or causing the famous demons to fly out of your nose.

    However, literals living in read-only memory is probably a fairly safe assumption.
