Thread: When to use char* vs. char[]

  1. #1
    Registered User catacombs's Avatar
    Join Date
    May 2019
    Location
    /home/
    Posts
    81

    When to use char* vs. char[]

    Hello!

    It's been more than a year since I started learning C, and I can certainly say
    I'm more comfortable with the language than before.

    One thing I've been thinking about recently is when to use char* vs. char[].

    Someone gave me a hint recently that if I ever have a string I know I'll be
    using -- say, char name[] = "John Doe" -- to use char[].

    But I'm still not sure when to use char*. Usually, I use it if I plan to
    allocate memory and dump a character array into it later:

    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    int main() {
    
        char* name;
        char john[] = "John Doe";
    
        name = malloc(100 * sizeof *name);  /* room for 100 chars, not 100 pointers */
        if (name == NULL) {
            return 1;
        }
    
        strcpy(name, john);
    
        printf("Hello, %s\n", name);
    
        free(name);
    
        return 0;
    }
    Is this the right thinking? Are there other useful patterns?

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    const char * => when you want to store a string literal (in which case you might also declare it static if it is local), or when you have a string parameter whose contents should not be modified, or when you want to iterate over such a string using pointers

    char * => when you want to dynamically allocate memory to store a string, or when you have a string parameter whose contents might be modified, or when you want to iterate over such a string using pointers

    char[N] => when you want a fixed size (or variable length) array to store a string such that you might modify the string's contents

    You might use a const char[N] when you want an array to store a string such that you won't modify its contents, but then you probably should consider if a const char * const will do, or if you really do want an array, e.g., because it is convenient for initialising with a string literal and then using sizeof to compute the string length at compile time.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User catacombs's Avatar
    Join Date
    May 2019
    Location
    /home/
    Posts
    81
    Thanks, laserlight, as always. A follow-up question:

    char * => when you want to dynamically allocate memory to store a
    string, or when you have a string parameter whose contents might be modified, or
    when you want to iterate over such a string using pointers
    What do you mean by string parameter whose contents might be modified? Are you
    talking about passing a character array to a function?

    Code:
    void trim_string(char*);

  4. #4
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,787
    Quote Originally Posted by catacombs View Post
    Thanks, laserlight, as always. A follow-up question:



    What do you mean by string parameter whose contents might be modified? Are you
    talking about passing a character array to a function?

    Code:
    void trim_string(char*);
    Forget that

    Code:
    char *str = "Hello"; /* example 1 */ /*should really be const char *str because modifying the values it points to is naughty */
    is very different to
    Code:
    char str[] = "Hello"; /* example 2 */
    In example 1, str is a pointer to a string literal; i.e. you should not modify the characters that str points to. In example 1, if you later did str[0] = 'h'; or even *str = 'h'; then that is naughty: modifying a string literal is undefined behaviour.

    In example 2, str is an array that is initialised with the characters 'H', 'e', 'l', 'l', 'o', '\0'. I.e. "Hello" is a string literal that is used to initialise your array (str is an array, not a string literal). You can later do str[0] = 'h' if you want and not run into problems.

    In example 1 you cannot (well, should not) modify things. In example 2 you can.

    To clarify, the two examples have different types. Example 1 is a pointer to a string literal and example 2 is an array initialised using a string literal (basically). In example 1 the compiler can choose to put that string in "read only memory" and you should assume that it does. In example 2 the "memory" is read/write.
    Last edited by Hodor; 04-30-2020 at 07:58 AM.

  5. #5
    Registered User catacombs's Avatar
    Join Date
    May 2019
    Location
    /home/
    Posts
    81
    Thanks, Hodor.

    I get what you're saying about not modifying char*. But what if I don't know the string ahead of time and need to modify it?

    Would it be better to create a buffer -- char buf[2000]; -- strcpy the string literal into that and then modify?

  6. #6
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by catacombs
    What do you mean by string parameter whose contents might be modified? Are you
    talking about passing a character array to a function?
    That's what "parameter" implied, but the argument doesn't have to be an array of char as it could already be a pointer to char.

    Of course, the array or pointer doesn't necessarily have to be associated with a null terminated string, but it is easier to concern yourself with those first.

    Quote Originally Posted by catacombs
    I get what you're saying about not modifying char*.
    No, unless you are sloppy or dealing with a legacy interface that uses char* to point to string literals (e.g., we have the vestiges of that in the standard library), generally you should be able to modify what a char* points to.

    Hodor's example is good in that it is a warning about being sloppy with string literals or when you have to deal with such a legacy interface; but it is also bad because that's precisely why you would use const char* instead of char* for string literals: that way the compiler will warn you about trying to modify something that is const, which is what you want when you accidentally try to modify a string literal.

    It is a two-way street: when you're designing the interface for a function that will operate on a string, you should ask yourself if the function might modify the string. If so, you would use a char* for that parameter, and someone else calling your function, seeing that the parameter is a char*, will know not to pass a string literal as the corresponding argument. If you use a const char* for that parameter instead, then someone else calling your function will know that it is safe to pass a string literal as the argument; in theory you can cast away constness and modify it anyway, but that would be a bug on your part, not the caller's.

    Quote Originally Posted by catacombs
    But what if I don't know the string ahead of time and need to modify it?
    That is what "when you want to dynamically allocate memory to store a string" and "when you want a fixed size (or variable length) array to store a string such that you might modify the string's contents" are for.

    Quote Originally Posted by catacombs
    Would it be better to create a buffer -- char buf[2000]; -- strcpy the string literal into that and then modify?
    If you know that you want to initialise the buffer with a particular initial string value, then you might as well just initialise it with the string literal rather than strcpy.
    Last edited by laserlight; 04-30-2020 at 03:06 PM.

  7. #7
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,787
    Quote Originally Posted by laserlight View Post
    Hodor's example is good in that it is a warning about being sloppy with string literals or when you have to deal with such a legacy interface; but it is also bad because that's precisely why you would use const char* instead of char* for string literals: that way the compiler will warn you about trying to modify something that is const, which is what you want when you accidentally try to modify a string literal.
    Thanks for pointing this out. I was of course being deliberately sloppy to make my point but I should have made that clearer in my post. I rarely see somebody write char *str = "Hello"; (i.e. without the const qualifier) these days... the last time I recall seeing it without the const qualifier was in some Amiga code before C was even standardised.

    To the OP, it occurs to me that there is another thing to be aware of with string literals. As far as I can recall, if you have code like:

    Code:
    void foo(void) { 
        const char *p1 = "Hello";
        /* ... */
    }
    
    void bar(void) { 
        const char *p2 = "Hello";
        /* ... */
    }
    p1 and p2 might point to the same "memory location" (I'm pretty sure it's up to the compiler to decide). Because they're const qualified in the example above, as they should be, it doesn't matter though. Edit: what I mean to say is that if you omit the const qualification and then in foo() or bar() modify the value, say, p2[0] then you're going to end up in an awful mess if p1 and p2 point to the same location (and modifying p2[0] doesn't cause your program to crash of course); i.e. don't do it and always make sure you const qualify pointers to string literals.
    Last edited by Hodor; 04-30-2020 at 09:50 PM.

  8. #8
    Registered User catacombs's Avatar
    Join Date
    May 2019
    Location
    /home/
    Posts
    81
    Thanks again, Hodor and laserlight. Great explanations. I'll come back if I have other questions.

  9. #9
    Registered User
    Join Date
    Apr 2020
    Location
    Greater Philadelphia
    Posts
    26
    This is a small matter, but I am curious.


    In a program, main() has this line twice:


    printf("key %u not found\n", key);


    A naive compiler might store the format text twice, although it is the same in both cases. GCC is fairly sophisticated and might recognize
    that it needs to be stored only once. But in case it doesn't, one could
    declare


    char keynotfound[]="key %u not found\n";


    and use printf(keynotfound, key);


    This works. It also works with either or both of "static" and "const" in the declaration. Is there a difference? What would be best?


    The declaration:


    char *keynotfound="key %u not found\n";


    also works, but in this case haven't we allocated both the text and
    a pointer to the text? This is unnecessary unless we later want
    to change the pointer so that it points to a different text.

  10. #10
    Registered User
    Join Date
    Apr 2020
    Location
    Greater Philadelphia
    Posts
    26
    After looking at the assembler code, I seem to be mistaken about this.

    The first example, declaring an array, produces assembler code that does not contain the text as such anywhere. Instead, there is

    movabsq $7935471345745880427, %rax and a similar instruction
    a little later on.

    But with const char *keynotfound="key %u not found\n";

    we see that literal string:

    .section .rdata,"dr"
    .LC0:
    .ascii "key %u not found\12\0"
    .text

    and then:

    leaq .LC0(%rip), %rax

    I'm not at all familiar with this dialect of assembler, but can only gather
    that in the first example we have space that must be initialized
    at run time, just as laserlight explained.

  11. #11
    Registered User
    Join Date
    Dec 2017
    Posts
    1,633
    Quote Originally Posted by Alogon
    After looking at the assembler code
    It's storing the string data as "immediate" data.

    7935471345745880427 decimal is, in hex, with associated ascii-per-byte:

    Code:
    6E 20 75 25 20 79 65 6B
     n     u  %     y  e  k
    which is "key %u n", the first part of the string, backwards, due to being stored little endian.
    A little inaccuracy saves tons of explanation. - H.H. Munro

  12. #12
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,787
    Quote Originally Posted by john.c View Post
    It's storing the string data as "immediate" data.

    7935471345745880427 decimal is, in hex, with associated ascii-per-byte:

    Code:
    6E 20 75 25 20 79 65 6B
     n     u  %     y  e  k
    which is "key %u n", the first part of the string, backwards, due to being stored little endian.
    ^ this. I think it's also important to note that if you compile with optimisation (-O1, -O2 etc, including -Os) gcc doesn't store the string as immediate data. E.g. given:
    Code:
    #include <stdio.h>
    
    void foo(void)
    {
        const char *p1 = "key %u not found\n";
        //const char p1[] = "key %u not found\n";
        printf("%s (%p)\n", p1, p1);
    }
    
    void bar(void)
    {
        const char *p2 = "key %u not found\n";
        //const char p2[] = "key %u not found\n";
        printf("%s (%p)\n", p2, p2);
    }
    
    int main(void)
    {
        foo();
        bar();
        return 0;
    }
    Compiling this with any level of optimisation produces:
    Code:
        .text
        .section    .rodata.str1.1,"aMS",@progbits,1
    .LC0:
        .string    "key %u not found\n"
