Thread: Is char *argv[] a 2-dimensional array?

  1. #1
    Registered User
    Join Date
    Apr 2021
    Posts
    2


    A rookie here, so please bear with me...

    I often see that the 2nd argument of main() is a pointer of type char to an array of characters argv[]: char *argv[]. I know that arrays are, in fact, a sequential set of bytes in memory, where the name of an array always points to the base address: &arr[0]. But on the other hand, the char *argv[] notation is quite helpful making it possible to loop through each character of the given string: argv[i][j], meaning it's 2-D per se.

    So, my questions are:
    1- Since argv[] is an array, it already points to a certain address in memory, i.e. address of the first element of the array, as shown above. Why bother requesting another pointer to a memory location?
    2- char *argv[] == char **argv? If yes, why? Can you please point me (no pun intended) to a source where I can learn more about this particular "array notation"?

    Thank you!
    Last edited by VictorQuebec; 04-28-2021 at 03:51 PM.

  2. #2
    Registered User
    Join Date
    Apr 2021
    Posts
    25
    1. argv is actually an array of pointers to char (char *); as a parameter, the argv variable itself holds the address of the first of those pointers, which is why you're "requesting another pointer". Each of these pointers, in turn, points to the first character of one "argument". This is necessary because arguments can have any length: a true 2-D array would have to reserve the maximum possible length for every row, and we obviously don't want to pad memory with a bunch of '\0' characters for no reason.
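    2. Arrays and pointers are actually different in C, but almost every operation on an array causes array decay, i.e. its implicit conversion to a pointer to its element type that points to its first element (sizeof and unary & are the main exceptions). In a parameter list, char *argv[] is adjusted to char **argv, so the two declarations mean the same thing there. Google is your friend for this kind of question!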
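
    To make the decay rule concrete, here is a minimal sketch. The function names are made up for illustration. sizeof does not trigger decay, so it reports the whole array, while a function parameter only ever receives the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof applied to the array itself: no decay, so this is the full 80 bytes. */
size_t size_of_array(void)
{
    char buf[80];
    return sizeof buf;
}

/* The parameter is a plain pointer; the caller's array has already decayed. */
size_t size_seen_by_callee(char *p)
{
    assert(p != NULL);
    return sizeof p;    /* size of a pointer, not of the caller's array */
}

/* Passes an 80-byte array to the function above. */
size_t size_after_decay(void)
{
    char buf[80] = "";
    return size_seen_by_callee(buf);    /* buf decays to &buf[0] here */
}
```

    On a typical 64-bit system the first function returns 80 and the last one returns 8, even though both start from the same kind of array.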

  3. #3
    Registered User
    Join Date
    Apr 2021
    Posts
    139
    There are two different ways to "do" an array of strings. They can look the same, but they're not the same way down at the bottom.

    The first way is to declare a 2-d array of some fixed size buffers:
    Code:
         
    typedef char str80[80];   // type "str80" is an array of 80 char's
    
    typedef str80  str2d[10];  // type "str2d" is an array of 10 str80's
    If I declare a variable of type str2d, then the compiler will allocate a single block of memory with 10 * 80 = 800 bytes of storage in a single spot.

    Code:
    str2d   blob_of_text;
    There are no pointers in this single block. Instead, you can use the "pointer decay" rule of C: when an array is mentioned without an index, it "decays" into a pointer to the first element of the array.

    This is recursive: if an array is indexed, but the elements of the indexed array are actually arrays themselves, that is the same as mentioning-without-index the interior array.

    So if you say blob_of_text[3], that value will "decay" into a pointer to str80[0], which is to say a pointer-to-char.

    This enables you to do things like calling strcpy(blob_of_text[3], "foo") and having it work transparently. But you can also do 2-d array things, like access blob_of_text[2][10] and get a char value back.
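
    A rough sketch of the same idea, reusing the two typedefs from above. The helper names are invented for this example. It checks that the storage really is one 800-byte block with rows 80 bytes apart, and that a row can be handed to strcpy and then indexed as 2-d.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef char str80[80];
typedef str80 str2d[10];

/* Total storage: one contiguous block, no pointers inside it. */
size_t blob_size(void)
{
    return sizeof(str2d);               /* 10 * 80 bytes */
}

/* Distance in bytes between the starts of two neighbouring rows. */
size_t row_stride(void)
{
    static str2d blob;
    return (size_t)((char *)&blob[1] - (char *)&blob[0]);
}

/* Copies a word into row 3, then reads one character back by 2-d index. */
char third_row_char(const char *word, size_t j)
{
    static str2d blob;
    assert(word != NULL && strlen(word) < sizeof(str80) && j < sizeof(str80));
    strcpy(blob[3], word);              /* blob[3] decays to char * */
    return blob[3][j];
}
```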

    This is not how argv works.

    The second way to do lists of strings, is to create a 1-d array of pointers-to-char. That would look something like:
    Code:
    char *argv[10];
    Or dynamically, you could do:
    Code:
    char **argv = malloc(10 * sizeof (*argv));
    Now you don't have any actual characters. All you have are pointers to characters. So you will have to go find some characters someplace else. On 32-bit systems, pointers are usually 32 bits, or 4 bytes, so this would be a block of 4 * 10 = 40 bytes of memory. On 64-bit systems, pointers are usually 8 bytes, so it would be 80 bytes.

    Once you find them, you can set your pointers to point to them. This is how command shells (like /bin/bash) usually create the argv array they pass to the exec() system call:

    Code:
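        #include <ctype.h>      // isspace
        #include <stdlib.h>     // realloc
        #include <unistd.h>     // execvp
    
        #define EOS '\0'        // EOS = end-of-string
    
        // writable array, not a pointer to a literal: the loop below modifies it
        char input_buffer[] = "ls -al foo.c";   // get input from user
    
        size_t argc = 0;        // number of words found so far
        char **argv = NULL;
     
        char *p = input_buffer;
        while (*p != EOS) {
            while (*p != EOS && isspace((unsigned char)*p))
                ++p;
    
            if (*p == EOS) break;
    
            // one slot for this word plus one for the final NULL (error checks omitted)
            argv = realloc(argv, (argc + 2) * sizeof argv[0]);
            argv[argc++] = p;
    
            while (*p != EOS && !isspace((unsigned char)*p))
               ++p;
    
            if (*p != EOS)
                *p++ = EOS;  // replace space with EOS at end of word
        }
    
        if (argc > 0) {
            argv[argc] = NULL; // argv always ends with a NULL pointer
    
            // ... get ready ...
            execvp(argv[0], argv);
        }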
    Back in the day, operating systems like MSDOS and MacOS that didn't use virtual memory (or early Windows, I think) would just give you a bunch of pointers into the command processor's address space. Because the argv array came from splitting up words in the buffer, the pointers were nicely monotonic, each one a few bytes higher than the last.

    Now, who knows what you'll get? (But if you print 'em out, don't be surprised...)

    This means you (might) have a single array of chars in one place with the actual argument words in it. And then you have an array of pointers in a different place, with nothing but addresses in it.

    In this scenario, there are two (or more- the individual words of argv may not all be together) blobs of memory, and it's important to keep them distinct in your mind.
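
    Here is a small sketch of those two blobs, using the standard strtok instead of the hand-written loop. The function name split_words is made up for this example. The returned array holds only addresses, and every one of them points back into the caller's character buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Splits buf in place on spaces and tabs. Returns a malloc'd, NULL-terminated
 * array of pointers into buf (or NULL on allocation failure) and stores the
 * word count in *count. The caller frees the array, not the words.
 */
char **split_words(char *buf, size_t *count)
{
    assert(buf != NULL && count != NULL);

    size_t n = 0;
    char **words = malloc(sizeof *words);       /* room for the final NULL */
    if (words == NULL)
        return NULL;

    for (char *w = strtok(buf, " \t"); w != NULL; w = strtok(NULL, " \t")) {
        char **bigger = realloc(words, (n + 2) * sizeof *words);
        if (bigger == NULL) {
            free(words);
            return NULL;
        }
        words = bigger;
        words[n++] = w;                         /* address inside buf */
    }

    words[n] = NULL;
    *count = n;
    return words;
}
```

    For the buffer "ls -al foo.c" the three pointers come out as buf, buf + 3 and buf + 7, which is the "nicely monotonic" pattern described above.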

    But!

    The argv pointer is still an array-of-pointers-to-char. Which means if you just say "argv" it will decay into a pointer-to-pointers-to-char.

    And if you index argv[0], you'll get a pointer to char, which you can subsequently index, like argv[0][12] to get a single character value.

    It's weird, but these are two different types that can be indexed using identical syntax to yield an identical result.

    (Seriously: if you go to godbolt.org and type in two different expressions and look at the generated assembly, you will see the different computations.)

    When the type is a 2-d array, the compiler just does a multiply: argv[2][10] becomes argv + 2 * sizeof(str80) + 10 * sizeof(char).

    When the type is a 1-d array of pointers, the compiler does a multiply, a lookup, and then another multiply: argv[2][10] becomes "lookup argv + 2 * sizeof(char *)", then add that to 10 * sizeof(char).
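
    The two computations can be written out by hand. This is only a sketch with invented function names. Each "by hand" version spells out what the plain [i][j] version asks the compiler to do.

```c
#include <assert.h>
#include <stddef.h>

/* 2-d array: one multiply-and-add on a single base address. */
char index_2d(char (*rows)[80], size_t i, size_t j)
{
    assert(rows != NULL);
    return rows[i][j];
}

/* The same access written out as explicit byte arithmetic. */
char index_2d_by_hand(char (*rows)[80], size_t i, size_t j)
{
    assert(rows != NULL);
    const char *base = (const char *)rows;
    return *(base + i * sizeof rows[0] + j * sizeof(char));
}

/* Array of pointers: fetch the i-th pointer from memory, then offset it. */
char index_ptrs(char **rows, size_t i, size_t j)
{
    assert(rows != NULL);
    return rows[i][j];
}

/* The same access with the intermediate load made explicit. */
char index_ptrs_by_hand(char **rows, size_t i, size_t j)
{
    assert(rows != NULL);
    char *row = *(rows + i);    /* the extra memory lookup */
    return *(row + j);
}
```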

    Finally, note that a 1-d array with no specified dimension, like "char * argv[]" is the same parameter type as a pointer, because the array when passed to the function will decay to a pointer. That means that these two declarations are the same:
    Code:
    int main(int argc, char *argv[]) {}
    int main(int argc, char **argv) {}
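    One way to convince yourself, as a sketch with made-up function names: write one function with each spelling and assign both to the same function-pointer type. No cast is needed, because the array spelling is adjusted to the pointer spelling in a parameter list.

```c
#include <assert.h>
#include <stddef.h>

/* Parameter written with array notation... */
int count_args_array(int argc, char *argv[])
{
    assert(argv != NULL);
    int n = 0;
    while (n < argc && argv[n] != NULL)
        ++n;
    return n;
}

/* ...and with pointer notation. Both have the type int (int, char **). */
int count_args_pointer(int argc, char **argv)
{
    assert(argv != NULL);
    int n = 0;
    while (n < argc && argv[n] != NULL)
        ++n;
    return n;
}

/* One function-pointer type accepts either function without a cast. */
typedef int (*main_like)(int, char **);

int call_both(int argc, char **argv)
{
    main_like f = count_args_array;
    main_like g = count_args_pointer;
    return f(argc, argv) + g(argc, argv);
}
```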
    Last edited by aghast; 04-29-2021 at 12:57 AM. Reason: Clarification, plus bugfix

  4. #4
    Registered User
    Join Date
    Apr 2021
    Posts
    2
    Thank you very much for a very detailed explanation indeed! I had to read it several times, experiment and dig some more before I could get your point. I'd like to summarise in layman terms what I've actually understood, focusing on the WHY aspect of using the double pointers. Sorry, if it sounds cryptic but I'll be happy to hear from you, guys. It all started with **argv but the idea may be applied to any other double pointers:

    • string literals in C are constants, not variables. That's why one should use a double pointer to use the same pointer with a different string literal. See my inline comments:

      Code:
      void func(char** outptr)
      {
          // set the address of the first element of pointer (address of 'A' in "AFTER" -> &outptr[0])
          // as a value of inptr, hence making the latter to point to a different location in memory
          *outptr = "AFTER";
      }

      int main()
      {
          char* inptr = "BEFORE";
          printf("Before processing in func():\ninptr address: %p \t inptr value: %s\t address of inptr[0]: %p\n", &inptr, inptr, &inptr[0]);
          // setting the address of pointer inptr as a function argument requires the latter be of type double pointer, as pointer itself is a placeholder for address
          func(&inptr);
          // compare with the first printf() statement and note that strings and the addresses of their first element are different now
          printf("After processing in func():\ninptr address: %p \t inptr value: %s\t address of inptr[0]: %p\n", &inptr, inptr, &inptr[0]);
          return 0;
      }
    • strings may reside in completely different memory locations, even though pointer IDs, as well as their addresses and values are identical. Because there is a difference between the actual address of the pointer in memory and its content (also an address but of the object/data it points to):

      Code:
      const char* ptr = "this is my text";
      ptr = "this is my text";


    PS: By the way, what does happen to a string when the pointer doesn't point to its memory location anymore? Like in the above snippets. Does it become "garbage" in memory? Thank you!
    Last edited by VictorQuebec; 05-07-2021 at 03:42 PM.

  5. #5
    Registered User
    Join Date
    May 2010
    Posts
    4,632
    string literals in C are constants, not variables.
    Correct.

    That's why one should use a double pointer to use the same pointer with a different string literal.
    Sort of. But this has nothing to do with the arguments to main(), since the arguments of main() are not constants, they are variables.

    And you need to realize that C passes arguments by value. So if you want a function to change where the caller's pointer points, you need to pass the address of that pointer, i.e. a pointer to the pointer. In general, to change the value of a caller's variable you need to pass a pointer to that variable.
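
    A short sketch of the difference, with function names invented for the example. The first function only changes its own copy of the pointer, while the second writes through the address it was given.

```c
#include <assert.h>
#include <stddef.h>

/* Receives a copy of the pointer; the assignment is lost on return. */
void retarget_copy(const char *p)
{
    p = "AFTER";
    (void)p;            /* silence unused-value warnings */
}

/* Receives the address of the caller's pointer and writes through it. */
void retarget_through(const char **pp)
{
    assert(pp != NULL);
    *pp = "AFTER";
}

/* Returns the caller-side pointer after trying one of the two approaches. */
const char *demo(int use_pointer_to_pointer)
{
    const char *s = "BEFORE";
    if (use_pointer_to_pointer)
        retarget_through(&s);
    else
        retarget_copy(s);
    return s;
}
```

    Here demo(0) still returns "BEFORE", and demo(1) returns "AFTER".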

    strings may reside in completely different memory locations, even though pointer IDs, as well as their addresses and values are identical. Because there is a difference between the actual address of the pointer in memory and its content (also an address but of the object/data it points to):
    I really don't understand what you're trying to say here.

    By the way, your printf() statements are incorrect. The %p format specifier expects a pointer to void, so you really need to cast each matching argument to (void *).
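
    For illustration only, a sketch of what the casts look like. The helper name is hypothetical, and it uses snprintf so the result can be inspected. The exact text that %p produces is implementation-defined.

```c
#include <assert.h>
#include <stdio.h>

/* Formats the address of *pp and the address stored in it into out. */
int format_pointer_info(char *out, size_t size, const char **pp)
{
    assert(out != NULL && pp != NULL && *pp != NULL);
    /* %p expects void *, so each pointer argument is cast explicitly. */
    return snprintf(out, size, "%p %p %s",
                    (void *)pp, (void *)*pp, *pp);
}
```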

  6. #6
    Registered User
    Join Date
    Apr 2021
    Posts
    139
    string literals in C are constants, not variables. That's why one should use a double pointer to use the same pointer with a different string literal.
    Sorry, but this is false and wrong.

    First:
    § 6.4.5 String literals
    § 6.4.5(6) In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. (*79) The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence. ... [different types] ... The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set is implementation-defined.

    (7) It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

    (*79) A string literal need not be a string (see 7.1.1), because a null character may be embedded in it by a \0 escape sequence.

    Next: quoted strings in C come in a few flavors. One flavor is a string literal used to initialize a declared array. When so used, the quoted string is translated into an array initializer and "disappears". Another flavor is a quoted literal used to initialize a pointer. This becomes an "anonymous array" someplace in memory, and is pointed to by the pointer. Note the following:
    § 6.7.9(14) An array of character type may be initialized by a character string literal or UTF–8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

    And later:
    § 6.7.9(32) EXAMPLE 8 The declaration
    Code:
    char s[] = "abc", t[3] = "abc";
    defines “plain” char array objects s and t whose elements are initialized with character string literals. This declaration is identical to
    Code:
    char s[] = { 'a', 'b', 'c', '\0' },
         t[] = { 'a', 'b', 'c' };
    The contents of the arrays are modifiable. On the other hand, the declaration
    Code:
    char *p = "abc";
    defines p with type “pointer to char” and initializes it to point to an object with type “array of char” with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
    In my experience, older compilers put the quoted strings into writable memory somewhere, and you were able to modify them with no ill effect. This is a legacy of pre-ANSI C, where literal strings were sometimes used as cheap buffers. The rogue-like game "Larn" used this to hide/reveal information like details of inventory items and names of spells. Also, at least one system utility on Sun OS used literals with blank spaces as a buffer, and just called sprintf to write values into the strings before displaying them. (Again, all pre-ANSI.) Current GCC and Clang typically place string literals in a read-only section, so on a modern desktop the same write usually crashes with a segmentation fault.

    For reasons that baffle me, GCC has a warning aimed at this (-Wwrite-strings) but it is not included in -Wall. Because of course -Wall is really -Wmany...

    So string literals in C are not const. But modifying them is "undefined behavior": it may happen to work on one system and crash on another.
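
    A minimal sketch of the safe patterns, with function names made up for the example. An array initialized from a literal is your own modifiable copy, whereas a pointer to the literal is best declared const so the compiler rejects writes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* s gets a terminating '\0' (4 bytes); t is exactly 3 bytes with none. */
size_t sizes_differ_by(void)
{
    char s[] = "abc", t[3] = "abc";
    (void)s; (void)t;
    return sizeof s - sizeof t;         /* 4 - 3 */
}

/* Copies a literal into a caller-supplied array, then edits the copy. */
void capitalized_copy(char *dst, size_t size)
{
    assert(dst != NULL && size >= sizeof "abc");
    memcpy(dst, "abc", sizeof "abc");   /* includes the '\0' */
    dst[0] = 'A';                       /* fine: dst is the caller's array */
}

/* Pointing at the literal directly: read-only use, so const documents it. */
char first_of_literal(void)
{
    const char *p = "abc";
    return p[0];                        /* writing p[0] would not compile */
}
```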

    Let's move on to your second sentence. ;-)

    The reason to use a double pointer to change the string literals that you are referencing is simply because you are passing a pointer to something you wish to modify. If you chose to return a replacement value and assign it in the caller, the extra layer of pointer would not be necessary:

    Code:
        void 
        change_my_pointer(
            char ** p, 
            int rv)
        {
            char * result;
    
            if (rv & 1) {
                result = "BEFORE";
            } else {
                result = "AFTER";
            }
        
            *p = result;
        }
    
        char * 
        new_value_for_pointer(
            int rv)
        {
            char * result;
    
            if (rv & 1) {
                result = "BEFORE";
            } else {
                result = "AFTER";
            }
    
            return result;
        }
    
        void
        caller(void)
        {
            char * p = "abc";
    
            int rv = random_value();   // random_value() stands in for any source of an int
    
            change_my_pointer(&p, rv);
    
            // vs
    
            p = new_value_for_pointer(rv);
        }
    strings may reside in completely different memory locations, even though pointer IDs, as well as their addresses and values are identical. Because there is a difference between the actual address of the pointer in memory and its content (also an address but of the object/data it points to):
    I think you got this right. I'm not 100% sure what you're saying vis-a-vis "pointer IDs", but different string literals definitely reside in different memory locations. (Note that "identical" string literals may or may not reside in different memory locations, at the whim of the compiler. This is part of the problem with __FILE__ versus __func__ in C (& C++). With __FILE__ the committee declared that it would expand to a literal. (Which was good, because you could utilize it in the preprocessor as well as the compiler.) But some compilers were generating multiple copies of the literal in their compiled output, taking up too much space. With __func__ (added later, in C99) they declared that it is a "predefined identifier" that behaves like a static const char array declared inside each function, and so it is not available to the preprocessor and cannot be concatenated with adjacent literals.)
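
    A small sketch of that difference, with names invented for the example. __FILE__ can be glued to a neighbouring literal because it expands to one, while __func__ can only be used like an ordinary char array.

```c
#include <assert.h>
#include <string.h>

/* __FILE__ expands to a string literal, so adjacent-literal concatenation works. */
static const char file_tag[] = "file: " __FILE__;

/* __func__ behaves like a local static const char array, not a literal. */
const char *current_function_name(void)
{
    return __func__;
}

/* Returns nonzero when file_tag begins with the expected prefix. */
int file_tag_has_prefix(void)
{
    assert(sizeof file_tag >= sizeof "file: ");
    return strncmp(file_tag, "file: ", 6) == 0;
}
```

    Writing "func: " __func__ in the same way would be a compile error, since there is no literal for the compiler to concatenate.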
