argv and its "formal" type

**yuklair** · 04-18-2012

The standard variable "argv" in the main function is commonly declared as "char **argv" or "char *argv[]".

However upon close scrutiny I noticed some semantic incompleteness.
"char **argv" inherently only expresses a pointer to a pointer to a single char variable and neglects to mention the full data that really is associated with the variable.
"char *argv[]" inherently only expresses an array of pointers that point to a single char variable.

I see the true type of argv as an array of pointers to an array of char variables, so a more descriptive type might be as shown here:

Code:

#include<stdio.h>
int
main(int ac,char (*av[])[]){
  puts(*av[0]); //expected; prints first argument
  return(0);
}

But if I wanted to assign this variable, I'd think that I'd need to obtain its address and store that:

Code:

#include<stdio.h>
char (*(*argv)[])[];
int
main(int ac,char (*av[])[]){
  //  av is              an array of pointers to an array of chars
  //argv is a pointer to an array of pointers to an array of chars
  argv=&av; //incompatible types
  puts(*(*argv)[0]); //undefined
  return(0);
}

I did notice that the following works:

Code:

#include<stdio.h>
char (*(*argv)[])[];
int
main(int ac,char (*av[])[]){
  //  av is              an array of pointers to an array of chars
  //argv is a pointer to an array of pointers to an array of chars
  argv=av; //still incompatible types
  argv=&av[0]; //same as above
  puts(*(*argv)[0]); //expected
  return(0);
}

Can somebody assist in the rationalization of this? Thanks for any help.

**cas** · 04-18-2012

In a function declaration, char*[] and char** are identical. They both mean pointer to pointer to char.

Ok. So:

> I see the true type of argv as an array of pointers to an array of char variables

This is the primary source of your problem. argv is not an array of pointers to an array of char. It's a pointer to pointer to char. Whether you like that or not, that's what it is. You don't have to like the fact that a char* can point to a string of characters, but it can. Keep in mind that an array is not a pointer. You cannot interpret a pointer to a pointer as a pointer to an array and expect it to work.

In C there is one level of array “decay”, so an array of T can generally be treated as a pointer to T. As a result, the best you can do is think of argv as an array of char*. You just cannot treat it as an array of arrays because it is not one.
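
To make the equivalence of the two spellings concrete, here is a minimal sketch (the function name `first_letter` is made up for illustration). The prototype uses the `[]` form and the definition uses the `**` form; a compiler accepts the pair only because both spell the same parameter type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, declared with array syntax in the parameter list... */
char first_letter(char *argv[], int i);

/* ...and defined with pointer syntax. The two declarations are compatible
   because a parameter of type "array of char *" is adjusted to "char **". */
char first_letter(char **argv, int i)
{
    assert(argv != NULL && argv[i] != NULL);  /* caller must pass a valid index */
    return argv[i][0];
}
```

The adjustment applies only to the outermost array in a parameter list, which is the "one level" of decay mentioned above.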

**memcpy** · 04-18-2012

> Can somebody assist in the rationalization of this?
Yes, I can help explain it. :P

A double pointer is, in its most basic (value) form, an integer that points to a location in memory. At the second location, you will find another value that points to another location in memory. Provided a variable declared as this:

Code:

char value;
char *sptr = &value;
char **dptr = &sptr;

The memory will look something like this:

| | `dptr` | `*dptr` / `sptr` | `**dptr` / `value` |
|---|---|---|---|
| address of (`&var`) | 0x7FFFFFFFF | 0x80000000 | 0x60000000 |
| points to (`*var`) | 0x80000000 | 0x60000000 | (n/a) |
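
The two hops can be exercised with a small sketch. The helper name `set_through` is hypothetical; it only follows the chain dptr, then sptr, then value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store c into the char reached by following dptr twice. */
void set_through(char **dptr, char c)
{
    assert(dptr != NULL && *dptr != NULL);
    **dptr = c;   /* the first * yields the char *, the second * yields the char */
}
```

With the three declarations above, calling `set_through(dptr, 'z')` changes `value`, while `sptr` and `dptr` themselves are left untouched.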

However, this system can also be treated as an array, using pointer arithmetic.

Code:

char *sptr[10];
char **dptr = (char **)sptr;

The initial access of (*dptr) will act like a normal access to (sptr). But, when you increment dptr, it will point to the next pointer in the array of sptr.
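
A rough sketch of that stepping behaviour, using a hypothetical helper `nth_by_increment`. It assumes the caller guarantees that the array has more than n elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: advance dptr by n elements and return the pointer
   stored there. The caller must ensure the array has more than n elements. */
char *nth_by_increment(char **dptr, size_t n)
{
    assert(dptr != NULL);
    while (n-- > 0)
        ++dptr;        /* moves by sizeof(char *) bytes, i.e. to the next element */
    return *dptr;
}
```

Note that the cast in `(char **)sptr` is not needed: the array name `sptr` already converts to `char **` on its own.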

You could get the same effect with either of the following:

Code:

char **dptr;

char *dptr[10];

The one exception being:

Code:

char dptr[10][10];

As this is laid out as a linear array, instead of using pointer arithmetic to find the columns.

**camel-man** · 04-18-2012

Memcpy, Can you declare char **argv, as char argv[][]?

**memcpy** · 04-18-2012

Originally Posted by camel-man

Memcpy, Can you declare char **argv, as char argv[][]?

I'm not entirely sure, but I remember reading that "char argv[][]" is invalid syntax, as (per the C99 standard) 2D arrays in function declarations are required to have at least one constant dimension.

Either way, it's worth a try (I can't test right now).
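
For what it's worth, "char argv[][]" is rejected by a conforming compiler: an array's element type must be complete, so only the leftmost dimension of a parameter may be left empty. The sketch below contrasts a valid 2D-array parameter with the argv-style layout. The function names and the row width of 8 are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A true 2-D array parameter: only the first dimension may be omitted.
   "char rows[][]" would not compile, because the element type char[]
   is incomplete. */
size_t total_len_rows(char rows[][8], size_t n)
{
    assert(rows != NULL);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(rows[i]);   /* rows[i] is a char[8], decays to char * */
    return total;
}

/* The argv-style layout: an array of separate pointers to strings. */
size_t total_len_ptrs(char **v, size_t n)
{
    assert(v != NULL);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(v[i]);      /* v[i] is already a char * */
    return total;
}
```

The two parameter types are not interchangeable: passing a `char [2][8]` to `total_len_ptrs`, or a `char *[2]` to `total_len_rows`, draws an incompatible-pointer diagnostic.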

**yuklair** · 04-19-2012

Thanks for all your replies.
After some more analysis, I have reached a satisfactory conclusion.

The C99 standard dictates that argv in main be declared as "char *argv[]", but happens to be equivalent to "char **argv". First off I consider which of the two would be more "descriptive":
The former implies only a single char involved.
The latter implies an array of pointers, that each still only imply a single char, so neither types are "complete".
One problem with the latter type, which I found the hard way as described in OP, is declaring a pointer to argv using similar syntax; it is simply wrong as argv is not an array.
The "char **argv" type is more accurate in this case, and in general; the [] in argument declarations causes nothing but confusion.

However the standard dictates nothing about defining another, more "descriptive" variable to store main's argv and to be used in its place.
To help understand the problem, I make a diagram of the data that exists, regardless of how argv is declared. Let's consider an instance that had two command-line arguments "name" and "arg":

Code:

[ n][ a][ m][ e][\0][ a][ r][ g][\0][\0]
 10  11  12  13  14  15  16  17  18  19

[10][15][19]
 30  31  32

[30]
 50

What we see clearly is arrays of data, which main's argv completely neglects.

In this scenario the program sees a value of 30 at address 50, which would be main's argv declared as "char **argv".
What is the value at address 50 really pointing to though?
The top row of data is an array of chars, or "char []".
The middle row of data is an array of pointers to an array of chars, or "char (*[])[]".
The bottom datum is a pointer to an array of pointers to an array of chars, or "char (*(*)[])[]".

Therefore:

Code:

#include<stdio.h>
char (*(*foo)[])[];
int
main(int argc,char **argv){
  foo=(void *)argv;
  puts(*(*foo)[0]);
  puts(*(*foo)[1]);
  return(0);
}

When evaluating an array type (the arguments in the puts functions are each an array type), C automatically handles it as an argument to the first member so there is no problem with using it directly in functions that expect "char *", and at the same time we get a much more descriptive type than just "char **".

Incidentally, if you wanted to set foo to argv's second member (for instance, if you wanted to ignore the first command-line argument):

Code:

#include<stdio.h>
char (*(*foo)[])[];
int
main(int argc,char **argv){
  foo=(void *)&(*(void *(*)[])argv)[1]; //no need for full type in a typecast that's to be typecasted to a general pointer
  puts(*(*foo)[0]);
  puts(*(*foo)[1]);
  return(0);
}

Compare THAT to simply char **argv. :P

**dmh2000** · 04-19-2012

Except your conclusion is incorrect. argv IS an array of pointers to char. That is, in memory there is an array of pointers, the first being argv[0], the second argv[1], etc. You are showing the data that argv[0] points to, but if you examine the memory at the location referenced by 'argv' itself, you will see an array of pointers to 'name' and 'arg'.

**dmh2000** · 04-19-2012

Code:

#include <stdio.h>

/* run with two arguments, e.g.: x1 name arg */
int main(int argc,char *argv[])
{
	printf("%p\n",(void *)argv);		// location of argv
	printf("%p\n",(void *)argv[0]);		// contents of argv[0] (points to name of program)
	printf("%p\n",(void *)argv[1]);		// contents of argv[1] (points to argument 'name')
	printf("%p\n",(void *)argv[2]);		// contents of argv[2] (points to argument 'arg')
	printf("%s\n",argv[0]);		// %s dereferences the pointer in argv[0] and prints the name of the program
	printf("%s\n",argv[1]);		// %s dereferences the pointer in argv[1] and prints 'name'
	printf("%s\n",argv[2]);		// %s dereferences the pointer in argv[2] and prints 'arg'
	return 0;
}

program output (with comments)

Code:

00423380     location of the argv array
00423390     contents of first element of argv
004233AE     contents of second element of argv
004233B3     contents of third element of argv
e:\home\dh0072\x1.exe
name
arg

memory dump

Code:

               argv[0]        argv[1]       argv[2]
0x00423380  [90 33 42 00] [ae 33 42 00] [b3 33 42 00] 00 00 00 00  .3B.®3B..3B.....
0x00423390  65 3a 5c 68 6f 6d 65 5c 64 68 30 30 37 32 5c 78  e:\home\dh0072\x
0x004233A0  5c 44 65 62 75 67 5c 78 31 2e 65 78 65 00 6e 61  \Debug\x1.exe.na
0x004233B0  6d 65 00 61 72 67 00 fd fd fd fd ab ab ab ab ab  me.arg.

**whiteflags** · 04-19-2012

OP, you should read this. Question 6.3

**yuklair** · 04-19-2012

Typo in my post:

Originally Posted by yuklair

When evaluating an array type (the arguments in the puts functions are each an array type), C automatically handles it as a pointer to the first member so there is no problem with using it directly in functions that expect "char *"

--

Originally Posted by dmh2000

argv IS an array of pointers to char.

Case 1:
foo IS an array of pointers to char.

Code:

#include<stdio.h>
char *foo[2]={"foo","bar"};
int
main(int argc,char **argv){
  char *(*pointer)[]=&foo;
  printf("%s %s\n",(*pointer)[0],(*pointer)[1]);
  return(0);
}

Case 2:
argv IS an array of pointers to char.

Code:

#include<stdio.h>
int
main(int argc,char **argv){
  char *(*pointer)[]=&argv;
  printf("%s %s\n",(*pointer)[0],(*pointer)[1]);
  return(0);
}

The second case segfaults.

--

Also, remember that this topic is about style of C code; about inferences that can be interpreted based on the declaration of variables.

**whiteflags** · 04-19-2012

Your arguments would be a lot more persuasive if your code did not present errors.

Code:

Comeau C/C++ 4.3.10.1 (Oct  6 2008 11:28:09) for ONLINE_EVALUATION_BETA2
Copyright 1988-2008 Comeau Computing.  All rights reserved.
MODE:strict errors C99 

"ComeauTest.c", line 4: error: a value of type "char ***" cannot be used to
          initialize an entity of type "char *(*)[]"
    char *(*pointer)[]=&argv;
                       ^

1 error detected in the compilation of "ComeauTest.c".
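
The diagnostic says that `&argv` has type `char ***`: it is the address of main's local parameter, not of the array of pointers. A minimal sketch of the difference follows; both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* For a genuine array object, &arr and &arr[0] refer to the same address
   (only their types differ: char *(*)[2] versus char **). */
bool array_address_is_first_element(void)
{
    char *arr[2] = { "foo", "bar" };
    return (void *)&arr == (void *)&arr[0];
}

/* For a parameter declared char **v, &v is the address of the local
   variable v itself (type char ***), not of the pointers it refers to. */
bool param_address_is_first_element(char **v)
{
    assert(v != NULL);
    return (void *)&v == (void *)&v[0];
}
```

The first function returns true and the second returns false. Casting `&argv` to a generic pointer would silence the compiler, but the result would still designate the parameter variable, which is consistent with the crash reported for case 2.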

**anduril462** · 04-19-2012

@yuklair:
Your problem is (not to be rude) a lack of understanding of C, or an expectation for it to behave the way you want, with regards to pointers, 2-d arrays and how arrays behave when passed to functions. Read this and see if it helps: Arrays and Pointers. Of particular interest might be 6.1-6.4 and 6.13-6.20, but I would read them all just for the heck of it.

Some of your expectations/assumptions that you need to change:
> I see the true type of argv as an array of pointers to an array of char variables
This is wrong, and is probably the greatest source of your problem. argv is just an array of pointer to char. Each pointer in the array may point to the first element of an array of char, but it's still just an array of pointer to char.

>The former implies only a single char involved.
>The latter implies an array of pointers, that each still only imply a single char, so neither types are "complete".
Neither of those implies a single char. No pointer ever implies there is only one thing that is being pointed to. It's just an address to the start of one or more consecutive things (or it could be null).

> The "char **argv" type is more accurate in this case, and in general; the [] in argument declarations causes nothing but confusion.
This is completely subjective, and not true for everybody. Some people like that it reminds them that what is pointed to is an array.

> What we see clearly is arrays of data, which main's argv completely neglects.
I don't see any arrays of data that main's argv neglects, even partially. It is not neglect, it is simply how arrays are passed to functions in C, i.e. how they decay to a pointer. argv is a variable that exists on the stack, therefore it has an address on the stack, which you can get by using &argv. It would be 50 in your example. argv without the ampersand is the address of the first element of the array, or &argv[0]. That would be 30 in your example, with &argv[1] and &argv[2] being 31 and 32 respectively. Each element of argv is a pointer to char (really a pointer to one or more consecutive chars). So in your example, argv[0] is 10, argv[1] is 15. You're slightly off with argv[2] though, it is a null pointer. Sure, on some crazy architecture, NULL could be 19, but I doubt it. It's best to think of it as zero:

Code:

[10][15][0]
 30  31  32
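
Since the standard guarantees that argv[argc] is a null pointer, the array can be walked without argc. A small sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count entries up to the terminating null pointer.
   For main's argv the result equals argc, since argv[argc] is a null pointer. */
int count_to_null(char **argv)
{
    assert(argv != NULL);
    int n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```

Applied to the corrected diagram, the loop reads the entries at 30 and 31 and stops at 32.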

> To help understand the problem, I make a diagram of the data that exists, regardless of how argv is declared.
You can't say "regardless of how argv is declared". Any environment that conforms to the standard will pass in the data for argv in a specific format, namely char **argv (or its equivalent, char *argv[]). If you declare argv in an invalid manner, then you can't properly interpret that data through argv. Because of this, the following interpretations are off.

> The top row of data is an array of chars, or "char []".
> The middle row of data is an array of pointers to an array of chars, or "char (*[])[]".
> The bottom datum is a pointer to an array of pointers to an array of chars, or "char (*(*)[])[]".
The top row of data is not really an array of chars. They are not (AFAIK) required to be sequential. Really, what you have is two distinct arrays of char, one for "name" and one for "arg".
The middle row of data is an array of pointers to char. Each element points to the first char in an array from row 1, except the last, which is a null pointer.
The bottom row is a pointer to an array of pointer to char.

> When evaluating an array type (the arguments in the puts functions are each an array type), C automatically handles it as an argument to the first member so there is no problem with using it directly in functions that expect "char *", and at the same time we get a much more descriptive type than just "char **".
What you've done in that example is use a typecast to cover up the fact that argv and foo are incompatible types. Anything can be assigned to a void *, and a void * can be assigned to anything, but that doesn't mean it should be. Remove the cast and the compiler should complain that you're doing something stupid. The type of the arguments to puts are array type only because you cast argv to something it isn't. Yes, *(*foo)[0] is array type, but it is not an accurate description of what it really is, which is argv[0], which is pointer to char.

> Incidentally, if you wanted to set foo to argv's second member (for instance, if you wanted to ignore the first command-line argument):
Your description is a little off, IMO. You basically want foo to be just like argv, except skipping the first element. In that case, what you really want to do is set foo to the address of argv's second member. Something like:

Code:

char **foo = &argv[1];

foo points to the second element in argv, so foo[0] == argv[1], foo[1] == argv[2], etc. Setting foo to argv's second member would simply be:

Code:

char *foo = argv[1];
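
The first of those two snippets can be wrapped in a function to check the claimed equalities. This is only a sketch: the name `skip_first` is made up, and it assumes argv holds at least one entry before the terminating null pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: a view of argv that starts at its second element.
   Assumes argv holds at least one entry before the terminating null pointer. */
char **skip_first(char **argv)
{
    assert(argv != NULL && argv[0] != NULL);
    return &argv[1];    /* same as argv + 1 */
}
```

No casts are needed anywhere, because the result has the same type as argv itself.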

**yuklair** · 04-19-2012

Thanks for your replies.

@whiteflags: Typecast the expression to the right of the = to a generic pointer.