Need help understanding a piece of code

**youjustreadthis** · 07-05-2012

Hi,

I came across an example program for command line arguments in C and it contains a piece of code I can't get my head around. The following is the program:

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main( int argc, char *argv[] )
{
  int m, n,                              /* Loop counters. */
      l,                                 /* String length. */
      x,                                 /* Exit code. */
      ch;                                /* Character buffer. */
  char s[256];                           /* String buffer. */

  for( n = 1; n < argc; n++ )            /* Scan through args. */
  {
    switch( (int)argv[n][0] )            /* Check for option character. */
    {
    case '-':
    case '/': x = 0;                   /* Bail out if 1. */
              l = strlen( argv[n] );
              for( m = 1; m < l; ++m ) /* Scan through options. */
              {
                ch = (int)argv[n][m];
                switch( ch )
                {
                case 'a':              /* Legal options. */
                case 'A':
                case 'b':
                case 'B':
                case 'C':
                case 'd':
                case 'D': printf( "Option code = %c\n", ch );
                          break;
                case 's':              /* String parameter. */
                case 'S': if( m + 1 >= l )
                          {
                            puts( "Illegal syntax -- no string!" );
                            exit( 1 );
                          }
                          else
                          {
                            strcpy( s, &argv[n][m+1] );
                            printf( "String = %s\n", s );
                          }
                          x = 1;
                          break;
                default:  printf( "Illegal option code = %c\n", ch );
                          x = 1;      /* Not legal option. */
                          exit( 1 );
                          break;
                }
                if( x == 1 )
                {
                  break;
                }
              }
              break;
    default:  printf( "Text = %s\n", argv[n] ); /* Not option -- text. */
              break;
    }
  }
  puts( "DONE!" );
  return 0;
}

What I don't understand is the code in the second switch statement, in the case marked "String parameter". What do I have to input to reach the statement printf( "String = %s\n", s )? What does the -s option do?

All in all I am confused, and I would really appreciate any help crawling out of the dark state of noobness I'm in now. Thanks.

**Nominal Animal** · 07-05-2012

It is a good example of how buffer overruns can occur. At minimum, it should have used

Code:

strncpy(s, &argv[n][m+1], sizeof s);
s[(sizeof s) - 1] = '\0';

instead, to avoid overrunning the s array (and making sure it will be properly terminated in all cases).
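Those two lines can be wrapped in a small helper that also tells the caller whether the string had to be cut short. This is a sketch, not part of the original program; the name copy_bounded and its return convention are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper wrapping the strncpy() fix above.
 * Copies at most dstsize - 1 characters of src into dst and always
 * NUL-terminates. Returns 1 if src did not fit, 0 otherwise. */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    strncpy(dst, src, dstsize);
    dst[dstsize - 1] = '\0';        /* strncpy() may leave dst unterminated */
    return strlen(src) >= dstsize;  /* src plus its '\0' needed more room */
}
```

In the original program the call would be copy_bounded( s, sizeof s, &argv[n][m+1] ), and a nonzero result could be reported as "string too long" instead of silently truncating.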

Above, n is the index of the current command-line argument, and m is the index of the current option character within that string. So the code copies whatever follows the option character in the same argument.

Try -sfoo or /sfoo command line options to see for yourself.

**dmh2000** · 07-05-2012

1. The outer switch looks for a '-' or a '/', indicating an option argument. If it sees one, it enters that case.
2. The length of the current argument is taken and assigned to 'l'.
3. The inner loop iterates over all the characters in the current argument except the initial '-' or '/'.
4. If the current character is an 's' or 'S', it enters that case.
5. If the current index 'm' + 1 is greater than or equal to the length of the argument string, there is no string, and it prints the "Illegal syntax" message. In other words, the code checks whether there are characters immediately following the '-s'; if there are not, it is illegal.
6. If there are characters following the '-s', it enters the else clause and copies those characters to the char array 's' (with a possible overrun, as nothing checks that they fit in 's').

So to get it to print "String = ...", you would enter <program name> -sthisisthestring
and it should print out String = thisisthestring.
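To make the walk-through above concrete, here is a sketch that pulls the inner loop out into a function. The name parse_option_cluster and its return codes are hypothetical; the logic follows the original program (the same legal letters, and -s/-S taking everything after it in the same argument), but it uses the bounded strncpy() copy instead of strcpy().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical return codes for parse_option_cluster(). */
enum { CLUSTER_ERROR = -1, CLUSTER_OK = 0, CLUSTER_STRING = 1 };

/* Parse one argument such as "-ab" or "-sfoo"; arg[0] is '-' or '/'.
 * Returns CLUSTER_STRING and copies the text after 's'/'S' into s,
 * CLUSTER_OK if the argument held only legal flag letters, or
 * CLUSTER_ERROR for an unknown letter or an 's' with nothing after it. */
int parse_option_cluster(const char *arg, char *s, size_t size)
{
    size_t len = strlen(arg);          /* 'l' in the original program */
    size_t m;

    assert(s != NULL && size > 0);
    for (m = 1; m < len; m++) {        /* skip the leading '-' or '/' */
        switch (arg[m]) {
        case 'a': case 'A': case 'b': case 'B':
        case 'C': case 'd': case 'D':
            break;                     /* legal flag; keep scanning */
        case 's': case 'S':
            if (m + 1 >= len)          /* nothing follows the 's' */
                return CLUSTER_ERROR;
            strncpy(s, &arg[m + 1], size);  /* rest of the argument */
            s[size - 1] = '\0';
            return CLUSTER_STRING;     /* like x = 1: stop scanning */
        default:
            return CLUSTER_ERROR;      /* not a legal option letter */
        }
    }
    return CLUSTER_OK;
}
```

For example, -absfoo is accepted as the flags a and b followed by the string foo, while -s on its own is an error, matching the "Illegal syntax -- no string!" branch.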

**youjustreadthis** · 07-05-2012

Thanks a lot guys! I am used to it being a space between the argument and the string, but now I see. Out of curiosity, how would one add a space between '-s' and the string without getting an error? Again, thanks; it's nice to see that some forums are still helpful.

**rags_to_riches** · 07-05-2012

Comment on the code you found: using lower-case L as a variable name is generally a bad idea, as it is too easily mistaken for the numeral one. In fact, outside of loop constructs, single-letter variable names in general are frowned upon.

**Nominal Animal** · 07-05-2012

Originally Posted by youjustreadthis

I am used to it being a space between the argument and the string

It is more common, yes. Technically, the string is then a separate command-line argument: the next element of argv[].

Originally Posted by youjustreadthis

Out of curiosity, how would one add a space between '-s' and the string without getting an error?

If you mean without modifying the program: by quoting the parameter so that the shell does not split it into separate arguments. For example, "-s thishasaspaceinfront" (including the quotes, verbatim). The program then receives a single argument, so the copied string starts with the space.

If you meant to ask how to modify the program so that it accepts the parameter from the next command-line argument (as is common), you could replace the relevant if/else clauses in the 's' case with

Code:

if( m + 1 >= l )
{
    /* Nothing follows the 's': the string is in the next argument. */
    if( n + 1 >= argc )
    {
        puts("-s: No parameter specified");
        exit(1);
    }

    /* It is considered bad form to modify the for loop variable like this, but we have to. */
    n++;

    strncpy( s, argv[n], sizeof s );
    s[ (sizeof s) - 1 ] = '\0';
}
else
{
    /* The string follows the 's' in the same argument, as before. */
    strncpy( s, &argv[n][m+1], sizeof s );
    s[ (sizeof s) - 1 ] = '\0';
}

printf( "String = %s\n", s );
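The same next-argument logic can be sketched as a separate function that reports which argument it consumed, so the caller advances its index in one visible place. The name get_s_param and its return convention are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for an 's' option found at argv[n][m].
 * Takes the string attached to the option ("-sfoo") or, if nothing
 * follows the 's', the next argument ("-s foo"). Copies it into s
 * and returns the index of the last argument used (n or n + 1),
 * or -1 if "-s" was the last argument. */
int get_s_param(int argc, char *argv[], int n, int m, char *s, size_t size)
{
    const char *src;

    assert(s != NULL && size > 0);
    if (argv[n][m + 1] != '\0')
        src = &argv[n][m + 1];         /* attached: "-sfoo" */
    else if (n + 1 < argc)
        src = argv[++n];               /* separate: "-s foo" */
    else
        return -1;                     /* no parameter at all */

    strncpy(s, src, size);             /* bounded copy, as suggested above */
    s[size - 1] = '\0';
    return n;
}
```

In the original loop the caller would assign the returned value to n (and set x = 1), so the outer for loop resumes after the consumed argument.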

**youjustreadthis** · 07-05-2012

That worked nicely, but why exactly is it considered bad form if it works? I can see why single-letter variable names like l are frowned upon, as rags_to_riches pointed out, but I'm not sure I understood what the bad form was. Thanks again, you are all being really helpful, despite my thickness haha

**Nominal Animal** · 07-05-2012

Originally Posted by youjustreadthis

That worked nicely, but why exactly is it considered bad form if it works?

If you have a for loop, say

Code:

for (i = 0; i < n; i++) {
    /* Loop body */
}

it is natural to look at the for statement, see the i++, and assume that i increases by one every iteration.

Modifying the loop variable i in the loop body is allowed, but it may surprise human readers. Surprises like that often lead to bugs later on, when someone changes the code without noticing the extra increment; that is why they are considered bad form.

To avoid such errors, I recommend using a while loop instead. That way readers and future developers won't make any incorrect assumptions. Here is the basic skeleton I use to parse command-line parameters:

Code:

int options(int argc, char *argv[])
{
    int iarg = 1;   /* next input argument to examine */
    int oarg = 1;   /* where the next non-option argument is kept */

    while (iarg < argc) {
        if (argv[iarg][0] == '-' && argv[iarg][1] == '-' && argv[iarg][2] == '\0') {
            /* "--": no more options; keep the rest as-is. */
            iarg++;
            while (iarg < argc)
                argv[oarg++] = argv[iarg++];

        } else
        if (argv[iarg][0] == '-' && argv[iarg][1] == '-') {
            const char *opt = argv[iarg++] + 2;
            /* Long option handling skipped for brevity */

        } else
        if (argv[iarg][0] == '-') {
            const char *opt = argv[iarg++];
            /* Short option handling skipped for brevity */

        } else
            argv[oarg++] = argv[iarg++];
    }

    /* NULLify extra elements in argv[] array. */
    for (iarg = oarg; iarg < argc; iarg++)
        argv[iarg] = NULL;

    /* Return new argc. */
    return oarg;
}

I do recommend using the standard getopt() interface instead.
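For comparison, here is a minimal sketch of the original program's options handled by POSIX getopt(), declared in <unistd.h>. The option string "aAbBCdDs:S:" lists the same letters; the colon after s and S means they take an argument, which getopt() accepts either attached (-sfoo) or as the next parameter (-s foo). The wrapper parse_with_getopt and its return convention are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L        /* expose getopt() in strict ISO C modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper around POSIX getopt() for the original program's
 * options. Copies the -s/-S argument (if any) into s and returns the index
 * of the first non-option argument, or -1 on an illegal option. */
int parse_with_getopt(int argc, char *argv[], char *s, size_t size)
{
    int ch;

    assert(s != NULL && size > 0);
    s[0] = '\0';
    opterr = 0;                        /* report errors ourselves */
    while ((ch = getopt(argc, argv, "aAbBCdDs:S:")) != -1) {
        switch (ch) {
        case 's':
        case 'S':                      /* "-sfoo" and "-s foo" both end up here */
            strncpy(s, optarg, size);
            s[size - 1] = '\0';
            break;
        case '?':                      /* unknown letter, or -s without a string */
            fprintf(stderr, "Illegal option code = %c\n", optopt);
            return -1;
        default:                       /* a, A, b, B, C, d, D */
            printf("Option code = %c\n", ch);
            break;
        }
    }
    return optind;
}
```

Unlike the original program, getopt() only treats - as the option character, so /sfoo would be left as a plain text argument.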

I personally don't use getopt() myself, because I prefer the following option parsing rules instead:

- A double dash -- marks the end of parameters.
- All arguments to short options are in the next parameter.
- A dash - starts an option specification, which never contains any values. In other words, -nd 2 / and -n 2 -d / do exactly the same thing (n=2, d=/), but -n2 -d/ is an error.
- Long options with arguments may follow the option immediately after an equals sign. If there is no equals sign, and the long option takes an argument, the argument is in the next parameter. For example, --date=now or --date now.

Standard getopt() supports only short options, and not in the same way, and getopt_long() is a GNU extension. (The reason for handling short option arguments this way is due to the way an empty string argument is handled. It would be semi-ambiguous using the standard getopt() rules, but using the above rules an empty string argument is always explicit. If you've ever written scripts that need to be able to differentiate between an empty string as an argument, and no argument, you'll know why that might be important.)

I haven't developed the above rules myself. In my experience, they're just the set of rules that my users have found most intuitive and easy to understand that is "close enough" to what all other Linux command-line utilities use. (In other words, these option parsing rules seem to breed the least amount of surprises and questions.)
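A rough sketch of the short-option rules listed above, written with a while loop so the index only moves where a parameter is consumed. The option letters (n and d take a value, v is a plain flag), struct strict_opts and parse_strict() are all hypothetical, and long options are left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result holder: -n VALUE, -d VALUE, and a plain -v flag. */
struct strict_opts {
    const char *n;
    const char *d;
    int verbose;
};

/* Option clusters never contain values; each value-taking letter consumes
 * the next parameter in order, and "--" ends the options. A lone "-" is
 * left as a non-option. Returns the index of the first non-option
 * argument, or -1 on an error. */
int parse_strict(int argc, char *argv[], struct strict_opts *o)
{
    int i = 1;                         /* next parameter to examine */

    o->n = NULL;
    o->d = NULL;
    o->verbose = 0;
    while (i < argc && argv[i][0] == '-' && argv[i][1] != '\0') {
        const char *p = argv[i++];     /* option cluster, e.g. "-nd" */

        if (strcmp(p, "--") == 0)      /* end of options */
            break;
        for (p++; *p != '\0'; p++) {
            switch (*p) {
            case 'v':
                o->verbose = 1;
                break;
            case 'n':
            case 'd':
                if (i >= argc)         /* the value must be the next parameter */
                    return -1;
                if (*p == 'n')
                    o->n = argv[i++];
                else
                    o->d = argv[i++];
                break;
            default:                   /* e.g. the '2' in "-n2" */
                return -1;
            }
        }
    }
    return i;
}
```

With these rules, -nd 2 / sets n to "2" and d to "/", while -n2 -d/ is rejected because '2' is not an option letter.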

**christop** · 07-05-2012

Originally Posted by Nominal Animal

I personally don't use getopt() myself, because I prefer the following option parsing rules instead:

A double dash -- marks the end of parameters.

I'm pretty sure the standard getopt() interface stops processing options on "--".

Originally Posted by Nominal Animal

All arguments to short options are in the next parameter.
A dash - starts an option specification, which never contains any values.
In other words, -nd 2 / and -n 2 -d / do exactly the same thing (n=2, d=/), but -n2 -d/ is an error.

This is quite different behavior from every tool I've ever used, which almost all allow an argument immediately after an option letter. In particular, I've never seen "-nd 2 /" being allowed by any tool. I generally separate option arguments from the option so I can pass a possibly empty argument to the option in shell scripts, but I sometimes do put the option and argument together if I know the argument is not empty. getopt allows them to be put together probably mostly for backwards compatibility with older scripts and tools that used that feature.

Originally Posted by Nominal Animal

Long options with arguments may follow the option immediately after an equals sign. If there is no equals sign, and the long option takes an argument, the argument is in the next parameter.
For example, --date=now or --date now

Standard getopt() supports only short options, and not in the same way, and getopt_long() is a GNU extension. (The reason for handling short option arguments this way is due to the way an empty string argument is handled. It would be semi-ambiguous using the standard getopt() rules, but using the above rules an empty string argument is always explicit. If you've ever written scripts that need to be able to differentiate between an empty string as an argument, and no argument, you'll know why that might be important.)

I haven't developed the above rules myself. In my experience, they're just the set of rules that my users have found most intuitive and easy to understand that is "close enough" to what all other Linux command-line utilities use. (In other words, these option parsing rules seem to breed the least amount of surprises and questions.)

You're right about long options; those are a GNU extension, and as I mentioned above about shell scripts, it's easy enough to unambiguously pass an empty argument to an option using the standard getopt rules.

Edit: I forgot to point out how confusing it would be to allow the syntax "-nd 2 /" as the number of options grows, since that increases the distance between each option and its argument. Following the most common, standard rules, that syntax would instead be interpreted as "-n d 2 /": -n with the argument d, followed by 2 and /.

Also, there is one partial exception to the rule: tar. If the argument list doesn't start with a dash, it almost follows your rule about option arguments. For example, with "tar cfz foo.tgz" the argument for the "f" option is "foo.tgz". On the other hand, if the options start with a dash, they follow the conventional rules: "tar -cfz foo.tgz" will create a tar file named "z". This is one of those historical blunders which cannot be fixed as scripts might rely on this strange behavior of tar.

**Nominal Animal** · 07-05-2012

Let me reiterate: unless you know differently, it is best to use getopt().

I personally do not, but that might be just because I work with weird people in weird environments. And I am pretty wonky myself.

I just thought my weird approach might be relevant, because the original program at issue here was parsing options. In particular, I intended to emphasize how a while loop yields more predictable, easier-to-read code than a for loop whose loop variable is occasionally also modified in the loop body.

Incrementing the loop variable in the loop body is necessary when an option has an argument as a separate parameter, as opposed to immediately following the option character (as the original code in this thread expects).

Originally Posted by christop

I'm pretty sure the standard getopt() interface stops processing options on "--".

Now that you mention it, it does and should. It is even stated explicitly in Guideline 10 of the Utility Syntax Guidelines, in the Utility Conventions chapter of IEEE Std 1003.1-2001.

Originally Posted by christop

This is quite different behavior from every tool I've ever used, which almost all allow an argument immediately after an option letter.

I know. In computing cluster environments, where things are started via noninteractive scripts, having the argument immediately follow the option is problematic, as in many cases previous errors cause the argument to be empty. Not allowing it (and telling the users to avoid that usage) reduces problems.

The rules I described are just a consequence of disallowing immediate arguments to single-character options.

Originally Posted by christop

it's easy enough to unambiguously pass an empty argument to an option using the standard getopt rules.

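Yes, but allowing the argument to immediately follow the option causes unexpected behaviour if the argument might be empty. Consider, for example, a script that takes the argument from a shell variable: if the variable is empty, -s"$VAR" becomes just -s, and the option silently takes the next parameter as its argument instead.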

I've found warning against and disallowing immediate arguments to single-character options avoids such problems. Especially since users tend to start fixing the first symptom they see, instead of finding out the root cause first.

Originally Posted by christop

Also, there is one partial exception to the rule: tar.

Yes, which also happens to be one of the more common command-line tools in computing cluster environments; i.e. one of the tools my users are most familiar with. Funnily enough, I think I've seen cases similar to tar -cfz foo.tar.gz (followed by "This system must be broken! Where did my data go?") more than once in real life..

**christop** · 07-05-2012

Originally Posted by Nominal Animal

I know. In computing cluster environments, where things are started via noninteractive scripts, having the argument immediately follow the option is problematic, as in many cases previous errors cause the argument to be empty. Not allowing it (and telling the users to avoid that usage) reduces problems.

The rules I described are just a consequence of disallowing immediate arguments to single-character options.

I see no problem with disallowing immediate option arguments. It shouldn't cause any problems with new software (only older programs need to keep accepting immediate arguments for the sake of backwards compatibility). If a user tries to call the program with an immediate argument (e.g., "-n2"), the program should complain about an invalid option, preventing their scripts from breaking in the future. I only had a little issue with accepting option arguments that don't follow the option that it corresponds to, eg "-nd 2 /". That's an additional feature that just seems strange and error-prone to me.

Originally Posted by Nominal Animal

Yes, which also happens to be one of the more common command-line tools in computing cluster environments; i.e. one of the tools my users are most familiar with. Funnily enough, I think I've seen cases similar to tar -cfz foo.tar.gz (followed by "This system must be broken! Where did my data go?") more than once in real life..

I was thinking of something an "experienced" sysadmin did at a previous job I had when I wrote about tar. He ended up with a large tar file named "z".

**youjustreadthis** · 07-05-2012

Thanks to both of you for clearing up so many things! I did not know a lot of this. I will do as you recommended and use getopt() for my little project, as it seems to provide exactly what I was looking for. Really appreciate it.

**Nominal Animal** · 07-05-2012

Originally Posted by christop

I only had a little issue with accepting option arguments that don't follow the option that it corresponds to, eg "-nd 2 /". That's an additional feature that just seems strange and error-prone to me.

Well, it turns out that when you implement the short option handling as a loop over each character in the short option string, that is just the easiest and simplest way to handle the arguments.

To be honest, I think this might be the first time I've mentioned at all that my option parsing allows -nd 2 / as a shorthand for -n 2 -d / . I don't even know why I blurted it out.. I only show the expanded form in usage and documentation.

I've posted a Bash implementation of the option parsing on another site here.

Originally Posted by christop

I was thinking of something an "experienced" sysadmin did at a previous job I had when I wrote about tar. He ended up with a large tar file named "z".

In some cases it is useful to transfer files directly using an ssh connection and tar at both ends (as opposed to using scp). One out of every ten or so times I still end up with a file named "-" at one end.

**youjustreadthis** · 07-06-2012

Just one more thing regarding getopt(), more specifically getopt_long(). Why are the flag pointers necessary? As far as I can see, they don't do much.

Code:

#include <getopt.h>

static int verbose_flag;     /* Set by --verbose, cleared by --brief. */

static struct option long_options[] =
{
    /* These options set a flag. */
    {"verbose", no_argument,       &verbose_flag, 1},
    {"brief",   no_argument,       &verbose_flag, 0},
    /* These options don't set a flag.
       We distinguish them by their indices. */
    {"add",     no_argument,       0, 'a'},
    {"append",  no_argument,       0, 'b'},
    {"delete",  required_argument, 0, 'd'},
    {"create",  required_argument, 0, 'c'},
    {"file",    required_argument, 0, 'f'},
    {0, 0, 0, 0}             /* All-zero entry terminates the table. */
};

Code:

/* Instead of reporting ‘--verbose’
   and ‘--brief’ as they are encountered,
   we report the final status resulting from them. */
if (verbose_flag)
    puts ("verbose flag is set");

**Nominal Animal** · 07-06-2012

The man 3 getopt_long man page is your friend. In particular:

flag

specifies how results are returned for a long option. If flag is NULL, then getopt_long() returns val. (For example, the calling program may set val to the equivalent short option character.) Otherwise, getopt_long() returns 0, and flag points to a variable which is set to val if the option is found, but left unchanged if the option is not found.

If you really need an option to produce a value of -1, '?', or ':', you need to use the flag pointer instead of having getopt_long() return that value: getopt_long() returns -1 when it is done, '?' when it encounters an unknown or ambiguous option, and ':' for a missing argument if the option string begins with ':'.

I think the true reason for the flag pointer in the structure is to allow different approaches, however. This way developers can freely pick whether they want each value stored in a separate variable, or returned from the getopt_long() function. It costs very little (you are not going to have thousands of options, so an extra pointer to an int per option is cheap), but it potentially allows for much simpler code in some cases.

As a developer, I like being able to choose the best approach, rather than being forced into an arbitrary one by a library function.
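Here is a small sketch of both styles in one option table: --verbose and --brief store into an int through the flag pointer (getopt_long() then returns 0), while --file has a NULL flag and returns 'f', the same as the short option -f. getopt_long() comes from <getopt.h> and is a GNU extension; the wrapper parse_long() and its parameters are hypothetical.

```c
#define _GNU_SOURCE                    /* getopt_long() is not in POSIX */
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical wrapper. Returns the index of the first non-option
 * argument, or -1 on an unknown option or missing argument. */
int parse_long(int argc, char *argv[], int *verbose, const char **file)
{
    /* Non-constant initializers are allowed for an automatic array (C99). */
    struct option longopts[] = {
        { "verbose", no_argument,       verbose, 1   },  /* stores 1 in *verbose */
        { "brief",   no_argument,       verbose, 0   },  /* stores 0 in *verbose */
        { "file",    required_argument, NULL,    'f' },  /* returns 'f' */
        { NULL,      0,                 NULL,    0   }
    };
    int ch;

    assert(verbose != NULL && file != NULL);
    *file = NULL;
    opterr = 0;                        /* no messages on stderr */
    while ((ch = getopt_long(argc, argv, "f:", longopts, NULL)) != -1) {
        switch (ch) {
        case 0:                        /* flag option: value already stored */
            break;
        case 'f':                      /* --file or -f */
            *file = optarg;
            break;
        default:                       /* '?': unknown or missing argument */
            return -1;
        }
    }
    return optind;
}
```

Note that *verbose is only written when --verbose or --brief appears, so the caller sets its default before the call.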
