Thread: Need help understanding a piece of code

  1. #1
    Registered User
    Join Date
    Jul 2012
    Posts
    7

    Need help understanding a piece of code

    Hi,

    I came across an example program for command line arguments in C and it contains a piece of code I can't get my head around. The following is the program:

    Code:
    #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       
       int main( int argc, char *argv[] )
       {
         int m, n,                              /* Loop counters. */
             l,                                 /* String length. */
             x,                                 /* Exit code. */
             ch;                                /* Character buffer. */
         char s[256];                           /* String buffer. */
       
         for( n = 1; n < argc; n++ )            /* Scan through args. */
         {
           switch( (int)argv[n][0] )            /* Check for option character. */
           {
           case '-':
           case '/': x = 0;                   /* Bail out if 1. */
                     l = strlen( argv[n] );
                     for( m = 1; m < l; ++m ) /* Scan through options. */
                     {
                       ch = (int)argv[n][m];
                       switch( ch )
                       {
                       case 'a':              /* Legal options. */
                       case 'A':
                       case 'b':
                       case 'B':
                       case 'C':
                       case 'd':
                       case 'D': printf( "Option code = %c\n", ch );
                                 break;
                       case 's':              /* String parameter. */
                       case 'S': if( m + 1 >= l )
                                 {
                                   puts( "Illegal syntax -- no string!" );
                                   exit( 1 );
                                 }
                                 else
                                 {
                                   strcpy( s, &argv[n][m+1] );
                                   printf( "String = %s\n", s );
                                 }
                                 x = 1;
                                 break;
                       default:  printf( "Illegal option code = %c\n", ch );
                                 x = 1;      /* Not legal option. */
                                 exit( 1 );
                                 break;
                       }
                       if( x == 1 )
                       {
                         break;
                       }
                     }
                     break;
           default:  printf( "Text = %s\n", argv[n] ); /* Not option -- text. */
                     break;
           }
         }
         puts( "DONE!" );
       }
    What I don't understand is the code in the second switch statement marked with string parameter. What do I have to input to get to the printf statement: string = %s ? what does the -s option do?

    All in all I am confused and I would really appreciate any help that can help me crawl out of the dark state of noobness I'm in now. Thanks.

  2. #2
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    It is a good example of how buffer overruns can occur. At minimum, it should have used
    Code:
    strncpy(s, &argv[n][m+1], sizeof s);
    s[(sizeof s) - 1] = '\0';
    instead, to avoid overrunning the s array (and making sure it will be properly terminated in all cases).

    Above n is the index to the current command-line argument, and m is the index of the current option character within that string. So, it will copy whatever follows the option character in the same option.

    Try -sfoo or /sfoo command line options to see for yourself.
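
    A minimal sketch of the bounded copy suggested above, wrapped in a helper so the truncation is easy to see. The helper name copy_bounded is made up for illustration; it is not part of the original program.

```c
#include <string.h>

/* Copy src into dst, writing at most dstsize bytes including the
 * terminating '\0'. Longer input is silently truncated.
 * (copy_bounded is an illustrative name, not from the original program.) */
void copy_bounded(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return;                      /* nothing can be stored */
    strncpy(dst, src, dstsize);      /* may leave dst unterminated */
    dst[dstsize - 1] = '\0';         /* so terminate it explicitly */
}
```

    In the original program the call would be copy_bounded(s, sizeof s, &argv[n][m+1]), so an over-long string is cut at 255 characters instead of overrunning s.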

  3. #3
    Registered User
    Join Date
    Mar 2011
    Posts
    546
    The outer switch looks for a '-' or a '/', indicating an option. If it sees one, it enters that case.
    The length of the current argument is taken and assigned to 'l'.
    The loop iterates over all the characters in the current argument except the initial '-' or '/'.
    If the current character is an 's' or 'S', it enters that case.
    If the current index 'm' + 1 is greater than or equal to the length of the argument string, then there is no string and it enters the 'illegal' print. In other words, the code looks to see if there are characters immediately following the 's'. If there are not, it is illegal.
    If there are characters following the 's', it enters the else clause and copies those characters to the char array 's' (with a possible overrun, as the code never checks that they fit in 's').

    So to get it to print "String = ...", you would enter <program name> -sthisisthestring
    and it should print out String = thisisthestring.
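
    The check and the pointer expression described above can be isolated in a small function. This is only a sketch: attached_string is a hypothetical helper, and it reuses the names m and l from the original program.

```c
#include <stddef.h>
#include <string.h>

/* Return the text that follows the option character at index m in arg,
 * or NULL if nothing follows it. This mirrors the m + 1 >= l test and
 * the &argv[n][m+1] expression in the original program.
 * (attached_string is a hypothetical helper name.) */
const char *attached_string(const char *arg, size_t m)
{
    size_t l = strlen(arg);

    if (m + 1 >= l)
        return NULL;        /* e.g. "-s": no string after the 's' */
    return &arg[m + 1];     /* e.g. "-sfoo" with m == 1 gives "foo" */
}
```

    No copying happens here; the returned pointer simply refers to the tail of the same argument.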

  4. #4
    Registered User
    Join Date
    Jul 2012
    Posts
    7
    Thanks a lot guys! I am used to it being a space between the argument and the string, but now I see. Out of curiosity, how would one add a space between '-s' and the string without getting an error? Again thanks, it's nice to see that some forums still are helpful

  5. #5
    Registered User
    Join Date
    Dec 2007
    Posts
    2,675
    Comment on the code you found: using lower-case L as a variable name is generally a bad idea, as it is too easily mistaken for the numeral one. In fact, outside of loop constructs, single-letter variable names in general are frowned upon.

  6. #6
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by youjustreadthis View Post
    I am used to it being a space between the argument and the string
    It is more common, yes. Technically, the argument is in the next parameter.

    Quote Originally Posted by youjustreadthis View Post
    Out of curiosity, how would one add a space between '-s' and the string without getting an error?
    If you mean without modifying the program: By supplying the parameter quoted so that the shell will not split them into separate arguments. For example, "-s thishasaspaceinfront" (including the quotes, verbatim).

    If you meant to ask how would one need to modify the program so it would accept the parameter from the next command-line argument (as is common), then you could replace the relevant if clauses with
    Code:
    if( m + 1 >= l )
    {
        if( n + 1 >= argc )
        {
            puts("-s: No parameter specified");
            exit(1);
        }
    
        /* It is considered bad form to modify the for loop variable like this, but we have to. */
        n++;
    
        strncpy( s, argv[n], sizeof s );
        s[ (sizeof s) - 1 ] = '\0';
    }
    else
    {
        /* The string follows the option character in the same argument. */
        strncpy( s, &argv[n][m+1], sizeof s );
        s[ (sizeof s) - 1 ] = '\0';
    }
    
    printf( "String = %s\n", s );

  7. #7
    Registered User
    Join Date
    Jul 2012
    Posts
    7
    That worked nicely, but why exactly is it considered bad form if it works? I can see why single letter variable names like l are frowned upon as rags_to_riches pointed out, but not sure if I got what was bad form. Thanks again, you are all being really helpful, despite my thickness haha

  8. #8
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by youjustreadthis View Post
    That worked nicely, but why exactly is it considered bad form if it works?
    If you have a for loop, say
    Code:
    for (i = 0; i < n; i++) {
        /* Loop body */
    }
    it is natural to look at the for statement, see the i++, and assume that i increases by one every iteration.

    Modifying the loop variable i in the loop body is allowed, but it may surprise human readers. Such surprises often lead to bugs later on in the code, which is why the practice is considered bad form.

    To avoid such errors, I recommend using a while loop instead. That way readers and future developers won't make any incorrect assumptions. Here is the basic skeleton I use to parse command-line parameters:
    Code:
    int options(int argc, const char *argv[])
    {
        int iarg = 1;
        int oarg = 1;
    
        while (iarg < argc)
            if (argv[iarg][0] == '-' && argv[iarg][1] == '-' && argv[iarg][2] == '\0') {
                /* No more options. */
                iarg++;
                while (iarg < argc)
                    argv[oarg++] = argv[iarg++];
    
            } else
            if (argv[iarg][0] == '-' && argv[iarg][1] == '-') {
                const char *opt = argv[iarg++] + 2;
                /* Long option handling skipped for brevity */
    
            } else
            if (argv[iarg][0] == '-') {
                const char *opt = argv[iarg++];
                /* Short option handling skipped for brevity */
    
            } else
                argv[oarg++] = argv[iarg++];
    
        /* NULLify extra elements in argv[] array. */
        for (iarg = oarg; iarg < argc; iarg++)
            argv[iarg] = NULL;
    
        /* Return new argc. */
        return oarg;
    }
    I do recommend using the standard getopt() interface instead.

    I personally don't use getopt() myself, because I prefer the following option parsing rules instead:
    • A double dash -- marks the end of parameters.
    • All arguments to short options are in the next parameter.
      A dash - starts an option specification, which never contains any values.
      In other words, -nd 2 / and -n 2 -d / do exactly the same thing (n=2, d=/), but -n2 -d/ is an error.
    • Long options with arguments may follow the option immediately after an equals sign. If there is no equals sign, and the long option takes an argument, the argument is in the next parameter.
      For example, --date=now or --date now

    Standard getopt() supports only short options, and not in the same way, and getopt_long() is a GNU extension. (The reason for handling short option arguments this way is due to the way an empty string argument is handled. It would be semi-ambiguous using the standard getopt() rules, but using the above rules an empty string argument is always explicit. If you've ever written scripts that need to be able to differentiate between an empty string as an argument, and no argument, you'll know why that might be important.)

    I haven't developed the above rules myself. In my experience, they're just the set of rules that my users have found most intuitive and easy to understand that is "close enough" to what all other Linux command-line utilities use. (In other words, these option parsing rules seem to breed the least amount of surprises and questions.)
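
    The short-option rule listed above (arguments always come from the following parameters, in order) could be sketched roughly like this. It is not the skeleton's elided short option handling; the options -n and -d, struct settings and parse_short are all assumptions for illustration, and long options are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical settings: -n and -d each take one argument. */
struct settings {
    const char *n;
    const char *d;
};

/* Parse short options where every option argument is taken from the
 * following parameters, in order. Returns the index of the first
 * non-option parameter, or -1 on error. */
int parse_short(int argc, char *argv[], struct settings *out)
{
    int iarg = 1;

    out->n = NULL;
    out->d = NULL;

    while (iarg < argc) {
        const char *opt = argv[iarg];

        if (strcmp(opt, "--") == 0) {
            iarg++;                 /* explicit end of options */
            break;
        }
        if (opt[0] != '-' || opt[1] == '\0')
            break;                  /* not an option: stop here */

        iarg++;                     /* arguments come after the cluster */
        for (opt++; *opt != '\0'; opt++) {
            const char **target;

            switch (*opt) {
            case 'n': target = &out->n; break;
            case 'd': target = &out->d; break;
            default:  return -1;    /* unknown letter, e.g. the '2' in -n2 */
            }
            if (iarg >= argc)
                return -1;          /* option is missing its argument */
            *target = argv[iarg++];
        }
    }
    return iarg;
}
```

    With this sketch, -nd 2 / and -n 2 -d / both yield n=2 and d=/, while -n2 -d/ is rejected.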
    Last edited by Nominal Animal; 07-05-2012 at 01:39 PM.

  9. #9
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    948
    Quote Originally Posted by Nominal Animal View Post
    I personally don't use getopt() myself, because I prefer the following option parsing rules instead:
    • A double dash -- marks the end of parameters.
    I'm pretty sure the standard getopt() interface stops processing options on "--".



    • All arguments to short options are in the next parameter.
      A dash - starts an option specification, which never contains any values.
      In other words, -nd 2 / and -n 2 -d / do exactly the same thing (n=2, d=/), but -n2 -d/ is an error.
    This is quite different behavior from every tool I've ever used, which almost all allow an argument immediately after an option letter. In particular, I've never seen "-nd 2 /" being allowed by any tool. I generally separate option arguments from the option so I can pass a possibly empty argument to the option in shell scripts, but I sometimes do put the option and argument together if I know the argument is not empty. getopt allows them to be put together probably mostly for backwards compatibility with older scripts and tools that used that feature.



    • Long options with arguments may follow the option immediately after an equals sign. If there is no equals sign, and the long option takes an argument, the argument is in the next parameter.
      For example, --date=now or --date now

    Standard getopt() supports only short options, and not in the same way, and getopt_long() is a GNU extension. (The reason for handling short option arguments this way is due to the way an empty string argument is handled. It would be semi-ambiguous using the standard getopt() rules, but using the above rules an empty string argument is always explicit. If you've ever written scripts that need to be able to differentiate between an empty string as an argument, and no argument, you'll know why that might be important.)

    I haven't developed the above rules myself. In my experience, they're just the set of rules that my users have found most intuitive and easy to understand that is "close enough" to what all other Linux command-line utilities use. (In other words, these option parsing rules seem to breed the least amount of surprises and questions.)
    You're right about long options; those are a GNU extension, and as I mentioned above about shell scripts, it's easy enough to unambiguously pass an empty argument to an option using the standard getopt rules.

    Edit: I forgot to point out how confusing it would be to allow the syntax "-nd 2 /" as the number of options being used increases, which increases the distance between each option and its argument. That syntax would also be interpreted as "-n d 2 /" following the most common and standard rules.

    Also, there is one partial exception to the rule: tar. If the argument list doesn't start with a dash, it almost follows your rule about option arguments. For example, with "tar cfz foo.tgz" the argument for the "f" option is "foo.tgz". On the other hand, if the options start with a dash, they follow the conventional rules: "tar -cfz foo.tgz" will create a tar file named "z". This is one of those historical blunders which cannot be fixed as scripts might rely on this strange behavior of tar.
    Last edited by christop; 07-05-2012 at 02:44 PM.

  10. #10
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Let me reiterate: unless you know differently, it is best to use getopt().

    I personally do not, but that might be just because I work with weird people in weird environments. And I am pretty wonky myself.

    I just thought my weird approach might be relevant, because the original program at issue here was parsing options. In particular, I intended to emphasize how a while loop yields more predictable and easier-to-read code than a for loop whose loop variable is occasionally also modified in the loop body.

    Incrementing the loop variable in the loop body is necessary when an option has an argument as a separate parameter, as opposed to immediately following the option character (as the original code in this thread expects).

    Quote Originally Posted by christop View Post
    I'm pretty sure the standard getopt() interface stops processing options on "--".
    Now that you mention it, it does and should. It is even explicitly mentioned in the tenth guideline in IEEE Std 1003.1-2001 Utility Argument Syntax chapter.

    Quote Originally Posted by christop View Post
    This is quite different behavior from every tool I've ever used, which almost all allow an argument immediately after an option letter.
    I know. In computing cluster environments, where things are started via noninteractive scripts, having the argument immediately follow the option is problematic, as in many cases previous errors cause the argument to be empty. Not allowing it (and telling the users to avoid that usage) reduces problems.

    The rules I described are just a consequence of disallowing immediate arguments to single-character options.

    Quote Originally Posted by christop View Post
    it's easy enough to unambiguously pass an empty argument to an option using the standard getopt rules.
    Yes, but allowing the argument to immediately follow the option causes unexpected behaviour if the argument might be empty. Consider a script, for example, with the argument taken from a shell variable.

    I've found warning against and disallowing immediate arguments to single-character options avoids such problems. Especially since users tend to start fixing the first symptom they see, instead of finding out the root cause first.

    Quote Originally Posted by christop View Post
    Also, there is one partial exception to the rule: tar.
    Yes, which also happens to be one of the more common command-line tools in computing cluster environments; i.e. one of the tools my users are most familiar with. Funnily enough, I think I've seen cases similar to tar -cfz foo.tar.gz (followed by "This system must be broken! Where did my data go?") more than once in real life..

  11. #11
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    948
    Quote Originally Posted by Nominal Animal View Post
    I know. In computing cluster environments, where things are started via noninteractive scripts, having the argument immediately follow the option is problematic, as in many cases previous errors cause the argument to be empty. Not allowing it (and telling the users to avoid that usage) reduces problems.

    The rules I described are just a consequence of disallowing immediate arguments to single-character options.
    I see no problem with disallowing immediate option arguments. It shouldn't cause any problems with new software (only older programs should still accept immediate options for the sake of backwards compatibility). If a user tries to call the program with an immediate argument (eg, "-n2"), the program should complain about an invalid option, preventing their scripts from breaking in the future. I only had a little issue with accepting option arguments that don't follow the option that it corresponds to, eg "-nd 2 /". That's an additional feature that just seems strange and error-prone to me.

    Yes, which also happens to be one of the more common command-line tools in computing cluster environments; i.e. one of the tools my users are most familiar with. Funnily enough, I think I've seen cases similar to tar -cfz foo.tar.gz (followed by "This system must be broken! Where did my data go?") more than once in real life..
    I was thinking of something an "experienced" sysadmin did at a previous job I had when I wrote about tar. He ended up with a large tar file named "z".

  12. #12
    Registered User
    Join Date
    Jul 2012
    Posts
    7
    Thanks to both of you for clearing up so many things! I did not know a lot of this. I will do as you recommended and use getopt() for my little project, as it seems to provide me with exactly what I was looking for. Really appreciate it.

  13. #13
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by christop View Post
    I only had a little issue with accepting option arguments that don't follow the option that it corresponds to, eg "-nd 2 /". That's an additional feature that just seems strange and error-prone to me.
    Well, it turns out that when you implement the short option handling as a loop over each character in the short option string, that is just the easiest and simplest way to handle the arguments.

    To be honest, I think this might be the first time I've mentioned at all that my option parsing allows -nd 2 / as a shorthand for -n 2 -d / . I don't even know why I blurted it out.. I only show the expanded form in usage and documentation.

    I've posted a Bash implementation of the option parsing on another site here.

    Quote Originally Posted by christop View Post
    I was thinking of something an "experienced" sysadmin did at a previous job I had when I wrote about tar. He ended up with a large tar file named "z".
    In some cases it is useful to transfer files directly using an ssh connection and tar at both ends (as opposed to using scp). One out of every ten or so times I still end up with a file named "-" at one end.

  14. #14
    Registered User
    Join Date
    Jul 2012
    Posts
    7
    Just one more thing regarding getopt(), more specifically getopt_long(). Why are the flags necessary? As far as I can see they don't do much.

    Code:
    static struct option long_options[] =
                 {
                   /* These options set a flag. */
                   {"verbose", no_argument,       &verbose_flag, 1},
                   {"brief",   no_argument,       &verbose_flag, 0},
                   /* These options don't set a flag.
                      We distinguish them by their indices. */
                   {"add",     no_argument,       0, 'a'},
                   {"append",  no_argument,       0, 'b'},
                   {"delete",  required_argument, 0, 'd'},
                   {"create",  required_argument, 0, 'c'},
                   {"file",    required_argument, 0, 'f'},
                   {0, 0, 0, 0}
                 };
    Code:
    /* Instead of reporting ‘--verbose’
              and ‘--brief’ as they are encountered,
              we report the final status resulting from them. */
           if (verbose_flag)
             puts ("verbose flag is set");

  15. #15
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    The getopt_long man page (man 3 getopt_long) is your friend. In particular:
    flag
    specifies how results are returned for a long option. If flag is NULL, then getopt_long() returns val. (For example, the calling program may set val to the equivalent short option character.) Otherwise, getopt_long() returns 0, and flag points to a variable which is set to val if the option is found, but left unchanged if the option is not found.
    If you really need to have a value of -1, ':', or '?', you need to use the flag pointer instead of having getopt_long() return that value. getopt_long() returns -1 when it is done, '?' if it encounters an unknown or ambiguous option, and ':' or '?' if an option is missing its required argument (':' only when the option string starts with a colon).

    I think the true reason for the flag pointer in the structure is to allow different approaches, however. This way developers can freely pick whether they want to handle each value in a separate location, or as a return value from the getopt_long() function. It has very little cost (it's not like you're going to have thousands and thousands of options, so an extra pointer to an int per option is a small cost), but it potentially allows for much simpler code in some cases.

    As a developer, I like being able to choose the best approach, and not just forced into a random one by a library function.
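
    A small sketch of the two styles side by side, assuming a platform that provides getopt_long() in <getopt.h>. The option names follow the excerpt quoted in the question; parse_long and its ok parameter are made up for this example.

```c
#include <stdio.h>
#include <getopt.h>   /* getopt_long(), struct option (GNU extension) */

static int verbose_flag;   /* written by getopt_long() through flag */

/* Parse --verbose, --brief and --file NAME. Returns the file name
 * (or NULL if --file was not given); sets *ok to 0 on a parse error. */
const char *parse_long(int argc, char *argv[], int *ok)
{
    static const struct option long_options[] = {
        /* flag != NULL: getopt_long() stores val in *flag, returns 0 */
        {"verbose", no_argument,       &verbose_flag, 1},
        {"brief",   no_argument,       &verbose_flag, 0},
        /* flag == NULL: getopt_long() returns val */
        {"file",    required_argument, NULL, 'f'},
        {0, 0, 0, 0}
    };
    const char *file = NULL;
    int c;

    *ok = 1;
    optind = 1;   /* start scanning at argv[1] */
    while ((c = getopt_long(argc, argv, "f:", long_options, NULL)) != -1) {
        switch (c) {
        case 0:   break;                 /* flag option: already stored */
        case 'f': file = optarg; break;  /* --file NAME or -f NAME */
        default:  *ok = 0; break;        /* unknown option or missing argument */
        }
    }
    return file;
}
```

    The --verbose and --brief entries need no case of their own in the switch, which is the simplification the flag pointer buys.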
