Thread: UNIX: Parsing *argv[] with quoted arguments

  1. #1
    Registered User
    Join Date
    Apr 2007
    Location
    Karlstad, Sweden
    Posts
    11

    UNIX: Parsing *argv[] with quoted arguments

    I'm working on a custom newgrp command that, apart from switching your group, also starts a command. The syntax looks like this:

    $ newgrp group [-e command [arg1, [arg2, [... argN]]]]

    For example:

    $ newgrp staff -e /some/path/server/start -a -e -m "Servern är startad som grupp staff."

    Now, the problem is those double quotes up there. My program needs to pass them to the command it starts, but my program never receives them in the first place, because the shell strips them away when creating my program's *argv[]. This is, of course, normal and not an error in itself. But I need to get around it.

    I need to either know which entries in my *argv[] were quoted when my program was started (so I can put quotes around those arguments myself when passing them to the command to launch), or find another way of solving this.

    One thought that struck me was that I could put double quotes around all entries in my *argv[] when my newgrp starts the command it's supposed to start. But then it won't work with special shell characters such as >, <, |, and so on.

    FYI, I'm running AIX 5.3. My newgrp uses system() to launch the command.

    I'd appreciate any help. I've run quite a few searches, but I haven't found anything of value.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > $ newgrp staff -e /some/path/server/start -a -e -m "Servern är startad som grupp staff."
    So
    argv[0] = "newgrp"
    argv[1] = "staff"
    argv[2] = "-e"
    argv[3] = "/some/path/server/start"

    and so on.

    If you just want to do
    /some/path/server/start -a -e -m "Servern är startad som grupp staff."
    at some point in the code, then it's simply
    execv( argv[3], &argv[3] );

    You said you were using system(), so if you're expecting to do something after this, then you'll need the whole fork() and wait() stuff as well.
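
    A minimal sketch of the fork() and wait() part, assuming the command is a plain executable path. The helper name run_direct is made up for illustration and is not from the original program.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run args[0] with the NULL-terminated argument
   vector args (same layout as argv) and return its exit status, or -1
   if the child could not be created or did not exit normally. */
int run_direct(char *const args[])
{
    pid_t pid;
    int status;

    assert(args != NULL && args[0] != NULL);

    pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        /* Child: no shell is involved, so every argument reaches the
           new program exactly as it appears in args. */
        execv(args[0], args);
        _exit(127);                /* only reached if execv failed */
    }

    /* Parent: wait for the child to finish. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    From main() this would be called as run_direct(&argv[3]), since argv itself ends with a NULL pointer.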
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Apr 2007
    Location
    Karlstad, Sweden
    Posts
    11
    Quote Originally Posted by Salem View Post
    > > $ newgrp staff -e /some/path/server/start -a -e -m "Servern är startad som grupp staff."
    > So
    > argv[0] = "newgrp"
    > argv[1] = "staff"
    > argv[2] = "-e"
    > argv[3] = "/some/path/server/start"
    >
    > and so on.
    Right.

    > If you just want to do
    > /some/path/server/start -a -e -m "Servern är startad som grupp staff."
    > at some point in the code, then it's simply
    > execv( argv[3], &argv[3] );
    Hmm. Unless I'm mistaken, that won't work if my command contains special shell characters such as redirection and pipes, because the program isn't run through a shell. I'd have to insert "/bin/sh" and "-c" into *argv[] first. Actually, I shouldn't have to "insert" that at all - I could simply overwrite argv[1] (the group) and argv[2] (the -e to my program) with "/bin/sh" and "-c" respectively. And then, of course, supply argv[1] instead of argv[3] to execv.

    I think that should work. Thanks for pointing me in the right direction!
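
    One caveat with the sh -c idea: sh -c takes the command as a single string, and any arguments after that string become $0, $1 and so on instead of words of the command. So the remaining argv entries would probably have to be joined into one string first. A rough sketch of that step follows; join_args is a hypothetical name. Note that the joined string no longer shows where the original quotes were, which is the problem this thread started with.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join the NULL-terminated vector args into one
   malloc'd string with single spaces between the entries.
   Returns NULL on allocation failure. The caller frees the result. */
char *join_args(char *const args[])
{
    size_t len = 1;                    /* room for the terminating '\0' */
    size_t i;
    char *out;

    assert(args != NULL);

    for (i = 0; args[i] != NULL; i++)
        len += strlen(args[i]) + 1;    /* argument plus one separator */

    out = malloc(len);
    if (out == NULL)
        return NULL;

    out[0] = '\0';
    for (i = 0; args[i] != NULL; i++) {
        if (i > 0)
            strcat(out, " ");
        strcat(out, args[i]);
    }
    return out;
}
```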

  4. #4
    Registered User
    Join Date
    Apr 2007
    Location
    Karlstad, Sweden
    Posts
    11
    Quote Originally Posted by Enfors View Post
    > I think that should work. Thanks for pointing me in the right direction!
    On second thought, no, I don't think that'll work either. It only solves part of the problem.

    You see, for some bizarre reason we're still running in 7-bit mode on these machines that this program is supposed to work on. That means, for example, that the Swedish character "ö" (which sometimes will appear inside a quoted argument) is interpreted as a pipe by the shell. Therefore, I need to preserve the original double quotes. Without them, the shell will think that all ö characters (even those within double quotes) in the arguments are pipes, which of course will create problems.

    Back to the drawing board, I guess...

  5. #5
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Maybe provide some additional command line examples so we don't all waste time trying to guess what you mean when you say 'redirection' or some other things.

  6. #6
    Registered User
    Join Date
    Apr 2007
    Location
    Karlstad, Sweden
    Posts
    11
    Quote Originally Posted by Salem View Post
    > Maybe provide some additional command line examples so we don't all waste time trying to guess what you mean when you say 'redirection' or some other things.
    Oh, sorry. I thought everybody who was into UNIX programming was familiar with that term.

    When I say redirection, I mean the redirection of input and output streams:

    $ grep enfors /etc/passwd >outfile.txt

    $ grep foo * 2>/dev/null

    $ mail [email protected] <mailfile

    And so on.

  7. #7
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > that the Swedish character "ö" (which sometimes will appear inside a quoted argument) is interpreted as a pipe by the shell.
    But there is no shell to (re)interpret them if you're just passing the arguments on using an exec call.
    Yes it would be a problem if you were reconstructing the command line to pass to another system() call, but exec doesn't interpret the data at all.

    > I could simply overwrite argv[1] (the group) and argv[2] (the -e to my program) with "/bin/sh" and "-c" respectively
    Hold on, are you saying that server/start is a shell script ?

    OK, I'm confused again

  8. #8
    Registered User
    Join Date
    Apr 2007
    Location
    Karlstad, Sweden
    Posts
    11
    Quote Originally Posted by Salem View Post
    > > that the Swedish character "ö" (which sometimes will appear inside a quoted argument) is interpreted as a pipe by the shell.
    > But there is no shell to (re)interpret them if you're just passing the arguments on using an exec call.
    There would be, if I, like I said, inserted "/bin/sh", and "-c" (sh's command line option that means "run the following string as a command in this shell") into *argv[]. So what I'd do is use exec to start a shell, and have that shell start my command.

    > Yes it would be a problem if you were reconstructing the command line to pass to another system() call, but exec doesn't interpret the data at all.
    Right, but since I wouldn't be starting my command directly with exec(), I'd be starting a shell and asking that to run my command (so that I'd get the shell benefits of redirection, pipes, etc).

    > > I could simply overwrite argv[1] (the group) and argv[2] (the -e to my program) with "/bin/sh" and "-c" respectively
    > Hold on, are you saying that server/start is a shell script ?
    It could be (hence I couldn't call it directly with exec() even if I wanted to). But it could also be a binary. This is used from a menu system, so the command is whatever that particular menu option is configured to run.

    > OK, I'm confused again
    Sorry about that, I'll try to clear it up :-)
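
    To make the "use exec to start a shell" idea concrete, here is a small sketch. It assumes the whole command is already available as one string, and run_via_shell is a hypothetical name. It does roughly what system() does, so the shell will re-parse the string, quotes and pipe characters included.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run cmd through /bin/sh -c and return the
   shell's exit status, or -1 if the child could not be created or
   did not exit normally. */
int run_via_shell(const char *cmd)
{
    pid_t pid;
    int status;

    assert(cmd != NULL);

    pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        /* Child: the shell parses cmd, so redirection and pipes work,
           but so does every other kind of shell interpretation. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                /* only reached if execl failed */
    }

    /* Parent: wait for the shell to finish. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```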

  9. #9
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    OK, I think I see what's going on now.

    I think all you need to do is locate the argv[ i ] which contains your quoted string, and allocate a block of memory with space for that string plus two chars for the pair of " " and one more for the terminating \0. Then you can turn
    argv[ i ] = hello world
    into
    argv[ i ] = "hello world"
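
    The allocate-and-wrap step could look something like the sketch below. quote_arg is a hypothetical name, and the backslash escaping of " \ $ and ` is an extra assumption (those stay special inside double quotes in a POSIX shell); it is not something the original program is known to do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of arg wrapped in double
   quotes, with a backslash in front of " \ $ and `.
   Returns NULL on allocation failure. The caller frees the result. */
char *quote_arg(const char *arg)
{
    size_t len;
    char *out;
    char *p;

    assert(arg != NULL);

    len = strlen(arg);
    /* Worst case every character is escaped (2 * len), plus the two
       quote characters, plus the terminating '\0'. */
    out = malloc(2 * len + 3);
    if (out == NULL)
        return NULL;

    p = out;
    *p++ = '"';
    for (; *arg != '\0'; arg++) {
        if (strchr("\"\\$`", *arg) != NULL)
            *p++ = '\\';
        *p++ = *arg;
    }
    *p++ = '"';
    *p = '\0';
    return out;
}
```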

  10. #10
    Registered User
    Join Date
    Apr 2007
    Location
    Karlstad, Sweden
    Posts
    11
    Quote Originally Posted by Salem View Post
    > OK, I think I see what's going on now.
    >
    > I think all you need to do is locate the argv[ i ] which contains your quoted string,
    ... which brings us back to the original problem - how do I, from an arbitrary unknown command line parsed into an *argv[], know which particular *argv[] entries were originally quoted?

  11. #11
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Well you could search each argv[ i ] for any of the following
    - white space
    - characters < 32
    - characters > 128
    - a range of shell meta characters.

    If the arg fails the test, then duplicate it inside double quotes.
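
    The test described above might be sketched like this. needs_quoting is a hypothetical name and the metacharacter list is only an assumption that may need adjusting for a particular shell. The check guesses from the content of each argument; it cannot tell which arguments were quoted on the original command line.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return 1 if arg contains anything a shell might
   treat specially (and so should be re-quoted), 0 otherwise. */
int needs_quoting(const char *arg)
{
    /* Assumed list of shell metacharacters; adjust as needed. */
    static const char meta[] = "|&;<>()$`\\\"'*?[]#~=%!{}";
    const unsigned char *p;

    assert(arg != NULL);

    if (*arg == '\0')
        return 1;                  /* an empty argument vanishes unless quoted */

    for (p = (const unsigned char *)arg; *p != '\0'; p++) {
        if (*p <= 32 || *p >= 127) /* space, control characters, DEL, 8-bit */
            return 1;
        if (strchr(meta, *p) != NULL)
            return 1;
    }
    return 0;
}
```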

  12. #12
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    Is escaping the quotes (e.g. use \" instead of just ") while invoking your program considered cheating? It seems to be the most reasonable way to accomplish this.
    If you understand what you're doing, you're not learning anything.

  13. #13
    Registered User
    Join Date
    Apr 2007
    Location
    Karlstad, Sweden
    Posts
    11
    Quote Originally Posted by Salem View Post
    > Well you could search each argv[ i ] for any of the following
    > - white space
    > - characters < 32
    > - characters > 128
    > - a range of shell meta characters.
    >
    > If the arg fails the test, then duplicate it inside double quotes.
    No, I can't do that either. Like I said, the machines are running in 7 bit mode, so there are no characters above 127. For example, the Swedish character "ö" is used as the pipe character if it appears outside a quoted string. However, it is frequently used inside quoted strings (where it doesn't mean pipe, then it's just a letter in a message or heading). So the only way of knowing if it should be considered a pipe character or not, is to check for the double quotes.

    The only way I can see to solve this is if there was some way to access the original command line string (as a simple char *), before it was parsed into the *argv[].

  14. #14
    Registered User
    Join Date
    Apr 2007
    Location
    Karlstad, Sweden
    Posts
    11
    Quote Originally Posted by itsme86 View Post
    > Is escaping the quotes (e.g. use \" instead of just ") while invoking your program considered cheating? It seems to be the most reasonable way to accomplish this.
    That would work, only I can't do that. The command lines that I'm using appear in old legacy configuration files (which are used by other systems too), so I can't change them.

  15. #15
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Code:
    #include <stdio.h>
    int main ( int argc, char *argv[] ) {
        int i;
        if ( argc < 2 ) {   /* need one argument to dump */
            fprintf( stderr, "usage: %s string\n", argv[0] );
            return 1;
        }
        /* print index, character, decimal value and hex value of each byte */
        for ( i = 0 ; argv[1][i] ; i++ ) {
            unsigned char ch = argv[1][i];
            printf( "%d: %c %3d %02x\n", i, argv[1][i], ch, ch );
        }
        return 0;
    }

    foo.exe "Servern är"
    0: S  83 53
    1: e 101 65
    2: r 114 72
    3: v 118 76
    4: e 101 65
    5: r 114 72
    6: n 110 6e
    7:    32 20
    8: ä 228 e4
    9: r 114 72
    Shows 8 bit characters in a string being passed as an argument.

    Are you saying this doesn't happen on your machine working in 7-bit mode?
