Parsing of arguments and 2D array (array of pointers to *char)

**barracuda** · 01-15-2015

Hi guys, I have a problem. I am trying to parse the arguments in my program. I already have a parsing function, but I am stuck on the part where I try to get the arguments following the -arguments option.

This is an example of the argument line in the project:
-kernels gaussian;gaussian2;gaussian3 -device_id 2 -mode basic -arguments "image=image1.bmp; gaussian=4; sigma=1" "image=image2.bmp; gaussian=4; sigma=2" "image=image3.bmp; gaussian=5; sigma=3"

What I am trying to parse is the part -arguments "arguments for first kernel" "arguments for second kernel" "arguments for third kernel"

At this point I don't know how many kernels I will parse, so I don't know how many arguments there are after -arguments and before -nextargument or the end of the string.

This is one problem. I cannot allocate memory because I don't know how many "..." substrings there are between -arguments and the next option (or the end of the string).

OK. Let's make a better example:

I know I need to save the substrings:
"image=image1.bmp; .. "
"image=image2.bmp; .. "
"image=image3.bmp; .. "
to a struct member called arguments.

My struct:

Code:

typedef struct  {
  bool install;
  bool load;
  int argumentsCount;
  int * arguments; // array of pointers to *char
  bool loadSpecifiedKernels;
  char strLoadedSpecifiedKernels[2560];
  int loadedSpecifiedKernelsCount;
} SETTINGS;

My parsing function (old version) contains this part of code:

Code:

else if(strcmp(argv[i],"-arguments")==0)
 {
  i++;
  settings->arguments = (char*) malloc(strlen(argv[i])+1);
  strcpy(settings->arguments, trim(argv[i]));
  }

but arguments here is a string (old version). I have changed the struct so now it is an array. But I need to:
1) find out how many "..." substrings there are between -arguments and the next argument or the end of the string
2) create a loop that allocates memory, increments i,
and copies the substrings into the array of pointers to char

The whole job seems so complicated to me that I don't know how to do it. Can you help me? How do I test whether the next argument argv[i] starts like -next_argument?

At least, how do I get the count of these substrings, and how do I parse the string?

I tried something, but it is full of errors and warnings. First of all, how do I access the first character of argv[i]?

Code:

else if(strcmp(argv[i],"-arguments")==0)
 {
 int count=0; int c=0; int bytes=0;
 // How to detect count ...
 while ( *argv[i][0]=="-" && isAlpha( *argv[i][1]) ){
 i++;
 count++;
 bytes+=strlen(argv[i]);
 }
 i=i-count;
 settings->arguments = (int*) malloc(count*bytes); // warning:  comparison between pointer and integer [enabled by default]|
 while ( argv[i][0]=="-" && isAlpha(argv[i][1]) ){
 i++;
 c++;
 settings->arguments[c] = (char*) malloc(strlen(argv[i])+1);
 strcpy(settings->arguments, trim(argv[i]));
 }
 }

Code:

bool isAlpha(char s){
return (
    (*s >= 65 && *s <= 90 ) ||
    (*s >= 97 && *s <= 122)
    );
};

**anduril462** · 01-15-2015

As for the first issue, knowing how many "args for kernel N" there are, you can simply iterate through argv, find the index of -arguments and the index of -nextargument, then use some very simple math to figure out how many arguments lie in between the two. However, you can also get this information from the -kernels argument. If you specify 3 kernels (as in your example above), then there had better be at least 3 arguments. If there are fewer, you have an error. If there are more, you can choose to make that an error or simply ignore the extra ones.
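
A minimal sketch of that counting step, assuming an option is any argv entry that starts with '-' followed by a letter (the rule from the original question). The names looks_like_option and count_values are hypothetical and not part of the poster's program.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns nonzero if s looks like an option name: '-' followed by a letter.
 * Hypothetical helper; the rule itself comes from the question above. */
static int looks_like_option(const char *s)
{
    return s != NULL && s[0] == '-' && isalpha((unsigned char)s[1]);
}

/* Counts the values that follow argv[start] (the "-arguments" entry),
 * stopping at the next option or at the end of argv. */
static int count_values(int argc, char *argv[], int start)
{
    int count = 0;
    int i;

    for (i = start + 1; i < argc && !looks_like_option(argv[i]); i++)
        count++;

    return count;
}
```

With the count known, an array of that many char * can be allocated before any string is copied, so a single pass over the same argv range fills it.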

Why is arguments declared as an int pointer? It's supposed to be a string.

There are a number of problems with your isAlpha.

s is a char, not a pointer to char. This makes the * (dereference) operator wrong. That function shouldn't even compile.
You use magic numbers. 65, 90, 97 and 122 are all specific to ASCII and compatible character sets. While this covers the vast majority of systems out there, there are other character sets that you could encounter (such as EBCDIC) where those numbers don't correspond to 'A', 'Z', 'a' and 'z'. Also, there is no guarantee that the alphabetic characters will be contiguous and in order, i.e. you may have non-letters in between, and A may not be the lowest number, nor Z the highest.
There's an isalpha in the standard library. You just have to remember to #include <ctype.h> to use it. Google for more info and examples.
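
A small sketch of the standard isalpha() in use, assuming the goal is to check that a whole word consists of letters only. The helper name is_alpha_word is made up for illustration. The cast to unsigned char is there because passing a negative char value to isalpha() is undefined behaviour.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if s is non-empty and every character in it is a letter,
 * 0 otherwise. Hypothetical helper built on the standard isalpha(). */
static int is_alpha_word(const char *s)
{
    if (s == NULL || *s == '\0')
        return 0;

    for (; *s != '\0'; s++) {
        /* isalpha() takes an int; cast so negative chars are safe. */
        if (!isalpha((unsigned char)*s))
            return 0;
    }
    return 1;
}
```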

A method you should always use, but especially for complicated problems, is:

Figure out how to solve your problem with paper and pencil. If you can't solve it, you can't program a computer to solve it.
Pay careful attention to even the smallest steps you use when working it out on paper, this will be the basis of your algorithm.
Work in small chunks and tackle one little aspect of your problem at a time. Counting the number of kernels passed in the -kernel argument. Allocating an array of the right size. Copying strings. Validating arguments (i.e. checking them for errors, reporting errors and quitting if need be).
Compile often (after each small chunk), and do so at the maximum warning level. Resolve all warnings and errors.
Thoroughly test your code so far. Never move on to the next small chunk until all previous code is working perfectly.
Keep repeating the above until your program is done.
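
As an illustration of one such small chunk, here is a hedged sketch of counting the kernels named in the -kernels value, assuming the names are separated by ';' as in the example command line. The function name count_kernels is hypothetical.

```c
#include <string.h>

/* Counts the non-empty names in a ';'-separated list such as
 * "gaussian;gaussian2;gaussian3". Empty entries are not counted. */
static int count_kernels(const char *list)
{
    int count = 0;

    while (*list != '\0') {
        size_t len = strcspn(list, ";");  /* length of this name */

        if (len > 0)
            count++;
        list += len;
        if (*list == ';')
            list++;                       /* step over the separator */
    }
    return count;
}
```

A chunk this small can be compiled and tested on its own before the allocation and copying steps are written.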

**barracuda** · 01-15-2015

If I use char * as the argument type for isAlpha, I get this note:
note: expected 'char *' but argument is of type 'char'|

There is probably a problem with *argv[i][0], but I don't know how to correct the access to the first character. Same problem with *argv[i][1].

SETTINGS->arguments is an array of integers because I expect arguments to be an array of pointers to substrings. The results should be saved into the elements of the arguments array.

I am not sure how isalpha works. I need to find characters between a-z and A-Z. These are allowed for -any_argument. I am not looking for words like -+blabla or -(blabla) or any other special characters. They are just English words used as "commands".

I don't plan on paper, as I don't think I could make good plans as a beginner. Many things in my code still change once I find out that something is wrong or that the logic is wrong, so at this stage of my beginner's skills I work out the correct plan while I develop the program, as it evolves, rather than planning up front.

**Jimmy** · 01-15-2015

Originally Posted by barracuda

I am not sure how isalpha works. I need to find characters between a-z and A-Z. These are allowed for -any_argument. I am not looking for words like -+blabla or -(blabla) or any other special characters. They are just English words used as "commands".

Then find out, because it is very good advice to use library functions instead of creating your own mess.

Originally Posted by barracuda

I don't plan on paper, as I don't think I could make good plans as a beginner. Many things in my code still change once I find out that something is wrong or that the logic is wrong, so at this stage of my beginner's skills I work out the correct plan while I develop the program, as it evolves, rather than planning up front.

That is also the wrong way to go about it. Unless you plan before you write your code you will just write broken code based on previous broken code. Sitting down and thinking through things first is the only way to understand what works and why.

It will make you learn so much faster.

Your way is like trying to learn how to patch up a hole in a boat by drilling holes in it.

**nul** · 01-15-2015

I would recommend a hybrid of what Jimmy recommends. He says don't roll your own whenever possible, and that is accurate when you are writing code that will be used. But for academic purposes, I recommend first attempting it yourself. Then, when you are content with what you have, or become stuck in some way, you should compare yours to the standard.

Jumping straight to the standard source code robs you of the opportunity of trying to create your own solution without being influenced by what is considered the "correct" solution. Who knows? Maybe you can create something better than what is already used.... Not likely, but not impossible.

**barracuda** · 01-15-2015

Originally Posted by Jimmy

That is also the wrong way to go about it. Unless you plan before you write your code you will just write broken code based on previous broken code. Sitting down and thinking through things first is the only way to understand what works and why.

It will make you learn so much faster.

Your way is like trying to learn how to patch up a hole in a boat by drilling holes in it.

Not at all. That is your approach, stay with it and don't push it on creative ppl.

Originally Posted by nul

I would recommend a hybrid of what Jimmy recommends. He says don't roll your own whenever possible, and that is accurate when you are writing code that will be used. But for academic purposes, I recommend first attempting it yourself. Then, when you are content with what you have, or become stuck in some way, you should compare yours to the standard.

That is exactly what I am doing. It is faster for me than playing with plans which would be useless at this stage.

**Jimmy** · 01-15-2015

Originally Posted by barracuda

Not at all. That is your approach, stay with it and don't push it on creative ppl.

I think you misunderstand what it means to be creative.

Originally Posted by barracuda

That is exactly what I am doing. It is faster for me than playing with plans which would be useless at this stage.

So you want to continue writing crap code instead of learning something? Go ahead.

**nul** · 01-15-2015

Writing 'crap code' and learning something are not mutually exclusive.

**Jimmy** · 01-15-2015

Originally Posted by nul

Writing 'crap code' and learning something are not mutually exclusive.

No, you are right, you learn to write crap code by writing crap code.

When learning a subject you should learn by doing it according to best practices as soon as possible or you will develop bad habits that are really hard to unlearn. That goes for any subject you are trying to learn. How to structure your code is something you teach yourself until you do it without thinking. As an experienced programmer you can often just look at a piece of code and know instinctively that something is wrong, because the person that wrote the code did it in a strange way.

Most of the time you can actually identify who wrote a piece of code just by looking at it, it is almost like a fingerprint. So sometimes you just go "Oh, right, this code is written by John, so it is crap code for sure".

**anduril462** · 01-15-2015

@nul/Jimmy:
You can learn from writing crap code, but it's usually a long and painful process. It requires writing crappy code, then using and maintaining the resulting software long enough for all your crappy code to cause you enough headaches to realize it's crap. Then you have to figure out the right way to do it. That's long and painful, and doesn't always work (it depends on who is trying to learn this way). Quicker still is to write crap code, then have somebody point out what is wrong with it and why it's inferior. That's very cumbersome for the reviewer.
[EDIT]
The quickest way to learn to write good code is to take the advice of experts from the beginning. At some point, you will encounter bad code. Then, you will be able to identify what is bad about it and why it's bad. Plus, as Jimmy pointed out, no bad habits to unlearn that way.
[/EDIT]

And yes, for academic purposes, "rolling your own" version of library functions can be a very useful teaching tool. In my opinion and experience, isalpha is not the type/complexity of function that teaches much by rolling your own. More useful might be string library functions, which will solidify concepts of loops, pointers, arrays, etc. barracuda did write his/her own isAlpha, but it was sub-optimal. The problems with that implementation have been mentioned, as has the benefit of using existing standard library functions. This should be sufficient for convincing barracuda and others to use isalpha().

@barracuda:
You may choose to do things your own way, but that will pretty much ensure that people like Jimmy and me won't want to help you. After all, why would we spend time sharing our wisdom, experience and advice with you if you're going to ignore it?

> I don't plan on paper, as I don't think I could make good plans as a beginner. Many things in my code still change once I find out that something is wrong or that the logic is wrong, so at this stage of my beginner's skills I work out the correct plan while I develop the program, as it evolves, rather than planning up front.
You don't think you can make good plans? Well, what do you think would be worse: working with mediocre plans, or working with no plans at all? Imagine trying to build a house without any plans or blueprint. You will probably fail, or have a horrible house. If you get it to work, there will likely be lots of problems. You'll put the plumbing in the wrong place, screw up the wiring, cut boards too short, the foundation will be weak, walls won't be straight, etc. Working without plans makes your life more difficult. You don't have to have 100% of it planned on paper first, but you need to have a pretty good idea -- shoot for ~80% planned before you start coding. The people who can successfully complete projects without much planning tend to have many, many years of experience, and since you're clearly a novice, you should work on developing your planning and problem solving skills, along with your coding.

> If I use char * as the argument type for isAlpha, I get this note:
> note: expected 'char *' but argument is of type 'char'|
> There is probably a problem with *argv[i][0], but I don't know how to correct the access to the first character. Same problem with *argv[i][1].
There is no need to pass a char * to isAlpha. Declare isAlpha to take a plain char, and pass a plain char to it. Assuming argv is the standard char **, the extra * is the problem: argv[i] is a char *, and argv[i][0] is a plain char. You can only dereference something that is a pointer. If it's not a pointer, you can't use * to dereference it. Something really doesn't add up here though.
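
The three levels of type described above can be spelled out in a short sketch. The function name first_char and the local variable names are assumptions made for illustration only.

```c
#include <stddef.h>

/* Returns the first character of argv[i], or '\0' if that entry is NULL.
 * Each local variable shows the type at one level of indexing. */
static char first_char(char *argv[], int i)
{
    char **list = argv;     /* argv itself: pointer to pointer to char */
    char  *word = list[i];  /* argv[i]: pointer to char, i.e. one string */
    char   c;               /* argv[i][0]: a plain char, nothing to dereference */

    c = (word != NULL) ? word[0] : '\0';
    return c;
}
```

A comparison such as first_char(argv, i) == '-' then compares a char with a char, with no * in front.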

> SETTINGS->arguments is array of integers because I expect arguments should be array of pointers to substrings. Results should be saved into elements of array arguments.
This doesn't quite make sense. Substring is not an actual type. You probably mean "string", as in SETTINGS->arguments is an array of strings. Well, a string in C is a sequence of chars. That's char, not int. Thus an int * makes no sense, and is the wrong type. The compiler should produce warnings.
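
A hedged sketch of what an array of strings looks like when the element type is char * and the field type is therefore char **. The names copy_strings and free_strings are hypothetical, and the trailing NULL entry is a design choice made here, not something the original struct requires.

```c
#include <stdlib.h>
#include <string.h>

/* Frees a NULL-terminated array of strings created by copy_strings(). */
static void free_strings(char **list)
{
    size_t i;

    if (list == NULL)
        return;
    for (i = 0; list[i] != NULL; i++)
        free(list[i]);
    free(list);
}

/* Copies count strings from src into a newly allocated array that ends
 * with a NULL entry. Returns NULL if any allocation fails. */
static char **copy_strings(char *const src[], size_t count)
{
    char **list = malloc((count + 1) * sizeof *list);
    size_t i;

    if (list == NULL)
        return NULL;

    for (i = 0; i < count; i++) {
        list[i] = malloc(strlen(src[i]) + 1);
        if (list[i] == NULL) {
            free_strings(list);  /* list[i] is NULL, so this frees 0..i-1 */
            return NULL;
        }
        strcpy(list[i], src[i]);
    }
    list[count] = NULL;
    return list;
}
```

With a field declared as char **arguments, a call such as copy_strings(&argv[i], n) would store n copies starting at argv[i], and strcpy() no longer receives an int *.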

> I am not sure how isalpha works. I need to find characters between a-z and A-Z. These are allowed for -any_argument. I am not looking for words like -+blabla or -(blabla) or any other special characters. They are just English words used as "commands".
If you bothered to check Google for examples and tutorials on using isalpha, you would know how it works. You would know that it does exactly what you want it to do (checks if a char is A-Z or a-z). You would have lots of examples of how to use it with a single char, or to evaluate a whole string.

You seem to have some fairly fundamental problems with argv and strings, arrays and pointers. I strongly suggest you spend a little more time with your textbook, notes and tutorials working with strings, pointers and arrays (specifically 2-d arrays) in simpler ways. Once you have a solid understanding, then you can make use of them in this program. Back to the building-a-house analogy, you're trying to build the house before you know how to use the tools -- how can you create a foundation without first knowing how to build the forms for the concrete, or mix the concrete, or set the anchor points for the walls?

**nul** · 01-15-2015

Right. If you don't compare your code to the standard when you've finished your attempt, like I recommended above, you'll lose out on that. But giving it an honest try yourself to write these simple programs/functions as exercises is not a bad idea.

**Jimmy** · 01-15-2015

Originally Posted by nul

Right. If you don't compare your code to the standard when you've finished your attempt, like I recommended above, you'll lose out on that. But giving it an honest try yourself to write these simple programs/functions as exercises is not a bad idea.

It is better to do it the other way around.

Understanding then doing. You will learn much faster that way and your understanding will be more complete.

You become a master by imitating and studying masters.

And people that learn programming by writing crappy code instead of listening to experience, especially in languages like C, tend to have a lot of bad habits. They rely on unportable constructs, because "hey, it works on my compiler", and rely on undefined behavior because "hey, it works on my compiler".

**barracuda** · 01-16-2015

This discussion is 95% useless. I am not interested in philosophy, so I will read only the posts which are not philosophical and stick to the code. If you have nothing to say about the code then better don't waste your time. I will find a way myself. Bye

**Nominal Animal** · 01-16-2015

The way I'd solve this, is separate the parsing of each kernel spec (separate command-line argument in the argv[] array) to a helper function. I'd also use a separate structure for each kernel, just to make it easier:

Code:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

struct kernel {
    char    *image;
    double   gaussian;
    double   sigma;
};

static void kernel_init(struct kernel *const k)
{
    /* Set defaults: */
    k->image = NULL;
    k->gaussian = 0.0;
    k->sigma = 0.0;
}

static void kernel_free(struct kernel *const k)
{
    /* Free dynamically allocated strings: */
    free(k->image);

    /* Set "invalid" values: */
    k->image = NULL;
    k->gaussian = 0.0;
    k->sigma = 0.0;
}

The static means the functions are only visible in the current file. I used it because the struct kernel is also only defined in this file. void means the functions do not return anything.

The kernel_init() function is used to initialize a kernel before use. This is useful, because then you only need to set sensible initial values in one place. After the kernel is no longer needed anywhere, you can use the kernel_free() function to destroy it.

It is not necessary to destroy kernels before the program exits, as the operating system will release the resources anyway when the program exits. However, if you use many kernels, and the program is long-lived, it is a good idea to become accustomed to initializing and freeing resources as they are needed; this is often a practice that many find harder to learn later on.

Next, let's declare a kernel string parser:

Code:

#define PARSED_IMAGE    (1 << 0)
#define PARSED_GAUSSIAN (1 << 1)
#define PARSED_SIGMA    (1 << 2)
#define PARSED_ALL      (PARSED_IMAGE | PARSED_GAUSSIAN | PARSED_SIGMA)

static int parse_kernel(const char *s, struct kernel *const k);

The idea is that parse_kernel() takes a string (such as "image=image1.bmp; gaussian=4; sigma=1"), parses it into the structure pointed to by k, and returns which fields were parsed from the string. If an error occurs, I'd return 0 with errno set to indicate the reason.

To test the function, let's write a test main(), so we can test our hard-to-write function early and often:

Code:

int main(int argc, char *argv[])
{
    int arg, parsed;
    struct kernel k;

    for (arg = 1; arg < argc; arg++) {

        kernel_init(&k);

        parsed = parse_kernel(argv[arg], &k);
        if (!parsed) {
            fprintf(stderr, "%s: Could not parse kernel: %s.\n", argv[arg], strerror(errno));
            fflush(stderr);

        } else {
            printf("Kernel '%s':\n", argv[arg]);

            if (parsed & PARSED_IMAGE)
                printf("\tImage '%s'\n", k.image);
            else
                printf("\tImage was not supplied.\n");

            if (parsed & PARSED_GAUSSIAN)
                printf("\tGaussian %g\n", k.gaussian);
            else
                printf("\tGaussian was not specified.\n");

            if (parsed & PARSED_SIGMA)
                printf("\tSigma %g\n", k.sigma);
            else
                printf("\tSigma was not specified.\n");
        }

        kernel_free(&k);
    }

    return EXIT_SUCCESS;
}

There are several different ways the parse_kernel() function can handle its task: separating into tokens using strtok() and parsing each token using sscanf(), opportunistic parsing using sscanf() and %n to see how much of the string was consumed, or using strspn() and strcspn() for tokenization, strncmp() for matching names, and sscanf() for converting values. Although these functions are commonly looked up in the Linux man pages online, they are portable (C89 is listed in the "Conforming to" sections) and not at all Linux-specific.

Typically, coursework solutions use the tokenization approach. I don't like it myself, because it modifies the string in-place, and that's sometimes problematic. I like the opportunistic parsing, but it has a downside that it completely ignores whitespace, even between letters in the identifiers (image, gaussian, sigma); it'd accept both "image=foo.gif" and "i mage=foo.gif". Although the third option is probably the most complex, I'll show you that one; that way there is the most opportunity for learning, and you're unlikely to just copy it instead of writing your own code.
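
For comparison, here is a minimal sketch of the tokenization approach, working on a private copy so the caller's string is not modified. It only looks for the sigma field; the function name find_sigma is hypothetical and this is not the parser shown later in this post.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Looks for "sigma=<number>" in spec using strtok() on a private copy.
 * Returns 1 and stores the value in *out if found, 0 otherwise. */
static int find_sigma(const char *spec, double *out)
{
    char *copy, *token;
    int found = 0;

    copy = malloc(strlen(spec) + 1);
    if (copy == NULL)
        return 0;
    strcpy(copy, spec);

    /* strtok() overwrites each ';' in copy with '\0', which is why
     * it must not be run on a string the caller still needs intact. */
    for (token = strtok(copy, ";"); token != NULL; token = strtok(NULL, ";")) {
        /* The leading space in the format skips whitespace before the name. */
        if (sscanf(token, " sigma=%lf", out) == 1)
            found = 1;
    }

    free(copy);
    return found;
}
```

The extra allocation is the price of leaving the original string untouched. Note also that strtok() keeps hidden state between calls, so two tokenizations cannot be interleaved.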

Here, only the image field is a dynamically allocated string, but you often have several. To update those fields, I like to use a helper function. If the field has already been set, the helper function will free it before replacing it, so it won't leak any memory. (Consider what would otherwise happen with e.g. "image=a.png;image=b.png;...;image=z.png".) This function takes a pointer to the string pointer, the source string pointer, and the length (since most likely the source string is delimited and does not end where the string ends):

Code:

static int set_string(char **const dst, const char *const src, const size_t len)
{
    /* We must be given a valid pointer to a string pointer. */
    if (dst == NULL)
        return errno = EINVAL;

    /* If len is nonzero, we need a valid source string pointer, too. */
    if (len > 0 && src == NULL)
        return errno = EINVAL;

    /* Free the old destination string, if any. */
    if (*dst != NULL)
        free(*dst);

    /* Allocate memory for a new string. */
    *dst = malloc(len + 1);
    if (*dst == NULL)
        return errno = ENOMEM; /* Failed, out of memory. */

    /* Copy the source string, and terminate the string. */
    if (len > 0)
        memcpy(*dst, src, len);
    (*dst)[len] = '\0';

    /* Done! */
    return 0;
}

The above function returns 0 if successful, otherwise it returns nonzero with errno set to indicate the error (ENOMEM = not enough memory, or EINVAL = invalid parameters). But carefully note that it takes a pointer to the destination string pointer, because it needs to change it!

The parser function I'd use is as follows:

Code:

static int parse_kernel(const char *s, struct kernel *const k)
{
    int parsed = 0;
    double value;
    int len;

    if (s == NULL || k == NULL) {
        /* Fail due to invalid parameters. */
        errno = EINVAL;
        return 0;
    }

    while (1) {

        /* Skip separator characters. */
        s += strspn(s, "\t\n\v\f\r ;");

        /* End of string? */
        if (*s == '\0')
            break;

        if (!strncmp(s, "image=", 6)) {
            /* image=... */
            s += 6;

            /* find separator (or end of string, if no separator) */
            len = strcspn(s, ";");

            /* Set k->image to (a copy of) len chars starting at s. */
            if (set_string(&(k->image), s, len))
                return 0; /* Error occurred, errno already set. */            

            /* Mark 'image' field updated, and advance to next part. */
            parsed |= PARSED_IMAGE;
            s += len;
            continue;
        }

        if (!strncmp(s, "gaussian=", 9)) {
            /* gaussian=... */
            s += 9;

            /* Attempt to parse the value as a number. */
            len = -1;
            (void)sscanf(s, " %lf %n", &value, &len);
            if (len > 0) {

                /* Save value. */
                k->gaussian = value;

                /* Mark 'gaussian' field updated, and advance to next part. */
                parsed |= PARSED_GAUSSIAN;
                s += len;
                continue;
            }
        }

        if (!strncmp(s, "sigma=", 6)) {
            /* sigma=... */
            s += 6;

            /* Attempt to parse the value as a number. */
            len = -1;
            (void)sscanf(s, " %lf %n", &value, &len);
            if (len > 0) {

                /* Save value. */
                k->sigma = value;

                /* Mark 'sigma' field updated, and advance to next part. */
                parsed |= PARSED_SIGMA;
                s += len;
                continue;
            }
        }

        /* Unknown field. For future compatibility, we skip
         * but also warn about such unknown fields. */

        /* First, find the length of the name, */
        len = strcspn(s, "=;");
        /* and write the warning to standard error. */
        if (len > 0) {
            fwrite(s, len, 1, stderr);
            fprintf(stderr, ": Ignoring unknown kernel identifier.\n");
            fflush(stderr);
        }

        /* Find the length of the ignored field, and skip to next part. */
        len = strcspn(s, ";");
        if (len > 0) {
            s += len;
            continue;
        }

        /* Parsing error. We parsed nothing, but can't skip anything either,
         * and we're not at the end of the string, either. */
        errno = EBADMSG;
        return 0;
    }

    /* We set errno to 0 (no error, OK) in case parsed == 0. */
    errno = 0;
    return parsed;
}

The logic is that in the infinite loop, we first skip any separators (semicolons) and whitespace characters that are allowed before a field name. If after that we are at the end of the string, the loop is done.

In the loop body, we use strncmp(s, "IDENTIFIER=", 11) to check whether s starts with the 11-character prefix "IDENTIFIER=". If it does, strncmp() returns zero. You can read if (!strncmp()) ... as "if strncmp() returns zero, then ..."; i.e. the if body is only executed if the string does start with the desired prefix.

If the prefix matches, the matching part is skipped with s += 11;. I could have written a helper function to hide these details, but I think it is useful in learning this stuff.
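
A sketch of what such a helper might look like. The name skip_prefix is an assumption; the parser above does not use it.

```c
#include <string.h>

/* If s starts with prefix, returns a pointer to the text just after it.
 * Otherwise returns NULL. */
static const char *skip_prefix(const char *s, const char *prefix)
{
    size_t n = strlen(prefix);

    /* strncmp() returns zero when the first n characters are equal. */
    return (strncmp(s, prefix, n) == 0) ? s + n : NULL;
}
```

Computing the length with strlen() removes the hand-counted 6 and 9 from the call sites, at the cost of hiding the mechanics that the explicit version shows.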

After the prefix has been found, we know we have the desired value starting at s. We can also use strcspn(s, ";") to count the number of characters till the next semicolon (not including that semicolon itself), or till the end of the string if there are no semicolons in the string. I use this to find out the image name length, supplied to the set_string() helper above.
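
A small sketch of that length calculation on its own. Trimming trailing whitespace is an addition made here for illustration; the parser above does not do it. The name value_length is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Returns the length of the value at the start of s: everything up to,
 * but not including, the next ';' (or the end of the string), with any
 * trailing whitespace left out. */
static size_t value_length(const char *s)
{
    size_t len = strcspn(s, ";");

    /* Extra step, not in the parser above: drop trailing whitespace. */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;

    return len;
}
```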

For parsing numbers, I like to use the idiom

Code:

    len = -1;
    (void)sscanf(string, "... %n", ... , &len);
    if (len >= 0) {
        /* Parsed successfully till string+len */
    }

It is a slightly strange-looking pattern. The return value of sscanf() is ignored because the C standards failed to clearly define whether a successful "%n" is counted in the result or not, and it depends on the C library used. However, all that matters is that if len becomes nonnegative, the complete pattern was parsed without issues, and you can just add len to the string pointer to skip over that part.
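
The idiom can be wrapped in a small function to try it out in isolation. This is a sketch under the same " %lf %n" format used in the parser; the name parse_double is made up for illustration.

```c
#include <stdio.h>

/* Parses a number at the start of s. On success stores it in *value and
 * returns the number of characters consumed, including surrounding
 * whitespace. Returns -1 if no number could be parsed. */
static int parse_double(const char *s, double *value)
{
    double parsed;
    int len = -1;

    /* len is only written if sscanf() gets as far as the %n directive. */
    (void)sscanf(s, " %lf %n", &parsed, &len);
    if (len < 0)
        return -1;

    *value = parsed;
    return len;
}
```

For the input "4; sigma=1" the function stores 4 and returns 1, so adding the result to the string pointer leaves it at the semicolon.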

Finally, the latter half of the function body deals with unknown field names -- i.e. anything besides "image", "gaussian", or "sigma". In this case, I've made it easy to add new fields in the future. Older versions of the function will just ignore them, but warn, to standard error, so that the user knows something was ignored.

Questions?

**barracuda** · 01-16-2015

Thanks for your answer. I am working on my own solution now. I will check your solution later.

I have found this source. So I realized the mistake in the argument of function isAlpha...
isalpha.c - minilib-c - Mini C library, for small embedded MCU - Google Project Hosting

Also, the comparison should not be argv[i][0]=="-" but argv[i][0]=='-', as the first compares a character with a pointer to a string literal. I want to compare a character with a character.

I also switched to the standard library function isalpha(); the header was missing, so I added:

Code:

#include <ctype.h> // isalpha standard library

Code:

isalpha( argv[i][1] )

...\codeblocks_32bit\mingw\bin\..\lib\gcc\mingw32\ 4.7.1\..\..\..\..\include\string.h|45|note: expected 'char *' but argument is of type 'int *'|

I know where the problem is, but I don't know how to convert it to char*.
