Thread: Please critique and suggest improvements for parser

Hybrid View

Previous Post Previous Post   Next Post Next Post
  1. #1
    Registered User
    Join Date
    Sep 2006
    Posts
    35

    Please critique and suggest improvements for parser

    Overview: parse_field acts like fgets, except while fgets returns lines, parse_field returns whitespace delimited fields. However, unlike fgets, which simply returns the next line, parse_field also replaces all instances of '&' with char *user.

    Before I extend the attached code any further, I would like more experienced programmers to critique any flaws, particularly those regarding design and harmful programming practices. Seeing as this code serves as the foundation for my soon-to-be larger program, I want to make sure that it's stable (and hopefully bulletproof).

    However, char *field has been posing quite a bit of a problem (highlighted in red). If you notice, field is defined in parse_field, but since I return field, I have no chance to free it. Therefore, I made field global (which, if I'm not mistaken, is not the best programming practice) and free it in parse_line.

    Note: Although state is not being used yet, it will be later.

    Thanks in advance for any helpful criticism.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    char *field;
    const char *save, *user = "username";
    int state = 0;
    
    char *parse_line(const char *line);
    char *parse_field(const char *line);
    
    int main(void) {
        char *line = "The user is & \t and  his home is /home/&";
        parse_line(line);
        return 0;
    }
    
    char *parse_line(const char *line) {
        while (parse_field(line)) {
            printf("Field: %s\n", field);
            free(field);
        }
    }
    
    char *parse_field(const char *line) {
        const char *ampersand, *end, *start, *traverse;
        int ampersand_count = 0;
        size_t len, userlen = strlen(user);
    
        if (save && *save == '\0') {
            save = NULL;
            return NULL;
        }
        if (!save) save = line;
        while(isspace(*save)) ++save;
        end = save;
        start = save;
        while (*end != '\0' && !isspace(*end)) ++end;
        save = end;
        len = end - start;
        field = malloc(len + 1);
        strncpy(field, start, len);
        field[len] = 0;
    
        traverse = start;
        for (; traverse != end; ++traverse) {
            if (*traverse == '&') ++ampersand_count;
        }
        if (ampersand_count > 0) {
            free(field);
            field = malloc(len + ampersand_count * (userlen - 1) + 1);
            if (field == NULL) {
                return NULL;
            }
            strncpy(field, start, len);
            field[len] = 0;
            while ((ampersand = strchr(field, '&'))) {
                memmove(ampersand + userlen, ampersand + 1, strlen(ampersand + 1));
                memcpy(ampersand, user, userlen);
            }
        }
    
        ++state;
        return field;
    }

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,666
    Well if you assigned the return result rather than just testing it, you wouldn't need an icky global.

    Code:
    void parse_line(const char *line) {
        char *temp;
        while ( (temp=parse_field(line)) != NULL ) {
            printf("Field: %s\n", temp);
            free(temp);
        }
    }
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Sep 2006
    Posts
    35
    Quote Originally Posted by Salem
    Well if you assigned the return result rather than just testing it, you wouldn't need an icky global.

    Code:
    void parse_line(const char *line) {
        char *temp;
        while ( (temp=parse_field(line)) != NULL ) {
            printf("Field: %s\n", temp);
            free(temp);
        }
    }
    Thanks, Salem. Any suggestions and/or criticism on the rest of the code?

  4. #4
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Would you want to generalize the functions so that other delimiters and/or replacement text might be used?

    Might you want to use passed parameters of destination and size rather than doing dynamic allocation?
    Code:
    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>
    
    int  parse_field(char *dst, size_t size, const char **src, char delim, const char *repl);
    /* implementation elsewhere */
    
    void parse_engine(const char *line, char delim, const char *user)
    {
        char output[16];
        while ( parse_field(output, sizeof output, &line, delim, user) )
        {
            printf("output = \"%s\"\n", output);
        }
    }
    
    int main(void)
    {
        parse_engine("The user is & \t and  &&&'s home is /home/&",
                     '&', "username");
        return 0;
    }
    
    /* my output
    output = "The"
    output = "user"
    output = "is"
    output = "username"
    output = "and"
    output = "username's"
    output = "home"
    output = "is"
    output = "/home/username"
    */
    That is, leave the size to the caller or the engine (as in this case)?
    Last edited by Dave_Sinkula; 09-18-2006 at 07:08 PM. Reason: Oops. Accidentally removed the colored 16.
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  5. #5
    Registered User
    Join Date
    Sep 2006
    Posts
    35
    Quote Originally Posted by Dave_Sinkula
    That is, leave the size to the caller or the engine (as in this case)?
    Could you provide that link, so that I can understand how it would actually work?

  6. #6
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Quote Originally Posted by Queue
    Could you provide that link, so that I can understand how it would actually work?
    Did you mean this (that I accidentally munged in an edit)?
    Code:
    char output[16];
    Or my parse_field implementation?
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

Popular pages Recent additions subscribe to a feed