Thread: Using 'regex.h'

  1. #1
    Registered User
    Join Date
    Nov 2009
    Posts
    11

    Using 'regex.h'

    I'm trying to do some input parsing using <regex.h>, but I can't get it to work quite the way I want.

    In this case I'm trying to match a line that starts with any amount of whitespace, then a '#'. Right now I'm trying to use:


    Code:
    int isComment(char *line)
    {
      regex_t* temp;
      temp = malloc(sizeof(regex_t));
      regcomp(temp, "[:blanc:]+#", 0);
      
      if (regexec(temp, line, 0, NULL , 0) == 0)
      {
        printf("%s", line);
        return 0;
      }
      else
        return 1;
     }
    But this isn't working, any ideas why? Am I using :space: correctly?

    Edit: I also want to do this to other lines, including some where I need to replace the original string with the new pattern, but can't figure out how to do this.
    Last edited by TheDenominater; 11-09-2009 at 11:40 PM. Reason: Forgot Stuff

  2. #2
    Registered User
    Join Date
    Sep 2007
    Posts
    1,012
    There are a few problems:
    1. It's spelled “blank”
    2. Your [:blank:] needs to be inside of brackets, so: [[:blank:]]
    3. + matches a literal + in basic regular expressions. Either use \+ (a GNU extension, and you need \\+ in a string literal to escape the \), or tell regcomp to use extended regular expressions with the REG_EXTENDED flag

    In short, use either of the following:
    Code:
    regcomp(temp, "[[:blank:]]\\+#", 0);
    regcomp(temp, "[[:blank:]]+#", REG_EXTENDED);
    I've also got a couple of suggestions for using regex(3). You want to regfree() your regex_t or else you'll leak memory. You also don't need to make it a pointer and allocate space (unless you need it to live beyond the lifetime of the function, which you don't for this example). It just makes extra work for you to do it that way. Instead:
    Code:
    regex_t temp;
    regcomp(&temp, "whatever", 0);
    ...
    regfree(&temp);
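
    To see point 3 in action, here is a minimal sketch. The helper name matches() is made up for illustration and is not part of regex.h; it compiles a pattern with the given flags, runs it against one line, and frees the regex_t again. The \{1,\} interval mentioned in the usage note is the standard basic-syntax way to say "one or more" if you'd rather not rely on \+.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from regex.h): returns 1 if `line` matches
 * `pattern` compiled with `cflags`, 0 if it does not, and -1 if the
 * pattern fails to compile. */
static int matches(const char *pattern, int cflags, const char *line)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && line != NULL);

    /* REG_NOSUB: we only need a yes/no answer, not match offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);               /* release what regcomp allocated */
    return rc == 0 ? 1 : 0;
}
```

    With that helper, matches("[[:blank:]]+#", REG_EXTENDED, "  # comment") should report a match, while the same pattern with flags 0 should only match input that contains a literal +, such as " +#". The basic-syntax spelling "[[:blank:]]\\{1,\\}#" with flags 0 should behave like the extended one.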

  3. #3
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Quote Originally Posted by TheDenominater View Post
    I'm trying to do some input parsing using <regex.h>, but I can't get it to work quite the way I want.

    In this case I'm trying to match a line that starts with any amount of whitespace, then a '#'.
    Does "any amount of whitespace" mean zero or more whitespace characters? If so the "+" should really be a "*", as in
    Code:
    const char *pattern = "^[[:blank:]]*#";
    The code segment below is executed inside of isComment() every time a line of input is read.
    Code:
      regex_t* temp;
      temp = malloc(sizeof(regex_t));
      regcomp(temp, "[:blanc:]+#", 0);
    Instead compile the regexp once and execute many times by moving the code into the function that calls isComment(), as in
    Code:
      regex_t *temp = (regex_t *) malloc(sizeof(regex_t));
      regcomp(temp, "^[[:blank:]]*#", 0);
      isComment(temp, line);
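
    Spelled out a bit more, the compile-once idea could look like the sketch below. The two-argument isComment() signature and the countComments() driver are assumptions made for illustration; the return convention (0 for a comment, 1 otherwise) follows the original post.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 0 if `line` is a comment line and 1 otherwise. The caller
 * owns `re` and is expected to have compiled it already. */
static int isComment(const regex_t *re, const char *line)
{
    assert(re != NULL && line != NULL);
    return regexec(re, line, 0, NULL, 0) == 0 ? 0 : 1;
}

/* Hypothetical driver: counts the comment lines among `n` strings.
 * Returns -1 if the pattern fails to compile. */
static int countComments(const char *const *lines, size_t n)
{
    regex_t re;
    int count = 0;
    size_t i;

    /* Compile once, outside the loop. */
    if (regcomp(&re, "^[[:blank:]]*#", REG_NOSUB) != 0)
        return -1;

    for (i = 0; i < n; i++) {
        if (isComment(&re, lines[i]) == 0)
            count++;
    }

    regfree(&re);               /* free once, after the last regexec */
    return count;
}
```

    The regex_t lives on the stack here, so there is no malloc() to pair with a free(); only regfree() is needed.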

  4. #4
    Registered User
    Join Date
    Nov 2009
    Posts
    11
    Thanks for your help.

    So what I've got now is:

    Code:
    int isComment(char* line)
    {
      regex_t temp;
      regcomp(&temp, "[:blanc:]*#",REG_EXTENDED);
      if ( regexec(&temp, line, 0, NULL , 0) == 0)
      {
        regfree(&temp);
        printf("%s", line);
        return 0;
      }
      else
        {
        regfree(&temp);
        return 1;
        }
     }
    (I use the function in so many different places I'm not sure I should take the declaration out of it, it would get really messy..)

    However, the second regfree() causes a segfault. I'm pretty sure it needs to be there. Is that correct? And I also can't get the '^' to work.

  5. #5
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Did you not read the post by cas? He pointed out the typo, i.e. "blank", NOT "blanc".
    Don't regfree() temp as it's a local variable and will be freed automatically by the stack popping.

  6. #6
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by itCbitC View Post
    Don't regfree() temp as it's a local variable and will be freed automatically by the stack popping.
    I would not count on that as it is used by regcomp, and the regex.h docs I have seen recommend freeing it in all cases.

    However, I don't use regex.h that much, maybe someone knows the sure answer to this one -- I guess a valgrind test or something will tell you.

  7. #7
    Registered User
    Join Date
    Sep 2007
    Posts
    1,012
    You still have multiple errors that I've pointed out to you.

    At the very least, check the return value of regcomp(). You're proceeding whether or not it returns error, and since your regular expression is invalid, it will be returning an error.
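
    One way to do that check is with regerror(), which turns the error code into a readable message. This is only a sketch; the wrapper name compileOrExplain() and its buffer parameters are made up for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical wrapper: compiles `pattern` into `re`. On failure it
 * stores the library's message in `errbuf`, prints it to stderr, and
 * returns the nonzero regcomp error code. Returns 0 on success. */
static int compileOrExplain(regex_t *re, const char *pattern, int cflags,
                            char *errbuf, size_t errbuf_size)
{
    int rc;

    assert(re != NULL && pattern != NULL && errbuf != NULL);

    rc = regcomp(re, pattern, cflags);
    if (rc != 0) {
        /* regerror writes a NUL-terminated, possibly truncated message. */
        regerror(rc, re, errbuf, errbuf_size);
        fprintf(stderr, "regcomp(\"%s\") failed: %s\n", pattern, errbuf);
    } else if (errbuf_size > 0) {
        errbuf[0] = '\0';       /* no error: leave an empty message */
    }
    return rc;
}
```

    On success the caller still has to regfree() the regex_t when done with it. A misspelled class inside double brackets, such as "[[:blanc:]]", is the kind of pattern this should reject.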

  8. #8
    Registered User
    Join Date
    Nov 2009
    Posts
    11
    Oh, not sure how I missed that. I still can't get the anchor '^' to work. I'm using:

    Code:
    regcomp(&temp, "^[:blank:]*#",REG_EXTENDED);
    When I include '^' it misses every line for some reason.

  9. #9
    Registered User
    Join Date
    Sep 2007
    Posts
    1,012
    Quote Originally Posted by MK27 View Post
    I would not count on that as it is used by regcomp, and the regex.h docs I have seen recommend freeing it in all cases.

    However, I don't use regex.h that much, maybe someone knows the sure answer to this one -- I guess a valgrind test or something will tell you.
    You're correct. regcomp() will allocate memory, and that memory needs to be freed regardless of where the regex_t was allocated. It's just like storing the return value of malloc() into a local pointer. The pointer itself is destroyed automatically, but what it points to needs to be manually freed.

  10. #10
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Quote Originally Posted by TheDenominater View Post
    Oh, not sure how I missed that. I still can't get the anchor '^' to work. I'm using:

    Code:
    regcomp(&temp, "^[:blank:]*#",REG_EXTENDED);
    When I include '^' it misses every line for some reason.
    It's not the anchor but as cas pointed out it's your regexp syntax that is totally wrong.
    Read "[[:blank:]]" NOT "[:blank:]". Do you see the difference??
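
    To make the difference concrete: with single brackets, "[:blank:]" is an ordinary bracket expression, i.e. a set containing the characters ':', 'b', 'l', 'a', 'n' and 'k', so spaces and tabs are not in it. A small sketch, where matchesERE() is a made-up helper name:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `line` matches the extended regular
 * expression `pattern`, 0 if not, and -1 if the pattern does not compile. */
static int matchesERE(const char *pattern, const char *line)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && line != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

    With this, "^[[:blank:]]*#" should match "   # c", while "^[:blank:]*#" should not, because a space is not one of the six characters in the set. The single-bracket version would instead accept an odd line such as "blank:# c".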

  11. #11
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by itCbitC View Post
    Instead compile the regexp once and execute many times by moving the code into the function that calls isComment(), as in
    Code:
      regex_t *temp = (regex_t *) malloc(sizeof(regex_t));
      regcomp(temp, "^[[:blank:]]*#", 0);
      isComment(temp, line);
    That's good advice; it seems like an unnecessary amount of work to do every time the function is called, especially if it's called from within a loop.

  12. #12
    Registered User
    Join Date
    Nov 2009
    Posts
    11
    Ok, I re-read all the posts to make sure I didn't miss anything.

    Code:
    int isComment(char* line)
    {
      regex_t temp;
      
      if(regcomp(&temp, "^[[:blank:]]*#",REG_EXTENDED) != 0)
      {
        printf("Incorrect regex\n");
        exit(EXIT_FAILURE);
      }
    
      if ( regexec(&temp, line, 0, NULL , 0) == 0)
      {
        regfree(&temp);
        printf("%s", line);
        return 0;
      }
      else
      {
        regfree(&temp);
        return 1;
      }
     }
    The only problem is that regfree(&temp) in the else causes big problems. Am I not doing that correctly?

  13. #13
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Quote Originally Posted by MK27 View Post
    I would not count on that as it is used by regcomp, and the regex.h docs I have seen recommend freeing it in all cases.

    However, I don't use regex.h that much, maybe someone knows the sure answer to this one -- I guess a valgrind test or something will tell you.
    Whoops! I was thinking stack allocation but you're correct regfree() complements regcomp().

  14. #14
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Perhaps then have only one regfree() placed right after the call to regexec() as in
    Code:
    int isComment(char* line)
    {
        int rc;
        regex_t temp;
      
        if(regcomp(&temp, "^[[:blank:]]*#",REG_EXTENDED) != 0)
        {
            printf("Incorrect regex\n");
            exit(EXIT_FAILURE);
        }
    
        rc = regexec(&temp, line, 0, NULL , 0);
        regfree(&temp);
      
        if (rc == 0) {
            printf("%s", line);
            return 0;
        }
        return 1;   /* no match: not a comment line */
    }
