Posix Regex in C

This is a discussion on Posix Regex in C within the C Programming forums, part of the General Programming Boards category.

  1. #1
    Registered User
    Join Date
    Dec 2008
    Posts
    12

    Posix Regex in C

    Hi, does anyone know of a flag that makes regexec search a string and match
    a regex exactly once?
    For example, I have the string 123456 and the regular expression
    [0-9]+, but this also matches 123456hello since a decimal is present!
    Thanks

  2. #2
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Katy, Texas
    Posts
    2,309
    Originally posted by march5th:
        Hi, does anyone know of a flag that makes regexec search a string and match
        a regex exactly once?
        For example, I have the string 123456 and the regular expression
        [0-9]+, but this also matches 123456hello since a decimal is present!
        Thanks

    I don't see no stinking decimal point.

    If you want your regex to match 123456 and not 123456hello, then put word boundaries around [0-9]+, as in \<[0-9]+\> (or maybe it's \b[0-9]+\b, I don't remember).
    Last edited by Dino; 12-26-2008 at 08:15 PM. Reason: typo
    Mac and Windows cross platform programmer. Ruby lover.

    Quote of the Day
    12/20: Mario F.:I never was, am not, and never will be, one to shut up in the face of something I think is fundamentally wrong.

    Amen brother!

  3. #3
    Registered User
    Join Date
    Dec 2008
    Posts
    12
    Sorry, I actually meant an integer, but I am not sure how you would use this.
    If I have [+-]?[0-9]+, where do I put the \b \b? I checked and it is \b, thanks for that.

  4. #4
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Katy, Texas
    Posts
    2,309
    Put the \b on either side of your complete expression, like

    \b[-+]?[0-9]+\b

    (Note I changed the order of the [-+]. The range character, -, has to go first or last in a bracket expression to be taken literally.)

  5. #5
    Registered User
    Join Date
    Dec 2008
    Posts
    12
    I now have the problem that it doesn't match any of the integer values.
    Just to check, is this right?
    Regex:
    "\b[+-]?[0-9]+\b";

    Strings:
    100
    0
    49

  6. #6
    Registered User
    Join Date
    Dec 2008
    Posts
    12
    Could this have anything to do with POSIX regexes being greedy?

  7. #7
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Katy, Texas
    Posts
    2,309
    Post your code.

  8. #8
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,243
    Why not specify "the whole string and nothing but the string?"

    ^[0-9]+$

  9. #9
    Registered User
    Join Date
    Dec 2008
    Posts
    12
    Sorry guys, my code is too large and scattered to post. I am sure it has to do with the regex;
    the ^[0-9]+$ didn't solve it either. I really think it's greedy and tries to match as much
    as it can, and that's the difference between regex in C and the ones used in Java.
    The way I worked around it is not very nice, but it works:
    I got the string length of the captured group and compared it with the length of the actual
    string. If they matched, then it's fine; if not, then it didn't match.
    You don't really know which part of the group matched unless you capture and compare.

  10. #10
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    It's been a while, but I'm pretty sure Java regex are greedy too. I can't see how ^[0-9]+$ could possibly match 123456hello, though.

    But the idea behind "post your code" is that you're supposed to make something small and complete that shows the problem. (Generally you don't end up posting it, because in the process you discover that function A actually performs steps x, y, and z, while you thought it performed steps x, z, and q.)

  11. #11
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    A simple example to show:
    Code:
    #include <stdio.h>
    #include <regex.h>
    
    int main(void) {
        const char *string1 = "123456";
        const char *string2 = "123456hello";
        const char *pattern1 = "[0-9]+";
        const char *pattern2 = "^[0-9]+$";
        regex_t p1, p2;
    
        if (regcomp(&p1, pattern1, REG_EXTENDED | REG_NOSUB)) {
            printf("Pattern 1 did not compile.\n");
            return(1);
        }
        if (regcomp(&p2, pattern2, REG_EXTENDED | REG_NOSUB)) {
            printf("Pattern 2 did not compile.\n");
            return(1);
        }
    
        if (regexec(&p1, string1, 0, NULL, 0)) {
            printf("String 1 did not match pattern 1.\n");
        } else {
            printf("String 1 did match pattern 1.\n");
        }
        if (regexec(&p1, string2, 0, NULL, 0)) {
            printf("String 2 did not match pattern 1.\n");
        } else {
            printf("String 2 did match pattern 1.\n");
        }
        if (regexec(&p2, string1, 0, NULL, 0)) {
            printf("String 1 did not match pattern 2.\n");
        } else {
            printf("String 1 did match pattern 2.\n");
        }
        if (regexec(&p2, string2, 0, NULL, 0)) {
            printf("String 2 did not match pattern 2.\n");
        } else {
            printf("String 2 did match pattern 2.\n");
        }
    
        regfree(&p1);
        regfree(&p2);
        return 0;
    }
    Output:
    Code:
    $ ./numbers
    String 1 did match pattern 1.
    String 2 did match pattern 1.
    String 1 did match pattern 2.
    String 2 did not match pattern 2.

  12. #12
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Katy, Texas
    Posts
    2,309
    Here's a mod to tabstop's pattern that is more general purpose for pattern #2:
    Code:
    const char *pattern2 = "(^| )[0-9]+( |$)";
    (Edit - for whatever reason, the \b word delimiters are not working at all. I even tried \\b, \<, \\<, \y, \\y, and none work.)
    Last edited by Dino; 12-28-2008 at 08:14 AM.

  13. #13
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    If you want to use word delimiters, you should use the word delimiters:
    Code:
    const char *pattern2 = "[[:<:]][0-9]+[[:>:]]";
    Don't blame me, that's what my man regex on OS X says. (It also says that's an extension which may differ -- but I'm guessing that's what system Dino is on.)

  14. #14
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Katy, Texas
    Posts
    2,309
    Ah. My regex book (Mastering Regular Expressions) doesn't point that out (that I saw). It does say those are the word boundaries for MySQL. I was just reading the man pages and I saw a reference to [[:<:]], but I didn't put 2 & 2 together that they also applied to C. Duh.

    Thanks. Yes, I use OS X (XP too, but mostly OS X)

    (Edit - here's the code with tabstop's tip)
    Code:
    #include <stdio.h>
    #include <regex.h>

    int main(void) {
        const char *string1 = "123456";
        const char *string2 = "123456hello";
        const char *pattern1 = "[0-9]+";
    //  const char *pattern2 = "(^| )[0-9]+( |$)";
        /* [[:<:]] and [[:>:]] are the word delimiters from the OS X man page */
        const char *pattern2 = "[[:<:]][0-9]+[[:>:]]";
        regex_t p1, p2;
    //  regmatch_t mymatch;
        int rc;

        rc = regcomp(&p1, pattern1, REG_EXTENDED | REG_NOSUB);
        if (rc) {
            printf("Pattern 1 did not compile.\n");
            return(1);
        }
        rc = regcomp(&p2, pattern2, REG_EXTENDED | REG_NOSUB);
        if (rc) {
            printf("Pattern 2 did not compile.\n");
            return(1);
        }

        rc = regexec(&p1, string1, 0, NULL, 0);
        if (rc) {
            printf("String 1 did not match pattern 1, rc=%d.\n", rc);
        } else {
            printf("String 1 did match pattern 1, rc=%d.\n", rc);
        }

        rc = regexec(&p1, string2, 0, NULL, 0);
        if (rc) {
            printf("String 2 did not match pattern 1, rc=%d.\n", rc);
        } else {
            printf("String 2 did match pattern 1, rc=%d.\n", rc);
        }

        rc = regexec(&p2, string1, 0, NULL, 0);
        if (rc) {
            printf("String 1 did not match pattern 2, rc=%d.\n", rc);
        } else {
            printf("String 1 did match pattern 2, rc=%d.\n", rc);
        }

        rc = regexec(&p2, string2, 0, NULL, 0);
        if (rc) {
            printf("String 2 did not match pattern 2, rc=%d.\n", rc);
        } else {
            printf("String 2 did match pattern 2, rc=%d.\n", rc);
        }

        regfree(&p1);
        regfree(&p2);
        return 0;
    }
