Thread: Posix Regex in C

  1. #1
    Registered User
    Join Date
    Dec 2008
    Posts
    12

    Posix Regex in C

    Hi, Does anyone know a flag so that regexec can search a string and match
    a regex exactly once?
    I have a string for example 123456 and a regular expression of
    [0-9]+ but this also matches 123456hello since a decimal is present!
    Thanks

  2. #2
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Quote Originally Posted by march5th View Post
    Hi, Does anyone know a flag so that regexec can search a string and match
    a regex exactly once?
    I have a string for example 123456 and a regular expression of
    [0-9]+ but this also matches 123456hello since a decimal is present!
    Thanks
    I don't see no stinking decimal point.

    If you want your regex to match 123456 and not 123456hello, then put word boundaries around [0-9]+, as in \<[0-9]+\> (or maybe it's \b[0-9]+\b, I don't remember).
    Last edited by Dino; 12-26-2008 at 09:15 PM. Reason: typo
    Mainframe assembler programmer by trade. C coder when I can.

  3. #3
    Registered User
    Join Date
    Dec 2008
    Posts
    12
    Sorry, I actually meant an integer, but I am not sure how to use this.
    If I have [+-]?[0-9]+, where do I put the \b markers? I checked and it is \b, thanks for that.

  4. #4
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Put the \b on either side of your complete expression, like

    \b[-+]?[0-9]+\b

    (Note I changed the order of the [-+]. The range character, -, has to go first or last inside the brackets to be taken literally.)

  5. #5
    Registered User
    Join Date
    Dec 2008
    Posts
    12
    I now have the problem that it doesn't match any of the integer values.
    Just to check, is this right?
    Regex expression:
    "\b[+-]?[0-9]+\b";

    Strings:
    100
    0
    49
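
    One thing worth checking, assuming the pattern sits in a C string literal exactly as posted: the compiler turns "\b" into a single backspace character (0x08) before regcomp ever sees it, so the backslash would have to be doubled ("\\b"). Even then, \b is not part of POSIX regular expressions; some libraries accept it as an extension and others do not. The names below (contains_backspace, pattern_as_posted, pattern_escaped) are made up for illustration.

```c
#include <string.h>

/* The pattern as written in the post: each \b becomes one backspace byte. */
const char pattern_as_posted[] = "\b[+-]?[0-9]+\b";

/* Doubled backslashes: the regex library receives a backslash followed by 'b'. */
const char pattern_escaped[] = "\\b[+-]?[0-9]+\\b";

/* Returns 1 if the pattern text contains a raw backspace byte (0x08),
   which is what the escape "\b" produces inside a C string literal. */
int contains_backspace(const char *pattern)
{
    return strchr(pattern, '\b') != NULL;
}
```

    Calling contains_backspace(pattern_as_posted) reports the stray backspace bytes, while pattern_escaped contains none and is two characters longer.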

  6. #6
    Registered User
    Join Date
    Dec 2008
    Posts
    12
    Could this have something to do with POSIX regexes being greedy?

  7. #7
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Post your code.

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by march5th View Post
    Hi, Does anyone know a flag so that regexec can search a string and match
    a regex exactly once?
    I have a string for example 123456 and a regular expression of
    [0-9]+ but this also matches 123456hello since a decimal is present!
    Thanks
    Why not specify "the whole string and nothing but the string?"

    ^[0-9]+$
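
    A minimal sketch of the anchored approach, extended with the optional sign from the earlier posts. The function name is_integer is hypothetical; regcomp, regexec and regfree are the standard <regex.h> calls.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s consists solely of an optional sign followed by one or
   more digits, 0 if it does not, and -1 if the pattern fails to compile. */
int is_integer(const char *s)
{
    regex_t re;
    int rc;

    /* ^ and $ tie the match to the start and end of the string.
       REG_NOSUB is enough because only a yes/no answer is needed. */
    if (regcomp(&re, "^[+-]?[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

    With this, is_integer("123456") and is_integer("-42") succeed, while is_integer("123456hello") does not. Compiling the pattern on every call keeps the sketch short; real code would probably compile it once and reuse it.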

  9. #9
    Registered User
    Join Date
    Dec 2008
    Posts
    12
    Sorry guys, my code is too large and scattered to post, but I am sure it has to do with the regex;
    the ^[0-9]+$ didn't solve it either. I really think it's greedy and tries to match as much
    as it can, and that's the difference between regex in C and the ones used in Java.
    The way I worked around it is not very nice, but it works:
    I got the string length of the
    captured group and compared it with the length of the actual string. If they matched, it's fine; if not,
    it didn't match.
    You don't really know which part of the string matched unless you capture and compare.
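
    The workaround described above might look roughly like this. It is only a sketch: the helper name matches_whole_string is made up, and it relies on regexec reporting the offsets of the overall match in element 0 of the regmatch_t array. Anchoring the pattern with ^ and $ should give the same answer without the comparison.

```c
#include <regex.h>
#include <string.h>

/* Returns 1 if the match found by regexec spans all of s, 0 if there is no
   match or only part of s matched, and -1 if the pattern does not compile. */
int matches_whole_string(const char *pattern, const char *s)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    /* No REG_NOSUB here: the offsets in m[0] are needed. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, s, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;

    /* rm_so and rm_eo are the start and one-past-the-end byte offsets
       of the matched substring within s. */
    return m[0].rm_so == 0 && (size_t)m[0].rm_eo == strlen(s);
}
```

    For "[0-9]+" this accepts 123456 but rejects 123456hello, because the reported match covers only the first six characters of the longer string.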

  10. #10
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    It's been a while, but I'm pretty sure Java regex are greedy too. I can't see how ^[0-9]+$ could possibly match 123456hello, though.

    But the idea behind "post your code" is that you're supposed to make something small and complete that shows the problem. (Generally you don't end up posting it, because in the process you discover that function A actually performs steps x, y, and z, while you thought it performed steps x, z, and q.)

  11. #11
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    A simple example to show:
    Code:
    #include <stdio.h>
    #include <regex.h>
    
    int main(void) {
        const char *string1 = "123456";
        const char *string2 = "123456hello";
        const char *pattern1 = "[0-9]+";
        const char *pattern2 = "^[0-9]+$";
        regex_t p1, p2;
    
        if (regcomp(&p1, pattern1, REG_EXTENDED | REG_NOSUB)) {
            printf("Pattern 1 did not compile.\n");
            return(1);
        }
        if (regcomp(&p2, pattern2, REG_EXTENDED | REG_NOSUB)) {
            printf("Pattern 2 did not compile.\n");
            return(1);
        }
    
        if (regexec(&p1, string1, 0, NULL, 0)) {
            printf("String 1 did not match pattern 1.\n");
        } else {
            printf("String 1 did match pattern 1.\n");
        }
        if (regexec(&p1, string2, 0, NULL, 0)) {
            printf("String 2 did not match pattern 1.\n");
        } else {
            printf("String 2 did match pattern 1.\n");
        }
        if (regexec(&p2, string1, 0, NULL, 0)) {
            printf("String 1 did not match pattern 2.\n");
        } else {
            printf("String 1 did match pattern 2.\n");
        }
        if (regexec(&p2, string2, 0, NULL, 0)) {
            printf("String 2 did not match pattern 2.\n");
        } else {
            printf("String 2 did match pattern 2.\n");
        }
    
        regfree(&p1);
        regfree(&p2);
        return 0;
    }
    Output:
    Code:
    $ ./numbers
    String 1 did match pattern 1.
    String 2 did match pattern 1.
    String 1 did match pattern 2.
    String 2 did not match pattern 2.
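
    The "did not compile" branches above don't say why compilation failed. A possible refinement, sketched here with a made-up helper name (compile_with_message), is to ask regerror for the library's own message. The flags are the same ones used in the example.

```c
#include <regex.h>
#include <stddef.h>

/* Compiles pattern as an extended regex. On failure, writes the library's
   error message into errbuf and returns the nonzero regcomp error code.
   On success, returns 0 and leaves errbuf as an empty string. */
int compile_with_message(regex_t *re, const char *pattern,
                         char *errbuf, size_t errbuf_size)
{
    int rc = regcomp(re, pattern, REG_EXTENDED | REG_NOSUB);

    if (rc != 0) {
        /* regerror turns the code into text, truncating it to fit. */
        regerror(rc, re, errbuf, errbuf_size);
    } else if (errbuf_size > 0) {
        errbuf[0] = '\0';
    }
    return rc;
}
```

    The caller should call regfree only when the function returns 0; the exact wording of the message depends on the C library.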

  12. #12
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Here's a mod to tabstop's pattern that is more general purpose for pattern #2:
    Code:
    const char *pattern2 = "(^| )[0-9]+( |$)";
    (Edit - for whatever reason, the \b word delimiters are not working at all. I even tried \\b, \< \\<, \y, \\y, and none work.)
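
    As far as I know, \b and \< are extensions that POSIX does not require, which may be why none of them work here. A portable alternative, sketched below, generalizes the space-delimited pattern by treating any non-word character as a boundary and capturing the digits in their own group. The helper name find_whole_number and the choice of "word character" (alphanumeric or underscore) are assumptions for illustration.

```c
#include <regex.h>
#include <string.h>

/* Copies the first run of digits in s that is not adjacent to a letter,
   digit, or underscore into out. Returns 1 on success, 0 if nothing
   matched or out is too small, and -1 if the pattern fails to compile. */
int find_whole_number(const char *s, char *out, size_t out_size)
{
    /* Group 1: start of string or a non-word character.
       Group 2: the digits themselves.
       Group 3: a non-word character or end of string. */
    static const char pattern[] =
        "(^|[^[:alnum:]_])([0-9]+)([^[:alnum:]_]|$)";
    regex_t re;
    regmatch_t m[4];
    size_t len;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, s, 4, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;

    /* m[2] holds the offsets of the digits without the delimiters. */
    len = (size_t)(m[2].rm_eo - m[2].rm_so);
    if (len + 1 > out_size)
        return 0;
    memcpy(out, s + m[2].rm_so, len);
    out[len] = '\0';
    return 1;
}
```

    Because the delimiters are consumed as part of the match, the digits have to be read from group 2 rather than from the overall match.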
    Last edited by Dino; 12-28-2008 at 09:14 AM.

  13. #13
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    If you want to use word delimiters, you should use the word delimiters:
    Code:
    const char *pattern2 = "[[:<:]][0-9]+[[:>:]]";
    Don't blame me, that's what my man regex on OS X says. (It also says that's an extension which may differ -- but I'm guessing that's what system Dino is on.)

  14. #14
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Ah. My regex book (Mastering Regular Expressions) doesn't point that out (that I saw). It does say those are the word boundaries for MySQL. I was just reading the man pages and I saw a reference to [[:<:]], but I didn't put 2 & 2 together that they also applied to C. Duh.

    Thanks. Yes, I use OS X (XP too, but mostly OS X)

    (Edit - here's the code with tabstop's tip)
    Code:
    #include <stdio.h>
    #include <regex.h>
    
    int main(void) {
        const char *string1 = "123456";
        const char *string2 = "123456hello";
        const char *pattern1 = "[0-9]+";
        // const char *pattern2 = "(^| )[0-9]+( |$)";
        const char *pattern2 = "[[:<:]][0-9]+[[:>:]]";
        regex_t p1, p2;
        // regmatch_t mymatch;
        int rc;
    
        rc = regcomp(&p1, pattern1, REG_EXTENDED | REG_NOSUB);
        if (rc) {
            printf("Pattern 1 did not compile.\n");
            return(1);
        }
        rc = regcomp(&p2, pattern2, REG_EXTENDED | REG_NOSUB);
        if (rc) {
            printf("Pattern 2 did not compile.\n");
            return(1);
        }
        rc = regexec(&p1, string1, 0, NULL, 0);
        if (rc) {
            printf("String 1 did not match pattern 1, rc=%d.\n", rc);
        } else {
            printf("String 1 did match pattern 1, rc=%d.\n", rc);
        }
    
        rc = regexec(&p1, string2, 0, NULL, 0);
        if (rc) {
            printf("String 2 did not match pattern 1, rc=%d.\n", rc);
        } else {
            printf("String 2 did match pattern 1, rc=%d.\n", rc);
        }
    
        rc = regexec(&p2, string1, 0, NULL, 0);
        if (rc) {
            printf("String 1 did not match pattern 2, rc=%d.\n", rc);
        } else {
            printf("String 1 did match pattern 2, rc=%d.\n", rc);
        }
    
        rc = regexec(&p2, string2, 0, NULL, 0);
        if (rc) {
            printf("String 2 did not match pattern 2, rc=%d.\n", rc);
        } else {
            printf("String 2 did match pattern 2, rc=%d.\n", rc);
        }
    
        regfree(&p1);
        regfree(&p2);
        return 0;
    }
