Thread: Problem with regex expression

  1. #1
    Registered User
    Join Date
    Jan 2009
    Posts
    3

    Hi all,

    I have a problem with a regular expression in C++. It should be very simple, but I can't find any helpful information on the web about using the regex.h library, and I don't understand why my pattern is wrong. Maybe someone can explain where I'm going wrong.

    I have to use the regex.h library because I'm running this on an embedded system where I don't want to pull in another library such as Boost.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <regex.h>
    #include <string>
    
    using std::string;
    
    int main() {
        string str = "ab:";
        // this doesn't work
        string pattern = "[0-9|a-f][0-9|a-f]:";
        // this doesn't work either
        //string pattern = "[0-9|a-f]{2}:";
        // this would work - but it's not the correct pattern
        //string pattern = "[0-9|a-f]*[0-9|a-f]*:";
        regex_t pattern_t;
    
        // I have also used this
        //     int ret = regcomp(&pattern_t, pattern.c_str(), 0);
        int ret = regcomp(&pattern_t, pattern.c_str(), REG_EXTENDED|REG_NOSUB);
        printf ("ret: %d\n", ret);
        ret = regexec(&pattern_t, pattern.c_str(), (size_t) 0, NULL, 0);
        printf ("ret: %d\n", ret);
    
        return 0;
    
        // expected output
        // ret: 0
        // ret: 0
    }
    I would be grateful for any help or hint.

    pippo

  2. #2
    Frequently Quite Prolix dwks
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Code:
    [0-9|a-f][0-9|a-f]:
    I'm guessing you perhaps meant
    Code:
    [0-9a-f][0-9a-f]:
    A bracket expression matches exactly one character, chosen from the characters listed inside the square brackets. You only use '|' when you're specifying alternative strings, for example,
    Code:
    cat|dog
    begin|Begin|BEGIN
    Inside square brackets '|' has no special meaning, so your class simply also matches a literal '|' character, which is clearly not what you wanted.

    Finally, in case a colon is a special character (but I don't think it is), you could try escaping it with a backslash in your expression.

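    BTW, the syntax for regex.h is similar to grep's syntax or Perl's syntax (although Perl adds quite a bit that regex.h doesn't support), so if you're stuck on syntax, you can read tutorials for those. With REG_EXTENDED you get POSIX extended regular expressions (the flavour used by grep -E); without it you get basic regular expressions, where an interval such as {2} has to be written \{2\}.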
    Last edited by dwks; 01-16-2009 at 02:01 PM.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  3. #3
    Registered User
    Join Date
    Jan 2009
    Posts
    3
    OK, I definitely had a very simple programming error. This is the corrected version of the code. Maybe it's helpful for someone else who gets stuck with regular expressions in regex.h.
    Code:
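    #include <stdio.h>
    #include <regex.h>
    #include <string>
    
    using std::string;
    
    int main() {
        string str = "Ae:3B:22:77:AC:22";
        // six pairs of hex digits separated by colons
        string pattern = "[[:xdigit:]]{2}:[[:xdigit:]]{2}:[[:xdigit:]]{2}:[[:xdigit:]]{2}:[[:xdigit:]]{2}:[[:xdigit:]]{2}";
        regex_t pattern_t;
        regmatch_t pmatch_t;
    
        // REG_EXTENDED is needed for the {2} interval syntax
        int ret = regcomp(&pattern_t, pattern.c_str(), REG_ICASE|REG_EXTENDED);
        printf ("ret: %d\n", ret);
        if (ret != 0)
        {
            // compilation failed: pattern_t is not valid, so don't use or free it
            return 1;
        }
    
        // match the input string (not the pattern); pmatch_t records where the match starts
        ret = regexec(&pattern_t, str.c_str(), 1, &pmatch_t, 0);
        printf("ret: %d\n", ret);
    
        // the pattern is not anchored, so also require the match to start at
        // offset 0 and the string to be exactly 17 characters (12 digits + 5 colons)
        if (ret == 0 && pmatch_t.rm_so == 0 && str.length() == 17)
        {
            printf("Correct MAC address!\n");
        }
        else
        {
            printf("Wrong MAC address!\n");
        }
    
        // release the memory allocated by regcomp()
        regfree(&pattern_t);
    
        return 0;
    }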
    I wanted to check whether a given string represents a MAC address, that's all. Thanks anyway for your help.

    As far as I know and have tested, regex.h doesn't support nested regular expressions. At first I wanted to write
    Code:
    string pattern = "[[[:xdigit:]]{2}:]{5}[[:xdigit:]]{2}";
    but that doesn't give the correct results, so I had to use the much longer version
    Code:
    string pattern = "[[:xdigit:]]{2}:[[:xdigit:]]{2}:[[:xdigit:]]{2}:[[:xdigit:]]{2}:[[:xdigit:]]{2}:[[:xdigit:]]{2}";
    which is not very elegant, but at least it works.

    pippo

  4. #4
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Code:
    [[[:xdigit:]]{2}:]{5}[[:xdigit:]]{2}
    shouldn't that be (using parentheses instead of brackets):

    Code:
    ([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  5. #5
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    By the way, the _t suffix on many names means "type", so it's not a good suffix to give your variable names.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  6. #6
    Master Apprentice phantomotap
    Join Date
    Jan 2008
    Posts
    5,108
    You say that as if it were a good idea to use a '_t' suffix on type names.

    Soma

  7. #7
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    It's what the C standard and POSIX do. So it might be a dangerous thing to do, since you run a risk of conflicting with future extensions of those standards, but I'm not sure about the rules there.

  8. #8
    Registered User
    Join Date
    Jan 2009
    Posts
    3
    The version with parentheses works like a charm. Perfect!
    Code:
    ([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}
    As for coding conventions, it's clearly better not to use the '_t' suffix on variable names. I'll keep that in mind in the future.

    Thanks
    pippo
