Thread: Problem using regcomp() regexec()

  1. #1
    Registered User
    Join Date
    Dec 2010
    Posts
    8

    Problem using regcomp() regexec()

    Hi all

    I have the following problem. I am using Perl to read a text file, and I use pattern matching to search it for lines with a specific format. I need this because I am analyzing large code output.
    For instance, if I look for a line such as
    44.333e-02 44.221e-03 22.332e-04

    I can write in Perl


    Code:
    if($line=~/(\d{2}\.\d{3}e-\d{2}) +(\d{2}\.\d{3}e-\d{2}) +(\d{2}\.\d{3}e-\d{2})/){}

    The matched substrings are then stored in the variables $1, $2 and $3. This is a standard Perl idiom using pattern matching and capture groups (parentheses).

    The problem is that I need to do the same in C or C++. For instance, in C I can use



    Code:
    #include <regex.h>
    
    
    regex_t r;
     char *str = "777777777.77777e333 string one string two";
     regmatch_t matches[4];
     regcomp(&r, "([0-9]{7,9}\\.[0-9]{5}e[0-9]{2,3})", REG_EXTENDED);
     regexec(&r, str, 4, matches, 0);
     printf ("Found at %d",matches[1]);

    This will match 777777777.77777e333, but how can I print out what was matched? It could just as well have matched 999999999.88888e333.

    My second question: can I use something similar to the Perl example above to match more than one group, and then print out the matches?

    Thank you in advance

  2. #2
    Registered User
    Join Date
    May 2010
    Location
    Naypyidaw
    Posts
    1,314
    REG_NOSUB
    Support for substring addressing of matches is not required.
    The nmatch and pmatch arguments to regexec() are ignored if the
    pattern buffer supplied was compiled with this flag set.
    Read the doc?

    Code:
           int regexec(const regex_t *preg, const char *string, size_t nmatch,
                       regmatch_t pmatch[], int eflags);
    
           The  regmatch_t  structure  which  is  the type of pmatch is defined in
           <regex.h>.
    
               typedef struct {
                   regoff_t rm_so;
                   regoff_t rm_eo;
               } regmatch_t;
    Each rm_so element that is not -1 indicates the start offset of the
    next largest substring match within the string. The relative rm_eo
    element indicates the end offset of the match, which is the offset of
    the first character after the matching text.
    Edit: You know the offsets (start, end) of the match, so you can copy it to a temporary buffer using memcpy/strncpy, etc., or you could just use printf().
    Last edited by Bayint Naung; 12-18-2010 at 11:56 AM.

  3. #3
    Registered User
    Join Date
    Dec 2010
    Posts
    8

    Results :

    First of all I would like to thank you for your quick response. I modified the code in the following way:
    Code:
    regex_t r;
    char *str = "777777777.77777e333 333.333 string one string two";
    regmatch_t matches[3];
    regcomp(&r, "([0-9]{7,9}\\.[0-9]{5}e[0-9]{2,3}) ([0-9]{3}\\.[0-9]{3})", REG_EXTENDED);
    regexec(&r, str, 3, matches, 0);
    /* regoff_t is not guaranteed to be int, so cast for %d */
    printf("Found at %d %d \n", (int)matches[0].rm_so, (int)matches[0].rm_eo);
    printf("Found at %d %d \n", (int)matches[1].rm_so, (int)matches[1].rm_eo);
    printf("Found at %d %d \n", (int)matches[2].rm_so, (int)matches[2].rm_eo);
    Which gives the desired result, the offsets of the strings I am looking for:
    Code:
    Found at 0 27 
    Found at 0 19 
    Found at 20 27
    I guess that matches[0].rm_so and matches[0].rm_eo are the offsets of the whole pattern:
    Code:
    ([0-9]{7,9}\\.[0-9]{5}e[0-9]{2,3}) ([0-9]{3}\\.[0-9]{3})
    I just need to figure out how to print the string. If I use
    Code:
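    /* the length is end offset minus start offset, not the end offset itself */
    mat = strndup(str + matches[1].rm_so, matches[1].rm_eo - matches[1].rm_so);
    printf(mat);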
    I get a segmentation fault.
    Alternatively, I need some way to assign the matched results to variables. I would be grateful if you could help me with this. I would also be very grateful if you could recommend a book or online resource where I can read more about using regular expressions with regex.h.
    Last edited by wronski11; 12-18-2010 at 02:31 PM.

  4. #4
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    So do you know how printf works? Maybe you should read that manual too.

  5. #5
    Registered User
    Join Date
    Dec 2010
    Posts
    8

    re:

    As far as I know, mat is a pointer and strndup() returns a pointer to a newly allocated block of memory, so I do not know why it does not work. I presume that you have read the manual; maybe you can give an answer.

  6. #6
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Is that the code I was looking at? It looks a lot better than the code I (thought I) saw earlier. If that is the code I saw earlier, then I must have been hallucinating. Sorry.

    Using your variable as the format string is a bad idea, since if it has any characters that are special to printf, then you're in trouble.
    Code:
    printf("%s", mat);
    is always a much better idea.

    As to the segfault, strndup can return NULL if something bad happens. If printf is looking for another argument that you didn't give, a segfault can happen.

    EDIT: Also, you should run gdb with your code to see what the values of various things are when the segfault happens.
    Last edited by tabstop; 12-18-2010 at 03:17 PM.

  7. #7
    Registered User
    Join Date
    Dec 2010
    Posts
    8

    re:

    I managed to resolve the segmentation fault; it was coming from another part of the code I am using.

    In my opinion it is not necessary to use %s and so on, since mat is a pointer. Instead of
    Code:
    printf("%s", mat);
    just
    Code:
    printf(mat);
    will do perfectly fine. My problem is that I am trying to extract the substring, which is a string, and use the float value it contains. I tried atof() without much success. Since I know the offsets of the substrings, I would like to assign the floats I am scanning for to variables. This is my problem; maybe I did not express myself clearly enough.

  8. #8
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    The issue is not that mat is a pointer.
    Code:
    #include <stdio.h>
    int main(void) {
        char *mat = "%d %d %d %s %d %s\n";
        printf(mat);
        return 0;
    }
    If you're lucky, your machine will still work after running that code, but I'm not going to guarantee it. (Actually it shouldn't do any lasting harm since after all it's reading and not writing.)

    strtof is the string-to-float function. Make sure you pass mat to it (for whichever value you want). The other issue you'll have is that "777777777.77777e333" is a ludicrous value for a float -- i.e., you'll probably end up with Inf instead after the float overflow happens.

  9. #9
    Registered User
    Join Date
    Dec 2010
    Posts
    8
    OK, I agree. When I ran atof() I got inf because the value 777777777.77777e333 is idiotic; I changed it to 777777777.77777e01 and it works fine. I defined
    Code:
    value = atof(mat);
    printf("%f", value);
    And I got OK results; thank you for the help. I am going to implement the new solutions in my code.
    I would be grateful if someone can give me idea where I can read more about regex in C.

  10. #10
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Originally posted by wronski11:
    I would be grateful if someone can give me idea where I can read more about regex in C.
    Regular Expressions - The GNU C Library

    The GNU libc manual is something you should have bookmarked anyway (IMO).

    Last Post: 04-20-2007, 06:54 PM