regex.h - extracting matches


  1. #1
    Registered User
    Join Date
    Jul 2009
    Posts
    1

    regex.h - extracting matches

    Can someone please teach me how to extract a regex match and print it out? For example, say my string is "abcdef" and I've set up a regex for "(abc)" (as below). It will match, but how do I extract and print the matched text?

    char string1[] = "abcdef";
    int rc;
    regex_t * myregex = calloc(1, sizeof(regex_t));
    rc = regcomp(myregex, "(abc)", REG_EXTENDED);

    Thanks.

  2. #2
    Making mistakes
    Join Date
    Dec 2008
    Posts
    476
    Use PCRE (Perl-compatible regular expressions). It's a very nice library once you've understood the fundamentals.

  3. #3
    and the Hat of Guessing tabstop
    Join Date
    Nov 2007
    Posts
    14,185
    Quote Originally Posted by jmelai
    Can someone please teach me how to extract a regex match and print it out?
    So, if you've read the manual far enough to get to regcomp, then surely you noticed that the next function is regexec?

  4. #4
    spurious conceit MK27
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Strangely, I haven't used regexps much in C, but I have kept this on file in case I have to do it again. If you want a working example:
    Code:
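    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <regex.h>
    
    /* Returns a malloc'd copy of the first match of patrn in string, or NULL
       if the pattern does not compile, nothing matches, or malloc fails.
       On a match, *begin and *end receive the match offsets. */
    char *regexp (const char *string, const char *patrn, int *begin, int *end) {
            size_t len;
            char *word = NULL;
            regex_t rgT;
            regmatch_t match;
            if (regcomp(&rgT,patrn,REG_EXTENDED) != 0) return NULL;
            if ((regexec(&rgT,string,1,&match,0)) == 0) {
                    *begin = (int)match.rm_so;
                    *end = (int)match.rm_eo;
                    len = (size_t)(*end-*begin);
                    word=malloc(len+1);
                    if (word != NULL) {
                            memcpy(word, string+*begin, len);
                            word[len]=0;
                    }
            }
            regfree(&rgT);
            return word;
    }
    
    
    int main(void) {
    	int b=-1,e=-1;
    	char *match=regexp("this and7 that","[a-z]+[0-9]",&b,&e);
    	if (match == NULL) {    /* never pass NULL to %s */
    		puts("no match");
    		return 1;
    	}
    	printf("->%s<-\n(b=%d e=%d)\n",match,b,e);
    	free(match);
    	return 0;
    }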
    Beware that regex.h uses POSIX-style notation (no \d or \w, etc.). Supposedly, there is stuff like [:digit:], but if I substitute this
    Code:
    char *match=regexp("this and7 that","[:alpha:]+[:digit:]",&b,&e);
    on my system, instead of the predictable:

    ->and7<-
    (b=5 e=9)


    I get the bizarre and inexplicable:

    ->hi<-
    (b=1 e=3)


    which does not look like a number of letters and a digit to me!
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  5. #5
    cas
    Registered User
    Join Date
    Sep 2007
    Posts
    975
    Supposedly, there is stuff like [:digit:]
    You want [[:digit:]].

  6. #6
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by cas
    You want [[:digit:]].
    Wow. I honestly don't
    I mean that is only one character less than [0123456789]! Why not make it:
    Code:
    [[[[any single numeral, 1, 2, 3, 4, 5, 6, 7, 8, 9 and also including ZERO]]]]
    But thanks.
    Last edited by MK27; 07-11-2009 at 12:59 PM.
