Thread: Regular expression tutorial help

  1. #1
    Registered User
    Join Date
    Jan 2008
    Location
    Oxford, England
    Posts
    27

    Regular expression tutorial help

    Can anyone recommend a tutorial or some example code detailing the use of regular expressions in C? I'm familiar with regex usage in Perl and Ruby but can't work out how to use them in C. This documentation, although probably useful, was not very clear on how to use the functions described:
    http://www.gnu.org/software/libc/man...ar-Expressions
    The sort of thing I have in mind at the moment is directory listing to find files with names matching particular strings; Perl &c. are a bit slow with large volumes of data.
    Thanks in advance for any suggestions.

  2. #2
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    I too would like more info here. I coded one up not too long ago to test these myself. I'm no regex guru by any stretch, but I don't think the base libs that come on a Mac work very well. The "man" pages for regex have more "gotchas" and "bewares" than usable info.

    I've been told to download and use the "pcre" lib (the Perl-compatible regular expression library). I downloaded it, but have not installed it yet.

    Here's my recent hack - that does not work like I expect it to.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <regex.h>
    
    int main (int argc, const char * argv[]) {
    	char data[80] ; 
    	int rc ; 
    	regex_t * myregex ; 
    //	regmatch_t arrayOfMatches[2];
    	
    	// Compile the regular expression 
    	rc = regcomp( myregex, 
    		"^ *[-+]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][-+]?[0-9]+)? *$", 
    		REG_EXTENDED | REG_NOSUB ) ; 
    	printf("RC from regcomp() = %d\n", rc); 
    	
    	printf("Enter a double value\n");
    	scanf("%s",data) ; 
    	
    	// Compare the entered value to the regex 
    	if (!regexec(myregex, data, 0 , 0 , 0 ) ) {
    		printf("double %s is valid.\n", data ) ; 
    		printf("converted value is %lf\n", atof(data) )  ; 
     
    	}
    	else { 
    		printf("double %s is not valid\n", data ); 
    		printf("converted value is %lf\n", atof(data) )  ; 
    	}
    	return 0;
    }
    Todd

  3. #3
    Registered User
    Join Date
    Mar 2005
    Posts
    140
    Looks like you need to allocate some space for *myregex.

    Also, you need to escape the backslashes in the expression:
    Code:
    "^ *[-+]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([eE][-+]?[0-9]+)? *$",

  4. #4
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Ahhhhhh.... that would probably fix the compiler warnings too... duh me.

    When you say allocate space for *myregex - the pointer - doesn't the regcomp() do that, and then I have to free that storage when I'm done with the regex?

    Todd

  5. #5
    Registered User
    Join Date
    Jan 2008
    Location
    Oxford, England
    Posts
    27
    Todd,
    I edited your code a bit and got this:

    Code:
    #include <stdlib.h> 
    #include <string.h>
    #include <regex.h>
    #include <stdio.h>
    
    int main () 
    {
    	char string1[] = "matching_name.zip"; 
    	char string2[] = "zip.non_matching_name"; 
    	int rc; 
    	regex_t * myregex = calloc(1, sizeof(regex_t)); 
    
    	if (NULL == myregex)
    	{
    		return 1;
    	}
    	
    	// Compile the regular expression 
    	rc = regcomp( myregex, "^.*.zip$", REG_EXTENDED | REG_NOSUB ); 
    	/* this should compile, but doesn't!
    	rc = regcomp( myregex, "^.*\.zip$", REG_EXTENDED | REG_NOSUB ); 
            */
    	printf("RC from regcomp() = %d\n", rc); 
    	
    	// Compare the entered value to the regex 
    	if (0 == regexec(myregex, string1, 0 , 0 , 0 ) ) 
    	{
    		printf("String %s matches.\n", string1 ) ; 
    	}
    	else 
    	{ 
    		printf("String %s does not match.\n", string1 ); 
    	}
    	if (0 == regexec(myregex, string2, 0 , 0 , 0 ) ) 
    	{
    		printf("String %s matches.\n", string2 ) ; 
    	}
    	else 
    	{ 
    		printf("String %s does not match.\n", string2 ); 
    	}
    
    	regfree(myregex);	// release the buffers regcomp() allocated inside the struct
    	free(myregex);
    	return 0;
    }
    That compiles and runs. I'm not sure why it doesn't like "\." (a literal "." rather than "any character"), though. The compiler complains of an unknown escape sequence.
    Last edited by knirirr; 01-23-2008 at 09:00 AM.

  6. #6
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    See prior post in this thread - the backslash has to be escaped!

  7. #7
    Registered User
    Join Date
    Jan 2008
    Location
    Oxford, England
    Posts
    27
    D'oh!
    Yes, I completely forgot to do that. Thanks.

  8. #8
    Registered User hk_mp5kpdw's Avatar
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    Quote Originally Posted by Todd Burch
    When you say allocate space for *myregex - the pointer - doesn't the regcomp() do that, and then I have to free that storage when I'm done with the regex?

    Todd
    Maybe, maybe not; I don't know personally. But if regcomp() did allocate it for you, you would have had to pass the address of the pointer to the function for that to work:
    Code:
    regex_t * myregex ; 
    
    ...
    
    rc = regcomp( &myregex,"^ *[-+]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([eE][-+]?[0-9]+)? *$", 
    		REG_EXTENDED | REG_NOSUB ) ;
    Otherwise it looks like you have to allocate memory yourself.
    Last edited by hk_mp5kpdw; 01-23-2008 at 09:27 AM.
    "Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
    -Christopher Hitchens

  9. #9
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    It seems reasonable that I would have to allocate memory. Still researching...

  10. #10
    Registered User nenpa8lo's Avatar
    Join Date
    Jan 2008
    Posts
    42
    That's all fine, but I work on embedded systems. I have 16k of RAM in total, so I can't allocate memory and so on. I don't have the regex_t type either, so is there a 'small' function or library around? Or do I have to write my own RegEx()?
    Basically, I want to get a pointer to the <CR><LF> in a string like this:
    Code:
    [digit 1-255],[digit 0-4],["string of any form and max len of 40B surrounded by quotes"],[digit 1-255]<CR><LF>89bacdf4847fcbade3d4f5bc87

  11. #11
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    If your string is always going to be a fairly fixed format, why not use a less flexible approach of parsing the string - if what you are looking for is the string after CR/LF, search just for a CR followed by LF, for example?

    Doing regex in an embedded system, it's probably going to take up more memory than you want to spend on this...

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  12. #12
    Registered User nenpa8lo's Avatar
    Join Date
    Jan 2008
    Posts
    42
    Yeah, but the thing is that "string" may contain any characters, including <CR><LF>.

  13. #13
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    So, what does a REAL input for this look like? Are there double quotes and square brackets in the input, or are they just your way to indicate what's there?

    If you have the quotes and/or square brackets, it shouldn't take that much to come up with something that traipses through and finds the relevant components. If it's "anything anywhere", then you have a problem [and I doubt a regex parser will solve that either].

    --
    Mats

  14. #14
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Why not just work backwards from the end of the data?

  15. #15
    Registered User nenpa8lo's Avatar
    Join Date
    Jan 2008
    Posts
    42
    Reading backward will be easier than regex for embedded C :-D
