Thread: Need regexp help

  1. #16
    Registered User
    Join Date
    Sep 2008
    Posts
    10
    Quote Originally Posted by Thantos:
    *? means that * shouldn't be greedy.
    This is precisely the problem. The ? doesn't make the * lazy. If it did, my problem would be solved.

  2. #17
    Registered User
    Join Date
    Sep 2008
    Posts
    10
    Quote Originally Posted by MK27:
    (re: Salem) okay! parse an .xml into one line, presuming unix linefeeds:

    Code:
    char *linefile (char *file) {
    	size_t len, mem=0;
    	char *cumul, *line = NULL;
    	static char err[]="ERROR";
    	FILE *FST_mine = fopen(file, "r");
    	if (FST_mine == NULL) return line;
    	while ((line = linein(FST_mine)) != NULL) {
    		len = strlen(line);
    		if (mem == 0) { mem = len+1;
    			cumul = malloc(mem);
    			strcpy(cumul,line);
    		}
    		else {	mem += len;
    			cumul = (char *)realloc(cumul,mem);
    			strcat(cumul,line);
    		}
    	}
    	if (fclose(FST_mine) != 0) { puts("fclose fail linefile()");
    		return err;}
    	return cumul;
    }
    char *myline=linefile("myfile.xml");
    Feed that through the above "regexp" and, each time it returns, convert all characters in "myline" up to the last rgxp.end into spaces (remember to collect bgn and end somewhere):
    Code:
    for (i = 0; i < rgxp.end; i++) myline[i] = ' ';	/* blank out the part already searched */
    Then maybe free(myline) and you'll end up with an array (or whatever you did to collect the bgns and ends) consisting of the character positions for HELLO in the entire file.

    (re: Thantos) oh!
    I appreciate your effort, but HELLO was just used as a marker. HELLO can be anything, and that anything is what I'm trying to fetch.

  3. #18
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
    regex.h is the POSIX engine, yes.

    Question: Do you want nested tags? Instead of matching .*?, you could maybe match [^<]* (any run of characters other than open-pointy).
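A minimal sketch of why this helps, assuming POSIX <regex.h> with REG_EXTENDED. POSIX regexes have no lazy quantifiers (the standard leaves adjacent duplication symbols such as `*?` undefined in EREs), and a POSIX match is leftmost-longest, so `<b>(.*)</b>` runs all the way to the last `</b>`. Excluding `<` from the repeated part stops the capture at the first closing tag instead. The helper `first_capture` is a hypothetical name, written only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <string.h>

/* Copies the text matched by capture group 1 of `pattern` (an ERE) in
   `subject` into `out`, truncating to fit. Returns 1 on a match, else 0. */
static int first_capture(const char *pattern, const char *subject,
                         char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];      /* m[0] = whole match, m[1] = group 1 */
    int found = 0;

    if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, subject, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (n >= outsz)
            n = outsz - 1;
        memcpy(out, subject + m[1].rm_so, n);
        out[n] = '\0';
        found = 1;
    }
    regfree(&re);
    return found;
}

/* Example:
     first_capture("<b>(.*)</b>",    "<b>one</b> and <b>two</b>", ...)
       -> "one</b> and <b>two"   (longest match wins)
     first_capture("<b>([^<]*)</b>", "<b>one</b> and <b>two</b>", ...)
       -> "one"                  (the group cannot cross a '<') */
```

This only works while the wanted text itself contains no `<`, i.e. no nested tags, which is what the question above is getting at.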

  4. #19
    MK27 (spurious conceit)
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    "HELLO can be anything, and that anything is what I'm trying to fetch."

    If you read a little more carefully, garton, you would realize that fetching anything is exactly what I showed you how to do.

    Part of your problem is that you are using ".*?" on either side of a pattern to match an entire line (and that really is the LAZY way to go), then complaining that all you wanted was a token or substring. Using character positions may seem more tedious (or maybe just new), but please notice that finding character positions is exactly what regex.h does.
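A sketch of collecting those character positions directly, as an alternative to overwriting the already-searched prefix with spaces: resume each regexec() call just past the previous match and add that offset back. `capture_offsets` is a hypothetical helper; it assumes POSIX <regex.h> and a pattern with one capture group. REG_NOTBOL tells regexec() that the resumed position is not the start of a line, so `^` cannot match there.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>

/* Records the absolute [start, end) offsets of capture group 1 for each
   non-overlapping match of `pattern` (an ERE) in `text`, up to `max`
   matches. Returns the number recorded, or -1 if regcomp() fails.
   A group that did not take part in a match is recorded as -1. */
static int capture_offsets(const char *pattern, const char *text,
                           regoff_t starts[], regoff_t ends[], int max)
{
    regex_t re;
    regmatch_t m[2];
    const char *p = text;           /* where the next search begins */
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    while (count < max &&
           regexec(&re, p, 2, m, p == text ? 0 : REG_NOTBOL) == 0) {
        regoff_t base = (regoff_t)(p - text);   /* offset of p in text */
        starts[count] = m[1].rm_so == -1 ? -1 : base + m[1].rm_so;
        ends[count]   = m[1].rm_eo == -1 ? -1 : base + m[1].rm_eo;
        count++;
        if (m[0].rm_eo > 0)
            p += m[0].rm_eo;        /* resume just after this match */
        else if (*p != '\0')
            p++;                    /* empty match: step over one char */
        else
            break;
    }
    regfree(&re);
    return count;
}
```

With `<b>([^<]*)</b>` on `<b>one</b> and <b>two</b>` this records two matches: offsets 3..6 ("one") and 18..21 ("two"). The string is never modified, so nothing has to be blanked out or copied.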
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  5. #20
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by MK27:
    pick pick pick!

    Anyway, I appreciate it BUT/AND:

    What happens if you read a string that is "ERROR"? Why not return NULL on error? [edit] Like you do when the file couldn't be opened?
    Well, failing to open a file and failing to close it aren't the same, are they?
    They are not the same error, but they have the same effect: no output file.
    what does linein return? Is it a malloc()'ed value? (If so, why aren't you freeing it?)
    There is something I don't quite understand here. "linein" obviously returns a char pointer, so to me it would seem the malloc'n'freeing would/could go on OUTSIDE the function. In practice I just use the pointer, which is local to another function and freed with it I'm told...
    If it's freed from the "another function" you are in a big heap of trouble -- you have NO VALUE (because the memory no longer belongs to you, and will be reclaimed by the system Real Soon Now). It's possible linein malloc's the memory for you -- you wrote it, I guess, so you should know -- but even if it does, that memory still belongs to you and it is your job to dispose of it appropriately.
    Never assign the result of realloc back to the same pointer. If it returns NULL, you've leaked memory.
    Huh again. Is that w/r/t realloc in particular, or all functions in general?
    Well, realloc is the only reallocation function we have, so "yes".
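The safe pattern, as a minimal sketch (`grow` is a hypothetical name): hand realloc()'s result to a temporary and only overwrite the original pointer once it is known to be non-NULL, so a failed call leaves the old block in your hands to free or keep using.

```c
#include <stdlib.h>
#include <string.h>

/* Makes sure *buf holds at least `need` bytes, updating *cap.
   Returns 1 on success. On failure returns 0 and leaves *buf and *cap
   untouched, so the caller still owns (and must eventually free) the
   old block. */
static int grow(char **buf, size_t *cap, size_t need)
{
    char *tmp;

    if (need <= *cap)
        return 1;                /* already big enough */
    tmp = realloc(*buf, need);   /* never realloc straight into *buf */
    if (tmp == NULL)
        return 0;                /* *buf is still valid */
    *buf = tmp;
    *cap = need;
    return 1;
}

/* Usage:
     char *s = NULL;
     size_t cap = 0;
     if (!grow(&s, &cap, 32)) { free(s); return NULL; }
     strcpy(s, "fits in 32 bytes");                          */
```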
    Why make strcat() traverse the line when you already know where the NULL terminator is?
    By NULL terminator you mean "\0"? What is my other option (working character by character from mem to mem+len)? In that case, why does strcat even exist?
    \0 is the null terminator, yes. strcat does two things: (1) finds the end of the destination string and (2) strcpy's the source string to that end. Since you went to a great deal of trouble to do (1), you don't need to do it again -- just do (2) by calling strcpy your own self.

  6. #21
    MK27 (spurious conceit)
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    what does linein return?

    wow, sorry i missed this -- linein is my function, not a standard one. For posterity's sake it looks like this:
    Code:
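    char *linein (FILE *stream) {
    	size_t len=0;
    	char *line = NULL;
    	if (getline(&line, &len, stream) == -1) {
    		free(line);	/* getline() may allocate even when it fails */
    		return NULL;
    	}
    	return line;	/* heap buffer: the caller must free() it */
    }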
    So no, there's no malloc...but thanks for pointing this issue out, because I've been overlooking it and am right now going through a couple of programs with my "free" pen in hand.

    Correction: Actually getline is a GNU extension and "allocates the initial buffer for you by calling malloc", so right before return cumul linefile should contain free(line);
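A sketch of the usual getline() idiom (a GNU extension when this thread was written, since standardized in POSIX.1-2008): pass the same pointer and size on every call so getline() can reuse and grow one buffer, then free it once after the loop. With linein() as written, each call starts from `line = NULL`, so every call returns a fresh buffer; in linefile() that means each line has to be freed after it has been appended, inside the loop, because `line` is already NULL by the time the loop exits. `count_lines` is a hypothetical example function.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Reads every line of `stream` through ONE getline() buffer.
   Returns the number of lines (and their total length in *total_bytes),
   or -1 if the stream reported an error. */
static long count_lines(FILE *stream, size_t *total_bytes)
{
    char *line = NULL;   /* getline() mallocs and grows this as needed */
    size_t cap = 0;      /* getline() keeps this in sync with the buffer */
    ssize_t n;
    long lines = 0;

    *total_bytes = 0;
    while ((n = getline(&line, &cap, stream)) != -1) {
        *total_bytes += (size_t)n;   /* n counts the '\n', if present */
        lines++;
    }
    free(line);          /* one free() for the whole loop, even after EOF */
    return ferror(stream) ? -1 : lines;
}
```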

    by calling strcpy your own self.
    How do I use strcpy to add something to the end of a line (and not just overwrite the line)?
    Last edited by MK27; 09-05-2008 at 12:10 PM. Reason: getline

  7. #22
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by MK27:
    How do I use strcpy to add something to the end of a line (and not just overwrite the line)?
    In your example,
    Code:
    strcpy(cumul+(mem-len),line);
    In other words, you *know* where the previous stuff ends, so copy right into that location.

  8. #23
    MK27 (spurious conceit)
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300

    Code:
    strcpy(cumul+(mem-len),line);
    Although that makes sense to me, inserting it in linefile() caused this error when the function is called:

    *** glibc detected *** ./program: free(): invalid next size (normal): 0x08174a48 ***

    and then the "backtrace". So I guess I stick with strcat.

  9. #24
    dwks (Frequently Quite Prolix)
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Well, because you've gone and been obfuscated, you have to subtract one from that, because mem counts the null terminator.
    Code:
    strcpy(cumul+(mem-len-1),line);
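Why the first version crashed: linefile()'s `mem` includes the terminating '\0', so after the realloc the old terminator sits at index `mem-len-1`. Starting the copy at `mem-len` leaves the old '\0' in place (so the new text is invisible to string functions) and writes the new terminator one byte past the end of the block, the kind of heap corruption glibc reports as `free(): invalid next size`. A minimal sketch of the corrected append with the same bookkeeping (the `append` helper is hypothetical):

```c
#include <stdlib.h>
#include <string.h>

/* Appends `line` to the heap string *cumul. As in linefile(), *mem is the
   allocated size INCLUDING the terminating '\0' (0 means nothing allocated
   yet, and *cumul must then be NULL). Returns 0 on allocation failure,
   leaving *cumul and *mem as they were. */
static int append(char **cumul, size_t *mem, const char *line)
{
    size_t len = strlen(line);
    size_t newmem = (*mem == 0) ? len + 1 : *mem + len;
    char *tmp = realloc(*cumul, newmem);

    if (tmp == NULL)
        return 0;
    /* The old '\0' is at index newmem - len - 1 (the old string length).
       Copying there overwrites it, and strcpy() writes the new '\0' at
       index newmem - 1, the last byte of the block. */
    strcpy(tmp + (newmem - len - 1), line);
    *cumul = tmp;
    *mem = newmem;
    return 1;
}
```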
    Correction: Actually getline is a GNU extension and "allocates the initial buffer for you by calling malloc", so right before return cumul linefile should contain free(line);
    That is precisely what I meant. And why use a non-standard function anyway?

    > static char err[]="ERROR";
    In addition to what dwks mentioned, how would you free this?
    I wouldn't...
    The trouble is, how can you tell the difference between "ERROR" (as a pointer to the static string) and "ERROR" (as a malloc()'d line in the file)? You can't. Either you leak memory or you try to free a static array.

    Seriously, just return NULL upon error. The calling function doesn't care why the function couldn't perform its duty -- it only cares that it couldn't. fopen() can fail for a multitude of different reasons, but it always returns NULL if it does.
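Pulling the thread's suggestions together, a sketch of how linefile() might look if it returns NULL on every failure: one reusable getline() buffer freed once, realloc() through a temporary, and a memcpy() to the already-known end instead of strcat(). This is an illustrative rework, not MK27's actual code, and it assumes a POSIX system for getline(). Like the original, it keeps each '\n', so the result is the whole file in one string rather than literally one line.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads the whole of `file` into one malloc'd, '\0'-terminated string.
   Returns NULL on any failure; otherwise the caller must free() it. */
char *linefile(const char *file)
{
    FILE *fp = fopen(file, "r");
    char *line = NULL;        /* getline() buffer, reused for every line */
    size_t cap = 0;
    char *cumul;
    size_t used = 0;          /* strlen(cumul), tracked so we never rescan */
    ssize_t n;

    if (fp == NULL)
        return NULL;
    cumul = malloc(1);
    if (cumul == NULL) {
        fclose(fp);
        return NULL;
    }
    cumul[0] = '\0';
    while ((n = getline(&line, &cap, fp)) != -1) {
        char *tmp = realloc(cumul, used + (size_t)n + 1);
        if (tmp == NULL) {            /* old block still ours: release it */
            free(cumul);
            cumul = NULL;
            break;
        }
        cumul = tmp;
        memcpy(cumul + used, line, (size_t)n);   /* append at the known end */
        used += (size_t)n;
        cumul[used] = '\0';
    }
    free(line);
    if (cumul != NULL && ferror(fp)) {  /* a read error, not plain EOF */
        free(cumul);
        cumul = NULL;
    }
    fclose(fp);   /* everything is already read; a close failure is ignored */
    return cumul;
}
```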
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  10. #25
    MK27 (spurious conceit)
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    why use a non-standard function anyway?
    do you mean GNU getline or my own functions? how am i supposed to write a computer program without including a few "non-standard" functions...

    Seriously, just return NULL upon error.
    In fact I have made this change (or, rather, made it a fatal error, since being unable to close a file that you just opened would be an unusual and alarming event in that context).

  11. #26
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by MK27:
    why use a non-standard function anyway?
    do you mean GNU getline or my own functions? how am i supposed to write a computer program without including a few "non-standard" functions...
    We mean "functions that perform the same activity as standard functions, except for the parts I screwed up and therefore cause bugs/leaks/other bad stuff and therefore you have to use in a very specific way".

  12. #27
    dwks (Frequently Quite Prolix)
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Seriously, just return NULL upon error.
    In fact I have made this change (or, rather, made it a fatal error, since being unable to close a file that you just opened would be an unusual and alarming event in that context).
    If you feel that you have more than one error to return -- well, make the function return an int or something, and return error codes that way. You can still get the character data out of the function by using a char** argument.

    Or you could raise a fatal error. But I don't really think that's the best route. Yes, it's a strange error. But it's an error that you can recover from. All you have to do is return. So what if the file couldn't be closed? Probably the worst that will happen is you won't be able to open the file next time, because it's still open or the media it was on was removed or simply because you've reached the maximum number of files open (FOPEN_MAX).

    I only like to abort the program with a fatal error when it really is a fatal error. Running out of memory might be considered a fatal error. Not being able to close a file, when you've already gotten the information you need out of it, probably shouldn't be.
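A sketch of the error-code style described above, assuming POSIX getline(): the function returns a status and hands the data back through a `char **`, so a failed fclose() can be reported without throwing the data away or aborting. The `rf_status` codes and `read_first_line` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes, for illustration only. */
enum rf_status { RF_OK, RF_OPEN_FAILED, RF_NO_LINE, RF_CLOSE_FAILED };

/* Reads the first line of `path` into *out (malloc'd; the caller frees it).
   *out is NULL unless a line was read. RF_CLOSE_FAILED still returns the
   line: the data is good, only the cleanup went wrong. */
static enum rf_status read_first_line(const char *path, char **out)
{
    FILE *fp;
    size_t cap = 0;
    enum rf_status st = RF_OK;

    *out = NULL;
    fp = fopen(path, "r");
    if (fp == NULL)
        return RF_OPEN_FAILED;
    if (getline(out, &cap, fp) == -1) {   /* empty file or read error */
        free(*out);                       /* getline() may have allocated */
        *out = NULL;
        st = RF_NO_LINE;
    }
    if (fclose(fp) != 0 && st == RF_OK)
        st = RF_CLOSE_FAILED;             /* recoverable: just report it */
    return st;
}

/* Usage:
     char *line;
     switch (read_first_line("myfile.xml", &line)) {
     case RF_OK:
     case RF_CLOSE_FAILED: fputs(line, stdout); free(line); break;
     default: fputs("could not read myfile.xml\n", stderr);
     } */
```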

    BTW -- you can use quote tags if you want: [quote]Quoted text goes here.[/quote]

  13. #28
    MK27 (spurious conceit)
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by tabstop:
    We mean "functions that perform the same activity as standard functions...".
    I did not find any of those in my code sir!

  14. #29
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by MK27:
    I did not find any of those in my code sir!
    Are you honestly claiming that your linein is not eerily similar to gets? (Yes, I know no one should use gets these days, but the standard library also comes with a safe, more-or-less equivalent alternative: fgets.) And the "oh I guess the malloc happens in my input function after all" is a symptom.

  15. #30
    MK27 (spurious conceit)
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300

    okay, okay, "linein" is probably the first function I ever wrote. Or close to it. And the "Teach Yourself C in 21 Days" I got from the library (1990) doesn't mention fgets, but it does warn against the use of "gets".

    Excuses aside, however, I will defend myself by noting that with fgets "You must supply count characters worth of space in [the char pointer]" (GNU documentation). So linein seems more versatile to me -- i.e., without testing, it looks like you have to do your own malloc when using fgets.

    So long live linein. Or else I'm still wrong.
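For comparison, a sketch of a getline()-like reader built only from standard C (fgets() plus realloc()): the caller no longer has to guess a buffer size, because the function grows its own buffer until it has seen the newline. `read_line` is a hypothetical name, and the starting size of 64 bytes is arbitrary.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* getline()-like reader using only standard C. Returns the next line of
   `stream` in a malloc'd buffer (including its '\n', if there was one),
   or NULL at end of file or on error. The caller frees each line. */
static char *read_line(FILE *stream)
{
    size_t cap = 64, len = 0;     /* 64 is an arbitrary starting size */
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    while (fgets(buf + len, (int)(cap - len), stream) != NULL) {
        len += strlen(buf + len);
        /* Done if we saw the newline, or if fgets() stopped before
           filling the buffer (EOF on a final unterminated line). */
        if ((len > 0 && buf[len - 1] == '\n') || len + 1 < cap)
            return buf;
        {   /* buffer full mid-line: double it and keep reading */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (len > 0)
        return buf;               /* EOF arrived right after a full buffer */
    free(buf);
    return NULL;
}
```

Like linein(), every returned line is the caller's to free(); the difference is that only standard functions are involved.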
    Last edited by MK27; 09-05-2008 at 09:45 PM. Reason: just because
