Thread: Need assistance with some utility text parsing functions.

  1. #1
    Registered User
    Join Date
    Apr 2012
    Posts
    2


    I am looking for some utility functions to do simple text parsing without having to use regex directly. I am not that great with C, but I am handy with PHP. Here's what I'm looking for: I want to pass a string to a function and have it return a string that I can use. Here's an example:

    value = between('<test>', '</test>', '<test>hi there</test>'); //returns 'hi there' and value now carries this string
    value = after('Test', 'Testhi there'); //returns 'hi there' and value now carries this string
    value = before('test', 'hi theretest'); //returns 'hi there' and value now carries this string


    A good example of the PHP version of this can be found here: PHP: substr - Manual


    I know that C has substr, but I wouldn't have the slightest idea how to convert these functions to C. My plan for these functions is to parse text out of strings and use it as arguments for other C functions that take strings. That is the end goal.


    (This is for plain C/ANSI C or whatever it's called, not C++, by the way.)
    Last edited by lowlight; 04-29-2012 at 07:31 PM.

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    This is fairly trivial, but since it's your first time here, here's an example. In the future, though, post the code you're working on, so we don't need to waste time asking a zillion questions before we can give a specific response that makes sense.

    There may be simpler ways to do this. Also, _strrev is not part of standard C, so your compiler may not provide it (it's easy enough to write your own function to reverse a string, though).

    Code:
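    #include <stdio.h>
    #include <string.h>
    
    #define MAX 200
    
    /* Portable stand-in for the non-standard _strrev(): reverses s in place. */
    static void str_reverse(char *s)
    {
       size_t len = strlen(s);
       for (size_t k = 0; k < len / 2; k++) {
          char tmp = s[k];
          s[k] = s[len - 1 - k];
          s[len - 1 - k] = tmp;
       }
    }
    
    int main(void) {
       size_t i = 0;
       char *ch;
       char target[80];
       char mystr[MAX] = "This is a test (it's no joke), it's a #@75% test! So here\n"
    "it is: <text>hi there</text>. Now, can we parse it <LOL>?\n"
    "That is the test.";
    
       ch = strstr(mystr, "</text>");
       if (ch != NULL) {
          /* Walk backwards from just before "</text>", collecting characters
             until the '>' that ends "<text>" (or the start of mystr). */
          while (ch > mystr && ch[-1] != '>' && i < sizeof target - 1) {
             --ch;
             target[i++] = *ch;
          }
       }
       target[i] = '\0';
       str_reverse(target);   /* the characters were collected back to front */
       printf("%s\n", target);
       return 0;
    }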

  3. #3
    Registered User
    Join Date
    Apr 2012
    Posts
    2
    Hmm, OK. Well, I'm not sure how you can call this trivial; from my experience trying to convert these functions over the last few days, it very likely isn't.

    Also, I think you didn't understand what I asked for. I'm interested in a RETURN value that is a string, or a pointer that can be turned into one, which is why I asked for utility functions, not a main function. From what I've been reading, C cannot return strings, only integers, and that's why returning a pointer works, since it's an integer that points at a memory address? I don't know how it all works, but that's my uneducated understanding of it. What you have here doesn't return anything, and it seems to be compiler-dependent, not something I could turn into a utility function without using a specific compiler. I use GNU's GCC on Linux, and _strrev was indeed undefined, as you mentioned.

    Also, I don't have any code to post, because I haven't the slightest clue how this would be accomplished in C, and I'm starting to think that it probably isn't even possible. About the only thing I could post would be examples of how I want to use it, which I already did. I've asked around a little on IRC but have had no luck finding a real answer on how to convert these parsing functions to C. I use them daily with PHP and they are just awesome. I have not missed regex one bit since using them.

    Oh, and something else. I noticed that in your code, you hard-coded the <test> </test> thing. These are only example strings; the real code obviously would not hard-code them, so that it could support any kind of opening/closing tag. I believe this is probably far too complex for a simple solution, and I may have to limp along with a before() or after() function that uses substr somehow. I don't want a 50,000-line app just to support a between() function. I'd be happy with a small 10-liner for either before() or after().

    Anyway, thanks for at least trying. But I don't think this is the right answer.

    Maybe the following code makes it clearer what I want? This code actually works; the only thing missing is the guts of the utility functions, since I don't know what should go there. I don't know the difference between const char and char; either one seems to work. Is there any reason I should use one over the other?

    Code:
    #include <stdio.h>   /* for puts() */
    
    char const * between(char const * this, char const * and_this, char const * in_this)
    {
       char const * return_val = this;
       return return_val;
    }
    
    
    char const * before(char const * this, char const * in_this)
    {
       char const * return_val = this;
       return return_val;
    }
    
    
    char const * after(char const * this, char const * in_this)
    {
       char const * return_val = this; //This is just an example, the return_val would actually be the result of the parsing code that goes in this function.
       return return_val;
    }
    
    
    int main(void)
    {
       char const * this = "this";
       char const * and_this = "and_this";
       char const * in_this = "in this";
       char const * between_test = "";
       between_test = between(this, and_this, in_this);
       puts(between_test);
       return 0;
    }
    Last edited by lowlight; 04-29-2012 at 10:48 PM.

  4. #4
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by lowlight
    From what I've been reading, C cannot return strings, it can only return integers and thus thats why returning a pointer works I guess, since it's an integer that points at a memory address?
    In C, you cannot specify an array type as the return type of a function. However, you are not limited to specifying an integer type as the return type, e.g., you can also specify a struct type. Now, to return an array from a function, you would specify the return type to be a pointer type, then the returning of the array actually returns a pointer to the array's first element. The problem is, if the array is a non-static local array, it would be destroyed after the function returns, hence the caller would get a pointer to an element that no longer exists, which is a Bad Thing.
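    To make the lifetime point concrete, here is a minimal sketch contrasting the broken pattern (shown only in a comment, since running it is undefined behavior) with three patterns that do hand back a usable pointer: a string literal, a static buffer, and a malloc'd copy that the caller must free. The function names are hypothetical and chosen only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* BROKEN (do not do this): buf lives only while the function runs, so the
 * returned pointer dangles as soon as the function returns.
 *
 *   char *bad_greeting(void) { char buf[] = "hi there"; return buf; }
 */

/* OK: a string literal has static storage duration, so the pointer stays
 * valid for the whole program (the caller must not modify it). */
const char *literal_greeting(void)
{
    return "hi there";
}

/* OK: a static local array outlives the call, but every call shares and
 * overwrites the same buffer. */
const char *static_greeting(void)
{
    static char buf[16];
    strcpy(buf, "hi there");
    return buf;
}

/* OK: heap memory from malloc() lives until free(); the caller owns it.
 * Returns NULL if the allocation fails. */
char *heap_greeting(void)
{
    const char *text = "hi there";
    char *copy = malloc(strlen(text) + 1);   /* +1 for the '\0' */
    if (copy != NULL)
        strcpy(copy, text);
    return copy;
}
```

    The static version is the simplest to call but is not reentrant; the malloc version is the most flexible but puts the burden of calling free() on the caller.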

    Quote Originally Posted by lowlight
    A good example of the PHP version of this can be found here: PHP: substr
    No, that is not an example of the PHP version. PHP's substr takes substrings based on index. You want to find a substring and then return another substring in a position relative to the substring found. Doing this in PHP would likely involve both the use of strstr and substr.

    Quote Originally Posted by lowlight
    Also, I dont have any code to post because I havent the slightest clue on how this would be accomplished in C and I'm starting to think that it probably isnt even possible to begin with anyway.
    It is possible in C. As Adak demonstrated, you would use strstr. However, in the case of after, you just need to return a pointer to the first character after the substring found. For between and before, you may want to operate on the source string by inserting a null character to shorten the string, i.e., you assume that the caller has made a copy of the original string.
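    A minimal sketch of the after() and before() functions described here, using the argument order from the original post (search text first, source string second). Returning NULL when the search text is absent is an assumed convention, not something stated in the thread. after() returns a pointer into the caller's string; before() shortens the string in place, so it needs a writable array.

```c
#include <string.h>

/* after(): return a pointer to the first character following the first
 * occurrence of `needle` in `haystack`, or NULL if `needle` is absent.
 * No copy is made: the result points into the caller's string. */
const char *after(const char *needle, const char *haystack)
{
    const char *found = strstr(haystack, needle);
    return found != NULL ? found + strlen(needle) : NULL;
}

/* before(): cut `haystack` at the first occurrence of `needle` by writing
 * a null character there, and return `haystack`. This modifies the string,
 * so the caller must pass a writable copy (not a string literal).
 * Returns NULL, leaving `haystack` untouched, if `needle` is absent. */
char *before(const char *needle, char *haystack)
{
    char *found = strstr(haystack, needle);
    if (found == NULL)
        return NULL;
    *found = '\0';
    return haystack;
}
```

    For example, after("Test", "Testhi there") points at "hi there" inside the original string, while before() should be given something like char buf[] = "hi theretest"; rather than a literal.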

    Quote Originally Posted by lowlight
    I noticed that in your code, you hard coded the <test> </test> thing. Well, these are only example strings, the end results would obviously not be hard coded as such so that it could support any kind of opening/closing tag.
    They were hard coded to facilitate the example.

    Quote Originally Posted by lowlight
    I believe that this is probably far to complex for a simple solution and I may have to limp along with a before() or after() function that utilizes substr somehow. I dont want to have a 50000 line app just to facilitate a between() function. I'd be happy with a small 10 liner for either before() or after().
    Sigh. I don't know what's with Adak's example. Your after function is a one liner: it is just strstr + strlen (though it would be slightly longer for error checking). Your before function requires a single call to strstr, and then you null terminate the string. Your between function requires a call to strstr and the before function.

  5. #5
    Master Apprentice phantomotap
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    You didn't understand. Adak knew exactly what he was doing.

    We don't do people's work for them here; we only help you do your work. He was giving you an example of how to approach implementing the functionality you want. That's more than I would have done with what little you gave.

    That said, he did use `_strrev', which isn't widely available. (It is easy to implement.) He did give you `strstr', which, if you take the time to read the documentation, can be used to get most of what you want very easily. The most trouble you'll have is either allocating new memory and copying the found range so that the new string can be properly null-terminated, or passing in a pointer to already allocated memory along with the size of that memory.

    Soma

  6. #6
    oogabooga
    Join Date
    Jan 2008
    Posts
    2,808
    Quote Originally Posted by lowlight
    I dont have any code to post because I havent the slightest clue on how this would be accomplished in C and I'm starting to think that it probably isnt even possible to begin with anyway.
    That's idiotic! What do you think PHP is written in, fairy-dust? It's written in C.

    Anyway, Adak's code does seem over-complicated to me. For the between function I'd just use a couple of strstr calls to find the beginning of the start and end delimiters, giving a start and end ptr. Then I'd move the start ptr to the end of the start delimiter by adding the length of the delimiter. Then I'd malloc the appropriate amount of memory and use strncpy to copy the chars to it, zero-terminate that sucker, and return it. Frankly, it would've been easier to write the code than this paragraph.
