Thread: text pattern recognition

  1. #1
    mtsmox
    Guest

    text pattern recognition

    Hi all!

    I'm writing an algorithm to find a pattern in a text. The pattern can include numbers and strings. It's a bit like (s)scanf, except that if the pattern isn't found at the beginning of the text I keep searching further along; what I can't do yet is read/convert the matched pattern into variables. I also do 'type checking': if the pattern specifies a byte value and the value in the text is <0 or >255, the pattern doesn't match. This all works fine, but here's my question:

    Is there already a function (possibly a standard one) that does this, so I can compare and see which one is faster?
    I have the feeling mine is kind of slow. But then again, if I time this function:

    void stupidFunc( char * text )
    {
    while( *(text++) );
    }

    it is even slower than a call to scanf. And I don't use scanf in my pattern-finding routine; I do most of the searching manually (and therefore slowly?), with tests like if ( *text == '%' ) and that sort of thing. But I do use strstr. Is that fast?
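    `strstr` is the standard C function for this: it returns a pointer to the first occurrence of one string inside another, or a null pointer if there is none. Library versions are usually well optimized, but the only reliable way to compare it with a hand-written loop is to time both on your own data over many calls (a single call of something as small as `stupidFunc` is probably too short to time meaningfully). A minimal sketch of letting `strstr` jump from one occurrence of a literal separator to the next instead of testing every character; `countOccurrences` is a hypothetical helper, not a library function.

```cpp
#include <cstring>

// Hypothetical helper: counts non-overlapping occurrences of 'needle' in
// 'haystack', letting strstr jump directly from one match to the next.
int countOccurrences(const char* haystack, const char* needle)
{
    std::size_t len = std::strlen(needle);
    if (len == 0) return 0;                      // an empty needle would match everywhere
    int count = 0;
    for (const char* p = std::strstr(haystack, needle); p != nullptr;
         p = std::strstr(p + len, needle))       // resume searching after this match
        ++count;
    return count;
}
```

For example, `countOccurrences("123.123.123.123", ".")` finds the three separators with three `strstr` calls that each land directly on a match, plus one final call that returns a null pointer.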

    But enough of my rambling...
    Any resources or links would be appreciated!

    Joren

  2. #2
    Sénior Member
    Join Date: Jan 2002, Posts: 982
    I've no idea what most of your post was going on about, but you can do something like -

    Code:
    #include <iostream> 
    #include <string>
    
    using namespace std; 
    
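    int main() 
    { 
    	string w = "Is there a pattern here?";
    	string p = "e a p";
    
    	// find() returns string::npos when p does not occur in w
    	string::size_type pos = w.find(p, 0);
    	if (pos != string::npos)
    		cout << "p found in w at position: " << pos << '\n';
    	else
    		cout << "p not found in w\n";
    
        return 0;
    }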
    to find a string within a string (if you're searching a file, you could read it in line by line, and do this for each line). Ignore me if you mean something completely different.

  3. #3
    Registered User
    Join Date: Aug 2001, Posts: 41
    Like I said, most of the time I was just rambling. I'll try again...

    This is what a call to my function looks like:

    char * result = findPattern( "String to search: 123.123.123.123", "%i.%i.%i.%i" );

    "String to search: 123.123.123.123" is searched for a pattern of type "%i.%i.%i.%i", which means four integers separated by ".".
    In this case result would be the address of "123.123.123.123", i.e. a pointer to where the match starts inside the searched string.

    Other types you can search for in a string are:
    %s : string
    %c : one character
    %u : unsigned
    %hi : short
    %hu : unsigned short
    %b : byte

    That's basically it. So it requires lots of string matching (for the literal separators) and type checking for the variable search types.
    Are there any algorithms (or perhaps just ideas) on how to do this?
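    Since a pattern consists of literal separators and typed fields, one approach is to split it into tokens once, before searching, so the matching loop never has to re-examine the `%` syntax. A minimal sketch, assuming exactly the specifiers listed above plus `%i` for a signed int, and no `%%` escape; `Kind`, `Token` and `tokenize` are illustrative names, not an existing API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One piece of a parsed pattern: either a literal run of text or a typed field.
enum class Kind { Literal, Int, Unsigned, Short, UShort, Byte, Char, String };

struct Token {
    Kind kind;
    std::string text;   // only used for Kind::Literal
};

// Splits a pattern such as "%i.%i.%i.%i" into tokens.
// Returns false on an unknown or truncated specifier.
bool tokenize(const std::string& pattern, std::vector<Token>& out)
{
    out.clear();
    std::string literal;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] != '%') { literal += pattern[i]; continue; }
        if (!literal.empty()) { out.push_back({Kind::Literal, literal}); literal.clear(); }
        if (++i >= pattern.size()) return false;          // '%' at the very end
        char c = pattern[i];
        if (c == 'h') {                                    // two-letter forms %hi / %hu
            if (++i >= pattern.size()) return false;
            if (pattern[i] == 'i') out.push_back({Kind::Short, ""});
            else if (pattern[i] == 'u') out.push_back({Kind::UShort, ""});
            else return false;
            continue;
        }
        switch (c) {
            case 'i': out.push_back({Kind::Int, ""}); break;
            case 'u': out.push_back({Kind::Unsigned, ""}); break;
            case 'b': out.push_back({Kind::Byte, ""}); break;
            case 'c': out.push_back({Kind::Char, ""}); break;
            case 's': out.push_back({Kind::String, ""}); break;
            default: return false;                         // unsupported specifier
        }
    }
    if (!literal.empty()) out.push_back({Kind::Literal, literal});
    return true;
}
```

"%i.%i.%i.%i" becomes seven tokens (four `Int` fields and three `"."` literals), which a matcher can then walk one by one.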

    Joren

  4. #4
    S­énior Member
    Join Date
    Jan 2002
    Posts
    982
    Oh, I see. One thing you could do is create a parsing class that has member functions testing for each type (%s, %c, etc.). Then, while parsing the format string, pass the char* (or std::string substring) at the current position to the relevant function, and have it return a bool saying whether the next item in the text is of the type the format string requires.
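    A sketch of such a parsing class, assuming each member function receives the current position in the text and, on success, moves it past what it consumed. `Parser` and its member names are hypothetical. The numeric members use stringstream extraction, as suggested in the following paragraph; since C++11 this sets `failbit` both on non-numeric input and on values that don't fit the target type. `%s` is left out because how much a string should swallow depends on what follows it in the pattern.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical parsing class: each member checks whether the text at 'pos'
// starts with a value of one type and, if so, advances 'pos' past it.
// On failure 'pos' is left unchanged.
class Parser {
public:
    explicit Parser(std::string text) : text_(std::move(text)) {}

    bool matchInt(std::size_t& pos)    { int v = 0;            return extract(pos, v, true); }
    bool matchShort(std::size_t& pos)  { short v = 0;          return extract(pos, v, true); }
    bool matchUShort(std::size_t& pos) { unsigned short v = 0; return extract(pos, v, false); }

    bool matchByte(std::size_t& pos) {
        std::size_t p = pos;
        int v = 0;
        if (!extract(p, v, false) || v > 255) return false;   // byte range is 0..255
        pos = p;
        return true;
    }

    bool matchChar(std::size_t& pos) {                  // %c: any single character
        if (pos >= text_.size()) return false;
        ++pos;
        return true;
    }

    bool matchLiteral(std::size_t& pos, const std::string& lit) {
        if (pos > text_.size() || text_.compare(pos, lit.size(), lit) != 0) return false;
        pos += lit.size();
        return true;
    }

private:
    // Reads one number of type T at 'pos' with a stringstream. A leading '-' is
    // only accepted when allowSign is true: extracting "-1" into an unsigned
    // type may wrap around instead of failing.
    template <typename T>
    bool extract(std::size_t& pos, T& value, bool allowSign) {
        if (pos >= text_.size()) return false;
        unsigned char first = static_cast<unsigned char>(text_[pos]);
        if (!std::isdigit(first) && !(allowSign && first == '-')) return false;

        std::istringstream in(text_.substr(pos));
        in >> std::noskipws >> value;       // failbit on bad input or overflow
        if (in.fail()) return false;

        // tellg() fails once eofbit is set, so handle "consumed everything" first.
        pos += in.eof() ? text_.size() - pos : static_cast<std::size_t>(in.tellg());
        return true;
    }

    std::string text_;
};
```

Because a failed member leaves the position untouched, the caller can try another interpretation at the same spot or move the start on by one character.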

    You could try to extract each type (in each member function) from the string in a type-safe manner with a stringstream. If the stringstream extraction fails, you know the pattern doesn't match, and you need to restart from the next character in the string (one past the previous starting point) and from the beginning of the format string.

    You'd keep doing this until you either reach the end of the possible candidate or the format string.
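    The restart-and-repeat loop described above could look like this. To keep it short and self-contained, it assumes the pattern has already been split into pieces, where the piece `"%i"` stands for a decimal integer and every other piece is literal text, and it parses integers with `strtol` instead of a stringstream, without overflow checks. `matchAt` and `findFirstMatch` are hypothetical names.

```cpp
#include <cctype>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Tries to match every piece of 'pattern' starting exactly at 'start'.
// The piece "%i" means a decimal integer (optionally negative); any other
// piece is literal text. strtol is used for brevity, without overflow checks.
static bool matchAt(const char* start, const std::vector<std::string>& pattern)
{
    const char* p = start;
    for (const std::string& piece : pattern) {
        if (piece == "%i") {
            if (!std::isdigit(static_cast<unsigned char>(*p)) && *p != '-') return false;
            char* end = nullptr;
            std::strtol(p, &end, 10);
            if (end == p) return false;               // e.g. a lone '-'
            p = end;
        } else {
            // strncmp stops at the end of either string, so this is safe near '\0'.
            if (std::strncmp(p, piece.c_str(), piece.size()) != 0) return false;
            p += piece.size();
        }
    }
    return true;
}

// Returns the first position in 'text' where the whole pattern matches, or
// nullptr. After a failed attempt it restarts one character further on and
// from the first piece of the pattern again.
const char* findFirstMatch(const char* text, const std::vector<std::string>& pattern)
{
    for (const char* start = text; ; ++start) {
        if (matchAt(start, pattern)) return start;
        if (*start == '\0') return nullptr;           // every start position tried
    }
}
```

With the pieces `{"%i", ".", "%i", ".", "%i", ".", "%i"}`, searching "1.2.3 and 4.5.6.7" fails at the first few start positions (the third "." is missing after "3") and succeeds at "4".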

  5. #5
    Registered User
    Join Date: Aug 2001, Posts: 41
    That's kind of what I'm doing.

    What do you mean by stringstream extraction? Currently I just check each character in the string to see if it matches the type. But I don't really want to do everything manually, because it's bound to be slower than the conversion functions that already exist.

    One other thing I do now:

    If the type of the current format doesn't match, you have to go back in the format string and try to "expand" the previous type (as long as it still matches the text), and then, from its new end, try to find the next type in the pattern.
    This is because otherwise the following string wouldn't be parsed, although it can be:
    "abcd..123" with the pattern "%s.%i"
    First the string "abcd" is found, terminated by ".". When it then checks for the integer, it fails because "." is the next character. So you have to "expand" the string to include that "." and look for the next terminator "."; the string becomes "abcd.", then "123" is found as the integer, and the match succeeds.
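    This "expand and retry" step is a form of backtracking. A minimal recursive sketch, assuming a pattern made only of `%s`, `%i` (unsigned decimal digits) and literal characters, and that the whole text must be consumed: `%s` first tries a one-character string and, whenever the rest of the pattern fails, grows it by one character and tries again. `matchHere` and `matchPattern` are illustrative names. In this sketch `%i` is greedy and is not backtracked, and patterns with several `%s` fields can make the search exponential in the worst case.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Matches pat[p..] against text[t..]; the whole remaining text must be used.
// Each captured %s / %i value is pushed onto 'caps' and popped again on failure.
bool matchHere(const std::string& text, std::size_t t,
               const std::string& pat, std::size_t p,
               std::vector<std::string>& caps)
{
    if (p == pat.size()) return t == text.size();   // pattern used up: text must be too

    if (pat[p] == '%' && p + 1 < pat.size()) {
        char spec = pat[p + 1];
        if (spec == 's') {
            // Shortest string first; on failure "expand" it by one character.
            for (std::size_t end = t + 1; end <= text.size(); ++end) {
                caps.push_back(text.substr(t, end - t));
                if (matchHere(text, end, pat, p + 2, caps)) return true;
                caps.pop_back();
            }
            return false;
        }
        if (spec == 'i') {
            std::size_t end = t;
            while (end < text.size() && std::isdigit(static_cast<unsigned char>(text[end])))
                ++end;                                   // greedy: take every digit
            if (end == t) return false;                  // no digits here
            caps.push_back(text.substr(t, end - t));
            if (matchHere(text, end, pat, p + 2, caps)) return true;
            caps.pop_back();
            return false;
        }
    }
    // Anything else is a literal character that must appear in the text.
    return t < text.size() && text[t] == pat[p] &&
           matchHere(text, t + 1, pat, p + 1, caps);
}

// Returns true if the whole of 'text' matches 'pat'; 'caps' receives the fields.
bool matchPattern(const std::string& text, const std::string& pat,
                  std::vector<std::string>& caps)
{
    caps.clear();
    return matchHere(text, 0, pat, 0, caps);
}
```

For "abcd..123" with "%s.%i", the `%s` field grows from "a" to "abcd" (where the "%i" fails on the second ".") and then to "abcd.", after which "123" completes the match, as in the walkthrough above.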

    Joren

  6. #6
    Stoned_Coder
    Skunkmeister
    Join Date: Aug 2001, Posts: 2,572
    look into sscanf().
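    For the `%i.%i.%i.%i` example from #3, a minimal sketch of that approach: `sscanf` only matches at the start of its input, so the caller still slides the start position along the text, and the `%n` directive (which does not count towards the return value) reports how many characters were consumed. `findDottedQuad` is a hypothetical name. Keep in mind that `%i` skips leading whitespace, accepts hex (`0x1F`) and octal (`017`) forms, and does no range checking (a value that doesn't fit is undefined behaviour), so checks such as the 0..255 byte range would still have to be done separately.

```cpp
#include <cctype>
#include <cstdio>

// Hypothetical helper: finds the first "%i.%i.%i.%i" in 'text'. Returns a pointer
// to where it starts (or nullptr), stores the four values in 'out' and the
// length of the matched text in '*length'.
const char* findDottedQuad(const char* text, int out[4], int* length)
{
    for (const char* p = text; *p != '\0'; ++p) {
        // %i skips leading whitespace itself; skip those start positions so the
        // returned pointer lands on the number rather than on a space before it.
        if (std::isspace(static_cast<unsigned char>(*p))) continue;
        int consumed = 0;
        // The return value counts successful conversions; %n is not counted.
        if (std::sscanf(p, "%i.%i.%i.%i%n", &out[0], &out[1], &out[2], &out[3],
                        &consumed) == 4) {
            *length = consumed;
            return p;
        }
    }
    return nullptr;
}
```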
