Thread: Searching for a series of possible substrings inside a string

  1. #1
    Syncopated Kestrel andrew.bolster's Avatar
    Join Date
    Nov 2007
    Location
    Belfast
    Posts
    45

    Question Searching for a series of possible substrings inside a string

    Hi Guys
    I'm looking for some advice again, I'm doing some code that looks at long strings ("long" being less than 200 chars) and picking out numbers of, eg, in english, counting the number of "the" || "a" || "and" || "then" etc.
    I dont need to know the number of "the"s and "a"s, just the total.

    Considering im looking at a list of about 20 of these substrings to look for I'd love to know if theres anything more efficient of searching for each one in turn.

    Thoughts?

    Cheers Folks

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by andrew.bolster View Post
    Hi Guys
    I'm looking for some advice again, I'm doing some code that looks at long strings ("long" being less than 200 chars) and picking out numbers of, eg, in english, counting the number of "the" || "a" || "and" || "then" etc.
    I dont need to know the number of "the"s and "a"s, just the total.

    Considering im looking at a list of about 20 of these substrings to look for I'd love to know if theres anything more efficient of searching for each one in turn.

    Thoughts?

    Cheers Folks
    So, medium length strings then.

    It's a sequential process, Andrew. You can't know the sum of`the words you want, without counting through all the strings.

    My idea would be to:

    1) separate each word in the string, into a data file, one word per line.
    2) sort the words in the data file, after you've read all the strings.
    (and you can use the OS to sort the data file by words, very easily)

    3) Now your word count runs at break neck speed, and all the words are counted up in one fell swoop.

    How's that sound?
    Last edited by Adak; 02-10-2008 at 12:19 AM. Reason: grammatical error

  3. #3
    Syncopated Kestrel andrew.bolster's Avatar
    Join Date
    Nov 2007
    Location
    Belfast
    Posts
    45
    Thanks but i was simply using the English as an example. I'm trying to build up a source portfolio and atm are looking at a source code metric, eg, i need to count the operators and operands, but i just want to create a function that goes through a mother-string looking for a series of sub-strings, a la how getopt looks at the argv string array for the characters in the string that you submit to it, just with mor than one char.

    another example of a mother string
    Code:
    else if((flag)&(NULL!=strstr(string,"foo")))
    im looking for a function that would take this as one argument, and a list of operators to look for eg
    Code:
    {"&","!=","||", et al}
    and in this example would print something like 2, if were just looking at logical operators.

    Does that help explanation wise?

  4. #4
    Syncopated Kestrel andrew.bolster's Avatar
    Join Date
    Nov 2007
    Location
    Belfast
    Posts
    45
    Ahh well, just wondering if anyone had any ingenious functions. Will post what i end up with.

  5. #5
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    They say that the Irish have the gift of a "silver" tongue. Are you sure you're in Belfast, Ireland?

    Your second explanation turns a cloudy understanding of what you want, into an entirely opaque one, IMO.

    Good luck, in any case.

  6. #6
    Syncopated Kestrel andrew.bolster's Avatar
    Join Date
    Nov 2007
    Location
    Belfast
    Posts
    45
    you will of course take the time of the day into account

    essentially, i was to count the number of occurrences of any of a list of substrings within a larger string, like the number of vowels in a word, but with more than one character being matched.

    Like, counting punctuation marks in a line; tell the function you are looking for punctuation marks and, for instance, in this sentence it would say 5 . It doest matter that 2 are commas and one is a semicolon, you only care if that substring is in the string or not.

  7. #7
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by andrew.bolster View Post
    you will of course take the time of the day into account

    essentially, i was to count the number of occurrences of any of a list of substrings within a larger string, like the number of vowels in a word, but with more than one character being matched.

    Like, counting punctuation marks in a line; tell the function you are looking for punctuation marks and, for instance, in this sentence it would say 5 . It doest matter that 2 are commas and one is a semicolon, you only care if that substring is in the string or not.
    Code:
    OK, I'd use a char array like target[] = {',', '.', ':', ';', ''',} and a simple variable to denote the number found.
    
    while(char != '\0')  {   /* end of string marker */
       Then 
       for {
          each letter in the string, use a for loop to see if any of the char's match between      
          the target elements, and the char being checked in the string. Every one that matches, 
          increment the variable, found.
       } /* end of for */
    } /* end of while */
    You are up late, Andrew! Get some sleep, guy.

  8. #8
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I'd suggest some kind of Boyer Moore string matching algorithm. I'd perhaps modify it into a full-blown finite state machine that searches for multiple matches simultaneously.
    Of course that would take a while to come up with. Are you sure the simple approach is too slow?
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Message class ** Need help befor 12am tonight**
    By TransformedBG in forum C++ Programming
    Replies: 1
    Last Post: 11-29-2006, 11:03 PM
  2. Calculator + LinkedList
    By maro009 in forum C++ Programming
    Replies: 20
    Last Post: 05-17-2005, 12:56 PM
  3. Another overloading "<<" problem
    By alphaoide in forum C++ Programming
    Replies: 18
    Last Post: 09-30-2003, 10:32 AM
  4. lvp string...
    By Magma in forum C++ Programming
    Replies: 4
    Last Post: 02-27-2003, 12:03 AM
  5. Classes inheretance problem...
    By NANO in forum C++ Programming
    Replies: 12
    Last Post: 12-09-2002, 03:23 PM