Thread: Searching for a series of possible substrings inside a string

1. Searching for a series of possible substrings inside a string

Hi Guys
I'm looking for some advice again, I'm doing some code that looks at long strings ("long" being less than 200 chars) and picking out numbers of, eg, in english, counting the number of "the" || "a" || "and" || "then" etc.
I dont need to know the number of "the"s and "a"s, just the total.

Considering im looking at a list of about 20 of these substrings to look for I'd love to know if theres anything more efficient of searching for each one in turn.

Thoughts?

Cheers Folks

2. Originally Posted by andrew.bolster
Hi Guys
I'm looking for some advice again, I'm doing some code that looks at long strings ("long" being less than 200 chars) and picking out numbers of, eg, in english, counting the number of "the" || "a" || "and" || "then" etc.
I dont need to know the number of "the"s and "a"s, just the total.

Considering im looking at a list of about 20 of these substrings to look for I'd love to know if theres anything more efficient of searching for each one in turn.

Thoughts?

Cheers Folks
So, medium length strings then.

It's a sequential process, Andrew. You can't know the sum of`the words you want, without counting through all the strings.

My idea would be to:

1) separate each word in the string, into a data file, one word per line.
2) sort the words in the data file, after you've read all the strings.
(and you can use the OS to sort the data file by words, very easily)

3) Now your word count runs at break neck speed, and all the words are counted up in one fell swoop.

How's that sound?

3. Thanks but i was simply using the English as an example. I'm trying to build up a source portfolio and atm are looking at a source code metric, eg, i need to count the operators and operands, but i just want to create a function that goes through a mother-string looking for a series of sub-strings, a la how getopt looks at the argv string array for the characters in the string that you submit to it, just with mor than one char.

another example of a mother string
Code:
`else if((flag)&(NULL!=strstr(string,"foo")))`
im looking for a function that would take this as one argument, and a list of operators to look for eg
Code:
`{"&","!=","||", et al}`
and in this example would print something like 2, if were just looking at logical operators.

Does that help explanation wise?

4. Ahh well, just wondering if anyone had any ingenious functions. Will post what i end up with.

5. They say that the Irish have the gift of a "silver" tongue. Are you sure you're in Belfast, Ireland?

Your second explanation turns a cloudy understanding of what you want, into an entirely opaque one, IMO.

Good luck, in any case.

6. you will of course take the time of the day into account

essentially, i was to count the number of occurrences of any of a list of substrings within a larger string, like the number of vowels in a word, but with more than one character being matched.

Like, counting punctuation marks in a line; tell the function you are looking for punctuation marks and, for instance, in this sentence it would say 5 . It doest matter that 2 are commas and one is a semicolon, you only care if that substring is in the string or not.

7. Originally Posted by andrew.bolster
you will of course take the time of the day into account

essentially, i was to count the number of occurrences of any of a list of substrings within a larger string, like the number of vowels in a word, but with more than one character being matched.

Like, counting punctuation marks in a line; tell the function you are looking for punctuation marks and, for instance, in this sentence it would say 5 . It doest matter that 2 are commas and one is a semicolon, you only care if that substring is in the string or not.
Code:
```OK, I'd use a char array like target[] = {',', '.', ':', ';', ''',} and a simple variable to denote the number found.

while(char != '\0')  {   /* end of string marker */
Then
for {
each letter in the string, use a for loop to see if any of the char's match between
the target elements, and the char being checked in the string. Every one that matches,
increment the variable, found.
} /* end of for */
} /* end of while */```
You are up late, Andrew! Get some sleep, guy.

8. I'd suggest some kind of Boyer Moore string matching algorithm. I'd perhaps modify it into a full-blown finite state machine that searches for multiple matches simultaneously.
Of course that would take a while to come up with. Are you sure the simple approach is too slow?