Searching

**gsoft** · 12-23-2004

In my book there is a exercise for me to try and do a search which is both case sensitive and insensitive

I did it however Im interested in knowing if there is an easier way or more efficient way

Code:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

char *caseSensitive(char *string, char *match);
char *caseInSensitive(char *string, char *match);

int main(void)
{
	
	char string[100], match[100], *ptr, *pos;
	int searchtype;
	
	printf("Enter a string to be searched: ");
	fgets(string, sizeof string, stdin);
	
	if ((ptr=strchr(string,'\n'))!=NULL)
		*ptr = '\0';
	
	printf("String to match: ");
	fgets(match, sizeof match, stdin);
	
	if ((ptr=strchr(match,'\n'))!=NULL)
		*ptr = '\0';
	
	printf("\n1 for case sensitive search\n2 for case insensitive search\n");
	scanf("%d",&searchtype);
	
	if (searchtype == 1)
	{
		pos = caseSensitive(string, match);
		if (pos==NULL)
			printf("%s wasnt found in %s\n",match,string);
		else
			printf("%s was found in %s at position %d\n",match, string,(pos-string));
	} 
	else
	{
		pos = caseInSensitive(string, match);
		if (pos==NULL)
			printf("%s wasnt found in %s\n",match,string);
		else
			printf("%s was found in %s at position %d\n",match, string,(pos-string));
	}
	
	return 0;

}

char *caseSensitive(char *string, char *match)
{

	char *pos;
	pos = strstr(string,match);
	return pos;

}

char *caseInSensitive(char *string, char *match)
{
	
	char *pos;
	int i;
	
	for (i=0;i<strlen(string);i++)
		(string)[i]=tolower((string)[i]);
	
	for (i=0;i<strlen(match);i++)
		(match)[i]=tolower((match)[i]);
	
	pos = strstr(string,match);
	return pos;

}

**misplaced** · 12-23-2004

here's is (somethign like) what i'd do..forgive me for syntax and/or logic errors...i just did this off the top of my head and did not put a whole lot of thought into it...my main objective was to show you how to use 'char s' and 'char m'......you can play around with the code to make it best fit your needs......(remember not to increment 'str' and 'mat' before trying to dereference them (ex:*str) within the loop. if you don't know what a seg fault is, then try it once. and try to guess why it happens)

Code:

int caseInsensitive(char *str, char *mat)
{
	//this is just an idea - you must take into consideration that 1 string may be longer
	//than the other but matches up to the end of the shortest string
        int i = 0;				
        char s, m;
	while(*str && *mat)
	{
		s = tolower(*str);
		m = tolower(*mat);
		if(!(s == m))
			return 0;		//return zero - match failed
		str++;
		mat++;
		i++;	
	}
	return i;				//return the number of characters matched 
						// (only if match was successful to end of shortest string)
						// (should be the length of the shortest string)
}

if you wanted return the string that match you could simply append a character onto a string within the 'if(s == m)' block...in your case, that string would be 'pos'

here is a bit of inconsistency here...you'll probably actually want to return the (index of the character+1) in which the match failed. if the match did not fail, return 0.

**Sebastiani** · 12-23-2004

>> pos = strstr(string,match);

isn't the point of this excercise to write your *own* version of strstr?

>> (string)[i]=tolower((string)[i]);

big no no. the function should not modify the contents of the arguments in this case.

**[Ren]** · 12-23-2004

Code:

	for (i=0;i<strlen(string);i++)
		(string)[i]=tolower((string)[i]);
	
	for (i=0;i<strlen(match);i++)
		(match)[i]=tolower((match)[i]);

What if either string or match is a string literal? C does not allow you to change the characters of a string literal, so your function would fail. You might make a copy of each string, but that is not efficient. It is better in this case not to use the standard string library.

**gsoft** · 12-23-2004

Originally Posted by Sebastiani

>> pos = strstr(string,match);

isn't the point of this excercise to write your *own* version of strstr?

>> (string)[i]=tolower((string)[i]);

big no no. the function should not modify the contents of the arguments in this case.

No its wasnt the point it was to use it in away that would allow case sensitve and case insensitive searches which where required to be one function each.

(string)[i]=tolower((string)[i]);

So if i used a pointer within that function to point to string and use that new pointer e.g.

Code:

char *caseInSensitive(char *string, char *match)
{
	
	char *pos, *string2, *match2;
	int i;
	
	for (i=0;i<strlen(string);i++)
		(string2)[i]=tolower((string)[i]);
	
	for (i=0;i<strlen(match);i++)
		(match2)[i]=tolower((match)[i]);
	
	pos = strstr(string2,match2);
	return pos;

}

Would that be better as its not modifying to pointers being sent to the function.

Ren, so what would I do then in the above code I guess you say I copy it what is a more efficient way.

**Sebastiani** · 12-23-2004

>> Would that be better as its not modifying to pointers being sent to the function.

the problem is, whenever the user passes the two strings to the function, the strings get permanantly lowercased. that's not what a 'test' function is supposed to do.

>> Ren, so what would I do then in the above code I guess you say I copy it what is a more efficient way.

it's not a question of efficiency. try calling the function like this:

Code:

char * found = caseSensitive("Foobar", "foo");

it crashes because you attempt to modify the contents of a read-only string. make a copy of the data in the function, lowercase the copy, then do your comparison. better yet, go with a solution similar to itsme86's that doesn't require a copy at all.

**[Ren]** · 12-23-2004

Code:

char *caseInSensitive(char *string, char *match)
{
	
	char *pos, *string2, *match2;
	int i;
	
	for (i=0;i<strlen(string);i++)
		(string2)[i]=tolower((string)[i]);
	
	for (i=0;i<strlen(match);i++)
		(match2)[i]=tolower((match)[i]);
	
	pos = strstr(string2,match2);
	return pos;

}

This will fail too. string2 and match2 do not point to memory you can write to. The right way to copy is this.

Code:

char *caseInSensitive(char *string, char *match)
{
	
	char *pos;
        char *string2 = malloc(strlen(string) + 1);
        char *match2 = malloc(strlen(match) + 1);
	int i;
	
	for (i=0;i<strlen(string);i++)
		(string2)[i]=tolower((string)[i]);
	string2[i] = '\0';
	for (i=0;i<strlen(match);i++)
		(match2)[i]=tolower((match)[i]);
	match2[i] = '\0';
	pos = strstr(string2,match2);
        free(match2);
	return pos;

}

Problem now is pos does not point within string, so you can not free string2 and changes to pos will not be reflected in string. A better and more efficient way is to rewrite strstr using toupper or tolower when comparing characters.

**itsme86** · 12-23-2004

A better and more efficient way is to rewrite strstr using toupper or tolower when comparing characters.

Except that then you're doing toupper/tolower on each character N times where N is the number of characters in the match substring. I'm not sure that's more efficient especially for large substrings.

**[Ren]** · 12-23-2004

Originally Posted by itsme86

Except that then you're doing toupper/tolower on each character N times where N is the number of characters in the match substring. I'm not sure that's more efficient especially for large substrings.

Not more efficient than what? Calling malloc twice to make copies? Or modifying the original string to be upper or lower case?

Performance is at most 2N * M times where M is the number of unsuccessful matches and N is number of characters in the match substring because toupper or tolower is used on both source and search string. Performance is N + M where N is the length of source string and M is the length of search string for modifying both source and search string, and two malloc and free calls are almost sure to be much slower. Not to mention memory use for a copy of source and search string and harder code to fix the problem of returning a pointer within the original search string.

Copying is not a good solution because it is slow and is not as simple to do right. Modifying the original string is really not a good solution because it fails for a string literal or permanently changes both strings. The fastest, simplest and shortest correct code using the least memory is strstr rewritten with toupper or tolower for comparisons.

**jim mcnamara** · 12-23-2004

strdup() (SVID3 compliance) is a reasonable way to clone strings for testing.
Assuming the OP is on unix. I don't know if VC++ or .net has strdup().

**gsoft** · 12-23-2004

Originally Posted by [Ren]

Not more efficient than what? Calling malloc twice to make copies? Or modifying the original string to be upper or lower case?

Performance is at most 2N * M times where M is the number of unsuccessful matches and N is number of characters in the match substring because toupper or tolower is used on both source and search string. Performance is N + M where N is the length of source string and M is the length of search string for modifying both source and search string, and two malloc and free calls are almost sure to be much slower. Not to mention memory use for a copy of source and search string and harder code to fix the problem of returning a pointer within the original search string.

Copying is not a good solution because it is slow and is not as simple to do right. Modifying the original string is really not a good solution because it fails for a string literal or permanently changes both strings. The fastest, simplest and shortest correct code using the least memory is strstr rewritten with toupper or tolower for comparisons.

Not sure Im ready to modify built in functions just yet still relatively new to C and not 100% sure on how strstr checks the strings

**quzah** · 12-23-2004

Originally Posted by gsoft

Not sure Im ready to modify built in functions just yet still relatively new to C and not 100% sure on how strstr checks the strings

Pretty much with a few loops. It's a fairly simple concept:

Code:

search for the first letter of the find-me string in the search-me string
when found:
    move down both the search-me and find-me string as long as they both are the same
    if you run out of find-me, you've found your match

Repeat that concept until you run out of search-me string.

Quzah.

**gsoft** · 12-23-2004

I thought it would be simple however didnt think it to be that simple. So follow how strstr does it to point to the first occurance if find-me is found or NULL or 0 if not. However to make it case-insensitive wouldnt I still need to copy the match and string variables.

**Thantos** · 12-23-2004

Here is a case senstive search I did up. I put some optimizations in there (like it would stop searching if there wasn't enough room left in the haystack for the needle). Also I did some pointer magic because well I like pointer magic

From this you should be able to get an insenstive search by just using tolower or toupper. These are relativly fast functions so there is no real need to copy and change the strings.

Code:

char *caseSen(const char *haystack, const char *needle)
{
  int haylen = strlen(haystack);
  int needlen = strlen(needle);
  const char *hayiterator = NULL;
  const char *neediterator = NULL;

  if ( haylen == 0 )
    return NULL;
  if ( needlen == 0 )
    return (char *)haystack;

  for (hayiterator = haystack;
       hayiterator <= haystack + (haylen-needlen);
       hayiterator++)
  {
    neediterator = needle;
    if ( *hayiterator == *neediterator )
    {
      int count=0;
      for (;neediterator<needle+needlen; neediterator++, count++)
        if ( hayiterator[count] != *neediterator)
          break;
      if ( neediterator == needle+needlen )
        return (char*)hayiterator;
    }
  }
  return NULL;
}

A note about tolower and toupper. They don't take a character they take an integer. So you get in the practice of casting the character to an unsigned char to avoid the potential of bad things. IE:

Code:

char fe = 'h';
fe = toupper( (unsigned char)fe);

**itsme86** · 12-23-2004

I had an idea that might make a strstr() function really efficient, but now I don't have time to write a sample program before I leave work. Argh!

Anyway, you could do a kind of "hash" on the substring. Say you just sum up the value of all of the characters in the substring. Then you do the same for the first substrlen characters of the containing-string. Then you walk through the containing-string comparing that "hash" value to the substring one. As you walk you just subtract the value of the first character and add the value of the next character to the sum. If the 2 hash values match then you can run a strncmp() to make sure that they really are the same characters instead of different characters that add up to the same value.

If I have this whole O-notation thing right it should change the efficiency from n^2 to 2n. I'm sure it's probably been done before, but if it hasn't then why not?!

I'll write up an example of my algorithm when I get home since I'm sure that my description of what I'm talking about didn't make any sense to anyone

If anyone has any notion of why this algorithm would suck or wouldn't work then let me know what I've missed

Thread: Searching

Thread Tools

Search Thread

Display

Searching

Similar Threads

Logical errors with seach function

C++ std routines

Visual C++ 2005 linking and file sizes

Searching and matching strings: segmentation fault