Thread: Searching

  1. #1
    Registered User
    Join Date
    Jul 2004
    Posts
    68

    Searching

    In my book there is a exercise for me to try and do a search which is both case sensitive and insensitive

    I did it however Im interested in knowing if there is an easier way or more efficient way

    Code:
    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>
    
    char *caseSensitive(char *string, char *match);
    char *caseInSensitive(char *string, char *match);
    
    int main(void)
    {
    	
    	char string[100], match[100], *ptr, *pos;
    	int searchtype;
    	
    	printf("Enter a string to be searched: ");
    	fgets(string, sizeof string, stdin);
    	
    	if ((ptr=strchr(string,'\n'))!=NULL)
    		*ptr = '\0';
    	
    	printf("String to match: ");
    	fgets(match, sizeof match, stdin);
    	
    	if ((ptr=strchr(match,'\n'))!=NULL)
    		*ptr = '\0';
    	
    	printf("\n1 for case sensitive search\n2 for case insensitive search\n");
    	scanf("%d",&searchtype);
    	
    	if (searchtype == 1)
    	{
    		pos = caseSensitive(string, match);
    		if (pos==NULL)
    			printf("%s wasnt found in %s\n",match,string);
    		else
    			printf("%s was found in %s at position %d\n",match, string,(pos-string));
    	} 
    	else
    	{
    		pos = caseInSensitive(string, match);
    		if (pos==NULL)
    			printf("%s wasnt found in %s\n",match,string);
    		else
    			printf("%s was found in %s at position %d\n",match, string,(pos-string));
    	}
    	
    	return 0;
    
    }
    
    char *caseSensitive(char *string, char *match)
    {
    
    	char *pos;
    	pos = strstr(string,match);
    	return pos;
    
    }
    
    char *caseInSensitive(char *string, char *match)
    {
    	
    	char *pos;
    	int i;
    	
    	for (i=0;i<strlen(string);i++)
    		(string)[i]=tolower((string)[i]);
    	
    	for (i=0;i<strlen(match);i++)
    		(match)[i]=tolower((match)[i]);
    	
    	pos = strstr(string,match);
    	return pos;
    
    }

  2. #2
    Registered User
    Join Date
    Sep 2004
    Posts
    719
    here's is (somethign like) what i'd do..forgive me for syntax and/or logic errors...i just did this off the top of my head and did not put a whole lot of thought into it...my main objective was to show you how to use 'char s' and 'char m'......you can play around with the code to make it best fit your needs......(remember not to increment 'str' and 'mat' before trying to dereference them (ex:*str) within the loop. if you don't know what a seg fault is, then try it once. and try to guess why it happens)
    Code:
    int caseInsensitive(char *str, char *mat)
    {
    	//this is just an idea - you must take into consideration that 1 string may be longer
    	//than the other but matches up to the end of the shortest string
            int i = 0;				
            char s, m;
    	while(*str && *mat)
    	{
    		s = tolower(*str);
    		m = tolower(*mat);
    		if(!(s == m))
    			return 0;		//return zero - match failed
    		str++;
    		mat++;
    		i++;	
    	}
    	return i;				//return the number of characters matched 
    						// (only if match was successful to end of shortest string)
    						// (should be the length of the shortest string)
    }
    if you wanted return the string that match you could simply append a character onto a string within the 'if(s == m)' block...in your case, that string would be 'pos'

    here is a bit of inconsistency here...you'll probably actually want to return the (index of the character+1) in which the match failed. if the match did not fail, return 0.
    Last edited by misplaced; 12-23-2004 at 03:43 AM.
    i seem to have GCC 3.3.4
    But how do i start it?
    I dont have a menu for it or anything.

  3. #3
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> pos = strstr(string,match);

    isn't the point of this excercise to write your *own* version of strstr?

    >> (string)[i]=tolower((string)[i]);

    big no no. the function should not modify the contents of the arguments in this case.
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  4. #4
    Registered User
    Join Date
    Dec 2004
    Posts
    20
    Code:
    	for (i=0;i<strlen(string);i++)
    		(string)[i]=tolower((string)[i]);
    	
    	for (i=0;i<strlen(match);i++)
    		(match)[i]=tolower((match)[i]);
    What if either string or match is a string literal? C does not allow you to change the characters of a string literal, so your function would fail. You might make a copy of each string, but that is not efficient. It is better in this case not to use the standard string library.

  5. #5
    Registered User
    Join Date
    Jul 2004
    Posts
    68
    Quote Originally Posted by Sebastiani
    >> pos = strstr(string,match);

    isn't the point of this excercise to write your *own* version of strstr?

    >> (string)[i]=tolower((string)[i]);

    big no no. the function should not modify the contents of the arguments in this case.
    No its wasnt the point it was to use it in away that would allow case sensitve and case insensitive searches which where required to be one function each.

    (string)[i]=tolower((string)[i]);

    So if i used a pointer within that function to point to string and use that new pointer e.g.

    Code:
    char *caseInSensitive(char *string, char *match)
    {
    	
    	char *pos, *string2, *match2;
    	int i;
    	
    	for (i=0;i<strlen(string);i++)
    		(string2)[i]=tolower((string)[i]);
    	
    	for (i=0;i<strlen(match);i++)
    		(match2)[i]=tolower((match)[i]);
    	
    	pos = strstr(string2,match2);
    	return pos;
    
    }
    Would that be better as its not modifying to pointers being sent to the function.

    Ren, so what would I do then in the above code I guess you say I copy it what is a more efficient way.

  6. #6
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> Would that be better as its not modifying to pointers being sent to the function.

    the problem is, whenever the user passes the two strings to the function, the strings get permanantly lowercased. that's not what a 'test' function is supposed to do.

    >> Ren, so what would I do then in the above code I guess you say I copy it what is a more efficient way.

    it's not a question of efficiency. try calling the function like this:

    Code:
    char * found = caseSensitive("Foobar", "foo");
    it crashes because you attempt to modify the contents of a read-only string. make a copy of the data in the function, lowercase the copy, then do your comparison. better yet, go with a solution similar to itsme86's that doesn't require a copy at all.
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  7. #7
    Registered User
    Join Date
    Dec 2004
    Posts
    20
    Code:
    char *caseInSensitive(char *string, char *match)
    {
    	
    	char *pos, *string2, *match2;
    	int i;
    	
    	for (i=0;i<strlen(string);i++)
    		(string2)[i]=tolower((string)[i]);
    	
    	for (i=0;i<strlen(match);i++)
    		(match2)[i]=tolower((match)[i]);
    	
    	pos = strstr(string2,match2);
    	return pos;
    
    }
    This will fail too. string2 and match2 do not point to memory you can write to. The right way to copy is this.
    Code:
    char *caseInSensitive(char *string, char *match)
    {
    	
    	char *pos;
            char *string2 = malloc(strlen(string) + 1);
            char *match2 = malloc(strlen(match) + 1);
    	int i;
    	
    	for (i=0;i<strlen(string);i++)
    		(string2)[i]=tolower((string)[i]);
    	string2[i] = '\0';
    	for (i=0;i<strlen(match);i++)
    		(match2)[i]=tolower((match)[i]);
    	match2[i] = '\0';
    	pos = strstr(string2,match2);
            free(match2);
    	return pos;
    
    }
    Problem now is pos does not point within string, so you can not free string2 and changes to pos will not be reflected in string. A better and more efficient way is to rewrite strstr using toupper or tolower when comparing characters.

  8. #8
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    A better and more efficient way is to rewrite strstr using toupper or tolower when comparing characters.
    Except that then you're doing toupper/tolower on each character N times where N is the number of characters in the match substring. I'm not sure that's more efficient especially for large substrings.
    If you understand what you're doing, you're not learning anything.

  9. #9
    Registered User
    Join Date
    Dec 2004
    Posts
    20
    Quote Originally Posted by itsme86
    Except that then you're doing toupper/tolower on each character N times where N is the number of characters in the match substring. I'm not sure that's more efficient especially for large substrings.
    Not more efficient than what? Calling malloc twice to make copies? Or modifying the original string to be upper or lower case?

    Performance is at most 2N * M times where M is the number of unsuccessful matches and N is number of characters in the match substring because toupper or tolower is used on both source and search string. Performance is N + M where N is the length of source string and M is the length of search string for modifying both source and search string, and two malloc and free calls are almost sure to be much slower. Not to mention memory use for a copy of source and search string and harder code to fix the problem of returning a pointer within the original search string.

    Copying is not a good solution because it is slow and is not as simple to do right. Modifying the original string is really not a good solution because it fails for a string literal or permanently changes both strings. The fastest, simplest and shortest correct code using the least memory is strstr rewritten with toupper or tolower for comparisons.

  10. #10
    .
    Join Date
    Nov 2003
    Posts
    307
    strdup() (SVID3 compliance) is a reasonable way to clone strings for testing.
    Assuming the OP is on unix. I don't know if VC++ or .net has strdup().

  11. #11
    Registered User
    Join Date
    Jul 2004
    Posts
    68
    Quote Originally Posted by [Ren]
    Not more efficient than what? Calling malloc twice to make copies? Or modifying the original string to be upper or lower case?

    Performance is at most 2N * M times where M is the number of unsuccessful matches and N is number of characters in the match substring because toupper or tolower is used on both source and search string. Performance is N + M where N is the length of source string and M is the length of search string for modifying both source and search string, and two malloc and free calls are almost sure to be much slower. Not to mention memory use for a copy of source and search string and harder code to fix the problem of returning a pointer within the original search string.

    Copying is not a good solution because it is slow and is not as simple to do right. Modifying the original string is really not a good solution because it fails for a string literal or permanently changes both strings. The fastest, simplest and shortest correct code using the least memory is strstr rewritten with toupper or tolower for comparisons.
    Not sure Im ready to modify built in functions just yet still relatively new to C and not 100% sure on how strstr checks the strings

  12. #12
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by gsoft
    Not sure Im ready to modify built in functions just yet still relatively new to C and not 100% sure on how strstr checks the strings
    Pretty much with a few loops. It's a fairly simple concept:
    Code:
    search for the first letter of the find-me string in the search-me string
    when found:
        move down both the search-me and find-me string as long as they both are the same
        if you run out of find-me, you've found your match
    Repeat that concept until you run out of search-me string.

    Quzah.
    Hope is the first step on the road to disappointment.

  13. #13
    Registered User
    Join Date
    Jul 2004
    Posts
    68
    I thought it would be simple however didnt think it to be that simple. So follow how strstr does it to point to the first occurance if find-me is found or NULL or 0 if not. However to make it case-insensitive wouldnt I still need to copy the match and string variables.

  14. #14
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Here is a case senstive search I did up. I put some optimizations in there (like it would stop searching if there wasn't enough room left in the haystack for the needle). Also I did some pointer magic because well I like pointer magic

    From this you should be able to get an insenstive search by just using tolower or toupper. These are relativly fast functions so there is no real need to copy and change the strings.

    Code:
    char *caseSen(const char *haystack, const char *needle)
    {
      int haylen = strlen(haystack);
      int needlen = strlen(needle);
      const char *hayiterator = NULL;
      const char *neediterator = NULL;
    
      if ( haylen == 0 )
        return NULL;
      if ( needlen == 0 )
        return (char *)haystack;
    
      for (hayiterator = haystack;
           hayiterator <= haystack + (haylen-needlen);
           hayiterator++)
      {
        neediterator = needle;
        if ( *hayiterator == *neediterator )
        {
          int count=0;
          for (;neediterator<needle+needlen; neediterator++, count++)
            if ( hayiterator[count] != *neediterator)
              break;
          if ( neediterator == needle+needlen )
            return (char*)hayiterator;
        }
      }
      return NULL;
    }
    A note about tolower and toupper. They don't take a character they take an integer. So you get in the practice of casting the character to an unsigned char to avoid the potential of bad things. IE:
    Code:
    char fe = 'h';
    fe = toupper( (unsigned char)fe);
    Last edited by Thantos; 12-23-2004 at 06:03 PM.

  15. #15
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    I had an idea that might make a strstr() function really efficient, but now I don't have time to write a sample program before I leave work. Argh!

    Anyway, you could do a kind of "hash" on the substring. Say you just sum up the value of all of the characters in the substring. Then you do the same for the first substrlen characters of the containing-string. Then you walk through the containing-string comparing that "hash" value to the substring one. As you walk you just subtract the value of the first character and add the value of the next character to the sum. If the 2 hash values match then you can run a strncmp() to make sure that they really are the same characters instead of different characters that add up to the same value.

    If I have this whole O-notation thing right it should change the efficiency from n^2 to 2n. I'm sure it's probably been done before, but if it hasn't then why not?!

    I'll write up an example of my algorithm when I get home since I'm sure that my description of what I'm talking about didn't make any sense to anyone If anyone has any notion of why this algorithm would suck or wouldn't work then let me know what I've missed
    Last edited by itsme86; 12-23-2004 at 05:50 PM.
    If you understand what you're doing, you're not learning anything.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Logical errors with seach function
    By Taka in forum C Programming
    Replies: 4
    Last Post: 09-18-2006, 05:20 AM
  2. C++ std routines
    By siavoshkc in forum C++ Programming
    Replies: 33
    Last Post: 07-28-2006, 12:13 AM
  3. Visual C++ 2005 linking and file sizes
    By Rune Hunter in forum C++ Programming
    Replies: 2
    Last Post: 11-12-2005, 10:41 PM
  4. Searching and matching strings: segmentation fault
    By Smola in forum C Programming
    Replies: 18
    Last Post: 07-11-2005, 12:25 AM