Thread: Finding incidence of one string within another

  1. #1
    Registered User Sharke's Avatar
    Join Date
    Jun 2008
    Location
    NYC
    Posts
    303

    Finding incidence of one string within another

    I've just done a C exercise from a book, the object of which was to write my own version of a function which returns a pointer to the first incidence of one string within another, or NULL if there is no match. It works, but I just get the feeling it's bloated and could be a lot neater. In particular I'm unhappy about the levels of nesting in the function. Should I be? Any advice? My apologies for the indentation, I used tabs and I couldn't work out how to quickly change tab indents to spaces in Visual Studio C++ for posting to a forum.

    Code:
    #include <stdio.h>
    #include <string.h>
    
    char * string_in(char * str1, char * str2);
    
    
    int main(void)
    {
    	char string[200];
    	char match_string[200];
    	char * result;
    	
    
    	printf("Enter a string to search: ");
    	while (gets(string) && string[0] != '\0')
    	{
    		printf("Enter a string to search for: ");
    		do
    		{
    			if (strlen(gets(match_string)) > strlen(string))
    				printf("\nMust not exceed length of first string. Try again: ");
    		}
    		while (strlen(match_string) > strlen(string));
    
    		result = string_in(string, match_string);
    
    		if (result)
    			printf("String was found at position %d", (result - string + 1));
    		else
    			printf("String was not found.\n\n");
    		printf("\nEnter another string to search: ");
    	
    
    	}
    
    	return 0;
    }
    
    char * string_in(char * s1, char * s2)
    {
    	char first_ch = s2[0];
    	int i, j, match = 0;
    	int len_s1 = strlen(s1);
    	int len_s2 = strlen(s2);
    
    
    	for (i = 0; i <= (len_s1 - len_s2); i++)
    	{
    		if (s1[i] == first_ch)
    		{
    			match = 1;
    			for (j = 1; j < len_s2; j++)
    			{
    				if (s1[i + j] != s2[j])
    				{
    					match = 0;
    					break;
    				}
    			}
    		}
    		if (match == 1)
    			break;
    	}
    
    	if (match == 1)
    		return (s1 + i);
    	else
    		return NULL;
    }
    Last edited by Sharke; 03-08-2009 at 02:47 PM.

  2. #2
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    You have a working program, now try to reduce the code size by removing parts that are not needed.

  3. #3
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Sharke View Post
    It works, but I just get the feeling it's bloated and could be a lot neater.
    I get the same feeling. The nesting is fine, but think: you can return from the middle of the function on the first match, or just return NULL by default at the end if there was no match. So you don't need the boolean flag (int=0 or 1)...you just need your "breaks" and "returns" in the right places.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  4. #4
    Registered User Sharke's Avatar
    Join Date
    Jun 2008
    Location
    NYC
    Posts
    303
    Ah, so return from the j loop when the null character is reached, or break on the first mismatch of characters. Of course! I had a feeling the flag wasn't needed.

  5. #5
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    AFAICT, there's a lot you can prune from string_in(). Note reaching the end of s2 means it matched so you can use that instead of the inner loop to return from the function.

  6. #6
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by itCbitC View Post
    AFAICT, there's a lot you can prune from string_in(). Note reaching the end of s2 means it matched so you can use that instead of the inner loop to return from the function.
    The OP might, in fact, want to be sure some boundaries are not exceeded in this, although I guess just peeking over the edge won't hurt.

    [edit] Nevermind that won't happen...what have I been thinking
    Last edited by MK27; 03-08-2009 at 05:23 PM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  7. #7
    Registered User Sharke's Avatar
    Join Date
    Jun 2008
    Location
    NYC
    Posts
    303
    Quote Originally Posted by itCbitC View Post
    AFAICT, there's a lot you can prune from string_in(). Note reaching the end of s2 means it matched so you can use that instead of the inner loop to return from the function.

    If what you mean is that I can return the successful value if the inner loop completes in full, then that means I have to break out of the inner loop if a mismatch occurs. The trouble is, using a break would take me to the next statement after the inner loop, which in this case would be the successful 'return' statement. I can't think of a way to break out of the inner loop without encountering the successful return, unless I use a goto (wince).

    The only thing I've been able to think of so far is to test for end-of-string in the inner loop and return if so....and to break out of the inner loop on a mismatch. It's pruned a little, but not much!

    Code:
    char * string_in(char * s1, char * s2)
    {
      char first_ch = s2[0];
      int i, j;
      int len_s1 = strlen(s1);
      int len_s2 = strlen(s2);
    
      for (i = 0; i <= (len_s1 - len_s2); i++)
        {
          if (s1[i] == first_ch)
    	{
    	  for (j = 1; j <= len_s2; j++)
    	    {
    	       if (s2[j] == '\0')
    	         return (s1 + i);
    	       else if (s1[i + j] != s2[j])
    		  break;
    	    }
            }
        }
      return NULL;
    }

  8. #8
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Don't need the inner for loop and no need to get the length of strings s1 and s2. Note end of s2 means match found while end of s1 means no match found. Use this for the terminal condition instead of the strlen()'s as in.
    Code:
      for (i = 0; s[i]; i++)

  9. #9
    Registered User Sharke's Avatar
    Join Date
    Jun 2008
    Location
    NYC
    Posts
    303
    I don't mean to sound like an idiot here, but how can I get by without an inner for loop if the full string has to be matched? The outer loop iterates through each letter of s1...now when a letter of s1 matches the first letter of s2, then surely we have to iterate through the letters of s2 in order to see if they all match in sequence within s1?

    Just to clarify here - I'm doing this without any external string functions (except strlen). However it didn't specify this in the question and I've just come across the author's solution online:

    Code:
    #include <string.h>
    char * string_in(const char * s1, const char * s2)
    {
        int l2 = strlen(s2);
        int tries;            /* maximum number of comparisons    */
        int nomatch = 1;    /* set to 0 if match is found        */
        
        tries = strlen(s1) + 1 - l2;
        if (tries > 0)
            while (( nomatch = strncmp(s1, s2, l2)) && tries--)
                s1++;
        if (nomatch)
            return NULL;
        else
            return (char *) s1;  /* cast const away */
    }
    But I set out to do it without strncmp of course - I had thought it was an exercise in writing a function from scratch!
    Last edited by Sharke; 03-08-2009 at 08:13 PM.

  10. #10
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    here's my 2c:
    Code:
    char *string_in(char *s1, char *s2)
    {
        int i,j;
    
        for (i=j=0; s1[j]; j++) {
            if (s2[i] == s1[j]) {
                if (!s2[++i])
                    return (s1+j+1-i);
            }
            else if (i)
                i = 0;
        }
        return NULL;
    }

  11. #11
    Registered User Sharke's Avatar
    Join Date
    Jun 2008
    Location
    NYC
    Posts
    303
    That's a neat idea (once I'd wrapped my noob head around it!) but unfortunately it has a flaw which becomes evident if there are repetitions in string s1, e.g:

    Code:
    Enter a string to search: BooBoos
    Enter a string to search for: Boos
    String not found.

  12. #12
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Good catch! I did make a booboo and here's the updated one.
    Code:
    char *string_in(char *s1, char *s2)
    {
        int i,j;
    
        for (i=j=0; s1[j]; j++) {
            if (s2[i] != s1[j])
                i = 0;
            else
                if (!s2[++i])
                    return (s1+j+1-i);
        }
        return NULL;
    }
    Last edited by itCbitC; 03-08-2009 at 10:23 PM.

  13. #13
    Registered User Sharke's Avatar
    Join Date
    Jun 2008
    Location
    NYC
    Posts
    303
    Still not working, although I'm not clever enough to work out why at this point in time:

    Code:
    Enter a string to search:Boo Boos
    Enter a string to search for: Boos
    String was found at position 5
    Enter a string to search: BooBoos
    Enter a string to search for: Boos
    String not found.

  14. #14
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Argh! copied and pasted from the wrong window.
    Code:
    char *string_in(char *s1, char *s2)
    {
        int i,j;
    
        for (i=j=0; s1[j]; j++) {
            if (s2[i] != s1[j])
                i = 0;
            if (s2[i] == s1[j])
                if (!s2[++i])
                    return (s1+j+1-i);
        }
        return NULL;
    }

  15. #15
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    I note that this:
    Code:
    if (s2[i] == s1[j])
        if (!s2[++i])
            return (s1+j+1-i);
    can be simplified to:
    Code:
    if (s2[i] == s1[j] && !s2[++i])
        return (s1+j+1-i);
    Last edited by laserlight; 03-09-2009 at 01:29 AM.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. RicBot
    By John_ in forum C++ Programming
    Replies: 8
    Last Post: 06-13-2006, 06:52 PM
  2. problems with overloaded '+' again
    By Brain Cell in forum C++ Programming
    Replies: 9
    Last Post: 04-14-2005, 05:13 PM
  3. Linked List Help
    By CJ7Mudrover in forum C Programming
    Replies: 9
    Last Post: 03-10-2004, 10:33 PM
  4. creating class, and linking files
    By JCK in forum C++ Programming
    Replies: 12
    Last Post: 12-08-2002, 02:45 PM
  5. Warnings, warnings, warnings?
    By spentdome in forum C Programming
    Replies: 25
    Last Post: 05-27-2002, 06:49 PM