Thread: Looking for optimal solution to: bool HasSameChars()

  1. #1
    Registered User
    Join Date
    Jan 2007
    Posts
    40

    Looking for optimal solution to: bool HasSameChars()

    Create an algorithm that checks whether two strings have the same characters (duplicates don't matter). Assuming a char == 1 byte so there are 2^8 = 256 possible characters.

    Function prototype given: int HasSameChars(char *c1, char *c2);

    ------------------------------

    The algorithm that I've come up with is:

    Code:
    - array a[256], b[256]
    - for char in c1:
          a[char] = 'y';  // meaning we flag with 'y' that it exists
    - for char in c2:
          b[char] = 'y';
    - for i = 0 to 256:
          if a[i] != b[i]
               return false;
    - return true;
    The following in C code is this...

    PHP Code:
    #include <stdio.h>
    #include <string.h>

    #define MAX_NUM_CHARS 256
    #define TRUE  1
    #define FALSE 0

    int HasSameChars(char *c1char *c2)
    {
        
    char tbl[2][MAX_NUM_CHARS];
        
    int i 0;

        if (
    c1 == NULL || c2 == NULL)
        {
        
    printf("invalid input\n");
        return 
    FALSE;
        }

        
    memset(tbl0MAX_NUM_CHARS);

        for (
    0strlen(c1); i++)
        {
        
    tbl[0][*(c1+i)] = 'm';
        }

        for (
    0strlen(c2); i++)
        {
        
    tbl[1][*(c2+i)] = 'm';
        }

        for (
    0MAX_NUM_CHARSi++)
        {
        if (
    tbl[0][i] != tbl[1][i])
        {
            return 
    FALSE;
        }
        }

        return 
    TRUE;
    }

    int main()
    {

        
    char str1[] = "aflkj4 lkj34l fkj alkdjf flkaj wecae4cp9cj alkdjgfdsf";
        
    char str2[] = "aflkj4 lkj34l fkj alkdjf flkaj wecae4cp9cj alkdjgfdsf";

        if (
    HasSameChars(str1str2))
        {
        
    printf("has same chars\n");
        } else {
        
    printf("does not have same chars\n");
        }

        return 
    0;



    Is this the optimal solution? I have a feeling it's not the optimal solution in terms of space, but it seems speedy...

    Code:
    bash-4.0$ time cmpr
    has same chars
    
    real	0m0.003s
    user	0m0.001s
    sys	0m0.002s

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Naturally, you will have to choose. You can have good space performance, or you can have good time performance. It is a rare case indeed when you will have both. (EDIT: Although 512 bytes is a pretty small price to pay for the time, IMO.)

  3. #3
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Looks fine. This may be a little more compact, but not much:

    Code:
    int HasSameChars(const char *c1, const char *c2)
    {
    	char tbl1[MAX_NUM_CHARS] = {FALSE}, tbl2[MAX_NUM_CHARS] = {FALSE};
    	while(*c1)
    		tbl1[*c1++] = TRUE;
    	while(*c2)
    		tbl2[*c2++] = TRUE;
    	return memcmp(tbl1, tbl2, MAX_NUM_CHARS) == 0; 
    }

  4. #4
    Registered User
    Join Date
    Jan 2007
    Posts
    40
    Yes. Can anyone find a faster way to do this?

  5. #5
    Registered User
    Join Date
    Jan 2007
    Posts
    40
    Quote Originally Posted by tabstop View Post
    Naturally, you will have to choose. You can have good space performance, or you can have good time performance. It is a rare case indeed when you will have both. (EDIT: Although 512 bytes is a pretty small price to pay for the time, IMO.)
    Quote Originally Posted by Sebastiani View Post
    Looks fine. This may be a little more compact, but not much:

    Code:
    int HasSameChars(const char *c1, const char *c2)
    {
    	char tbl1[MAX_NUM_CHARS] = {FALSE}, tbl2[MAX_NUM_CHARS] = {FALSE};
    	while(*c1)
    		tbl1[*c1++] = TRUE;
    	while(*c2)
    		tbl2[*c2++] = TRUE;
    	return memcmp(tbl1, tbl2, MAX_NUM_CHARS) == 0; 
    }
    This seems to be slightly slower on a few trials, but less bloated and prettier Thanks for that!

  6. #6
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I don't like it because


    1) It doesn't always work. Put an 'a' as the last char in s1 and leave s2 as is. Fail.

    2) It's not clear what's with all the memset with 'm' is (to me at least). Why not use the very simple function like this:

    Code:
    int hasSameChars(char *s1, char *s2) {
      int i = 0;
    
      while(s1[i] == s2[i])
        ++i;
       if(i && s1[--i] == '\0')
         return TRUE;
       return FALSE;
    }
    it's shorter, clearer, needs less memory, and faster.

  7. #7
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by Adak View Post
    I don't like it because


    1) It doesn't always work. Put an 'a' as the last char in s1 and leave s2 as is. Fail.

    2) It's not clear what's with all the memset with 'm' is (to me at least). Why not use the very simple function like this:

    Code:
    int hasSameChars(char *s1, char *s2) {
      int i = 0;
    
      while(s1[i] == s2[i])
        ++i;
       if(i && s1[--i] == '\0')
         return TRUE;
       return FALSE;
    }
    it's shorter, clearer, needs less memory, and faster.
    The only downfall is that it does something completely different than the goal of the original function. (You appear to be doing isTheSameString.)

  8. #8
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Do you consider these to pass or fail?

    abcd
    abccd

    They have the same characters, but they don't have the exact same characters. Where as these do:

    acbcd
    abccd

    What about this?

    abccd
    abcdd

    Does that pass the 'same characters' test, or not?


    Quzah.
    Hope is the first step on the road to disappointment.

  9. #9
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Ah! <lightbulb++>

    So the strings being identical in the OP's program, was just a coinci-dink!

    Cunning!

  10. #10
    Registered User
    Join Date
    Jan 2007
    Posts
    40
    Quote Originally Posted by quzah View Post
    Do you consider these to pass or fail?

    abcd
    abccd

    They have the same characters, but they don't have the exact same characters. Where as these do:

    acbcd
    abccd

    What about this?

    abccd
    abcdd

    Does that pass the 'same characters' test, or not?


    Quzah.
    Duplicates don't matter. So yes the function should return true on abccd/abcdd.

    So the strings being identical in the OP's program, was just a coinci-dink!
    Yeah. It just so happened that the last test case I ran was testing for a positive match.


    Anymore suggestions for a quicker solution/algorithm?

  11. #11
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    My only suggestion is to use:

    Code:
    int len;
    
    len = strlen(c1); // or (c2)
    
    for(i = 0; i < len; i++)
    //etc.
    Which eliminates the multiple calling for strlen(), during the for loop.

    As a practical optimization, you know that letters and numbers will all <= ascii of 'z'. No need to loop beyond that, unless you have
    foreign languages, special math char's, etc.
    Last edited by Adak; 10-21-2009 at 05:46 PM.

  12. #12
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826

    Lightbulb

    Code:
    int foo( char *s1, char *s2 )
    {
        return strspn( s1, s2 ) == strlen( s1 ) && strspn( s2, s1 ) == strlen( s2 );
    }
    Quzah.
    Hope is the first step on the road to disappointment.

  13. #13
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I wonder how that function compares for run-time?

    Good one, Quzah.

  14. #14
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Quote Originally Posted by quzah View Post
    Code:
    int foo( char *s1, char *s2 )
    {
        return strspn( s1, s2 ) == strlen( s1 ) && strspn( s2, s1 ) == strlen( s2 );
    }
    Quzah.
    Interesting. I wouldn't have thought of that one.

  15. #15
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by brooksbp View Post
    This seems to be slightly slower on a few trials, but less bloated and prettier Thanks for that!
    That one has my vote!
    However, I cannot believe that it could be slower than the code that makes calls to strlen for each iteration through two of the loops. That's just not possible!

    This method has great Big-Oh notation running time, but it has a high fixed-overhead! An optimal solution would switch to a method with worse Big-Oh notation but lower overhead if either (or perhaps both) of the strings are shorter than a certain length. E.g. Comparing "az" to "za" would clearly be faster using a different technique.
    Note that you may or may not care about that if short strings are not particularly common.

    One could reduce the fixed-overhead by using a technique similar to a flipping-Z-Buffer.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. C++ ini file reader problems
    By guitarist809 in forum C++ Programming
    Replies: 7
    Last Post: 09-04-2008, 06:02 AM
  2. BOOL bool ? unresolved external symbol
    By xwielder in forum C Programming
    Replies: 6
    Last Post: 05-20-2008, 08:39 AM
  3. C problem with legacy code
    By andy_baptiste in forum C Programming
    Replies: 4
    Last Post: 05-19-2008, 06:14 AM
  4. Problem displaying bitmaps
    By batman123 in forum Game Programming
    Replies: 2
    Last Post: 01-09-2005, 02:01 AM
  5. Need Solution!!!
    By Joanna in forum Windows Programming
    Replies: 4
    Last Post: 11-29-2001, 12:16 PM