Looking for optimal solution to: bool HasSameChars()

**brooksbp** · 10-21-2009

Create an algorithm that checks whether two strings have the same characters (duplicates don't matter). Assuming a char == 1 byte so there are 2^8 = 256 possible characters.

Function prototype given: int HasSameChars(char *c1, char *c2);

------------------------------

The algorithm that I've come up with is:

Code:

- array a[256], b[256]
- for char in c1:
      a[char] = 'y';  // meaning we flag with 'y' that it exists
- for char in c2:
      b[char] = 'y';
- for i = 0 to 256:
      if a[i] != b[i]
           return false;
- return true;

The following in C code is this...

PHP Code:


#include <stdio.h>

#include <string.h>



#define MAX_NUM_CHARS 256

#define TRUE  1

#define FALSE 0



int HasSameChars(char *c1, char *c2)

{

    char tbl[2][MAX_NUM_CHARS];

    int i = 0;



    if (c1 == NULL || c2 == NULL)

    {

    printf("invalid input\n");

    return FALSE;

    }



    memset(tbl, 0, 2 * MAX_NUM_CHARS);



    for (i = 0; i < strlen(c1); i++)

    {

    tbl[0][*(c1+i)] = 'm';

    }



    for (i = 0; i < strlen(c2); i++)

    {

    tbl[1][*(c2+i)] = 'm';

    }



    for (i = 0; i < MAX_NUM_CHARS; i++)

    {

    if (tbl[0][i] != tbl[1][i])

    {

        return FALSE;

    }

    }



    return TRUE;

}



int main()

{



    char str1[] = "aflkj4 lkj34l fkj alkdjf flkaj wecae4cp9cj alkdjgfdsf";

    char str2[] = "aflkj4 lkj34l fkj alkdjf flkaj wecae4cp9cj alkdjgfdsf";



    if (HasSameChars(str1, str2))

    {

    printf("has same chars\n");

    } else {

    printf("does not have same chars\n");

    }



    return 0;

}

Is this the optimal solution? I have a feeling it's not the optimal solution in terms of space, but it seems speedy...

Code:

bash-4.0$ time cmpr
has same chars

real	0m0.003s
user	0m0.001s
sys	0m0.002s

**tabstop** · 10-21-2009

Naturally, you will have to choose. You can have good space performance, or you can have good time performance. It is a rare case indeed when you will have both. (EDIT: Although 512 bytes is a pretty small price to pay for the time, IMO.)

**Sebastiani** · 10-21-2009

Looks fine. This may be a little more compact, but not much:

Code:

int HasSameChars(const char *c1, const char *c2)
{
	char tbl1[MAX_NUM_CHARS] = {FALSE}, tbl2[MAX_NUM_CHARS] = {FALSE};
	while(*c1)
		tbl1[*c1++] = TRUE;
	while(*c2)
		tbl2[*c2++] = TRUE;
	return memcmp(tbl1, tbl2, MAX_NUM_CHARS) == 0; 
}

**brooksbp** · 10-21-2009

Yes. Can anyone find a faster way to do this?

**brooksbp** · 10-21-2009

Originally Posted by tabstop

Naturally, you will have to choose. You can have good space performance, or you can have good time performance. It is a rare case indeed when you will have both. (EDIT: Although 512 bytes is a pretty small price to pay for the time, IMO.)

Originally Posted by Sebastiani

Looks fine. This may be a little more compact, but not much:

Code:

int HasSameChars(const char *c1, const char *c2)
{
	char tbl1[MAX_NUM_CHARS] = {FALSE}, tbl2[MAX_NUM_CHARS] = {FALSE};
	while(*c1)
		tbl1[*c1++] = TRUE;
	while(*c2)
		tbl2[*c2++] = TRUE;
	return memcmp(tbl1, tbl2, MAX_NUM_CHARS) == 0; 
}

This seems to be slightly slower on a few trials, but less bloated and prettier

Thanks for that!

**Adak** · 10-21-2009

I don't like it because

1) It doesn't always work. Put an 'a' as the last char in s1 and leave s2 as is. Fail.

2) It's not clear what's with all the memset with 'm' is (to me at least). Why not use the very simple function like this:

Code:

int hasSameChars(char *s1, char *s2) {
  int i = 0;

  while(s1[i] == s2[i])
    ++i;
   if(i && s1[--i] == '\0')
     return TRUE;
   return FALSE;
}

it's shorter, clearer, needs less memory, and faster.

**tabstop** · 10-21-2009

Originally Posted by Adak

I don't like it because

1) It doesn't always work. Put an 'a' as the last char in s1 and leave s2 as is. Fail.

2) It's not clear what's with all the memset with 'm' is (to me at least). Why not use the very simple function like this:

Code:

int hasSameChars(char *s1, char *s2) {
  int i = 0;

  while(s1[i] == s2[i])
    ++i;
   if(i && s1[--i] == '\0')
     return TRUE;
   return FALSE;
}

it's shorter, clearer, needs less memory, and faster.

The only downfall is that it does something completely different than the goal of the original function. (You appear to be doing isTheSameString.)

**quzah** · 10-21-2009

Do you consider these to pass or fail?

abcd
abccd

They have the same characters, but they don't have the exact same characters. Where as these do:

acbcd
abccd

What about this?

abccd
abcdd

Does that pass the 'same characters' test, or not?

Quzah.

**Adak** · 10-21-2009

Ah! <lightbulb++>

So the strings being identical in the OP's program, was just a coinci-dink!

Cunning!

**brooksbp** · 10-21-2009

Originally Posted by quzah

Do you consider these to pass or fail?

abcd
abccd

They have the same characters, but they don't have the exact same characters. Where as these do:

acbcd
abccd

What about this?

abccd
abcdd

Does that pass the 'same characters' test, or not?

Quzah.

Duplicates don't matter. So yes the function should return true on abccd/abcdd.

So the strings being identical in the OP's program, was just a coinci-dink!

Yeah. It just so happened that the last test case I ran was testing for a positive match.

Anymore suggestions for a quicker solution/algorithm?

**Adak** · 10-21-2009

My only suggestion is to use:

Code:

int len;

len = strlen(c1); // or (c2)

for(i = 0; i < len; i++)
//etc.

Which eliminates the multiple calling for strlen(), during the for loop.

As a practical optimization, you know that letters and numbers will all <= ascii of 'z'. No need to loop beyond that, unless you have
foreign languages, special math char's, etc.

**quzah** · 10-21-2009

Code:

int foo( char *s1, char *s2 )
{
    return strspn( s1, s2 ) == strlen( s1 ) && strspn( s2, s1 ) == strlen( s2 );
}

Quzah.

**Adak** · 10-21-2009

I wonder how that function compares for run-time?

Good one, Quzah.

**Sebastiani** · 10-21-2009

Originally Posted by quzah

Code:

int foo( char *s1, char *s2 )
{
    return strspn( s1, s2 ) == strlen( s1 ) && strspn( s2, s1 ) == strlen( s2 );
}

Quzah.

Interesting. I wouldn't have thought of that one.

**iMalc** · 10-22-2009

Originally Posted by brooksbp

This seems to be slightly slower on a few trials, but less bloated and prettier

Thanks for that!

That one has my vote!
However, I cannot believe that it could be slower than the code that makes calls to strlen for each iteration through two of the loops. That's just not possible!

This method has great Big-Oh notation running time, but it has a high fixed-overhead! An optimal solution would switch to a method with worse Big-Oh notation but lower overhead if either (or perhaps both) of the strings are shorter than a certain length. E.g. Comparing "az" to "za" would clearly be faster using a different technique.
Note that you may or may not care about that if short strings are not particularly common.

One could reduce the fixed-overhead by using a technique similar to a flipping-Z-Buffer.

Thread: Looking for optimal solution to: bool HasSameChars()

Thread Tools

Search Thread

Display

Looking for optimal solution to: bool HasSameChars()

Similar Threads

C++ ini file reader problems

BOOL bool ? unresolved external symbol

C problem with legacy code

Problem displaying bitmaps

Need Solution!!!