anagram program help

**brewbuck** · 04-28-2008

Why not just do it right:

Code:

bool not_letter(char ch_p) { return !isalpha(ch_p); }
char to_lower(char ch_p) { return tolower(ch_p); }

std::string normalize(const std::string &str_p)
{
    std::string s(str_p);
    std::remove_copy_if(s.begin(), s.end(), s.begin(), not_letter);
    std::transform(s.begin(), s.end(), s.begin(), to_lower);
    std::sort(s.begin(), s.end());
    return s;
}

bool is_anagram_of(const std::string &a_p, const std::string &b_p)
{
    return normalize(a_p) == normalize(b_p);
}

**phantomotap** · 04-28-2008

Why not just do it right:

Well, if your not bothered by the cost... indeed:

Code:

bool not_letter(char ch_p) { return !isalpha(ch_p); }
char to_lower(char ch_p) { return tolower(ch_p); }

std::string normalize(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(), not_letter), s.end());
    std::transform(s.begin(), s.end(), s.begin(), to_lower);
    std::sort(s.begin(), s.end());
    return s;
}

bool is_anagram_of(const std::string &a_p, const std::string &b_p)
{
    return normalize(a_p) == normalize(b_p);
}

Soma

**m37h0d** · 04-28-2008

Code:

char * genPhraseVal(const char *phrase,char charmap[26])
{
        unsigned int i=strlen(phrase);
        while(i)
        {
                --i;
                unsigned int cv = tolower(phrase[i])-'a';
                if(cv<26)
                {
                        ++charmap[cv];
                }
        }
        return charmap;
}

bool anagram(const char *phrase1,const char *phrase2)
{
        unsigned char result1[26]="00000000000000000000000000";
        unsigned char result2[26]="00000000000000000000000000";
        genPhraseVal(phrase1,result1);
        genPhraseVal(phrase2,result2);
        bool result=true;
        unsigned int i=0;
        while(i<26 && result)
        {
                result = result1[i]==result2[i];
                ++i;
        }
        return result;
}

well, this is a rather brute force method, but in light of my utter failure at devising a more elegant method, simplicity is better, at least for those of us who are only moderately clever at best.

**brewbuck** · 04-28-2008

Originally Posted by phantomotap

Well, if your not bothered by the cost... indeed:

Yikes. Yes, there was a bug there

**phantomotap** · 04-28-2008

Yikes. Yes, there was a bug there

Don't feel bad; my prime example (^_^) exhibits undefined behavior, is ill-formed, and in all probability the source as posted doesn't even compile. (I used addition rather than multiplication as you suggested, forgot to subtract 'A' in the index arithmetic, and didn't account for '0' appearing during calculation.) I would fix it, but I don't see a good reason to do so.

Soma

**matsp** · 04-28-2008

Originally Posted by phantomotap

Well, if your not bothered by the cost... indeed:

Code:

bool not_letter(char ch_p) { return !isalpha(ch_p); }
char to_lower(char ch_p) { return tolower(ch_p); }

std::string normalize(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(), not_letter), s.end());
    std::transform(s.begin(), s.end(), s.begin(), to_lower);
    std::sort(s.begin(), s.end());
    return s;
}

bool is_anagram_of(const std::string &a_p, const std::string &b_p)
{
    return normalize(a_p) == normalize(b_p);
}

Soma

And of course, cost would be big if you are doing lots and lots of anagrams, but most of these programs are used to perform analysis on hand-entered inputs, or perhaps a few lines of text in a file - I bet you can't come up with ONE anagram that takes 1 second on a 486 processor with that code (or more than about 0.1 second on a modern processor).

Sure, if you feed it a long list of lots of anagrams and non-anagrams, then efficiency may be important.

I did a quick search on the web and found a 623-long Shakespearean one, and another that was 1108 characters. In both cases, blanks are included.

Ok, I take that back:
http://www.anagrammy.com/long/emergencies
may actually be long enough to take 1s or so on a 486 with it's 13 990 letters [I think not counting spaces].

The biggest factor is quicksort (sort()), which has a worst case of O(n**2) -> 13990 * 13990 -> 196000k operations. At 100MHz, with about 20 cycles per operation, we're looking at 2 seconds or so.

--
Mats

**brewbuck** · 04-28-2008

Originally Posted by matsp

And of course, cost would be big if you are doing lots and lots of anagrams, but most of these programs are used to perform analysis on hand-entered inputs, or perhaps a few lines of text in a file - I bet you can't come up with ONE anagram that takes 1 second on a 486 processor with that code (or more than about 0.1 second on a modern processor).

Actually, this method is part of a very fast anagram finding algorithm. The target phrase, as well as all the dictionary words, are pre-normalized into this form. Then, there is a (somewhat) clever method of quickly recursing down through words to find complete anagrams.

In the final count, the time spent in this normalize() function is completely invisible.

I might post the whole program later.

**phantomotap** · 04-28-2008

I bet you can't come up with ONE anagram that takes 1 second on a 486 processor with that code

O_o

I can't "come up" with any, but I can create arbitrarily many. I have a program around here somewhere.

Anyway, I wasn't thinking about speed of generation but comparison.

Soma

**m37h0d** · 04-28-2008

i just executed my and brewbuck's algs 100 times on the 13990 character anagrams

results are:
me 78ms
brewbuck 672ms

perhaps it's not so unclever afterall

**brewbuck** · 04-28-2008

Originally Posted by m37h0d

does anyone know a reliable method for clocking speed of execution for something like this?

for the 13390 character anagrams, my alg consistently executes in <1 ms, and brewbuck's usually comes up as 15 or 16 ms, but occasionally returns in <1 ms also.

Time a bunch of identical runs and take the smallest time. It's not like the measurement is in error, there's just system noise being added in. The smallest time is how fast it ran when everything else on the system happened "just right."

Anyway, my method is not really meant for determining if two strings are anagrams of each other. It's part of a larger algorithm for finding complete, multi-word anagrams.

A better method for one-shot use is to tally how many of each character occurred:

Code:

std::vector<int> count_letters(const std::string &str_p)
{
    std::vector<int> count(26);
    for(int i = 0; i < 26; i++) count[i] = 0;
    for(std::string::const_iterator si = str_p.begin(), end = str_p.end(); si != end; ++si)
    {
        char ch = *si;
        if(isalpha(ch)) ++count[tolower(ch) - 'a'];
    }
    return count;
}

bool is_anagram_of(const std::string &a_p, const std::string &b_p)
{
    return (a_p.size() == b_p.size() && count_letters(a_p) == count_letters(b_p));
}

**m37h0d** · 04-28-2008

heh yeah i posted before thinking. see edit...

Originally Posted by brewbuck

Anyway, my method is not really meant for determining if two strings are anagrams of each other. It's part of a larger algorithm for finding complete, multi-word anagrams.

that's exactly what mine does.

Originally Posted by brewbuck

Anyway, my method is not really meant for determining if two strings are anagrams of each other. It's part of a larger algorithm for finding complete, multi-word anagrams.

i'm intrigued, you mean for generating possible anagrams from words? that could be a lot of fun

**brewbuck** · 04-28-2008

Originally Posted by m37h0d

i'm intrigued, you mean for generating possible anagrams from words? that could be a lot of fun

Yes, something like the Internet Anagram Server.

I'll post the code tonight. It's on a box I can't access right now.

**brewbuck** · 04-28-2008

Originally Posted by matsp

Ok, I take that back:
http://www.anagrammy.com/long/emergencies
may actually be long enough to take 1s or so on a 486 with it's 13 990 letters [I think not counting spaces].

I don't find such long anagrams to be interesting, because there are so many letters available it is very easy to arrange them into an order than makes sense. After all, the distribution of letter frequencies in English (and most languages) is basically constant when averaged over large amounts of text.

**pjr5043** · 04-28-2008

wow .. i haven't checked this in a few days .. thanks for all these replies .. but i was about to submit my own code and see where to go from there ..

this is what i have right now.. and what this does is take an input like sldkfjLLKJLKJ8(&9879 and only output letters, so the output would be sldkfjLLKJLKJ. Now this is where i'm stuck i need to count how many times each letter comes up in each string and then determine if they match.. how can i start do to that?

heres my code:

Code:

#include <iostream>
#include <cctype>
#include <cstring>

using namespace std;

int main (){


	char mystring[500];
	char myalpha[500];
	cout << "enter some shiz" << endl;
	cin.getline(mystring, 500, '\n');
 int a=0;
	for (int i=0; mystring[i] !='\0'; i++){
		if (isalpha(mystring[i])){

			myalpha[a]=mystring[i];
			a++;
		}
	}
myalpha[a]='\0';
	cout << myalpha << endl;


	return 0;

}