Thread: anagram program help

  1. #31
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Why not just do it right:

    Code:
    bool not_letter(char ch_p) { return !isalpha(ch_p); }
    char to_lower(char ch_p) { return tolower(ch_p); }
    
    std::string normalize(const std::string &str_p)
    {
        std::string s(str_p);
        std::remove_copy_if(s.begin(), s.end(), s.begin(), not_letter);
        std::transform(s.begin(), s.end(), s.begin(), to_lower);
        std::sort(s.begin(), s.end());
        return s;
    }
    
    bool is_anagram_of(const std::string &a_p, const std::string &b_p)
    {
        return normalize(a_p) == normalize(b_p);
    }

  2. #32
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Why not just do it right:
    Well, if your not bothered by the cost... indeed:

    Code:
    bool not_letter(char ch_p) { return !isalpha(ch_p); }
    char to_lower(char ch_p) { return tolower(ch_p); }
    
    std::string normalize(std::string s)
    {
        s.erase(std::remove_if(s.begin(), s.end(), not_letter), s.end());
        std::transform(s.begin(), s.end(), s.begin(), to_lower);
        std::sort(s.begin(), s.end());
        return s;
    }
    
    bool is_anagram_of(const std::string &a_p, const std::string &b_p)
    {
        return normalize(a_p) == normalize(b_p);
    }
    Soma

  3. #33
    3735928559
    Join Date
    Mar 2008
    Location
    RTP
    Posts
    838
    Code:
    char * genPhraseVal(const char *phrase,char charmap[26])
    {
            unsigned int i=strlen(phrase);
            while(i)
            {
                    --i;
                    unsigned int cv = tolower(phrase[i])-'a';
                    if(cv<26)
                    {
                            ++charmap[cv];
                    }
            }
            return charmap;
    }
    
    bool anagram(const char *phrase1,const char *phrase2)
    {
            unsigned char result1[26]="00000000000000000000000000";
            unsigned char result2[26]="00000000000000000000000000";
            genPhraseVal(phrase1,result1);
            genPhraseVal(phrase2,result2);
            bool result=true;
            unsigned int i=0;
            while(i<26 && result)
            {
                    result = result1[i]==result2[i];
                    ++i;
            }
            return result;
    }
    well, this is a rather brute force method, but in light of my utter failure at devising a more elegant method, simplicity is better, at least for those of us who are only moderately clever at best.

  4. #34
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by phantomotap View Post
    Well, if your not bothered by the cost... indeed:
    Yikes. Yes, there was a bug there

  5. #35
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Yikes. Yes, there was a bug there
    Don't feel bad; my prime example (^_^) exhibits undefined behavior, is ill-formed, and in all probability the source as posted doesn't even compile. (I used addition rather than multiplication as you suggested, forgot to subtract 'A' in the index arithmetic, and didn't account for '0' appearing during calculation.) I would fix it, but I don't see a good reason to do so.

    Soma

  6. #36
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by phantomotap View Post
    Well, if your not bothered by the cost... indeed:

    Code:
    bool not_letter(char ch_p) { return !isalpha(ch_p); }
    char to_lower(char ch_p) { return tolower(ch_p); }
    
    std::string normalize(std::string s)
    {
        s.erase(std::remove_if(s.begin(), s.end(), not_letter), s.end());
        std::transform(s.begin(), s.end(), s.begin(), to_lower);
        std::sort(s.begin(), s.end());
        return s;
    }
    
    bool is_anagram_of(const std::string &a_p, const std::string &b_p)
    {
        return normalize(a_p) == normalize(b_p);
    }
    Soma

    And of course, cost would be big if you are doing lots and lots of anagrams, but most of these programs are used to perform analysis on hand-entered inputs, or perhaps a few lines of text in a file - I bet you can't come up with ONE anagram that takes 1 second on a 486 processor with that code (or more than about 0.1 second on a modern processor).

    Sure, if you feed it a long list of lots of anagrams and non-anagrams, then efficiency may be important.

    I did a quick search on the web and found a 623-long Shakespearean one, and another that was 1108 characters. In both cases, blanks are included.

    Ok, I take that back:
    http://www.anagrammy.com/long/emergencies
    may actually be long enough to take 1s or so on a 486 with it's 13 990 letters [I think not counting spaces].

    The biggest factor is quicksort (sort()), which has a worst case of O(n**2) -> 13990 * 13990 -> 196000k operations. At 100MHz, with about 20 cycles per operation, we're looking at 2 seconds or so.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  7. #37
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by matsp View Post
    And of course, cost would be big if you are doing lots and lots of anagrams, but most of these programs are used to perform analysis on hand-entered inputs, or perhaps a few lines of text in a file - I bet you can't come up with ONE anagram that takes 1 second on a 486 processor with that code (or more than about 0.1 second on a modern processor).
    Actually, this method is part of a very fast anagram finding algorithm. The target phrase, as well as all the dictionary words, are pre-normalized into this form. Then, there is a (somewhat) clever method of quickly recursing down through words to find complete anagrams.

    In the final count, the time spent in this normalize() function is completely invisible.

    I might post the whole program later.

  8. #38
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    I bet you can't come up with ONE anagram that takes 1 second on a 486 processor with that code
    O_o

    I can't "come up" with any, but I can create arbitrarily many. I have a program around here somewhere.

    Anyway, I wasn't thinking about speed of generation but comparison.

    Soma

  9. #39
    3735928559
    Join Date
    Mar 2008
    Location
    RTP
    Posts
    838
    i just executed my and brewbuck's algs 100 times on the 13990 character anagrams

    results are:
    me 78ms
    brewbuck 672ms

    perhaps it's not so unclever afterall
    Last edited by m37h0d; 04-28-2008 at 10:49 AM.

  10. #40
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by m37h0d View Post
    does anyone know a reliable method for clocking speed of execution for something like this?

    for the 13390 character anagrams, my alg consistently executes in <1 ms, and brewbuck's usually comes up as 15 or 16 ms, but occasionally returns in <1 ms also.
    Time a bunch of identical runs and take the smallest time. It's not like the measurement is in error, there's just system noise being added in. The smallest time is how fast it ran when everything else on the system happened "just right."

    Anyway, my method is not really meant for determining if two strings are anagrams of each other. It's part of a larger algorithm for finding complete, multi-word anagrams.

    A better method for one-shot use is to tally how many of each character occurred:

    Code:
    std::vector<int> count_letters(const std::string &str_p)
    {
        std::vector<int> count(26);
        for(int i = 0; i < 26; i++) count[i] = 0;
        for(std::string::const_iterator si = str_p.begin(), end = str_p.end(); si != end; ++si)
        {
            char ch = *si;
            if(isalpha(ch)) ++count[tolower(ch) - 'a'];
        }
        return count;
    }
    
    bool is_anagram_of(const std::string &a_p, const std::string &b_p)
    {
        return (a_p.size() == b_p.size() && count_letters(a_p) == count_letters(b_p));
    }

  11. #41
    3735928559
    Join Date
    Mar 2008
    Location
    RTP
    Posts
    838
    heh yeah i posted before thinking. see edit...

    Quote Originally Posted by brewbuck
    Anyway, my method is not really meant for determining if two strings are anagrams of each other. It's part of a larger algorithm for finding complete, multi-word anagrams.
    that's exactly what mine does.

    Quote Originally Posted by brewbuck
    Anyway, my method is not really meant for determining if two strings are anagrams of each other. It's part of a larger algorithm for finding complete, multi-word anagrams.
    i'm intrigued, you mean for generating possible anagrams from words? that could be a lot of fun

  12. #42
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by m37h0d View Post
    i'm intrigued, you mean for generating possible anagrams from words? that could be a lot of fun
    Yes, something like the Internet Anagram Server.

    I'll post the code tonight. It's on a box I can't access right now.

  13. #43
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by matsp View Post
    Ok, I take that back:
    http://www.anagrammy.com/long/emergencies
    may actually be long enough to take 1s or so on a 486 with it's 13 990 letters [I think not counting spaces].
    I don't find such long anagrams to be interesting, because there are so many letters available it is very easy to arrange them into an order than makes sense. After all, the distribution of letter frequencies in English (and most languages) is basically constant when averaged over large amounts of text.

  14. #44
    Registered User
    Join Date
    Jan 2008
    Posts
    37
    wow .. i haven't checked this in a few days .. thanks for all these replies .. but i was about to submit my own code and see where to go from there ..

    this is what i have right now.. and what this does is take an input like sldkfjLLKJLKJ8(&9879 and only output letters, so the output would be sldkfjLLKJLKJ. Now this is where i'm stuck i need to count how many times each letter comes up in each string and then determine if they match.. how can i start do to that?

    heres my code:
    Code:
    #include <iostream>
    #include <cctype>
    #include <cstring>
    
    using namespace std;
    
    int main (){
    
    
    	char mystring[500];
    	char myalpha[500];
    	cout << "enter some shiz" << endl;
    	cin.getline(mystring, 500, '\n');
     int a=0;
    	for (int i=0; mystring[i] !='\0'; i++){
    		if (isalpha(mystring[i])){
    
    			myalpha[a]=mystring[i];
    			a++;
    		}
    	}
    myalpha[a]='\0';
    	cout << myalpha << endl;
    
    
    	return 0;
    
    }

  15. #45
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    This post shows how to count all characters: http://cboard.cprogramming.com/showp...7&postcount=23

    Todd
    Mainframe assembler programmer by trade. C coder when I can.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 4
    Last Post: 02-21-2008, 10:39 AM
  2. Using variables in system()
    By Afro in forum C Programming
    Replies: 8
    Last Post: 07-03-2007, 12:27 PM
  3. BOOKKEEPING PROGRAM, need help!
    By yabud in forum C Programming
    Replies: 3
    Last Post: 11-16-2006, 11:17 PM
  4. Replies: 3
    Last Post: 03-04-2005, 02:46 PM
  5. My program, anyhelp
    By @licomb in forum C Programming
    Replies: 14
    Last Post: 08-14-2001, 10:04 PM