Thread: anagram program

  1. #1
    Registered User meriororen's Avatar
    Join Date
    Dec 2008
    Posts
    22

    anagram program

    Hi there people,

    I need a little help here with a program I made, its about finding how many pairs of anagram there are in a given input (can be from keyboard, or a text data). The number of words given are not known before EOF.

    for example data.txt :
    Code:
    atm
    mat
    rat
    art
    spit
    pits
    run
    will give me 4 (assuming that "run" is a pair of anagram to itself)
    words are all in lowercase and no space,

    here is my program :
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #define MAX 33
    
    
    int areAnags(char *str1, char *str2){
    	int i,temp;
    	int count1[256] = {0}; 
    	int count2[256] = {0};
    	
    	if(strlen(str1) != strlen(str2)) return 0;	
    
    	for(i=0;i<strlen(str1);i++){
    		temp = (int)str1[i];
    		count1[temp]++;
    	}
    	for(i=0;i<strlen(str2);i++){
    		temp = (int)str2[i];
    		count2[temp]++;
    	}
    
    	for(i=97;i<=122;i++){
    		if(count1[i] != count2[i]) return 0;	
    	}
    	
    	return 1;
    }
    
    
    int main(void){
    	char input[MAX];
    	int count = 0,i;
    	char **anagram = NULL;
    	
    	anagram = malloc(sizeof(char *));
    	anagram[0] = malloc(MAX * sizeof(char));
    	if(anagram == NULL){
    		printf("Insufficient memory available.\n");		
    		return EXIT_FAILURE;
    	}
    
    	while(scanf("%s", input) != EOF){	
    		anagram[0] = strdup(input);
    		for(i=0;i<sizeof(anagram);i++){
    			if(areAnags(anagram[i],input) == 0){
    				anagram = realloc(anagram, (count+1) * sizeof(char *));
    				anagram[count] = malloc(MAX * sizeof(char));
    
    				if(anagram == NULL || anagram[count] == NULL){
    						printf("Insufficient memory available.\n");		
    						return EXIT_FAILURE;		
    				}
    					
    				anagram[count] = strdup(input);
    				count++;
    			}
    		}
    	}
    
    	for(i=0;i<count;i++)	
    		printf("%d\n", sizeof(anagram));
    	
    
    	for(i=0;i<sizeof(anagram);i++) 
    		free(anagram[i]);
    	free(anagram);
    	return 0;
    }
    The program compiled successfully, but when I run it, it keeps getting me a Segmentation Faullt, I am using a bit confusing pointers here, so I am not quite sure where I did the mistake..
    I tried gdb, and it gave me this :

    Code:
    Program received signal SIGSEGV, Segmentation fault.
    0xb7de7613 in strlen () from /lib/tls/i686/cmov/libc.so.6
    Thanks in advance.
    meriororen.
    Baka nanka janai. Shoshinsha tte iu mon da.

  2. #2
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    I'm not sure where your problem is but one thing I would note (which might help to narrow this down) is that:
    Code:
    if(strlen(str1) != strlen(str2)) return 0;
    It looks to me like these two numbers will not change for the duration of the function, so you should really put them into two variables at the beginning:
    Code:
    int len1 = strlen(str1), len2 = strlen(str2);
    This is because the processor has to perform all the calculations everytime you call strlen. That includes every single iteration in a for loop:
    Code:
    for(i=0;i<strlen(str1);i++){
    So if the loop runs ten times, you have ten strlen() calls instead of 0 calls with
    Code:
    for(i=0;i<len1;i++){
    And, as mentioned, debugging gets a little easier when you minimize unnecessary function calls.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Well that looks like it crashed in a strlen function.

    - check your use of strlen()
    - think about whether any of them could have passed a bad pointer
    - think about whether any of them could have been passed a valid pointer, but without a \0 in a reasonable place.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Code:
    for(i=0;i<sizeof(anagram);i++){
    What is this supposed to be doing?


    Quzah.
    Hope is the first step on the road to disappointment.

  5. #5
    Registered User meriororen's Avatar
    Join Date
    Dec 2008
    Posts
    22
    Quote Originally Posted by quzah View Post
    Code:
    for(i=0;i<sizeof(anagram);i++){
    What is this supposed to be doing?


    Quzah.
    it supposed to check whether the angram of the next given "input" is already inside the anagram[] array..

    I check areAnags() in another program (i simply put two words as the input) and it worked, the strlen() function only exist inside areAnags(), not in main, I don't know why it worked on another program but not this.

    also, I erased
    Code:
     anagram[0] = strdup(input);
    from the while loop, it was stupid mistake.

    I put the strlen inside len1 and len2 variable, thanks to mk27.

    Thanks for the advices,
    However, I am still lost. I am working on it, but need some more advices. thanks
    Baka nanka janai. Shoshinsha tte iu mon da.

  6. #6
    Registered User meriororen's Avatar
    Join Date
    Dec 2008
    Posts
    22
    ok, I found the segfault reason,

    sizeof(anagram) give out 4 (as sizeof(char *) is 4).. I changed it..

    but another problem came, no matter what input I gave, the result keep showing 1. ~___~

    here is my current program :

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #define MAX 33
    
    
    
    int areAnags(const char *str1,const char *str2){
    	int i,temp;
    	int count1[256] = {0}; 
    	int count2[256] = {0};
    	int len1 = strlen(str1), len2 = strlen(str2);
    
    	if(len1 != len2) return 0;	
    
    	for(i=0;i<len1;i++){
    		temp = (int)str1[i];
    		count1[temp]++;
    	}
    	for(i=0;i<len2;i++){
    		temp = (int)str2[i];
    		count2[temp]++;
    	}
    
    	for(i=97;i<=122;i++){
    		if(count1[i] != count2[i]) return 0;	
    	}
    	
    	return 1;
    }
    
    
    int main(void){
    	char input[MAX];
    	int count = 0,i,anagsize;
    	char **anagram = NULL;
    	
    	anagram = malloc(sizeof(char *));
    	anagram[0] = malloc(MAX * sizeof(char));
    	if(anagram == NULL){
    		printf("Insufficient memory available.\n");		
    		return EXIT_FAILURE;
    	}
    
    	anagram[0] = " ";
    
    	anagsize = sizeof(anagram) / sizeof(char *);
    	
    	while(scanf("%s", input) != EOF){	
    		for(i=0;i<anagsize;i++){
    			if(areAnags(anagram[i],input) == 0){
    				anagram = realloc(anagram, (count+1) * sizeof(char *));
    				anagram[count] = malloc((strlen(input)+1) * sizeof(char));
    
    				if(anagram == NULL || anagram[count] == NULL){
    						printf("Insufficient memory available.\n");		
    						return EXIT_FAILURE;		
    				}
    					
    				anagram[count] = strdup(input);
    				count++;
    			}
    		}
    	}
    
    	printf("%d\n", anagsize);
    	
    
    	for(i=0;i<anagsize;i++)
    		free(anagram[i]);
    	free(anagram);
    	return 0;
    }
    Baka nanka janai. Shoshinsha tte iu mon da.

  7. #7
    Registered User linuxdude's Avatar
    Join Date
    Mar 2003
    Location
    Louisiana
    Posts
    926
    Your loop isn't right.
    Code:
    while(scanf("%s", input) != EOF){	
    		for(i=0;i<anagsize;i++){
    			if(areAnags(anagram[i],input) == 0){
    Look at how they are getting compared. For example let's say the input is
    Code:
    hey
    you
    uoy
    you
    there
    This is what your code is comparing everytime
    Code:
    hey
    anagram[0] =  
    input = hey
    you
    anagram[0] = hey
    input = you
    uoy  
    anagram[0] = hey
    input = uoy
    you
    anagram[0] = hey
    input = you
    there
    anagram[0] = hey
    input = there
    1
    The first loop compares hey to nothing! Then it never compares all of the input, just anagram[0] to input. Also, malloc calls are safer if you don't use
    Code:
    sizeof(type)
    but rather
    Code:
    sizeof(pointer)
    So your code would look like this
    Code:
    	anagram[0] = malloc(MAX * sizeof(char)); // not this
    anagram[0] = malloc(MAX*sizeof(*anagram[0])); // but this
    This is so if you change the type of your variable your code will still run. The FAQ is interesting.

  8. #8
    Registered User meriororen's Avatar
    Join Date
    Dec 2008
    Posts
    22
    I finally could get this program to work :

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #define MAX 33
    
    
    
    int areAnags(const char *str1,const char *str2){
    	int i,temp;
    	int count1[256] = {0};
    	int count2[256] = {0};
    	int len1 = strlen(str1), len2 = strlen(str2);
    
    	if(len1 != len2) return 0;	
    
    	for(i=0;i<len1;i++){
    		temp = (int)str1[i];
    		count1[temp]++;
    	}
    	for(i=0;i<len2;i++){
    		temp = (int)str2[i];
    		count2[temp]++;
    	}
    
    	for(i=97;i<=122;i++){
    		if(count1[i] != count2[i]) return 0;	
    	}
    	
    	return 1;
    }
    
    int isInside(char **array, char *string, int s){
    	int i=0,in;
    
    	while(i<s){
    		if(areAnags(array[i], string) == 1){
    			in = 1;
    			break;
    		}else{
    			in = 0;
    		}
    		i++;
    	}
    
    	return in;
    }
    
    int main(void){
    	char input[MAX];
    	int count = 0,i,a_size=1;
    	char **anagram;
    	
    	anagram = malloc(sizeof(*anagram));
    	anagram[0] = malloc(MAX * sizeof(*anagram[0]));
    	
    	if(anagram == NULL || anagram[0] == NULL){
    		printf("Insufficient memory available.\n");		
    		return EXIT_FAILURE;
    	}
    
    
    	while(scanf("%s", input) != EOF){	
    		if(isInside(anagram, input, a_size) == 0){
    			anagram = realloc(anagram, (count+1) * sizeof(*anagram));
    			anagram[count] = malloc((strlen(input)+1) * sizeof(*anagram[count]));
    
    			if(anagram == NULL || anagram[count] == NULL){
    					printf("Insufficient memory available.\n");		
    					return EXIT_FAILURE;		
    			}
    				
    			strncpy(anagram[count], input, strlen(input)+1);
    			count++;
    			a_size = count;
    			printf("%s<--\n", anagram[count-1]);
    		}
    	}
    
    	printf("%d\n", a_size);
    	
    	for(i=0;i<count-1;i++)
    		free(anagram[i]);
    	free(anagram);
    	return 0;
    }
    but it seems like its lacking speed, I tried inputting word list txt file containing about 234000 words, it keeps looping till 20+ minutes run under my netbook (asus eeePC atom 1.6GHz, 1GB ram).


    I wonder how to add speed to my program...
    Baka nanka janai. Shoshinsha tte iu mon da.

  9. #9
    Lost in the C ZaC's Avatar
    Join Date
    Jun 2008
    Location
    Italy
    Posts
    47
    I'd change the function to find anagrams... I sudgest you to sort each letter of the word, than if a sorted word like this already exists (strcmp on a list of anagrams, if you can use hash table this is a good choose) you don't store else you will store this new word. It should be faster.
    Last edited by ZaC; 07-03-2009 at 10:15 AM.
    Sorry for my bad English
    and also for my bad programming style...

    ZaC'ZaCoder (?!)

  10. #10
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    I wish I could help. Even just for my own amusement. Sounds like an intriguing problem but your statement of needing pairs of anagrams has me stumped. What is the program supposed to do exactly? Usually an anagram finder figures out if there are words existing containing the desired letters... "pairs" makes no sense to me.

  11. #11
    Registered User meriororen's Avatar
    Join Date
    Dec 2008
    Posts
    22
    Sorry about the lack of explanation,

    I am not too good to express things in English :P


    The program was meant to find groups of anagrams in given words, e.g. :

    Code:
    file
    life
    run
    break
    brake
    
    which would be :
    
    {file, life}
    {break, brake}
    {run}
    
    and then, the program outputs:
    3 (there are 3 groups of anagrams)


    Eventually, I start things over because the last program I made crazily compared everything inside the array to the next input,,

    I decided to sort the inputted words into binary trees (I learnt this in one night without sleeping >.<) and succeeded reducing the execution time to 4seconds (233514 words -> 213566 groups)


    thanks people!
    Baka nanka janai. Shoshinsha tte iu mon da.

  12. #12
    Lost in the C ZaC's Avatar
    Join Date
    Jun 2008
    Location
    Italy
    Posts
    47
    it is a good choose, but why don't you think about my suggestion?
    You should not save each word but just one for each group...
    From the example:
    file
    life
    run
    break
    brake
    here you have to save only
    efil
    nru
    abekr
    Theese are the first word of each founded group but with sorted letters.
    If you sort each word before to look for it over the tree, for each group you will get the same word... I think that this saves time and memory
    Sorry for my bad English
    and also for my bad programming style...

    ZaC'ZaCoder (?!)

  13. #13
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    Thanks for the additional explanation. Looks like you shouldn't need binary trees and whatnots. Here's how I would do it.

    Since the input is guaranteed to contain contiguous lists of possible pairs of anagrams...

    1 - Just save the first incoming word in the count1 array as you've done. (Clever!)
    2 - For subsequent possible anagram word, first check if the word is the same length...
    then
    3- Subtract out the corresponding letters from the count1 array - if any entries try to go below zero, the anagram test fails. [means: a letter in the new word was never in the old word]

    Any words failing the anagram test is candidate for possible pair of new anagram.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 4
    Last Post: 02-21-2008, 10:39 AM
  2. Using variables in system()
    By Afro in forum C Programming
    Replies: 8
    Last Post: 07-03-2007, 12:27 PM
  3. BOOKKEEPING PROGRAM, need help!
    By yabud in forum C Programming
    Replies: 3
    Last Post: 11-16-2006, 11:17 PM
  4. Replies: 3
    Last Post: 03-04-2005, 02:46 PM
  5. My program, anyhelp
    By @licomb in forum C Programming
    Replies: 14
    Last Post: 08-14-2001, 10:04 PM