-
which data struct?
Hi, I recently learned about different data structures and I have an assignment where I have to make a word frequency counter.
Basically, I need to read in a text file and print out to stdout a list of all the unique words in the file and how many times they appear in descending order. I was wondering what people would think is the best data structure to use to implement this as I would also need to sort the structures when printing out the frequencies in descending order.
Any ideas would be appreciated.
Thanks.
-
I would beleve that a linked list would be the best since it is an easy way to sort, and provides an easy way to store any multitude of names.
-
I would use a binary search tree, it's already sorted by definition and a snap to check if a new word already exists :-)
Code:
tree *add(tree *root, char *word)
{
int cmp;
if (root == 0) /* Doesn't exist, add it */
{
root = malloc(sizeof *root);
root->word = malloc(strlen(word)+1);
/* Error checking not implemented for simplicity */
root->frequency = 1;
root->left = root->right = 0;
strcpy(root->word,word);
}
else if((cmp = strcmp(word, root->word)) < 0)
{
root->left = add(root->left, word);
}
else if(cmp > 0)
{
root->right = add(root->right, word);
}
else /* Match */
{
root->frequency++;
}
return root;
}
With word frequencies you also don't have to worry about tree balancing unless the input file is a dictionary or something equally sorted, but that's not likely since you're counting occurances. :-)
-
Thanks for your replies. I was wondering...if I choose the binary search tree method and store the word and frequency in each node, what would be the best way to then resort the tree by frequencies? (Since I have to print them out in descending order). I am assuming the original tree would be sorted by the words.
One idea I had was to make each node of the tree store the word and point to another node in a linked list of integers, and when I have to sort the integers, I just sort the linked list. What do you guys think? Thanks in advance for any ideas. :-)