# which data struct?

• 01-28-2003
newbie2c
which data struct?
Hi, I recently learned about different data structures and I have an assignment where I have to make a word frequency counter.

Basically, I need to read in a text file and print to stdout a list of all the unique words in the file with how many times each one appears, in descending order of frequency. I was wondering what people think is the best data structure for this, since I would also need to sort the entries by frequency before printing them.

Any ideas would be appreciated.

Thanks.
• 01-28-2003
_Cl0wn_
I would believe that a linked list would be the best, since it is easy to sort and gives you an easy way to store any number of words.
• 01-29-2003
Cela
I would use a binary search tree. It's already sorted (by word) by definition, and it's a snap to check whether a new word already exists :-)
Code:

```c
#include <stdlib.h>
#include <string.h>

/* Node layout assumed from the fields used below */
typedef struct tree {
  char *word;
  int frequency;
  struct tree *left, *right;
} tree;

tree *add(tree *root, char *word)
{
  int cmp;

  if (root == 0) /* Doesn't exist, add it */
  {
    root = malloc(sizeof *root);
    root->word = malloc(strlen(word)+1);
    /* Error checking not implemented for simplicity */
    root->frequency = 1;
    root->left = root->right = 0;
    strcpy(root->word,word);
  }
  else if((cmp = strcmp(word, root->word)) < 0)
  {
    root->left = add(root->left, word);
  }
  else if(cmp > 0)
  {
    root->right = add(root->right, word);
  }
  else /* Match */
  {
    root->frequency++;
  }

  return root;
}
```
With word frequencies you also don't have to worry about tree balancing unless the input file is a dictionary or something equally sorted, but that's not likely since you're counting occurrences. :-)
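To see why sorted input is the problem case, here is a rough sketch that measures the height of the tree after inserting the same words in two different orders. The `height` helper is a name made up for this illustration, the `tree` layout is assumed from the fields `add` touches, and the allocation checks are filled in here rather than taken from the post.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed node layout, inferred from the fields add() uses. */
typedef struct tree {
    char *word;
    int frequency;
    struct tree *left, *right;
} tree;

/* Same insertion logic as add() above, plus basic allocation checks. */
tree *add(tree *root, const char *word)
{
    int cmp;

    if (root == NULL) {
        root = malloc(sizeof *root);
        if (root == NULL)
            return NULL;
        root->word = malloc(strlen(word) + 1);
        if (root->word == NULL) {
            free(root);
            return NULL;
        }
        strcpy(root->word, word);
        root->frequency = 1;
        root->left = root->right = NULL;
    } else if ((cmp = strcmp(word, root->word)) < 0) {
        root->left = add(root->left, word);
    } else if (cmp > 0) {
        root->right = add(root->right, word);
    } else {
        root->frequency++;
    }
    return root;
}

/* Hypothetical helper: number of nodes on the longest path from the
   root down to a leaf (0 for an empty tree). */
int height(const tree *root)
{
    int l, r;

    if (root == NULL)
        return 0;
    l = height(root->left);
    r = height(root->right);
    return 1 + (l > r ? l : r);
}
```

Feeding in "a" through "g" in alphabetical order gives a height of 7, so the tree is really just a linked list and every lookup walks the whole chain. The same seven words in the order d, b, f, a, c, e, g give a height of 3. Ordinary prose arrives in a fairly mixed order, which is why balancing usually isn't worth the trouble here.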
• 01-29-2003
newbie2c
Thanks for your replies. I was wondering: if I choose the binary search tree method and store the word and frequency in each node, what would be the best way to then re-sort the tree by frequency? (I have to print the words out in descending order of frequency.) I am assuming the original tree would be sorted by word.

One idea I had was to make each node of the tree store the word and point to another node in a linked list of integers, and when I have to sort the integers, I just sort the linked list. What do you guys think? Thanks in advance for any ideas. :-)
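One possible alternative that leaves the tree alone, as a sketch only: once the whole file has been read, copy a pointer to every node into an array and let `qsort` order that array by frequency. The function names below (`count_nodes`, `flatten`, `by_frequency_desc`, `sorted_by_frequency`, `print_frequencies`) are made up for this example, the `tree` layout is assumed to match the fields `add` uses, and breaking ties alphabetically is an arbitrary choice.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed node layout, matching the fields used by add() earlier in the thread. */
typedef struct tree {
    char *word;
    int frequency;
    struct tree *left, *right;
} tree;

/* Count the nodes so the pointer array can be sized exactly. */
size_t count_nodes(const tree *root)
{
    if (root == NULL)
        return 0;
    return 1 + count_nodes(root->left) + count_nodes(root->right);
}

/* In-order walk: store each node's address in out[] starting at index i,
   and return the next free index. */
size_t flatten(const tree *root, const tree **out, size_t i)
{
    if (root == NULL)
        return i;
    i = flatten(root->left, out, i);
    out[i++] = root;
    return flatten(root->right, out, i);
}

/* qsort comparator: higher frequency first, ties broken alphabetically. */
int by_frequency_desc(const void *a, const void *b)
{
    const tree *x = *(const tree *const *)a;
    const tree *y = *(const tree *const *)b;

    if (x->frequency != y->frequency)
        return (x->frequency < y->frequency) ? 1 : -1;
    return strcmp(x->word, y->word);
}

/* Returns a malloc'd array of *n node pointers in descending frequency
   order. The caller frees the array (not the nodes). Returns NULL with
   *n set to 0 if the tree is empty or the allocation fails. */
const tree **sorted_by_frequency(const tree *root, size_t *n)
{
    const tree **nodes;

    *n = count_nodes(root);
    if (*n == 0)
        return NULL;
    nodes = malloc(*n * sizeof *nodes);
    if (nodes == NULL) {
        *n = 0;
        return NULL;
    }
    flatten(root, nodes, 0);
    qsort(nodes, *n, sizeof *nodes, by_frequency_desc);
    return nodes;
}

/* Print one "frequency word" line per unique word, most frequent first. */
void print_frequencies(const tree *root)
{
    size_t i, n;
    const tree **nodes = sorted_by_frequency(root, &n);

    for (i = 0; i < n; i++)
        printf("%d %s\n", nodes[i]->frequency, nodes[i]->word);
    free(nodes);
}
```

Calling `print_frequencies(root)` after the last `add` would then produce the listing. Compared with a linked list of integers hanging off each node, the array keeps word and count together in one node, and the tree itself is never rearranged.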