Thread: which data struct?

  1. #1
    newbie2c
    Guest

    which data struct?

    Hi, I recently learned about different data structures and I have an assignment where I have to make a word frequency counter.

    Basically, I need to read in a text file and print out to stdout a list of all the unique words in the file and how many times they appear in descending order. I was wondering what people would think is the best data structure to use to implement this as I would also need to sort the structures when printing out the frequencies in descending order.

    Any ideas would be appreciated.

    Thanks.

  2. #2
    Registered User
    Join Date
    Jan 2003
    Posts
    99
    I would beleve that a linked list would be the best since it is an easy way to sort, and provides an easy way to store any multitude of names.

  3. #3
    Registered User Cela's Avatar
    Join Date
    Jan 2003
    Posts
    362
    I would use a binary search tree, it's already sorted by definition and a snap to check if a new word already exists :-)
    Code:
    tree *add(tree *root, char *word)
    {
      int cmp;
    
      if (root == 0) /* Doesn't exist, add it */
      {
        root = malloc(sizeof *root);
        root->word = malloc(strlen(word)+1);
        /* Error checking not implemented for simplicity */
        root->frequency = 1;
        root->left = root->right = 0;
        strcpy(root->word,word);
      }
      else if((cmp = strcmp(word, root->word)) < 0)
      {
        root->left = add(root->left, word);
      }
      else if(cmp > 0)
      {
        root->right = add(root->right, word);
      }
      else /* Match */
      {
        root->frequency++;
      }
    
      return root;
    }
    With word frequencies you also don't have to worry about tree balancing unless the input file is a dictionary or something equally sorted, but that's not likely since you're counting occurances. :-)
    *Cela*

  4. #4
    newbie2c
    Guest
    Thanks for your replies. I was wondering...if I choose the binary search tree method and store the word and frequency in each node, what would be the best way to then resort the tree by frequencies? (Since I have to print them out in descending order). I am assuming the original tree would be sorted by the words.

    One idea I had was to make each node of the tree store the word and point to another node in a linked list of integers, and when I have to sort the integers, I just sort the linked list. What do you guys think? Thanks in advance for any ideas. :-)

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Looking for a way to store listbox data
    By Welder in forum C Programming
    Replies: 20
    Last Post: 11-01-2007, 11:48 PM
  2. Replies: 16
    Last Post: 10-29-2006, 05:04 AM
  3. Data struct question
    By ronenk in forum C Programming
    Replies: 4
    Last Post: 03-05-2005, 08:41 AM
  4. Binary Search Trees Part III
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 10-02-2004, 03:00 PM
  5. Warnings, warnings, warnings?
    By spentdome in forum C Programming
    Replies: 25
    Last Post: 05-27-2002, 06:49 PM