Thread: Simple help (STRCMP, word counts etc)

  1. #1
    Registered User
    Join Date
    Dec 2004
    Posts
    6

    Hey everybody, I understand that obviously this isn't a place to get your homework done, and I've read the thread about it too.

    I'm currently doing a uni course with some basic C, but a bit of it is going over my head.

    My task so far has been to write a program that counts the number of words in a text file, and I've managed this OK...

    Code:
    /* Document Analyser
    	Author - JJ Singer
    	Version 1.0
    	Build Date - 02/12/04
    
    	A program that asks the user for the name 
    	of a text file to read, reads it and counts 
    	the number of words in the file.  It then 
    	will display this number to the user */
    
    
    #include <stdio.h> //includes input/output command library
    
    
    void main ()
    
    {	// VARIABLES DEFINED
    
    	int counter = 0, fileend;
    	// counter: stores the number of words in the file (must start at zero),
    	// fileend: holds the return value of fscanf, which is EOF once the file has ended
    	char content[300], filename[10];
    	// content: buffer that holds each word as it is read from the file
    	// filename: buffer that holds the file name typed by the user (up to 9 characters)
    	
    	FILE *filein;
    	// give the program a file pointer to the filein stream
    
    	printf("***************************\n");
    	printf("*                         *\n");
    	printf("*   DOCUMENT ANALYSER     *\n");
    	printf("*   Author - JJ Singer    *\n");
    	printf("*      Version 1.0        *\n");
    	printf("*  Build Date - 02/12/04  *\n");
    	printf("*                         *\n");
    	printf("***************************\n\n\n");
    	//header 
    
    
    
    	printf ("Please type in the name of the file you wish to read and press enter\n\n");
    	//displays the prompt on screen 
    	scanf ("%9s",filename);
    	// reads the file name typed by the user; %9s stops scanf overrunning filename[10]

    	filein=fopen(filename,"r");
    	//opens the file stream of the required file
    	if (filein==NULL)
    	{
    		printf("Could not open %s\n",filename);
    		return;
    	}
    	fileend=fscanf(filein, "%299s",content);
    	// fscanf returns the number of items read (1 here), or EOF at the end of the file
    
    	while(fileend!=EOF) // loop - counts the number of words in the file
    						// while the end of the file is not reached
    	{
    		counter++;
    		fileend=fscanf(filein,"%299s",content);
    	}
    
    	printf("Analysing document..................\n\n\n");
    	printf("This file contains %d words\n\n",counter);
    	// displays the word count on screen 
    
    	fclose(filein);
    	// closes the file stream
    
    
    }

    The part I am having problems with is part of the next task, which builds on the current code: I'm required to count the number of UNIQUE words in the text file and give a total of these.

    I think the way I should do this is:

    Use for loops (possibly 2?) to compare two strings (strcmp function), with a counter that increases by one each time a non-unique word is found. The unique total is then the difference between the (general) total and that counter.

    However, I'm not sure where or how to implement this section. As I say, it's going over my head a little bit.

    Could anyone shed some light on this matter for me?

    PS (I heard I might need to use a "to lower / to upper" function. However, this may be some variable instead, not sure.)

    Cheers for any help.
    Josh.

  2. #2
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    What you can do is add an array of strings to your program. Every time you read a word in from your file, loop through that new array and see if the string has already been stored there. If it hasn't, store the string you just read from the file in the array. At the end, display all the strings in the array.
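
The array-of-strings idea above can be sketched roughly as follows. This is only a minimal sketch: `remember_word`, `seen`, `MAX_WORDS` and `MAX_LEN` are made-up names, and the limit of 1000 unique words is an assumption, not something given in the task.

```c
#include <string.h>

#define MAX_WORDS 1000   /* assumed upper bound on unique words */
#define MAX_LEN   300    /* matches the size of content[] in the first post */

/* Storage for the unique words seen so far. */
static char seen[MAX_WORDS][MAX_LEN];
static int  seen_count = 0;

/* Returns 1 if word was new and got stored, 0 if it was already there,
   -1 if the array is full. */
int remember_word(const char *word)
{
    int i;

    for (i = 0; i < seen_count; i++) {
        if (strcmp(seen[i], word) == 0)
            return 0;               /* already stored: not unique */
    }
    if (seen_count == MAX_WORDS)
        return -1;                  /* no room left */

    strncpy(seen[seen_count], word, MAX_LEN - 1);
    seen[seen_count][MAX_LEN - 1] = '\0';   /* strncpy may not terminate */
    seen_count++;
    return 1;
}
```

Calling `remember_word(content)` inside the existing `while` loop would leave the number of unique words in `seen_count` once the file has been read.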
    If you understand what you're doing, you're not learning anything.

  3. #3
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    void main ()
    http://faq.cprogramming.com/cgi-bin/...&id=1043284376

    printf("* Build Date - 02/12/04 *\n");
    And you are just now asking questions almost 10 months later?

    Now to the fun part
    Have you been given any idea of how many unique words may be present in the file? If not, then I would keep a linked list of words. Every time you find a unique word you add it to the list. To test for a unique word you compare it against the words already in the list.
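
The "compare it against the words already in the list" step might look something like this. It is a sketch only: `struct WordNode` and `list_contains` are hypothetical names, and the node layout is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node type: one stored word per node. */
struct WordNode {
    struct WordNode *next;
    const char *word;
};

/* Returns 1 if word is already somewhere in the list, 0 otherwise. */
int list_contains(const struct WordNode *head, const char *word)
{
    const struct WordNode *p;

    for (p = head; p != NULL; p = p->next) {
        if (strcmp(p->word, word) == 0)
            return 1;
    }
    return 0;
}
```

A word read from the file would only be appended (and counted as unique) when `list_contains` returns 0.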
    Last edited by Thantos; 12-08-2004 at 05:10 PM. Reason: I R Gud Speler

  4. #4
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    Quote Originally Posted by Thantos
    printf("* Build Date - 02/12/04 *\n");
    And you are just now asking questions almost 10 months later?
    I was assuming DD/MM/YY
    If you understand what you're doing, you're not learning anything.

  5. #5
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Hmmm you might be right

  6. #6
    Registered User
    Join Date
    Dec 2004
    Posts
    6
    Just to put this straight: I'm in the UK, where the standard date format is dd/mm/yy! Righty, will get on with reading your replies.

  7. #7
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Dang UKers

  8. #8
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    OK, received this PM from Esto (New Member with 0 posts). Since it pertains to this thread, I'm replying here.
    Hey there, I was just reading your reply to the word frequency count. Surely the linked list would become corrupted once the file has been used?

    Sorry to come across as if I'm criticising, I'm just intrigued as to how you would create a word frequency counter of a text file in C.

    If you can, show us what you mean.

    Thanks,

    Pete (Rookie but keen Programmer)
    Well, first, it's not a word frequency count, as we don't care how many times a word shows up (though it would be super easy to make it do that).

    I won't write the entire thing but:
    Code:
    #include <stdlib.h> /* malloc, free */
    #include <string.h> /* strlen, strcpy */

    struct Node {
        struct Node *next;
        struct Node *prev;
        char *word; /* malloc'd copy of the word */
        /* unsigned freq; */ /* in case we want to add that ability in */
    };

    /* return will be NULL on error or the new tail */
    struct Node * addnode ( struct Node *tail, const char *word )
    {
        struct Node *temp = malloc (sizeof(struct Node));
        if ( temp == NULL )
            return NULL;

        temp->word = malloc(strlen(word) + 1); /* +1 for null char */
        if ( temp->word == NULL )
        {
            free(temp);
            return NULL;
        }
        strcpy(temp->word, word);
          /* strcpy() should be ok here since there isn't a chance to overrun the array */
        if ( tail != NULL ) /* tail is NULL while the list is still empty */
            tail->next = temp;
        temp->prev = tail;
        temp->next = NULL;
        /* temp->freq = 1; */ /* if we are including that */
        return temp;
    }
    Warning: I have not compiled nor tested the above code. There may be an error in there

    Now when you destroy the list at the end of the program you'll have to free the word before freeing the Node.
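
A rough sketch of that cleanup, assuming `word` is a `char *` pointing at memory obtained from `malloc`. The `struct Node` is repeated here only so the sketch stands on its own, and `freelist` is a made-up name.

```c
#include <stdlib.h>

/* Assumed node layout: word is a malloc'd string. */
struct Node {
    struct Node *next;
    struct Node *prev;
    char *word;
};

/* Frees every node from head onward and returns how many were freed. */
int freelist(struct Node *head)
{
    int freed = 0;

    while (head != NULL) {
        struct Node *next = head->next;  /* save before head is freed */
        free(head->word);                /* the word first... */
        free(head);                      /* ...then the node itself */
        head = next;
        freed++;
    }
    return freed;
}
```

The return value is just a convenience for checking; it also happens to equal the number of unique words if the list holds one node per unique word.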
    Last edited by Thantos; 12-08-2004 at 05:09 PM. Reason: Forgot to copy over the word to the node

  9. #9
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Make it a hash table for faster lookups. Use an array of linked lists, and use the word length for the hash.

    Quzah.
    Hope is the first step on the road to disappointment.

  10. #10
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    Quote Originally Posted by quzah
    Make it a hash table for faster lookups. Use an array of linked lists, and use the word length for the hash.

    Quzah.
    Allow for word lists in excess of 4GB, add unicode support, and do it without using strings too?
    If you understand what you're doing, you're not learning anything.

  11. #11
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Well of course the linked lists in the hash should be kept in alphabetical (sp?) order, but in the ancient Greek alphabet

  12. #12
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    I was actually being serious. If you're storing a file of unknown size, and storing all unique words, and you need to continually check each new word read with the entire list to see if it's unique or not...
    Code:
    List *table[SOMESIZE];   /* every slot must start out NULL */
    List *ptr;
    char buf[BUFSIZ];
    size_t slot;
    
    ...read word into buffer...
    ...smash case of buffer...
    slot = strlen( buf ) % SOMESIZE;   /* keep the index inside the table */
    for( ptr = table[ slot ]; ptr; ptr = ptr->next )
    {
        if( strcmp( ptr->word, buf ) == 0 )
            break;   /* word is not unique so stop looking */
        /* else, we're not at the end of the list so we keep going */
    }
    if( ptr == NULL )
        ...word is unique, so stick it onto the list at table[ slot ]...
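
Filled in, the outline above might look roughly like this. It is a sketch under assumptions: the table size of 32, the `struct List` layout, and the names `smash_case` and `add_unique` are all invented for illustration.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define SOMESIZE 32   /* assumed table size; longer words wrap around */

struct List {
    struct List *next;
    char *word;
};

static struct List *table[SOMESIZE];   /* static storage starts out all NULL */

/* Lower-cases a word in place so "The" and "the" count as the same word. */
void smash_case(char *word)
{
    for (; *word != '\0'; word++)
        *word = (char)tolower((unsigned char)*word);
}

/* Returns 1 if word was new and was added, 0 if already present,
   -1 if memory ran out. */
int add_unique(const char *word)
{
    size_t len = strlen(word);
    size_t slot = len % SOMESIZE;      /* word length is the hash */
    struct List *p;

    for (p = table[slot]; p != NULL; p = p->next) {
        if (strcmp(p->word, word) == 0)
            return 0;                  /* not unique, stop looking */
    }

    p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->word = malloc(len + 1);
    if (p->word == NULL) {
        free(p);
        return -1;
    }
    memcpy(p->word, word, len + 1);    /* copies the terminating '\0' too */
    p->next = table[slot];             /* push onto the front of the chain */
    table[slot] = p;
    return 1;
}
```

Counting how many times `add_unique` returns 1 gives the number of unique words. Only words of the same length (modulo the table size) are ever compared against each other, which is where the speed-up over one long list comes from.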
    Quzah.
    Hope is the first step on the road to disappointment.

  13. #13
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    Of course a hash table is a good idea. I agree. I just meant that if the experience isn't there to count the frequency of words then something that's even more complicated is unlikely to be in the OP's toolkit. But, maybe the concept of a hash table is only more advanced in my eyes.
    If you understand what you're doing, you're not learning anything.

  14. #14
    Registered User
    Join Date
    Jun 2004
    Posts
    722
    Hash table? Hmm... it would be better to use a trie. Each node has a maximum of 26 sub-nodes, each one representing a different char. When working with dictionaries it is one of the most efficient data structures, and not as fallible as a hash table.

    http://www.csse.monash.edu.au/~lloyd...gDS/Tree/Trie/
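
For illustration, a 26-way trie insert could be sketched like this. The names `struct TrieNode` and `trie_insert` are hypothetical, and the sketch assumes words have already been lower-cased and that 'a' to 'z' are contiguous in the character set (true for ASCII).

```c
#include <stdlib.h>

#define ALPHABET 26   /* lower-case a to z only */

struct TrieNode {
    struct TrieNode *child[ALPHABET];
    int is_word;      /* 1 if a stored word ends at this node */
};

/* Inserts a lower-case word. Returns 1 if it was new, 0 if it was already
   stored, -1 on allocation failure or a character outside a-z. */
int trie_insert(struct TrieNode *root, const char *word)
{
    struct TrieNode *node = root;

    for (; *word != '\0'; word++) {
        int i = *word - 'a';
        if (i < 0 || i >= ALPHABET)
            return -1;
        if (node->child[i] == NULL) {
            /* calloc zero-fills; on mainstream platforms that leaves
               every child pointer NULL and is_word 0 */
            node->child[i] = calloc(1, sizeof *node->child[i]);
            if (node->child[i] == NULL)
                return -1;
        }
        node = node->child[i];
    }
    if (node->is_word)
        return 0;     /* seen before */
    node->is_word = 1;
    return 1;
}
```

Each insert costs one step per letter regardless of how many words are already stored, so the unique count is again the number of times the function returns 1.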
    Last edited by xErath; 12-08-2004 at 10:50 PM.

  15. #15
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    One of the reasons I suggested a hash table is that if you understand a linked list, how much harder is it really to have an array of them? Not much. All you need is a size for your array, and something to decide what slot they fall into.
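
As a sketch of "something to decide what slot they fall into": the function below is not the word-length hash suggested earlier in the thread but a commonly used multiply-by-33 string hash, swapped in here purely for illustration. `NSLOTS` and `slot_for` are made-up names and the table size is an assumption.

```c
#include <stddef.h>

#define NSLOTS 101   /* assumed table size */

/* Maps a word to a slot in 0..NSLOTS-1 by mixing in every character. */
size_t slot_for(const char *word)
{
    unsigned long h = 5381;

    for (; *word != '\0'; word++)
        h = h * 33 + (unsigned char)*word;   /* unsigned overflow wraps, which is fine */
    return (size_t)(h % NSLOTS);
}
```

Using word length is simpler, but most English words are under a dozen letters long, so a character-based hash tends to spread them over more slots.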

    Quzah.
    Hope is the first step on the road to disappointment.
