word occurrence counter 'forgetting' about already saved word


  1. #1
    Registered User
    Join Date
    Nov 2011
    Posts
    39

    word occurrence counter 'forgetting' about already saved word

    The snippets below are from my program that reads words, then prints them with the number of occurrences.
    It almost works - it "forgets" that an entry has been saved before and does NOT increment the counter associated with it. So I get e.g. we - 1 we - 1 we - 1 instead of we - 3.

    Code:
    typedef struct { 
        char *word; 
        int occ; 
    } 
    words; 
    words *data=NULL;
    Code:
    words *findword(char *word)
    {
        words *ptr = data;
        
        if ((strcmp(word, ptr->word)) != 0)
                return ptr;
        return NULL;          
        
    }
    Code:
    int main(int argc, char **argv) 
    { 
        char *word; 
        words *temp; 
        int c,i,num;
        FILE *infile; 
        words *ptr = NULL; 
    
        if(argc!=2) 
        { }         
        if((infile=fopen(argv[1],"r"))==NULL) 
        { } 
        num=0;
        data = malloc(sizeof(words));
            data->word = "";
            data->occ = 0; 
        while(1) 
        { 
            c=fgetc(infile); 
            if(c==EOF) break; 
            if(!isalpha(c)) continue; 
            else ungetc(c,infile); 
            word=getword(infile); 
                
            if(findword(word)) 
            { 
                
                if(!(temp=realloc(data,sizeof(words)*(num+1)))) 
                { } 
                else 
                    data=temp;
    
                ptr = findword(word);
                data[num].word=strdup(word); 
                data[num].occ=1; 
                num++;
                
            } 
            else 
                ptr->occ++;                 
     
            free(word); 
        } 
       
        /* sort procedure here, irrelevant for the purpose of topic */
        for(i=0;i<num;i++) 
        { 
            printf("%s - %d\n",data[i].word,data[i].occ); 
            free(data[i].word); 
        } 
        free(data); 
        if(fclose(infile)) 
        {/* error handling */  } 
        return 0; 
    }
    What's wrong with that code?
    Thanks in advance!

  2. #2
    SAMARAS std10093's Avatar
    Join Date
    Jan 2011
    Location
    Nice, France
    Posts
    2,681
    Line 23: getword is a function you wrote, I guess. Shouldn't you post it too? strdup as well.

  3. #3
    Registered User
    Join Date
    Nov 2011
    Posts
    39
    strdup is from <string.h>
    I've only shown parts relevant to the problem. getword works fine, no need to make a mess in the post and extend it to hundreds of lines of code.
    getword returns char *

  4. #4
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    The problem with only pasting the "relevant" parts is that you don't necessarily know which parts are relevant. And if getword is "hundreds of lines" long, there's definitely something wrong with it!

    Anyway, look at findword. Think. How can it possibly find any word but the first one? There's no loop!
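
    A side note on the comparison in findword, as a small sketch: strcmp returns 0 when the two strings are equal, so the test for "this is the word I am looking for" is `== 0`, whereas the posted `!= 0` is true when the words differ. The helper name same_word is hypothetical and only there for illustration.

```c
#include <string.h>

/* Hypothetical helper, not from the thread: returns 1 when the two
   strings are equal, 0 otherwise. strcmp() returns 0 for equal
   strings, so a match is "== 0", not "!= 0". */
int same_word(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

    With this, same_word("we", "we") is true and same_word("we", "us") is false.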
    The cost of software maintenance increases with the square of the programmer's creativity. - Robert D. Bliss

  5. #5
    Registered User
    Join Date
    Nov 2011
    Posts
    39
    OK. So following your advice I've written:
    Code:
    words *findword(char *word)
    {
        words *ptr = data;
        
        for (ptr; /*what to put as stop cond. */; ptr++) {
            if ((strcmp(word, ptr->word)) != 0)
                return ptr;
        }
        return NULL;          
        
    }
    What should the stop condition be?

  6. #6
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    You'll have to keep a count of the length of the array in main and pass it in to findword.
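
    A minimal sketch of that idea, assuming the prototype may be changed: the extra parameters data and count are additions for illustration and are not in the original findword.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char *word;
    int occ;
} words;

/* Sketch only: "data" and "count" are added parameters (not in the
   original prototype). "count" is the number of entries in use. */
words *findword(words *data, int count, const char *word)
{
    int i;
    for (i = 0; i < count; i++) {
        if (strcmp(data[i].word, word) == 0)
            return &data[i];   /* pointer to the matching entry */
    }
    return NULL;               /* word is not in the array yet */
}
```

    In main the call would then look like findword(data, num, word), with num being the counter that is already incremented whenever a new word is stored.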

  7. #7
    Been here, done that.
    Join Date
    May 2003
    Posts
    1,161
    Try a WHILE loop instead. Exit when either
    a) word found or
    b) end of word list

    is TRUE
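
    The two exit conditions can be sketched as a single while loop. This is only an illustration: the function name find_index and the count parameter are assumptions, not part of the original program.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char *word;
    int occ;
} words;

/* Sketch: returns the index of "word" in list[0..count-1], or -1.
   The name find_index and the "count" parameter are assumptions. */
int find_index(const words *list, int count, const char *word)
{
    int i = 0;

    /* continue while (b) the end is not reached and (a) no match yet */
    while (i < count && strcmp(list[i].word, word) != 0)
        i++;

    return (i < count) ? i : -1;
}
```

    The order of the two tests matters: checking i < count first means list[i] is never read past the end of the list.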
    Definition: Politics -- Latin, from
    poly meaning many and
    tics meaning blood sucking parasites
    -- Tom Smothers

  8. #8
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    For some reference, here is a word counting thread I started 6 weeks ago:

    "http://cboard.cprogramming.com/c-programming/149993-warnings-error-messages.html"

    std10093 is the best.
    Last edited by cfanatic; 09-18-2012 at 08:21 PM. Reason: fixed url
    IDE: Code::Blocks | Compiler Suite for Windows: TDM-GCC (MingW, gdb)

  9. #9
    Registered User
    Join Date
    Nov 2011
    Posts
    39
    Originally posted by WaltP:
        Try a WHILE loop instead. Exit when either
        a) word found or
        b) end of word list
        is TRUE
    I don't know how to represent the end of a constantly updated array of structures.

    Originally posted by oogabooga:
        You'll have to keep a count of the length of the array in main and pass it in to findword.
    My search function must have the prototype struct abc *find(char *word),
    so I can't pass that parameter.

  10. #10
    Registered User
    Join Date
    Nov 2011
    Posts
    39

    EDIT: I can't edit my post, so here is a new version of my code. In gdb I can see that the problem is with my ptr pointer. The program finds words properly, but ptr never points to the found entry, so it keeps incrementing the count of the first word in the file.
    Modified code:

    Code:
    #include <stdio.h> 
    #include <stdlib.h> 
    #include <string.h> 
    #include <ctype.h>
     
    int num = 0; /* made num global so that I don't have to pass it to search function */
     
    typedef struct{ 
        char * word; 
        int occ; 
    } 
    words; 
    words *data=NULL; 
     
    words *search(char *word)  
    { 
        words *ptr = data;
        
        int i; 
        for(i=0; i< num; i++) 
        { 
            if(!strcmp(data[i].word,word)) return ptr; 
        } 
        return NULL; 
    } 
     
    int main(int argc, char **argv) 
    { 
        char *word; 
        words *temp; 
        int c,i; 
        words *ptr = NULL;
        
        data = calloc(100, sizeof(words)); /* initialized some memory
                                      because I was getting SIGSEGV */
        data->word = "";                
       data->occ = 0; 
            
        while(1) 
        { 
            word=getword(infile); 
                      
            if((ptr = search(word))) {
                ptr->occ++;  /* if the word is found, then this ptr doesn't move; always increments the first encountered word in file. PROBLEM is here, but I don't know how to make it "moving" */
            } else {
                if(!(temp=realloc(data,sizeof(words)*(num+1)))) 
                { 
                } 
                else 
                    data=temp;
                         
                data[num].word=strdup(word); 
                data[num].occ=1; 
                num++;
            }    
        
            free(word); 
        } 
        for(i=0;i<num;i++) 
        { 
            printf("%s - %d\n",data[i].word,data[i].occ); 
            free(data[i].word); 
        } 
        free(data); 
        
        return 0; 
    }
    Last edited by Pole; 09-19-2012 at 09:40 AM.

  11. #11
    Registered User
    Join Date
    May 2012
    Posts
    1,066
    Code:
    words *search(char *word)  
    { 
        words *ptr = data;
        
        int i; 
        for(i=0; i< num; i++) 
        { 
            if(!strcmp(data[i].word,word)) return ptr; 
        } 
        return NULL; 
    } 
    ... 
    int main(int argc, char **argv) 
    { 
    ...
    if((ptr = search(word))) {
        ptr->occ++;  /* if the word is found, then this ptr doesn't move; always increments the first encountered word in file. PROBLEM is here, but I don't know how to make it "moving" */
    You are always returning the first element of data from search() because you set "ptr" to data (that is, &data[0]) and never move it.
    You should return &data[i] and you won't need "ptr" in search().
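
    As a sketch, search() with that change applied could look like this. It keeps the globals data and num from post #10; everything else is unchanged apart from dropping "ptr".

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char *word;
    int occ;
} words;

words *data = NULL;  /* global array, as in post #10 */
int num = 0;         /* number of entries in use, as in post #10 */

words *search(char *word)
{
    int i;
    for (i = 0; i < num; i++) {
        if (strcmp(data[i].word, word) == 0)
            return &data[i];  /* the entry that matched, not data[0] */
    }
    return NULL;              /* not stored yet */
}
```

    The caller's ptr->occ++ then increments the counter of the word that was actually found.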

    Bye, Andreas

  12. #12
    Registered User
    Join Date
    May 2012
    Posts
    1,066
    Code:
    data = calloc(100, sizeof(words)); /* initialized some memory
                                      because I was getting SIGSEGV */
    I forgot:
    Why do you allocate memory for 100 elements if you later reallocate it unconditionally anyway?
    The likely cause of your segfault is that you don't allocate memory for the initial word (at line 36). Allocating memory for 100 elements of "words" is the wrong solution.
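
    One possible way to do without the dummy first element, sketched under the assumption that search() only ever looks at the first num entries: leave data as NULL at the start and let the first realloc create the array. realloc(NULL, n) behaves like malloc(n), and with num == 0 the loop in search() never dereferences data. The helper name add_word is hypothetical, and the word is copied with malloc and memcpy here instead of strdup only to keep the sketch within standard C.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *word;
    int occ;
} words;

words *data = NULL;  /* no initial allocation */
int num = 0;         /* number of entries in use */

/* Hypothetical helper: appends a copy of "word" with a count of 1.
   Returns 0 on success, -1 if memory runs out (array left unchanged). */
int add_word(const char *word)
{
    size_t len = strlen(word) + 1;
    char *copy = malloc(len);
    words *temp;

    if (copy == NULL)
        return -1;
    memcpy(copy, word, len);

    /* realloc(NULL, n) behaves like malloc(n), so the very first call
       works even though data starts out as NULL */
    temp = realloc(data, sizeof(*data) * (num + 1));
    if (temp == NULL) {
        free(copy);
        return -1;
    }
    data = temp;
    data[num].word = copy;
    data[num].occ = 1;
    num++;
    return 0;
}
```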

    Bye, Andreas

  13. #13
    Registered User
    Join Date
    Nov 2011
    Posts
    39
    I did calloc (with a random value - I've chosen 100) so that I can initialize data->word = ""; data->occ = 0; . This was suggested to avoid SIGSEGV caused by strcmp in search function with a NULL word.

  14. #14
    Registered User
    Join Date
    May 2012
    Posts
    1,066
    The problem with your code is that it is very inefficient.
    You first allocate memory for an array of 100 structs. Then as soon as you store the first word in it, you shrink it to an array of one struct and later resize it only step by step.

    The usual way (at least how I've learned it) is to start with a reasonable size and only resize it by a constant factor (for example double it) if you reach the limit:
    Code:
    int max = 10;  // for the current limit
    int counter = 0; // for the used size
    data = calloc(max, sizeof(*data));  // start with 10 elements;
    ...
    if (counter == max)
    {
        temp = realloc(data, sizeof(*data) * max * 2);  // try to get a bigger array
        if (temp == NULL) 
        { 
             // error handling
        }
        else
        {
             data = temp;
             max *= 2;  // new limit;
        }
    }
    // processing the new word
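
    The fragment above can be wrapped into a small function, shown here as a sketch. The name reserve_one is hypothetical, and starting from an empty array (max == 0) instead of a calloc of 10 elements is an assumption made to keep the example short.

```c
#include <stdlib.h>

typedef struct {
    char *word;
    int occ;
} words;

words *data = NULL;
int max = 0;      /* current capacity of the array */
int counter = 0;  /* entries in use */

/* Hypothetical helper: makes sure there is room for one more entry,
   doubling the capacity when the array is full.
   Returns 0 on success, -1 if realloc fails. */
int reserve_one(void)
{
    if (counter == max) {
        int new_max = (max == 0) ? 10 : max * 2;
        words *temp = realloc(data, sizeof(*data) * new_max);
        if (temp == NULL)
            return -1;   /* data is still valid and unchanged */
        data = temp;
        max = new_max;   /* new limit */
    }
    return 0;
}
```

    The caller would check reserve_one() before writing data[counter] and then increment counter, so realloc runs only when the array is actually full.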
    If you want to avoid using a global counter variable you could use a sentinel value (e.g. data->word = "") at the end of your array.
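
    A sketch of the sentinel variant, assuming real words are never empty so that "" can safely mark the end of the array. It keeps the one-argument prototype required in post #9.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char *word;
    int occ;
} words;

words *data = NULL;  /* array whose last entry has word == "" */

/* Sketch: walks the array until it reaches the sentinel entry.
   Assumes real words are never empty, so "" can mark the end. */
words *search(char *word)
{
    words *ptr;

    if (data == NULL)
        return NULL;

    for (ptr = data; ptr->word[0] != '\0'; ptr++) {
        if (strcmp(ptr->word, word) == 0)
            return ptr;
    }
    return NULL;  /* reached the sentinel without a match */
}
```

    The price is that the array always needs one slot more than the number of stored words, and the sentinel has to be written again after the last entry each time a word is appended.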

    Bye, Andreas
