Thread: Please help: counting string occurrences using scanf and arrays

  1. #1
    Registered User Nancy Franklin's Avatar
    Join Date
    Sep 2011
    Posts
    15

    Please help: counting string occurrences using scanf and arrays

    Goal: count the number of unique strings from the input stream. The typedef, STRING, scanf, and %s are required. After that, sort each unique string according to how many times it has appeared. The output of the program should look like:

    word3 6 //word3 has appeared 6 times
    word1 3 //word1 has appeared 3 times
    word4 2 //word4 has appeared 2 times
    word2 1 //word2 has appeared 1 time

    My approach: use two arrays: int count[] to store the number of occurrences, and char* word[100], which stores each string only if it is unique. The variable counter is the link between the array indexes: when counter=4, count[4] is the number of times that the string stored in word[4] has occurred. With each iteration of the while loop, a different word from the input is processed.

    My problem #1: The incoming word is stored in EVERY single index of word[] before counter. It should only be stored in word[counter]. Instead, the end of output looks like:

    s not found.
    counter is: 4
    word in array: dogvvv
    it was entered into: word[4]
    printing count/word arrays:
    count[j] is: 1
    word[j] is: dogvvv
    count[j] is: 1
    word[j] is: dogvvv
    count[j] is: 1
    word[j] is: dogvvv
    count[j] is: 1

    I'm getting that just from running this simplified version of it:

    Code:
    #include <stdlib.h>
     #include <stdio.h>
     #include <string.h>
    
     typedef char STRING[20];
     int main(int argc, char *argv[]){
           STRING s;
           int counter=0;
           char* word[100];
           int count[100];
           int place=0;
           int fakebool=1; //acts like a java Boolean variable
           int i;      int j;      int k;
           for(i=0; i<101;i++){count[i]=0;}    //fills count[] with 0s
           for(i=0; i<101;i++){word[i]=" .";}  //fills word[] with “ .”
                 while(scanf("%s",s)==1){
                 printf("current word: %s\n",s);
                 word[place]=s;
                 printf("s added.\n");  
                 printf("counter is: %i \n", counter);
                 printf("word in array: %s\n", word[counter]);
                 printf("it was entered into: word[%i]\n",counter);
                 count[counter]=1; //because its the first occurrence
                 printf("count/word arrays: \n");
                
                 for(j=0;j<5;j++){
                             printf("count[j] is: %i",count[j]);
                             printf(" word[j] is: %s\n",word[j]);
                 }
                 counter++;
                 printf("counter is now: %i \n", counter);
                 place++;
                 printf("w: \n");
           } //end if not there
     }//end while
           //exit(0);
    
     //}
    Trouble#2: In the longer version of this, I'm having trouble comparing strings, so the program knows whether it should add the incoming string to the end of word[] because it is unique. (If the incoming string is a duplicate of something already entered, it should just increase its value in count[].) But this never recognizes a duplicate word!
    Code:
    #include <stdlib.h>
     #include <stdio.h>
     #include <string.h>
    
     typedef char STRING[20];
     int main(int argc, char *argv[]){
           STRING s;
           int counter=0;
           char* word[100];
           int count[100];
           int place=0;
           int fakebool=1; //acts like a java Boolean variable
           int i;      int j;      int k;
           for(i=0; i<101;i++){count[i]=0;}    //fills count[] with 0s
           for(i=0; i<101;i++){word[i]=" .";}  //fills word[] with “ .”
                 while(scanf("%s",s)==1){
                 printf("current word: %s\n",s);
                 for(i=0; i<counter;i++){ //check if s is already there
                       k=strcmp(s,word[i]); //compare the strings
                       if(k<0){ // s is already in array
                             fakebool=0;
                             printf("*******s found.********\n");
                             count[i]++;
                             printf("numOccurances: %i \n", count[i]);
                             printf("the count/word arrays: \n");
                             for(j=0;j<5;j++){
                                   printf("count[j] is: %i",count[j]);
                                   printf(" word[j] is: %s\n",word[j]);
                             }
                             printf("ended found printer \n");
                             counter++; //it has been added
                       } //end if there
                       //printf("ended if found \n");
                 }//end check for
                 if(fakebool==1){
                       printf("s not found.\n");
                       word[counter]=s;
                       printf("s added.\n");
                       printf("counter is: %i \n", counter);
                       printf("word in array: %s\n", word[counter]);
                       printf("it was entered into: word[%i]\n",counter);
                       count[counter]=1; //because its the first occurrence
                       printf("count/word arrays: \n");
                       for(j=0;j<5;j++){ //printing the arrays
                             printf("count[j] is: %i",count[j]);
                             printf(" word[j] is: %s\n",word[j]);
                       }
                 }// end the if()fakebool=1 statement
                 counter++;
                 printf("counter is now: %i \n", counter);
                 printf("w: \n");
           } //end if not there
     }//end while
           //exit(0);
     //}
    Perhaps a 2-dimensional array would help. I haven't used those in C yet, but if it makes things simpler, that's good...
    Last edited by Nancy Franklin; 09-22-2011 at 11:58 PM. Reason: adding more

  2. #2
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Ok... first mistake, and your compiler should be complaining wildly about it... You cannot assign strings across the equals sign in C ... C is a programming language with no inherent knowledge of strings. Instead we use character arrays and library functions to manipulate arrays of characters as if they were strings.

    Code:
    char s[100];
    
    s = "hello fred";   // does not work!
    
    strcpy(s,"hello fred"); // this works.
    Moreover... you are using an array of pointers to store the results of an input function that always places text at the same address. By the time you finish assigning all your input words to the array... all pointers in the array will point to the same place.

    Code:
    char *word[100]; // 100 pointers
    
    scanf("%s",s);
    word[place]=s;  // stores the address of s, so every pointer in word ends up at the same location.
    You need to define an actual array...

    Code:
    char word[100][20]; // 100 strings, 20 characters each
    
    scanf("%s",s);
    strcpy(word[place], s);  // copy s to the array.
    Now your strings are stored separately.
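    A minimal sketch of that copy step, wrapped in a helper so the bounds are checked. The names store_word, MAX_WORDS and WORD_LEN are made up for illustration and are not part of the original program; the 20 matches the STRING typedef.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDS 100
#define WORD_LEN  20   /* same size as typedef char STRING[20] */

/* Copy s into the next free row of word[] and advance *place.
   Returns the row used, or -1 if the table is full or s (plus its
   terminating '\0') does not fit in one row. */
int store_word(char word[][WORD_LEN], int *place, const char *s)
{
    assert(word != NULL && place != NULL && s != NULL);
    if (*place >= MAX_WORDS || strlen(s) >= WORD_LEN)
        return -1;
    strcpy(word[*place], s);   /* each row has its own 20 bytes of storage */
    return (*place)++;
}
```

    With scanf, a width such as "%19s" keeps each input word inside the 20-byte STRING before it ever reaches the copy.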


    From your longer version... you need to look up strcmp() in your library documentation and see how it actually works...

  3. #3
    Registered User Maz's Avatar
    Join Date
    Nov 2005
    Location
    Finland
    Posts
    194
    All right. Tater was faster. Another approach is to keep a single-dimensional char array, but always advance the index by sizeof(STRING).
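    One way to read that suggestion, as a rough sketch: keep one flat char buffer and compute where word number n starts. The helper name slot is hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef char STRING[20];

/* Start of word number n inside one flat char buffer:
   word n begins n * sizeof(STRING) bytes from the start of buf. */
char *slot(char *buf, int n)
{
    assert(buf != NULL && n >= 0);
    return buf + (size_t)n * sizeof(STRING);
}
```

    The word number n is still an ordinary index, so count[n] keeps lining up with slot(buf, n); only the byte offset into the buffer is scaled by sizeof(STRING).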


    Quote Originally Posted by Nancy Franklin View Post
    Code:
    #include <stdlib.h>
     #include <stdio.h>
     #include <string.h>
    
     typedef char STRING[20];
     int main(int argc, char *argv[]){
           STRING s;
           int counter=0;
           char* word[100];
           int count[100];
           int place=0;
           int fakebool=1; //acts like a java Boolean variable
           int i;      int j;      int k;
    
           /* How many zeros do these loops write, and how large is your array? */
    
           for(i=0; i<101;i++){count[i]=0;}    //fills count[] with 0s
           for(i=0; i<101;i++){word[i]=" .";}  //fills word[] with “ .”
                 while(scanf("%s",s)==1){
                 printf("current word: %s\n",s);
    
    /* What is actually stored here? What happens in memory? Think about this from a "memory point of view". On the first pass through the loop, where in memory is the first word stored? On the next pass, where in memory is the next word stored? */
    
                 word[place]=s;
                 printf("s added.\n");  
                 printf("counter is: %i \n", counter);
                 printf("word in array: %s\n", word[counter]);
                 printf("it was entered into: word[%i]\n",counter);
                 count[counter]=1; //because its the first occurrence
                 printf("count/word arrays: \n");
                
                 for(j=0;j<5;j++){
                             printf("count[j] is: %i",count[j]);
                             printf(" word[j] is: %s\n",word[j]);
                 }
                 counter++;
                 printf("counter is now: %i \n", counter);
                 place++;
                 printf("w: \n");
           } //end if not there
     }//end while
           //exit(0);
    
     //}
    In the end, all a program does is write data to and read data from addresses. Your problem here is that you overwrite the same spot in memory again and again, while storing a pointer to that same address again and again. Try printing out the address of each string you store in the word[] array. You may be surprised.
    Code:
    printf("address of word[%d] = %p\n", i, (void *)word[i]);
    Last edited by Maz; 09-23-2011 at 12:12 AM.

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Double Header... Hi Maz!

  5. #5
    Registered User Nancy Franklin's Avatar
    Join Date
    Sep 2011
    Posts
    15
    Quote Originally Posted by CommonTater View Post
    Ok... first mistake, and your compiler should be complaining wildly about it...
    Yes, it was complaining, but I didn't know what the error message meant. I'm a n00b, and the VMware terminal's error messages are confusing. Messing with it sometimes stored big negative numbers in the array- I'm guessing those were the addresses.

    So, changing it to this will store each input into a different memory address? (I can't run it or get to the documentation files till school tomorrow). As for strcmp(): from what I learned about it, it seemed like its result is best stored in a variable like k, and a negative value would be what I need... Is that wrong?
    Code:
    char word[100][20]; // 100 strings, 20 characters each
    
    scanf("%s",s);
    strcpy(word[counter], s);  // copy s to the array.
    Maz, what do you mean by "increase the index always with the sizeof(STRING)"? Do you mean the index of count[], or the index of word[]? Wouldn't that mess up the counter that keeps them both together, making sure that the right count[] goes with the right word[]?
    Last edited by Nancy Franklin; 09-23-2011 at 12:39 AM.

  6. #6
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Well the error messages would have been from the compiler you're using... if you post them I'll help you interpret them.

    Yes the strcpy() method with a real array will do the job... at least you won't have an array of pointers all aimed at the same place.

    strcmp() returns 0 when words match. When comparing stra to strb ... it will return a number less than 0 to indicate stra should come before strb and a number greater than 0 if stra comes after strb ... very useful for sorting things, but the one you want is equality... i.e. 0 difference.
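    A small sketch of the equality test in a lookup loop. The function name find_word is invented for illustration; strcmp() is the standard library call from <string.h>.

```c
#include <assert.h>
#include <string.h>

/* Return the index of s in word[0..n-1], or -1 if it is not there. */
int find_word(char word[][20], int n, const char *s)
{
    int i;
    assert(word != NULL && s != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(s, word[i]) == 0)   /* 0 means the two strings are equal */
            return i;
    }
    return -1;
}
```

    Testing for a negative result instead would only report that s sorts before word[i], which is why a duplicate is never recognized that way.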

    Just in case Maz isn't reading right now... you need to "parallel" your arrays so that count[i] always refers to word[i]... so you have to either manipulate them from the same counter or manipulate both counters together.
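    Here is one possible shape for the parallel arrays, driven by a single counter. It is only a sketch under assumptions: tally, MAX_WORDS and WORD_LEN are illustrative names, and word[] is assumed to be the real 2-D array rather than an array of pointers.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDS 100
#define WORD_LEN  20

/* Record one occurrence of s. word[i] and count[i] always describe the
   same entry because both are indexed through the single counter *n.
   Returns the index used, or -1 if s cannot be stored. */
int tally(char word[][WORD_LEN], int count[], int *n, const char *s)
{
    int i;
    assert(word != NULL && count != NULL && n != NULL && s != NULL);
    for (i = 0; i < *n; i++) {
        if (strcmp(s, word[i]) == 0) {   /* duplicate: bump its count only */
            count[i]++;
            return i;
        }
    }
    if (*n >= MAX_WORDS || strlen(s) >= WORD_LEN)
        return -1;
    strcpy(word[*n], s);                 /* new word: fill both arrays */
    count[*n] = 1;
    return (*n)++;                       /* the counter grows only for new words */
}
```

    Called once per word read by scanf, this leaves the counter equal to the number of distinct words, and count[i] holds the tally for word[i].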

  7. #7
    Registered User Nancy Franklin's Avatar
    Join Date
    Sep 2011
    Posts
    15
    Ok, so that section should be:

    Code:
    for(i=0; i<counter;i++){ //check if s is already there
                       k=strcmp(s,word[i]); //compare the strings
                       if(k==0){ // s is already in array
                             fakebool=0;
                             printf("*******s found.********\n");
    Thank you- I'll be at school in 6 hours, and I'll post the error message, if I still get it. Once I get back to the terminal, I plan to use an insertion sort on count[] to get them in order. I will modify the regular array insertion sort so that, every time it switches something in count[], the corresponding elements in word[] will be switched similarly. Something like:
    Code:
     
        void insertion(int a[],int n)  
           {  
            int i,j,x,k;  
                   for(i=1;i<=n-1;i++)  
                        {  
                               j=i;  
                               x=a[i];  
                               while(a[j-1]>x && j>0)  
                                         {  
                                           a[j]=a[j-1];  
                                           count[j]=count[j-1];  //my modification
                                            j=j-1;  
                                         }  
                                a[j]=x;  
                               printf("\n\n The array after pass no.%d: ",i);  
                               for(k=0;k<=n-1;k++)  
                              printf("%4d",a[k]);  
                         }//end for.  
           }
    Will those work?
    Last edited by Nancy Franklin; 09-23-2011 at 01:03 AM.

  8. #8
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    The requirement seems to contain a contradiction: count the number of unique strings, then sort them according to how many times they appear. But if a string appears more than once, it's not unique.

  9. #9
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Nancy Franklin View Post
    Ok, so that section should be:

    Code:
    for(i=0; i<counter;i++){ //check if s is already there
                       k=strcmp(s,word[i]); //compare the strings
                       if(k==0){ // s is already in array
                             fakebool=0;
                             printf("*******s found.********\n");
    Very close... it will probably work, but I generally like to try to simplify code whenever possible, so you could do this...
    Code:
    for(i=0; i<counter; i++){ //check if s is already there
                       if ( strcmp(s,word[i]) == 0) //compare the strings
                          {  fakebool=0;
                             printf("*******%s found.********\n", s);
    There's no need to pass it through an intermediate variable unless you need it for something else in the same scope.

    Thank you- I'll be at school in 6 hours, and I'll post the error message, if I still get it. Once I get back to the terminal, I plan to use an insertion sort on count[] to get them in order. I will modify the regular array insertion sort so that, every time it switches something in count[], the corresponding elements in word[] will be switched similarly.
    Yep, that's one way... however you may want to consider sorting only after all your words are in the array (as a second pass, rather than a coincident function) so that you only have to sort once... But, you might also consider sorting on word[] rather than count[], it will make your printout much easier at the end since the words will already be in alphabetical order.

    Swapping variables requires an intermediate variable... otherwise you end up with two things at the same value...
    For example:
    Code:
    temp = count[i];
    count[i] = count[j];
    count [j] = temp;
    The more common looping method is nested for loops ...
    Code:
    for(i = 0; i < counter; i++)
      for(j = i; j < counter; j++)
    kind of thing.

    Also just as a generalized suggestion... you seem to sprout a lot of intermediate variables. You should try to minimize this as much as possible. The more "eggs" you juggle the bigger the mess you make...

  10. #10
    Registered User Nancy Franklin's Avatar
    Join Date
    Sep 2011
    Posts
    15
    Sorry that wasn't clear, Subsonic. Each string entered must be sorted according to how many times it has appeared. If 15 different words have been entered in an input stream of 30 words, there should be 15 integers that describe how many times each word was entered. So, if the input stream was "cat bat dog bat sat", the output should be:

    bat 2
    cat 1
    dog 1
    sat 1

  11. #11
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    LOL.. it would be fun to unleash that on the Encyclopaedia Britannica...

  12. #12
    Registered User Nancy Franklin's Avatar
    Join Date
    Sep 2011
    Posts
    15
    Um, I don't have a compiler or access to one for 5 hours, but is this what you mean:
    Code:
    for(i = 0; i < count; i++){
        for(j = i; j < count; j++){
             if(a[j]>a[j-1]){
                 a[j]=a[j-1];  
                 count[j]=count[j-1];   
             } //end if 
         }
    }
    Would that do what I'm trying to do? (sorry, my dinosaur laptop can't compile anything)
    Last edited by Nancy Franklin; 09-23-2011 at 01:37 AM.

  13. #13
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    That's kind of the idea... you would have to customize a bit and you still need to use an intermediate value to prevent overwrites...

    What kind of dinosaur is your laptop... Is it Windows? If so you could try Pelles C which should run on almost anything since Win2000 ... or if you visit the Wiki older versions are available for legacy computing on win98... And, boy are you gonna love the help files that come with it!

  14. #14
    Registered User Nancy Franklin's Avatar
    Join Date
    Sep 2011
    Posts
    15
    Like this?
    Code:
    for(i = 0; i < count; i++){
        for(j = i; j < count; j++){
             if(a[j]>a[j-1]){
    
                  temp = count[i];
                  count[i] = count[j];
                  count[j] = temp;
             }//end if
         }
    }
    It runs XP... on only 18 GB of hard drive. My birthday can't get here fast enough.

  15. #15
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Pelles C will work on that machine... It only needs about 300megs...

    Yep that's how the swap works... now keep in mind that at the same time you're sorting your counts you also need to sort the wordlist... The swap concept is the same but it's done with strcpy() instead of across the equals sign.
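    Putting the two swaps together might look roughly like the sketch below. It assumes word[] is a real 2-D array of 20-byte rows, compares count[j] against count[i] directly, and sorts highest count first; sort_by_count and WORD_LEN are illustrative names, not something from the program posted in this thread.

```c
#include <assert.h>
#include <string.h>

#define WORD_LEN 20

/* Sort word[] and count[] together, highest count first, using the
   nested-loop pattern. Every swap is applied to both arrays so that
   count[i] keeps describing word[i]. */
void sort_by_count(char word[][WORD_LEN], int count[], int n)
{
    int i, j, tmp_count;
    char tmp_word[WORD_LEN];
    assert(word != NULL && count != NULL);
    for (i = 0; i < n; i++) {
        for (j = i + 1; j < n; j++) {
            if (count[j] > count[i]) {
                tmp_count = count[i];        /* swap the counts... */
                count[i] = count[j];
                count[j] = tmp_count;
                strcpy(tmp_word, word[i]);   /* ...and the matching words */
                strcpy(word[i], word[j]);
                strcpy(word[j], tmp_word);
            }
        }
    }
}
```

    Running it once, after all the input has been read, is enough; printing word[i] and count[i] in index order then gives the most frequent word first.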
