Thread: Word occurrences count in C

  1. #1
    Registered User
    Join Date
    Jun 2018
    Posts
    3

    Word occurrences count in C

    I'm trying to write a program that takes the paths of one or more .txt files (or the path of a directory containing such files), analyzes them, and creates a .out file listing every word, one per line, each followed by the number of times it occurs in the file(s). The count is not case sensitive, and a "word" consists only of alphanumeric characters. Can someone help me, please?
    Thank you

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    36,372
    What have you done so far?

    Can you for example read a text file:
    - and just print what you read.
    - and print "found word" for each valid word in the file.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Jun 2018
    Posts
    3
    Originally posted by Salem:
    > What have you done so far?
    >
    > Can you for example read a text file:
    > - and just print what you read.
    > - and print "found word" for each valid word in the file.
    I can read a text file and print what I read, but I don't know how to load multiple files or a directory of files.
    My idea is to write the text file out reformatted with one word per line, then put the first word in the .out file and count how many times it repeats in the reformatted file, using a counter and strcmp. But I don't know how to reformat the file, or how to do this for multiple words (I was thinking about an array of strings, but it doesn't seem to work at all). I also need to convert all the words in the first file to uppercase or lowercase so that strcmp effectively compares them case-insensitively.
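    The reformatting step described above (one lowercase word per line, alphanumeric characters only) could be sketched roughly as follows. The helper name next_word and its interface are hypothetical, not something from this thread, and the sketch assumes plain single-byte text and silently truncates words that do not fit in the buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (name and interface are assumptions).
 * Reads the next run of alphanumeric characters from `in`, lowercases it,
 * and stores it in `buf` (capacity `cap`, at least 1).
 * Returns 1 if a word was read, 0 at end of file. */
int next_word(FILE *in, char *buf, size_t cap)
{
    int c;
    size_t n = 0;

    /* Skip everything that is not a letter or a digit. */
    while ((c = fgetc(in)) != EOF && !isalnum((unsigned char)c))
        ;
    if (c == EOF)
        return 0;

    /* Collect the word in lowercase; characters past the capacity are dropped. */
    do {
        if (n + 1 < cap)
            buf[n++] = (char)tolower((unsigned char)c);
    } while ((c = fgetc(in)) != EOF && isalnum((unsigned char)c));

    buf[n] = '\0';
    return 1;
}
```

    Calling next_word() in a loop and writing each result followed by a newline would give the one-word-per-line file. Lowercasing while reading means a plain strcmp is enough for the later comparisons.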

  4. #4
    Registered User john.c
    Join Date
    Dec 2017
    Posts
    291
    How you read the directory depends on what OS you are using. On Linux (and perhaps macOS), this program reads the current directory and, for each regular file whose name ends in ".txt", prints the file's first word in uppercase.
    Code:
    #define _DEFAULT_SOURCE  // replaces _BSD_SOURCE, which is deprecated since glibc 2.20
    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <dirent.h>
    
    void all_toupper(char *s) {
        // The cast avoids undefined behaviour for negative char values
        for ( ; *s; s++) *s = toupper((unsigned char)*s);
    }
    
    int main(void) {
        char word[100];
        struct dirent *e;
        DIR *dir = opendir(".");
        if (dir == NULL) {
            perror("opendir");
            return 1;
        }
    
        while ((e = readdir(dir)) != NULL) {
    
            struct stat st;
            if (stat(e->d_name, &st) == -1) {
                printf("Can't stat %s\n", e->d_name);
                continue;
            }
    
            if (S_ISREG(st.st_mode)) {  //if (e->d_type == DT_REG) {
                char *p = strrchr(e->d_name, '.');
                if (p && strcmp(p, ".txt") == 0) {
                    printf("%s\n", e->d_name);
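                    FILE *f = fopen(e->d_name, "r");
                    if (f == NULL) {
                        printf("Can't open %s\n", e->d_name);
                        continue;
                    }
                    // Print the first word only if the file contains one
                    if (fscanf(f, "%99s", word) == 1) {
                        all_toupper(word);
                        printf("    %s\n", word);
                    }
                    fclose(f);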
                }
            }
        }
    
        closedir(dir);
        return 0;
    }
    As for keeping track of the words, a balanced binary tree (such as a C++ std::map if you can switch to C++) would be perfect.
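    For anyone staying in plain C, the tree idea could be sketched like this. Note the swap: this is a simple unbalanced binary search tree, not a balanced one like std::map, so it degrades toward a linked list if the words arrive already sorted. The names node, tree_add, tree_print and tree_free are hypothetical. POSIX tsearch() from <search.h> is another option if you would rather not write the tree yourself.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: one node per distinct word. */
struct node {
    char *word;
    unsigned long count;
    struct node *left, *right;
};

/* Record one occurrence of `word` in the tree rooted at *root.
 * Returns the word's node, or NULL if memory allocation fails. */
struct node *tree_add(struct node **root, const char *word)
{
    /* Walk down until the word is found or an empty slot is reached. */
    while (*root != NULL) {
        int cmp = strcmp(word, (*root)->word);
        if (cmp == 0) {
            (*root)->count++;
            return *root;
        }
        root = cmp < 0 ? &(*root)->left : &(*root)->right;
    }

    /* Not found: create a new node holding a private copy of the word. */
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    size_t len = strlen(word) + 1;
    n->word = malloc(len);
    if (n->word == NULL) {
        free(n);
        return NULL;
    }
    memcpy(n->word, word, len);
    n->count = 1;
    n->left = n->right = NULL;
    *root = n;
    return n;
}

/* In-order traversal: writes "word count" lines in alphabetical order. */
void tree_print(const struct node *t, FILE *out)
{
    if (t == NULL)
        return;
    tree_print(t->left, out);
    fprintf(out, "%s %lu\n", t->word, t->count);
    tree_print(t->right, out);
}

/* Release every node and its word. */
void tree_free(struct node *t)
{
    if (t == NULL)
        return;
    tree_free(t->left);
    tree_free(t->right);
    free(t->word);
    free(t);
}
```

    Start with a NULL root, call tree_add(&root, word) for every (already lowercased) word read from the input files, then call tree_print(root, out) on the opened .out file and tree_free(root) at the end.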
    Simplicity is the ultimate sophistication.

  5. #5
    Registered User
    Join Date
    Jun 2018
    Posts
    3
    I tried to solve this problem with the code below, but execution never enters the second while loop and I don't understand why. Any tips?

    Code:
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    // Read line by line
    
    int main(void)
    {
        FILE * fp;
        FILE *fp2;
        FILE * fs;
        char * line = NULL;
        char * line2 = NULL;
        size_t len = 0;
        size_t len2 = 0;
        size_t read;
        size_t read2;
        int contatore=0;
        int inserimento;
        
    
        fp = fopen("swordx.out", "r"); //input
        fp2 = fopen("dio.txt", "r+");// output - in
        
        if (fp == NULL)
            exit(EXIT_FAILURE);
           read = getline(&line, &len, fp);
           fgets(line, 0, fp); // first row in the new file so i can compare
           fputs(line,fp2);
           
            
        while((    read = getline(&line, &len, fp))!= -1){
           
               inserimento=1; // like a boolean 
           
        
           
    
        while((    read2 = getline(&line2, &len2, fp2)) != -1){
            if (strcmp(line2,line) == 0){
          printf("The strings are equal.\n");
           printf("%s",line2);
             inserimento=0;
           
         }
       else{
          printf("The strings are not equal.\n");
           printf("%s",line2);
           
         
       }
    }// second while
      if(inserimento==1){ // if it has no duplicate in the output file we can write on it
          fgets(line, 0, fp);
          fputs(line,fp2);}
         
          
          
        }// first while
    
        fclose(fp);
        fclose(fp2);
        
        
        if (line)
            free(line);
            
            if (line2)
            free(line2);
          
        exit(EXIT_SUCCESS);
        
    
    }

  6. #6
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    36,372
    Well, first of all, try to indent your code better so you can see what is going on.
    Code:
    #define _GNU_SOURCE
    #include <stdio.h> 
    #include <stdlib.h> 
    #include <string.h>
    // Read line by line
    
    int main(void) {
        FILE * fp;
        FILE * fp2;
        FILE * fs;
        char * line = NULL;
        char * line2 = NULL;
        size_t len = 0;
        size_t len2 = 0;
        size_t read;
        size_t read2;
        int contatore = 0;
        int inserimento;
    
        fp = fopen("swordx.out", "r"); //input
        fp2 = fopen("dio.txt", "r+"); // output - in
    
        if (fp == NULL)
          exit(EXIT_FAILURE);
        read = getline( & line, & len, fp);
        fgets(line, 0, fp); // first row in the new file so i can compare
        fputs(line, fp2);
    
        while ((read = getline( & line, & len, fp)) != -1) {
    
          inserimento = 1; // like a boolean 
    
          while ((read2 = getline( & line2, & len2, fp2)) != -1) {
            if (strcmp(line2, line) == 0) {
              printf("The strings are equal.\n");
              printf("%s", line2);
              inserimento = 0;
    
            } else {
              printf("The strings are not equal.\n");
              printf("%s", line2);
    
            }
          } // second while
          if (inserimento == 1) { // if it has no duplicate in the output file we can write on it
            fgets(line, 0, fp);
            fputs(line, fp2);
          }
    
        } // first while
    
        fclose(fp);
        fclose(fp2);
    
        if (line)
          free(line);
    
        if (line2)
          free(line2);
    
        exit(EXIT_SUCCESS);
    
    }
    There are two main problems with your code.
    > while ((read2 = getline( & line2, & len2, fp2)) != -1)
    This reads to the end of the file, which means that if you want to read the file again, you need to call rewind() to get back to the beginning of the file.

    > fputs(line, fp2);
    After you do this, you need to call fflush() before you try to read from the file again.
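    Putting both points together, the "append only if not already present" step might look something like the sketch below. The helper name append_if_new is hypothetical, and it uses fgets with a fixed buffer instead of getline, so it assumes every line is shorter than 255 characters. It expects a stream opened for both reading and writing, such as the "r+" stream in the code above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (name and return codes are assumptions).
 * Appends `line` (including its trailing newline) to the read/write
 * stream `out` unless an identical line is already in the file.
 * Returns 1 if appended, 0 if it was a duplicate, -1 on I/O error. */
int append_if_new(FILE *out, const char *line)
{
    char buf[256];  /* assumption: lines are shorter than 255 characters */

    /* Every scan has to start from the beginning of the file. */
    rewind(out);
    while (fgets(buf, sizeof buf, out) != NULL) {
        if (strcmp(buf, line) == 0)
            return 0;
    }
    if (ferror(out))
        return -1;

    /* Move to the end explicitly before switching from reading to writing. */
    if (fseek(out, 0L, SEEK_END) != 0)
        return -1;
    if (fputs(line, out) == EOF)
        return -1;

    /* Flush the write before anything reads from the stream again. */
    return fflush(out) == 0 ? 1 : -1;
}
```

    The outer loop would then call append_if_new(fp2, line) once for each line read from the input file. Scanning the whole output file for every word is slow for large inputs, but it keeps the original design.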
