Thread: Word frequency (printf problem)

1. Word frequency (printf problem)

Hi, I'm relatively new to programming, so let me know if I don't make sense and I'll elaborate.

I've pasted my main function below. It's supposed to take in a data file, count the frequency of each word, and then print all the information out to another data file. A sample output would be:

Hello, 2
John, 4
I, 3 etc.

Now the strange thing is, the program ONLY works (that is, the fprintf calls after the while loop actually generate the output) IF I include the printf call in the while loop (shown commented out in the code below).

Why should this affect the operation of the program... I mean all it does is just print out a line.
Any advice is greatly appreciated. My hair's turned grey.

PS I've attached the code, so you can try compiling it to see what I mean. The input file is called doc.txt... so just get a paragraph, not too long mind, and test it. The output is freq.txt. I've included as many comments as I could to make the program more readable, but it's still relatively messy. Sorry if that causes trouble.

Code:
```
int main (void)
{
    FILE *infile, *outfile;
    char line[100];
    struct WORD w[500]; // assuming that there are no more than 500 different words. structure contains an int freq, and char word[30]
    extern int counter; // counts the number of words (done in a different function)
    int k=0;

    infile = fopen ("doc.txt", "r");
    outfile = fopen ("freq.txt", "w");

    while (fscanf(infile, "%c", &line[0]) == 1)
    {
        k=1;
        do {
            fscanf (infile, "%c", &line[k]);
            k++;
        } while (line[k-1] != '\n');

        line[k-1] = '\0';
        //  printf("%s\n", line); /* fprintf function (below) only works */
                                  /* if this printf function is inserted */
        process (line, w); /* each line is taken in and processed. New words are added into w; an existing word increases the frequency of the corresponding element */
    }

    for (k=0; k<counter; k++)
        fprintf(outfile, "%s %i\n", w[k].word, w[k].freq); /* This is the desired output... */

    return 0;
}
```

2. Why should this affect the operation of the program... I mean all it does is just print out a line.
It also flushes the output buffer. (With the '\n'.) Try using
Code:
`fflush(stdout);`
and see if that works.
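
For anyone unsure what flushing means: stdio normally collects output in a buffer and writes it out later, and fflush forces that write to happen right away. A minimal sketch, where write_and_flush is a made-up helper name and not part of the original program:

Code:
```c
#include <stdio.h>

/* Hypothetical helper: write text to a stream, then force the stdio
   buffer for that stream to be written out immediately.
   Returns 0 on success, EOF on failure. */
int write_and_flush(FILE *out, const char *text)
{
    if (fputs(text, out) == EOF)
        return EOF;          /* the write itself failed */
    return fflush(out);      /* 0 on success, EOF on error */
}
```

Calling write_and_flush(stdout, "some text\n") would have the same effect as a printf followed by fflush(stdout).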

3. I have no idea what flushing means, but I gave it a go anyway. Still didn't work... thanks for help anyway

4. Suggestions:

1. Get rid of all the external references to counter. It's already a global in the file.

2. Clear out the WORD struct array prior to using it.
Code:
` struct WORD w[500]= {0};`
3. Both fscanf calls should be checking for a != EOF condition.

4. The inner fscanf loop should break as soon as it hits an EOF condition.

5. Initialize y in the process function to 1 to ensure the while loop always kicks in the first time.

Finally, only execute the process function if k > 1.
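
A rough sketch of how suggestions 3 and 4 might look in practice. The helper name read_line and its return convention are assumptions made for illustration, not part of the original program; it also assumes that overlong lines should simply be truncated to fit the buffer.

Code:
```c
#include <stdio.h>

/* Hypothetical helper: read one line from `in` into buf (at most size-1
   characters), dropping the trailing newline.
   Returns the number of characters stored (0 for an empty line),
   or -1 if EOF was reached before any character could be read. */
int read_line(FILE *in, char *buf, int size)
{
    int k = 0;
    char c;

    /* Stop as soon as fscanf fails to deliver a character (EOF or error). */
    while (fscanf(in, "%c", &c) == 1) {
        if (c == '\n') {
            buf[k] = '\0';
            return k;
        }
        if (k < size - 1)    /* never write past the end of buf */
            buf[k++] = c;
    }

    if (k == 0)
        return -1;           /* nothing left in the file */

    buf[k] = '\0';           /* last line had no trailing newline */
    return k;
}
```

The main loop could then read `while ((n = read_line(infile, line, sizeof line)) >= 0)` and call process (line, w) only when n > 0, which plays the same role as the k > 1 check.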

Bob

5. Thanks Bob.
Really appreciate it (and I'm not saying it only 'cause everyone else says it).

Geez. Thanks ha ha. I feel my hair growing back now. It was the y that wasn't initialized. You either read a lot of code, or you spent a little extra time on mine; either way, thanks!

6. It was the y that wasn't initialized.
Unfortunately, it was a little more than the y variable not being initialized.

I recommended that you check for EOF in the inner fscanf (line 05 in the listing below) because, without that check, the loop cannot tell when the file has run out. At EOF, fscanf returns EOF and stores nothing, so line[k] keeps whatever happened to be in memory, k keeps increasing, and the loop spins (eventually past the end of the 100-char array, which is undefined behavior) until it happens to find a newline character in that memory.

Let us assume we have just processed the last record in the input file. The inner loop read that record, statement 08 replaced its \n with \0, and the line was passed to the process function. Control then returns to the outer loop, and the outer fscanf does NOT see EOF yet: it picks up the final newline in the text file and returns 1, so the loop body runs again. Now the inner fscanf is at EOF. Remember, the inner loop only tests the line variable for a newline character in order to break out; it is NOT checking for an EOF condition. The line variable still holds the character data from the previous record, but without its newline, because statement 08 overwrote it. Thus the loop runs out of control: it won't stop until it finds a newline character somewhere in memory it was never meant to read.

Your code is listed below. Just follow the processing of the last record to understand the problem. I would also suggest that you open your input text file in a hex editor, and you will understand what I'm trying to explain. You may well find two CR LFs at the end of the file, for example if the editor left a blank line after the last record.

Code:
```
01.  while(fscanf(infile, "%c", &line[0]) == 1)
02.    {
03.         k=1;
04.         do{
05.            fscanf (infile, "%c", &line[k]);
06.             k++;
07.           } while (line[k-1] != '\n');
08.         line[k-1] = '\0';
09.         if(k >1)
10.         process (line, w);
11.    }
```
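
The key fact behind the explanation above is that fscanf stores nothing once the file is exhausted, so the destination keeps its old value. A minimal sketch that isolates this behavior; try_read_char is a hypothetical helper name used only for illustration:

Code:
```c
#include <stdio.h>

/* Hypothetical helper: attempt to read one character from `in` into *dest.
   Returns 1 if a character was stored, 0 if fscanf hit EOF or failed,
   in which case *dest is left exactly as it was before the call. */
int try_read_char(FILE *in, char *dest)
{
    return fscanf(in, "%c", dest) == 1;
}
```

If the inner loop tested a return value like this one, it could break out at EOF instead of comparing a stale line[k-1] against '\n'.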

7. Okay, I'll be needing some time to digest. Thanks for help.
