I'm not clear on the approach being used in the code. To look for keywords, why are you using isalpha and comparisons with 'A' and 'Z'? You can do it that way, but you might be getting lost in the details. For illustration, I made a direct translation of my algorithm above.
Code:
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#define MAXLINE 100000
const char DELIM[] = ".,! ;?@\n";
char line[MAXLINE] = "";
char line_s[MAXLINE] = "";
int main()
{
    char keyword[] = "hello"; // search for this word
    int lineno = 0;           // track line number

    // In the following comments, these abbreviations apply
    //   L: line, line_s
    //   D: DELIM
    //   W: word
    //   K: keyword

    // while there are more input lines available,
    //   read an input line into L
    while (fgets(line, MAXLINE, stdin) != NULL) {
        lineno++;
        // begin tokenization of L on D
        strcpy(line_s, line);
        char *str = line_s;
        // while there are more tokens in L
        while (true) {
            char *word;
            // read the next token into W
            if ((word = strtok(str, DELIM)) == NULL)
                break; // no more tokens
            str = NULL;
            // if W == K, print L
            if (strcmp(word, keyword) == 0)
                printf("%d: %s", lineno, line);
        }
        // end tokenization of L
    }
    return 0;
}
Replace stdin with the name of your file pointer, and the behaviour should be correct whether your file is 6 MB, 100 MB, gigabytes, or whatever. The only limitation: input lines are assumed to be at most MAXLINE characters long. Keywords that appear on a line longer than this may not be found.