Hello all,
I've written a fairly simple C program that searches a text file for a pattern supplied by the user. My test files are 1MB (17k lines), 10MB (174k lines), and 100MB (1.74 million lines). The 1MB file runs perfectly, but the other two fail with a segmentation fault. I'm guessing the size of the files is the problem. Am I requesting too much memory? Or could fopen() or fgets() be the problem? Can they not handle this kind of load?
Here's some code:
Code:
// Currently works when running from command line: ./a.out genome1MB.txt aaatcg
#include <math.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    char pattern[100]; //holds the user's DNA pattern
    int lineCount = 0;
    int patternsFound = 0;
    int lineLocation[500], i; //array that holds the line numbers of the pattern
    int charLocation[500], j; //array that holds the character locations of the pattern
    int x = 0; //used to increment lineLocation array
    int y = 0; //used to increment charLocation array
    char viewPatternLocations;

    if(argc != 3)
    {
        printf("Please provide the correct number of command line arguments.\n");
    }
    else
    {
        // Save the user's DNA pattern from the command line into a variable:
        sscanf(argv[2], "%s", pattern);

        FILE *f;
        char line[500];
        char *item;
        int calcCharLocation;

        f = fopen(argv[1],"r");
        if(!f)
        {
            printf("Error: cannot open file.\n");
            return 1;
        }

        while(fgets(line,500,f))
        {
            lineCount++;
            //printf("%d: %s", lineCount, line);
            item = strstr(line, pattern); //assists in calculating character location within each line

            // Search each line in the genome (file) for the user's DNA pattern:
            if(strstr(line, pattern))
            {
                //printf("Found\n\n");
                patternsFound++;
                lineLocation[x] = lineCount; //if pattern is found, save the occurrence's line number into an array
                x++; //increment the lineLocation array for the next occurrence
                calcCharLocation = strlen(line) - strlen(item); //calculates how far into the line the DNA pattern is
                charLocation[y] = calcCharLocation; //save the character location calculation into an array
                y++; //increment the charLocation array for the next occurrence
            }
            else
            {
                //printf("Not found\n\n");
            }
        }

        printf("--------------------\n");
        printf("Given pattern '%s' found %d times in genome. Would you like specific locations? ('y' or 'n')> ",
               pattern, patternsFound);

        /*
        Specific pattern locations could potentially dump a lot of lines to the screen.
        Give the user a choice whether to see these locations or not.
        */
        scanf("%c", &viewPatternLocations);
        if(viewPatternLocations == 'y')
        {
            for(i = 0; i < patternsFound; i++)
            {
                printf("In line %d, %d characters in.\n", lineLocation[i], charLocation[i]);
            }
        }

        fclose(f);
    }

    return 0;
}
Anyone have any ideas?
Thanks,
Steve
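
One thing worth checking before blaming fopen() or fgets(): the program reads the file one line at a time into a 500-byte buffer, so the total file size should not matter much to either call. What does grow with file size is the number of matches, and lineLocation and charLocation only have room for 500 entries each, with nothing stopping x and y from going past that. A larger genome file could plausibly contain more than 500 hits, in which case the program writes past the end of both arrays, which is undefined behavior and a common cause of a segmentation fault. Below is a minimal sketch of one way around that, assuming a growable list in place of the fixed arrays. The names MatchList, match_list_push, match_list_free and search_stream are made up for this sketch and are not part of the program above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type (not in the original program):
   a growable list of match positions. */
typedef struct {
    int *line;     /* line number of each match */
    int *col;      /* character offset of each match within its line */
    size_t count;  /* number of matches stored so far */
    size_t cap;    /* number of entries currently allocated */
} MatchList;

/* Append one match, doubling the capacity when the list is full.
   Returns 0 on success, -1 if memory could not be allocated. */
static int match_list_push(MatchList *m, int line, int col)
{
    assert(m != NULL);
    if (m->count == m->cap) {
        size_t new_cap = m->cap ? m->cap * 2 : 512;
        int *new_line = realloc(m->line, new_cap * sizeof *new_line);
        if (!new_line)
            return -1;
        m->line = new_line;
        int *new_col = realloc(m->col, new_cap * sizeof *new_col);
        if (!new_col)
            return -1;
        m->col = new_col;
        m->cap = new_cap;
    }
    m->line[m->count] = line;
    m->col[m->count] = col;
    m->count++;
    return 0;
}

/* Release the memory held by the list and reset it to empty. */
static void match_list_free(MatchList *m)
{
    free(m->line);
    free(m->col);
    m->line = NULL;
    m->col = NULL;
    m->count = 0;
    m->cap = 0;
}

/* Read f line by line and record the first occurrence of pattern on each
   line, as the original loop does. Lines longer than 499 characters are
   split across several fgets() calls, same as in the original program.
   Returns 0 on success, -1 on allocation failure. */
static int search_stream(FILE *f, const char *pattern, MatchList *out)
{
    char line[500];
    int line_no = 0;

    while (fgets(line, sizeof line, f)) {
        line_no++;
        const char *hit = strstr(line, pattern);
        if (hit && match_list_push(out, line_no, (int)(hit - line)) != 0)
            return -1;
    }
    return 0;
}
```

In main() this could be used by declaring MatchList m = {0};, calling search_stream(f, argv[2], &m), printing from m.line[i] and m.col[i] for i below m.count, and finishing with match_list_free(&m). Doubling the capacity keeps the number of reallocations small even with millions of hits. A simpler alternative, if only the count matters for big files, would be to stop storing locations once the arrays are full and just keep counting.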