Hi
I have a FASTA file containing up to 2,000,000 lines of sequence strings. I wrote code that works well with smaller files, but as the file grows it gets slower. Can you please help me make it more efficient? I am reading it record by record.
My code is
Code:
#include <zlib.h>
#include <stdlib.h>
#include <string.h>
#include "kseq.h"
KSEQ_INIT(gzFile, gzread)

int z = 0;
gzFile fp = gzopen(dbFile, "r"); // Read database FASTA file into host memory
kseq_t *seq_d = kseq_init(fp);
int d;
while ((d = kseq_read(seq_d)) >= 0) {
    unsigned char *b = (unsigned char *)malloc(256);
    memcpy(b, seq_d->seq.s, 256); // NOTE: reads past the end of seq.s if the sequence is shorter than 256 bytes
    ....
    do work with b
    ....
    ............
    z++;
    free(b);
}
kseq_destroy(seq_d);
gzclose(fp);
I am confused about why it takes more time when the file holds, say, 100,000 sequences, even for the first iteration, which runs very efficiently in the 10,000 case.
For example, I put a printf statement in each iteration. With 10,000 sequences the first iteration takes less than a millisecond, whereas with 100,000 strings even the first iteration takes more than 30 seconds to print, and so on. Why could it be this slow?