Then you may want to think about reading the file in big chunks instead of byte by byte,
and also avoid calling functions like strlen on arrays of 5M chars.
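To make the chunk idea concrete, here is a minimal sketch, assuming the whole file fits in memory. The names read_all and CHUNK are made up for illustration; they are not from the original program. It uses fread in large blocks and hands back the byte count, so strlen is never needed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the rest of fp into a heap buffer using
 * fread in CHUNK-sized blocks. The byte count goes into *len_out, so
 * the caller never needs strlen. Returns NULL on allocation or read
 * failure; the caller frees the result. */
#define CHUNK 65536

char *read_all(FILE *fp, size_t *len_out)
{
    size_t cap = CHUNK, len = 0, got;
    char *data = malloc(cap + 1);

    if (data == NULL)
        return NULL;

    while ((got = fread(data + len, 1, cap - len, fp)) > 0) {
        len += got;
        if (len == cap) {               /* buffer full: double it */
            char *bigger = realloc(data, cap * 2 + 1);
            if (bigger == NULL) {
                free(data);
                return NULL;
            }
            data = bigger;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(data);
        return NULL;
    }
    data[len] = '\0';   /* terminator is a convenience; len is authoritative */
    *len_out = len;
    return data;
}
```

The caller would then filter the bases out of the returned buffer in memory instead of calling getc five million times.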
Is something like this in the ballpark of the initialization thingy?
Code:
#include <stdio.h>
int main(void)
{
    static const char dna[] = "acgt";
    char pattern[4 * 4 * 4 * 4][5];
    int w,x,y,z, i = 0;
    for ( w = 0; w < 4; ++w )
    {
        for ( x = 0; x < 4; ++x )
        {
            for ( y = 0; y < 4; ++y )
            {
                for ( z = 0; z < 4; ++z )
                {
                    pattern[i][0] = dna[w];
                    pattern[i][1] = dna[x];
                    pattern[i][2] = dna[y];
                    pattern[i][3] = dna[z];
                    pattern[i][4] = '\0';
                    printf("%3d %s\n", i, pattern[i]);
                    ++i;
                }
            }
        }
    }
}
I checked the variable seqData and it was able to print out all 5 million bases before.
Here is my code:
Code:
// A Program to count the count of 4-mers in a nucleotide sequence.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>
int acgt_to_0123(char x)
{
    if(x == 'a')
    {
        return 0 ;
    }
    else if(x == 'c')
    {
        return 1 ;
    }
    else if(x == 'g')
    {
        return 2 ;
    }
    else if(x == 't')
    {
        return 3 ;
    }
    else
    {
        printf("Non acgt character\n") ;
        exit(0) ;
    }
}
main( int argc, char **argv )
{
    FILE *input ;
    FILE *outpur ;
    char buffer[1000] ;
    int i = 0 ;
    int c ;
    int w,x,y,z ;
    char *seqData ;
    seqData = (char *)malloc(10000000) ;
    char t1, t2, t3, t4 ;
    int index ;
    // Initialize 4-D array
    int tupleCount[4][4][4][4] ;
    for(w = 0 ; w < 4 ; w++ )
        for(x = 0 ; x < 4 ; x++ )
            for(y = 0 ; y < 4 ; y++ )
                for(z = 0 ; z < 4 ; z++ )
                {
                    tupleCount[w][x][y][z] = 0 ;
                }
    // Open input file to read from
    if( ! ( input = fopen( argv[1], "r" ) ) )
    {
        printf( "COULD NOT OPEN FILE %s - Exit!\n", argv[1]) ;
        exit(1) ;
    }
    // Collect sequence from GenBank file
    while(fgets(buffer, 1000, input))
    {
        // start obtaining bases after ORIGIN
        if(strstr(buffer, "ORIGIN"))
        {
            while((c=getc(input)) != '/' && c != EOF)
            {
                if(c >= 'a' && c <= 'z')
                {
                    seqData[i++] = c ;
                }
            }
        }
    }
    int lengthDNA = strlen(seqData) ;
    printf("here is the lenght of the DNA: %d", lengthDNA) ;
    // Scan DNA sequence for each 4-mer
    for( index = 0 ; index < (lengthDNA - 3) ; ++index )
    {
        t1 = acgt_to_0123(seqData[index]) ;
        t2 = acgt_to_0123(seqData[index + 1]) ;
        t3 = acgt_to_0123(seqData[index + 2]) ;
        t4 = acgt_to_0123(seqData[index + 3]) ;
        // Accumulate a count to find distribution
        tupleCount[t1][t2][t3][t4]++ ;
    }
    fclose(input) ;
    printf("Here is the distribution of 4-mers:\n\n%s", tupleCount ) ;
    free(seqData) ;
    return(0) ;
}
1) Nice reading of data INTO buffer, 1000 characters from the file.
Code:
while(fgets(buffer, 1000, input))
{
    // start obtaining bases after ORIGIN
    if(strstr(buffer, "ORIGIN"))
    {
        while((c=getc(input)) != '/' && c != EOF)
        {
            if(c >= 'a' && c <= 'z')
            {
                seqData[i++] = c ;
            }
        }
    }
}
2) OK, you find ORIGIN in buffer and then discard the buffer, although you certainly NEED it, because the actual data comes AFTER the string ORIGIN.
3) getc? Meaning you read 1000 characters and then forget them? Shouldn't you operate on buffer to read the characters into seqData?
Also, seqData should be zeroed, or else strlen will be WRONG when you are done. I am not sure how you got past the I/O part; on my computer it dumps core right there.
Fix those issues and I am almost certain you are done.
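On the zeroing point: malloc does not clear memory, so strlen on seqData only works if a '\0' happens to follow the last base. A minimal sketch of the collection loop with a bounds check and an explicit terminator could look like this. The name collect_bases and the cap parameter are assumptions made for illustration, not part of the posted program.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: copy lowercase letters from `in` into dst until
 * '/' or EOF, never storing more than cap - 1 bases. dst is always
 * '\0'-terminated, and the number of bases stored is returned. */
size_t collect_bases(FILE *in, char *dst, size_t cap)
{
    size_t n = 0;
    int c;

    if (cap == 0)
        return 0;
    while ((c = getc(in)) != '/' && c != EOF) {
        if (c >= 'a' && c <= 'z' && n + 1 < cap)
            dst[n++] = (char)c;
    }
    dst[n] = '\0';      /* malloc does not zero memory, so do it here */
    return n;
}
```

With the count returned, the scanning loop can use it directly as lengthDNA, and the terminator is only there in case something else still calls strlen.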
Sorry for the confusion. I am looking through a file for the tag "ORIGIN" using strstr and fgets.
Once I hit that tag, I begin collecting all DNA bases with getc directly from the input file. I stop collecting when I hit '/'.
I am not having any problem collecting the bases. When I check the contents of seqData, it contains all the bases; all 5 million.
I think my issue is after this point, when I am trying to scan it for all occurrences of 4-mers.
Not sure if this helps, but when I print out my tupleCount I get "-1073745432".
Note: When I try to read my characters from buffer instead of input I get the warning:
"warning: passing argument 1 of ‘getc’ from incompatible pointer type"
Thanks
Then you may want to write your reading loop in a less obfuscated way, and close the file as soon as you have stopped reading:
Code:
while(fgets(buffer, 1000, input))
{
    // start obtaining bases after ORIGIN
    if(strstr(buffer, "ORIGIN"))
    {
        break; /* end of file header */
    }
}
while((c=getc(input)) != '/' && c != EOF)
{
    if(c >= 'a' && c <= 'z')
    {
        seqData[i++] = c ;
    }
}
Do not use strlen; use i, which already counts how many elements of the array were filled.
%s is not a suitable format for printing a 4D array of ints; use loops and the %d format.
But vart, the code you posted ignores the fact that there may actually be data left in the buffer after the header tag. Shouldn't it be copied to the start of seqData? The copy statements would go between the two loops.
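For the copy step, note that getc takes a FILE *, which is why passing buffer to it produces the incompatible pointer warning; a char array is walked with a plain pointer instead. A rough sketch follows, where append_after_tag is a made-up name and the lowercase-letter filter is carried over from the getc loop. If the ORIGIN line has nothing after the tag, it simply copies nothing.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical helper: append any lowercase letters that follow `tag`
 * on `line` to dst, starting at index n, and return the new count.
 * If the tag is absent, nothing is copied. cap is the size of dst. */
size_t append_after_tag(const char *line, const char *tag,
                        char *dst, size_t n, size_t cap)
{
    const char *p = strstr(line, tag);

    if (p == NULL)
        return n;
    for (p += strlen(tag); *p != '\0'; ++p) {
        if (*p >= 'a' && *p <= 'z' && n < cap)
            dst[n++] = *p;
    }
    return n;
}
```

Called between the two loops as something like i = append_after_tag(buffer, "ORIGIN", seqData, i, 10000000), it would pick up anything left on the header line before getc takes over.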
How do I print out each occurrence in my multidimensional array?
For instance:
0000 = 3
0001 = 2
.
.
.
3333 = 7
The way the code is now I get: "-1073745444" as output.
Here is the code:
Thanks for the help.
Code:
// Initialize 4-D array
int tupleCount[4][4][4][4] ;
for(w = 0 ; w < 4 ; w++ )
    for(x = 0 ; x < 4 ; x++ )
        for(y = 0 ; y < 4 ; y++ )
            for(z = 0 ; z < 4 ; z++ )
            {
                tupleCount[w][x][y][z] ;
            }
// Scan DNA sequence for each 4-mer
for( index = 0 ; index < (lengthDNA - 3) ; ++index )
{
    t1 = acgt_to_0123(seqData[index]) ;
    t2 = acgt_to_0123(seqData[index + 1]) ;
    t3 = acgt_to_0123(seqData[index + 2]) ;
    t4 = acgt_to_0123(seqData[index + 3]) ;
    // Accumulate a count to find distribution
    ++tupleCount[t1][t2][t3][t4] ;
}
printf("Here is the distribution of 4-mers:\n\n%d\n\n", tupleCount ) ;
Think about how you zeroed every element of the multidimensional array. The solution is right in front of you ;)
Sorry, but I do not see the answer. Do I have to use a for loop to print out all the occurrences?
If so, how do I define my range?
for (i = 0 ; i < ????? ; i++)
{
printf("%d",tupleCount[i]
}
????
Hmmm, your frequencies are stored in a 4-dimensional array, as you declared.
In your code (OK, maybe it's not your code) you use 4 for loops to count over every dimension and set each element to 0.
The idea is the same.
Remodel your problem to printing out all the words from 'aaaa' to 'zzzz'. How would you do it?
Imagine you have a chessboard and need to visit every square. Then imagine you have a cube made of chessboards and have to visit every square. Finally, imagine you have a hypercube of cubes made of chessboards and have to visit every square, just to get confused a bit.
Welcome to the 4th dimension!
All the above problems are equivalent to your problem.
Thanks for the help. Everything works fine. I appreciate that you do not write out the code for me; it's important that I wean myself from these forums.
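For anyone who finds this thread later, the hint amounts to reusing the four nested loops for output. A minimal sketch, assuming the "acgt" ordering from acgt_to_0123; the name print_tuple_counts and the letter labels are illustrative choices, not the original poster's final code.

```c
#include <stdio.h>

/* Sketch: print every cell of a 4x4x4x4 count table, one line per
 * 4-mer. The function name and the "acgt" label order are assumptions
 * matching acgt_to_0123 from the thread. Returns the number of lines
 * printed. */
int print_tuple_counts(FILE *out, int counts[4][4][4][4])
{
    static const char dna[] = "acgt";
    int w, x, y, z, lines = 0;

    for (w = 0; w < 4; ++w)
        for (x = 0; x < 4; ++x)
            for (y = 0; y < 4; ++y)
                for (z = 0; z < 4; ++z) {
                    fprintf(out, "%c%c%c%c = %d\n",
                            dna[w], dna[x], dna[y], dna[z],
                            counts[w][x][y][z]);
                    ++lines;
                }
    return lines;
}
```

Passing the array name alone to printf with %s or %d hands it a pointer rather than a count, which is presumably where the large negative number came from.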
I may be too late for this subject, but I hope someone can answer my question.
I just want to know: if I dynamically create a 4D array with a pointer like this
char ****pt.
How can I initialize it? Can we do it the same way as with a static declaration?
There is no difference from 2D arrays.
Search the forum; this question has already been asked several times.
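As a rough illustration of "no difference from 2D arrays", each pointer level is allocated in a loop, exactly as for char **, just two levels deeper. The helper names alloc4d and free4d are invented for this sketch, and it assumes a platform where zero-filled memory reads back as null pointers, which holds on mainstream systems but is not guaranteed by the C standard.

```c
#include <stdlib.h>

/* Hypothetical helper: release an array built by alloc4d. Tolerates a
 * partially built array, assuming unset pointer slots are NULL (they
 * come from calloc; see the all-bits-zero assumption above). */
void free4d(char ****p, size_t a, size_t b, size_t c)
{
    size_t i, j, k;

    if (p == NULL)
        return;
    for (i = 0; i < a; ++i) {
        if (p[i] == NULL)
            continue;
        for (j = 0; j < b; ++j) {
            if (p[i][j] == NULL)
                continue;
            for (k = 0; k < c; ++k)
                free(p[i][j][k]);
            free(p[i][j]);
        }
        free(p[i]);
    }
    free(p);
}

/* Hypothetical helper: build an a x b x c x d array of char reachable
 * as p[i][j][k][l], with every element zeroed by calloc. Returns NULL
 * if any allocation fails. */
char ****alloc4d(size_t a, size_t b, size_t c, size_t d)
{
    size_t i, j, k;
    char ****p = calloc(a, sizeof *p);

    if (p == NULL)
        return NULL;
    for (i = 0; i < a; ++i) {
        if ((p[i] = calloc(b, sizeof *p[i])) == NULL)
            goto fail;
        for (j = 0; j < b; ++j) {
            if ((p[i][j] = calloc(c, sizeof *p[i][j])) == NULL)
                goto fail;
            for (k = 0; k < c; ++k)
                if ((p[i][j][k] = calloc(d, sizeof *p[i][j][k])) == NULL)
                    goto fail;
        }
    }
    return p;
fail:
    free4d(p, a, b, c);
    return NULL;
}
```

Unlike a static char pt[4][4][4][4], this layout is not one contiguous block, so a single memset over the whole thing is not possible; elements are set through p[i][j][k][l] or by looping over the innermost rows.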