Hello to all,
I need a function that will change all characters in my string to integers (a to 0, c to 1, g to 2, t to 3).
I know this sounds like a simple operation, but I am new to programming and not yet fluent with the functions.
Thanks
Example:
Code:
char ch = 'd';
int i = ch - 'a';
You might also want to convert characters to lowercase first as well.
Edit: you seem to want strange values from the characters. Unless there's some formula for it, you might want to have an array holding each character's value, then use the character's position in the alphabet as the index into that array. E.g.:
Code:
int map[26] = {0, 345, 1, 23445, 34, 5436, 2, ..... etc};
char ch = 'g';
int i = map[ch-'a'];
Last edited by mike_g; 03-01-2008 at 06:50 PM.
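For the DNA case specifically, the table idea above could be sketched roughly like this. The function name `base_to_index` and the "store index + 1" trick are my own assumptions, not something from the post; the table is indexed by the raw character code rather than by `ch - 'a'`, so anything that is not a, c, g, or t comes back as -1. It relies on C99 designated initializers.

```c
#include <ctype.h>

/* Hypothetical helper: maps a DNA base (either case) to 0..3,
   or returns -1 for any other character. */
int base_to_index(int ch)
{
    /* Entries hold index + 1, so every unlisted character defaults to 0
       and can be reported as "not a base". */
    static const unsigned char table[256] = {
        ['a'] = 1, ['c'] = 2, ['g'] = 3, ['t'] = 4
    };

    if (ch < 0 || ch > 255)
        return -1;                      /* outside the table (e.g. EOF) */
    return (int)table[tolower(ch)] - 1; /* 0..3, or -1 if unlisted */
}
```

Since it takes an int, it can be fed directly from getc(), as long as EOF is checked or allowed to fall through to -1.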
How 'bout a macro?
Code:
#define DNAbase2int(c) \
    ((c) == 'a' ? 0 : \
    ((c) == 'c' ? 1 : \
    ((c) == 'g' ? 2 : 3)))
gg
How 'bout this?
Code:
#include <string.h>

int DNA2int(char c)
{
    static const char *convert = "acgt";
    const char *p = strchr(convert, c);
    return p ? p-convert : -1;
}
It could be even shorter if you don't care about error checking . . . but never mind that, because mike_g's solution is nicer.
BTW: the other way around is easy.
Code:
const char *map = "acgt";
char c = map[i];
Or, if you like obfuscation:
Code:
char c = "acgt"[i];
"Edit: you seem to want strange values from the characters."
As Codeplug's response obscurely hinted, it's probably DNA-related.
dwk
Seek and ye shall find. quaere et invenies.
"Simplicity does not precede complexity, but follows it." -- Alan Perlis
"Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
"The only real mistake is the one from which we learn nothing." -- John Powell
Other boards: DaniWeb, TPS
Unofficial Wiki FAQ: cpwiki.sf.net
My website: http://dwks.theprogrammingsite.com/
Projects: codeform, xuni, atlantis, nort, etc.
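One edge case with the strchr() approach: the terminating '\0' counts as part of the string being searched, so passing '\0' finds the terminator and yields 4 rather than -1. A minimal sketch of both directions with that case guarded might look like the following; the names `dna_to_int` and `int_to_dna` and the '?' placeholder are assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical variant of DNA2int that rejects '\0' explicitly. */
int dna_to_int(char c)
{
    static const char convert[] = "acgt";
    const char *p;

    if (c == '\0')
        return -1;              /* strchr would match the terminator */
    p = strchr(convert, c);
    return p ? (int)(p - convert) : -1;
}

/* Reverse direction: 0..3 -> 'a','c','g','t'; '?' for anything else. */
char int_to_dna(int i)
{
    return (i >= 0 && i < 4) ? "acgt"[i] : '?';
}
```

Round-tripping a valid base, e.g. int_to_dna(dna_to_int('t')), gives back the same character.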
Thanks for the help. I am sorry to say this, but I am new to programming and am not familiar with some of the techniques you are using. I will try to clarify what I am trying to do.
I need to convert a->0, c->1, g->2, t->3. These are the only letters in the string (a DNA sequence).
I need to convert them because later in the program I will be using a 4-D array as a chart to collect counts. Below is my code; hopefully it helps.
Code:
// A Program to count the count of 4-mers in a nucleotide sequence.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>

main( int argc, char **argv )
{
    FILE *input ;
    FILE *outpur ;
    char buffer[1000] ;
    int i = 0 ;
    int c, tr ;
    int w,x,y,z ;
    char *seqData ;
    seqData = (char *)malloc(10000000) ;
    char t1, t2, t3, t4 ;
    int index ;

    // Initialize 4-D array
    int tupleCount[4][4][4][4] ;
    for(w = 0 ; w < 4 ; w++ )
        for(x = 0 ; x < 4 ; x++ )
            for(y = 0 ; y < 4 ; y++ )
                for(z = 0 ; z < 4 ; z++ )
                {
                    tupleCount[w][x][y][z] ;
                }

    // Open input file to read from
    if( ! ( input = fopen( argv[1], "r" ) ) )
    {
        printf( "COULD NOT OPEN FILE %s - Exit!\n", argv[1]) ;
        exit(1) ;
    }

    // Collect sequence from GenBank file
    while(fgets(buffer, 1000, input))
    {
        // start obtaining bases after ORIGIN
        if(strstr(buffer, "ORIGIN"))
        {
            while((c=getc(input)) != '/' && c != EOF)
            {
                if(c >= 'a' && c <= 'z')
                {
                    seqData[i++] = c ;
                }
            }
        }
    }

    // Convert a,c,g,t to 0,1,2,3
    /*
    while(tr=getc(seqData) != EOF)
    {
    }
    */

    // Scan DNA sequence for each 4-mer
    for( index = 0 ; index < strlen(seqData) - 3 ; ++index )
    {
        t1 = seqData[index] ;
        t2 = seqData[index + 1] ;
        t3 = seqData[index + 2] ;
        t4 = seqData[index + 3] ;

        // Accumulate a count to find distribution
        ++tupleCount[t1][t2][t3][t4] ;
    }

    fclose(input) ;
    printf("Here is the distribution of 4-mers:\n\n%s", tupleCount ) ;
    free(seqData) ;
    return(0) ;
}
Thanks
Casting malloc() is usually a bad idea in C -- see the FAQ.
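As a rough illustration of the point about malloc(), here is the allocation written without the cast and with a check for failure. The helper name `alloc_sequence` is hypothetical; the original program simply calls malloc(10000000) inline.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a buffer for up to `capacity` bases.
   Returns NULL (after printing a message) if the allocation fails. */
char *alloc_sequence(size_t capacity)
{
    /* No cast needed: in C, void * converts to char * implicitly. */
    char *seq = malloc(capacity);

    if (seq == NULL)
        fprintf(stderr, "could not allocate %lu bytes\n",
                (unsigned long)capacity);
    return seq;
}
```

The caller is still responsible for calling free() on the result.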
Code:
tupleCount[w][x][y][z] ;
Obviously, this does nothing. Presumably you want to initialize that element of the array to something, perhaps 0.
Code:
seqData[i++] = c ;
Here's where you want to do the conversion. Well, let's start off with a simple way of doing it. Here's what you said:
"I need to convert a->0, c->1, g->2, t->3."
Okay:
Code:
if(c == 'a') seqData[i++] = 0;
else if(c == 'c') seqData[i++] = 1;
else if(c == 'g') seqData[i++] = 2;
else if(c == 't') seqData[i++] = 3;
else /* error */;
But wait, that's a lot of "seqData[i++]"'s. Maybe it would be better to do something like this:
Code:
int n;
if(c == 'a') n = 0;
else if(c == 'c') n = 1;
else if(c == 'g') n = 2;
else if(c == 't') n = 3;
else n = -1; /* error */
seqData[i++] = n;
But that could also be a switch statement:
Code:
switch(c) {
case 'a': n = 0; break;
case 'c': n = 1; break;
case 'g': n = 2; break;
case 't': n = 3; break;
default: n = -1; break; /* error */
}
seqData[i++] = n;
There are so many ways to do it . . . .
dwk
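Putting the pieces together, the counting step might be sketched as below. This is only one possible arrangement, and the names `base_code`, `count_4mers`, and `print_4mers` are made up for the example. Two things it does differently from the posted program, both on purpose: the length is passed in explicitly, because once 'a' is stored as 0 it is indistinguishable from a string terminator and strlen() would stop at the first one; and the counts are printed one per line, since a "%s" conversion cannot print an array of integers.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: 'a','c','g','t' -> 0..3, anything else -> -1. */
static int base_code(char c)
{
    switch (c) {
    case 'a': return 0;
    case 'c': return 1;
    case 'g': return 2;
    case 't': return 3;
    default:  return -1;
    }
}

/* Count every 4-mer in seq[0..len-1]. Windows containing a character
   other than a/c/g/t are skipped. Returns the number of windows counted.
   The caller must zero `counts` beforehand. */
size_t count_4mers(const char *seq, size_t len,
                   unsigned long counts[4][4][4][4])
{
    size_t i, counted = 0;

    for (i = 0; i + 3 < len; ++i) {   /* i + 3 < len avoids unsigned wrap */
        int a = base_code(seq[i]);
        int b = base_code(seq[i + 1]);
        int c = base_code(seq[i + 2]);
        int d = base_code(seq[i + 3]);

        if (a < 0 || b < 0 || c < 0 || d < 0)
            continue;
        ++counts[a][b][c][d];
        ++counted;
    }
    return counted;
}

/* Print one line per 4-mer: the four letters, then the count. */
void print_4mers(unsigned long counts[4][4][4][4])
{
    static const char bases[] = "acgt";
    int w, x, y, z;

    for (w = 0; w < 4; w++)
        for (x = 0; x < 4; x++)
            for (y = 0; y < 4; y++)
                for (z = 0; z < 4; z++)
                    printf("%c%c%c%c %lu\n", bases[w], bases[x],
                           bases[y], bases[z], counts[w][x][y][z]);
}
```

A caller would declare the table as `unsigned long counts[4][4][4][4] = {0};`, which zeroes every element, then call count_4mers() followed by print_4mers().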