Thread: function to change character to integer

  1. #1
    Registered User
    Join Date
    Feb 2008
    Posts
    77

    function to change character to integer

    hello to all,

    I need a function that will change every character in my string to an integer (a to 0, c to 1, g to 2, t to 3).

    I know this sounds like a simple operation, but I am new to programming and not yet fluent with the functions.

    Thanks

  2. #2
    Dr Dipshi++ mike_g's Avatar
    Join Date
    Oct 2006
    Location
    On me hyperplane
    Posts
    1,218
    Example:
    Code:
    char ch = 'd';
    int i = ch -'a';
    You might also want to convert characters to lowercase first as well.
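
    To make the lowercase suggestion concrete, here is a minimal sketch. The helper name letter_index is made up for illustration, and it assumes the letters 'a' to 'z' have consecutive codes, which is true of ASCII but not promised by the C standard.

```c
#include <ctype.h>

/* Hypothetical helper: returns 0..25 for a letter of either case,
   or -1 for anything that is not a letter.
   Assumes 'a'..'z' are contiguous (true for ASCII). */
int letter_index(char ch)
{
    /* Cast to unsigned char first: passing a negative char value to
       the <ctype.h> functions is undefined behaviour. */
    int lower = tolower((unsigned char)ch);

    if (lower < 'a' || lower > 'z')
        return -1;
    return lower - 'a';
}
```

    With this, letter_index('d') and letter_index('D') both give 3.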

    Edit: you seem to want strange values from the characters. Unless there's some formula for it, you might want an array that holds each character's value, and then use the character's position in the alphabet as the index into that array. E.g.:
    Code:
    int map[26] = {0, 345, 1, 23445, 34, 5436, 2 /* , ... etc. */};
    char ch = 'g';
    int i = map[ch-'a'];
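
    For the a/c/g/t case from the first post, the table idea might look like the sketch below. The names base_map and base_from_map are invented for this example, -1 is an arbitrary choice for "not a base", and the code again assumes contiguous lowercase letters as in ASCII.

```c
/* Hypothetical table indexed by (letter - 'a'):
   a -> 0, c -> 1, g -> 2, t -> 3, every other letter -> -1. */
static const int base_map[26] = {
    /* a */  0, /* b */ -1, /* c */  1, /* d */ -1, /* e */ -1, /* f */ -1, /* g */  2,
    /* h */ -1, /* i */ -1, /* j */ -1, /* k */ -1, /* l */ -1, /* m */ -1,
    /* n */ -1, /* o */ -1, /* p */ -1, /* q */ -1, /* r */ -1, /* s */ -1,
    /* t */  3,
    /* u */ -1, /* v */ -1, /* w */ -1, /* x */ -1, /* y */ -1, /* z */ -1
};

/* Returns 0..3 for a, c, g, t and -1 for any other character.
   The range check keeps the index inside the 26-element table. */
int base_from_map(char ch)
{
    if (ch < 'a' || ch > 'z')
        return -1;
    return base_map[ch - 'a'];
}
```
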
    Last edited by mike_g; 03-01-2008 at 06:50 PM.

  3. #3
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    How about a macro?
    Code:
    #define DNAbase2int(c) \
        ((c) == 'a' ? 0 : \
         ((c) == 'c' ? 1 : \
          ((c) == 'g' ? 2 : 3)))
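
    Two things to keep in mind with the macro: it evaluates its argument up to three times, so something like DNAbase2int(getc(input)) may read more than one character, and it returns 3 for anything that is not a, c, or g. A function avoids both; the sketch below uses a made-up name, dna_base_to_int, and picks -1 as the "unknown" value (static inline needs C99 or later).

```c
/* Hypothetical function version of the macro above.
   The argument is evaluated exactly once, and characters other than
   a, c, g, t are reported as -1 instead of being treated as 't'. */
static inline int dna_base_to_int(int c)
{
    return c == 'a' ? 0
         : c == 'c' ? 1
         : c == 'g' ? 2
         : c == 't' ? 3
         : -1;
}
```
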
    gg

  4. #4
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    How 'bout this?
    Code:
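    #include <string.h>  /* strchr */

    int DNA2int(char c) {
        static const char *convert = "acgt";
        /* strchr() also matches the terminating '\0', so reject it explicitly. */
        const char *p = c ? strchr(convert, c) : NULL;
        return p ? (int)(p - convert) : -1;
    }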
    It could be even shorter if you don't care about error checking . . . but never mind that, because mike_g's solution is nicer.

    BTW: the other way around is easy.
    Code:
    const char *map = "acgt";
    char c = map[i];
    Or, if you like obfuscation:
    Code:
    char c = "acgt"[i];
    mike_g wrote: "you seem to want strange values from the characters."
    As Codeplug's response obscurely hinted, it's probably DNA-related.
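
    If the index might come from untrusted data, the reverse lookup probably wants a range check. A small sketch, with an invented function name and '?' as an arbitrary marker for out-of-range input:

```c
/* Hypothetical helper: maps 0..3 back to 'a', 'c', 'g', 't'.
   Returns '?' for any other index instead of reading past the string. */
char int_to_dna_base(int i)
{
    static const char bases[] = "acgt";

    if (i < 0 || i > 3)
        return '?';
    return bases[i];
}
```
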
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  5. #5
    Registered User
    Join Date
    Feb 2008
    Posts
    77
    Thanks for the help. I am sorry to say that I am new to programming and not familiar with some of the techniques you are using, so I will try to clarify what I am trying to do.

    I need to convert a->0, c->1, g->2, t->3. These are the only letters in the string (a DNA sequence).
    I need the conversion because later in the program I will use a 4-D array as a table of counts. My code is below; hopefully it helps.


    Code:
    // A program to count the occurrences of 4-mers in a nucleotide sequence.
    
    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>
    #include <stdlib.h>
    
    main( int argc, char **argv )
        {
            FILE *input ; 
            FILE *outpur ;
            char buffer[1000] ;        
            int i = 0 ;
            int c, tr ;
            int w,x,y,z ;
             
            char *seqData ;
            seqData = (char *)malloc(10000000) ;
            
           
            char t1, t2, t3, t4 ;
            int index ; 
            
            
            // Initialize 4-D array
            int tupleCount[4][4][4][4] ;
            for(w = 0 ; w < 4 ; w++ )
                for(x = 0 ; x < 4 ; x++ )
                    for(y = 0 ; y < 4 ; y++ ) 
                        for(z = 0 ; z < 4 ; z++ )
                            {
                                tupleCount[w][x][y][z] ;
                            }
            
            // Open input file to read from        
            if( ! ( input = fopen( argv[1], "r" ) ) )
                { 
                    printf( "COULD NOT OPEN FILE %s - Exit!\n", argv[1]) ; 
                    exit(1) ; 
                }        
            
            
            // Collect sequence from GenBank file
            while(fgets(buffer, 1000, input))
                            {
                            // start obtaining bases after ORIGIN
                            if(strstr(buffer, "ORIGIN")) 
                                {                                                  
                                   while((c=getc(input)) != '/' && c != EOF)
                                        {
                                        if(c >= 'a' && c <= 'z')
                                            {
                                            seqData[i++] = c ;
                                            }
                                        }                            
                                }           
                            }
                            
            // Convert a,c,g,t to 0,1,2,3                
          /*  while(tr=getc(seqData) != EOF)
                {
                            
                } */
            
            
            
            // Scan DNA sequence for each 4-mer 
            for( index = 0 ; index < strlen(seqData) - 3 ; ++index )
                {
                    t1 = seqData[index] ;
                    t2 = seqData[index + 1] ;
                    t3 = seqData[index + 2] ;
                    t4 = seqData[index + 3] ;
                    
                    
                    // Accumulate a count to find distribution
                    ++tupleCount[t1][t2][t3][t4] ;
                    
                }
                
        
             
             
                       
            fclose(input) ;
        
            printf("Here is the distribution of 4-mers:\n\n%s", tupleCount ) ;     
            free(seqData) ; 
            return(0) ;
        }
    Thanks

  6. #6
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Casting malloc() is usually a bad idea in C -- see the FAQ.
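
    As a rough illustration of the uncast form, with the return value checked as well (the helper name alloc_sequence_buffer is made up for this example):

```c
#include <stdlib.h>

/* Hypothetical helper: allocates `size` bytes for sequence data and
   makes the buffer an empty string. Returns NULL if size is 0 or the
   allocation fails. The caller must free() the result. */
char *alloc_sequence_buffer(size_t size)
{
    char *buf;

    if (size == 0)
        return NULL;

    /* No cast needed: in C, void * converts to char * implicitly. */
    buf = malloc(size);
    if (buf == NULL)
        return NULL;

    buf[0] = '\0';
    return buf;
}
```

    In your program that would be something like seqData = alloc_sequence_buffer(10000000), followed by a check for NULL before the buffer is used.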

    Code:
    tupleCount[w][x][y][z] ;
    Obviously, this does nothing. Presumably you want to initialize that element of the array to something, perhaps 0.
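
    One way to zero the whole array is to write "= 0" inside the loops; another is to initialize it at the declaration with int tupleCount[4][4][4][4] = {0};. If it has to be reset later, a helper along these lines would also work (the name clear_tuple_counts is just for illustration):

```c
#include <string.h>

/* Hypothetical helper: sets all 4*4*4*4 = 256 counters to zero.
   sizeof is applied to the full array type, because inside the
   function `counts` is only a pointer to the first sub-array. */
void clear_tuple_counts(int counts[4][4][4][4])
{
    memset(counts, 0, sizeof(int[4][4][4][4]));
}
```
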

    Code:
    seqData[i++] = c ;
    Here's where you want to do the conversion. Well, let's start off with a simple way of doing it. Here's what you said:
    "I need to convert a->0, c->1, g->2, t->3."
    Okay:
    Code:
    if(c == 'a') seqData[i++] = 0;
    else if(c == 'c') seqData[i++] = 1;
    else if(c == 'g') seqData[i++] = 2;
    else if(c == 't') seqData[i++] = 3;
    else /* error */;
    But wait, that's a lot of "seqData[i++]"'s. Maybe it would be better to do something like this:
    Code:
    int n;
    if(c == 'a') n = 0;
    else if(c == 'c') n = 1;
    else if(c == 'g') n = 2;
    else if(c == 't') n = 3;
    else n = -1;  /* error */
    seqData[i++] = n;
    But that could also be a switch statement:
    Code:
    switch(c) {
        case 'a': n = 0; break;
        case 'c': n = 1; break;
        case 'g': n = 2; break;
        case 't': n = 3; break;
        default: n = -1; break;  /* error */
    }
    seqData[i++] = n;
    There are so many ways to do it . . . .
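
    Putting the pieces together, the counting step might end up looking like the sketch below. It is only one possible arrangement, not a drop-in fix: the function names are invented, it works on a buffer that still holds the letters rather than on your GenBank file, and it skips any 4-letter window containing something other than a, c, g, t (so a -1 is never used as an array index). It also sidesteps three problems in the posted loop: indexing tupleCount with raw character codes ('a' is 97 in ASCII), calling strlen() on a buffer that was never null-terminated, and printing an int array with %s.

```c
#include <stddef.h>
#include <stdio.h>

/* Map one base to its index; -1 flags anything that is not a, c, g or t. */
static int base_to_int(int c)
{
    switch (c) {
        case 'a': return 0;
        case 'c': return 1;
        case 'g': return 2;
        case 't': return 3;
        default:  return -1;
    }
}

/* Hypothetical helper: counts every overlapping 4-mer in seq[0..len-1].
   Windows containing a character other than a/c/g/t are skipped.
   The caller must zero `counts` beforehand.
   Returns the number of windows that were counted. */
size_t count_4mers(const char *seq, size_t len, int counts[4][4][4][4])
{
    size_t counted = 0;
    size_t i;

    /* "i + 3 < len" avoids the unsigned wrap-around that "len - 3"
       would produce for sequences shorter than 3 characters. */
    for (i = 0; i + 3 < len; ++i) {
        int w = base_to_int(seq[i]);
        int x = base_to_int(seq[i + 1]);
        int y = base_to_int(seq[i + 2]);
        int z = base_to_int(seq[i + 3]);

        if (w < 0 || x < 0 || y < 0 || z < 0)
            continue;   /* never index the array with -1 */

        ++counts[w][x][y][z];
        ++counted;
    }
    return counted;
}

/* Print only the 4-mers that occurred, one "kmer count" pair per line. */
void print_4mer_counts(FILE *out, int counts[4][4][4][4])
{
    static const char bases[] = "acgt";
    int w, x, y, z;

    for (w = 0; w < 4; ++w)
        for (x = 0; x < 4; ++x)
            for (y = 0; y < 4; ++y)
                for (z = 0; z < 4; ++z)
                    if (counts[w][x][y][z] > 0)
                        fprintf(out, "%c%c%c%c %d\n",
                                bases[w], bases[x], bases[y], bases[z],
                                counts[w][x][y][z]);
}
```

    In your main() the calls would be roughly count_4mers(seqData, i, tupleCount) after the reading loop, where i is the number of characters stored, and then print_4mer_counts(stdout, tupleCount).
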
    dwk

