Originally Posted by franziss
Thanks Nyda! That's a very interesting idea! I will try to work on that. But how do you read a 32-bit character (assuming each character is a long word) and convert it into 3 bits? Does C allow us to do that?
I don't know what your data looks like. If it's a string of characters like
then each character is 8 bits long, so that sequence would take up 4 bytes of memory. Representing the bases in binary instead could look like this:
Code:
bin  dec  rep
000  0    Adenine
001  1    Cytosine
010  2    Thymine
011  3    Guanine
100  4    Uracil (???)
Since you only mention DNA in your post above, do you need Uracil at all? If you don't, 2 bits per base are enough, as you can see above (codes 0 to 3). This would make bit extraction a *lot* easier: exactly 4 bases fit into one byte, whereas 3-bit codes would keep straddling byte boundaries, since 8 is not a multiple of 3.
Now all you need is a basic understanding of:
- how decimal numbers are represented in binary
- C's bitwise operators (&, |, ~, <<, >>)
- strings/arrays
To access a 2-bit base within an array of bytes, you'd write a few functions with prototypes like these:
Code:
char GetBase(char *sequence, int index);             /* read the base at index as a letter */
void SetBase(char *sequence, int index, char base);  /* store the letter base at index */
GetBase(myDNA, 0) would retrieve the first 2 bits of the first byte and convert them to a character. GetBase(myDNA, 5) would retrieve the second 2 bits of the second byte, and so on. Your DNA would be stored as a bit sequence, but you could access it almost as if it were an array of characters. SetBase would look up the bit pattern chosen for the supplied base and store it in byte (index/4), at bits (index%4)*2 and (index%4)*2 + 1.
Since I'm making a lot of assumptions here, I won't go into more detail. For a start, search this board or Google for bitwise operations. You could also post more details and some code.