I don't know how your data looks. If it's a string of characters like
Originally Posted by franziss
then each character is 8 bits long, so that sequence would take up 4 bytes of memory. Representing the bases in binary instead could look like this:
bin dec rep
000 0 Adenine
001 1 Cytosine
010 2 Thymine
011 3 Guanine
100 4 Uracil (???)
Since you are only talking about DNA in your post above, do you need Uracil at all? If you don't, you can get by with 2 bits per base, since the first four codes above only use the lower two bits. That would make bit extraction a *lot* easier, and you could fit exactly 4 bases into one byte.
Now all you need is a basic understanding of
- how decimal numbers are represented in the binary system
- the C bitwise operators
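To make those two points concrete, here is a minimal sketch of shifting and masking on a single byte that holds four 2-bit codes. The names pack4 and extract2 are made up for this illustration, and putting the first code in the lowest two bits is an assumption; the opposite order would work just as well as long as you stay consistent.

```c
#include <assert.h>

/* Pack four 2-bit codes (each 0..3) into one byte.
   Assumption: the first code goes into the lowest two bits. */
unsigned char pack4(unsigned c0, unsigned c1, unsigned c2, unsigned c3)
{
    assert(c0 < 4 && c1 < 4 && c2 < 4 && c3 < 4);
    return (unsigned char)(c0 | (c1 << 2) | (c2 << 4) | (c3 << 6));
}

/* Extract the 2-bit code at position pos (0..3) from a packed byte. */
unsigned extract2(unsigned char byte, unsigned pos)
{
    assert(pos < 4);
    /* Shift the wanted pair down to bits 0-1, then mask off everything else. */
    return (byte >> (pos * 2)) & 0x3u;
}
```

With the codes 0, 1, 2, 3 packed this way the byte is 11100100 in binary (0xE4), and extract2 gives each code back by position.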
To access a 2-bit base within an array of bytes, you'd have to write a few functions with prototypes like this:
char GetBase(char *sequence, int index);
void SetBase(char *sequence, int index, char base);
Counting indices from 0, GetBase(myDNA, 0) would retrieve the first 2 bits of the first byte and convert them to a character. GetBase(myDNA, 5) would retrieve the second 2 bits of the second byte, and so on. Your DNA would be stored as a bit sequence, but you could access it almost as if it were an array of characters. SetBase would look up the bitwise representation chosen for the supplied base and store it in the array at byte (index/4), bits (index%4)*2 and (index%4)*2 + 1.
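One way these two functions could be filled in is sketched below. It is only a sketch under a few assumptions: Uracil is dropped so 2 bits per base are enough, the codes follow the table above (A=0, C=1, T=2, G=3), index counts from 0 to match the index/4 and index%4 arithmetic, and the first base of each byte sits in the lowest two bits. BASE_CHARS and BaseToCode are hypothetical helpers, and there is no bounds checking on the array.

```c
#include <assert.h>

/* 2-bit codes in table order: A=0, C=1, T=2, G=3 (assumes no Uracil). */
static const char BASE_CHARS[4] = { 'A', 'C', 'T', 'G' };

/* Hypothetical helper: map a base character to its 2-bit code, -1 if unknown. */
static int BaseToCode(char base)
{
    int code;
    for (code = 0; code < 4; code++) {
        if (BASE_CHARS[code] == base)
            return code;
    }
    return -1;
}

/* Return the base stored at 0-based position index. */
char GetBase(char *sequence, int index)
{
    unsigned char byte;
    int shift;

    assert(index >= 0);
    byte = (unsigned char)sequence[index / 4];  /* byte holding this base */
    shift = (index % 4) * 2;                    /* offset of its two bits */
    return BASE_CHARS[(byte >> shift) & 0x3];
}

/* Store base at 0-based position index, leaving the other bases untouched. */
void SetBase(char *sequence, int index, char base)
{
    int code = BaseToCode(base);
    int shift = (index % 4) * 2;
    unsigned char byte;

    assert(index >= 0 && code >= 0);
    byte = (unsigned char)sequence[index / 4];
    byte = (unsigned char)(byte & ~(0x3 << shift)); /* clear the two target bits */
    byte = (unsigned char)(byte | (code << shift)); /* write the new code */
    sequence[index / 4] = (char)byte;
}
```

A sequence of n bases needs (n + 3) / 4 bytes, and the buffer should be zeroed before the first SetBase call so the unused bits have a known value.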
Since I'm making a lot of assumptions here, I won't go into any more detail. For a start, search this board or Google for bitwise operations. You could also post more details and some code.