1. ## Generating DNA sequence

I'm writing a program that generates the DNA sequence given a base number.

For example, if I give it base 1, the sequence would be: a, c, g, t. If I give it base 2, the sequence would be: aa, ac, ag, at, ca, cc, cg, ct, ga, gc, gg, gt, ta, tc, tg, tt. Etc for other bases....

I'm thinking the prototype could be something like: void generate (int base);

Can anyone give me some ideas on how to approach this problem?

2. I would be tempted to use the base number as the subscript number for the sequence, in an array of strings.
Code:
```const int dnaNum = DNA number;
const int maxdnaLength = N;

char dna[dnaNum][maxdnaLength] = {
{ '\0' }, //base number 0 doesn't exist, so this row wouldn't be used
{ "acgt" },  //base number 1 is in subscript array row 1
{ "aaacagatcacccgctgagcgggttatctgtt" }, //base number 2 in array row 2
//etc.
};```

I would be tempted to use the base number as the subscript number for the sequence, in an array of strings.
Code:
```const int dnaNum = DNA number;
const int maxdnaLength = N;

char dna[dnaNum][maxdnaLength] = {
{ '\0' }, //base number 0 doesn't exist, so this row wouldn't be used
{ "acgt" },  //base number 1 is in subscript array row 1
{ "aaacagatcacccgctgagcgggttatctgtt" }, //base number 2 in array row 2
//etc.
};```
Well, I want the program to use an algorithm to produce the sequence, instead of me manually input all the sequence. Imagine base 15, that would produce ~1 billion different sequences. That's impossible to manually write them.

4. sounds like recursion to me

5. I Googled for it, and checked Wikipedia, but was unable to find an algorithm to use. Do you have a pattern that we can follow to generate these long sequences?

6. Well, the pattern is...

For base 2:
First letter is A, then match with second letter ACGT (you'd get, AA, AC, AG, AT). Then first letter change to C, and match with second letter ACGT (you'd get CA, CC, CG, CT). Then first letter change to G, and match with second letter ACGT (you'd get GA, GC, GG, GT). Finally first letter change to T, and match with second letter ACGT (you'd get TA, TC, TG, TT).

For base 3:
First letter is A, second letter is A, match with third letter ACGT (you'd get AAA, AAC, AAG, AAT). First letter stay A, second letter change to C, match with third letter ACGT (you'd get ACA, ACC, ACG, ACT). First letter would change to C only after second letter finish with T. And so on....for a total of 64 different occurrences.

And I'd like to generate for up to base 15.

7. at first i thought of string manipulation, but now i think you could use nested loops

8. Yes, nested loops are the ticket.

Code:
```for(letr = 'A'; letr < 'U'; letr++)  {
//skip non base letters
if(letr == 'H')
letr = 'T';

if(letr != 'A' && letr != 'C' && letr != 'G && letr != 'T")
continue;

for(letr2 = 'A'; letr2 < 'U'; letr2++)  {
//skip non base letters
if(letr == 'H')
letr = 'T';
if(letr != 'A' && letr != 'C' && letr != 'G && letr != 'T")
continue;

printf("%c%c  ", letr, letr2);
}
}```
This is just rough pseudo-code, not ready to run code, and may be too naive to handle this problem.

I have used this type of nested loop to handle 81 squares on Sudoku, and generate Trillions and Trillions (and much more than that), grids, so it is both robust and fast, but every problem is different.

Yes, nested loops are the ticket.

Code:
```for(letr = 'A'; letr < 'U'; letr++)  {
//skip non base letters
if(letr == 'H')
letr = 'T';

if(letr != 'A' && letr != 'C' && letr != 'G && letr != 'T")
continue;

for(letr2 = 'A'; letr2 < 'U'; letr2++)  {
//skip non base letters
if(letr == 'H')
letr = 'T';
if(letr != 'A' && letr != 'C' && letr != 'G && letr != 'T")
continue;

printf("%c%c  ", letr, letr2);
}
}```
This is just rough pseudo-code, not ready to run code, and may be too naive to handle this problem.

I have used this type of nested loop to handle 81 squares on Sudoku, and generate Trillions and Trillions (and much more than that), grids, so it is both robust and fast, but every problem is different.
i would have made letr a pointer to an array holding 'A' 'C' 'G' and 'T' for simplicity

10. yes, that's a very sensible idea.

Carry on d_d!

11. Ok here's what I got so far.

This is only for base 2.

Code:
```#include <stdio.h>
#include <ctype.h>
#include <string.h>

// test generate base 2
int main() {

char base[16][3];
char temp[3];
char choice[] = "acgt";

int i, j, k = 0;

for (i = 0; i < 4; i++) {
for (j = 0; j < 4; j++) {
sprintf(temp, "%c%c\0\n", choice[i], choice[j]);    //save the string to temp
strcpy(base[k],temp);            // save each occurrence
k++;
}
}

for (i = 0; i < 16; i++)
printf("%s \n", base[i]);

return 0;
}```
And the output is:
Code:
```aa
ac
ag
at
ca
cc
cg
ct
ga
gc
gg
gt
ta
tc
tg
tt```
Note: the reason why I saved the occurrences permanently in an array (base) instead of printing it right away (from temp) is because I'm planning on doing something to each of the occurrence later.

The problem is, for each base number, the char base[16][3] and char temp[3] changes in number (ie, for base 3, it would have to be char base[64][4] and char temp[4]. And, the more bases, the more for loops (ie. base 2 has 2 for loops, base 3 would have 3 for loops, etc.).

So,
1) How should I make it a separate method instead of residing inside main()? What should the prototype method be?
2) How should I make the method works for any base number (without repeatedly typing the same codes 15 times and only adding an additional for loop for each base)?

P.S. I plan on avoiding continue, skip, break, etc. They're just not my style.

12. put the for loops inside an if statement?

Carry on d_d!
ಠ_ಠ

13. I would take a different approach.

I would first calculate the amount of space the sequence requires - with base 1, its 4, otherwise 4^(base-1), right?

Then I would calculate the width of "one subsecuence". Eg with base 1, it is 1 (a or c or g or t). with base 2, it is 2 (aa or ac or ...)

And finally amount of the "subsecuences". For example with allocated space/width of sequence.

Then I would start filling up the space.

first I'd write a<empty for rest of subsec>c<empty for rest of subsec>g<empty for rest of subsec>t<empty for rest of subsec>a<empty for rest of subsec>c<empty for rest of subsec>....
then a<empty for rest of subsec>a<empty for rest of subsec>a<empty for rest of subsec>a<empty for rest of subsec>c<empty for rest of subsec>c<empty for rest of subsec>c<empty for rest of subsec>c<empty for rest of subsec> ...

just increasing the amount of repeated writes of same letter for each time - untill the whole area was covered. All of changing variables can be calculated from base. You can build all the loops in function, and then just decide how many loops you go through before breaking out depending on the base.

14. And later you could optimize required memory, by using bit shifts and masks to use only 2 bits to mark one character. (Naturally encoding & decoding will slow down the process, but storing handled data will be more efficient.)

15. you can do it like this 2
Code:
```void base_genrator(int num)
{
switch(num){
case 2:
binary_base
codegoeshere
break;
case 7:
hex_base
codegoeshere;
break;
case 9:
decimal_base;
codegoeshere;
break;}```
you specify the base in your main program and the write the code in the switch .