I need help on how i am going to implement this.
i have a sequence of letters such as ATGCATA
and i want to find the biggest number of repeats which in this case woud be :AT
help would be really appreciative
thanks
I need help on how i am going to implement this.
i have a sequence of letters such as ATGCATA
and i want to find the biggest number of repeats which in this case woud be :AT
help would be really appreciative
thanks
Could you define the problem a little more? For instance, is a letter allowed to repeat itself? Are you looking for groups of 2, or groups of n? Are there only certain letters in the sequence?
If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein
thank you for the quick reply,
sorry for the lack of information, a letter is allowed to repeat itself. Im looking for the maximal repeat in a sequence.There are only 4 letters in the sequence.
ATGCATA : so the most repeated would be AT
ATGCGTGCG: the most repeated would be TGCG
I hope this makes sense
Would you allow substrings of 1? For instance, in ACGTCATGAGCTA, would A be the most repeated?
If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein
i have already done a program which does substrings of 1,but this program would be "n" substring.
In this case though ACGTCATGAGCTA , the maximal repeat is A. As there are no other repeats.
Well, break the program down into pieces. It seems the first step might be to break down the input string into a bunch of string groupings from 1 to 4 characters in length. For that you could use a set:
Should output:Code:set<string> strSet; string data("ATGCATA"); // Find all unique/different 1-4 character groupings from input string. for( string::iterator it = data.begin(); it != data.end(); ++it ) for( int i = 1; i <= 4 && i <= distance(it,data.end()); ++i ) strSet.insert(string(it,i)); // Output all the groupings to the console. copy(strSet.begin(),strSet.end(),ostream_iterator<string>(cout,"\n"));
There, easy isn't it?Code:A AT ATA ATG ATGC C CA CAT CATA G GC GCA GCAT T TA TG TGC TGCA
There is more to do of course. Now you would need to find and count how many of each of those above groupings there are back in the original input string and go from there but of course you can do that with your wonderfull coding skills correct? I leave that up to you!
Last edited by hk_mp5kpdw; 01-14-2005 at 10:39 AM. Reason: Changed int i = 0 to int i = 1
"Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
-Christopher Hitchens
hi,thanks for the help and time to write that sample code.but i have already done a program similar to that, basically calculating the number of occurences from the string..
how i would find a repeat of n length?
if the samplestring was big like:
ATGTACAATGTACA - and the biggest repeat was :ATGTACA
Wow hk_mp5kpdw, that was pretty awesome. I wish I knew STL a bit better, because that just cut the code I was using by 75%.
jodders, you can search for substrings of n length by changing the second for-loop: instead of looping for up to 4 characters, you want to loop to the length of the string. By using a map<string,int> instead of a set<string>, you can keep track of how many times you find each substring. Then it's just a matter of searching back through your map for the largest number of hits.
If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein
so how would you loop to the end of the string, instead of looping 4 characters.
everytime i try and compile, i get set undeclared, and strSet undeclared
Last edited by jodders; 01-14-2005 at 11:40 AM.
Change "i <= 4" to "i <= data.length()".Originally Posted by jodders
Have you included the appropriate headers <string>, and <set> (or <map> if using pianorain's suggestion) [also <algorithm> and possibly <iterator> if using the copy function]? How are you handling the namespace issue? All of these things: strings, sets, etc... are all in the std namespace.Originally Posted by jodders
Last edited by hk_mp5kpdw; 01-14-2005 at 01:10 PM.
"Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
-Christopher Hitchens
trying to identify genes or something?Originally Posted by jodders
Code:#include <cmath> #include <complex> bool euler_flip(bool value) { return std::pow ( std::complex<float>(std::exp(1.0)), std::complex<float>(0, 1) * std::complex<float>(std::atan(1.0) *(1 << (value + 2))) ).real() < 0; }
What exactly are you trying to do?
Here's a algorithm that determines the size of the largest repeating substring in a given string. It doesn't identify what or where the duplicate substrings are, though I believe the techniques it uses could be used to do so.
For a string of size s, there will be z = y + 1 substrings of length n = s - y where y < s. That is; z = s - n + 1.
Code:string input = "ATGTACAATGTACA"; int s = input.length(); string substring; int n; // length of substring int max = 1; //length of largest duplicate substring in input. int z; //number of substrings of length n expected int begin; //where to start substring int i, index; // set<string> substrings;//container to hold all unique substrings for(n = 2; n < s; ++n) //length of substring { for(begin = 0; begin <= s - n; ++begin) //where to start each substring { index = begin; //generate substring of length n starting at index = begin for(i = 0; i < n; ++i) substring += input[index++]; //insert substring into substrings substrings.insert(substring); substring = ""; } z = s - n + 1; //if there are no duplicates, then size of substrings will be equal to z. if(substrings.size() == z) { //maximum length of duplicated sequence in input string is n - 1; max = n - 1; break; } else { //duplicate found, so look as next larger n substrings.clear(); //clear set of substrings max = n; } } cout << max << endl;
Last edited by elad; 01-14-2005 at 03:00 PM.
You're only born perfect.
hi guys,i am very very appreciative with the help you have given me.Iam still learning c++ by myself and have found your tips useful. I have found "hk_mp5kpdw " tips outstanding and also pianorain.
I think i will use pianorain's map<string,int> instead of a set<string> to keep track of the substring.
I have got hk_mp5kpdw snippet of code to work,and been messing about trying to use implement the map function inside it, but i get another compilation error. Sorry for my ignorance beforehand! my compiler just goes mad when i change the map<string,int> instead of a set<string>,. I have added the <map> header as told. Am i missing something else?
Once again,thanks for this help.It has been really useful
All other things being the same, the insert for the map<string,int> object should probably look like:
If using the map, you would probably find it easier to change the copy to a more basic for loop.Code:variablename[string(it,i)]++;
Post your code you are using (as much as you possibly can) and any specific errors you get.
"Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
-Christopher Hitchens
thanks hk_mp5kpdw for your reply,pm;y two errors now!
error:Code:int main() { int variable; map<string,int> strSet; string data("ATGCATAAAAAAAACATGATGACTAGATGACATACGATCGACTGACTGACTGACTGACTGACGAC"); for( string::iterator it = data.begin(); it != data.end(); ++it ) for( int i = 1; i <= data.length() && i <= distance(it,data.end()); ++i ) strSet.insert(variable[string(it,i)]++); for(strSet.begin(),strSet.end(),ostream_iterator<string>(cout,"\n")); }
my own problem is declaring the datatype for my variable. (could you explain wat im doing wrong so i wont make the same mistake in future )
parse error before ')' for loop line