Thread: repeats in a string

  1. #1
    Registered User
    Join Date
    Jan 2005
    Posts
    18

    repeats in a string

    I need help on how i am going to implement this.

    i have a sequence of letters such as ATGCATA

    and i want to find the biggest number of repeats which in this case woud be :AT

    help would be really appreciative

    thanks

  2. #2
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    Could you define the problem a little more? For instance, is a letter allowed to repeat itself? Are you looking for groups of 2, or groups of n? Are there only certain letters in the sequence?
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

  3. #3
    Registered User
    Join Date
    Jan 2005
    Posts
    18
    thank you for the quick reply,

    sorry for the lack of information, a letter is allowed to repeat itself. Im looking for the maximal repeat in a sequence.There are only 4 letters in the sequence.

    ATGCATA : so the most repeated would be AT

    ATGCGTGCG: the most repeated would be TGCG

    I hope this makes sense

  4. #4
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    Would you allow substrings of 1? For instance, in ACGTCATGAGCTA, would A be the most repeated?
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

  5. #5
    Registered User
    Join Date
    Jan 2005
    Posts
    18
    i have already done a program which does substrings of 1,but this program would be "n" substring.

    In this case though ACGTCATGAGCTA , the maximal repeat is A. As there are no other repeats.

  6. #6
    Registered User hk_mp5kpdw's Avatar
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    Well, break the program down into pieces. It seems the first step might be to break down the input string into a bunch of string groupings from 1 to 4 characters in length. For that you could use a set:

    Code:
    set<string> strSet;
    string data("ATGCATA");
    
    // Find all unique/different 1-4 character groupings from input string.
    for( string::iterator it = data.begin(); it != data.end(); ++it )
        for( int i = 1; i <= 4 && i <= distance(it,data.end()); ++i )
            strSet.insert(string(it,i));
    
    // Output all the groupings to the console.
    copy(strSet.begin(),strSet.end(),ostream_iterator<string>(cout,"\n"));
    Should output:

    Code:
    A
    AT
    ATA
    ATG
    ATGC
    C
    CA
    CAT
    CATA
    G
    GC
    GCA
    GCAT
    T
    TA
    TG
    TGC
    TGCA
    There, easy isn't it?

    There is more to do of course. Now you would need to find and count how many of each of those above groupings there are back in the original input string and go from there but of course you can do that with your wonderfull coding skills correct? I leave that up to you!
    Last edited by hk_mp5kpdw; 01-14-2005 at 10:39 AM. Reason: Changed int i = 0 to int i = 1
    "Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
    -Christopher Hitchens

  7. #7
    Registered User
    Join Date
    Jan 2005
    Posts
    18
    hi,thanks for the help and time to write that sample code.but i have already done a program similar to that, basically calculating the number of occurences from the string..

    how i would find a repeat of n length?

    if the samplestring was big like:

    ATGTACAATGTACA - and the biggest repeat was :ATGTACA

  8. #8
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    Wow hk_mp5kpdw, that was pretty awesome. I wish I knew STL a bit better, because that just cut the code I was using by 75%.

    jodders, you can search for substrings of n length by changing the second for-loop: instead of looping for up to 4 characters, you want to loop to the length of the string. By using a map<string,int> instead of a set<string>, you can keep track of how many times you find each substring. Then it's just a matter of searching back through your map for the largest number of hits.
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

  9. #9
    Registered User
    Join Date
    Jan 2005
    Posts
    18
    so how would you loop to the end of the string, instead of looping 4 characters.

    everytime i try and compile, i get set undeclared, and strSet undeclared
    Last edited by jodders; 01-14-2005 at 11:40 AM.

  10. #10
    Registered User hk_mp5kpdw's Avatar
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    Quote Originally Posted by jodders
    so how would you loop to the end of the string, instead of looping 4 characters.
    Change "i <= 4" to "i <= data.length()".

    Quote Originally Posted by jodders
    everytime i try and compile, i get set undeclared, and strSet undeclared
    Have you included the appropriate headers <string>, and <set> (or <map> if using pianorain's suggestion) [also <algorithm> and possibly <iterator> if using the copy function]? How are you handling the namespace issue? All of these things: strings, sets, etc... are all in the std namespace.
    Last edited by hk_mp5kpdw; 01-14-2005 at 01:10 PM.
    "Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
    -Christopher Hitchens

  11. #11
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Quote Originally Posted by jodders
    I need help on how i am going to implement this.

    i have a sequence of letters such as ATGCATA

    and i want to find the biggest number of repeats which in this case woud be :AT

    help would be really appreciative

    thanks
    trying to identify genes or something?
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  12. #12
    Registered User
    Join Date
    Mar 2002
    Posts
    1,595
    What exactly are you trying to do?

    Here's a algorithm that determines the size of the largest repeating substring in a given string. It doesn't identify what or where the duplicate substrings are, though I believe the techniques it uses could be used to do so.

    For a string of size s, there will be z = y + 1 substrings of length n = s - y where y < s. That is; z = s - n + 1.

    Code:
    string input = "ATGTACAATGTACA";
    int s = input.length();
    string substring;
    int n; // length of substring
    int max = 1; //length of largest duplicate substring in input.
    int z; //number of substrings of length n expected
    int begin; //where to start substring
    int i, index; //
    set<string> substrings;//container to hold all unique substrings 
     
    for(n = 2; n < s; ++n) //length of substring
    {
    	for(begin = 0; begin <= s - n; ++begin) //where to start each substring
    	{
    	 index = begin;
     
    	 //generate substring of length n starting at index = begin
    	 for(i = 0; i < n; ++i)
    		substring += input[index++];
     
    	 //insert substring into substrings
    	 substrings.insert(substring);
    	 substring = "";
    	}
    	z = s - n + 1;
    	//if there are no duplicates, then size of substrings will be equal to z.
    	if(substrings.size() == z)
    	{
    	 //maximum length of duplicated sequence in input string is n - 1;
    	 max = n - 1;
    	 break;
    	}
    	else
    	{
    	 //duplicate found, so look as next larger n
    	 substrings.clear(); //clear set of substrings 
    	 max = n;
    	} 
    }
    cout << max << endl;
    Last edited by elad; 01-14-2005 at 03:00 PM.
    You're only born perfect.

  13. #13
    Registered User
    Join Date
    Jan 2005
    Posts
    18
    hi guys,i am very very appreciative with the help you have given me.Iam still learning c++ by myself and have found your tips useful. I have found "hk_mp5kpdw " tips outstanding and also pianorain.

    I think i will use pianorain's map<string,int> instead of a set<string> to keep track of the substring.

    I have got hk_mp5kpdw snippet of code to work,and been messing about trying to use implement the map function inside it, but i get another compilation error. Sorry for my ignorance beforehand! my compiler just goes mad when i change the map<string,int> instead of a set<string>,. I have added the <map> header as told. Am i missing something else?

    Once again,thanks for this help.It has been really useful

  14. #14
    Registered User hk_mp5kpdw's Avatar
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    All other things being the same, the insert for the map<string,int> object should probably look like:

    Code:
    variablename[string(it,i)]++;
    If using the map, you would probably find it easier to change the copy to a more basic for loop.

    Post your code you are using (as much as you possibly can) and any specific errors you get.
    "Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
    -Christopher Hitchens

  15. #15
    Registered User
    Join Date
    Jan 2005
    Posts
    18
    thanks hk_mp5kpdw for your reply,pm;y two errors now!

    Code:
    
    int main()
    {
      int variable;
        
    map<string,int> strSet;
    string data("ATGCATAAAAAAAACATGATGACTAGATGACATACGATCGACTGACTGACTGACTGACTGACGAC");
    
    for( string::iterator it = data.begin(); it != data.end(); ++it )
        for( int i = 1; i <= data.length() && i <= distance(it,data.end()); ++i )
            strSet.insert(variable[string(it,i)]++);
    
    for(strSet.begin(),strSet.end(),ostream_iterator<string>(cout,"\n"));
    
    }
    error:
    my own problem is declaring the datatype for my variable. (could you explain wat im doing wrong so i wont make the same mistake in future )

    parse error before ')' for loop line

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 8
    Last Post: 04-25-2008, 02:45 PM
  2. We Got _DEBUG Errors
    By Tonto in forum Windows Programming
    Replies: 5
    Last Post: 12-22-2006, 05:45 PM
  3. Something is wrong with this menu...
    By DarkViper in forum Windows Programming
    Replies: 2
    Last Post: 12-14-2002, 11:06 PM
  4. Classes inheretance problem...
    By NANO in forum C++ Programming
    Replies: 12
    Last Post: 12-09-2002, 03:23 PM
  5. Warnings, warnings, warnings?
    By spentdome in forum C Programming
    Replies: 25
    Last Post: 05-27-2002, 06:49 PM