# finding mismatches

• 02-28-2005
jodders
finding mismatches
hi gang, having problems with a program i am trying to write..

im going to try my best to explain this. I am terrible at explaining when i am nervous. ops:

In some programs i have written, i have been looking at repeats. Which has been pretty straight forward... But i have decided now to try and looking for mismatches in sequences.

A mismatch for example:

Say you have a sequence:

K=7

S1 = ATGCTATG
S2 = ATGCGAGC

As you can see, from the two sequences there are some characters that match,but some that dont..... ie.. T - G, G-C .... this is called a mismatch....

I myself is still trying to get my head around this different way of thinking...

Here is an example of what a program of mismatches might output:
If you had a sequence: "AGTGTAGTAG"

Code:

```AG - 3 repeats found  1 mismatch    //because AG and TG ...G and G in same position GT - 3 repeats found    0 mismatch AGT - 2 repeats found  1 mismatch    //coz AGT and TGT  GT are similar... TA - 2 repeats found  1 mismatch    //coz GA are similar AGTA - 1 repeats found b] 2 mismatch  [/b] //AGTG,TGTA , GT's are in position```
I hope this makes it a little bit clearer what iam up against...

I have written a program before to find perfect repeats...basically it goes through each sequence and compares each string and outputs the number of times it has been repeated..

i can show my implementation if wanted,but im really stuck and some guidance would be really appreciated

kind regards
JoHnNY
• 02-28-2005
The Brain
You are looking for a way to calculate the number of repeated characters.. or squence of characters appear in a given string. This is how I would do it:

First.. I would calculate the total number of possiblities.. which is the, "total number of characters.. to the total number of characters power."

Code:

`int total_possibilities = pow(strlen(input), strlen(input));`
you can use this information to make dynamic arrays and use as a loop counter

The next step, would be to populate a string array with every single possible character combination. This algorithm might seem wicked hard at first.. but if you start out with like a 4 or 5 letter word.. and draw it out on paper.. you can come up with a general algortihm that will work pretty easily.

The next step, would be to create an array of counters the same size as your array of possible character combinations. You will use this array in a direct 1-to-1 correlation with your array of possible character combinations. (i.e. the 3rd element counter will match up with the 3rd element of your character possibility array.. so you can determine stuff like 'GT' was matched 10 times etc. )

The next step, would be to use nested for loops.. to run through the user input array.. and compare the user input.. against every posibility. If there is a match you can increment the corrosponding counter.

maybe something like this:
Code:

```string test = ""; for(int i=0; i<strlen(input); i++) {     test+=input[i];  //slowly build up a "test" string, one character at a time           for(int j=0; j<total_possibilities; j++)           {               if(test == possibilities[j])                     ++counters[j];  //notice the 1:1 correlation            } }```
This is probably nothing like your mismatch approach to solving this problem.. but I hope it offers a different perspective.
• 02-28-2005
uh are you talking about edit distance
that means how many insertions or deletions are neccessairy to "convert" string 1 into string 2:

e.g.
ABTATBX
ACBFATB

one possibly "alignment" would be
Code:

```A-BT-ATBX ACB-FATB-```
thus in that case youd have 2 indels.

the goal is to find to minimum number of indels neccessairy
(indel = insertion/deletion(

you might lookup edit distance. of course you can use the brute force approach with exponential complexity (takes forever for long strings) - or use the special algorithms with polynomial complexity (takes long - but not forever)
• 02-28-2005