I'm comparing two sequences of characters (using only 6 different characters: A, B, C, D, E, and F). Consider seq1 on the x-axis and seq2 on the y-axis. What I want is a fast and efficient way to find the matching letters in the two sequences.
So, if you have:
Code:
x-axis: ADCBEFADB
y-axis: BCBDAC
then for, let's say, the letter D, the co-ordinates would be (2,4) and (8,4).
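Just to pin down what I mean by co-ordinates, here is a rough brute-force sketch in Python. The function name matching_coordinates is made up for illustration, and positions are assumed to be 1-based, as in the example above. It is only meant to show the expected result, not to be the fast version.

```python
def matching_coordinates(seq_x, seq_y):
    """Return {letter: [(x, y), ...]} for every pair of equal letters.

    Positions are 1-based (an assumption taken from the example).
    """
    coords = {}
    for y, letter_y in enumerate(seq_y, start=1):
        for x, letter_x in enumerate(seq_x, start=1):
            if letter_x == letter_y:
                # Same letter on both axes: record the (x, y) point.
                coords.setdefault(letter_x, []).append((x, y))
    return coords


coords = matching_coordinates("ADCBEFADB", "BCBDAC")
print(coords["D"])  # [(2, 4), (8, 4)]
```

This compares every character of one sequence with every character of the other, so the work grows with the product of the two lengths.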
If you do this for all the letters, how do you store the multiple co-ordinates? What data structure would be best? I also want this to be a very fast process, because I need to use these co-ordinates later on and I might be scanning in sequences 2000 to 5000 characters long!
I thought of a matrix approach where I would have 6 different matrices, one per letter, and during the read I would simply store the position of the relevant letter as a point in its matrix. But then how do I update that to two numbers when I scan in the other sequence?
I'm quite unsure how best to achieve this. Any ideas would be very helpful.
Each letter could be a class with the co-ordinate as an attribute. I could create an object when I read a letter and store that in a map or something, but the main issue with that is:
wouldn't this be quite slow?
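One possible alternative to an object per letter, sketched in Python under my own assumptions: keep a plain map from each letter to the list of positions where it occurs, built in a single pass over each sequence, and only pair the positions up when the co-ordinates for a letter are actually needed. The names index_positions and coordinates_for are hypothetical, and positions are again 1-based.

```python
from collections import defaultdict


def index_positions(seq):
    """Map each letter to the 1-based positions where it occurs (one pass)."""
    positions = defaultdict(list)
    for i, letter in enumerate(seq, start=1):
        positions[letter].append(i)
    return positions


def coordinates_for(letter, x_index, y_index):
    """Pair every x position of the letter with every y position."""
    return [
        (x, y)
        for y in y_index.get(letter, [])
        for x in x_index.get(letter, [])
    ]


x_index = index_positions("ADCBEFADB")  # seq1 on the x-axis
y_index = index_positions("BCBDAC")     # seq2 on the y-axis
print(coordinates_for("D", x_index, y_index))  # [(2, 4), (8, 4)]
```

Building the two indexes only touches each character once, so that part should be cheap even at 5000 characters. The number of matching pairs itself can still be large, so whether to store all of them or generate them per letter on demand depends on how they are used later.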