Best Data Structure for this in C++

• 05-21-2005
alvifarooq
Best Data Structure for this in C++
I'm comparing two sequences of characters (using only 6 different characters: A, B, C, D, E, and F)...consider seq1 on the x-axis and seq2 on the y-axis. What I want is a fast and efficient way to see the matching letters in the sequences.

So, if you have:
Code:

```x-axis: ADCBEFADB y-axis: BCBDAC```
then for let's say the letter D, the co-ordinates would be (2,4) and (8,4)

If you do this for all the letters, how do you store the multiple co-ordinates...what data structure would be the best...plus i want to make this a very fast process...and really really fast indeed cause I need to use these co-ordinates later on and I might be scanning in sequences as long as 2000 to 5000 characters long!

I thought of the matrices approach whereby I could have 6 different matrices corresponding to each letter and during read I would simply store the position of the relevant letter as a point in the matrix. but then how do i update that to two numbers when I scan in the other sequence...

i'm quite unsure of how best to achieve this...any ideas would be very helpful...

Another thought:

Each letter could be a class with the co-ordinate as an attribute...I could create and object when I read a letter and store that in a MAP or something...but the main issue with that is:
wouldn't this be quite slow???
• 05-21-2005
Salem
Seems like a simple vector of x,y coordinates will suffice for each letter
• 05-22-2005
alvifarooq
i'll make it clearer:
- suppose i read seq1 and then seq2
- i have 6 different types of some data structure (for A,B,C,D,E, and F)
Code:

```- while reading each character in seq1   {   if character is A then note its position and store it in the data structure as a point   .   .   .   } - while reading each character in seq2   {   if character is A then go back to same data structure as before and update the location of this point there   .   .   .   }```
In the end, I hope to get this:
Code:

```for character A: ___|56789____  x-axis  2 |  3 |  all 'matched' co-ordinates here  6 |  (these are the points I want, possibly as objects)  7 |   |   |  y-axis```
• 05-22-2005
manofsteel972
Well one way would be to create a class that has a char value to store the letter (i.e A) and a vector of type COORD to store the x-y values. COORD is just a struct with two int's x and y; Just create a new COORD object assign the values for x and y and then push it onto the vector. Then you can create member functions to print the list of coord's etc.
• 05-22-2005
alvifarooq
ok so i get the struct approach...so i just make a new struct when i read in seq2 each time i read in the appropriate letter...question is...this seems it's gonna take some performance hit...reding the sequences and getting the match coordinates is basically the first step...i have to perform searches on it later...like astar etc...so i want this to be extremely fast and efficient...

ya data structure is kind of understood...it's really how to frame it to use it in the context of my program...

Here's what i can do:
Code:

```Class Node { Name of Node (either A, B, C, D, E, and F) x-cord vector y-cord vector matching-cord vector } Make A, B, C, D, E, and F nodes While reading characters in seq1 { if A then push x-cordinate in 'x-cord vector' of Node A //similar for all the others } While reading characters in seq2 { if A then push y-cordinate in 'y-cord vector' of Node A } //Now I need something to insert all x-cords, y-cords from each letter's vector into that letter's 'matching-cord vector'```
The reason is that matching can only be done when i have read in both sequences...

if possible i want to do this later step whille i read in seq2...i don't want any substantial performance hit...
• 05-22-2005
okinrus
Quote:

If you do this for all the letters, how do you store the multiple co-ordinates...what data structure would be the best...plus i want to make this a very fast process...and really really fast indeed cause I need to use these co-ordinates later on and I might be scanning in sequences as long as 2000 to 5000 characters long!
You could use two arrays, maybe "std::vector<int> xvalue[256]" and "std::vector<int> yvalue[256]". For instance, xvalue['A'] would be a vector stpring all posible x coordinates for A. yvalue['A'] would be a vector storing all possible y coordinates. Combining all combinations of the above and you have all possible points. Given your character data is only letters this setup should work.
• 05-23-2005
anykey
You could use std::multimap<char, COORD>.