Thread: Best Data Structure for this in C++

  1. #1
    Registered User
    Join Date
    Sep 2004
    Posts
    124

    Question Best Data Structure for this in C++

    I'm comparing two sequences of characters (using only 6 different characters: A, B, C, D, E, and F)...consider seq1 on the x-axis and seq2 on the y-axis. What I want is a fast and efficient way to see the matching letters in the sequences.

    So, if you have:
    Code:
    x-axis: ADCBEFADB
    y-axis: BCBDAC
    then for let's say the letter D, the co-ordinates would be (2,4) and (8,4)

    If you do this for all the letters, how do you store the multiple co-ordinates...what data structure would be the best...plus i want to make this a very fast process...and really really fast indeed cause I need to use these co-ordinates later on and I might be scanning in sequences as long as 2000 to 5000 characters long!

    I thought of the matrices approach whereby I could have 6 different matrices corresponding to each letter and during read I would simply store the position of the relevant letter as a point in the matrix. but then how do i update that to two numbers when I scan in the other sequence...

    i'm quite unsure of how best to achieve this...any ideas would be very helpful...

    Another thought:

    Each letter could be a class with the co-ordinate as an attribute...I could create and object when I read a letter and store that in a MAP or something...but the main issue with that is:
    wouldn't this be quite slow???
    Last edited by alvifarooq; 05-21-2005 at 02:06 PM.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Seems like a simple vector of x,y coordinates will suffice for each letter
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Sep 2004
    Posts
    124
    i'll make it clearer:
    - suppose i read seq1 and then seq2
    - i have 6 different types of some data structure (for A,B,C,D,E, and F)
    Code:
    - while reading each character in seq1
      {
      if character is A then note its position and store it in the data structure as a point
      .
      .
      .
      }
    
    - while reading each character in seq2
      {
      if character is A then go back to same data structure as before and update the location of this point there
      .
      .
      .
      }
    In the end, I hope to get this:
    Code:
    for character A:
    
    ___|56789____   x-axis
     2 |
     3 |  all 'matched' co-ordinates here
     6 |  (these are the points I want, possibly as objects)
     7 |
       |
       |
     y-axis
    Last edited by alvifarooq; 05-22-2005 at 04:18 AM.

  4. #4
    Registered User manofsteel972's Avatar
    Join Date
    Mar 2004
    Posts
    317
    Well one way would be to create a class that has a char value to store the letter (i.e A) and a vector of type COORD to store the x-y values. COORD is just a struct with two int's x and y; Just create a new COORD object assign the values for x and y and then push it onto the vector. Then you can create member functions to print the list of coord's etc.
    "Knowledge is proud that she knows so much; Wisdom is humble that she knows no more."
    -- Cowper

    Operating Systems=Slackware Linux 9.1,Windows 98/Xp
    Compilers=gcc 3.2.3, Visual C++ 6.0, DevC++(Mingw)

    You may teach a person from now until doom's day, but that person will only know what he learns himself.

    Now I know what doesn't work.

    A problem is understood by solving it, not by pondering it.

    For a bit of humor check out xkcd web comic http://xkcd.com/235/

  5. #5
    Registered User
    Join Date
    Sep 2004
    Posts
    124
    ok so i get the struct approach...so i just make a new struct when i read in seq2 each time i read in the appropriate letter...question is...this seems it's gonna take some performance hit...reding the sequences and getting the match coordinates is basically the first step...i have to perform searches on it later...like astar etc...so i want this to be extremely fast and efficient...

    ya data structure is kind of understood...it's really how to frame it to use it in the context of my program...

    Here's what i can do:
    Code:
    Class Node
    {
    Name of Node (either A, B, C, D, E, and F)
    x-cord vector
    y-cord vector
    matching-cord vector
    }
    
    Make A, B, C, D, E, and F nodes
    
    While reading characters in seq1
    {
    if A then push x-cordinate in 'x-cord vector' of Node A 
    //similar for all the others
    }
    
    While reading characters in seq2
    {
    if A then push y-cordinate in 'y-cord vector' of Node A 
    }
    
    //Now I need something to insert all x-cords, y-cords from each letter's vector into that letter's 'matching-cord vector'
    The reason is that matching can only be done when i have read in both sequences...

    if possible i want to do this later step whille i read in seq2...i don't want any substantial performance hit...
    Last edited by alvifarooq; 05-22-2005 at 12:52 PM.

  6. #6
    Registered User
    Join Date
    Aug 2003
    Posts
    470
    If you do this for all the letters, how do you store the multiple co-ordinates...what data structure would be the best...plus i want to make this a very fast process...and really really fast indeed cause I need to use these co-ordinates later on and I might be scanning in sequences as long as 2000 to 5000 characters long!
    You could use two arrays, maybe "std::vector<int> xvalue[256]" and "std::vector<int> yvalue[256]". For instance, xvalue['A'] would be a vector stpring all posible x coordinates for A. yvalue['A'] would be a vector storing all possible y coordinates. Combining all combinations of the above and you have all possible points. Given your character data is only letters this setup should work.

  7. #7
    Registered User
    Join Date
    Apr 2005
    Posts
    22
    You could use std::multimap<char, COORD>.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. pthread question how would I init this data structure?
    By mr_coffee in forum C Programming
    Replies: 2
    Last Post: 02-23-2009, 12:42 PM
  2. Data structure implementation
    By fkheng in forum C Programming
    Replies: 3
    Last Post: 07-31-2003, 07:44 AM
  3. can't insert data into my B-Tree class structure
    By daluu in forum C++ Programming
    Replies: 0
    Last Post: 12-05-2002, 06:03 PM
  4. Tab Controls - API
    By -KEN- in forum Windows Programming
    Replies: 7
    Last Post: 06-02-2002, 09:44 AM
  5. Dynamic Data Structure -- Which one is better?
    By Yin in forum C++ Programming
    Replies: 0
    Last Post: 04-10-2002, 11:38 PM