Thread: String Construction Algorithm Wanted...

  1. #1
    Anirban Ghosh
    Join Date
    Jan 2006
    Posts
    278

    String Construction Algorithm Wanted...

    Suppose I have a string like the following....

    mynameisninjailiveinaustraliailikecricket

    Now I break the string into overlapping substrings such as

    Code:
    mynameis      livein 
      nameisninjailive
                  liveinaustralia
                        australialike
            ninjaili             likecrick
                           tralialike   cket
                      inaustra
    Now my aim is to reconstruct the original string from the overlapping substrings. If not possible to reconstruct the original when multiple solutions exist, then the resultant string must be the shortest. May I get any algorithm to attack this problem please....

  2. #2
    The larch
    Join Date
    May 2006
    Posts
    3,573
    What have you tried?

    I think you'll need to match substrings of the fragment with the full string (you got so far). Go from longer to shorter (probably) to pick a larger overlap in case of multiple solutions.

    Edit: This is not a programming challenge again? In this case the solution probably isn't as trivial as that...

    But doesn't your graph give you some ideas? Finding a way how to align strings and to recurse until you've put together all the pieces...
    Last edited by anon; 08-06-2007 at 12:28 AM.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  3. #3
    Anirban Ghosh
    Join Date
    Jan 2006
    Posts
    278
    No no why should I post programming challenges just without reason! The assignments which my college Sir has given me are too intricate for me to solve! That is why I am asking for your help! Moreover fortunately when I am searching online for algorithm luckily sometimes I get that these are international problems - now what should I do! Hide from you all the truth? I hope not!

    Sorry can you please elaborate it please!!!!!!!

  4. #4
    Fear the Reaper...
    Join Date
    Aug 2005
    Location
    Toronto, Ontario, Canada
    Posts
    625
    1) Start with initial string.
    2) Take next string, and diff with string, giving you the letters which are the same in both strings.
    3) Fill in spaces of the resulting string appropriately.
    4) Make resulting string intial string.
    5) Repeat until no strings are left.
    Teacher: "You connect with Internet Explorer, but what is your browser? You know, Yahoo, Webcrawler...?" It's great to see the educational system moving in the right direction

  5. #5
    The larch
    Join Date
    May 2006
    Posts
    3,573
    I'm afraid the assignment is not as simple as that.

    What if the two words don't have any parts in common? (You can't combine them because you'll have no idea what's the best place for the second string. You'd probably need to leave the second string aside for the time being.)

    It might also be possible that this approach won't give you the best result. Suppose the first letter of a string == the last letter of another string. If you happily combined them, you might already miss a better solution where these two strings don't share any common characters.

    I suppose a brute-force backtracking method might overcome the second problem. You might probably start with the longer strings first (these might have longer overlapping segments?) and leave a search branch early if the resulting string is already longer than the solution found so far. (Some variant of A*?) At least, this way you should be able to crack the problem, if not in an suboptimal way.

    This assignment seems to belong to an advanced course on algorithms, so the OP should have better knowledge of available strategies than I do. Therefore ...
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  6. #6
    Registered User manofsteel972's Avatar
    Join Date
    Mar 2004
    Posts
    317
    If you are given the starting string, I think you could just create a result buffer of the same length, go through each of the broken up strings and find if it is a substring of the original. Keep track of which is the shortest. If the string is found then using the position where it was found copy it to the result buffer. Then once you are done going through all the strings, compare the original string to the one you created.

    [Edit]

    If there are multiple occurances of the substring, you would need to find them all? so you will need to search to the end of the string and find each occurance.
    Last edited by manofsteel972; 08-06-2007 at 10:11 AM.
    "Knowledge is proud that she knows so much; Wisdom is humble that she knows no more."
    -- Cowper

    Operating Systems=Slackware Linux 9.1,Windows 98/Xp
    Compilers=gcc 3.2.3, Visual C++ 6.0, DevC++(Mingw)

    You may teach a person from now until doom's day, but that person will only know what he learns himself.

    Now I know what doesn't work.

    A problem is understood by solving it, not by pondering it.

    For a bit of humor check out xkcd web comic http://xkcd.com/235/

  7. #7
    aoeuhtns
    Join Date
    Jul 2005
    Posts
    581
    [Edit: Ok people, he's not given the starting string -- if he merely had to reconstruct that, the problem would be trivial, and he isn't doing that, because he has to find the shortest string that could possibly be the starting string, which makes the starting string useless as it might as well be the concatenation of the substrings he has been given.]

    [Edit 2: Consider this post to be a dumb solution.]

    First, you need to identify the problem that you really want to solve.

    Essentially, you have a finite set, S, of k elements, with a function, f : S \times S -> N (where N is the set of natural numbers >= 0). Of course f (x,y) is the overlap you can get from overlapping x and y with x on the left and y on the right -- with the stipulation that if y is a substring of x, f (x,y) = length(y). For example, f("heyoyo", "yoyofolomado") = 4, and f("abcde","bc") = 2. Then you want to find the sequence [s_1, ..., s_k], where s_1 .. s_k all distinct strings that comprise the set S, that maximizes the sum, f(s_1,s_2) + f(s_2,s_3) + ... + f(s_{k-1}, s_k).

    That's your real problem, and it has nothing in particular to do with strings; you're finding the longest path that visits all the nodes of a graph.
    Last edited by Rashakil Fol; 08-06-2007 at 02:48 PM.
    There are 10 types of people in this world, those who cringed when reading the beginning of this sentence and those who salivated to how superior they are for understanding something as simple as binary.

  8. #8
    Anirban Ghosh
    Join Date
    Jan 2006
    Posts
    278
    Well ,great thanks to you!!! But if could please slightly elaborate the procedure a bit more elaborate I shall be greatly benefited! Also if I map the problem to a graph theory problem of finding longest path, what would be the nodes of the graph! Please do elaborate!

  9. #9
    aoeuhtns
    Join Date
    Jul 2005
    Posts
    581
    Um, it's all there in the post above.

    Edit:

    Also, what I've described is not necessarily the best solution.

    Edit2:

    Definitely not. There's just so much structure to this problem.

    Edit 3: Am I just being an idiot?
    Last edited by Rashakil Fol; 08-06-2007 at 02:56 PM.
    There are 10 types of people in this world, those who cringed when reading the beginning of this sentence and those who salivated to how superior they are for understanding something as simple as binary.

  10. #10
    Fear the Reaper...
    Join Date
    Aug 2005
    Location
    Toronto, Ontario, Canada
    Posts
    625
    I'm actually pretty sure, now that Rashakil Fol has outlined it that way, that the problem seems to be a TSP variant, which makes it NP-Complete, which basically says that there won't be an efficient, all-pupose solution (although you may approximate it rather well).
    Teacher: "You connect with Internet Explorer, but what is your browser? You know, Yahoo, Webcrawler...?" It's great to see the educational system moving in the right direction

  11. #11
    Anirban Ghosh
    Join Date
    Jan 2006
    Posts
    278
    Then what should I do now.....

  12. #12
    Fear the Reaper...
    Join Date
    Aug 2005
    Location
    Toronto, Ontario, Canada
    Posts
    625
    I just said there was no efficient all-purpose solution. If you keep the string set small, you're likely never to see much of a problem from an efficiency perspective.
    Teacher: "You connect with Internet Explorer, but what is your browser? You know, Yahoo, Webcrawler...?" It's great to see the educational system moving in the right direction

  13. #13
    aoeuhtns
    Join Date
    Jul 2005
    Posts
    581
    I don't know. I get the feeling there might be an efficient solution, or at least a very slow growing exponential solution, and it's probably just a backtracking one that doesn't bother going through all the permutations of non-overlapping strings. The graph I described would be incredibly sparse, and it contains structure. For example, if f(x,y) > f(x,z) > 0, then if f(w,y) < f(x,y), then f(w,z) = max {0, f(w,y) - [f(x,y) - f(x,z)]}.
    There are 10 types of people in this world, those who cringed when reading the beginning of this sentence and those who salivated to how superior they are for understanding something as simple as binary.

  14. #14
    The larch
    Join Date
    May 2006
    Posts
    3,573
    To begin with you might throw out strings that are entirely contained in some other string. Show some initiative...
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  15. #15
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > Then what should I do now.....
    Try it and find out - you're not going to really learn anything by just yapping about it or trying to manoeuvre someone else into doing it by continual begging.

    Whether you succeed or fail doesn't matter, you'll still learn a lot from simply trying.

    Q: How do I avoid making mistakes?
    A: Mistakes are avoided through past experience.
    Q: How do I gain experience?
    A: Through lessons learnt from your past mistakes.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Please check my C++
    By csonx_p in forum C++ Programming
    Replies: 263
    Last Post: 07-24-2008, 09:20 AM
  2. Binary Search Trees Part III
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 10-02-2004, 03:00 PM
  3. Linked List Help
    By CJ7Mudrover in forum C Programming
    Replies: 9
    Last Post: 03-10-2004, 10:33 PM
  4. creating class, and linking files
    By JCK in forum C++ Programming
    Replies: 12
    Last Post: 12-08-2002, 02:45 PM
  5. Warnings, warnings, warnings?
    By spentdome in forum C Programming
    Replies: 25
    Last Post: 05-27-2002, 06:49 PM