Thread: I need an algorithm. But I don't know which one.

  1. #1
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332

    I need an algorithm. But I don't know which one.

    In my spare time, I write extensions for a 3D modeling program. I have one extension that is in serious need of a rewrite. It is way, way too slow and not at all optimized. (It's not C, but Ruby, but I would consider rewriting it in C)

    The task my extension performs is to analyze a 3D or 2D (usually just a 2D in situ) model (consider a topography or a poorly drawn house foundation plan) and find all the line segments that do not touch other line segments at their ends (either or both). See attached examples. topo1 is a view of a lot with elevation lines. topo2 is the same screen shot, with the open line segments (an open line segment == a vertex with only one line coming off of it) labeled and numbered. In this view, for example, I need to determine that point 22 needs to be connected to point 25 because it is "closer" than all the other points in the set of points I am analyzing. And, for another example, points 20 and 21 need to connect, and 23 and 24 need to connect.

    Along the same lines, I would also like to detect that since point 25 is also a "perimeter" point, 25 needs to also be connected to 28. (but this would be icing on the cake)

    My current algorithm (if it could be called that... yuk) is to make an array of all the vertices with only one line connected to them.

    Then, I start with the first vertex in my array, and calculate the distance to all the other vertices in my array, and when the closet vertex is found is found, I draw a line to connect these two vertices, if and only if that new line will not cross another existing line, or a line I have just drawn due to connecting previous vertices) in the process. It has to be a straight shot without crossing any other lines. I do a raytest to determine this - works' fine, but this too adds to the long elapsed times. If an obstruction is found, I get the second-closest vertex and see if that will work, etc.

    After the first vertex is processed and a new line has been drawn, I take the second vertex in the array, and run the array again (from array element #3) to find the distance to its closest vertex, then test for no obstruction, then draw a line, and I keep repeating this until I am done. With a few vertices (think in the dozens), performance is fine. However, when the number of vertices grows the to the hundreds or even thousands, my routine runs FOR EVAR, as I am sure you can imagine.

    Vertices get added to my initial array in willy-nilly fashion, just as you see the pseudo-random labeling added in topo2.

    What is the nature of the algorithm I need in this application to speed it up? I just had a user have to cancel his 3D modeling program after my algorithm ran for 48 hours and did not finish.
    Mainframe assembler programmer by trade. C coder when I can.

  2. #2
    Make Fortran great again
    Join Date
    Sep 2009
    Posts
    1,413
    Not to derail the conversation, but rewriting it in any other language will be an improvement. Ruby is like the slowest language ever according to Computer Language Benchmarks Game. Even if the figures aren't accurate, it's pretty clear that Ruby is very slow.

    The one thing I can say is that you could skip the square root part of calculating the distance, just extra overhead since you don't care about the number itself.

  3. #3
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Actually, there's a callback back into the 3D modeling package that calculates the distance for me, so perhaps I can calculate the distance myself without performing the callback, and without performing the square root. Good point.

    Using ruby in this is scenario is the easy way out for me. It makes install of my extension brainless. Otherwise, I need to produce two different executables (Win and Mac - not a huge deal) and now my download process and install process would (comparatively speaking) be a nightmare... unless I wrote an installer for both Win and Mac. It's amazing how many people can't follow 3 lines of written instructions.
    Mainframe assembler programmer by trade. C coder when I can.

  4. #4
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    I imagine you want to be pretty general, but I'm thinking perhaps some sort of partitioning might be beneficial. In the given example, left-side vs middle vs right-side would be the choice-in-hindsight. What you might do is make a list of lists; when a new vertex comes in, you check the head elements of each list; if it's not sufficiently close to any of them you start a new list. [If it's not too much of a bottleneck, you can go through every element of the list, and as long as you're sufficiently close to one of them then add to that list.] Then when you are trying to create lines, you can just look at the list you're in and only try a different list if none of them work.

  5. #5
    Make Fortran great again
    Join Date
    Sep 2009
    Posts
    1,413
    Quote Originally Posted by Dino View Post
    Actually, there's a callback back into the 3D modeling package that calculates the distance for me, so perhaps I can calculate the distance myself without performing the callback, and without performing the square root. Good point.
    I would time both (guess that's stating the obvious). Assuming the parent program is written in a fast compiled language, it might be faster to call its distance function instead of pulling the values and doing the math in Ruby. Then again, if you already have the points in Ruby, it might be faster there.

  6. #6
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Here's an example that eats my lunch. The zoomed in shot is the highlighted square in the overall image.
    Mainframe assembler programmer by trade. C coder when I can.

  7. #7
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Hmm... those last images, I think, put it into a better perspective. Almost looks like you want some sort of morphology filter or perhaps edge detection?

    I lack the mathematical prowess to help you. But had I this hot potato on my hands, I'd probably start looking into image filters algorithms, particularly dealing with morphology. After some searching, for instance, I ended up finding a somewhat standard method to deal with broken lines; apparently called Closure, consists in applying a dilation filter, followed by an erosion one. There's also edge detection techniques like Sobel and Canny that can be used to enhance edges (remove holes).
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  8. #8
    Registered User
    Join Date
    Aug 2003
    Posts
    1,218
    How about something along these lines:
    1) partition an image into suitable chunks (for 2D you first divide into 4 pieces then each of those pieces into 4 pieces and so on, the 3D case 8 pieces).

    2) For each of these pieces you start from the center of the chunk and work your way outward (start processing the vertex closest to the center so some sorting here when partitioning and building the list of vertices)

    3) Search for the closest vertex, if the vertex that is processed lies closer to the edge of the chunk you include that neighbouring chunk when searching for the closest vertex (obviously this can be up to 4 chunks in 2D)

    4) When completely done with one chunk repeat with next chunk

    I would think something like this could possibly lower the complexity quite neatly.
    Last edited by Shakti; 01-18-2011 at 09:57 AM.

  9. #9
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    I found an algorithm similar to the one you just described. In a dusty book I have it's called a Plane Sweep algorithm. I may play with it some this weekend to work it out and give it a test drive.

    FWIW, for the last two images I posted, I started my code on that topo last night, ~8pm. It's still running today and has not registered 1% complete yet.
    Mainframe assembler programmer by trade. C coder when I can.

  10. #10
    Registered User
    Join Date
    Aug 2003
    Posts
    1,218
    Yah with that method, as the number of vertices per chunk approaches 1 the complexity approaches O(n) for the actual connection method, if my mind isn't betraying me. Of course you need to make a tradeoff since the building of the chunks will take more time.
    Last edited by Shakti; 01-18-2011 at 10:34 AM.

  11. #11
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Quote Originally Posted by Shakti View Post
    Yah with that method, as the number of vertices per chunk approaches 1 the complexity approaches O(n) for the actual connection method, if my mind isn't betraying me. Of course you need to make a tradeoff since the building of the chunks will take more time.
    And that's a good thing?

    Signed, Complexity Challenged.
    Mainframe assembler programmer by trade. C coder when I can.

  12. #12
    Registered User
    Join Date
    Aug 2003
    Posts
    1,218
    Yes, i think your current implementation is O(n^2), and in big-O notation you always want n as small as possible (log(n) is better than n for instance). Read more here http://en.wikipedia.org/wiki/Big_O_notation

  13. #13
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    OMG.

    I have done an initial implementation of the plane sweep algorithm. Holy Cow. It is so much faster.

    Before the rewrite, on the 100 meter x 100 meter square of the large topo I pasted above, my old routine processed for > 8 hours, and it had not registered even 1% complete.

    Last night I ran the new algorithm, and to process the same square, I watched as it took 18 minutes. And that's still in Ruby.

    I still have another optimization to code, which should speed it up quite a bit more.
    Mainframe assembler programmer by trade. C coder when I can.

  14. #14
    Programming Wraith GReaper's Avatar
    Join Date
    Apr 2009
    Location
    Greece
    Posts
    2,739
    The bigger the 'O', the more the pain!

    Getting rid of O(n^2) is essential when speed is an issue. Well done!
    Devoted my life to programming...

  15. #15
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Well, I just fixed a bug in my code.

    My bug was that I was sorting the array of points, but I was using the unsorted array when looking for the closest pair. Duh.

    I fixed my code to properly used the sorted array, and I just took my 18 minutes down to 52 seconds. YESSIRREEEE! This is amazing.

    Tonight, I will make the second performance improvement that I was planning. Cant' wait!!
    Last edited by Dino; 01-20-2011 at 08:46 AM. Reason: typo
    Mainframe assembler programmer by trade. C coder when I can.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Implement of a Fast Time Series Evaluation Algorithm
    By BiGreat in forum C Programming
    Replies: 7
    Last Post: 12-04-2007, 02:30 AM
  2. Replies: 4
    Last Post: 12-10-2006, 07:08 PM
  3. Binary Search Trees Part III
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 10-02-2004, 03:00 PM
  4. Request for comments
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 15
    Last Post: 01-02-2004, 10:33 AM