I need an algorithm. But I don't know which one.

**Dino** · 01-17-2011

In my spare time, I write extensions for a 3D modeling program. I have one extension that is in serious need of a rewrite. It is way, way too slow and not at all optimized. (It's not C, but Ruby, but I would consider rewriting it in C)

The task my extension performs is to analyze a 3D or 2D (usually just a 2D in situ) model (consider a topography or a poorly drawn house foundation plan) and find all the line segments that do not touch other line segments at their ends (either or both). See attached examples. topo1 is a view of a lot with elevation lines. topo2 is the same screen shot, with the open line segments (an open line segment == a vertex with only one line coming off of it) labeled and numbered. In this view, for example, I need to determine that point 22 needs to be connected to point 25 because it is "closer" than all the other points in the set of points I am analyzing. And, for another example, points 20 and 21 need to connect, and 23 and 24 need to connect.

Along the same lines, I would also like to detect that since point 25 is also a "perimeter" point, 25 needs to also be connected to 28. (but this would be icing on the cake)

My current algorithm (if it could be called that... yuk) is to make an array of all the vertices with only one line connected to them.

Then, I start with the first vertex in my array, and calculate the distance to all the other vertices in my array, and when the closet vertex is found is found, I draw a line to connect these two vertices, if and only if that new line will not cross another existing line, or a line I have just drawn due to connecting previous vertices) in the process. It has to be a straight shot without crossing any other lines. I do a raytest to determine this - works' fine, but this too adds to the long elapsed times. If an obstruction is found, I get the second-closest vertex and see if that will work, etc.

After the first vertex is processed and a new line has been drawn, I take the second vertex in the array, and run the array again (from array element #3) to find the distance to its closest vertex, then test for no obstruction, then draw a line, and I keep repeating this until I am done. With a few vertices (think in the dozens), performance is fine. However, when the number of vertices grows the to the hundreds or even thousands, my routine runs FOR EVAR, as I am sure you can imagine.

Vertices get added to my initial array in willy-nilly fashion, just as you see the pseudo-random labeling added in topo2.

What is the nature of the algorithm I need in this application to speed it up? I just had a user have to cancel his 3D modeling program after my algorithm ran for 48 hours and did not finish.

**Epy** · 01-17-2011

Not to derail the conversation, but rewriting it in any other language will be an improvement. Ruby is like the slowest language ever according to Computer Language Benchmarks Game. Even if the figures aren't accurate, it's pretty clear that Ruby is very slow.

The one thing I can say is that you could skip the square root part of calculating the distance, just extra overhead since you don't care about the number itself.

**Dino** · 01-17-2011

Actually, there's a callback back into the 3D modeling package that calculates the distance for me, so perhaps I can calculate the distance myself without performing the callback, and without performing the square root. Good point.

Using ruby in this is scenario is the easy way out for me. It makes install of my extension brainless. Otherwise, I need to produce two different executables (Win and Mac - not a huge deal) and now my download process and install process would (comparatively speaking) be a nightmare... unless I wrote an installer for both Win and Mac. It's amazing how many people can't follow 3 lines of written instructions.

**tabstop** · 01-17-2011

I imagine you want to be pretty general, but I'm thinking perhaps some sort of partitioning might be beneficial. In the given example, left-side vs middle vs right-side would be the choice-in-hindsight. What you might do is make a list of lists; when a new vertex comes in, you check the head elements of each list; if it's not sufficiently close to any of them you start a new list. [If it's not too much of a bottleneck, you can go through every element of the list, and as long as you're sufficiently close to one of them then add to that list.] Then when you are trying to create lines, you can just look at the list you're in and only try a different list if none of them work.

**Epy** · 01-17-2011

Originally Posted by Dino

Actually, there's a callback back into the 3D modeling package that calculates the distance for me, so perhaps I can calculate the distance myself without performing the callback, and without performing the square root. Good point.

I would time both (guess that's stating the obvious). Assuming the parent program is written in a fast compiled language, it might be faster to call its distance function instead of pulling the values and doing the math in Ruby. Then again, if you already have the points in Ruby, it might be faster there.

**Dino** · 01-17-2011

Here's an example that eats my lunch. The zoomed in shot is the highlighted square in the overall image.

**Mario F.** · 01-17-2011

Hmm... those last images, I think, put it into a better perspective. Almost looks like you want some sort of morphology filter or perhaps edge detection?

I lack the mathematical prowess to help you. But had I this hot potato on my hands, I'd probably start looking into image filters algorithms, particularly dealing with morphology. After some searching, for instance, I ended up finding a somewhat standard method to deal with broken lines; apparently called Closure, consists in applying a dilation filter, followed by an erosion one. There's also edge detection techniques like Sobel and Canny that can be used to enhance edges (remove holes).

**Shakti** · 01-18-2011

How about something along these lines:
1) partition an image into suitable chunks (for 2D you first divide into 4 pieces then each of those pieces into 4 pieces and so on, the 3D case 8 pieces).

2) For each of these pieces you start from the center of the chunk and work your way outward (start processing the vertex closest to the center so some sorting here when partitioning and building the list of vertices)

3) Search for the closest vertex, if the vertex that is processed lies closer to the edge of the chunk you include that neighbouring chunk when searching for the closest vertex (obviously this can be up to 4 chunks in 2D)

4) When completely done with one chunk repeat with next chunk

I would think something like this could possibly lower the complexity quite neatly.

**Dino** · 01-18-2011

I found an algorithm similar to the one you just described. In a dusty book I have it's called a Plane Sweep algorithm. I may play with it some this weekend to work it out and give it a test drive.

FWIW, for the last two images I posted, I started my code on that topo last night, ~8pm. It's still running today and has not registered 1% complete yet.

**Shakti** · 01-18-2011

Yah with that method, as the number of vertices per chunk approaches 1 the complexity approaches O(n) for the actual connection method, if my mind isn't betraying me. Of course you need to make a tradeoff since the building of the chunks will take more time.

**Dino** · 01-18-2011

Originally Posted by Shakti

Yah with that method, as the number of vertices per chunk approaches 1 the complexity approaches O(n) for the actual connection method, if my mind isn't betraying me. Of course you need to make a tradeoff since the building of the chunks will take more time.

And that's a good thing?

Signed, Complexity Challenged.

**Shakti** · 01-18-2011

Yes, i think your current implementation is O(n^2), and in big-O notation you always want n as small as possible (log(n) is better than n for instance). Read more here http://en.wikipedia.org/wiki/Big_O_notation

**Dino** · 01-20-2011

OMG.

I have done an initial implementation of the plane sweep algorithm. Holy Cow. It is so much faster.

Before the rewrite, on the 100 meter x 100 meter square of the large topo I pasted above, my old routine processed for > 8 hours, and it had not registered even 1% complete.

Last night I ran the new algorithm, and to process the same square, I watched as it took 18 minutes. And that's still in Ruby.

I still have another optimization to code, which should speed it up quite a bit more.

**GReaper** · 01-20-2011

The bigger the 'O', the more the pain!

Getting rid of O(n^2) is essential when speed is an issue. Well done!

**Dino** · 01-20-2011

Well, I just fixed a bug in my code.

My bug was that I was sorting the array of points, but I was using the unsorted array when looking for the closest pair. Duh.

I fixed my code to properly used the sorted array, and I just took my 18 minutes down to 52 seconds. YESSIRREEEE! This is amazing.

Tonight, I will make the second performance improvement that I was planning. Cant' wait!!

Thread: I need an algorithm. But I don't know which one.

Thread Tools

Search Thread

Display

I need an algorithm. But I don't know which one.

Similar Threads

Implement of a Fast Time Series Evaluation Algorithm

Euclid Algorithm (extended)Part2 Doubt about pointers - INITIALIZATION

Binary Search Trees Part III

Request for comments