Nearest Neighbor search for high dimension, slow insert

This is a discussion on Nearest Neighbor search for high dimension, slow insert within the Tech Board forums, part of the Community Boards category; Hello all, I have a mostly static set of high-dimensional data. Given a query point I want to find (exact) ...

  1. #1
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262

    Nearest Neighbor search for high dimension, slow insert

    Hello all,

    I have a mostly static set of high-dimensional data. Given a query point I want to find (exact) the nearest neighbor of this point. As the set is static, insertion time can be relatively slow (N^3 would be okay-ish still), but I need the query to be as quick as possible.
    What are some good algorithms for this? Any ideas?


    Thanks in advance,
    EVOEx

  2. #2
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,160
    I think the usual method is to subdivide the space recursively by hyperplanes, down to some fixed depth, or some fixed number of points in each subspace. To find the nearest neighbor, you find the closest subspace, then the closest sub-subspace, etc until you hit a leaf, then you exhaustively test all the points in that subspace. For a concrete example see Octree, which is what I'm talking about if the number of dimensions is 3. You have more dimensions but the idea is the same.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  3. #3
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    Thanks Brewbuck. But isn't it that the same as a kd-tree (that only splits it in two segments every subdivision, but the algorithm in essence would be the same, would it not)?

    However, the kd-tree wikipedia article contains the following:
    kd-trees are not suitable for efficiently finding the nearest neighbour in high dimensional spaces. As a general rule, if the dimensionality is k, then number of points in the data, N, should be N >> 2k. Otherwise, when kd-trees are used with high-dimensional data, most of the points in the tree will be evaluated and the efficiency is no better than exhaustive search,[3] and approximate nearest-neighbour methods are used instead.
    So am I missing a fundamental difference between kd-trees and "octrees"? Or is that wikipedia quote only true in worst case?

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 1
    Last Post: 04-16-2011, 04:44 AM
  2. Binary Search Tree Insert
    By bleuz in forum C++ Programming
    Replies: 5
    Last Post: 04-30-2010, 09:53 AM
  3. Help with insert/delete binary search tree
    By Nazgulled in forum C Programming
    Replies: 39
    Last Post: 03-25-2009, 04:24 PM
  4. Nearest Neighbor
    By CheyenneWay in forum C++ Programming
    Replies: 7
    Last Post: 08-10-2004, 08:19 AM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21