« September 2009 | Main | December 2009 »

October 13, 2009

Hilbert R-trees and C#

My adventures with Silverlight have encouraged me to get up to speed with C# and .Net 3.5.  I’m dividing my time between Windows and Mono (and Moonlight) under Linux.

I’m reading “Pro C# 2008 and the .NET 3.5 Platform”, by Andrew Troelsen (ISBN-13: 978-1-59059-884-9), and can give it a good recommendation.

For my first project, I have decided to implement a Hilbert R-tree and see how well it compares with the overlapping Quad-Trees that we have now.

R-trees use nested dynamic rectangles to store information.  Because they are created based on the data which is stored in them, the structures should provide better utilization, that is no empty (wasted) nodes.

The difficulty with R-trees is that it is difficult to decide how to split data into child rectangles.  The goal is to put an equal number of objects into each of the new child rectangles and minimize overlap.

The Hilbert R-tree solves this problem by using a 2D Hilbert curve to index the data.  A Hilbert curve is a continuous fractal space-filling curve (wiki) which will visit all points of a k-dimensional grid without overlapping.

By partitioning the data using the Hilbert number of it’s center point, we can easily split a node into several child nodes.

The Hilbert R-tree also balances itself via node reinsertion (like the R* tree).

In any case, this seemed like an interesting task to use for exploring C#.

I started by porting my existing overlapping quad tree to C#.  Then I implemented a Hilbert R-tree, without the reinsertion logic, to see how well the splitting logic worked.

Testing Results

Using random data, with a limited size (to better emulate typical mapping data), the quad tree was about 4x faster for inserts.

The R-tree had about %25 fewer nodes.

Search time for the quad tree was slightly better.  I believe that this is due to the large amount of overlapping of the child nodes in the R-tree.

I modified the R-tree splitting to partition the data based on the largest distances between the Hilbert values, which increased the node count, but increased the search speed to roughly the same as the quad tree.

Next Step

The operation of spatial trees is incredibly dependant upon the data.  My tests are using randomly generated data, which is interesting, but ultimately useless.

Next, I’ll extract several hundred thousand objects from a dataset, and use them for performance testing.


Hosting by Yahoo!