Picking a Collection and Design for querying 1G Point objects

Discussions

Performance and scalability: Picking a Collection and Design for querying 1G Point objects



  1. Hi everyone, any chance I could get some design tips?

    I have one million Point objects where the x and y values are floats with 2 decimal places.  These points exist somewhere between a top left corner of (0,0) and a bottom right corner(400,200).  

    When the user specifies a top left and bottom right Point(for a rectangle)  (no slanted diaganol lines), I have to return all of my Point objects that are inside of the Rectangle specified by the user - in less than a second.

    I tried making 2 TreeSets -- one ordered by X and one ordered by Y, where they both use the same Point objects, (I'm avoiding instantiating double the points).  The 1st treeset is instantiated without problems, but when my application attempts instantiating the 2nd TreeSet, I get an out-of Heap memory exception.  If that were to have worked correctly instead, I was planning on using the "subset" method in TreeSet to get all the Point objects in the user's selected X range, .. and then vice-versa on the "Y" treeset for the points matching the user's specified Y rectangle values,.. and then I could take the intersection of those two result lists to arrive at a list with the points inside the user's specified rectangle.

    Tweaking the JVM memory is Not going to be an option.

    Is there a different design I should be  using?
    Thanks so much

    Threaded Messages (3)

  2. Persist it[ Go to top ]

    If you are having such records, try putting in database and querying it. If you don't want to use external database, try using derby. Putting all records in treeset is not a good idea. If it is not allowed then alternatively, you can use multi level indexing/dictionary feature.

  3. Why two sets?[ Go to top ]

    Why store all points in a collection (set) in the first place?

    Why not just just compute the points that are in the rect and put them in your set to return?

    For a 400 x 200 you only need two shorts to represent the x and y.  Floats take up more room and if you don't need floating point it's a waste of memory.  Pesonally I'd create a SmallPoint class with shorts if something doesn't already exist.

    Loop through the points for the rect in a simple double for loop and add the points to the rect.

    Most Rect classes have a point in rect method that returns true if the point is in rect.  It's easy enough to manually build this into two for loops without having to do this check though.

    Sometimes it's expensive to create a bunch of SmallPoint objects every update/loop.  I usually create some kind of PointManager which maintains a list of point objects which can be requested and recycled back to the manager when done.  This way you only ever create each SmallPoint once.  This takes extra effort to make sure you don't have leaks and the manager gets all the points recycled back to it before you start allocating new ones.  This is a little different in that you will have to reset your SmallPoint and set the values instead of just create a new one with the constructor.

    My guess using sets and new objects every time will slow you down on JVM memory cleanup.

    This approach has a balance of computation (cpu) verses memory.

     

  4. Why two sets?[ Go to top ]

    Why store all points in a collection (set) in the first place?

    Why not just just compute the points that are in the rect and put them in your set to return?

    For a 400 x 200 you only need two shorts to represent the x and y.  Floats take up more room and if you don't need floating point it's a waste of memory.  Pesonally I'd create a SmallPoint class with shorts if something doesn't already exist.

    Loop through the points for the rect in a simple double for loop and add the points to the rect.

    Most Rect classes have a point in rect method that returns true if the point is in rect.  It's easy enough to manually build this into two for loops without having to do this check though.

    Sometimes it's expensive to create a bunch of SmallPoint objects every update/loop.  I usually create some kind of PointManager which maintains a list of point objects which can be requested and recycled back to the manager when done.  This way you only ever create each SmallPoint once.  This takes extra effort to make sure you don't have leaks and the manager gets all the points recycled back to it before you start allocating new ones.  This is a little different in that you will have to reset your SmallPoint and set the values instead of just create a new one with the constructor.

    My guess using sets and new objects every time will slow you down on JVM memory cleanup.

    This approach has a balance of computation (cpu) verses memory.