Discussions

Performance and scalability: Large Query Caching for slow DB's

  1. Large Query Caching for slow DB's (8 messages)

    I have the need to implement a JDBC query caching solution for large queries (million+ records returned). My application interfaces with (among others) a legacy AS/400 application on the backend using JDBC, which takes 20+ seconds to return results for simple queries during peak times. Once the query starts to return data, I can read records fairly fast.

    I am looking for a way to do "transparent" JDBC caching, where maybe a middleware application transparently leaves the ResultSet open for a little while (5 minutes) in case the same SQL statement is rerun. This would avoid rerunning the query from scratch each time data is needed (think search-engine paging: page 1 -> page 2 -> page 3).

    The other option is some product that can do read-ahead caching of "pages" (load the first 25 pages of 25 results each, then start fetching pages 26-50 when page 20 is accessed). I know I could build such a solution, but I would like to know what my options are for purchasing a product or using an existing open source project, as we don't have the manpower on staff to do this ourselves anytime soon.

    Thanks for any suggestions,

    -Pete
  2. Large Query Caching for slow DB's

    I would prefer a JDBC solution, since it would be a drop in upgrade, with no application modifications.

    -Pete
  3. Pete,

    A nice solution would be to use the Sun CachedRowSet object. This allows you to drop in a whole ResultSet object and then close the connection to the DB without losing the data (as it's cached - surprise, surprise :-) ).

    Then, to reduce your access to your DB, create a hash of the SQL query (just use the .hashCode() method on the String class) and use this as a key to store the CachedRowSet objects in a Hashtable. Before making a DB call, see if your query is in the Hashtable; if it is, use it, otherwise make your call and cache the results.

    Have a thread that clears the cache every 5 (or whatever) mins and you are set! Just be sure to watch your synchronization.

    Cheers,
    Darren.
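    Darren's scheme can be sketched roughly like this. Everything here is hypothetical naming - a QueryCache keyed by the full SQL string (using the full string rather than .hashCode() avoids key collisions), with timestamped entries and a background sweeper thread standing in for the cache-clearing thread. The values would be CachedRowSet objects in practice; the cache is generic here so the idea stands on its own:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a time-limited query cache keyed by the SQL string.
// In Darren's scheme the values would be CachedRowSet instances.
class QueryCache<V> {
    private static final class Entry<V> {
        final V value;
        final long storedAt;
        Entry(V value) { this.value = value; this.storedAt = System.currentTimeMillis(); }
    }

    private final Map<String, Entry<V>> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    QueryCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // Background sweeper: the "thread that clears the cache every 5 mins".
        Thread sweeper = new Thread(() -> {
            while (true) {
                long now = System.currentTimeMillis();
                cache.entrySet().removeIf(e -> now - e.getValue().storedAt > this.ttlMillis);
                try { Thread.sleep(this.ttlMillis / 2 + 1); }
                catch (InterruptedException ie) { return; }
            }
        });
        sweeper.setDaemon(true);
        sweeper.start();
    }

    void put(String sql, V result) { cache.put(sql, new Entry<>(result)); }

    /** Returns the cached result, or null if absent or older than the TTL. */
    V get(String sql) {
        Entry<V> e = cache.get(sql);
        if (e == null || System.currentTimeMillis() - e.storedAt > ttlMillis) {
            cache.remove(sql);
            return null;
        }
        return e.value;
    }
}
```

    ConcurrentHashMap stands in for the Hashtable and handles the "watch your synchronization" caveat for simple get/put use.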
  4. One other thing ...

    Watch your memory usage. If your query returns a large result set, you may run into memory problems. A WeakHashMap could be handy here, or a heap of RAM (pardon the pun).

    Cheers,
    Darren.
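    One alternative worth naming for the memory caveat above: rather than a WeakHashMap (whose String-keyed entries can vanish unpredictably, since the keys are weakly held), cached values can be wrapped in SoftReference, which the JVM clears only under memory pressure. A hypothetical sketch, not from the thread:

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical: values wrapped in SoftReference so the garbage collector
// may reclaim them when the heap runs low, instead of an OutOfMemoryError.
class SoftCache<V> {
    private final Map<String, SoftReference<V>> map = new ConcurrentHashMap<>();

    void put(String sql, V value) { map.put(sql, new SoftReference<>(value)); }

    /** Returns the value, or null if never cached or already reclaimed. */
    V get(String sql) {
        SoftReference<V> ref = map.get(sql);
        V v = (ref == null) ? null : ref.get();
        if (v == null) map.remove(sql);   // drop cleared references
        return v;
    }
}
```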
  5. I don't believe I can use the CachedRowSet in this case due to the extremely large datasets that I have to work with. CachedRowSet caches the whole ResultSet.

    -Pete
  6. Large Query Caching for slow DB's

    Hi,

    Another solution may be to generate the results of the query (as an HTML String) on the server BEFORE the client requests them, and then just send them back when the client asks. This could be done at regular intervals. Of course, this depends on the complexity of the query; this solution may work if it's a simple:

    SELECT * FROM MY_TABLE
     
    The results of this query could be generated at regular intervals as an HTML String (in memory or in a file) on the server and then just dumped back to the client when they need it. It also depends on how "fresh" the client needs the information to be.

    But this solution may not be feasible at all...

    Smythe
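    This pre-generation idea can be sketched with a ScheduledExecutorService. Purely an illustration: PreRenderedPage is a hypothetical name, and renderQuery is a stand-in for running the SQL and formatting the rows as HTML.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: re-render a query's HTML on a fixed schedule and
// serve the last snapshot instantly, trading freshness for response time.
class PreRenderedPage {
    private final AtomicReference<String> html = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    PreRenderedPage(Supplier<String> renderQuery, long periodSeconds) {
        html.set(renderQuery.get());                   // initial render
        scheduler.scheduleAtFixedRate(
                () -> html.set(renderQuery.get()),     // periodic refresh
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    /** Served to the client; never blocks on the database. */
    String current() { return html.get(); }

    void shutdown() { scheduler.shutdownNow(); }
}
```

    The refresh period is exactly the "how fresh does the client need it" knob.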

     
  7. Large Query Caching for slow DB's

    The results returned come from a custom query the user enters, so I can't predict what they will want to search for. In most cases, I am searching a table of 20 million records.

    -Pete
  8. Large Query Caching for slow DB's

    I had a similar problem with comparable datasets - caching query results is just impractical.

    What we've done, however, is implement an LRU cache that takes the results of a new query and caches just the primary keys in a List.

    A new query still results in a big hit, but subsequent calls to the same query can simply pick elements x to y from the list, and extract them from the database by primary key.

    So you basically don't have to run your entire query every time, you have a negligible caching overhead (configurable, in fact, as the LRU cache can be tuned depending on the system), and you get your results back a lot faster because you can explicitly select pages of results by PK (not sure if you're paging your output).
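    That primary-key scheme can be sketched as follows, assuming numeric keys. PkPager and its helpers are hypothetical names; a LinkedHashMap in access order serves as the LRU cache, and a page of results is fetched back with a small IN (...) query by key:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: cache only the primary keys a query returned,
// then page through results with cheap lookups by key.
class PkPager {
    private final Map<String, List<Long>> lru;

    PkPager(int maxQueries) {
        // accessOrder=true + removeEldestEntry turns LinkedHashMap into an LRU cache.
        this.lru = new LinkedHashMap<String, List<Long>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<Long>> eldest) {
                return size() > maxQueries;
            }
        };
    }

    synchronized void cacheKeys(String sql, List<Long> primaryKeys) {
        lru.put(sql, primaryKeys);
    }

    synchronized List<Long> keysFor(String sql) { return lru.get(sql); }

    /** Elements x..y of the cached key list, i.e. one page of results. */
    static List<Long> page(List<Long> keys, int pageNo, int pageSize) {
        int from = Math.min(pageNo * pageSize, keys.size());
        int to = Math.min(from + pageSize, keys.size());
        return keys.subList(from, to);
    }

    /** SQL to fetch just that page by primary key. */
    static String pageQuery(String table, String pkColumn, List<Long> pageKeys) {
        return "SELECT * FROM " + table + " WHERE " + pkColumn + " IN ("
                + pageKeys.stream().map(String::valueOf).collect(Collectors.joining(", "))
                + ")";
    }
}
```

    The first run still pays for the full query (to collect the keys), but every later page is a cheap indexed lookup.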

  9. Large Query Caching for slow DB's

    Well, I would leave the query part to the database server. You cannot cache a whole lot of rows, due to obvious memory problems. Besides, if you repeat the query, the execution plan is already prepared on the DB server, so the next time around the server will fetch the rows significantly faster.

    What I would suggest to improve performance here is table fragmentation: logically split the data in the table across fragments, and make sure the fragments are more or less evenly populated, so that the DB engine can limit its search to a particular fragment. Also, use stored procedures if possible, in which case the execution plan will be ready even before you fire the query. If you combine fragmentation with stored procedures, you will gain a significant improvement in performance.

    Probably, you should be looking to improve the performance of the SQL rather than caching.