Discussions

EJB design: Classical Problem of Performance - Loading Many to Many Relation

  1. Hi There,

    I am facing a classical problem of loading many to many relationship information using JDBC.

    The problem goes like this...

    There is a nxm relationship between Task and Element connected using Task_element table. Now I am supposed to load both Task and Element data in database.

    Typically a Task can be associated with 3 to 5 Elements. Element is not a master data as such but can be considered as a master data for a particular user section. (Frequency of change in ELements is once a month).

    Now when I load this information as a typicall JDBC way, I have two options.

    1. Write a Join on all 3 tables and get complete information.

    Catch- Catch here is that no. of tasks are more than 2000. Hence no. of rows returned fromt he database are 2000 x 4 (average no of elements associated with each task) = 8000 records. Since task table has more than 15 fields to be retrieved and element has got 5 more so total columns retrieved from database are 8000 x 20 = 1,60,000. This causes exessive network traffic. Usually this complete operation takes 8-12 sec. based on no. of concurrent users.


    2. Write an Oracle function to concatenate element_ids associated with the task and return it with each task, whcih will be then parsed in the Weblogic server and element ids will be retrieved.

    Catch - The function will fire a query to get the element ids associated with the task. This means for 2000 rows it will fire 2000+1 queries to get the complete information. Which anyway will increase load on database.

    My problem seems to be a very classic and recurring problem as most of the complex applications have such relationships.

    When amount of data increases the performance degrades.

    Could someone help in finding a reasonable solution for this problem.

    Thanks in advance.

    Shiv
  2. I think option 1 is going to be as good as it is going to get. I think you have several choices:

    1. Try to optimize your query further. Maybe by indexing your tables better you can get the query to run in a reasonable amount of time.

    2. Cache the query results in memory. Assuming that the list of tasks does not change frequency, maybe you can get away with running the query once per user login/logout (or something like that) and otherwise leaving the list fixed.

    3. Maybe you can figure out queries that determine deltas on your data system to improve data caching: add tasks for just the last user that logged in, and remove tasks for that user when they log out.
  3. more ideas[ Go to top ]

    You need to be a little more specific here.

    1. Are you loading this data to be viewable by a human being because if so you can easily add paging so that you load only X rows at a time.

    2. Aside from using a view you can also provide quicka and efficient loading using a SQL IN clause, i.e. load one parent table row, load this mapping row and then load the other parent rows using the IN clause and the ids. This can also be done for multiple records at a time and is efficient, especially when combined with a cache which performs object id to object mapping.

    3. Lazy loading can solve this problem as well. Just because you have lots of data does not mean you should retreieve it all.

    Hope this helps!

    Bruce