How to handle huge data set?


EJB design: How to handle huge data set?

  1. How to handle huge data set? (5 messages)

    My requirement is to load approximately a million records, which are to be displayed to the client (pagination is required). Client can work on the records, so its not a read only data. What is the best design pattern that should be applied?

    I have gone through the ValueListHandler/Fast Lane Reader. But these won't work in such a scenario, as these load the complete set on server. I don't want to use database specific strategy, as we plan to migrate to Oracle from SQL Server in near future.

    Threaded Messages (5)

  2. How to handle huge data set?[ Go to top ]

    You should perform a search that loads only the ids of the records and keep this list of ids. Then when browsing through the data you should obtain from the database only substes of full records for the current page. Keeping only the ids is a reasonable choice (as oposed to keeping the full records) for a up to a few hundred thousands records. Keeping a list if 1000000 ids is tricky because it takes a few MB of memory. However, having a user interface where you can have a paged navigation on 1000000 records seems an overkill; who would have the patience to browse to so many pages ?. My reccomandation is to provide more filters that restrict the number of records and make the navigation more user friendly.
    Another approach would be the one described in this thread:
    by Sreekanth Munigati, but this involves an Oracle specific abordation.
    Best regards, Mircea
  3. How to handle huge data set?[ Go to top ]

    Hi Mircea,
    The application is a banking application where transactions can go upto that figure on some search criteria. Its not that every time a search is going to result in a million records. Keeping of IDs is a good option, but as I had stated in my earlier message that the records are not read-only records, user can perform some action on the complete set of records fetched. Moreover, there can be concurrent users performing some or other operation. How to keep the records in sync with the changes happening in database? Entity bean is going to be a performance issue in such a large record set!
    Best Regards,
  4. How to handle huge data set?[ Go to top ]

    Hi Amit,
    You may have to go for database specific query to fetch the records depending upon the page no. user is on. This way you can avoid loading the whole recordset.
    Regarding concurrency issue, you will have to use pattern like having version no. for a record and while updating check the latest version and increment the version by one.

  5. How to handle huge data set?[ Go to top ]

    Hi Badrish:
    We are using version number pattern, but the issue is with the deleted/inserted records. When do I refresh the IDs list, if I am storing the list of IDs that satisfy search criteria?

    Is there any way in SQL Server where I can fetch records as a data chunk from the complete result set?

    I had been thinking of using Value List handler pattern along with some database specific query, where the query returns me say a 1000 records and the user is displayed say 200 records in a page. This saves me database hits every time I request a new page, as I already have a cache from where I can return the next page. I do a database hit again when I am returning the last 200 records in the cache, to refresh the cache. How do I keep data consistency, and I am not able to find anything in SQL Server which can give a data chunk from the desired recordset.

  6. How to handle huge data set?[ Go to top ]

    I have encountered the same problem of concurent access when deleting records.
    When a concurent access problem is detected the application should inform the user and let him/her make a decission. A 'refresh' fucntionality would be nice too. For example, when requesting records that have been deleted in the meantime from the database by other users an error/warning message is displayed in that row. Instead of a real row, the error/warning row contains some user friendly mesaage like : "this row has been deleted, please refresh the serach."
    The main idea is that the application should not try to make any automatic decission in case of concurent access problems, but inform the user and offer some choices.
    Best regards, Mircea