Discussions

Performance and scalability: Fetching from 40,000 records using EJB

  1. Fetching from 40,000 records using EJB (10 messages)

    I have 40,000 records in a table, and it is taking too much time to fetch selected 100 records. We are using Ms-SQL server. Will it be feasible to use EJB for this purpose and if feasible will it improve the time taken to fetch the records ? Can anyone help me by replying...!

    Threaded Messages (10)

  2. Fetching from 40,000 records using EJB[ Go to top ]

    In likelihood, EJB will not help the performance of the query - EJB adds further overhead to the database call.

    What are you currently using to return the records? How long does the query take in the SQL Server Query Analyzer to execute? Have you optimised the database, specifying the correct indexes?



  3. Fetching from 40,000 records using EJB[ Go to top ]

    Thanks for the reply Toby,
    By your statement 'EJB adds further overhead' I believe you are talking about he RMI over head. What if we replace it with by using Local Interfaces for optimization purpose. Regarding the currently used mechanisam for returning records we are using data objects which are set after querrying through Ms-SQL server to Java program using JDBC(also using connection pooling). We are using the MVC architecture. The databases have also been completely optimized. Depending upon the number of records fetched the SQL Server Query Analyzer seems to take time. We may assume same time taken for by the application to fetch the records. Can Application server caching of beans in EJB give some benefits here to speed up the process. Actually there seems to be much debate within the J2EE community regarding bulk data access through EJB !!!
  4. Fetching from 40,000 records using EJB[ Go to top ]

    I suggest you to adopt the ValueListHandler pattern!

  5.  Check out ,
     Process-Display-Process (PDP) pattern. too .

    ~ @bhijit
  6. Hi:

    We faced the same problem even with fewer records around 12,000 records. They took so much time to be fetched on Oracle8i.

    What I suggest to you, is the following and it really depends on your architecture and environment:

    1. Use the Prepared Statement of JDBC, it proves some prefromance enhancements. (writing SQL Tuned).
    2. Depending on Data, you can cache those that are almost rarely changed, and Caching techniques on the Web Tier.

    3. Use the Concept of Mirror in an Elevator, where people keep on looking at their selives until they are finished from it. :)
     So, in programming, keep the user busy with some progress bar or something nice (Client Dependent), and simply use some sort of Batching the Returned Data to the Client, What I mean is : divide the 40,000 into smaller ones. (I know, it might take more network traffic). but will make the user more happy.

    hope , it helps you
  7. Fetching from 40,000 records using EJB[ Go to top ]

    The key is getting those records cached.

    1) Make a simple object that represents your table.

    2) At boot up select the 40,000 records, create a new instance of your object for each record and place it into a collection, I suggest a java Hastable. Key the Hashtable off of the primary key value of the record (If you are using any java Map make sure you wrap the key value in a Double, Long or some other java.lang.Numer class).

    3) Instead of making an sql hit when you need the records. Pull the objects from the Hashtable. I will discuss how to make your sql query work with an in memory object model later.

    4) The biggest pain will be maintaining the cache (especially if you are in a clustered environment). You have to know exactly what clients modify this table's records. This is a non-trivial task that we have implemented using an extra field called cache_dirty that was a flag set by a database trigger whenever any client updated/inserted a record. Then we added a polling thread to monitor when these flags were altered and
    it would update our cache. We used JMS to update all servers in our cluster (using Message driven beans to be transactional).

    Sounds complicated but we found it wass worth it. If you are not using an application server or even if you are not using J2ee this model will still work. You will however have to figure out where the cache will sit. For instance if you are using C++ then you could use the STL Map and place it in a COM object if you are in the MS world.

    Querying:

    The most simple way to fetch the objects out of your cache, is to iterate from top to bottom and retrieve those objects that meet your criteria. A more performant solution is to make the cache key off of your most important field. The value of the entry will be another collection that is keyed
    off of your next most important field etc. Nested collections will require you to place references in multiple places, which is OK because it will act more like an index now. Then at query-time you need only fetch based off of index values which should be fast.
  8. Fetching from 40,000 records using EJB[ Go to top ]

    The suggestion of caching each record into a collection of Java objects is a serious trade off. It _may_ provide better performance, depending on the application usage pattern (i.e. mostly read? writes require more work). However, the biggest alarm bells should be _scalability_.

    The technologies change, but the principles have the same over the last twenty years of enterprise applications.

    -
  9. Fetching from 40,000 records using EJB[ Go to top ]

    The suggestion of caching each record into a collection of Java objects is a serious trade off. It _may_ provide better performance, depending on the application usage pattern (i.e. mostly read? writes require more work). However, the biggest alarm bells should be _scalability_.

    The technologies change, but the principles have remained the same over the last twenty years of enterprise applications.

    The more resources you hold, the less scalable you will be. For scalability, you need as small a footprint as possible. Which is why the stateless session bean architecture is a popular implementation for larger systems.

    From an application point of view, do you really need all 40,000 rows in one hit? Smaller operations would be better is possible.

  10. Fetching from 40,000 records using EJB[ Go to top ]

    Thanks Lee for the comment,

    I had already specified in the original question posted that I am fetching selected 100 records from 40000 records not all 40000.
  11. Fetching from 40,000 records using EJB[ Go to top ]

    Fair enough.

    To re-iterate, there is still a dangerous trade off between performance and scalability, which is what you might bump into if you rely on a heavier footprint or state in your middle layers.

    I assume you have also optimised the database layer itself - e.g. using stored procedures, use of row count to reduce result set on database server, using forward-only read-only cursors for read operations...