Request for alternative pattern to Page by Page Iterator


  1. Possible solution:
    Pass iterator data, like the current row and the number of rows to be displayed, to the finder/SessionBean method, which then returns only a subset (the number of rows to display) of the collection to the client.
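    A minimal sketch of what such a finder could look like as a plain business interface (the names here are illustrative, not an existing API):

    import java.rmi.RemoteException;
    import java.util.List;

    // Hypothetical session-bean business interface: the caller supplies the first
    // row and the page size, and only that slice of the result is materialised
    // and returned to the client, e.g. as a List of serializable value objects.
    public interface OrderLister {
        List findOrders(String customerId, int firstRow, int pageSize)
                throws RemoteException;
    }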

    Advantages:
    Less memory usage (++) and less data transported from server to client.

    Drawbacks:
    GUI logic is passed into the business logic; the EJB (and thus the database) is accessed each time the next page is requested; all clients are now tied to this extra interface constraint of passing GUI logic; and if you want flexible sorting you need to pass the sorting constraints to the EJB layer as well.

    Any other solutions?

    Threaded Messages (50)

  2. What you just mentioned sounds like one of the actual implementations of the pattern to me. Which implementation you choose depends on a number of factors including how long the query takes to run, what the pagability of the data is (i.e. can you even randomly access a page of data using a query?), the total size of the data, whether your data is relevant to only the current user or a large group of users, security constraints, etc.
  3. Let me rephrase that: what I'm looking for is an alternative implementation of the Page by Page Iterator, different from the one proposed in the J2EE blueprints. This alternative must be able to iterate through a large number of rows with as little impact on memory and (less important in this case) performance as possible.

    The data is relevant only to the current user. I want to keep the EJB interface as clean as possible, and preferably free of presentation logic.

    Are there any alternative implementations? If so where can I find them?
  4. Excellent. My personal favorite is to create an in-memory index of the primary keys rather than storing the entire dataset. This is cheaper to create and store. You can use it to query random pages of data. It's also good if you are sorting by an arbitrary column (you lazily create a different primary key index for each column). It's also really good if you need to randomize your data--just shuffle the index. Lastly, store the index using a SoftReference so that the memory can be reclaimed when necessary.
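    A rough sketch of such a soft-referenced key index (the DAO calls loadKeys and loadRowsByKeys are hypothetical placeholders):

    import java.lang.ref.SoftReference;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Sketch of an in-memory primary-key index used for paging. The index is held
    // through a SoftReference so the VM can reclaim it under memory pressure, in
    // which case it is simply rebuilt on the next request.
    public class KeyIndexPager {

        private final String sortColumn;
        private SoftReference indexRef;        // holds a List of primary keys

        public KeyIndexPager(String sortColumn) {
            this.sortColumn = sortColumn;
        }

        public List getPage(int pageNumber, int pageSize) {
            List keys = getIndex();
            int from = Math.min(pageNumber * pageSize, keys.size());
            int to = Math.min(from + pageSize, keys.size());
            return loadRowsByKeys(keys.subList(from, to));   // fetch only this page's rows
        }

        public void shuffle() {
            Collections.shuffle(getIndex());   // cheap randomisation: just shuffle the index
        }

        private List getIndex() {
            List keys = (indexRef == null) ? null : (List) indexRef.get();
            if (keys == null) {                // first use, or reclaimed by the garbage collector
                keys = loadKeys(sortColumn);   // e.g. SELECT pk FROM ... ORDER BY sortColumn
                indexRef = new SoftReference(keys);
            }
            return keys;
        }

        private List loadKeys(String column)   { return new ArrayList(); }   // hypothetical DAO call
        private List loadRowsByKeys(List keys) { return new ArrayList(); }   // hypothetical DAO call
    }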
  5. I find that this is indeed the only viable alternative.

    I use a Query class to contain the details of a page by page query, including sorting criteria. I do not consider this to be die-hard GUI logic. I know that it indicates some sort of user interaction with the dataset, but the question is whether a complete result set will ever be relevant to any client, be it a human, another application or anything else.
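    A minimal sketch of such a Query object (field names are illustrative):

    import java.io.Serializable;

    // Serializable query descriptor passed across the EJB boundary: it carries the
    // paging window and the sort order, but no presentation code.
    public class Query implements Serializable {

        private final String criteria;      // e.g. a structured filter expression
        private final String sortColumn;
        private final boolean ascending;
        private final int firstResult;      // 0-based index of the first row wanted
        private final int maxResults;       // page size

        public Query(String criteria, String sortColumn, boolean ascending,
                     int firstResult, int maxResults) {
            this.criteria = criteria;
            this.sortColumn = sortColumn;
            this.ascending = ascending;
            this.firstResult = firstResult;
            this.maxResults = maxResults;
        }

        public String getCriteria()   { return criteria; }
        public String getSortColumn() { return sortColumn; }
        public boolean isAscending()  { return ascending; }
        public int getFirstResult()   { return firstResult; }
        public int getMaxResults()    { return maxResults; }
    }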

    Any query to a data set might return more data than you are willing to handle. The page by page mechanism is a flexible solution to this problem (as opposed to just truncating the result set). The client (be it human or other application) knows how much data it will get back and it has the option to page forward in the result set if it considers it more efficient than to restate the query.

    Two notes on this pattern: When you implement it, you should consider how you keep track of the row numbers / page boundaries when the data set changes in the database. Say you retrieve page 1 which is result 0 to 24. Then before you retrieve page 2, a row is inserted that will come out as the first row in your query. When you query for page 2, you will now get row 25-49 but row 25 in your new page will be the same data as row 24 in the former page.

    Alternatively you can define page 2 to be the page starting right after result number 24. But then you will have to send back data about this result (the primary key) in the next query object, thus increasing the complexity of the API, but this actually could be the best way to do it (haven't tried it yet).

    The other issue is that the performance of a page by page query will degrade with the number of results in the original result set from the query. For example, say that you have a history record table with 100,000 records. You usually browse them from the newest record backwards in history. As a user you don't really care to put criteria on your query, because you just sort it with the latest record first and browse through the first couple of pages.

    But it might be quite a large operation for the database, because it will need to do a sort on the 100,000 rows before it can return the first page to the user.

    There might be some obvious ways of avoiding this scenario. But as I see it, a "fix" will add to the complexity of the APIs used for the paged querying, e.g. a system-level "look back" limit that restricts this kind of query to a certain period in time.
  6. I wonder why no one uses the disconnected recordset approach of JDBC, keeps a client-side cache of the recordset,
    and then uses cookies to find out whether the cache needs to be changed.

    I suspect something like this happens with content caching systems or OSCache tags.

    I wonder why we cannot use something like that for DB fetches.

    - We did use the disconnected recordset approach quite a lot in C++ MTS development with ODBC drivers, but I am new to J2EE so I don't know the implications for JDBC.
  7. Just to make it clearer: we created our own recordset for, say, 20 pages, kept it as a cache on the client side and leafed through it, while our cookie stored the last rowid (Oracle) referenced in the recordset and the time it was last put in.
  8. This is exactly what I do. I retrieve data in a SessionBean,
    populate a CachedRowSet and pass it to the web client, store it
    in the session object and display the first set of rows (users have a preference for how many rows they want to see per page). There are other features, such as allowing users to specify not just the next or previous page but also the page number they want to see, storing user-supplied parameters in the session to retrieve the same data
    when they use the browser Back and Forward buttons, etc.
    What I'm trying to say is that CachedRowSet is VERY powerful; it has many more features and allows you to do a lot more.
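    A sketch of this approach, using Sun's reference implementation of CachedRowSet (package names for the rowset classes have varied between releases, and the table/column names are illustrative):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import javax.sql.DataSource;
    import javax.sql.rowset.CachedRowSet;

    // Runs the query once, copies the rows into a disconnected CachedRowSet and
    // returns it; the rowset is serializable, so the web tier can keep it in the
    // HttpSession and page through it without going back to the database.
    public class OrderQueryHelper {

        private final DataSource dataSource;

        public OrderQueryHelper(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public CachedRowSet fetchOrders(String customerId) throws Exception {
            Connection con = dataSource.getConnection();
            try {
                PreparedStatement ps = con.prepareStatement(
                        "SELECT order_id, order_date, total FROM orders WHERE customer_id = ?");
                ps.setString(1, customerId);
                ResultSet rs = ps.executeQuery();

                CachedRowSet crs = new com.sun.rowset.CachedRowSetImpl();
                crs.populate(rs);   // copies the rows, so the connection can be released
                return crs;
            } finally {
                con.close();
            }
        }
    }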
  9. Except for the fact that with a CachedRowSet you have your entire query sitting in memory, quite possibly until the user's session times out. Not a very scalable solution.
  10. You could implement your own (smart) CachedRowSet which only accesses the database when data outside its cache is requested. It keeps only a subset of the total resultset in cache. This certainly is a viable solution. But by using a CachedRowSet you bypass the whole EJB stack, and you end up using two different ways of data access if you have already chosen EJB.
  11. Exactly, that is what I did using Oracle's rowid, or you could use any form of indexing. I don't see why it should not be scalable if you have just 20-odd pages in the user's client.
  12. Maybe we need a smart collection returned from the EJB, depending on a "data session". Oops, now I need some funky code which emulates the concept of HttpSession with regard to data and makes it configurable, i.e. different data-driven session variables that time out differently in relation to the protocol session?
  13. You bypass Entity Bean collections to display data. You retrieve data in a Session Bean, populate a CachedRowSet and send it to the client, but for updates/inserts/deletes you use Entity Beans. It can be more scalable and faster.
  14. I'm using a smart cache - some queries stay in for the entire session, some get discarded. There is a problem with
    queries returning hundreds of thousands of rows, but most of the time that just means the application needs a better design (;-)
  15. Well, the easiest way to avoid heavy database traffic is to store the query result in some recordset, and save that recordset in the session.

    The next time the user makes a request, you can do a simple query to get the total number of rows in your database table; if the number of rows hasn't changed, use the cached recordset. If it has changed, retrieve a new recordset from the database.


    One drawback of comparing the row count is that whenever the row count changes, you need to do a query to get the entire recordset even though only one row changed. This operation can be very expensive if you have a very large resultset.

    My suggestion is that in your table design you should always include an 'update_date' column which indicates when a record was inserted or updated.

    Once you have that, you can use the following logic to decide whether to update your cached recordset or not.

    Note: It is important that the data structure your cached recordset uses supports insert/update/delete, so you can update or delete any row in your recordset.


    dbRowCount --> the rowcount in your table
    dbMaxUpdateDate --> the max update_date in your table
    sessionRowCount --> the rowcount of the cached recordset
    sessionMaxUpdateDate --> the max update_date of your cached rowset


    case 1.
    if (dbRowCount == sessionRowCount && dbMaxUpdateDate <= sessionMaxUpdateDate)

    No data was changed; use the cached recordset.

    case 1.
    if(dbRowCount == sessionRowCount && dbMaxUpdateDate.equals <= sessionMaxUpdateDate).

    some data was updated, retrieve ONLY the data that has update_date > sessionMaxUpdateDate, and insert it to your cached recordset. (or update the recordset by primary key)


    case 2.
    if (dbRowCount > sessionRowCount)

    Some new data got into the database; retrieve ONLY the new data that has update_date > sessionMaxUpdateDate and insert it into your cached recordset (or update the recordset by primary key).


    case 3.
    if (dbRowCount < sessionRowCount && dbMaxUpdateDate <= sessionMaxUpdateDate)

    Some data was deleted; retrieve all data if you don't know what was deleted. If you know the primary key of the deleted data, you can just delete the entries in your cached recordset by primary key.


    Of course, there are more cases than these three, but the idea is the same. It greatly reduces the traffic between the database and the client. Database I/O is just like file I/O: it is the slowest part of any application, and we should avoid it unless absolutely necessary.
  16. Sorry about the two case 1's. They should be:

    case 1.
    if (dbRowCount == sessionRowCount && dbMaxUpdateDate <= sessionMaxUpdateDate)

    No data was changed; use the cached recordset.

    case 4.
    if (dbRowCount == sessionRowCount && dbMaxUpdateDate != sessionMaxUpdateDate)

    Some data was updated; retrieve ONLY the data that has update_date > sessionMaxUpdateDate and insert it into your cached recordset (or update the recordset by primary key).
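    An illustrative sketch of this refresh decision; the cache and DAO collaborators here are hypothetical, shown only to make the branching concrete:

    import java.sql.Timestamp;
    import java.util.List;

    // Decides how to refresh a session-scoped cached recordset using the row count
    // and the max update_date, following the cases described above.
    public class CachedRecordsetRefresher {

        public void refresh(CachedRecordset cache, RecordsetDao dao) {
            int dbRowCount = dao.countRows();
            Timestamp dbMaxUpdateDate = dao.maxUpdateDate();
            int sessionRowCount = cache.getRowCount();
            Timestamp sessionMaxUpdateDate = cache.getMaxUpdateDate();

            if (dbRowCount == sessionRowCount && !dbMaxUpdateDate.after(sessionMaxUpdateDate)) {
                return;                                                    // case 1: nothing changed
            }
            if (dbRowCount == sessionRowCount) {
                cache.merge(dao.rowsUpdatedSince(sessionMaxUpdateDate));   // case 4: updates only
            } else if (dbRowCount > sessionRowCount) {
                cache.merge(dao.rowsUpdatedSince(sessionMaxUpdateDate));   // case 2: inserts (and updates)
            } else {
                cache.reload(dao.allRows());                               // case 3: unknown deletes, reload
            }
        }

        // Hypothetical collaborators, declared only so the sketch is self-contained.
        public interface CachedRecordset {
            int getRowCount();
            Timestamp getMaxUpdateDate();
            void merge(List changedRows);
            void reload(List allRows);
        }

        public interface RecordsetDao {
            int countRows();
            Timestamp maxUpdateDate();
            List rowsUpdatedSince(Timestamp since);
            List allRows();
        }
    }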


  17. There is a ValueListHandler pattern in the J2EE patterns catalog. Take a look at that.
    The Value List Handler still caches at the server. We are talking about the case where the client is sure to make successive calls to the server anyway and might as well do some caching at the client.
  19. Oracle and the other db vendors have been optimising searches for 20 years. Don't do it yourselves, use the database.

    Use views, materialised views, prepared statements, simple sql... whatever. But leave it to the database.

    So, yes, paged searches have to accept page number and page size attributes. As for records inserted in page 0 when you are on page 3, do you really care? This entire problem is about the user seeing page 1 of 100. No user looks at 100 pages; they skim, they never read the lot. They will always miss out on info.

    Better yet, design your system so the user can drill down to the data, then you won't have page 1 of 100 scenarios.

    Jonathan
    ====================================
    The LowRoad EJB Code generator supports paging, and struts
      http://www.tallsoftware.com/lowroad/

  20. We are not trying to optimise searches - we are trying to optimise network calls. After all, even if a DB is super-optimised, your table-view user sitting half the world away still has to make multiple network calls unless you use caching at the client.
    PS: quite like your product though.
  21. > Oracle and the other db vendors have been optimising

    > searches for 20 years. Don't do it yourselves, use the
    > database.

    > Use views, materialised views, prepared statements,
    > simple sql... whatever. But leave it to the database.


    Well, I don't think anyone here is trying to optimize their queries. There is nothing we can do about that. What we really want is to get the query result once and for all. Basically, we don't want to query the database to display each page. Some part of the data must be stored in a cache, either on the server side or the client side, but not in the database.

  22. As it has always been in the database world, there is a conflict between accuracy and speed (I may be wrong, correct me if so) - the more accurate the results, the slower; and vice versa - less accurate results, but faster.
    I think some comments made here try to solve this in one pattern, which I believe is rather difficult; fair point about Oracle and others attempting it for 10+ years.
    There was a comment saying that you need to take a lot of factors into account and design from there, rather than try to find a perfect solution, as there probably is none.
    If accuracy is not important (e.g. a search engine), who cares if the DB dataset changes. On the other hand, if we search for a criminal record, we can wait those extra seconds I guess.

    -- philip

    PS> I've always wondered how engines like google and altavista work....
  23. > PS: I've always wondered how engines like google and altavista work....

    philip, it is good that you've already started wondering. I've never thought about that before. You are way far ahead of me. :-)

  24. All these discussions are great and, in fact, we ran into the same issues before. The idea of course was similar to some of the suggestions above.
    However, at a certain point I really start to wonder: why are we doing this at all? Has nobody ever asked the product managers to come up with a better design? :-) My point is, should there really be a situation where I am displaying a list of a couple of hundred records? Search engines or report tools are for that purpose. In those scenarios, I very much doubt any objects are used.
    Even with the page by page pattern, users will not enjoy looking through thousands of records anyway. I think it's time to talk to the product managers.
  25. I agree in principle. One has to be crazy to return 200 search records and expect the user to flip through the list on one or more pages looking for what she wants.

    Some sites do use a "too many search results - narrow search criteria" message. This makes sense. The only thing to decide is how many is too many. For me a page-full is just about enough.
  26. Well, I wouldn't say it is crazy to return more than 200 results. When you do a search in Google, it returns thousands of results and each page displays about 20 records.
    That's what a page by page iterator is for.
  27. Not sure if this would be a good approach, but if we are using a web browser, why not return results in big chunks (say up to 1000) to the client in one page, and then use client-side JavaScript to emulate a pager?

    Benefits:
    * All is loaded straightaway to the client
    * Only one big network transfer
    * No need to spend server resources; the work is distributed to the clients
    * No worries about databases, efficiency etc...

    Disadvantages:
    * Users could wait longer for the page to load to get the first results
    * Needs a JavaScript-enabled browser
    * Performance could be poor on less powerful client machines
    * And worst of all, you need to code in JavaScript, which is terrible!

    any comments on this?

    --philip
  28. Then what is wrong with applets + HTTP tunnelling?
    Might as well go back to sockets - I am losing my patience with HTTP anyway.
  29. > Not sure if this would be a good approach, but if we are using a web browser, why not return results in big chunks (say up to 1000) to the client in one page, and then use client-side JavaScript to emulate a pager?


    Philip, I would caution against this approach due to the speed of user connections in most environments. I don't want to shove 1000 records to a user on a 28.8 connection. With non-LAN clients, every byte means more waiting by the user. 100 records is bad enough on many connections; I have a search where you can choose from 25/50/100 results, and even on a cable modem I can see a difference in load speed between 25 and 100 rows, and I have a 100k/sec or so download speed from the web server.

    Not to mention I hate to "trust" that JavaScript will always work, and refuse to do anything mission critical in it. My boss's boss tries accessing my app with AOL and some old version of their browser; "your JavaScript must not be recent enough" won't cut it. I've had too many JavaScript issues in the past to trust it, although people tell me the situation is better now than when I first tried using it heavily a few years ago. I don't use it today for much more than form validation and some non-critical div visibility swapping.

    -Pete
  30. Huge HTML pages will annoy the user, and JavaScript manipulating huge arrays is dog slow. Don't cache in the HTML page - it's bad in almost every case.

    How about this:
    a) Do the search and return the first page.
    b) Start off a server-side thread to load up the next 2 or 3 pages and cache them. Almost all users either abandon the search or scroll to the following one or two pages.
    c) Subsequent page requests are checked against this 'look-ahead' cache.

    So what we have here is a middle ground. No client caching, and the initial page is returned quickly. But a background job then creates a look-ahead cache for this user which is consulted for subsequent pages. And this look-ahead can use the initial result set, so there is no repeat query.

    if page exists in look-ahead
      return page
    else
      new_search=true
    endif
    If new search
      discard look-ahead
      do search
      return results
      create look-ahead cache using same result set
    endif

    Gives you a user performance boost without a huge effort; server load could actually be adversely affected, depending on user search behaviour - i.e. it could also improve.

    Jonathan
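    A rough sketch of such a look-ahead cache; runQuery() stands in for the real database call and everything else is illustrative:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Returns the requested page immediately, then warms the next few pages on a
    // background thread so that the most likely follow-up requests hit the cache.
    public class LookAheadPager {

        private static final int LOOK_AHEAD = 3;

        private final Map pageCache = new HashMap();   // Integer page number -> List of rows
        private String currentQuery;

        public synchronized List getPage(final String query, final int pageNo, final int pageSize) {
            if (!query.equals(currentQuery)) {          // new search: discard the old look-ahead
                pageCache.clear();
                currentQuery = query;
            }
            List page = (List) pageCache.get(new Integer(pageNo));
            if (page == null) {
                page = runQuery(query, pageNo, pageSize);
                pageCache.put(new Integer(pageNo), page);
            }
            new Thread(new Runnable() {                 // warm the look-ahead in the background
                public void run() {
                    for (int i = 1; i <= LOOK_AHEAD; i++) {
                        prefetch(query, pageNo + i, pageSize);
                    }
                }
            }).start();
            return page;
        }

        private synchronized void prefetch(String query, int pageNo, int pageSize) {
            if (query.equals(currentQuery) && !pageCache.containsKey(new Integer(pageNo))) {
                pageCache.put(new Integer(pageNo), runQuery(query, pageNo, pageSize));
            }
        }

        private List runQuery(String query, int pageNo, int pageSize) {
            return new ArrayList();                     // hypothetical database call
        }
    }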
  31. What if user wants to see page 50 of 100?

  32. The user might want to sort the result by certain columns, so the entire record set has to be in the cache.
  33. The problem with the Google example is that most people find no reason (nor do they have the patience) to iterate through more than the first two or three pages to find what they want. In fact, search engines are designed to sort the returned list so that the most relevant items are listed first.

    However, this is not the case for a site like Audiogalaxy. In this case, they are presenting an alphabetical listing of song titles as typed by various users around the world. Even with a "tabbed" interface that creates alphabetic sections, you can still end up with hundreds of entries for the letter "S". To find the songs they want, users are forced to page their way through.

    My company faced a similar problem. Our data is very dynamic and changes all the time in a multi-user environment. Our data may be thousands of entries long and we cannot afford to be instantiating that many objects and keeping them in memory. Nor can we lock up a ResultSet by having it persist in the session.

    We solved our problem by trying to keep our database queries as atomic as possible and providing flexible browsing tools - like jumping by more than one page during a "next" or "previous" call, adding filtering to cut down the list size, etc.

    On the data layer, all we did was add a few generic classes and calls.

    The first thing we did was create a hierarchy of Criteria classes (each sub-class defining its own field indicators as constants) to encapsulate the following:
       - criteria (i.e., map of [field, value])
       - filters (i.e., map of [field, value] that translates to a "LIKE" clause)
       - sorting (map of [<field>, <boolean>]).

    Once we had that, we could easily customize our queries to our data layer without exposing JDBC issues, like ResultSet.

    To support paging, we added some new search methods. Our API on our Handler now looks like this:

    public List search(Criteria c);
       - general search
    public List search(Criteria c, int index, int total);
       - paging search
    public List search(Criteria c, Object start, int total);
       - uses an object as a start position
    public List search(Criteria c, Criteria start, int total);
       - uses a Criteria object to look-up the start position

    These work quite well for us. Our only difficulty has been with Hypersonic, which does not support scrollable result sets. We detect this generically in the code and manually scroll to the proper point. Oh well.
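    A rough sketch of the kind of Criteria object described above (field names and the map layout are illustrative, not the actual classes):

    import java.io.Serializable;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Carries criteria, filters and sorting across the data-layer boundary without
    // exposing any JDBC types to the caller.
    public class Criteria implements Serializable {

        private final Map criteria = new LinkedHashMap();   // field -> exact value  (WHERE field = ?)
        private final Map filters  = new LinkedHashMap();   // field -> pattern      (WHERE field LIKE ?)
        private final Map sorting  = new LinkedHashMap();   // field -> Boolean.TRUE for ascending

        public Criteria addCriterion(String field, Object value) {
            criteria.put(field, value);
            return this;
        }

        public Criteria addFilter(String field, String likePattern) {
            filters.put(field, likePattern);
            return this;
        }

        public Criteria addSort(String field, boolean ascending) {
            sorting.put(field, Boolean.valueOf(ascending));
            return this;
        }

        public Map getCriteria() { return criteria; }
        public Map getFilters()  { return filters; }
        public Map getSorting()  { return sorting; }
    }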
  34. I am looking for ideas to solve the following situation:

    1.) There is a database with thousands of entries (nothing is deleted), so it is growing.
    2.) The database is dynamically changing with multiple users logged on at a time.
    3.) We want to return user specific information sorted by a data field.
    4.) We would like to make a basic object to handle paging
       - The returned results could range from very few all the way to several hundred.

    I have seen many pros/cons and examples for various solutions but there is a catch:

    6.) The users would like to know what is on following pages
    Eg.
     <     First: 09/15/03 09/01/03 05/12/03 04/13/03
         Last: 09/01/03 08/20/03 04/10/03 04/02/03

    Where First and Last are the first and last items on the page

    - Is there an easy way to get that information from the database so it is sorted and only returns those columns from the selected rows?

    - How would this best be handled if a new entry is added later?

    -In this example, users may jump around depending on what they are looking for,
    or they may just want to see what is/has happen(ing)/(ed) around specified dates.

    * What I am really looking for :) ...
    What suggestions, comments, and possible pros/cons are there for implementing such an object?
  35. 6.) The users would like to know what is on following pages

    > Eg.
    > <> First: 09/15/03 09/01/03 05/12/03 04/13/03
    > Last: 09/01/03 08/20/03 04/10/03 04/02/03
    >
    > Where First and Last are the frist and last items on the page

    That should be a 5, and the dates should be lined up under their appropriate column. Where (Page 6) is the item viewed, it doesn't need a date range display (or may have it) because the user can see all the results.

    &#160; Do html spaces work?
  36. > Search engines or report tools are for that purpose.

    Thomas,
      My interest in this discussion is that my product _IS_ a search engine. For better or worse, one of the legacy data stores I maintain a web search engine interface for can take 5-20 seconds each time I have to hit the database. While there is no getting around hitting the DB for the initial search, I am VERY interested in scalable ways to implement a pager, especially since searches can return literally millions of hits. Someone mentioned higher up that this is probably a fundamental flaw in the application. With the legacy app, there is not much I can do about how much data the searches return.

    Currently some searches take 20 seconds to go to the next page. A cacheable iterator that doesn't clobber resources with hundreds of users logged on is high on my to-do list.

    I don't have solid suggestions about how it should be implemented, which is why I am reading this, but you imply this (search engines) is not a real-world use of this pattern. For display on a page, I pass rows as objects, for various reasons.

    Passing a Collection (rows) of Collections or ValueObjects (columns), and caching the next X pages ahead of time in the HTTP session, is something I have been thinking about. Done right, I can automatically add the next X pages when the user gets near the end of the cached pages.

    Any thoughts on how I may be able to scale this well in my situation? I use raw JDBC queries to get my resultsets, since the returned record count may be in the millions. I too would be very interested in the details of how systems like google work.

    -Pete
  37. > Currently some searches take 20 seconds to go to the next

    >page. A cacheable iterator that doesn't clobber resources with
    >hundreds logged on is high on my things to do list.

    In some cases it just makes sense to use a dedicated caching framework that will handle most of the problems.

    Regarding the paging stuff, you probably have to cache the query itself, including a pointer to where you are in viewing the result set (for example a primary key value in a result set ordered by that key).

    Secondly, with JDBC you can limit the number of rows returned. So even if your query results in 10,000 hits, you just return 100, and let's say you view this in pages of 10 records. When you are close to (or at the end of) your 100 records, you execute a new query for the next 100 records. This is pretty much how the virtual memory mechanism works in a computer's operating system: you retrieve the data on "page faults".

    You don't have to retrieve all the thousands of records at once, you can still cache on the client side, and you make your user wait more only when crossing the boundaries between these larger pages.

    Regards,
    Tibi
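    A sketch of this "page fault" style of fetching, assuming the result can be ordered by a numeric primary key (table and column names are illustrative):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.ArrayList;
    import java.util.List;
    import javax.sql.DataSource;

    // Pulls rows in chunks of CHUNK_SIZE, continuing from the last primary key
    // seen; the client caches a chunk and asks for the next one only when the
    // user pages close to its end.
    public class ChunkedFetcher {

        private static final int CHUNK_SIZE = 100;

        private final DataSource dataSource;

        public ChunkedFetcher(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public List nextChunk(long lastSeenId) throws Exception {
            Connection con = dataSource.getConnection();
            try {
                PreparedStatement ps = con.prepareStatement(
                        "SELECT id, description FROM history WHERE id > ? ORDER BY id");
                ps.setMaxRows(CHUNK_SIZE);          // ask the driver to stop after one chunk
                ps.setLong(1, lastSeenId);
                ResultSet rs = ps.executeQuery();

                List chunk = new ArrayList();
                while (rs.next()) {
                    chunk.add(new Object[] { new Long(rs.getLong("id")), rs.getString("description") });
                }
                return chunk;
            } finally {
                con.close();
            }
        }
    }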
  38. <quote>
    Regarding the paging stuff, you probably have to cache the query itself, including the pointer of where you are in viewing the result set (for example a primary key value in an result set ordered by that key).
    </quote>

    It would work if the query is sorted by primary key,
    but that's often not the case - users need to sort the result by different columns with duplicate values.
  39. I reckon we are all saying there are loads of solutions. Take your pick; they all have pros and cons.

    Which is what I reckon.

    Jonathan
  40. > It would work if the query is sorted by primary key,

    >it's often not the case - users need to sort result by
    >different columns with duplicate values.

    That is correct. I could not find any database server-neutral solution. Oracle has a rowid that could be used, but I don't think there is a standard mechanism for identifying a row other than the primary key...

  41. Has anyone considered a solution using a scrollable resultset? Is there some reason this would be a bad solution? You would submit your query, and the DB would return a bite-size chunk of your results. What seems like a nice benefit of this solution is that it doesn't return the entire result set, which means you don't use up all the server's memory, you only send the client what it needs, and it seems like it would work for any query because it doesn't involve the pagination object knowing anything about the data that's being returned. Any thoughts?

    Mark
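    A sketch of paging with a scrollable ResultSet, assuming the driver supports TYPE_SCROLL_INSENSITIVE (query and column access are illustrative):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.ArrayList;
    import java.util.List;
    import javax.sql.DataSource;

    // Re-runs the query for each page but only reads the requested slice of rows,
    // jumping straight to the page with ResultSet.absolute().
    public class ScrollablePager {

        private final DataSource dataSource;

        public ScrollablePager(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public List getPage(String sql, int pageNumber, int pageSize) throws Exception {
            Connection con = dataSource.getConnection();
            try {
                PreparedStatement ps = con.prepareStatement(
                        sql, ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
                ResultSet rs = ps.executeQuery();

                List page = new ArrayList();
                // absolute() returns false when the requested position is past the last row.
                if (rs.absolute(pageNumber * pageSize + 1)) {
                    int read = 0;
                    do {
                        page.add(rs.getString(1));   // read whichever columns you need
                        read++;
                    } while (read < pageSize && rs.next());
                }
                return page;
            } finally {
                con.close();
            }
        }
    }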
  42. It seems that your problem is mostly related to the time it takes to process your whole data set. Have you tried looking at using the setMaxRows() method on your Statement objects? This will prevent large ResultSet objects from being processed by providing a limit on the number of records that are returned from the database as a result of your query.

    Might help ...
  43. I am working on a similar problem with a slight variation. Currently, we have a batch process that needs to access potentially a million transactions from the back-end database and process these. This is some sort of a never ending batch process that is supposed to poll the back-end database for transactions.

    I intend to use the CachedRowSet approach (Data Transfer RowSet pattern) to return data in chunks of, let's say, 100,000 transactions from a business method exposed by a stateless session bean that would make a JDBC call to the database to retrieve the next chunk (100,000 rows) of unprocessed data. By virtue of the fact that stateless session beans can be pooled and hence scale very well, I think this would be a scalable solution.

    I would appreciate inputs on this from anyone out there who has implemented a similar solution.

    Thanks
    vp
  44. The page by page iterator seems to advocate retrieving the whole resultset upfront; for our project this would have entailed a massive performance hit for the end user when the query was first run. It also requires stateful storage of the user's resultset.

    I used a pattern that was in our old CICS/COBOL systems (who'd have thought). I developed a pageset class to store a set of keys that would allow the retrieval of a page given the key.

    The pageset class stores objects representing the key of a single page within a resultset. This allows forward and backward traversal through a resultset by page number. Once a key is stored, the resultset can be retrieved from that particular page, i.e. where a column is greater than the key.

    The top of the resultset is always equal to an empty key object, i.e. if the key is a String, then key 0 should contain an empty string.

    A key can be any object, i.e. a String or a custom class; for primitive datatypes the object wrapper class for the primitive datatype can be used, e.g. Integer for int. Custom classes can be used for more complex keys.

    The whole thing sort of looks like this

    +---------+   +---------+   +---------+
    |  key 0  |   |  key 1  |   |  key 2  |
    |         |   |         |   |         |
    +---------+   +---------+   +---------+
    +---------+   +---------+   +---------+
    | Page 1  |   | Page 2  |   | Page 3  |
    |         |   |         |   |         |
    | rs >    |   | rs >    |   | rs >    |
    | key 0   |-->| key 1   |-->| key 2   |--> and so on...
    |         |   |         |   |         |
    |         |   |         |   |         |
    |         |   |         |   |         |
    +---------+   +---------+   +---------+

    Each time the user goes forward, the key for the next page is stored in the pageset. This is one of the shortcomings of the pattern, however: the user has to move through the resultset to cause the keys to be stored, so no go-to-last-page mechanism, or jumping to particular unretrieved pages, is possible.

    When you re-execute the query for the next/previous page you can retrieve the key for that page from the pageset class.

    Note:

    * for more complex keys you need to use >= as well

    ie where col1 > key1
       and col2 >= key2
       and col3 >= key3

    * the key must be the same as the sort order in the query.

    * if your dbms/jdbc driver has the ability to limit the number of rows returned, this will also improve performance
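    A minimal sketch of such a pageset (names are illustrative):

    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;

    // Remembers the key of the last row of each page already visited, so page N
    // can later be re-queried with "WHERE sortColumn > key(N - 1)".
    public class PageSet implements Serializable {

        private final List keys = new ArrayList();   // keys.get(n) starts page n + 1

        public PageSet(Object topOfResultSetKey) {
            keys.add(topOfResultSetKey);             // "key 0", e.g. "" for a String key
        }

        // Key to put in the WHERE clause when fetching the given 1-based page number.
        public Object getKeyForPage(int pageNo) {
            return keys.get(pageNo - 1);
        }

        // Called after a page has been read, with the key of its last row, so that
        // the following page can be fetched later.
        public void storeKeyAfterPage(int pageNo, Object lastKeyOnPage) {
            if (keys.size() == pageNo) {
                keys.add(lastKeyOnPage);
            }
        }

        // Only pages whose start key has already been seen can be jumped to.
        public int highestReachablePage() {
            return keys.size();
        }
    }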
  45. See bottom of page at this link: http://uqconnect.net/~zzblong
  46. See bottom of page at this link: http://uqconnect.net/~zzblong

    it doesn't talk about ordering of records..
  47. > See bottom of page at this link: http://uqconnect.net/~zzblong
    > it doesn't talk about ordering of records..
    See below...

    ----- Original Message -----
    From: Someone
    To: Brad Long
    Sent: Friday, March 19, 2004 12:38 AM
    Subject: RE: Distributed Result Set Iterator

    Thank you. This works.

    ----- Original Message -----
    From: Brad Long
    To: Someone
    Sent: Friday, March 19, 2004 12:38 AM
    Subject: RE: Distributed Result Set Iterator

    Try this:
     
     WHERE settlement_transaction_date > date OR (settlement_transaction_date = date AND trans_id > id)
     
     
    (with the same ORDER BY as below).
     
    ----- Original Message -----
    From: Someone
    To: Brad Long
    Sent: Friday, March 19, 2004 12:38 AM
    Subject: RE: Distributed Result Set Iterator


    Hi Brad,
    Thanks for your reply. The solution seems attractive at first glance, but it still suffers from the same problem of missing the trans_id=3 row. Based on your query, the last row of the first batch has the values settlement_transaction_date=Jan 3, 2004 and trans_id=4. Now, based on the condition settlement_transaction_date >= Jan 3, 2004 and trans_id > 4, the query misses the row with trans_id=3 and settlement_transaction_date=Jan 4, 2004.
     
    To resolve this problem we're thinking of adding a time stamp to the table and adding that to the WHERE clause. This way we take a snapshot of the query at a given time and iterate through that snapshot. Obviously we ignore all the rows that are added during the paging, as their time stamps are greater than the time the query was initiated. It's not the best solution, but it guarantees to return all the rows for a specific query at a given time.
     
    Please let me know if you have a better idea.
     
    Regards,
    Someone

    -----Original Message-----
    From: Brad Long [mailto:brad dot long at oracle dot com]
    Sent: Wednesday, March 17, 2004 7:43 PM
    To: Someone
    Subject: Re: Distributed Result Set Iterator

    Hi Someone,
     
    Yes, that is a known issue that I mention in the paper. I need to think through this carefully, but I believe the trick to making sorting work is to add the "order by" fields to the "where" clause and use the pattern in the paper:
     
    e.g. order_by_field_1 >= x and order_by_field_2 >= y and primary_key_field > z
     
    If there is more than 1 primary key field, then they need to be appended as usual (using >= for all except the last field, which uses >). Of course, additional where clause predicates can be added as desired.
     
    Similarly the primary key fields MUST be added to the end of the ORDER BY phrase (so the iterator knows which unique row it is up to).
     
    For example, you would need to change your SELECT to:
     
     
    SELECT *
    FROM (SELECT ROWNUM RNUM, TRANS_VIEW.*
          FROM (SELECT * FROM Trans
                WHERE settlement_transaction_date >= date AND trans_id > id
                ORDER BY settlement_transaction_date, trans_id
               ) TRANS_VIEW
         )
    WHERE RNUM < 4
     
     
     
    Hope this helps,
    Brad.
     
    ----- Original Message -----
    From: Someone
    To: Brad Long
    Sent: Thursday, March 18, 2004 8:15 AM
    Subject: Distributed Result Set Iterator


    Hello Brad,
    I read your paper "Distributed Result Set Iterator" and it's an excellent pattern for handling large result sets. The only problem I've found while trying to apply the pattern to my problem is the technique you used for query slicing. Query slicing based on the primary key doesn't work properly when you have an ORDER BY on your query. You can easily miss rows of data. Consider this example:

               Trans Table
    trans_id   settlement_transaction_date
    --------   ----------------------------
    1          Jan 1, 2004
    2          Jan 2, 2004
    3          Jan 4, 2004
    4          Jan 3, 2004

    SELECT *
    FROM (SELECT ROWNUM RNUM, TRANS_VIEW.*
          FROM (SELECT * FROM Trans
                WHERE trans_id > id
                ORDER BY settlement_transaction_date
               ) TRANS_VIEW
         )
    WHERE RNUM < 4

    Running this query with the initial value id=0 returns trans_id 1, 2 and 4, and id would be set to 4. In this case we miss trans_id 3 on the next page, based on the trans_id > id criterion.

    Any solution for this problem?

    Regards,
    Someone
  48. Try bottom of this page: http://student.uq.edu.au/~s098806/

    :)
  49. You can use the same Page-by-Page pattern, but instead of using EJB you can use DAO classes. In my current project we have built a framework by extending the Page-by-Page pattern, which is pretty simple to use and reusable across the project.
  50. Pagination can be implemented using the relative() method on a JDBC ResultSet by passing the start position. So, for example, if your query returns 10,000 rows and the page size is set to 500, startPosition will start at 1 for the first page. When the user gets to the last row of the page, the same query is executed again with startPosition set to 501. Invoking ResultSet.relative(startPosition - 1) then positions the cursor so that the next page of rows, from index 500, can be read.
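    A sketch of this approach; note that relative() needs a scrollable ResultSet, so the statement is created TYPE_SCROLL_INSENSITIVE (query and column names are illustrative):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.ArrayList;
    import java.util.List;
    import javax.sql.DataSource;

    // Re-runs the query, skips ahead with relative(startPosition - 1) and then
    // reads one page of rows.
    public class RelativePager {

        private final DataSource dataSource;

        public RelativePager(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        // startPosition is 1-based, as in the post: 1 for the first page, 501 for
        // the second page when the page size is 500, and so on.
        public List getPage(int startPosition, int pageSize) throws Exception {
            Connection con = dataSource.getConnection();
            try {
                PreparedStatement ps = con.prepareStatement(
                        "SELECT item_id, item_name FROM items ORDER BY item_id",
                        ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
                ResultSet rs = ps.executeQuery();

                List page = new ArrayList();
                // Move onto the first row, then skip forward to the start of the page.
                if (rs.next() && (startPosition == 1 || rs.relative(startPosition - 1))) {
                    int read = 0;
                    do {
                        page.add(rs.getString("item_name"));
                        read++;
                    } while (read < pageSize && rs.next());
                }
                return page;
            } finally {
                con.close();
            }
        }
    }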
  51. See: http://valuelist.sourceforge.net/