big ArrayList RecordSet

  1. big ArrayList RecordSet (11 messages)

    I have a design question. I have an application that will have at most 8 concurrent users, and 80% of the time it will have two. I need to retrieve an average of 150,000 records. These will be accessed through a DAO layer, and an ArrayList of objects will be populated from the result set. I am not doing any writing to the database, just reading the data, running some business logic on it (through a session EJB), and outputting the results to a text file. I am restricted from using stored procedures because of the client's policies. Is this too much of a load? My heap size for the JVM will be 256 MB. The current implementation uses temp tables and takes about 6 minutes to process: the 150,000 records are inserted into temp tables and the business rules are run against them. Our new design does not use temp tables; instead it loads the result set into an ArrayList of objects and runs the business rules there. Please provide feedback on processing time and other implementation options. Thank you.

    Threaded Messages (11)

  2. RE: big ArrayList RecordSet

    Why load the entire database into memory to begin with? Why not subclass a collection class and make it load a portion of the database at a time? Since it sounds like you are going through the DB records sequentially, it should be able to do a good job of pre-fetching records and then discarding them from memory once they have been processed.
    I would try to avoid loading them all into memory like the plague.
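    Just to sketch the mechanics underneath such a collection (untested, and the table and column names are made up) -- most JDBC drivers will stream a result set in batches if you give them a fetch-size hint:

    import java.sql.*;

    // Sketch only: stream rows in driver-sized batches instead of
    // materializing all 150,000 of them. Names are invented.
    public class StreamingReader {
        public void process(Connection con) throws SQLException {
            Statement st = con.createStatement();
            st.setFetchSize(1000); // hint: pull ~1000 rows per round trip
            ResultSet rs = st.executeQuery(
                "SELECT txn_id, amount FROM transactions");
            while (rs.next()) {
                // apply the business rules to one row, write the result out,
                // then let the row go out of scope -- nothing accumulates
                applyRules(rs.getLong("txn_id"), rs.getBigDecimal("amount"));
            }
            rs.close();
            st.close();
        }
        private void applyRules(long id, java.math.BigDecimal amount) {
            // placeholder for the real business logic
        }
    }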
  3. Load data

    I am actually doing a query to retrieve these records (150,000 of them); the table itself contains millions of records. I need all of them to perform some business rules on them, and the result of those business rules is output to a text file. I would like to break up the records retrieved; however, I need the whole result set to perform the business logic. There will not be many concurrent users on the application (2 most of the time), and we are prevented by client policy from using stored procedures.
  4. Load data

    I don't know about your logic, actually; if you load 150,000 records for each user, there is a chance of 1,200,000 (150,000 x 8) in memory at the same time. That may hurt your performance.

    I assume you will not show all 150,000 records on the interface; you may display some kind of calculated results. So what I would suggest is: break up the result set into smaller pieces (say 10,000 records) by using database-dependent attributes (ROWID in Oracle) and do the calculation in your business layer, as in the sketch below. You can use an appropriate transaction isolation level for your queries to make sure that you are getting the same result set (as of when you started the transaction) in all of the chunks.
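    Something along these lines, using the classic Oracle ROWNUM pagination idiom rather than raw ROWIDs (a sketch only; table and column names are invented):

    import java.sql.*;

    // Sketch: pull one 10,000-row window per query; the inner ORDER BY
    // keeps the windows stable across queries.
    public class ChunkedReader {
        public void processAll(Connection con) throws SQLException {
            String sql =
                "SELECT txn_id, amount FROM ("
              + "  SELECT t.*, ROWNUM rn FROM ("
              + "    SELECT txn_id, amount FROM transactions ORDER BY txn_id"
              + "  ) t WHERE ROWNUM <= ?"
              + ") WHERE rn > ?";
            PreparedStatement ps = con.prepareStatement(sql);
            int chunk = 10000;
            for (int lower = 0; ; lower += chunk) {
                ps.setInt(1, lower + chunk);  // upper bound of this window
                ps.setInt(2, lower);          // lower bound of this window
                ResultSet rs = ps.executeQuery();
                int rows = 0;
                while (rs.next()) {
                    rows++;
                    // run the calculation on this row
                }
                rs.close();
                if (rows < chunk) break;      // short window means last window
            }
            ps.close();
        }
    }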

    Hope this helps,
    Senthil.
  5. Load data

    Yes, but you can't look at all 150,000 of them at one time, unless you're doing some sort of set manipulation on them, in which case the DB will be faster than anything you can write.
    Something like:

    ReallyBigDatabaseRequest rbdr = new ReallyBigDatabaseRequest();
    Iterator iter = rbdr.iterator();  // the iterator loads records lazily
    while (iter.hasNext()) {
        Object record = iter.next();  // pull the next record
        // process the record, then let it be garbage collected
    }

    where ReallyBigDatabaseRequest implements Collection and adds its own methods to meet your needs, still sounds like a better deal. It doesn't have to load all the records at once. If you need to access a particular record, you can have a method that does that. Use the ReallyBigDatabaseRequest class to, in effect, hide the fact that you don't really have all 150,000 records in memory. After all, today it's 150,000; tomorrow it's probably twice as many.
  6. Hibernate

    Check out this persistence layer framework; it might solve all of your worries if you are planning to implement your own DAO layer.

    http://www.hibernate.org

    The other approach would be serializing the results as an XML file and running the business rules against that: you would pipe the results out into the XML file directly, without caching the whole thing in memory, and afterwards you can walk the tree and use all sorts of XML-specific methodologies and APIs (XPath, JDOM, XSL, etc.) to make your task easier.
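    For example, a rough sketch using the StAX streaming API (element and field names are made up):

    import java.io.FileOutputStream;
    import javax.xml.stream.*;

    // Sketch: write each record straight to the XML file as it is read,
    // so the full result set is never held in memory.
    public class XmlPipe {
        public void write(java.util.Iterator records) throws Exception {
            XMLStreamWriter w = XMLOutputFactory.newInstance()
                .createXMLStreamWriter(new FileOutputStream("journal.xml"), "UTF-8");
            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("journal");
            while (records.hasNext()) {
                String[] rec = (String[]) records.next(); // e.g. {id, amount}
                w.writeStartElement("txn");
                w.writeAttribute("id", rec[0]);
                w.writeCharacters(rec[1]);
                w.writeEndElement();
            }
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        }
    }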

    If there are any results that you output to the file, then an XSL transformation might be a big help. It might save you some time writing custom rules and let you stick closely to some open conventions.
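    E.g. (the stylesheet and file names are hypothetical):

    import java.io.File;
    import javax.xml.transform.*;
    import javax.xml.transform.stream.*;

    // Sketch: run a rules.xsl stylesheet over the journal XML to produce
    // the plain-text ledger file.
    public class LedgerTransform {
        public static void main(String[] args) throws Exception {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("rules.xsl")));
            t.setOutputProperty(OutputKeys.METHOD, "text"); // emit plain text
            t.transform(new StreamSource(new File("journal.xml")),
                        new StreamResult(new File("ledger.txt")));
        }
    }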


    I would have to know more about the requirements to provide a specific resolution, though.

    Hope that helps to put you on the right path though.


    Regards,

    Art Yegorov
  7. MDBs possibly?

    A little more background probably would help. This web application allows an accounting group user to book their entries to the general ledger. The user submits a request (by clicking a link) on the intranet site, and then my application has to retrieve all of the transactions posted since the last run and provide the following:

    1. Perform business logic on these transactions (call them journalObjects) and then generate a text file that will be sent to the general ledger.

    2. Archive an audit trail of the transactions (call them auditTrailObjects); one transaction typically generates four audit trail records.

    Possible Implementation:
    Typically there are 150,000 transactions posted in a run. This is obviously a large number of transactions to compute in memory and serially. I could have message-driven beans that retrieve and process a certain number at a time, say 10,000, to get some parallel processing; my business logic would live in the MDB (see the sketch below). After I process the 10,000 transactions, I will have an ArrayList of journalObjects and auditTrailObjects created by the MDB. How do I pass that back in a response to the client? Do I persist them temporarily in some temp tables in the DB and then send a response to the client that processing has been completed? From there, when the client (a session EJB) has received all of the MDB responses, it can go and generate the text file. Based on the feedback, it is much better to select a certain number of records at a time, do some business logic, then go get some more; MDBs would allow me to do this and provide some parallel processing, but I am still unclear on how, or what, should be sent in the response back to the client.
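    Roughly, I imagine each MDB looking something like this (a sketch only; the queue property names are invented):

    import javax.ejb.MessageDrivenBean;
    import javax.ejb.MessageDrivenContext;
    import javax.jms.*;

    // Sketch: each JMS message names one chunk's key range; the bean runs
    // the business rules on that range and reports completion on a reply
    // queue (an MDB cannot return an ArrayList to its sender directly).
    public class ChunkProcessorBean implements MessageDrivenBean, MessageListener {
        private MessageDrivenContext ctx;
        public void setMessageDrivenContext(MessageDrivenContext ctx) { this.ctx = ctx; }
        public void ejbCreate() { }
        public void ejbRemove() { }

        public void onMessage(Message msg) {
            try {
                long low  = msg.getLongProperty("rangeLow");
                long high = msg.getLongProperty("rangeHigh");
                // 1. read rows [low, high] through the DAO layer
                // 2. run the business rules; persist journalObjects and
                //    auditTrailObjects to temp tables
                // 3. send a "chunk done" message to msg.getJMSReplyTo() so
                //    the session EJB knows when all chunks are finished
            } catch (JMSException e) {
                ctx.setRollbackOnly(); // force redelivery of this chunk
            }
        }
    }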
  8. MDBs possibly?

    How about piping the results directly into the client's response at the time of processing? Something like this (the method names are just placeholders):

    while (results.hasNext()) {
        Object record = results.next();   // get object
        Object line = applyRules(record); // run business logic
        out.println(line);                // pipe to response; the record is now garbage
    }

    This way you only have to work with one object at a given time per request, and the client will receive all of the results in the end.

    Pretty much write the results one by one into the response OutputStream. I did this for some reports that had a lot of data that I had to translate into CSV format: for each row, do the transformation and pipe it to the response. It is fast and memory friendly.

    To complete the transaction just close the response OutputStream.
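    A sketch of that loop (the column names are made up):

    import java.io.PrintWriter;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Sketch: each row is formatted, written, and forgotten, so memory
    // use stays flat regardless of the row count.
    public class CsvPipe {
        public void pipe(ResultSet rs, PrintWriter out) throws SQLException {
            out.println("txn_id,amount"); // header row
            while (rs.next()) {
                out.print(rs.getLong("txn_id"));
                out.print(',');
                out.println(rs.getBigDecimal("amount"));
            }
            out.flush();
        }
    }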
  9. MDBs possibly?

    So for my implementation, instead of trying to return an ArrayList of objects from my message bean, I should pipe each object into the client's response. How do I accomplish this with a Message object using my message-driven bean's response? Should I be scrolling through my result set within the MDB (the MDB accesses the DAO layer)?
  10. MDBs possibly?

    What if you do not use MDBs?

    Just deal with the objects directly...

    Data element -> Response kind of a thing...

    Are you constrained to using EJBs, or do you have the choice of a more light-weight implementation?

    I can send you some sample code if that is the case...
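    In the meantime, here is roughly the shape of it (a sketch; the JNDI name, SQL, and business rule are placeholders):

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.sql.*;
    import javax.servlet.http.*;

    // Sketch: a plain servlet that streams query results straight into
    // the response -- no EJBs or MDBs involved.
    public class LedgerServlet extends HttpServlet {
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            resp.setContentType("text/plain");
            PrintWriter out = resp.getWriter();
            Connection con = null;
            try {
                con = getDataSource().getConnection();
                Statement st = con.createStatement();
                st.setFetchSize(1000); // stream, don't materialize
                ResultSet rs = st.executeQuery(
                    "SELECT txn_id, amount FROM transactions");
                while (rs.next()) {
                    // data element -> business rule -> response, one at a time
                    out.println(applyRules(rs.getLong(1), rs.getBigDecimal(2)));
                }
                rs.close();
                st.close();
            } catch (SQLException e) {
                throw new IOException(e.getMessage());
            } finally {
                try { if (con != null) con.close(); } catch (SQLException ignore) { }
            }
        }
        private javax.sql.DataSource getDataSource() throws SQLException {
            try {
                javax.naming.Context ic = new javax.naming.InitialContext();
                return (javax.sql.DataSource) ic.lookup("java:comp/env/jdbc/LedgerDS");
            } catch (javax.naming.NamingException e) {
                throw new SQLException(e.getMessage());
            }
        }
        private String applyRules(long id, java.math.BigDecimal amount) {
            return id + "," + amount; // placeholder business rule
        }
    }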
  11. MDBs possibly?

    Well, I am not restricted from using EJBs, but I thought that in this instance they would be applicable because of the features EJBs provide (scalability, reliability, use of container resources such as transaction management, etc.). Since our application involves applying business logic to a large amount of data (150,000 records on average), we could take advantage of the features J2EE provides. Also, this will grow into several other similar applications, so scalability became a big factor. But please offer some sample code or further suggestions; I am a rookie at this, so help is appreciated. Thanks.
  12. MDBs possibly?

    Do you run a clustered or load-balanced environment?

    In a load-balanced environment, EJBs might become a problem unless you use the sticky-session option (if available), and that affects performance...

    Also, EJBs are quite a memory hog... 150K records via EJBs per transaction could be a potential problem.

    I will provide you some code via email, so it retains formatting.