Really need a solution to deal with LARGE data


Web tier: servlets, JSP, Web frameworks: Really need a solution to deal with LARGE data

  1. Really need a solution to deal with LARGE data (5 messages)

    Dear experts,
    I am developing a web app, with a large Oracle database.

    My customer need a function to download all records
    (about a million) to their machine.
    My solution is:

      1. execute sql to get a chunk of 50 records
      2. write to file
      3. go to step 1, until all data is downloaded.
    This works fine with small database. But it spends an hours to deal with a large database (over 500K records).

    Please give me some advice.
  2. Hi Khanh,

    More information would be useful like:

    1. What is your current solution? (JDBC? ORM?)
    2. What is taking the most time in the hours it spends? (The SQL, the iteration of results, the writing to the file, etc.)
    3. How are you limiting the results to 50? On the DB or your code?

    All these will help give a direction on what the best solution is.

  3. Thread to fetch and a single Writer Manager to write in file.

    Fire a parallel thread on the database with different "where" clause in database or for fixed defined condition.After the thread fetch the collection of record , the collection is passed to central writer manager which write the data to file.

    Best Regards,
  4. I'd like to point out the need of open juat one trasaction, since the data could be changed while you're fetching thousands of records.

  5. If the amount of data is that large, then is it really suitable for a realtime activity?
    I think I'd use the intial page to kick off a job on the server to dump the data, and only when it's finished would I attempt to pull the file, rather than trying to pull batches of rows.
    Ideally you'd do all this as close to the Database as possible, then the slow bit (pulling the data across the network) can be done once the transactional bit is done.
    Just a thought, maybe this is a good use for XMLHttpRequest to poll for the completed job in the background?
  6. I'd also like to point out that tranferring that much information over the network to a client machine seems questionable.

    In addition, having such a large transaction running on a DB for so long usually would kill the transaction snapshot - unless it was huge.