Discussions

Performance and scalability: running batch jobs to upload csv

  1. running batch jobs to upload csv (1 message)

    Hi,

    My apologies if this isn't quite the right forum to post this question, but here goes. I've been asked to create and run a script on Solaris, placed on an app server, that retrieves a csv file (located on a RADIUS server) containing data to be uploaded to an Oracle master database that also resides on the app server. I'm not sure of the best way to implement this, and performance is a concern (the csv contains thousands of records). Here are the approaches I can think of:

    1. Create a Java program that parses the csv file and uploads the data via JDBC. The program would be called by a batch file and automated by a cron job. (A rough sketch of what I have in mind is at the end of this post.)
    2. Create a Java program that generates a .sql file of insert statements from the parsed csv file, plus a script that runs the command "sqlplus @".

    Please provide your suggestions. The important thing here really is to upload the csv file from the RADIUS server into the Oracle database on the app server, and that it runs as a scheduled task. I would still need to parse the csv file, because one of the fields mapped to the Oracle table contains commas and is not enclosed in single quotes (i.e., data1, data2, data3, , data4, etc).

    Hoping to hear from you as soon as possible. Thanks!

    Raymond
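    P.S. Here is a rough sketch of approach 1. The table name, columns, and connect string are just placeholders, and it assumes the comma-bearing free-text field is the last column of the record (so a limited split keeps its embedded commas):

      import java.io.BufferedReader;
      import java.io.FileReader;
      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.PreparedStatement;

      public class BatchCsvLoader {
          private static final int COLUMNS = 4;        // placeholder column count
          private static final int BATCH_SIZE = 1000;  // rows per executeBatch/commit

          public static void main(String[] args) throws Exception {
              String csvPath = args[0];  // csv already pulled over from the RADIUS server
              try (Connection con = DriverManager.getConnection(
                       "jdbc:oracle:thin:@apphost:1521:ORCL", "user", "pass");   // placeholder connect string
                   PreparedStatement ps = con.prepareStatement(
                       "INSERT INTO radius_upload (col1, col2, col3, free_text) VALUES (?, ?, ?, ?)");  // placeholder table
                   BufferedReader in = new BufferedReader(new FileReader(csvPath))) {

                  con.setAutoCommit(false);            // commit per batch, not per row
                  String line;
                  int pending = 0;
                  while ((line = in.readLine()) != null) {
                      // Limit the split so the last token keeps any embedded commas.
                      String[] f = line.split(",", COLUMNS);
                      if (f.length < COLUMNS) continue;  // skip short/malformed lines
                      for (int i = 0; i < COLUMNS; i++) {
                          ps.setString(i + 1, f[i].trim());
                      }
                      ps.addBatch();
                      if (++pending % BATCH_SIZE == 0) {
                          ps.executeBatch();
                          con.commit();
                      }
                  }
                  ps.executeBatch();                   // flush whatever is left at end of file
                  con.commit();
              }
          }
      }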
  2. Re: running batch jobs to upload csv

    Hi Raymond,

    How many ways are there to skin a cat? There are plenty of ways to accomplish what you're describing, but I'll cover two of the more popular approaches. First off, what volume of data are we talking about? And where is your database server in relation to the machine acting as your application server, i.e. are they in the same building or on different sides of the country?

    The fastest way to load the data would probably be SQL*Loader. Loading data from a file into a database is exactly what SQL*Loader was created for. How suitable it is for you will depend on how complex the parsing of the file is and on your level of comfort with the SQL*Loader utility.

    Alternatively, you could use a Java-based approach. One question, though: where does the application server environment come into this? If you are using a cron job to manage scheduling, wouldn't it be easier just to keep it J2SE-based? Anyway, we have done some of this type of work before. BufferedReader is very fast at reading individual lines from a file; I have one file I run as part of a test, and it reads 100,000+ rows in a second, so file access will (or should) not be a problem. Where you may lose some time is in how you use JDBC to load the data into the database. Make sure you use the latest version of the Oracle JDBC drivers; they are always faster. Make sure you use connection pools. Make sure you use Oracle batching for updates and inserts; there are plenty of sites that offer advice on this, and make sure you research the Oracle-specific JDBC classes as part of the batch updating. We worked on a system where, in a development environment, ordinary JDBC was taking about 55 minutes to do 183,000 updates. We changed it to Oracle batch updating and got that down to 11 minutes.

    One other tip: look at your code and see where you may have scope for parallel processing. If each line in the csv file is atomic in nature and has no dependency on data further down the file, then look at creating a thread pool so that you are processing in a multi-threaded rather than single-threaded fashion (a sketch follows at the end of this post).

    There are other techniques you can try, but start with those above and let me know how you get on.

    Regards,
    Jason
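    P.S. To make the batching and thread-pool points concrete, here is a minimal sketch. Table, columns, connect string, chunk and pool sizes are all placeholders, it uses standard JDBC batching (Oracle's own setExecuteBatch()/sendBatch() on OraclePreparedStatement is the vendor-specific variant to research), and in real code each worker should draw its connection from a pool rather than DriverManager:

      import java.nio.file.Files;
      import java.nio.file.Paths;
      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.PreparedStatement;
      import java.util.List;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.TimeUnit;

      public class ParallelCsvLoader {
          private static final int THREADS = 4;     // tune to CPUs and what the database will tolerate
          private static final int CHUNK = 5000;    // rows handed to each worker

          public static void main(String[] args) throws Exception {
              // Whole file in memory is fine for "thousands of records"; stream it for much bigger files.
              List<String> lines = Files.readAllLines(Paths.get(args[0]));
              ExecutorService pool = Executors.newFixedThreadPool(THREADS);
              for (int start = 0; start < lines.size(); start += CHUNK) {
                  List<String> chunk = lines.subList(start, Math.min(start + CHUNK, lines.size()));
                  pool.submit(() -> insertChunk(chunk));   // safe only if rows are independent
              }
              pool.shutdown();
              pool.awaitTermination(1, TimeUnit.HOURS);
          }

          private static void insertChunk(List<String> chunk) {
              // One connection per worker; in real code take it from a connection pool.
              try (Connection con = DriverManager.getConnection(
                       "jdbc:oracle:thin:@apphost:1521:ORCL", "user", "pass");
                   PreparedStatement ps = con.prepareStatement(
                       "INSERT INTO radius_upload (col1, col2, col3, free_text) VALUES (?, ?, ?, ?)")) {
                  con.setAutoCommit(false);
                  for (String line : chunk) {
                      String[] f = line.split(",", 4);   // last field keeps its embedded commas
                      if (f.length < 4) continue;        // skip short/malformed lines
                      for (int i = 0; i < 4; i++) ps.setString(i + 1, f[i].trim());
                      ps.addBatch();                     // standard JDBC batching
                  }
                  ps.executeBatch();
                  con.commit();
              } catch (Exception e) {
                  e.printStackTrace();                   // a real job would log and retry or fail the chunk
              }
          }
      }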