Discussions

EJB programming & troubleshooting: Parsing a byte delimited text file using a meta data file

  1. I need to parse a large text file and populate an Oracle DB. The text file is byte delimited, i.e. I have a metadata file that gives each field's start location and length in bytes along with the corresponding DB column. I need to use this metadata file to read the text file and populate the DB. The problem is that the file is huge: each record is 2000 bytes and there are around 70K records in each file. What would be the fastest way to parse and populate in such a scenario? Is Java a good solution, or should I be using Perl? Any pointers towards a Java solution would be greatly appreciated since I am a Java novice!

    Surajit
  2. Do you really think 70 KB is a large file?
    The algorithm seems quite simple:
    1) read metadata file
    2) configure the parser
    3) parse the data file
    4) insert in db
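
    Steps 1 to 3 could be sketched roughly as below. The thread never shows the metadata file, so the "start,length,column" line format, the 0-based offsets, the single-byte ISO-8859-1 encoding, and all class and method names are assumptions made for illustration only.

    ```java
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical names throughout; the real metadata layout is not given in the thread.
    class FixedWidthParser {

        /** One metadata entry: where a field starts, how long it is, and its DB column. */
        static final class FieldSpec {
            final int start;      // 0-based byte offset within the record (assumption)
            final int length;     // field length in bytes
            final String column;  // target DB column name

            FieldSpec(int start, int length, String column) {
                this.start = start;
                this.length = length;
                this.column = column;
            }
        }

        /** Steps 1 and 2: turn metadata lines of the assumed form "start,length,column" into specs. */
        static List<FieldSpec> readSpecs(List<String> metadataLines) {
            List<FieldSpec> specs = new ArrayList<>();
            for (String line : metadataLines) {
                if (line.trim().isEmpty()) continue; // skip blank lines
                String[] parts = line.split(",");
                specs.add(new FieldSpec(Integer.parseInt(parts[0].trim()),
                                        Integer.parseInt(parts[1].trim()),
                                        parts[2].trim()));
            }
            return specs;
        }

        /** Step 3: slice one record into column -> value, trimming surrounding padding. */
        static Map<String, String> parseRecord(byte[] record, List<FieldSpec> specs) {
            Map<String, String> row = new LinkedHashMap<>();
            for (FieldSpec f : specs) {
                // ISO-8859-1 maps one byte to one char, so byte offsets stay valid (assumption).
                String value = new String(record, f.start, f.length, StandardCharsets.ISO_8859_1);
                row.put(f.column, value.trim());
            }
            return row;
        }
    }
    ```

    The metadata is parsed once and the resulting specs are reused for every record, so the per-record work is just a handful of byte-range copies.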

    Performance-centric approaches:
    * Insert in the DB right after reading each line, so as not to load the whole file into memory.
    * Insert in the DB after reading an arbitrary number of lines (this number being calculated, perhaps, from the number of lines in the file).
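
    The second approach might look something like this sketch. It assumes records are exactly recordLength bytes with no line terminators between them (or that the terminator is counted in the length); the class name, method name, and the sink callback are hypothetical.

    ```java
    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.UncheckedIOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    // Hypothetical helper: streams fixed-length records and flushes them in batches.
    class BatchingReader {

        /**
         * Reads records of recordLength bytes and passes them to the sink
         * batchSize at a time. Returns the number of complete records read.
         */
        static int readInBatches(InputStream in, int recordLength, int batchSize,
                                 Consumer<List<byte[]>> sink) {
            DataInputStream data = new DataInputStream(in);
            List<byte[]> batch = new ArrayList<>(batchSize);
            int count = 0;
            while (true) {
                byte[] record = new byte[recordLength];
                try {
                    data.readFully(record); // blocks until the whole record is read
                } catch (EOFException e) {
                    break; // end of input; a trailing partial record is dropped
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                batch.add(record);
                count++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);                 // e.g. one DB round trip per batch
                    batch = new ArrayList<>(batchSize); // start a fresh batch
                }
            }
            if (!batch.isEmpty()) sink.accept(batch);   // flush the final short batch
            return count;
        }
    }
    ```

    Only one batch is held in memory at a time, so memory use depends on the batch size rather than the file size. With JDBC, the sink would typically call PreparedStatement.addBatch() for each record and executeBatch() once per batch; for a real file, wrapping the FileInputStream in a BufferedInputStream is usually worthwhile.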

    As for the language of choice, other factors might affect the decision: deployment scenarios, integration with other applications in the system, etc.

    Cheers and happy coding,
    Martin
  3. Thanks for the response, Martin. 70 KB is indeed not a big file, but in my case the file size is more like 134 MB (70K records, each 2000 bytes long).

    -Cheers
  4. Ah, I see... 134 MB is a big fat file. It should not be a problem if you don't load all the data at once, right?