News: PZFileReader 3.0.0, CSV access package, released

  1. Paul Zepernick and ObjectLab are pleased to announce PZFileReader 3.0.0 for Java 1.4+, an Apache 2.0-licensed flat file parser (CSV, Fixed Length, Custom) using XML to configure formats. This is a major release with redesigned interfaces and performance gains of, in some cases, several orders of magnitude. Unfortunately, this release is not backward compatible.

     The implementation is useful to any business that deals with flat files. Not only can it very quickly parse CSV or any user-defined delimiter, it can also parse fixed-length files. An example of its use, from the project home page:

         //Obtain the proper parser for your needs
         PZParser pzparser = DefaultPZParserFactory.getInstance().newDelimitedParser(
             new File("DataFile.txt"), //txt file to parse
             ',',                      //delimiter
             '"');                     //text qualifier

         //obtain DataSet
         DataSet ds = pzparser.parse();

         while (ds.next()){ //loop through file
             ds.getString("mycolumnName");
         }

     The library allows you to define an XML mapping (or database) of the format of your file, as well as allowing the file itself to determine the columns (as shown in the fragment above). Once this is done, the parsed data can be accessed via a simple name lookup or streaming mechanism.

     It is our aim to publish, at some point, some well-known file formats for your immediate use. Please contribute if you have some standard files...

     It is available for download via SourceForge or the Maven Central Repository (both Maven 1 and Maven 2). The homepage has some very quick examples.

     Message was edited by: joeo@enigmastation.com

    Threaded Messages (8)

  2. Re: opencsv comparison

    It would be interesting to see how this library compares to opencsv, especially in performance. I've added opencsv support to the Scriptella ETL project; nevertheless, a quick evaluation of the opencsv sources revealed many places for optimization. Regards, Fyodor Kupolov
  3. Re: opencsv comparison

    Are you saying it's faster or slower than opencsv?
  4. Re: opencsv comparison

    I said that opencsv could be faster, but I don't know whether PZFileReader is faster than opencsv.
  5. Re: opencsv comparison

    a quick evaluation of opencsv sources revealed many places for optimizations
    Can you create an issue in the OpenCSV issue tracker? http://sourceforge.net/tracker/?group_id=148905&atid=773541
  6. Re: opencsv comparison

    Can you create an issue in the OpenCSV issue tracker?
    This issue was created in September (sorry for the anonymous post, but I experienced problems with authentication), and there are no comments on it: http://sourceforge.net/tracker/index.php?func=detail&aid=1554996&group_id=148905&atid=773544 I could post more issues, but you haven't answered the one mentioned above. Anyway, at the moment CSV parsing is not a serious issue for my project; moreover, in most cases JDBC calls are the primary performance bottleneck in ETL. Regards, Fyodor Kupolov
  7. Re: opencsv comparison

    Hi Fyodor,

    I will see if I can do some speed comparisons and post the results for you. I have not used opencsv before, but at first glance it looks like PZFileReader offers a different approach to reading in the data and iterating through it. Opencsv appears to return a String[] for every row in the file. PZFileReader returns a DataSet, which has the names of the columns bound to their positions. There are three options for the binding:

    1. column names are obtained from the file header
    2. use PZFileReader's XML mapping
    3. use a PZFileReader database table layout

    PZFileReader can return different types for a column: getString, getDate, getDouble, getInt, getObject. getObject accepts the name of the column to return and the class of the object to be returned. Out of the box it supports BigDecimal, Double, and Integer. It can also be set up to return custom types. For example, you may have a field that contains a customer number and want to load a Customer Hibernate object. You can implement the PZConverterInterface to return a Customer object based on the data in the file, so you could end up doing something like DataSet.getObject("custno", Customer.class), which would return your Customer object.

    It also appears, from looking at the opencsv API, that you can only advance forward through the file. PZFileReader offers the following methods for navigating the file:

    next()
    previous()
    absolute(int lineNo)

    PZFileReader also allows you to order the file by one or more columns before iterating through it. Ascending or descending order can be specified for each column in the sort.

    Here is the user document, which goes into some additional features: http://pzfilereader.sourceforge.net/documentation/pzfilereader-manual.pdf

    This is not meant to beat up on opencsv in any way. It looks like a very lightweight, easy-to-use API. I am merely pointing out some differences. The other difference is that PZFileReader does not write CSV; it only reads. Opencsv looks like a very handy tool for writing, and the ResultSet to CSV looks pretty cool too.

    Sorry for the long post, but there was a lot I wanted to get in :)

    Thanks, Paul Zepernick
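
    To make the contrast in the post above concrete, here is a minimal, self-contained sketch of a row cursor that binds column names to positions and supports next(), previous(), absolute(), and a single-column sort. It is an illustration only and is not PZFileReader's code or API: the class name NamedRowCursor, the orderBy method, and the zero-based row numbering are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; NOT the PZFileReader DataSet implementation.
final class NamedRowCursor {
    private final Map<String, Integer> columnIndex = new HashMap<>();
    private final List<String[]> rows;
    private int pos = -1; // cursor starts before the first row

    NamedRowCursor(String[] header, List<String[]> rows) {
        // Bind each column name to its position, as a header row would.
        for (int i = 0; i < header.length; i++) {
            columnIndex.put(header[i], i);
        }
        this.rows = new ArrayList<>(rows);
    }

    private int indexOf(String column) {
        Integer idx = columnIndex.get(column);
        if (idx == null) {
            throw new IllegalArgumentException("unknown column: " + column);
        }
        return idx;
    }

    // Moves forward one row; returns false when no rows remain.
    boolean next() {
        if (pos + 1 >= rows.size()) return false;
        pos++;
        return true;
    }

    // Moves back one row; returns false when already at (or before) the first row.
    boolean previous() {
        if (pos <= 0) return false;
        pos--;
        return true;
    }

    // Jumps to a row; zero-based in this sketch (an assumption).
    void absolute(int rowNo) {
        if (rowNo < 0 || rowNo >= rows.size()) {
            throw new IndexOutOfBoundsException("row " + rowNo);
        }
        pos = rowNo;
    }

    // Sorts rows by one column as text and resets the cursor.
    void orderBy(String column, boolean ascending) {
        final int idx = indexOf(column);
        Comparator<String[]> cmp = Comparator.comparing(r -> r[idx]);
        if (!ascending) cmp = cmp.reversed();
        rows.sort(cmp);
        pos = -1;
    }

    String getString(String column) {
        if (pos < 0) throw new IllegalStateException("call next() or absolute() first");
        return rows.get(pos)[indexOf(column)];
    }

    int getInt(String column) {
        return Integer.parseInt(getString(column).trim());
    }
}

final class NamedRowCursorDemo {
    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1002", "Smith"});
        rows.add(new String[] {"1001", "Jones"});
        NamedRowCursor ds = new NamedRowCursor(new String[] {"custno", "name"}, rows);
        ds.orderBy("custno", true);
        while (ds.next()) {
            System.out.println(ds.getInt("custno") + " " + ds.getString("name"));
        }
    }
}
```

    With a plain String[] per row, the caller has to remember that custno is index 0; with the name binding, the lookup survives a change in column order. Note that the sort here compares values as text, which is a simplification.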
  8. Does PZFileReader 3.0.0 support configuration of a field to occur multiple times? Thanks, -Anil
  9. Hi Anil, Sorry for the late reply... What do you mean by having the field occur multiple times? Do you mean being able to have a line break in the middle of a column? If so, then yes, it is supported. Thanks, Paul
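
    For readers wondering what "a line break in the middle of a column" involves, the following is a rough sketch of how a text qualifier lets a field span lines. It is not taken from PZFileReader; the class QualifiedRecordReader and its parse method are hypothetical names, and the handling of doubled qualifiers and blank lines is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; NOT how PZFileReader is implemented.
final class QualifiedRecordReader {

    // Splits text into records of fields. A newline inside a qualified
    // (quoted) field is kept as data instead of ending the record.
    static List<List<String>> parse(String text, char delimiter, char qualifier) {
        List<List<String>> records = new ArrayList<>();
        List<String> current = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQualifier = false;
        boolean recordHasContent = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQualifier) {
                if (c == qualifier) {
                    // A doubled qualifier is treated as one literal qualifier.
                    if (i + 1 < text.length() && text.charAt(i + 1) == qualifier) {
                        field.append(qualifier);
                        i++;
                    } else {
                        inQualifier = false;
                    }
                } else {
                    field.append(c); // includes embedded line breaks
                }
            } else if (c == qualifier) {
                inQualifier = true;
                recordHasContent = true;
            } else if (c == delimiter) {
                current.add(field.toString());
                field.setLength(0);
                recordHasContent = true;
            } else if (c == '\n') {
                // Unqualified newline ends the record.
                current.add(field.toString());
                field.setLength(0);
                records.add(current);
                current = new ArrayList<>();
                recordHasContent = false;
            } else if (c != '\r') {
                field.append(c);
                recordHasContent = true;
            }
        }
        // Flush a final record that has no trailing newline.
        if (recordHasContent) {
            current.add(field.toString());
            records.add(current);
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "id,note\n1,\"line one\nline two\"\n2,plain\n";
        for (List<String> record : parse(text, ',', '"')) {
            System.out.println(record.size() + " fields: " + record);
        }
    }
}
```

    The key point is the inQualifier flag: while it is set, a newline is ordinary field data. In this sketch a blank line yields a record with a single empty field, which a real parser might prefer to skip.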