
Performance and scalability: Need a GOOD library for reading ZIP archives!

  1. Need a GOOD library for reading ZIP archives! (2 messages)

    I'm archiving large volumes of potentially large files (both the final archive and the individual files in it may be larger than 4 GB). I'm using 7-Zip and WinZip to create the archives (ZIP format, even when using 7-Zip). The archives open just fine in 7-Zip, WinZip and WinRAR. I want to read the file list (names, paths, CRCs, sizes, etc.) from each archive in a Java app to generate a catalog of the files.

    The problem is that ZipFile can't handle anything over 4 GB, and the workaround of using ZipInputStream gives me two different problems:

    1) On some archives it hits a particular entry and throws an error from getNextEntry (always the same entry within a given archive, but not the same file across archives). On one archive it complains about an "invalid argument", which from the error details looks like it might be a character encoding problem. On others it complains that the number of bytes read doesn't match the expected size and, surprise surprise, the file size in the error message is right around 4 GB.

    2) Performance. I'm just cycling through the entries with getNextEntry in a loop. Once it gets to around the 20,000th entry, it starts taking FOREVER to get the next entry. I suspect that ZipInputStream is actually going back to the beginning of the archive every time and scanning through it again.

    Since all of these archives open and work fine in 7-Zip, WinZip and WinRAR, I'm no longer worried that they are corrupted. So my question is: where can I get a Java library that handles ZIP archives as well as 7-Zip, WinZip and WinRAR do? If anyone can help, great. If this isn't the right place to ask, does anyone know a better place to post this kind of question?
  2. I have been doing some cursory reading about the TrueZIP project (https://truezip.dev.java.net). It's quite good at getting around some of the limitations of the standard Java zip API. I wonder if you could achieve your objective with it. In any case, it's quick and easy to use, so it shouldn't take long to try it and find out. Cheers,
  3. We've used TrueZIP for several years and, frankly, it's been something of a pain. At various times we've had memory, consistency, and reliability issues. From a usability perspective, TrueZIP is based on the idea of treating a zip file as a normal directory. This seems like a good idea, and had someone tasked me with writing a zip library in the past, I might have done exactly the same thing, but in practice it becomes a tremendous pain. TrueZIP uses its own File class, which extends java.io.File. This ends up causing far more problems than it solves. I would absolutely LOVE to find a usable, reliable zip library that is optimized for efficiency, especially with large zip files. I've seen one out there, but it was a Windows-only library (a Windows-only Java library? Really?). Finding such a library has become something of a snipe hunt for me. One day... One day...
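
For reference, the cataloging task described in the first post can be sketched with the standard java.util.zip.ZipFile class. This is only a minimal sketch, and it assumes a Java 7 or later runtime, where java.util.zip gained ZIP64 support. It is not something either reply proposed. The class name ZipCatalog and the tab-separated output format are made up for illustration. ZipFile takes its entry list from the archive's central directory, so listing names, sizes and CRCs does not decompress any file data.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists the entries of one ZIP archive.
public class ZipCatalog {

    /** Returns one tab-separated line per entry: name, size, compressed size, CRC-32. */
    static List<String> catalog(Path archive) throws IOException {
        List<String> lines = new ArrayList<>();
        // Opening the archive reads its central directory; entry data is not decompressed.
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                lines.add(String.format("%s\t%d\t%d\t%08x",
                        e.getName(), e.getSize(), e.getCompressedSize(), e.getCrc()));
            }
        }
        return lines;
    }

    // Usage: java ZipCatalog path/to/archive.zip
    public static void main(String[] args) throws IOException {
        for (String line : catalog(Paths.get(args[0]))) {
            System.out.println(line);
        }
    }
}
```

If entry names were written in a legacy code page rather than UTF-8, the ZipFile(File, Charset) constructor (also Java 7 and later) may be worth trying for the "invalid argument" case. Whether it resolves that particular error is a guess.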