Discussions

News: Java Bayesian Classifier ci-bayes 1.0 released

  1. ci-bayes, a project hosted on java.net, has released its first stable version. ci-bayes uses a classifier to determine which classification a given object is likely to fall into, given prior training. It provides multiple classifiers, hooks for persistence, and results for multiple classifications for each object tested.

     ci-bayes is based on the chapter on Bayesian classification from Toby Segaran's "Programming Collective Intelligence," and has been ported from the original Python with the explicit permission of the author. It is built with Maven 2 and has an explicit runtime dependency on javolution; it also provides factories for use with Spring 2, but those aren't required at runtime in the simplest case.

     A simple example of how the classifier works might look like this:

         FisherClassifier fc = new FisherClassifierImpl();
         fc.train("The quick brown fox jumps over the lazy dog's tail", "good");
         fc.train("Make money fast!", "bad");
         String classification = fc.getClassification("money"); // should be "bad"

     Currently, ci-bayes uses the SpamAssassin testing corpora for performance and accuracy testing. The methodology is fairly simple: it first trains on seven of the ten corpora, following the SpamAssassin conventions, then classifies the remaining three corpora and checks whether each result matches what SpamAssassin generated. It's able to run the classification tests in just over eleven seconds on a single CPU core, with a 98% match with SpamAssassin; given that SpamAssassin and ci-bayes have different classification mechanisms and different functions, this is probably acceptable for most usages. (SpamAssassin uses a neural network to analyze spam; it's not a strict Bayesian classifier, so a 98% match rate is, in my opinion, a marvelous result.)

     The binary jar for ci-bayes-1.0-SNAPSHOT is available on java.net.
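
     The train-on-seven, test-on-three methodology described above can be sketched roughly as follows. This is not ci-bayes code: `ToyBayes` is a hypothetical stand-in (a small naive Bayes word counter with add-one smoothing, not the Fisher method) that only borrows the `train` and `getClassification` method names from the example in the post, and the sample messages are invented for illustration. The split here is over individual messages rather than whole corpora.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HoldoutSketch {

    /** Hypothetical toy classifier; an assumption for illustration, not the ci-bayes implementation. */
    static class ToyBayes {
        private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
        private final Map<String, Integer> docCounts = new HashMap<>();
        private final Set<String> vocabulary = new HashSet<>();
        private int totalDocs = 0;

        void train(String text, String category) {
            docCounts.merge(category, 1, Integer::sum);
            totalDocs++;
            Map<String, Integer> counts = wordCounts.computeIfAbsent(category, k -> new HashMap<>());
            for (String word : tokenize(text)) {
                counts.merge(word, 1, Integer::sum);
                vocabulary.add(word);
            }
        }

        String getClassification(String text) {
            String best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (String category : docCounts.keySet()) {
                Map<String, Integer> counts = wordCounts.get(category);
                int totalWords = 0;
                for (int c : counts.values()) {
                    totalWords += c;
                }
                // Log prior plus log likelihood of each word, with add-one smoothing.
                double score = Math.log(docCounts.get(category) / (double) totalDocs);
                for (String word : tokenize(text)) {
                    score += Math.log((counts.getOrDefault(word, 0) + 1.0)
                            / (totalWords + vocabulary.size()));
                }
                if (score > bestScore) {
                    bestScore = score;
                    best = category;
                }
            }
            return best; // null if nothing has been trained yet
        }
    }

    /** Lower-cases the text and splits it on anything that is not a letter or apostrophe. */
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z']+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    /**
     * Trains on the first trainCount items ({text, label} pairs), then returns the
     * fraction of the remaining items whose predicted label matches the given label.
     */
    static double evaluate(List<String[]> labeled, int trainCount) {
        ToyBayes classifier = new ToyBayes();
        for (String[] item : labeled.subList(0, trainCount)) {
            classifier.train(item[0], item[1]);
        }
        List<String[]> heldOut = labeled.subList(trainCount, labeled.size());
        int matches = 0;
        for (String[] item : heldOut) {
            if (item[1].equals(classifier.getClassification(item[0]))) {
                matches++;
            }
        }
        return heldOut.isEmpty() ? 0.0 : matches / (double) heldOut.size();
    }

    public static void main(String[] args) {
        List<String[]> data = new ArrayList<>();
        data.add(new String[] {"cheap money fast", "bad"});
        data.add(new String[] {"meeting notes attached", "good"});
        data.add(new String[] {"win money now", "bad"});
        data.add(new String[] {"lunch meeting tomorrow", "good"});
        data.add(new String[] {"fast cash offer", "bad"});
        data.add(new String[] {"project notes review", "good"});
        data.add(new String[] {"money offer inside", "bad"});
        data.add(new String[] {"money fast", "bad"});
        data.add(new String[] {"meeting tomorrow", "good"});
        data.add(new String[] {"cash offer now", "bad"});
        // Train on the first seven items, test on the remaining three.
        System.out.println("Match rate: " + evaluate(data, 7));
    }
}
```

     Keeping the held-out items out of training is the point of the split: a match rate measured on messages the classifier has already seen would overstate how well it generalizes.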

    Threaded Messages (4)

  2. This is something that I've sought in Java for a while. I've thought about writing my own, but somehow never got around to it (though Reverend runs reasonably well under Jython, just slowly). Thanks!
  3. WEKA

    There is a very good 100% Java GNU data mining package out in academia land that comes with a whole ton of classifiers. It's called WEKA, and you can find it at http://www.cs.waikato.ac.nz/~ml/weka/index.html.
    Granted, WEKA does not use javolution (so it probably performs rather poorly in comparison with ci-bayes) and does not come with any hooks into J2EE frameworks, but it is still an excellent package that we've previously managed to use in a web app.
  4. This looks quite interesting; I downloaded the code from SVN. Only one thing: documentation? At least some javadocs, please? :) Without having read up on the backing material, I'm struggling to follow what some of the methods (or their parameters) do without some explanation.
  5. Source code?

    I noticed there is no source code available. Could you provide this? I was hoping to use Hibernate annotations to persist the FeatureMap and ClassifierMap.