
News: Compass Framework 0.8 + Lucene Jdbc Directory

  1. Compass Framework 0.8 + Lucene Jdbc Directory (17 messages)

    We are pleased to announce the 0.8.0 release of the Compass Framework, a Java search engine framework based on Lucene. A major feature of this release is the ability to store Lucene indexes in a database.

    Being able to store the Lucene indexes in a database has two parts:

    The first part is a complete Lucene Jdbc Directory implementation that is separate from any Compass-related code (it currently ships in the compass-core module and jar), so it can also be used in pure Lucene applications. The release supports several Dialects, covering all the major databases.

    The second part is the integration of the Jdbc Directory with the Compass Framework, which lets existing Compass code work as is; only configuration settings need to change. The integration includes several Data Source Providers (DBCP, c3p0, JNDI, and more) and special automatic performance optimizations.
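    One way to picture a database-backed Lucene Directory is as a store of named index files, each kept as a blob value in a table row instead of a file on disk. The sketch below is a hypothetical, stdlib-only model of that idea: the class and method names are made up and are not the Compass JdbcDirectory API, a TreeMap stands in for the database table, and mapping exactly one index file to one blob row is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model (not the Compass API): every index "file" is one row
 * (name, blob). The TreeMap stands in for a database table; a real JDBC-backed
 * directory would issue SQL against a BLOB column instead.
 */
public class BlobDirectorySketch {
    // Stand-in for a table with columns NAME (primary key) and VALUE (BLOB).
    private final Map<String, byte[]> table = new TreeMap<>();

    /** Like an INSERT/UPDATE of one row: the whole file content becomes a blob. */
    public void writeFile(String name, byte[] content) {
        table.put(name, content.clone());
    }

    /** Like a SELECT of one row's blob. */
    public byte[] readFile(String name) {
        byte[] blob = table.get(name);
        if (blob == null) {
            throw new IllegalArgumentException("No such index file: " + name);
        }
        return blob.clone();
    }

    public boolean fileExists(String name) {
        return table.containsKey(name);
    }

    public void deleteFile(String name) {
        table.remove(name);
    }

    /** Like SELECT NAME FROM the table: the files the index consists of, sorted by name. */
    public String[] listAll() {
        return table.keySet().toArray(new String[0]);
    }

    public long fileLength(String name) {
        return readFile(name).length;
    }

    public static void main(String[] args) {
        BlobDirectorySketch dir = new BlobDirectorySketch();
        // Example file names only; a real index decides its own file names.
        dir.writeFile("segments", "seg-metadata".getBytes(StandardCharsets.UTF_8));
        dir.writeFile("_0.cfs", new byte[] {1, 2, 3, 4});
        System.out.println(String.join(",", dir.listAll())); // _0.cfs,segments
        System.out.println(dir.fileLength("_0.cfs"));        // 4
    }
}
```

    Because the index is just a set of named byte streams, swapping the storage from disk to a table does not change how searches work, only where the bytes are read from.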

    Storing the Lucene index in a database makes creating search clusters a snap, since all the cluster nodes can access a centralized database. It may also ease the introduction of Lucene-based search into organizations that are wary of keeping the index on the file system (which is understandable, given the risk of a file system failure).

    Please see the Upgrade and Change log information.

    Threaded Messages (17)

  2. Why would you store Lucene indexes in a database? That is going to significantly degrade performance and give you almost no benefit (if you want central storage, just get a NAS RAID).

    One of the major beauties of Lucene is its performance. Don't kill it.
  3. Not trying to kill Lucene performance, just trying to add more options for integrating Lucene into applications. By the way, performance does not suffer that much. I have not run exhaustive tests, but it seems to be 10-50% slower, and sometimes not even that. Considering how fast Lucene is, that is not a big problem. Have a look at the implementation details; we tried to build in as many performance optimizations as possible, and it seems to perform well. So before you decide to "kill" the implementation, take it for a ride and see how it performs. Lucene was built on amazing concepts and algorithms, and they seem to work reasonably well with a database.

    Also, being able to create a centralized index store easily and keep the index inside a database is important for some organizations, more than you might think.
  4. I know.

    There was a time when I starved for something like what you have created, but then I kind of changed my mind and decided a disk-based index is just fine. Oh well...

    Anyway - Congratulations!
  5. Thanks mate. As we all know, sometimes the power of the people running the organization is stronger than "common sense". I would go with a RAID solution (Linux-based, since there are problems with centralized NTFS solutions), but this seems to be something a lot of people are asking for on both the Lucene and Compass forums/mailing lists. I hope it will satisfy their needs.
  6. Very true. Good luck.
  7. Since Hibernate 3.1 has direct Lucene support, does Compass really make sense?

    http://www.hibernate.org/hib_docs/annotations/reference/en/html/lucene.html
  8. If you look at the Hibernate source code and its Lucene support, you will see that the support is weak, to say the least. There are several problems with the Hibernate code:

    1. Performance will be very poor: an IndexWriter / IndexReader is opened for each object.
    2. No support for transactions. So you might get the index out of sync with the database.
    3. The ability to configure different Lucene options is very minimal.
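    Problems 1 and 2 both come down to when index changes are written. The sketch below is a hypothetical, stdlib-only illustration (not how Compass or Hibernate actually implement it; all names are made up): changes are buffered per transaction, a single "writer open" applies the whole batch on commit, and a rollback discards them so the index never gets ahead of the database. The "index" is a plain map from id to text, not a Lucene index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: index changes are buffered per transaction and applied
 * in one batch on commit, instead of opening a writer for every object.
 */
public class TransactionalIndexSketch {
    private final Map<String, String> index = new LinkedHashMap<>(); // stand-in for the index
    private int writerOpens = 0; // counts how often we "open a writer"

    public final class Transaction {
        private final List<String[]> pending = new ArrayList<>(); // {id, text}; text == null means delete
        private boolean done = false;

        public void save(String id, String text) {
            check();
            pending.add(new String[] {id, text});
        }

        public void delete(String id) {
            check();
            pending.add(new String[] {id, null});
        }

        /** One writer open for the whole batch, then every buffered change is applied in order. */
        public void commit() {
            check();
            writerOpens++;
            for (String[] op : pending) {
                if (op[1] == null) {
                    index.remove(op[0]);
                } else {
                    index.put(op[0], op[1]);
                }
            }
            done = true;
        }

        /** Nothing reaches the index, so it stays in step with a rolled-back database. */
        public void rollback() {
            check();
            pending.clear();
            done = true;
        }

        private void check() {
            if (done) {
                throw new IllegalStateException("transaction already finished");
            }
        }
    }

    public Transaction begin() {
        return new Transaction();
    }

    public int size() {
        return index.size();
    }

    public int writerOpens() {
        return writerOpens;
    }

    public static void main(String[] args) {
        TransactionalIndexSketch idx = new TransactionalIndexSketch();
        Transaction tx = idx.begin();
        for (int i = 0; i < 1000; i++) {
            tx.save("doc" + i, "text " + i);
        }
        tx.commit();
        Transaction failed = idx.begin();
        failed.save("doc-x", "never indexed");
        failed.rollback();
        System.out.println(idx.size() + " docs, " + idx.writerOpens() + " writer open(s)");
        // prints "1000 docs, 1 writer open(s)"
    }
}
```

    Opening a writer per object would have cost 1000 opens in the example above; batching on commit costs one, and tying the batch to the transaction is what keeps the index in sync.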

    Note that Compass is much more than Hibernate integration: it provides Object to Search Engine mapping support and transaction semantics, plays nicely with Spring and most ORM tools, and the list goes on. The latest feature is the ability to store the index in a database, which has nothing to do with Hibernate and could in principle be used with the Hibernate implementation as well (although, looking at the Hibernate code again, you cannot, since it gives you no option to configure the directory you are working with).
  9. <quote>Why would you store Lucene indexes in a database? That is going to significantly degrade performance and give you almost no benefit (if you want central storage, just get a NAS RAID).
    </quote>


    Actually, there are benefits to this approach, and I am interested in this kind of solution.

    In many enterprise applications, full-text search results need to be filtered by other metadata stored in the database: security-related permissions, user-defined metadata, etc.

    There are usually two ways to do the filtering: one is to read both sets of results into memory and filter them there; the other is to load the search results into the database, perform the join there, and return only the final result to memory and eventually to the end user.
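    For comparison, the first (in-memory) approach might look like the hypothetical sketch below. It assumes the search has already returned document ids in relevance order and that the permitted ids were loaded separately from the database; the method name and the limit parameter are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of in-memory filtering: keep the ranked hit list from the
 * full-text search and drop every hit the permission data does not allow.
 */
public class InMemoryFilterSketch {
    /**
     * Preserves the search ranking order; each hit costs one hash-set lookup.
     * A limit <= 0 means "no limit".
     */
    public static List<Long> filterHits(List<Long> rankedHitIds, Set<Long> permittedIds, int limit) {
        List<Long> result = new ArrayList<>();
        for (Long id : rankedHitIds) {
            if (permittedIds.contains(id)) {
                result.add(id);
                if (result.size() == limit) {
                    break; // enough permitted hits for one page
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> hits = List.of(42L, 7L, 19L, 3L, 88L); // ids in relevance order
        Set<Long> allowed = Set.of(3L, 19L, 42L);         // e.g. from a permissions query
        System.out.println(filterHits(hits, allowed, 2)); // [42, 19]
    }
}
```

    The drawback is that both sets have to fit in memory; pushing the join into the database, as the second approach does, avoids that.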

    Storing the index in the database could potentially make the second approach easier and help overall search performance (I hope :-)

    I am interested to see how this is implemented.

    Chester
  10. You won't be able to do it with the implementation provided, since the index is stored as a Blob in the database. Even if it was not, and it was stored in a different way, with the way an inverted index works I don't think that you will be able to do what you are describing.
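    To see why a relational join does not fit, it helps to look at the general shape of an inverted index: each term points to the list of documents containing it, rather than each document being a row with its own attributes. The stdlib-only sketch below is a simplified, hypothetical model of that shape, not Lucene's on-disk format (which the Jdbc Directory stores as opaque blobs anyway).

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Minimal inverted index: each term maps to the sorted set of document ids
 * (its postings list) that contain it.
 */
public class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    /** Naive tokenizer: lower-case, split on runs of non-word characters. */
    public void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue; // a leading separator yields an empty first token
            }
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Read-only view of the documents containing the term (empty if none). */
    public Set<Integer> docsFor(String term) {
        return Collections.unmodifiableSet(postings.getOrDefault(term.toLowerCase(), new TreeSet<>()));
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.add(1, "Lucene index in a database");
        idx.add(2, "Lucene index on disk");
        System.out.println(idx.docsFor("lucene"));   // [1, 2]
        System.out.println(idx.docsFor("database")); // [1]
    }
}
```

    Per-document metadata such as permissions would have to be joined against the document ids a query produces, not against the stored index itself.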
  11. Sorry, that was a somewhat unqualified statement. My point is that most people are using Hibernate anyway, and instead of using (and maintaining) two frameworks, I would go with just Hibernate and its Lucene support.
  12. 

    I have already stated the problems with the Hibernate implementation, so bear them in mind; if you decide to go with it, it's your choice.

    Also, remember that performing searches is the other part of the equation. With Hibernate you will need to delve into the Lucene APIs, which are not that bad, but for best performance you will need to handle caching, invalidation, and more, all of which Compass does automatically.
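    As a rough idea of what "caching and invalidation" on the search side involves, the hypothetical sketch below reuses an expensive-to-open searcher until the index version changes and only then reopens it. It is not the Compass implementation: the generic type S stands in for something like a Lucene IndexSearcher, and the version counter and opener function are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongFunction;

/**
 * Hypothetical sketch: reuse a searcher until the index version changes, then
 * reopen it. A real implementation would also close the old searcher once no
 * running search still uses it; that part is omitted here.
 */
public class VersionedSearcherCache<S> {
    private final AtomicLong indexVersion = new AtomicLong(); // bumped on every index commit
    private final LongFunction<S> opener;                     // opens a searcher for a given version
    private S cached;
    private long cachedVersion = -1;
    private int opens = 0;

    public VersionedSearcherCache(LongFunction<S> opener) {
        this.opener = opener;
    }

    /** Called by the writing side after it commits changes to the index. */
    public void markIndexChanged() {
        indexVersion.incrementAndGet();
    }

    /** Returns the cached searcher, reopening only if the index has moved on. */
    public synchronized S searcher() {
        long current = indexVersion.get();
        if (cached == null || cachedVersion != current) {
            cached = opener.apply(current);
            cachedVersion = current;
            opens++;
        }
        return cached;
    }

    public synchronized int opens() {
        return opens;
    }

    public static void main(String[] args) {
        VersionedSearcherCache<String> cache = new VersionedSearcherCache<>(v -> "searcher@v" + v);
        cache.searcher();
        cache.searcher(); // served from the cache, no reopen
        cache.markIndexChanged();
        System.out.println(cache.searcher() + " after " + cache.opens() + " opens");
        // prints "searcher@v1 after 2 opens"
    }
}
```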
  13. Sorry, that's too bad.
  14. <quote>You won't be able to do it with the implementation provided, since the index is stored as a Blob in the database. Even if it was not, and it was stored in a different way, with the way an inverted index works I don't think that you will be able to do what you are describing.
    </quote>


    There is no support for BLOBs in Sybase... we had to write a wrapper when we used Sybase with Hibernate.
  15. I also must recommend this library, and the outstanding support Shay gives on the forums, helping beginners and adding requested features.

    My only comment is that it's a very powerful framework, and since most users use it from a webapp perspective, I would recommend a demo that shows more of its features, such as the types of queries that can be used. I know that your time, Shay, is probably spent on development, but remember that the easier it is to use, the more people will use it, which means the better it will get.

    thanks,
    Brian
  16. There is already an example that takes the Spring Framework PetClinic sample and shows how it can be integrated with Compass.

    Can you please start a thread on the Compass developer forum explaining the demo you think would serve best? We can discuss the options there. Thanks for the input!
  17. I've been implementing full-text search using Compass and I must say it's a great library! I didn't really want to have to learn the Lucene internals for batching up index writes, etc. and managing all that stuff... Plus mapping my domain objects to indexes, and managing automatic updates to the indexes when the ORM saves objects, managing transactions, etc.

    Fortunately, Compass handles all of this for me! I've been able to get the basic stuff working and now I'm working out some relatively complex security and filtering issues, and Shay's been great help on the forums.

    I highly recommend Compass for projects which need to add full-text searching on top of an ORM infrastructure (and probably for other cases too, but that's what I'm doing).
  18. How feasible would it be to store a very large index as a blob in a database? I need to create a Lucene index of a DB table that has 30 million records and is growing by 200K records every day. Would it be feasible to store such a big index as a blob in the DB, and how would it impact Lucene search performance?
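    To put the numbers in this question in perspective, a simple linear projection of the record count is sketched below. It uses only the figures given (30 million records, 200K new records per day) and assumes growth stays linear; it deliberately does not estimate index size in bytes, since that depends on the average amount of indexed text per record, which the question does not state.

```java
/**
 * Back-of-the-envelope projection of the record count from the question,
 * assuming linear growth of 200K records per day from 30 million.
 */
public class IndexGrowthSketch {
    static final long START_RECORDS = 30_000_000L;
    static final long RECORDS_PER_DAY = 200_000L;

    /** Record count after the given number of days. */
    static long recordsAfterDays(long days) {
        return START_RECORDS + RECORDS_PER_DAY * days;
    }

    /** Days until the record count reaches the target (rounded up; 0 if already reached). */
    static long daysUntil(long targetRecords) {
        long missing = Math.max(0, targetRecords - START_RECORDS);
        return (missing + RECORDS_PER_DAY - 1) / RECORDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println(recordsAfterDays(365));        // 103000000
        System.out.println(daysUntil(2 * START_RECORDS)); // 150
    }
}
```

    In other words, at this rate the table doubles in about five months and passes 100 million records within a year, so whatever storage is chosen has to cope with an index several times its current size.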