Hibernate Search 3.0 available: provides full-text search


  1. Hibernate Search 3.0, which brings full-text search capabilities to Hibernate-based applications, has been released. With Hibernate Search, developers can easily take advantage of advanced Google-like search features, unattainable in relational databases, without the need for extra infrastructure coding. Hibernate Search integrates transparently with Hibernate, the object/relational (O/R) mapping and persistence engine, with little to no configuration (beyond specifying which entities to index). With advanced features such as query filters and index sharding, Hibernate Search can be embedded into user applications. Key features in Hibernate Search 3.0 include:
    • Transparent index synchronization: This feature eliminates the need to manually update the index on data change. Events generated by Hibernate Core will trigger the update transparently for the application.
    • Seamless integration with the Hibernate and Java Persistence query model: Hibernate Search embraces both the Hibernate and Java Persistence semantics and APIs. As a result, switching from a Hibernate Query Language (HQL) query to a full-text query requires minimal changes to the application.
    • Out-of-the-box asynchronous clustering mode: Handles clustered applications; this out-of-the-box mode also gracefully handles indexing load peaks, avoiding any contention on the online system.
    • Product extensibility: Developers can extend Hibernate Search with a series of extension points for deep index interaction customization that helps edge case applications meet their performance and architectural requirements and constraints.

    Threaded Messages (34)

  2. Is there a means to retroactively index already existing data or any data, that is not inserted or updated via hibernate in an efficient manner?
  3. Is there a means to retroactively index already existing data or any data, that is not inserted or updated via hibernate in an efficient manner?
    Very good question indeed. Any answer on the subject would be most welcome. Kind regards, Cédric
  4. Is there a means to retroactively index already existing data or any data, that is not inserted or updated via hibernate in an efficient manner?
    Absolutely. In some cases, the event system does not help and there is a need to trigger manual indexing:
    - for the initial index initialization
    - if some sort of corruption occurs
    - if some third-party app updates the database behind Hibernate's back

    Hibernate Search provides an index(object) operation as well as a purge one. You can find more information in the specific documentation section, especially on how to make this operation efficient :)
  5. This is really cool. I am already thinking about new features to our application that will be made possible by Hibernate Search. Sure, we have to finish our migration to Hibernate first, but this only confirms that the long term investment on Hibernate is well placed. Congrats to the Hibernate team for this release! Paul Casal Sr Developer jbilling.com The Enterprise Open Source Billing System
  6. Out-of-the-box asynchronous clustering mode: Handles clustered applications; this out-of-the-box mode also gracefully handles indexing load peaks, avoiding any contention on the online system.
    Does this really mean that the update of the index can be configured to run asynchronously ? Using JMS for example ? And all nodes in the cluster have their index updated ? If confirmed, that's a kick ass feature !! Kind regards, Cédric
  7. Does this really mean that the update of the index can be configured to run asynchronously ? Using JMS for example ? And all nodes in the cluster have their index updated ?

    If confirmed, that's a kick ass feature !!

    Kind regards,

    Cédric
    Yes, both asynchronously and remotely (to avoid affecting the front end box). Actually JMS is the default communication protocol as I wanted to stay as close as possible to the standards. It is also possible to wire different communication protocols (writing a custom backend impl) if people feel the need for it. You can get some detailed info on this particular architecture here: http://www.hibernate.org/hib_docs/search/reference/en/html/search-architecture.html#d0e422
  8. Does this really mean that the update of the index can be configured to run asynchronously ? Using JMS for example ? And all nodes in the cluster have their index updated ?

    If confirmed, that's a kick ass feature !!

    Kind regards,

    Cédric


    Yes, both asynchronously and remotely (to avoid affecting the front end box). Actually JMS is the default communication protocol as I wanted to stay as close as possible to the standards. It is also possible to wire different communication protocols (writing a custom backend impl) if people feel the need for it.
    You can get some detailed info on this particular architecture here: http://www.hibernate.org/hib_docs/search/reference/en/html/search-architecture.html#d0e422
    I've had a look at the possible deployment scenarios, it's absolutely awesome. Congratulations for this release !
  9. FYI

    Emmanuel's announcement is here: http://in.relation.to/Bloggers/FullTextSearchForHibernateGoesFinal
  10. Generic Amazon-like Searches?

    Amazon has the capability where you can type in anything into a text box and search all attributes of an entity for it. So for example, typing in "Riley" will return books by authors with first name Riley, last name Riley, or Riley Publishing. Does Hibernate Search provide that type of capability? How would you provide this capability with a Books table with author first name, author last name, and publisher fields (to name a few)? I am unfamiliar with Lucene, and I could use that Amazon-like functionality in the app I am developing. Thanks.
  11. Re: Generic Amazon-like Searches?

    Amazon has the capability where you can type in anything into a text box and search all attributes of an entity for it. So for example, typing in "Riley" will return books by authors with first name Riley, last name Riley, or Riley Publishing.

    Does Hibernate Search provide that type of capability? How would you provide this capability with a Books table with author first name, author last name, and publisher fields (to name a few)?

    I am unfamiliar with Lucene, and I could use that Amazon-like functionality in the app I am developing.

    Thanks.
    Yes, it is possible. Some people use a global field into which they throw all the data, but I don't like that much: it's fairly inflexible from a use case / property boosting point of view. You can nevertheless do that through what is called a @ClassBridge in Hibernate Search. What I like to do instead is use the Lucene MultiFieldQueryParser and list the relevant fields and their respective weights. Something along these lines:

        import java.util.HashMap;
        import java.util.Map;
        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.queryParser.MultiFieldQueryParser;
        import org.apache.lucene.queryParser.QueryParser;

        // Per-field weights; Lucene reads the boosts as Float values, hence 1f / 2f.
        Map<String, Float> boostPerField = new HashMap<String, Float>();
        boostPerField.put("author.firstname", 1f);
        boostPerField.put("author.lastname", 2f);
        boostPerField.put("publisher", 2f);
        String[] productFields = {"author.firstname", "author.lastname", "publisher"};
        QueryParser parser = new MultiFieldQueryParser(productFields, new StandardAnalyzer(), boostPerField);
        ...

    The query will then return elements where either the author's first name, the author's last name or the publisher name contains Riley. It will also list John Riley above Riley Garfield because the last name has more weight. If you want an example of that, you can download JBoss Seam 2.0: the DVDStore example uses Hibernate Search and this kind of one-box search design.
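    To see why the weights give that ordering, here is a toy sketch in plain Java. It is not Lucene's scoring formula (which also takes term frequency, field length and more into account); it only sums the boost of every field that contains the term, using the same weights as the snippet in this reply. The class name, method names and sample books are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy illustration of per-field boosting; hypothetical names, not Lucene's real scoring. */
public class OneBoxSearchSketch {

    /** Sums the boost of every field whose value contains the term (case-insensitive). */
    public static float score(Map<String, String> document,
                              Map<String, Float> boostPerField, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        float total = 0f;
        for (Map.Entry<String, Float> boost : boostPerField.entrySet()) {
            String value = document.get(boost.getKey());
            if (value != null && value.toLowerCase(Locale.ROOT).contains(needle)) {
                total += boost.getValue();
            }
        }
        return total;
    }

    /** Returns the documents that match in at least one field, highest score first. */
    public static List<Map<String, String>> search(List<Map<String, String>> documents,
                                                   Map<String, Float> boostPerField, String term) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> document : documents) {
            if (score(document, boostPerField, term) > 0f) {
                hits.add(document);
            }
        }
        hits.sort(Comparator.comparingDouble(
                (Map<String, String> d) -> score(d, boostPerField, term)).reversed());
        return hits;
    }

    /** Builds a flat "document" with the three fields used in the thread. */
    public static Map<String, String> book(String first, String last, String publisher) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("author.firstname", first);
        doc.put("author.lastname", last);
        doc.put("publisher", publisher);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("author.firstname", 1f);
        boosts.put("author.lastname", 2f);
        boosts.put("publisher", 2f);

        List<Map<String, String>> books = new ArrayList<>();
        books.add(book("Riley", "Garfield", "Acme Press"));
        books.add(book("John", "Riley", "Acme Press"));
        books.add(book("Jane", "Doe", "Acme Press"));

        // John Riley (last-name match, weight 2) is listed before Riley Garfield (first-name match, weight 1).
        for (Map<String, String> hit : search(books, boosts, "Riley")) {
            System.out.println(hit.get("author.firstname") + " " + hit.get("author.lastname"));
        }
    }
}
```

    With a real MultiFieldQueryParser the ranking comes from Lucene itself; the sketch only shows the effect of giving one field a larger weight than another.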
  12. Oh by the way, JBoss Seam 2.0 has a nice natural integration with Hibernate Search :)
  13. Re: Generic Amazon-like Searches?

    Oh by the way, JBoss Seam 2.0 has a nice natural integration with Hibernate Search :)
    Imagine that. ;)
  14. Seam and Hibernate Search

    Oh by the way, JBoss Seam 2.0 has a nice natural integration with Hibernate Search :)
    ORLY ? (Omg that was so childish, I am so sorry for that, but I could not hold back) ;)
  15. How is this different from Compass?

    My understanding is that Compass offers some similar kind of functionality for JPA, and I was curious how this differs.
  16. The best answer is to give both a shot to feel the difference. But let's try to outline some differences:

    The out-of-the-box asynchronous clustering mode is unique to Hibernate Search.

    Hibernate Search queries return managed objects by default. These are the same objects attached to the session that an HQL query would return (following the lazy association mapping, etc.). The persistence context (session) then guarantees object uniqueness: two objects having the same id are guaranteed to be == regardless of where they come from (HQL query or full-text query). It also means that you can change the object and get it updated in the database. This concept comes in handy when you try to enhance a Hibernate-based application with minimal changes. It is also very interesting when dealing with conversations. The cost is that the data is potentially loaded from the database (if the first- and second-level caches miss), but I have yet to be convinced that it has any significant impact on real applications. If, for some reason, there is an impact, you can use the projection API to retrieve the property data from the index, provided that the data is stored in the Lucene index. Of course, your index will be bigger and you will not benefit from what I have just described.

    Hibernate Search initialization is integrated into the SessionFactory or EntityManagerFactory initialization. There is no additional step and the configuration is minimal (one property in your hibernate.cfg.xml or persistence.xml file).

    The Query API is the org.hibernate.Query (or javax.persistence.Query) API, so once again, moving from a JPA-QL query to a full-text query is limited to how you create the query object.

    My favorite feature: query filters. Hibernate Search leverages the Lucene filter feature but adds automatic caching, and makes filters easy to enable / disable and to parameterize on a per-query basis. The scope of this feature is roughly the same as Hibernate Core filters: security, category filtering, temporal data, versioned data, you name it.

    HTH, but once again, try for yourself and see which programmatic model fits you better.
  17. Hibernate continues to impress

    I went to the talk that Emmanuel gave at the AJUG last week. He's done an impressive job of integrating Lucene into Hibernate. Very natural and intuitive. I love the way it uses annotations and fits right into the JPA style. I've always felt Hibernate is the best part of the JBoss product family.
  18. Nice to have but not exactly a technological marvel. Good job though, integrating Lucene with any Hibernate implementation is fairly repetitive so it's nice they've gone through the pain for us.
  19. The Flux Capacitor and the Flowbee were technological marvels. I think whether Hibernate Search is a technological marvel or "really just a Lucene wrapper" is irrelevant. What matters is the extent to which the time to solve real-world problems is reduced. I will need to study the documentation to learn the details of what Emmanuel suggested in his reply to my earlier question, but it looks like Hibernate Search could be a real time-saver. Marvel or not. Besides, I think it takes real creativity and efficiency to take an existing product, utilize all its best features, and add some novel capabilities to produce a cool new product rather than re-invent the wheel for no good reason other than academic curiosity. If I want Marvel, I will stick with Spider-Man or the Fantastic Four.
  20. Major DB vendors provide full-text search (FTS) support. One possible option for Hibernate would be to wrap native FTS and provide "generic" query language as it's done for SQL. In case of Lucene usage, you have to care about your DB and Lucene indexes independently, backup/restore and other data management processes become more complex, non-hibernate based applications need to trigger indexing etc.
  21. Major DB vendors provide full-text search (FTS) support. One possible option for Hibernate would be to wrap native FTS and provide "generic" query language as it's done for SQL. In case of Lucene usage, you have to care about your DB and Lucene indexes independently, backup/restore and other data management processes become more complex, non-hibernate based applications need to trigger indexing etc.
    I believe (based on what I have seen for SQL Server) that this is so database specific that it would be pretty difficult, if not more so. And honestly, the Hibernate Search (and Compass) solution is much better. It is much more than just a full-text index. I am dealing with a 3rd party application that is currently stuck on 2000 because they subscribed to the "use the database features to the fullest" mindset.
  22. We implemented such a feature (a "generic" FTS query language) for our application, which works on MSSQL and Oracle (with some limitations). It looks like it's doable. It also solves the issue with transactions mentioned by Greg. A DB usually also provides other useful features, like text extraction from different file formats.
  23. We implemented such a feature (a "generic" FTS query language) for our application, which works on MSSQL and Oracle (with some limitations). It looks like it's doable. It also solves the issue with transactions mentioned by Greg. A DB usually also provides other useful features, like text extraction from different file formats.
    Ok, so it is "doable". How much effort is it for 2 databases? 3 databases? 4? What things can you not do? What things are still database specific outside your generic query? Can you use the index without the database? Can it easily be "scaled"? Again, I think Hibernate Search is a much more flexible solution than just full text search. So, if it could work with the db and still have the option of Lucene, then that would be great. But if not, no biggy. Yes, the db can extract text and other features. But you typically are sacrificing flexibility (on many levels) for convenience. And if you must grow/scale non-database logic then you must have more database engines running.
  24. Major DB vendors provide full-text search (FTS) support. One possible option for Hibernate would be to wrap native FTS and provide "generic" query language as it's done for SQL. In case of Lucene usage, you have to care about your DB and Lucene indexes independently, backup/restore and other data management processes become more complex, non-hibernate based applications need to trigger indexing etc.
    This is indeed one possibility, but from what I have seen, you lack a lot of flexibility with DB integrated FullText engines as opposed to Lucene. Especially on how you index your data and what you index in your data. Flexibility in indexing and searching is key to make a good search engine (eg. tune it for your website, or your business need).
  25. ------------------------------------------------------------------------------------------
    Absolutely. In some cases, the event system does not help and there is a need to trigger manual indexing:
    - for the initial index initialization
    - if some sort of corruption occurs
    - if some third-party app updates the database behind Hibernate's back
    Hibernate Search provides an index(object) operation as well as a purge one. You can find more information in the specific documentation section, especially on how to make this operation efficient :)
    ------------------------------------------------------------------------------------------

    Firstly, thanks very much to Emmanuel Bernard and the entire Hibernate team for this framework. I have worked on projects where we sorely missed a framework like this.

    Now for a question. There may be several reasons for changing the index of an object:

    1) The object is updated in the same JVM.
    Answer: This framework will take care of changing the index.
    2) The object is updated in a different JVM.
    Answer: As I understand it, the JMS solution in this framework works with different VMs running Hibernate.
    3) Index corruption
    4) If some third-party app, update the database behind Hibernate's back
    Answer: As for the other issues 3 and 4 mentioned above, the solution is to do manual indexing as defined in http://www.hibernate.org/hib_docs/search/reference/en/html/search-batchindex.html.

    Is there a generic way to detect index corruption or data change outside of hibernate ?

    Regards
    Suchak Jani
  26. 3) Index corruption
    No there is no built-in solution to discover a corrupted index. If someone has an idea, speak up :)
    4) If some third-party app, update the database behind Hibernate's back
    Answer: As for the other issues 3 and 4 mentioned above, the solution is to do manual indexing as defined in http://www.hibernate.org/hib_docs/search/reference/en/html/search-batchindex.html.

    Is there a generic way to detect index corruption or data change outside of hibernate ?
    As for 4, not yet but this is something we want to tackle for the next round.
  27. 3) Index corruption

    No there is no built-in solution to discover a corrupted index. If someone has an idea, speak up :)

    4) If some third-party app, update the database behind Hibernate's back
    Answer: As for the other issues 3 and 4 mentioned above, the solution is to do manual indexing as defined in http://www.hibernate.org/hib_docs/search/reference/en/html/search-batchindex.html.

    Is there a generic way to detect index corruption or data change outside of hibernate ?


    As for 4, not yet but this is something we want to tackle for the next round.
    Emmanuel,

    Thank you for taking the time to answer. And once more, let me take this opportunity to thank you and the Hibernate team for your efforts and hard work. Working with Hibernate was always a pleasure, and now with frameworks like this, JPA integration, etc., time spent learning Hibernate is surely well invested.

    As for a corrupted index, all I can think of is a few pluggable and configurable rules per persistent object that could help in cases where the index covers more fields than a non-composite primary key. For example, before an update or a select on a certain persistent object, the code could check the index against the DB. Or there could be a timed index check and refresh against the DB for certain objects.

    Regards
    Suchak Jani
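    A timed check of this kind could start from something as simple as comparing identifiers. The sketch below is plain Java with hypothetical names; how the two id sets are obtained (a query on the table, a walk over the index) is deliberately left out, and it only catches missing or orphaned entries, not rows whose content was changed behind Hibernate's back.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Toy drift check between database ids and indexed ids; all names are hypothetical. */
public class IndexDriftCheck {

    /** Ids present in the database but absent from the index: candidates for (re)indexing. */
    public static Set<Integer> missingFromIndex(Set<Integer> databaseIds, Set<Integer> indexedIds) {
        Set<Integer> missing = new TreeSet<>(databaseIds);
        missing.removeAll(indexedIds);
        return missing;
    }

    /** Ids present in the index but gone from the database: candidates for purging. */
    public static Set<Integer> orphanedInIndex(Set<Integer> databaseIds, Set<Integer> indexedIds) {
        Set<Integer> orphaned = new TreeSet<>(indexedIds);
        orphaned.removeAll(databaseIds);
        return orphaned;
    }

    public static void main(String[] args) {
        Set<Integer> databaseIds = new TreeSet<>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> indexedIds = new TreeSet<>(Arrays.asList(2, 3, 5));
        System.out.println("to index: " + missingFromIndex(databaseIds, indexedIds)); // [1, 4]
        System.out.println("to purge: " + orphanedInIndex(databaseIds, indexedIds));  // [5]
    }
}
```

    The two result sets map onto the index and purge operations mentioned earlier in the thread; detecting changed content would need something extra, such as a version or timestamp column, which is an assumption beyond what is discussed here.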
  28. Transactions ?

    Does the update to the search index automatically integrate with the database transaction? e.g. on a rollback the changes to the index get rolled back as well as the DB changes. I could see anything on the docs, but that would be a really useful feature! Greg.
  29. Re: Transactions ?

    I meant: I could not see any information on transactions in the online documentation. Greg.
  30. Re: Transactions ?

    Does the update to the search index automatically integrate with the database transaction? e.g. on a rollback the changes to the index get rolled back as well as the DB changes.

    I could see anything on the docs, but that would be a really useful feature!

    Greg.
    Yes, in the sense that the index will not be updated until the transaction is committed. So when you roll back, no change is propagated. This is the behavior you are looking for.

    No, in the sense that Hibernate Search does not make Lucene an XAResource and does not participate in the 2PC protocol. My reasoning for not going down that path is that an index is not valuable data:
    - it can always be rebuilt from the original data (the database)
    - if for some reason the indexing fails, I still want to get my data committed in the DB

    Hibernate Search considers Lucene more as an index than as a data store (even if you can store your data in the Lucene indexes). I thought I described that in the architecture section of the doc, but I need to make it more explicit :)
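    The "yes" half of that answer can be pictured with a small plain-Java sketch: index changes are queued while the transaction runs, applied only once it commits, and dropped on rollback. This is a toy model with hypothetical names (IndexWorkQueue, an in-memory map standing in for the Lucene index), not Hibernate Search's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model: index work is buffered per transaction and applied only on commit. */
public class IndexWorkQueue {

    private final Map<Integer, String> index = new HashMap<>(); // stands in for the Lucene index
    private final List<Runnable> pending = new ArrayList<>();   // work queued by the current transaction

    /** Queues an add/update; nothing touches the index yet. */
    public void addOrUpdate(int id, String text) {
        pending.add(() -> index.put(id, text));
    }

    /** Queues a removal; nothing touches the index yet. */
    public void remove(int id) {
        pending.add(() -> index.remove(id));
    }

    /** Called once the database commit has succeeded: apply the queued work in order. */
    public void afterCommit() {
        for (Runnable work : pending) {
            work.run();
        }
        pending.clear();
    }

    /** Called on rollback: the queued work is simply discarded. */
    public void afterRollback() {
        pending.clear();
    }

    public boolean isIndexed(int id) {
        return index.containsKey(id);
    }

    public static void main(String[] args) {
        IndexWorkQueue queue = new IndexWorkQueue();

        queue.addOrUpdate(1, "first document");
        queue.afterRollback();                   // transaction rolled back
        System.out.println(queue.isIndexed(1));  // false

        queue.addOrUpdate(2, "second document");
        System.out.println(queue.isIndexed(2));  // false: still pending
        queue.afterCommit();                     // transaction committed
        System.out.println(queue.isIndexed(2));  // true
    }
}
```

    Because the index is written after the commit rather than as part of it, a failure inside afterCommit would leave the database committed and the index stale, which is exactly the trade-off described in this reply.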
  31. Re: Transactions ?

    Emmanuel, Thanks for the reply. Whilst I can see your point about the index not being valuable data, this does cause a problem if the index update fails. If the data is missing from the index, and can not be found by a search, it is essentially lost to the user, even though it exists in the database. And rebuilding the index takes time when dealing with a large volume of data, which affects system availability. The option to include the index update in a 2PC transaction would be really useful, from a resilience perspective. Greg.
  32. Re: Transactions ?

    Emmanuel,

    Thanks for the reply.

    Whilst I can see your point about the index not being valuable data, this does cause a problem if the index update fails. If the data is missing from the index, and can not be found by a search, it is essentially lost to the user, even though it exists in the database. And rebuilding the index takes time when dealing with a large volume of data, which affects system availability.

    The option to include the index update in a 2PC transaction would be really useful, from a resilience perspective.

    Greg.
    I am more inclined to use a compensation/recovery mechanism. Note that the JMS mode naturally does that: if the message is not consumed properly, it goes back to the queue for later processing (depending on your JMS provider configuration).

    I also want to point out that SQL queries should not be systematically replaced by full-text queries. It makes sense to switch when SQL does not support the search feature you want or would do it very inefficiently.
  33. Question regarding FieldBridge

    Hi Emmanuel,

    I am currently using Hibernate Search for a project here at work, and came across an issue yesterday that I could not figure out... at least not from the documentation. I have the following classes:

        @MappedSuperclass
        public abstract class BaseEntity ... {
            @Id
            protected Integer id;

            @DocumentId
            public Integer getId() { return id; }
        }

        @Entity
        @Indexed(index = "Product")
        public abstract class Product extends BaseEntity ... {
            @ManyToOne
            @JoinColumn(name = "STUDIO_ID", nullable = false)
            private Studio studio;

            @IndexedEmbedded
            public Studio getStudio() { return studio; }
        }

        @Entity
        @Indexed
        public class Studio extends BaseEntity ... {
            ...
        }

    From your hibernate_search.pdf documentation, I would expect the Product index to contain the field studio.id, since the Studio class is also a Hibernate entity. This compiles fine; however, when I run this under Tomcat, I get the following stack at startup:

        org.hibernate.search.SearchException: Unable to guess FieldBridge for studio
            at org.hibernate.search.bridge.BridgeFactory.guessType(BridgeFactory.java:180)
            at org.hibernate.search.engine.DocumentBuilder.bindFieldAnnotation(DocumentBuilder.java:321)
            at org.hibernate.search.engine.DocumentBuilder.initializeMember(DocumentBuilder.java:220)
            at org.hibernate.search.engine.DocumentBuilder.initializeMembers(DocumentBuilder.java:162)
            at org.hibernate.search.engine.DocumentBuilder.<init>(DocumentBuilder.java:94)
            at org.hibernate.search.impl.SearchFactoryImpl.initDocumentBuilders(SearchFactoryImpl.java:262)
            at org.hibernate.search.impl.SearchFactoryImpl.<init>(SearchFactoryImpl.java:94)
            at org.hibernate.search.impl.SearchFactoryImpl.getSearchFactory(SearchFactoryImpl.java:172)
            at org.hibernate.search.event.FullTextIndexEventListener.initialize(FullTextIndexEventListener.java:44)
            at org.hibernate.event.EventListeners.initializeListeners(EventListeners.java:356)
            at org.hibernate.cfg.Configuration.getInitializedEventListeners(Configuration.java:1304)
            at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:1294)
            at org.hibernate.cfg.AnnotationConfiguration.buildSessionFactory(AnnotationConfiguration.java:915)

    If I understood FieldBridge correctly from your examples (Place and Address classes), it should only be used when I have a property of a user-defined type that's not an entity, correct? Or do I have to configure FieldBridge subclasses even when those property types are themselves entities?

    Thanks in advance!
    -Frank
  34. Re: Question regarding FieldBridge

    Hi Frank, judging from the exception and the line of code, I am pretty sure your class has a @Field annotation on studio (field or getter), and it should not. Let's continue this discussion on the user forum.
  35. Hi, I would like to understand whether I can mix queries. For example, imagine a student table with id, first name, last name, year of birth, etc., and another table, homework, with studentId, date, exercise text, etc. Can I query the system for all the homework containing the word "napoleon" from all the students whose name is "mike"? How? Should I ask Hibernate Search to index first name, last name, etc. too? And what about performance? Thank you :D