News: Using Lucene with OJB

  1. Using Lucene with OJB (19 messages)

    Brian McCallister looks at the Lucene search engine and shows how to index and retrieve objects from a sample Student application. The Student objects are persisted using ObjectRelationalBridge (OJB); Brian shows how to query the Student index, pull out the primary keys (PKs) of all the hits, and then select the domain objects using those PKs.
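
A minimal sketch of the flow summarized above: query an index, collect the primary keys of the hits, then load the domain objects by those keys. It is a hypothetical, self-contained stand-in and does not use Lucene's or OJB's actual classes; the `StudentSearchSketch` class, its in-memory inverted index, and the `Map` standing in for the OJB-backed table are all assumptions made for illustration.

```java
import java.util.*;

// Hypothetical stand-in for the article's flow; this is NOT Lucene's or OJB's API.
// An in-memory inverted index maps lowercase terms to student primary keys,
// and a Map plays the role of the OJB-backed table looked up by primary key.
final class StudentSearchSketch {
    static final class Student {
        final long id;
        final String name;
        final String major;

        Student(long id, String name, String major) {
            this.id = id;
            this.name = name;
            this.major = major;
        }
    }

    // term -> primary keys of the students whose indexed text contains it
    private final Map<String, Set<Long>> postings = new HashMap<>();

    void index(Student s) {
        for (String term : (s.name + " " + s.major).toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(s.id);
            }
        }
    }

    // Step 1: query the index and pull out the primary keys of the hits.
    List<Long> searchIds(String term) {
        return new ArrayList<>(postings.getOrDefault(term.toLowerCase(), Collections.emptySet()));
    }

    // Step 2: select the domain objects using those primary keys.
    static List<Student> load(Map<Long, Student> table, List<Long> ids) {
        List<Student> result = new ArrayList<>();
        for (Long id : ids) {
            Student s = table.get(id);
            if (s != null) {
                result.add(s); // skip keys whose rows were deleted after indexing
            }
        }
        return result;
    }
}
```

In the real setup, step 2 would be an OJB query by primary key rather than a map lookup.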

    Threaded Messages (19)

  2. Using Lucene with OJB

    I'm out of words.
  3. Using Lucene with OJB

    How is OJB related to this article's content?
  4. Using Lucene with OJB

    OJB seems to be related because they use it to store the user data in the database. The article doesn't focus on OJB; it just uses it as part of the process. You could easily replace OJB with Hibernate or plain SQL in this article and get the same capability.

    The article does a pretty good job of showing how to use the Lucene classes for indexing and searching, and how to make use of Lucene's ability to index arbitrary fields. You could use that example for a forum: add subjects and message bodies to the index, but skip the message IDs, and you have a working forum search.
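
The forum idea can be sketched the same way, assuming a hypothetical `ForumIndexSketch` class (not Lucene's API): subject and body text are tokenized into per-field postings, while the message ID is only stored with each document so that a hit can be mapped back to its row.

```java
import java.util.*;

// Hypothetical sketch (not Lucene's API): subject and body are tokenized into
// per-field postings, while the message ID is only stored, never searched.
final class ForumIndexSketch {
    // "field:term" -> internal document numbers
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // internal document number -> stored message ID
    private final List<String> storedIds = new ArrayList<>();

    void add(String messageId, String subject, String body) {
        int doc = storedIds.size();
        storedIds.add(messageId); // stored only: IDs do not become searchable terms
        indexField(doc, "subject", subject);
        indexField(doc, "body", body);
    }

    private void indexField(int doc, String field, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(doc);
            }
        }
    }

    // Returns the stored message IDs of documents whose field contains the term.
    List<String> search(String field, String term) {
        List<String> ids = new ArrayList<>();
        for (int doc : postings.getOrDefault(field + ":" + term.toLowerCase(), Collections.emptySet())) {
            ids.add(storedIds.get(doc));
        }
        return ids;
    }
}
```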
  5. Using Lucene with OJB

    "OJB seems to be related since they are using it to store the user data in the database."
    They probably use OJB in some layer, but there is nothing about OJB in this article.
  6. Using Lucene with OJB

    Nice example but what if your database holds large amounts of Student data?

    If you have a 2 GB database and, let's assume, only half of it is indexable data, then your Lucene index would be approximately 1 GB in size, I'd imagine (maybe more).

    That's a heavy storage price to pay just for a nice way of searching your database.

    Isn't there an API or something similar that lets you write query strings like those used in Google and Lucene against a database? After all, a database is made to be queried!!

    If not, does anyone fancy starting a project?

    Tim..
  7. Using Lucene with OJB

    "Nice example but what if your database holds large amounts of Student data? ... Is there not an api or such like that enables you to write query strings like those used in google and lucene on a database? After all a database is made to be queried!! If not, anyone fancy starting a project? Tim.."

    Tim,

    I have been thinking of doing exactly that for a little while... I've just been too busy to actually do it. What I envision is a fairly simple, standardized database schema that only uses the capabilities common to all SQL databases.

    You would use Lucene for its word stemmers, which would be applied when indexing documents, and the same stemmers would be applied to your search queries. You would need to build a new implementation of Lucene's IndexWriter that stores fields directly in a database. Ideally Hibernate would be used here to provide an easy abstraction over various SQL databases.

    Then build a query parser that converts the Lucene search language into the SQL/HQL needed to query the database. Again, Hibernate provides a nice abstraction from the DB. By combining Lucene and Hibernate you could get a really big and quick win.

    Instead of becoming yet another separate open source project, this code could be contributed to either Lucene or Hibernate. Since it is a search system, one would think to add it to Lucene. However, it really is a database-agnostic (via Hibernate) full-text search implementation, so it might be best to include it in Hibernate itself.
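
The core of this proposal is that one analyzer runs at both index time and query time, with index-time output stored as rows in an ordinary table. A minimal sketch, assuming a hypothetical `doc_terms(doc_id, field, term)` table; the crude suffix stripper is only a placeholder for Lucene's real stemmers, and `TermRowsSketch` is an illustrative name.

```java
import java.util.*;

// Hypothetical sketch: the same analyze() runs on documents and on queries,
// and index-time output becomes rows for a table like doc_terms(doc_id, field, term).
// The suffix stripper is a crude placeholder, NOT Lucene's Porter/Snowball stemmer.
final class TermRowsSketch {
    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : new String[] {"ing", "es", "s"}) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(stem(token));
            }
        }
        return terms;
    }

    // One {docId, field, term} row per distinct term, ready for a batch INSERT.
    static List<String[]> rows(long docId, String field, String text) {
        List<String[]> rows = new ArrayList<>();
        for (String term : new LinkedHashSet<>(analyze(text))) {
            rows.add(new String[] {Long.toString(docId), field, term});
        }
        return rows;
    }
}
```

Because the query string goes through the same `analyze()` call, a search for "search" matches rows produced from "searching" or "searches".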

    Thoughts?

    If anyone actually wants to do this, I would be willing to be involved from an idea/architecture/design/review standpoint, but couldn't help with code development for at least a few months. There just isn't enough time in the day.

    Rob
  8. Using Lucene with OJB

    "Ideally Hibernate would be used here to provide an easy abstraction to various SQL databases. Then build a query parser that converts the lucene search language to the SQL/HQL necessary to query the database."

    Lucene is not a query parser and index persister; it is an indexing and search engine. Databases have native text search support too, but a custom parser would hide it behind SQL/HQL. It would be an abstraction (query string transformation), but without text indexing and search.
  9. Lucene query mapped over database

    I wasn't thinking of developing a new search engine based on a database. The idea I had in mind was to take a query with Lucene-like syntax and translate it into the WHERE clause of an SQL query.

    So "dolly" would translate into WHERE text ILIKE '%dolly%';

    And fields could map onto columns, e.g. title:"The Right Way" AND text:go would translate into WHERE (title ILIKE '%The Right Way%') AND (text ILIKE '%go%');

    I've had a quick look, and most of the functionality of a Lucene query string can be mapped onto an SQL query, some parts rather more complicated than others.

    After my quick research I kept thinking about the load on a database if you perform a complicated query, and how it compares to using Lucene as in the example described in the blog. Using Lucene, you would hit the database for each result you get back that you wish to display. Is that better than doing one elaborate query that returns all the information in one hit?
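
This mapping can be sketched for a small subset of the syntax: clauses joined by `AND`, each `field:term`, `field:"phrase"`, or a bare term searched in a default `text` column. The `LikeQuerySketch` class is hypothetical; it binds values as parameters rather than pasting them into the SQL, and checks field names against an allowed-column set because column names cannot be bound. `ILIKE` is PostgreSQL-specific.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical sketch covering only: clauses joined by AND, each field:term,
// field:"phrase" or a bare term (default column "text"). Naive limitations:
// a phrase containing " AND " breaks the split, and LIKE wildcards (% and _)
// inside values are not escaped.
final class LikeQuerySketch {
    private static final Pattern CLAUSE =
        Pattern.compile("(?:(\\w+):)?(?:\"([^\"]*)\"|(\\S+))");

    final String where;
    final List<String> params;

    private LikeQuerySketch(String where, List<String> params) {
        this.where = where;
        this.params = params;
    }

    static LikeQuerySketch translate(String query, Set<String> allowedColumns) {
        List<String> parts = new ArrayList<>();
        List<String> params = new ArrayList<>();
        for (String clause : query.trim().split("\\s+AND\\s+")) {
            Matcher m = CLAUSE.matcher(clause.trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("Unsupported clause: " + clause);
            }
            String column = m.group(1) != null ? m.group(1) : "text";
            if (!allowedColumns.contains(column)) { // columns cannot be bind parameters
                throw new IllegalArgumentException("Unknown field: " + column);
            }
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            parts.add("(" + column + " ILIKE ?)");
            params.add("%" + value + "%"); // bound later via PreparedStatement.setString
        }
        return new LikeQuerySketch("WHERE " + String.join(" AND ", parts), params);
    }
}
```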
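
The per-hit load does not have to be one query per result. A sketch, assuming a hypothetical `student` table keyed by `id` and a hypothetical `BatchLoadSketch` helper: fetch all hits in one round trip with an `IN` list, then restore the relevance order the search returned, since the database may return rows in any order.

```java
import java.util.*;

// Hypothetical sketch: one IN-list query instead of N single-row SELECTs,
// then re-sorting the rows into the search engine's relevance order.
// Table and column names are illustrative only.
final class BatchLoadSketch {
    static String inQuery(String table, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("need at least one id");
        }
        return "SELECT * FROM " + table + " WHERE id IN ("
            + String.join(", ", Collections.nCopies(count, "?")) + ")";
    }

    // The database returns rows in arbitrary order; re-sort them by hit rank.
    static <T> List<T> inHitOrder(List<Long> rankedIds, Map<Long, T> rowsById) {
        List<T> ordered = new ArrayList<>();
        for (Long id : rankedIds) {
            T row = rowsById.get(id);
            if (row != null) {
                ordered.add(row); // the row may have been deleted since indexing
            }
        }
        return ordered;
    }
}
```

Some databases cap the length of an IN list (Oracle, for example, allows at most 1,000 expressions), so a large hit list would need to be split into batches.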

    Thoughts?

    I'm thinking of taking the QueryParser from Lucene and using Hibernate's Criteria API to test out my ideas. I'll report back my findings later.

    Tim..
  10. Lucene query mapped over database

    It is not a good idea to combine the SQL LIKE operator or regular expressions with logical operators for text search. Text search needs a special kind of index to perform well, and it needs many features not supported by the standard operators.
  11. Lucene query mapped over database

    "It is not a good idea to use combination of SQL LIKE operator or regular expressions and logical operators for text search. Text search needs special kind of index to perform, it needs many features not supported by standard operators too."

    Point taken, but for a lot of applications the SQL LIKE operator acts as the poor man's search function. It's quick and easy to knock up, and it works relatively well. Why not provide a way to make its use a little more natural, in the quick-and-easy situations only, of course?

    I'm not knocking Lucene; I've used it myself a couple of times. It's just that sometimes it might be a bit of overkill.

    Tim..
  12. Lucene query mapped over database

    It is trivial to transform a query string into an SQL filter, but that is a very slow and poor way to search text. It can be used if your database is very small (a table scan is not a problem) and you know how to sort the search results (using something like "lastModified").
    It is better to use the RDBMS's native index engine for free-text search if integrating an external engine is too complex (replication); the query transformations shouldn't be very hard if you also want the same syntax across many databases.
  13. comparing lucene to Oracle Text

    How do Lucene's text search capabilities and performance compare to those of Oracle Text (Oracle's built-in text search feature)? Can someone elaborate on this?
  14. comparing lucene to Oracle Text

    http://www.searchtools.com/index.html
    You should be able to find information and motivation on this site.
  15. Using Lucene with OJB

    Typically your search index does not scale in such a linear manner. Given the duplication of terms between documents, adding new documents to the index is a much smaller hit as far as space is concerned.

    Also, is space really an issue? If you have 2 GB or 10 GB of data, is spending even up to 50% more on a database index such a big deal? If you use a pre-existing database indexing tool, such as the one found in Oracle, you are taking up space for that too. (This is unrelated to the DB stemming/indexing idea mentioned elsewhere in the thread.) I have used Oracle's Intermedia in the past and found it did a pretty good job, though.

    From a performance perspective, I would imagine that the custom storage files will outperform the DB-bound wildcard query approach, possibly provide better scoring, and be easy to maintain.

    Personally, I'd be happy to keep users off my database engine and keep it free for actual use. Heck, memory is cheap, so you could copy indexes into memory and hit them for a wonderful response time, and just sync with a disk copy on some schedule. Why bother the DB unless you need to?
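
The space point can be illustrated by counting tokens against distinct terms: a term dictionary stores each distinct term once, so it grows far more slowly than the text itself. This is only part of the picture, since per-document postings still grow with the data. The `VocabularyGrowthSketch` helper is hypothetical.

```java
import java.util.*;

// Hypothetical illustration: total tokens vs. distinct terms. Only the term
// dictionary benefits from repetition; postings still grow with the data.
final class VocabularyGrowthSketch {
    // Returns {totalTokens, distinctTerms} after reading all documents.
    static int[] measure(List<String> docs) {
        int tokens = 0;
        Set<String> vocabulary = new HashSet<>();
        for (String doc : docs) {
            for (String t : doc.toLowerCase().split("\\W+")) {
                if (t.isEmpty()) {
                    continue;
                }
                tokens++;
                vocabulary.add(t);
            }
        }
        return new int[] {tokens, vocabulary.size()};
    }
}
```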
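
The in-memory idea can be sketched as an immutable snapshot behind an `AtomicReference`, swapped on a schedule. `SnapshotIndexSketch` and its loader are assumptions for illustration; the loader might, for example, read the on-disk index into memory.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: searches read an immutable in-memory snapshot, and a
// background task periodically swaps in a fresh copy built by the loader.
// Readers never block and never see a half-built snapshot.
final class SnapshotIndexSketch<T> {
    private final AtomicReference<T> current;
    private final Supplier<T> loader;

    SnapshotIndexSketch(Supplier<T> loader) {
        this.loader = loader;
        this.current = new AtomicReference<>(loader.get());
    }

    T snapshot() {
        return current.get();
    }

    void refresh() {
        current.set(loader.get()); // build fully first, then publish atomically
    }

    ScheduledExecutorService scheduleRefresh(long period, TimeUnit unit) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // keep serving the old snapshot; an uncaught exception would cancel the schedule
            }
        }, period, period, unit);
        return exec; // caller shuts it down when the application stops
    }
}
```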
  16. Using Lucene with OJB

    A good approach is to store all entity fields in the Document object and load them without hitting the DB, if stale data can be tolerated. It does not save disk space, but it performs very well.
  17. Transactions and Relationships

    I have used Lucene with Hibernate in a very similar way. It is very fast; however, there are several issues with this approach.

    - How does one keep a transactional database in sync with a non-transactional Lucene index? Lucene does not natively support transactions (in my case I deferred index writes until the TransactionManager made a callback).

    - When an index contains attributes from more than one database entity, for example an employee/employer relationship, how does one easily update the index for all entities whose parent (employer) changes? A Lucene index does not support updates, only adds and deletes.
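
The deferred-write approach from the first point can be sketched as a buffer that is flushed only after a successful commit. The `TxCallback` interface is a hypothetical, simplified stand-in modeled on JTA's `javax.transaction.Synchronization` (which passes an `int` status to `afterCompletion` rather than a boolean); it is declared locally so the example is self-contained. A crash between the database commit and the index write can still leave the two out of sync.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical callback modeled on JTA's Synchronization, simplified to a boolean.
interface TxCallback {
    void beforeCompletion();

    void afterCompletion(boolean committed);
}

// Buffers index writes during the transaction and applies them only on commit.
final class DeferredIndexWrites implements TxCallback {
    private final List<String> pending = new ArrayList<>();
    private final Consumer<String> indexWriter; // e.g. adds one document to the index

    DeferredIndexWrites(Consumer<String> indexWriter) {
        this.indexWriter = indexWriter;
    }

    // Called during the transaction instead of writing to the index directly.
    void queue(String document) {
        pending.add(document);
    }

    @Override
    public void beforeCompletion() {
        // nothing to validate in this sketch
    }

    @Override
    public void afterCompletion(boolean committed) {
        if (committed) {
            pending.forEach(indexWriter); // apply only what the database kept
        }
        pending.clear(); // on rollback the queued writes are simply dropped
    }
}
```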
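
For the second point, since the index only supports adds and deletes, an employer change means deleting and re-adding every employee document that embeds the employer's data. A sketch, with plain maps standing in for an add/delete-only index and for the database query that finds the employer's employees; `DenormalizedReindexSketch` is hypothetical.

```java
import java.util.*;

// Hypothetical sketch: each employee document denormalizes its employer's name,
// so renaming the employer means delete + re-add for every affected employee.
// The Map stands in for an index that offers only add and delete.
final class DenormalizedReindexSketch {
    // employeeId -> indexed text (employee name + employer name)
    final Map<Long, String> index = new HashMap<>();

    void add(long employeeId, String employeeName, String employerName) {
        index.put(employeeId, employeeName + " " + employerName);
    }

    void delete(long employeeId) {
        index.remove(employeeId);
    }

    // employeesOfEmployer (id -> name) would come from a database query on the foreign key.
    void employerRenamed(String newEmployerName, Map<Long, String> employeesOfEmployer) {
        for (Map.Entry<Long, String> e : employeesOfEmployer.entrySet()) {
            delete(e.getKey());                            // remove the stale document
            add(e.getKey(), e.getValue(), newEmployerName); // re-add with the new parent data
        }
    }
}
```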
  18. Final?

    I'm not much of a Java head, but I don't understand why every member/parameter is marked as final in the demo code. Is this some best practice I've just never seen before?

    j.
  19. Final?

    Have a read of this sample chapter from the Hardcore Java book. It explains it all.

    http://www.oreilly.com/catalog/hardcorejv/chapter/ch02.pdf

    But in a nutshell...

    'final causes logic errors to be turned into compiler errors'
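
A small illustration of that point: with `final` on a parameter or local, an accidental reassignment is rejected by `javac` instead of silently changing behavior. `FinalDemo` is a hypothetical example, not code from the article.

```java
// Hypothetical example: final parameters and locals cannot be reassigned,
// so an accidental overwrite becomes a compile error rather than a logic bug.
final class FinalDemo {
    static String describe(final String name, final int grade) {
        final String label = name + " (grade " + grade + ")";
        // name = name.trim(); // would not compile: cannot assign a value to final variable name
        // label = "";         // would not compile either
        return label;
    }
}
```

Note that `final` on a reference only prevents reassignment; it does not make the referenced object immutable.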


    Tim..
  20. DBSight: Instant Scalable Database Search

    Please take a look at DBSight, http://www.dbsight.net . It is a generalized way to search a database and can help you create a scalable database search in minutes, so it's now really trivial to create a Lucene database search. http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes