-
Did you mean... for Lucene (6 messages)
- Posted by: Joseph Ottinger
- Posted on: August 09 2005 09:04 EDT
In "Did You Mean: Lucene?" on Java.net, author Tom White discusses how Lucene can provide suggested spellings for search terms, using the n-gram method for determining possible alternate spellings.
Threaded Messages (6)
- in-memory indexing? by Dmitry Beransky on August 09 2005 13:59 EDT
- Re: in-memory indexing? by Kai Virkki on August 09 2005 15:42 EDT
- cool by Dmitry Beransky on August 09 2005 20:41 EDT
- Fantastic article by Mike Perham on August 09 2005 18:21 EDT
- good article, but... by ahmet a on August 10 2005 03:52 EDT
- Re: good article, but... by Tom White on August 11 2005 16:45 EDT
-
in-memory indexing?
- Posted by: Dmitry Beransky
- Posted on: August 09 2005 13:59 EDT
- in response to Joseph Ottinger
Are there any libraries for indexing beans in memory? I've found one called BeanIndex (http://beanindex.sourceforge.net/). Are there others?
-
Re: in-memory indexing?
- Posted by: Kai Virkki
- Posted on: August 09 2005 15:42 EDT
- in response to Dmitry Beransky
Are there any libraries for indexing beans in memory? I've found one called BeanIndex (http://beanindex.sourceforge.net/). Are there others?
Check the TSS thread about Compass, a Java Search Engine Framework.
-
cool
- Posted by: Dmitry Beransky
- Posted on: August 09 2005 20:41 EDT
- in response to Kai Virkki
Just what the doctor ordered! Thank you.
-
Fantastic article
- Posted by: Mike Perham
- Posted on: August 09 2005 18:21 EDT
- in response to Joseph Ottinger
What a cool idea for an article, and very clear. Nice job, Tom.
-
good article, but...
- Posted by: ahmet a
- Posted on: August 10 2005 03:52 EDT
- in response to Joseph Ottinger
Good article, nice solution. But maybe a naive question: what about fuzzy search in Lucene? Isn't it supposed to find similar items for the search words?
-
Re: good article, but...
- Posted by: Tom White
- Posted on: August 11 2005 16:45 EDT
- in response to ahmet a
Good article, nice solution. But maybe a naive question: what about fuzzy search in Lucene? Isn't it supposed to find similar items for the search words?
Good question. I think it comes down to performance. A fuzzy search needs to calculate the (Levenshtein) edit distance between the search term and all terms in the index, which is an expensive operation. The n-gram approach uses a regular (exact match) index lookup, then ranks the hits using edit distance, so far fewer edit distance calculations are performed. (Admittedly, this is conjecture as I haven't measured the performance differences.)
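To make the comparison concrete, here is a minimal plain-Java sketch of the lookup-then-rank idea described above. It is not the article's code and does not use the Lucene API: the class name NGramSuggester, the in-memory HashMap standing in for the index, and the choice of n are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene or of the article's code.
final class NGramSuggester {
    private final int n;
    // Stand-in for the index: n-gram -> dictionary words containing that n-gram.
    private final Map<String, Set<String>> index = new HashMap<>();

    NGramSuggester(int n, List<String> dictionary) {
        this.n = n;
        for (String word : dictionary) {
            for (String gram : ngrams(word, n)) {
                index.computeIfAbsent(gram, k -> new HashSet<>()).add(word);
            }
        }
    }

    // All contiguous substrings of length n; empty if the word is shorter than n.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Exact-match lookup on n-grams first, then edit distance on the hits only.
    List<String> suggest(String term) {
        Set<String> candidates = new HashSet<>();
        for (String gram : ngrams(term, n)) {
            candidates.addAll(index.getOrDefault(gram, Collections.<String>emptySet()));
        }
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingInt((String w) -> editDistance(term, w))
                .thenComparing(Comparator.<String>naturalOrder()));
        return ranked;
    }
}
```

With a dictionary of lucene, lucent, search and index and n = 3, suggest("lucine") computes edit distances only for lucene and lucent, because they are the only words sharing a 3-gram ("luc") with the query. A fuzzy search in the sense described above would compare the query against all four terms.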
In addition, the n-gram approach allows a little more flexibility in that the start (and end) of words can be weighted. By default the start n-gram is boosted by a factor of two, so words which start with the same few characters are counted as closer matches.
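A rough sketch of how a start boost can change the ranking, again in plain Java rather than Lucene. The additive scoring here (one point per shared n-gram, plus the boost when the first n-grams match) is a deliberate simplification and an assumption on my part, not Lucene's actual scoring formula; the class name StartBoostScorer and the score method are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; simplified additive scoring, not Lucene's.
final class StartBoostScorer {

    // Distinct contiguous substrings of length n.
    static Set<String> ngrams(String word, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // One point per shared n-gram, plus startBoost if both words begin
    // with the same n-gram.
    static double score(String query, String candidate, int n, double startBoost) {
        if (query.length() < n || candidate.length() < n) {
            return 0.0;
        }
        Set<String> candidateGrams = ngrams(candidate, n);
        double score = 0.0;
        for (String gram : ngrams(query, n)) {
            if (candidateGrams.contains(gram)) {
                score += 1.0;
            }
        }
        if (query.substring(0, n).equals(candidate.substring(0, n))) {
            score += startBoost;
        }
        return score;
    }
}
```

For the query "lucine" with n = 3, both "lucene" and "saline" share exactly one 3-gram with it ("luc" and "ine" respectively), so with a boost of 0 they tie. With a start boost of 2, "lucene" scores higher because it begins with the same three characters as the query.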