Did you mean... for Lucene

  1. Did you mean... for Lucene (6 messages)

    In "Did You Mean: Lucene?" on Java.net, author Tom White discusses how Lucene can provide suggested spellings for search terms, using the n-gram method for determining possible alternate spellings.

    Threaded Messages (6)

  2. in-memory indexing?

    Are there any libraries for indexing beans in memory? I've found one called BeanIndex (http://beanindex.sourceforge.net/). Are there others?
  3. Re: in-memory indexing?

    Are there any libraries for indexing beans in memory? I've found one called BeanIndex (http://beanindex.sourceforge.net/). Are there others?

    Check the TSS thread about Compass, a Java Search Engine Framework.
  4. cool

    Just what the doctor ordered! Thank you.
  5. Fantastic article

    What a cool idea for an article and very clear. Nice job, Tom.
  6. good article, but...

    Good article, nice solution. But a naive question, maybe: what about fuzzy search in Lucene? Isn't it supposed to find similar items for the search words?
  7. Re: good article, but...

    Good article, nice solution. But a naive question, maybe: what about fuzzy search in Lucene? Isn't it supposed to find similar items for the search words?

    Good question. I think it comes down to performance. A fuzzy search needs to calculate the (Levenshtein) edit distance between the search term and all terms in the index, which is an expensive operation. The n-gram approach uses a regular (exact match) index lookup, then ranks the hits using edit distance, so far fewer edit distance calculations are performed. (Admittedly, this is conjecture as I haven't measured the performance differences.)
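
    To make the comparison concrete, here is a rough sketch in plain Java of the "exact n-gram lookup first, edit distance second" idea. It is not the code from the article and does not use the Lucene API: the class name NGramSuggestSketch, the in-memory HashMap standing in for the index, and the choice of n = 3 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a HashMap stands in for the n-gram index.
class NGramSuggestSketch {

    // Split a word into overlapping n-grams, e.g. "lucene" with n = 3 gives luc, uce, cen, ene.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Levenshtein edit distance using two rolling rows of the DP table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Map each n-gram to the dictionary words that contain it.
    static Map<String, Set<String>> buildIndex(Collection<String> dictionary, int n) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String word : dictionary) {
            for (String gram : ngrams(word, n)) {
                index.computeIfAbsent(gram, g -> new TreeSet<>()).add(word);
            }
        }
        return index;
    }

    // Exact-match lookups collect candidates; edit distance is computed only for those.
    static List<String> suggest(String term, Map<String, Set<String>> index, int n) {
        Set<String> candidates = new TreeSet<>();
        for (String gram : ngrams(term, n)) {
            candidates.addAll(index.getOrDefault(gram, Collections.emptySet()));
        }
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort((x, y) -> {
            int byDistance = Integer.compare(editDistance(term, x), editDistance(term, y));
            return byDistance != 0 ? byDistance : x.compareTo(y); // ties broken alphabetically
        });
        return ranked;
    }
}
```

    With a dictionary of "lucene", "lucent", "search" and "index", a lookup for "lucine" only computes edit distance for the two words that share an n-gram with it, rather than for every term in the dictionary.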

    In addition, the n-gram approach allows a little more flexibility in that the start (and end) of words can be weighted. By default the start n-gram is boosted by a factor of two, so words which start with the same few characters are counted as closer matches.
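
    The weighting can be illustrated with a simplified scoring function. This is only a sketch under stated assumptions: it is not Lucene's actual scoring formula, and the class name StartBoostSketch, the end weight of 1.0, and the rule of one point per shared n-gram are hypothetical. Only the start boost of two comes from the description above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of weighting the start (and end) n-gram of a word.
class StartBoostSketch {
    static final double START_BOOST = 2.0; // factor of two, as described in the post
    static final double END_BOOST = 1.0;   // assumed value, for illustration only

    // Collect the distinct overlapping n-grams of a word.
    static Set<String> ngrams(String word, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // One point per distinct shared n-gram, plus extra weight when the
    // first or last n characters of the two words are identical.
    static double score(String term, String candidate, int n) {
        if (term.length() < n || candidate.length() < n) {
            return 0.0;
        }
        Set<String> shared = ngrams(term, n);
        shared.retainAll(ngrams(candidate, n));
        double score = shared.size();
        if (term.substring(0, n).equals(candidate.substring(0, n))) {
            score += START_BOOST;
        }
        if (term.substring(term.length() - n).equals(candidate.substring(candidate.length() - n))) {
            score += END_BOOST;
        }
        return score;
    }
}
```

    Under these assumptions, "lucene" scores higher than the made-up word "glucene" for the query "lucine": both share the single n-gram "luc" with the query, but only "lucene" also starts with it.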