Apache Solr 1.3.0 Released

News: Apache Solr 1.3.0 Released

  1. Apache Solr 1.3.0 Released (16 messages)

    The Apache Solr team is happy to announce the availability of Solr 1.3.0 for public download. This version contains many enhancements and bug fixes, including:

    - Distributed search capabilities
    - Numerous Lucene and other performance improvements
    - Support for multiple indexes in a single deployment
    - SolrJ client and a binary response protocol for faster client-server communication
    - Search Components that can be chained together to offer flexible query processing. Components include existing functionality like faceting and add More Like This, Editorial Boosting (Query Elevation) and Spell Checking
    - New DataImportHandler for easily indexing database content into Solr

    See http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.3.0/CHANGES.txt for more details. The download is available from http://www.apache.org/dyn/closer.cgi/lucene/solr/. See the Solr Wiki for documentation: http://wiki.apache.org/solr/

    About Apache Solr: Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and many more features. It runs in a Java servlet container such as Tomcat. For more information, refer to the Solr website at http://lucene.apache.org/solr/.

    Threaded Messages (16)

  2. Thank God you told me what I needed to know - like "What is Apache Solr?" - before telling me stuff I needed to know less, like what this release changed. Oh, wait...
  3. Hey...it's better than when we get a post that doesn't even mention what the product is, only talks about what has changed.
  4. Shhhhhhh... it's a secret. There have been big changes, though, oh yes, BIG changes.
  5. Check out DBSight 1.6.0

    Solr has come a long way. Congratulations! DBSight actually started in 2004, long before Solr. In fact, some features appeared in DBSight first and were copied into Solr, and some features have not been copied yet. If you have any problems with Solr, try DBSight. It is free to use, and is really Instant Scalable Full-Text Search On Any Database/Application. You can get started pretty quickly.
    site: http://www.dbsight.net
    demo: http://search.dbsight.com
    Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
    A DBSight customer, a shopping comparison site (anonymous by request), got 2.6 million euros in funding! So as long as you know how to pull data into the database, DBSight can earn some money for you.
  6. Re: Check out DBSight 1.6.0

    Actually, I started Solr in July 2004 (it wasn't open source at the time though... this was within CNET). And I assume you're using the term "copied into" loosely, as I was not even aware of DBSight for some time, and I can assure you that I've not personally copied any features from it.
  7. Re: Check out DBSight 1.6.0

    Surely you're kidding!!! DBSight was founded by 3 ex-Oracle employees. Your main product targets database indexing and is heavily based on Lucene. Now... you have Solr, which is a generic search engine (not only targeting databases) that was developed by the Lucene committers themselves. In fact, quite a few Solr features and Lucene enhancements done in Solr found their way into the Lucene code base (so in a sense DBSight benefits from Solr as well).

    Yes... Solr has come a long way, and it's still going, with an extremely active community and development. Of course I would like to see a better-quality, better-structured code base, but if you ask me to choose a search solution, I would definitely go for Solr, as it's developed by the same IR experts who brought us Lucene, Nutch, Hadoop, Tika, and Mahout.

    And no... as someone who's been monitoring the Solr code base for years now, as well as the user/dev forums, I can assure you that none of Solr's features or code was "copied" from your product. If anything, most of the new concepts and ideas come from the enterprise search market as a whole and the features offered by the big commercial players in it (which, I'm sorry to say, you're not one of).

    About making money: you can check out the following link to see a (partial) list of companies making a lot of money with Solr-based products: http://wiki.apache.org/solr/PublicServers

    Sorry for the somewhat "harsh" response... I just don't like to see people/companies take credit for other people's hard work. Congratulations to all the people involved in Solr for finally making this 1.3 release!!! I do hope, though, that from now on there will be steadier and shorter release cycles.
  8. Re: Check out DBSight 1.6.0

    Well, you are right and I was too quick. DBSight works on databases only. But it's funny to see that, after so many years, the "new" feature, the data importer, was learned from DBSight. You can check the JIRA entry.
  9. Re: Check out DBSight 1.6.0

    We started writing DataImportHandler at AOL to simplify an extremely common use case. A majority of users store content in databases, and that content needs to be transferred to Solr for scalable full-text search. We thought it would be good to contribute such a feature back to Solr. We were unaware of your product until I subscribed to the Lucene java-user mailing list and saw one of your emails with the promotional footer text. This was well after we had suggested this feature to the Solr mailing list and opened the DataImportHandler issue in Solr's JIRA. We had already developed a large part of the functionality before proposing it to the Solr mailing list. I'll leave it up to you to search the java-user archives and figure out the dates. Let us not indulge in accusing one another, and instead focus on adding value for our users. Let the users decide for themselves the merits of each solution.
  10. Re: Check out DBSight 1.6.0

    Thanks for clearing up my own confusion and misunderstanding. We do not really follow the Solr development process; we only saw some referral visits from links like this: http://marc.info/?l=solr-dev&m=117789117914453&w=2 I totally understand that the same approach can arise independently. And I know that, in order to survive, software companies always need to innovate and bring easy-to-use software to their users.
  11. Re: Apache Solr 1.3.0 Released

    Sweet. Are you guys using a trunk version of Lucene 2.4? Do you know when 2.4 will come out officially?
  12. Re: Apache Solr 1.3.0 Released

    Yes, we occasionally make Solr releases with trunk versions of Lucene that we feel comfortable enough with. Barring any serious bugs, I'd estimate that we'll have Lucene 2.4 out by very early October.
  13. Re: Apache Solr 1.3.0 Released

    Just 15 months back, when the Solr 1.2 release was announced on TheServerSide, I was asking for references from anyone using Solr in their production environments here: http://www.theserverside.com/news/thread.tss?thread_id=45719#234206 Today I am more than happy that we chose Solr over any commercial product available in the market.
  14. Re: Apache Solr 1.3.0 Released

    A very happy user of Solr and Hadoop. The ability to deal with even structured data is amazing. Thanks Sunil http://sunilabinash.vox.com
  15. Re: Apache Solr 1.3.0 Released

    About Apache Solr:
    Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and many more features. It runs in a Java servlet container such as Tomcat. For more information, refer to the Solr website at http://lucene.apache.org/solr/.
    Is there something somewhere that explains what it does and the scenarios in which to use it (besides the blurb and the feature list)? It sounds very interesting, and I have used Lucene (Java and .NET) on a few projects.
  16. Solr usage scenarios

    Granted, I'm a bit biased, but I think you can use Solr pretty much anywhere you would use Lucene or, for that matter, any other search vendor's product. I think it particularly shines in applications that have text plus metadata (price, author, manufacturer, etc.) where you want to offer search and faceting (i.e. like what you see on the left-hand side of Amazon.com when you do a search). I've also used it for general search, since it has all of Lucene's goodness in it. If you're used to Lucene, it's easy to work with Solr too. If you're not used to Lucene but want to take advantage of it, Solr is a much easier starting point, since you don't have to build up all the infrastructure needed to turn the Lucene library into a search server.

    One good starting point to answer your question on scenarios is the "Powered By" page on the wiki: http://wiki.apache.org/solr/PublicServers/ From there, you can see how a number of different people use it. Personally, I found that every time I did a Lucene project, I ended up writing something that more or less looked like Solr in terms of how it manages the Lucene indexes, etc. (this was before Solr was open sourced). Now, I just use Solr.
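For anyone wondering what the "search plus faceting" scenario above looks like on the wire, here is a minimal sketch that only builds the request URL for Solr's HTTP API. The `q`, `rows`, `facet` and `facet.field` parameters are standard Solr request parameters; the host, port, path, field names and the `build_facet_query` helper are assumptions made up for illustration and would need to match your own deployment and schema.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, facet_fields, rows=10):
    """Build a Solr select URL that asks for hits plus facet counts.

    base_url and the field names are assumptions; adjust them to
    your own Solr instance and schema.
    """
    params = [("q", text), ("rows", rows), ("facet", "true")]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


# Hypothetical local instance and schema fields.
url = build_facet_query(
    "http://localhost:8983/solr/select",
    "digital camera",
    ["manufacturer", "author"],
)
print(url)
```

The response to such a request would carry the matching documents together with per-value counts for each faceted field, which is the data behind an Amazon-style sidebar.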
  17. Congratulations, and thank you for sharing this very interesting Lucene implementation! Don't forget: it started as a shopping engine for CNET.

    I hadn't tried DBSight, but I noticed some noisy posts on Lucene-related message boards. I heard about DBSight from a colleague who suggested it "to have full text search for a database" and believed it was a quick and easy solution. I tried to evaluate DBSight and first of all browsed the available configuration settings directly in the WEB-INF folder and subfolders. Looks weak... I tried Compass before Solr. As a "search add-on" for an existing database, Solr offers the most freedom possible. You don't even need a database for it: indeed, Lucene internals implement "data normalization" automatically for you. Behind the scenes, Apache Hadoop/HBase uses several layers of data compression of different kinds (different algorithms), which is also "data normalization", though not the way a DBA understands it...

    Never ever try to automate full-text searches with databases!!! For instance, Compass (Hibernate + Lucene) promises "transactional support" but... in some cases a "commit" may take a few minutes in Lucene (merging a few files), and what about "optimize"? Recently I got a call from a well-known technology company; they have a client who needs Solr to implement database full-text search for about 8-10 billion documents, and Solr was chosen as the "simplest" solution. Are you kidding? Even pure Lucene can't handle that in a single index, and even with Solr shards such a distributed search will need 64 additional GET request parameters!!!

    Lucene uses FieldCache internally for performance optimizations, and it is the primary cause of the hundreds, if not thousands, of posts related to OutOfMemoryError on the Solr-user and Lucene-user mailing lists (including posts from DBSight technical staff). What is it? It is an array storing "Field" for each non-tokenized, non-boolean field for all documents stored in an index. For 10 billion documents with the simplest field, such as a Social Insurance Number or ISBN, a single Lucene index will need an array with an average size of 1 terabyte. Solr can't handle such a distribution (unless you have hardware with a few terabytes of RAM). A lot of work is going on in Lucene: for instance, to remove synchronization on the isDeleted() method, which is called for each query. It would be nice to have non-synchronized versions for read-only indexes.

    Solr is not as huge as the Lucene, LingPipe or GATE projects, but it is an extremely effective tool. It is very easy to configure an XML schema instead of working directly with the Lucene API. The main selling point of Solr (since the CNET-based project started and was contributed to Apache) is so-called "faceted search", which is simply calculating the intersection sizes of different DocSets (just look at the search results of modern price comparison sites: they show subset counts for different categories). However, that too was... an architectural mistake. Look at https://issues.apache.org/jira/browse/SOLR-711 - counting frequencies of Terms for a given DocSet is faster than counting intersections.

    Lucene + Database: transactional???... I started with Compass, then moved to Nutch, then Solr!!! Now I am using HBase just because the power of MySQL + InnoDB is not enough for a highly concurrent application. No need to index a database: instead, I am indexing data :) Thanks, http://www.tokenizer.org/bot.html (Robot-based Shopping Engine)
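The two faceting strategies contrasted in the post above (intersecting DocSets versus counting terms for a given DocSet) can be illustrated with a toy sketch. This is not Solr's or Lucene's implementation: the dictionaries, function names and sample data are hypothetical, and the sketch assumes a single-valued field. It only shows that both approaches yield the same counts, while doing one set intersection per distinct term in the first case and one pass over the result documents in the second.

```python
from collections import Counter


def facet_by_intersection(result_docs, term_to_docs):
    """One set intersection per distinct term in the field."""
    counts = {}
    for term, docs in term_to_docs.items():
        n = len(result_docs & docs)
        if n:
            counts[term] = n
    return counts


def facet_by_term_counting(result_docs, doc_to_term):
    """One pass over the result docs, tallying each doc's term."""
    return dict(Counter(doc_to_term[d] for d in result_docs if d in doc_to_term))


# Hypothetical index: document id -> manufacturer (single-valued field).
doc_to_term = {1: "canon", 2: "nikon", 3: "canon", 4: "sony", 5: "nikon", 6: "canon"}

# Inverted view of the same data: term -> set of document ids.
term_to_docs = {}
for doc, term in doc_to_term.items():
    term_to_docs.setdefault(term, set()).add(doc)

result_docs = {1, 3, 5}  # documents matching some query
by_intersection = facet_by_intersection(result_docs, term_to_docs)
by_counting = facet_by_term_counting(result_docs, doc_to_term)
print(by_intersection, by_counting)
```

Which variant is cheaper in practice depends on the number of distinct terms relative to the size of the result set, so treat the sketch as a way to reason about the trade-off rather than as a benchmark.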