
Lucene Solutions

Lucene - Solr

Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene™.

Apache Lucene is a free, open source information retrieval software library, originally written in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache License.

Lucene has been ported to other programming languages including Delphi, Perl, C#, C++, Python, Ruby, and PHP.

While suitable for any application that requires full text indexing and search capability, Lucene’s value has been widely recognized in the implementation of Internet search engines and local, single-site searching.

At the core of Lucene’s logical architecture is the idea of a document containing fields of text. Because Lucene only sees the extracted text, it is independent of the file format: text taken from PDF, HTML, Microsoft Word, OpenDocument and many other formats can be indexed, but images cannot.
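The document-and-fields idea can be illustrated with a toy inverted index. This is a minimal sketch in plain Python, not Lucene's API: the class name TinyFieldIndex, the whitespace tokenizer and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do far more."""
    return text.lower().split()


class TinyFieldIndex:
    """Toy inverted index keyed by (field, term). Hypothetical, not Lucene's API."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of document ids
        self.docs = []                    # document id -> original field dict

    def add(self, fields):
        """Index one document, given as a dict of field name -> text."""
        doc_id = len(self.docs)
        self.docs.append(fields)
        for field, text in fields.items():
            for term in tokenize(text):
                self.postings[(field, term)].add(doc_id)
        return doc_id

    def search(self, field, term):
        """Return the sorted ids of documents whose field contains the term."""
        return sorted(self.postings.get((field, term.lower()), set()))


index = TinyFieldIndex()
index.add({"title": "Lucene in Action", "contents": "Indexing and searching text"})
index.add({"title": "Solr Guide", "contents": "Solr builds on Lucene"})

print(index.search("title", "lucene"))     # [0]
print(index.search("contents", "lucene"))  # [1]
```

The same term is found in different documents depending on the field searched, which is what makes fielded queries such as "title" versus "contents" possible.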

Use Cases for Lucene

A user-adaptive, interactive and web-based learning environment.
Matching each page of web content to the best-fitting ads, with indexing and selection carried out through Lucene queries.
A search application with a ready-to-use front-end, graphical web administration.
Multi language document management and translations.
Small business document management for a paperless office.
Web-based help desk software.
Job search engine.
Twitter analysing tool.
Search engine for stock images.
Key benefits of Lucene
  • Advanced Search Functionality.
  • Full-text and metadata search.
  • High volume, high performance search cluster.
  • Product search and categorisation.
  • Drive customised and dynamic content.
Features of Lucene
Scalable, High-Performance Indexing
  • Over 150GB/hour on modern hardware.
  • Small RAM requirements — only 1MB heap.
  • Incremental indexing as fast as batch indexing.
  • Index size roughly 20-30% the size of text indexed.
Powerful, Accurate and Efficient Search Algorithms
  • Ranked searching — best results returned first.
  • Many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more.
  • Fielded searching (e.g. title, author, contents).
  • Sorting by any field.
  • Multiple-index searching with merged results.
  • Allows simultaneous update and searching.
  • Flexible faceting, highlighting, joins and result grouping.
  • Fast, memory-efficient and typo-tolerant suggesters.
  • Pluggable ranking models, including the Vector Space Model and Okapi BM25.
  • Configurable storage engine (codecs).
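
As an illustration of what a ranking model such as Okapi BM25 computes, here is a minimal sketch in plain Python. It is not Lucene's implementation: the function name bm25_score and the tiny corpus are made up, k1 = 1.2 and b = 0.75 are commonly used defaults, and the IDF term shown is one common formulation.

```python
import math


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query using Okapi BM25."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)  # term frequency in this document
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)  # documents containing the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Length normalization: longer-than-average documents are penalized.
        norm = tf + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score


# Hypothetical, already-tokenized corpus.
corpus = [
    ["lucene", "search", "library"],
    ["solr", "search", "server"],
    ["paperless", "office", "documents"],
]
scores = [bm25_score(["lucene", "search"], doc, corpus) for doc in corpus]
ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print(ranked)  # [0, 1, 2]
```

The first document matches both query terms, including the rarer one, so it ranks first; the third matches neither and scores zero.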
Cross-Platform Solution
  • Available as open-source software under the Apache License, which lets you use Lucene in both commercial and open-source programs.
  • 100%-pure Java.
  • Index-compatible implementations in other programming languages are available.
Lucene solutions options

Lucene itself is just an indexing and search library and does not contain crawling and HTML parsing functionality. However, several projects extend Lucene’s capability:

  • Apache Nutch — provides web crawling and HTML parsing.
  • Apache Solr — an enterprise search server.
  • Elasticsearch — an enterprise search server.
  • Compass — a Java Search Engine Framework.
  • DocFetcher — a multiplatform desktop search application.
  • Lucene.NET — a port of Lucene written in C# and targeted at .NET Framework users. There are currently two variations of the software, differing in Generics support and a few bug fixes.
  • Swiftype – an enterprise search startup based on Lucene.
  • Ferret – a search library for Ruby inspired by Lucene. There is also a Ruby on Rails plugin called acts_as_ferret. Ferret utilizes Poshlib.
  • KinoSearch – a search engine written in Perl and C and a loose port of Lucene. It is used by the Socialtext and MojoMojo wiki software, as well as by the Human Metabolome Database (HMDB) and the Toxin and Toxin-Target Database (T3DB).
  • Apache Lucy is a successor project of both KinoSearch and Ferret, being jointly developed by the authors of these and having bindings in both Perl and Ruby.
  • Luke — A Java-based GUI for Lucene which allows you to display and modify indexes.
Our Lucene consulting services
  • Software Lifecycle Management / Software Development Life Cycle (SDLC).
  • AWS Cloud Hosting.

Solr solutions

Solr (pronounced ‘solar’) is an open source enterprise search platform, written in Java, from the Apache Lucene project. Its most important features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features and rich document (e.g. Word, PDF) handling. With distributed search and index replication, Solr is highly scalable and fault tolerant, and is among the most popular enterprise search engines.

Solr is written in Java and runs as a standalone full-text-search server. It uses the Lucene Java search library for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable with most popular programming languages. Its powerful external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customisation.
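
Because the API is plain HTTP, a client only needs to build a URL. The following is a minimal sketch using the Python standard library; the host, the core name "articles" and the field names are assumptions, the helper solr_select_url is hypothetical, and no request is actually sent. The parameters q, fq, rows and wt are standard Solr query parameters, and 8983 is the port Solr listens on by default.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, rows=10, filters=()):
    """Build a URL for a core's /select request handler (hypothetical helper)."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    params += [("fq", f) for f in filters]  # one fq parameter per filter query
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Assumed local Solr instance and an assumed core called "articles".
url = solr_select_url(
    "http://localhost:8983", "articles", "title:lucene", rows=5, filters=["lang:en"]
)
print(url)
```

The resulting URL can be fetched with any HTTP client, which is why Solr is usable from most programming languages without a dedicated driver.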

  • Solr supports complex mathematical analysis on result sets, enabling next-generation search-based business intelligence.
  • Function query results can be returned with hits and used to sort or boost them.
  • The stats component can compute min, max, sum, mean, distinct values and more.
  • Grouping, nesting and other approaches enable applications to represent complex relationships between content types.
  • Coming in Solr 5.0: statistics over pivot facets.
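
As a rough illustration of how the stats component is requested, the sketch below builds the query string for a statistics-only request. The field names "price" and "popularity" and the helper stats_params are hypothetical; stats and stats.field are the request parameters that switch the component on and select the fields to summarise.

```python
from urllib.parse import urlencode


def stats_params(query, fields):
    """Build a query string asking Solr for statistics over the given fields."""
    # rows=0 skips the hits themselves so that only the statistics come back.
    params = [("q", query), ("rows", 0), ("stats", "true")]
    params += [("stats.field", f) for f in fields]
    return urlencode(params)


# "*:*" matches all documents; the field names are assumptions.
qs = stats_params("*:*", ["price", "popularity"])
print(qs)
```

Appending this string to a core's /select URL would, under these assumptions, return min, max, sum, mean and related values for each listed field.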

Use Cases for Solr

Instagram uses Solr to power its geosearch API.
WhiteHouse.gov: The Obama administration’s website is built on Drupal and Solr.
Netflix: Solr powers basic movie searching on this extremely busy site.
Internet Archive: Search this vast repository of music, documents, and video using Solr.
StubHub.com: This ticket reseller uses Solr to help visitors search for concerts and sporting events.
The Smithsonian Institution: search the Smithsonian’s collection of over 4 million items.
Key benefits of Solr
  • Power a geosearch API.
  • Search vast repositories easily – uses Lucene for full text search.
  • Site search, auto-suggest, and faceted navigation.
  • Store and retrieve billions of data points.
Features of Solr
  • Powerful extensions – Solr ships with optional plugins for indexing rich content (e.g. PDFs, Word), language detection, search results clustering and more.
  • Faceted search and filtering – Slice and dice your data as you see fit using a large array of faceting algorithms.
  • Geospatial search – Enabling location based search is simple with Solr’s built-in support for spatial search.
  • Advanced configurable text analysis – Solr ships with support for most widely spoken languages and many other analysis tools designed to make indexing and querying your content as flexible as possible.
  • Highly configurable and user extensible caching – Fine-grained controls on Solr’s built-in caches make it easy to optimise performance.
  • Performance optimisations – Solr has been tuned to handle even the largest sites.
  • External configuration via XML – Solr’s configuration files make it easy to adjust and extend your setup without burying your configuration in code.
  • Advanced storage options – Building on Lucene’s advanced storage capabilities (codecs, directories and more), Solr makes it easy to tune your data storage needs to fit your application.
  • Monitorable logging – Easily access Solr’s log files from the admin interface.
  • Query suggestions, spelling and more – Solr ships with advanced capabilities for auto-complete (typeahead search), spell checking and more.
  • Your Data, Your Way – JSON, CSV, XML and more are supported out of the box. Don’t waste precious time converting all your data to a common representation; just send it to Solr.
  • Rich document parsing – Solr ships with Apache Tika built-in, making it easy to index rich content such as Adobe PDF, Microsoft Word and more.
  • Apache UIMA – Ready to enhance your content with advanced annotation engines? Solr integrates with Apache UIMA, making it easy to leverage NLP and other tools as part of your application.
  • Multiple search indexes – Solr supports multi-tenant architectures, making it easy to isolate users and content.
  • Standards based open interfaces – XML, JSON and HTTP.
  • Advanced Full-Text Search Capabilities.
  • Optimised for High Volume Traffic.
  • Comprehensive Administration Interfaces.
  • Easy Monitoring.
  • Highly Scalable and Fault Tolerant.
  • Flexible and Adaptable with easy configuration.
  • Near Real-Time Indexing.
  • Extensible Plugin Architecture.

 


Solr solutions options

Open source enterprise search platform

Our Solr consulting services

  • Software Lifecycle Management / Software Development Life Cycle (SDLC).
  • AWS Cloud Hosting.