An indexing search engine with Nutch and Solr
CMS, wikis, text files … modern companies store important data in many different places, and every detail of that data must be findable through a single search. Commercial software vendors such as Google [1] offer tools that index the data and store the index on an external server. But many organizations prefer to keep control of the search capabilities, partly for security and privacy reasons, but also to gain flexibility and to encourage innovation and customization.
A handy constellation of open source tools from the Apache Software Foundation will help you build your own search index for the assorted documents and data on your network: Nutch, Solr, and Lucene.
Nutch [2] is a powerful web crawler, and Apache Solr [3] is a search engine based on Apache Lucene [4]. You can combine Nutch with Solr to create a complete search engine – a miniature Google, if you like.
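In this division of labor, Nutch fetches and parses the pages, hands the extracted text to Solr for indexing, and users then query Solr over HTTP. The following minimal sketch shows only that last step, sending a query to Solr's standard `/select` request handler. It assumes Solr is running locally on its default port 8983 and that Nutch has indexed into a core named `nutch` with a `content` field; the core name, field name, and helper functions are hypothetical illustrations, not part of Nutch or Solr.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Solr instance on the default port 8983.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(core: str, query: str, rows: int = 10) -> str:
    """Build a query URL for Solr's /select handler, asking for JSON output."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_BASE}/{core}/select?{params}"


def search(core: str, query: str, rows: int = 10) -> list:
    """Run the query and return the matching documents (requires a running Solr)."""
    with urllib.request.urlopen(build_select_url(core, query, rows)) as resp:
        data = json.load(resp)
    # Solr's JSON response lists the hits under response -> docs.
    return data["response"]["docs"]


if __name__ == "__main__":
    # "nutch" and "content" are hypothetical core and field names.
    print(build_select_url("nutch", "content:invoice"))
```

Building the URL separately from sending it keeps the query logic easy to inspect without a live server. A browser pointed at the same URL returns the same JSON result.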
[...]