Smart research using Elasticsearch
Fine-Tuning
The min_term_freq parameter specifies a threshold for the selection of a word in the reference document with the more_like_this function. If min_term_freq is set to the default value 2, a word must occur there at least twice to make its way into the list of words with which other documents are compared later. The second parameter, max_query_terms, is the maximum number of words from the list in the original document that the algorithm selects to use later in the query.
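To show where these two parameters sit in a request, the following minimal sketch builds a more_like_this query body as a plain Python dictionary. The index name articles, the document ID 42, and the field name content are hypothetical placeholders, and the helper build_mlt_query is invented for this illustration. It is not part of any Elasticsearch client library. The exact form of the like entry has varied between Elasticsearch versions, so treat this as an outline to check against the documentation [5] rather than a definitive request.

```python
import json


def build_mlt_query(index, doc_id, fields, min_term_freq=2, max_query_terms=25):
    """Build a more_like_this request body for one reference document.

    The defaults mirror the documented Elasticsearch defaults
    (min_term_freq=2, max_query_terms=25).
    """
    return {
        "query": {
            "more_like_this": {
                # Fields whose text is compared (hypothetical field names).
                "fields": fields,
                # The reference document, addressed by index and ID.
                "like": [{"_index": index, "_id": doc_id}],
                # A word must occur at least this often in the reference
                # document to be considered at all.
                "min_term_freq": min_term_freq,
                # Upper limit on the number of selected words in the query.
                "max_query_terms": max_query_terms,
            }
        }
    }


if __name__ == "__main__":
    # Loosen the threshold for short documents and keep the query small.
    body = build_mlt_query("articles", "42", ["content"],
                           min_term_freq=1, max_query_terms=12)
    print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a search request against the index. Lowering min_term_freq to 1 can help with short reference documents in which hardly any word occurs twice, while a smaller max_query_terms tends to make the query faster at the cost of a coarser comparison.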
For anyone wanting to find out about other methods for fine-tuning the search engine, I would recommend the O'Reilly book on the topic [1]. It explains how to deal with Elasticsearch using examples, provides tips for scaling in clusters, and takes a look behind the scenes, where the Apache Lucene search engine is at work.
Infos
[1] Gormley, Clinton, and Zachary Tong. Elasticsearch: The Definitive Guide. O'Reilly, 2015.
[2] Elasticsearch: https://www.elastic.co
[3] "Perl: Elasticsearch" by Mike Schilli, Linux Magazine, issue 162, pg. 66, 2014: http://www.linux-magazine.com/Issues/2014/162/Perl-Elasticsearch/(language)/eng-US
[4] Tf-idf: https://en.wikipedia.org/wiki/Tf%E2%80%93idf
[5] More Like This Query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html
[6] Listings for this article: ftp://ftp.linux-magazine.com/pub/listings/magazine/182/