Smart research using Elasticsearch

Fine-Tuning

The min_term_freq parameter sets the threshold a word in the reference document must reach before the more_like_this function selects it. With the default value of 2, a word must occur at least twice in the reference document to make it onto the list of words against which other documents are later compared. The second parameter, max_query_terms (25 by default), caps the number of words from that list that the algorithm goes on to use in the query.
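To make the effect of the two parameters more tangible, here is a minimal Python sketch. The select_terms() helper is a hypothetical, simplified stand-in for the term selection: it ranks candidate words by raw frequency only, whereas Elasticsearch weighs them by tf-idf [4]. The mlt_query() helper builds a request body in the shape the current reference documentation [5] describes; older releases may expect a different syntax. The index name articles, the document ID 42, and the field name content are assumptions made purely for illustration.

```python
import json
from collections import Counter


def select_terms(text, min_term_freq=2, max_query_terms=25):
    """Simplified illustration of term selection (not Elasticsearch's code).

    Keeps words that occur at least min_term_freq times, then caps the
    list at max_query_terms. Ranking here is by raw frequency only;
    Elasticsearch ranks candidates by tf-idf.
    """
    counts = Counter(text.lower().split())
    frequent = [(word, n) for word, n in counts.items() if n >= min_term_freq]
    # Highest frequency first; alphabetical order breaks ties.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in frequent[:max_query_terms]]


def mlt_query(index, doc_id, fields, min_term_freq=2, max_query_terms=25):
    """Build a more_like_this request body for an already indexed document."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                "like": [{"_index": index, "_id": doc_id}],
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical sample text: "the" and "engine" occur twice, the rest once.
    sample = "the search engine finds the other engine quickly"
    print(select_terms(sample))
    # Hypothetical index, document ID, and field name.
    print(json.dumps(mlt_query("articles", "42", ["content"]), indent=2))
```

Raising min_term_freq shrinks the candidate list to words that dominate the reference document, while lowering max_query_terms trims the query to the top-ranked candidates, which tends to trade accuracy for speed.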

If you want to explore other ways of fine-tuning the search engine, I recommend the O'Reilly book on the topic [1]. It uses examples to explain how to work with Elasticsearch, provides tips for scaling in clusters, and takes a look behind the scenes, where the Apache Lucene search engine does the actual work.

Infos

  1. Gormley, Clinton, and Zachary Tong. Elasticsearch: The Definitive Guide. O'Reilly, 2015.
  2. Elasticsearch: https://www.elastic.co
  3. "Perl: Elasticsearch" by Mike Schilli, Linux Magazine, issue 162, pg. 66, 2014: http://www.linux-magazine.com/Issues/2014/162/Perl-Elasticsearch/(language)/eng-US
  4. Tf-idf: https://en.wikipedia.org/wiki/Tf%E2%80%93idf
  5. More Like This Query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html
  6. Listings for this article: ftp://ftp.linux-magazine.com/pub/listings/magazine/182/

The Author

Mike Schilli works as a software engineer in the San Francisco Bay Area. He can be contacted at mschilli@perlmeister.com. Mike's homepage can be found at http://perlmeister.com.
