Elasticsearch, Logstash, and Kibana – The ELK stack

Logstash

Logstash [2] processes and normalizes logfiles. The application retrieves its information from various data sources, which you need to define as input modules. Sources can be, for example, data streams from Syslog or logfiles. In the second step, filter plugins process the data based on user specifications; you can also have this phase simply forward the material without any processing. The output modules finally pass on the results; in our lab, everything goes to the Elasticsearch service. Figure 2 shows how the components interact.

Figure 2: The input plugin routes data from one or multiple sources to Logstash. After filtering, the output module forwards the data to the Elasticsearch datastore.
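
To get a feel for how the three stages interlock before looking at the individual files, the whole pipeline can be sketched as one minimal configuration. This sketch is not part of the test setup; it simply echoes events from standard input to standard output:

# Minimal pipeline sketch: every Logstash configuration consists of
# these three sections.
input {
    stdin { }                      # read events from standard input
}
filter {
    # optional processing; an empty filter section passes events through unchanged
}
output {
    stdout { codec => rubydebug }  # pretty-print each event for testing
}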

Like Elasticsearch, Logstash is free and available under the Apache license. The project page offers the source code, Debian and RPM packages, and notes about the online repository.

We installed version 2.1.0 from November 25, 2015. Logstash also needs a Java Runtime Environment. It does not include a systemd service unit; instead, the vendor provides a legacy init script – unfortunately, without a reload parameter – because Logstash is not currently capable of dynamically reloading its configuration. A bug report had already been submitted when this issue went to press.
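
Because there is no reload, every configuration change therefore means a full restart of the service (as root):

/etc/init.d/logstash restart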

Building Blocks

You configure Logstash in the /etc/logstash/conf.d directory, which is empty by default. The vendor does not ship a simple default configuration, so new users get no quick guidance on achieving initial results or understanding how the Logstash pipeline fits together. That said, the reference documentation [8] offers exhaustive explanations of all the plugins, and a web search reveals numerous examples by other users that you can use as templates.

Because Logstash parses all the setup files from /etc/logstash/conf.d in alphanumeric order and connects them to create an overall configuration, it is a good idea to think about the structure up front. The test computer uses the following schema: Each filename begins with a four-digit number; input files start with 0, filters with 5, and output modules with 9, which leaves plenty of scope for experimentation.
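
On the test machine, the resulting directory might look like the following listing; the Elasticsearch output file at the end is just a placeholder name for the output module, and the other files are discussed below:

$ ls /etc/logstash/conf.d
0001-syslog-input.conf
0002-socketsyslog-input.conf
0005-file-input.conf
0201-lumberjack-input.conf
5003-postfix-filter.conf
9001-elasticsearch-output.conf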

The simplest example is 0005-file-input.conf (Listing 2), which reads local logfiles. It uses the file input module and defines the Nginx access logfiles from the local machine as its sources (line 14); exclude rules out files with the .gz suffix, and type is an optional descriptor that the filter plugins can reference (see the "Extracted" section).

Listing 2

0005-file-input.conf

01 # "file" is a simple input module for files on the Logstash server:
02 # https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html
03 #
04 # In a file called "sincedb", Logstash records whether and up to what point
05 # a file has been read.
06 #
07 # The "*" wildcard after "log" ensures that "logrotate"-rotated files are
08 # included and ensures that no entries are lost during a log rotation. The
09 # "exclude" here excludes compressed files ("* .gz").
10 # https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html#_file_rotation
11
12 input {
13     file {
14         path => "/var/log/nginx/*access*.log*"
15         exclude => "*.gz"
16         type => "nginx_access"
17     }
18 }

If you want to collect and process logs from remote servers in addition to local logs, you can draw on Syslog itself for some help (Listings 3 and 4 and online [6]). The Logstash forwarder and Filebeat offer an alternative (see the "Woodcutter" box).
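
On the remote machines themselves, a single rsyslog forwarding rule is enough to ship everything to one of the ports opened in Listings 3 and 4. This is a sketch; the host name is taken from our test setup, and the two at signs select TCP (a single one would mean UDP):

*.* @@elk-test.example.com:5000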

Listing 3

0001-syslog-input.conf

01 # Native Syslog Input
02 # (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-syslog.html)
03 #
04 # Elastic describes an alternative in a tutorial:
05 # (https://www.elastic.co/guide/en/logstash/current/config-examples.html#_processing_syslog_messages)
06 # a TCP or UDP socket with subsequent grok parsing. If syslog is
07 # implemented cleanly according to RFC 3164, this module works fine
08 # and has the advantage that SSL can be used.
09
10 input {
11     syslog {
12         port => 5514
13 # The default port for Syslog would be 514. Ports <1024 require root
14 # privileges; however, Logstash should not run as root.
15         type => "syslog-native"
16     }
17 }

Listing 4

0002-socketsyslog-input.conf

01 # TCP or UDP Socket:
02 # https://www.elastic.co/guide/en/logstash/current/plugins-inputs-tcp.html
03 # https://www.elastic.co/guide/en/logstash/current/plugins-inputs-udp.html
04 #
05 # A grok filter is defined for the "syslogviasocket" type, extracted
06 # from the lines of typical Syslog fields. Elastic describes it in detail at:
07 # https://www.elastic.co/guide/en/logstash/current/config-examples.html#_processing_syslog_messages
08 #
09 # This approach is possibly more robust and flexible with poorly formatted
10 # Syslog messages, as James Turnbull, author of "The Logstash Book,"
11 # writes in his blog:
12 # http://kartar.net/2014/09/when-logstash-and-syslog-go-wrong
13
14 input {
15   tcp {
16     port => 5000
17     type => syslogviasocket
18   }
19   udp {
20     port => 5000
21     type => syslogviasocket
22   }
23 }

Woodcutter

The Logstash server can receive data from remote computers. Earlier versions relied on a Logstash forwarder [9] to do this. A small tool that ran on each server collected the logs locally and then sent them to the central Logstash server using the Lumberjack protocol. The 0201-lumberjack-input.conf file (Listing 5) shows an example that makes the Logstash server available for existing legacy installations using Logstash forwarders.

In Logstash 2, the developers introduced Filebeat [4], a universal service that relies on the Beats protocol [10] to send data streams to a specific port on the Logstash server. Filebeat will replace Lumberjack in the long term. The Beats protocol, which has only been around since Logstash 2, is also used by other data shippers. We installed Filebeat version 1.0.0 dated November 24, 2015, from the project website. The /etc/filebeat/filebeat.yml file contains meaningful defaults, which you can easily modify to suit your needs. The example file [6] shows the setup for the test machine.

One of Filebeat's advantages is that the tool can optionally use SSL to send the collected logs to the Logstash server. Additionally, administrators can equip the Filebeat clients with their own certificates and thus define the computers from which the server accepts logs. Another benefit is the registry file, which Filebeat uses to remember which files it has already read and sent. In other words, if the Logstash server is not available, Filebeat can resume at a later time from where the transmission was interrupted.

Filebeat can theoretically deliver directly to Elasticsearch, and this is the default setting in the configuration (Output section). Because this does not include processing with filters, but simply sends the data streams to the index as is, we commented out this option on our test machine. Instead, Filebeat sends its data to Logstash.
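
On the Logstash side, a matching beats input is needed to receive what Filebeat sends. The following is a minimal sketch; the filename 0003-beats-input.conf and port 5044 are our own choices rather than something prescribed by the setup described here:

# 0003-beats-input.conf - receives events shipped by Filebeat via the Beats protocol
# https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html
input {
    beats {
        port => 5044
        # ssl => true, plus ssl_certificate and ssl_key, can be added as in Listing 5
    }
}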

Listing 5

0201-lumberjack-input.conf

01 # Lumberjack receives data from (old) Logstash forwarders:
02 # https://www.elastic.co/guide/en/logstash/current/plugins-inputs-lumberjack.html
03 # https://github.com/elastic/logstash-forwarder
04 # As of Logstash 2, Filebeat supersedes the forwarder, which is therefore
05 # no longer actively developed. If an older Logstash forwarder is used, this file
06 # can serve as an input module. SSL is optional, but it was activated here.
07
08 input {
09     lumberjack {
10         port => 5043
11         type => "lumberjack"
12
13         ssl_certificate => "/etc/logstash/ssl/elk-test.example.com.cert.pem"
14         ssl_key => "/etc/logstash/ssl/elk-test.example.com.privkey.pem"
15     }
16 }

The filter modules process everything that reaches Logstash from the sources. They analyze the data streams and break them down into individual snippets of information and data fields. In addition to the modules for parsing and decomposing, other modules enrich the raw data with more detail, for example, dns (DNS name resolution) and geoip (IP geolocation based on the MaxMind database).
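
As a sketch of this kind of enrichment, a geoip filter could be attached to the Nginx events from Listing 2. The clientip field name is an assumption and presumes that an upstream grok filter has already extracted the client address into it:

filter {
    if [type] == "nginx_access" {
        geoip {
            source => "clientip"   # field assumed to be extracted by a previous grok filter
        }
    }
}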

Extracted

All Logstash plugins, including the filters, are Ruby Gems. You can use the /opt/logstash/bin/plugin script to manage these extensions on your system [11]: list existing modules (Figure 3), install new ones from the web or from your local disk, and update existing ones. In addition to the included filters, you have access to a number of community plugins not written or updated by Elastic.
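
A few typical calls show how this works in practice; the logstash-filter-translate plugin named here is just an example of a community module you might want to add:

/opt/logstash/bin/plugin list                               # show installed plugins
/opt/logstash/bin/plugin install logstash-filter-translate  # fetch a plugin from the web
/opt/logstash/bin/plugin update                             # update all installed plugins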

Figure 3: Which Logstash filters currently exist on your system and what version numbers do they have?

To define which filters are allowed to access what data, you can use if/else statements that reference standard fields, such as the previously mentioned type descriptor set by many input modules. Tags, which Filebeat can define, can also serve as differentiating criteria for filters. Basically, all of the fields discovered upstream are available, including fields that have just been extracted from a line in a logfile.

The 5003-postfix-filter.conf file [6] provides an example:

[...]
if [postfix_keyvalue_data] {
   kv {
     source       => "postfix_keyvalue_data"
     trim         => "<>,"
     prefix       => "postfix_"
     remove_field => [ "postfix_keyvalue_data" ]
   }
[...]

In this case, the kv filter (extraction of key-value pairs) is only used if the postfix_keyvalue_data field is defined.

The frequently used grok module can parse certain log formats, breaking the message body down into individual data fields; its patterns build on popular regular expressions and support references to previously defined patterns. Logstash itself ships a number of grok patterns in the logstash-patterns-core plugin.
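
As a sketch, the syslog example from the Elastic tutorial referenced in Listings 3 and 4 extracts the usual syslog fields from the message line; the field names to the right of each colon are free to choose, and the surrounding if condition ties the filter to the type set in Listing 4:

filter {
    if [type] == "syslogviasocket" {
        grok {
            match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
        }
    }
}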

Developing your own patterns is not a trivial task. The web offers a number of half-baked attempts, but also some good examples under free licenses that you can add to your own configuration: a very good pattern set for Postfix [12], examples for Dovecot [13], and Nginx patterns [14]. The GitHub repository [15] collects templates for services such as Bacula, Nagios, PostgreSQL, and more.

Creating meaningful grok patterns by trial and error, with a Logstash restart after every change, is not a good idea and takes far too long. Two online tools [16] [17] solve this problem: administrators can develop and test their grok patterns there before adding them to their Logstash configuration and restarting the service. You will also want to keep an eye on the configtest parameter of the Logstash init script, which checks your setup files for syntax errors: /etc/init.d/logstash configtest.
