Elasticsearch, Logstash, and Kibana – The ELK stack
ELK Hunt
A powerful search engine, a tool for processing and normalizing logs, and another for visualizing the results – Elasticsearch, Logstash, and Kibana form the ELK stack, which helps admins manage logfiles on high-volume systems.
Even a single, small LAMP server produces a number of logfiles. A large array of servers generates a volume of logs that is likely to exceed the capabilities of most built-in log management tools, at least if you want to analyze the data in them. The different file formats output by the typical zoo of applications add further complexity.
The ELK stack, a combination of Elasticsearch [1], Logstash [2], and Kibana [3], addresses these difficulties. Elasticsearch is an extremely powerful search server that receives its data from Logstash. Logstash extracts the data from server logs, normalizes it, and dumps the results in an Elasticsearch index. Finally, the Kibana analytics and data visualization tool offers extremely flexible views of the information.
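The following Python sketch makes the normalization step more concrete. It models what Logstash does conceptually by turning one raw line from an Apache access log (common log format) into a structured document for an Elasticsearch index. It is not how Logstash is configured; Logstash uses grok patterns in its own pipeline language. The regular expression, the field names (loosely modeled on the names Logstash uses for Apache logs), and the extra source field are all assumptions for illustration.

```python
import json
import re
from typing import Optional

# Hypothetical pattern for one line in the Apache "common" log format:
# host ident authuser [timestamp] "method path protocol" status bytes
APACHE_COMMON = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<httpversion>[^"]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def normalize(line: str, source: str) -> Optional[dict]:
    """Turn one raw log line into a flat document; None if it does not parse."""
    match = APACHE_COMMON.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    # Store numbers as numbers so that range queries and metrics work on them
    doc["response"] = int(doc["response"])
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    doc["source"] = source  # hypothetical field recording where the line came from
    return doc


if __name__ == "__main__":
    line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
            '"GET /apache_pb.gif HTTP/1.0" 200 2326')
    print(json.dumps(normalize(line, "web1:/var/log/apache2/access.log")))
```

Lines that do not match are returned as None rather than indexed half-parsed, which keeps the index consistent.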
The lab environment consisted of several Debian Jessie servers. One of them ran an ELK stack, as well as Filebeat [4], a service that collects the local logs and sends them to Logstash. Filebeat can also pass on logs that originate on other machines. We used it on another server that was already set up as a central log host and was upgraded for this purpose. This server also takes care of Syslog forwarding.
Three other servers act as Elasticsearch nodes to increase storage capacity and search performance across the board. In the lab, an ELK stack currently handles the logs from Postfix, Dovecot, Apache, Nginx, and Open-Xchange.
Elasticsearch
Elasticsearch [1] by Elastic is written in Java and built on Apache Lucene. It is an extremely powerful full-text search engine that provides its feature set via a REST API. Elasticsearch automatically indexes all the text (documents) it receives. Even without defining fields or data types, it can find search terms in a large volume of data. Elasticsearch supports complex requests with many dependencies and understands metrics (e.g., the frequency of occurrence of certain criteria).
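Here is a sketch of what a request to the REST API can look like, written in Python. It combines a full-text match with a terms aggregation, which counts how often each value occurs among the hits. Several details are assumptions for illustration: the field names message and host, the search term, and the local URL. The query and aggregation structure follows the Elasticsearch query DSL.

```python
import json
import urllib.request


def build_search(term: str, text_field: str = "message", facet: str = "host") -> dict:
    """Full-text match plus a terms aggregation that counts hits per facet value."""
    return {
        "size": 0,  # return only the counts, not the matching documents
        "query": {"match": {text_field: term}},
        "aggs": {"hits_per_" + facet: {"terms": {"field": facet}}},
    }


def run_search(body: dict, url: str = "http://localhost:9200/_search") -> dict:
    """POST the query to the REST API and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(json.dumps(build_search("authentication failed"), indent=2))
```

For exact per-host counts in Elasticsearch 2.x, the facet field probably needs to be mapped as not_analyzed. Otherwise the aggregation counts individual tokens of the host name.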
The main components are released under the Apache license and are available for free via the GitHub repository and the project's website. This is also where users will find the source code and packages for Debian- and RPM-based distributions. Elasticsearch has additional commercial modules, such as Shield (see the "Security!" section), Marvel (monitoring), or Watcher (alerting).
Elastic does not sell individual licenses for the plugins; instead, users need to take out a subscription that includes all the components and support. The website does not list prices for the individual subscription models [5]. If you are interested in a subscription, you need to ask the vendor for a quote.
The test team installed version 2.1.0, dated November 24, 2015, using the Debian package from the homepage. We also added the Elasticsearch repository to the server's package sources to keep everything up to date. The package integrates easily with the system, but it does not complain if a Java Runtime Environment is missing. You definitely need to install one afterwards; openjdk-8-jre worked perfectly in our lab. The installation routine sets up a systemd service unit to start and stop the daemon.
Well Distributed
Linking multiple machines running Elasticsearch into a cluster is easy. The nodes synchronize their indexes within the cluster and autonomously distribute incoming search requests from clients. A second Elasticsearch node holds a replica of the data, so storage capacity only starts to grow with the third node. Elasticsearch automatically splits its indexes into shards. The service can therefore store large collections of data distributed across multiple servers, and the replicas keep the data available if a node fails.
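A simplified Python model shows why capacity only grows from the third node on. It rests on several assumptions:

- the Elasticsearch 2.x defaults of five primary shards and one replica per index;
- shards of equal size;
- a naive round-robin placement that never puts two copies of the same shard on one node.

The real allocator balances shards differently, so the per-node numbers are illustrative only.

```python
def allocate(nodes: list, shards: int = 5, replicas: int = 1) -> dict:
    """Naive round-robin placement of primary (P) and replica (R) shard copies."""
    placement = {node: [] for node in nodes}
    # A node never holds two copies of the same shard; copies that do not fit
    # stay unassigned, as replicas do on a single-node cluster.
    copies = min(replicas + 1, len(nodes))
    for shard in range(shards):
        for copy in range(copies):
            node = nodes[(shard + copy) % len(nodes)]
            placement[node].append(("P" if copy == 0 else "R") + str(shard))
    return placement


def share_per_node(placement: dict, shards: int = 5) -> dict:
    """Fraction of the whole index each node stores, assuming equal shard sizes."""
    return {node: len(held) / shards for node, held in placement.items()}


if __name__ == "__main__":
    for count in range(1, 5):
        nodes = ["elk-test%d" % i for i in range(1, count + 1)]
        print(count, share_per_node(allocate(nodes)))
```

In this model, every node stores the full index with one or two nodes. With three nodes the largest share drops to 0.8, and with four nodes to 0.6, which is the point where additional disks begin to add usable space.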
Moreover, access is distributed, which improves performance and ensures that large collections of data are searched quickly. Admins do not need to decide before installation and setup whether they want the ELK stack to scale. You can extend your setup at any time and add more Elasticsearch nodes to the cluster. The software distributes the data out of the box, which removes the need for an additional clustering or load-balancing component.
Elasticsearch is configured in the /etc/elasticsearch/elasticsearch.yml file, which is broken down into various sections. The listings for this article [6] include an example of the first section, as well as the setup file for the other nodes. The cluster name is set in the Cluster section (e.g., cluster.name: elk-test). The Node section contains the node designations, which in this example are elk-test1, elk-test2, … elk-test4 (e.g., node.name: elk-test1).
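The four configuration files differ only in the node name, so they can be generated. This Python sketch renders the Cluster and Node entries for elk-test1 through elk-test4. It also renders the peer list for the Discovery section, using discovery.zen.ping.unicast.hosts, the Elasticsearch 2.x setting for unicast discovery. The example assumes that the node names resolve on the cluster network.

```python
def render_config(cluster: str, node: str, peers: list) -> str:
    """Render the Cluster, Node, and Discovery entries of elasticsearch.yml."""
    hosts = ", ".join('"%s"' % peer for peer in peers)
    return (
        "cluster.name: %s\n"
        "node.name: %s\n"
        "discovery.zen.ping.unicast.hosts: [%s]\n"
    ) % (cluster, node, hosts)


if __name__ == "__main__":
    nodes = ["elk-test%d" % i for i in range(1, 5)]
    for name in nodes:
        # Print one fragment per node; each one belongs in that node's elasticsearch.yml
        print("# --- %s ---" % name)
        print(render_config("elk-test", name, nodes))
```

Only node.name changes from node to node. The cluster name must be identical everywhere, or the nodes will not join the same cluster.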
The test team also made changes in the Network section. By default, the Elasticsearch service binds to port 9200 on localhost (IPv4 and IPv6). Because we have multiple nodes, we told Elasticsearch to listen on all network interfaces. As of this writing, it is not possible to define a list of interfaces to restrict access, but the vendor has received a feature request for this.
If you have multiple IP addresses, you can use the publish_host variable in the Network section to define which IP the computer uses to communicate with the other Elasticsearch nodes. In contrast, bind_host defines the addresses on which the service listens. These settings are particularly important if you need to scale massively. In that case, you will probably want the Elasticsearch nodes to exchange data on one network but use a different outward-facing IP for client access.
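A small Python sketch can render the two Network settings and reject an obviously wrong publish address. It assumes a node that listens on all interfaces (0.0.0.0) and advertises a hypothetical internal address, 10.0.0.11. For simplicity it accepts only IP literals, although Elasticsearch itself also accepts host names and special values.

```python
import ipaddress


def render_network(bind_host: str, publish_host: str) -> str:
    """Render the Network entries; other nodes must be able to reach publish_host."""
    publish = ipaddress.ip_address(publish_host)
    # 0.0.0.0 or :: means "all interfaces": fine for listening, useless as an
    # address to advertise; a loopback address is unreachable for other nodes.
    if publish.is_unspecified or publish.is_loopback:
        raise ValueError("publish_host must be an address the other nodes can reach")
    ipaddress.ip_address(bind_host)  # raises ValueError if not an IP address
    return "network.bind_host: %s\nnetwork.publish_host: %s\n" % (bind_host, publish)


if __name__ == "__main__":
    print(render_network("0.0.0.0", "10.0.0.11"), end="")
```

Separating the two settings lets a node accept client traffic on every interface while telling its peers to use the internal cluster network.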
The Discovery section of the configuration file, where you list all the nodes, is also relevant if you have more than one Elasticsearch node. Once a node is set up, you can use the curl command-line tool or a web browser to check whether the search service is running (Figure 1).
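For scripted checks, Python can issue the same kind of HTTP request that curl would make. This sketch queries the _cluster/health endpoint, whose reply includes the fields status and number_of_nodes. It then decides whether all four lab nodes have joined. The URL and the expected node count are assumptions for this setup.

```python
import json
import urllib.request


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    """GET /_cluster/health and decode the JSON reply."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def cluster_ready(health: dict, expected_nodes: int = 4) -> bool:
    """True if all expected nodes joined and every primary shard is assigned."""
    # "red" means at least one primary shard is unassigned; "yellow" means only
    # replicas are missing, so the data is still fully searchable.
    return (health.get("number_of_nodes") == expected_nodes
            and health.get("status") in ("green", "yellow"))


if __name__ == "__main__":
    try:
        print(cluster_ready(fetch_health()))
    except OSError as error:  # urllib's URLError is a subclass of OSError
        print("Elasticsearch not reachable:", error)
```

Accepting yellow is a design choice. A stricter check for a four-node cluster with replicas would require green.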
Security!
One thing you notice on first contact is that Elasticsearch does not use any authentication mechanisms and that the data passes through the network in the clear. It also lacks rights management to determine which client is allowed to access what part of the index.
The Shield [7] plugin provides all of these security features and is particularly interesting if you run Elasticsearch in a cluster with multiple server instances. As described on the website, you use the /usr/share/elasticsearch/bin/plugin script to install the license and Shield on each of your nodes. Then restart all of your Elasticsearch services. You can test Shield and the other commercial plugins free of charge for 30 days.
Shield extends the search service to include user management and a rights system. It also encrypts the data streams between the Elasticsearch nodes with SSL and prevents unauthorized nodes from joining the cluster. You need to manage the SSL certificates yourself, but the Shield documentation on the website offers some help.
As an alternative, you can use iptables to decide who is allowed to access your Elasticsearch server or servers. For example, you could specify that only certain machines on your internal network may access the nodes (Listing 1). However, this does not solve the problem of unencrypted data transfer, which is not exactly ideal for logfiles that may contain confidential information. Elasticsearch exposes its API through a built-in web server, so you could put a reverse proxy in front of it to provide both SSL encryption and authentication based on htpasswd.
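The iptables approach can be sketched in Python as a small rule generator. It produces iptables commands that accept connections only from listed sources and drop everything else. The rules cover Elasticsearch's default HTTP port (9200) and node-to-node transport port (9300). The source addresses in the example are hypothetical.

```python
import ipaddress

ES_PORTS = (9200, 9300)  # default HTTP and transport ports


def firewall_rules(allowed_sources: list, ports: tuple = ES_PORTS) -> list:
    """Build iptables commands: ACCEPT the listed IPv4 sources, then DROP the rest."""
    rules = []
    for port in ports:
        for source in allowed_sources:
            network = ipaddress.ip_network(source, strict=False)
            if network.version != 4:
                raise ValueError("iptables handles IPv4 only; use ip6tables for " + source)
            rules.append("iptables -A INPUT -p tcp -s %s --dport %d -j ACCEPT" % (network, port))
        # Appended after the ACCEPT rules, so it only catches sources not listed above
        rules.append("iptables -A INPUT -p tcp --dport %d -j DROP" % port)
    return rules


if __name__ == "__main__":
    # Hypothetical addresses: one admin workstation and the cluster subnet
    for rule in firewall_rules(["192.168.1.10", "10.0.0.0/24"]):
        print(rule)
```

Because Elasticsearch also listens on IPv6, matching ip6tables rules would be needed. The DROP rule also blocks a local Kibana on 127.0.0.1 unless loopback traffic is accepted earlier in the chain.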
Listing 1
Iptables Rules for Elasticsearch