Sensu — A powerful and scalable monitoring solution

Client Installation

The client systems only communicate with RabbitMQ, which makes it very easy to install the Sensu client on the computers you want to monitor. We use the same Sensu package and the same config.json as on the server but only enable the sensu-client service.

When the client first starts, it automatically registers with the server. From this moment on, the Sensu server expects at least regular signs of life in the form of keepalive messages. In the default configuration, Sensu raises an alarm if a client has not phoned home within the past three minutes.
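The keepalive rule can be pictured in a few lines of Ruby. This is only a sketch of the idea, not Sensu's own code: the method name keepalive_stale? and the 180-second constant (the three minutes mentioned above) are assumptions made for illustration.

```ruby
# Hypothetical illustration of the keepalive rule; not Sensu's implementation.
# The 180-second threshold mirrors the three-minute default described above.
KEEPALIVE_THRESHOLD = 180 # seconds

# Returns true if the last keepalive is older than the threshold.
def keepalive_stale?(last_keepalive, now = Time.now.to_i, threshold = KEEPALIVE_THRESHOLD)
  (now - last_keepalive) > threshold
end

now = Time.now.to_i
puts keepalive_stale?(now - 60, now)   # client reported one minute ago
puts keepalive_stale?(now - 240, now)  # client silent for four minutes
```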

Like any monitoring system, Sensu performs checks to verify the status of certain system components. Unlike Nagios, Sensu does not support host-based checks. Checks are always performed by a Sensu client. The client fields the check output and then dumps it on the central message bus for processing by the server. You can develop Sensu checks in any programming language that can output text to stdout.

If you are starting out with Sensu, you will probably want to begin with status checks that reflect the current state of the system. Sensu distinguishes among the following:

Passive checks requested by the Sensu server
Active checks that the Sensu client performs without a request
External events, which separate applications transmit to the Sensu client

Sensu expects the results of status checks in Nagios format. Sensu's support for Nagios format makes it extremely simple for users who are familiar with Nagios to start writing their own checks. Also, Nagios support means a huge number of ready-to-use checks are available from the outset. In fact, I was able to relieve my overtaxed Nagios system by passing many critical checks to Sensu, thus finally enjoying up-to-date monitoring results once again; #monitoringlove flooded the team.
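In Nagios format, a check prints a status line to stdout and signals its result with the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The following Ruby sketch shows the shape of such a check; the name check_load, the thresholds, and the fixed sample value are assumptions for illustration, and a real check would measure the value itself.

```ruby
# Minimal sketch of a status check in Nagios plugin format.
# check_load, the thresholds, and the sample value are illustrative only.

# Exit codes defined by the Nagios plugin convention.
EXIT_CODES = { ok: 0, warning: 1, critical: 2, unknown: 3 }.freeze

# Classify a measured value against warning and critical thresholds.
def classify(value, warn, crit)
  return :critical if value >= crit
  return :warning if value >= warn
  :ok
end

# Returns the status line for stdout and the exit code for the check.
def run_check(name, value, warn, crit)
  state = classify(value, warn, crit)
  ["#{name} #{state.to_s.upcase}: value is #{value}", EXIT_CODES[state]]
end

line, code = run_check('check_load', 0.5, 2.0, 5.0)
puts line
# A real check script would finish with: exit code
```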

The Sensu server initiates most of the checks. Sensu always addresses the prompt for a check to a group of subscribers. Listing 2 shows a simple configuration that connects the client to the all and test groups. Thanks to this publish/subscribe process, a single request to the server is all it takes to perform a routine task on a massive scale, such as querying the free disk space on several hundred clients.

Listing 2

/etc/sensu/conf.d/client.json

```json
{
  "client": {
    "name": "<client1.example.com>",
    "address": "10.0.10.1",
    "subscriptions": [ "all", "test" ],
    "disk_warn": "10%",
    "disk_crit": "5%"
  }
}
```
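The effect of subscriptions can be illustrated with a small Ruby sketch. In Sensu, the matching happens through the message bus; the in-memory hash below, the second client, and the method name recipients are hypothetical and serve only to show which clients would react to a request addressed to a given group.

```ruby
# Hypothetical in-memory model of client subscriptions.
# In Sensu, the message bus delivers requests; this only shows the matching.
clients = {
  'client1.example.com' => %w[all test],
  'client2.example.com' => %w[all production] # invented second client
}

# Returns the names of clients subscribed to at least one of the given groups.
def recipients(clients, subscribers)
  clients.select { |_name, subscriptions| (subscriptions & subscribers).any? }.keys
end

p recipients(clients, ['test'])
p recipients(clients, ['all'])
```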

Listing 3 shows the configuration for a typical check. Each client that has subscribed to the all group will, when prompted by the server, perform the check defined in command (at 60-second intervals) and return its output to the server via RabbitMQ.

Listing 3

Disk Check

~~~json
{
  "checks": {
    "disk_free": {
      "type": "status",
      "subscribers": [ "all" ],
      "handlers": [ "default" ],
      "command": "/usr/lib/nagios/plugins/check_disk -w :::disk_warn::: -c :::disk_crit::: -A -x /dev/shm -X nfs -i /boot",
      "interval": 60
    }
  }
}
~~~

The sample check works with variables that can take specific values for each client. A name with three colons on the left and right serves as a placeholder for a variable. The Sensu client takes its local value from its client configuration (client.json in Listing 2).
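The substitution step can be sketched in Ruby as follows. This is an illustration of the idea only, not Sensu's actual implementation; the method name substitute_tokens and the error handling are assumptions.

```ruby
# Illustrative sketch of placeholder substitution; not Sensu's own code.
# Replaces every :::key::: placeholder with the matching client attribute.
def substitute_tokens(command, client)
  command.gsub(/:::(\w+):::/) do
    key = Regexp.last_match(1)
    client.fetch(key) { raise KeyError, "no client attribute for token #{key}" }
  end
end

# Client attributes as defined in Listing 2.
client = { 'disk_warn' => '10%', 'disk_crit' => '5%' }
command = '/usr/lib/nagios/plugins/check_disk -w :::disk_warn::: -c :::disk_crit:::'
puts substitute_tokens(command, client)
```

Raising an error for an unknown placeholder is a design choice of this sketch; it keeps a check from running silently with an empty threshold.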

In addition to interval, Sensu also supports other options for managing checks. For example, you might want to configure the system to send a notice to the server only after several failed checks (occurrences) in a row. Sensu also has a feature for handling rapid state changes (flapping).
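The idea behind occurrences can be sketched as a counter of consecutive failures. The class below is hypothetical and only illustrates the principle; Sensu's own handling may differ in detail.

```ruby
# Hypothetical sketch of the "occurrences" idea: report a problem only
# after a run of consecutive failed checks. Not Sensu's implementation.
class OccurrenceCounter
  def initialize(required)
    @required = required
    @count = 0
  end

  # Records a check status (0 = OK) and returns true when a notice is due.
  def record(status)
    @count = status.zero? ? 0 : @count + 1
    @count >= @required
  end
end

counter = OccurrenceCounter.new(3)
[2, 2, 0, 2, 2, 2].each do |status|
  puts "status=#{status} notify=#{counter.record(status)}"
end
```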

The standalone check is used if the client actively needs to initiate a check (i.e., independent of the server). Listing 4 shows an example of a locally controlled MySQL check that the client executes every 30 seconds. Active checks are simpler than passive checks because they do not require configuration and management on the server. A JSON file created manually on the client is all it takes to enable an active check.

Listing 4

Active Check

```json
{
  "checks": {
    "mysql_server": {
      "standalone": true,
      "interval": 30,
      "handlers": [
        "default"
      ],
      "command": "/usr/lib/nagios/plugins/check_mysql -u 'monitoring' -p 'db1ch3ck'"
    }
  }
}
```

Active checks are useful for monitoring short-lived servers that do not justify the initial centralized configuration overhead. You can use your configuration management tool to set up active checks (see the "Cooking with Chef" box). Active checks are also useful if you need them to run at specific times. The publish/subscribe process used with passive checks cannot guarantee a specific time.

Cooking with Chef

The Sensu cookbook for setting up active checks defines a simple Chef resource named sensu_check. Listing 5 contains a recipe fragment that sets up the check through Chef.

Listing 5

Chef Resource for Active Checks

~~~ruby
sensu_check 'mysql_server' do
  # The password comes from a node attribute instead of being hard-coded
  command "/usr/lib/nagios/plugins/check_mysql " +
          "-u 'monitoring' " +
          "-p '#{node['mysql']['server_mon_password']}'"
  handlers ['default']
  standalone true
  interval 30
end
~~~

You do not need to develop special checks if you want Sensu to monitor events from an external application in addition to processing status information from the system. The application can transmit its data to the local Sensu client directly via port 3030. Listing 6 shows how easy this is with a sample shell script. In practice, the Sensu shell helper [5] has proven its worth, because Sensu expects external events in JSON format, which can be difficult to create with shell commands.

Besides status information, the Sensu client can also collect run-time metrics. Listing 7 shows the definition of a check that runs a Ruby script to measure the system load. As with status checks, the output format for run-time metrics is kept deliberately simple. As you can see from Listing 8, Sensu expects one measuring point per line, consisting of a hierarchical metric ID, the measured value, and a time stamp.

Listing 6

Transferring External Events

~~~bash
# /dev/tcp is a Bash feature; the JSON must stay on a single line
echo '{ "name": "my_check", "output": "{ ... }", "status": 0 }' > /dev/tcp/localhost/3030
~~~
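Where building JSON by hand in the shell becomes awkward, a scripting language with a JSON library can take over. The following Ruby sketch assembles the same kind of event as Listing 6 and writes it to the client socket. The method names build_event and send_event and the sample output text are assumptions for illustration; the port is the one named in the article.

```ruby
require 'json'
require 'socket'

# Build the event payload; the field names follow Listing 6.
def build_event(name, output, status)
  JSON.generate({ 'name' => name, 'output' => output, 'status' => status })
end

# Write the payload to the local Sensu client socket (port 3030).
def send_event(payload, host = 'localhost', port = 3030)
  TCPSocket.open(host, port) { |socket| socket.write(payload) }
end

payload = build_event('my_check', 'backup finished', 0)
puts payload
# send_event(payload)  # uncomment on a host that runs the Sensu client
```

Letting the library serialize the hash avoids quoting mistakes when the output text itself contains quotation marks or line breaks.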

Listing 7

Check for Run-Time Metrics

~~~json
{
  "checks": {
    "load_metrics": {
      "type": "metric",
      "command": "load-metrics.rb",
      "subscribers": [
        "production"
      ],
      "interval": 10
    }
  }
}
~~~

Listing 8

Metric Check

~~~
$ ruby load-metrics.rb
srv3.local.load_avg.one 0.89  1365270842
srv3.local.load_avg.five  1.01  1365270842
srv3.local.load_avg.fifteen 1.06  1365270842
$ echo $?
0
~~~
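A script that emits this format needs little more than string formatting. The Ruby sketch below reproduces the three lines from Listing 8 from fixed sample values. It is not the actual load-metrics.rb; the method names are assumptions, and a real script would measure the load itself.

```ruby
# Sketch of the metric output format: "<metric ID> <value> <time stamp>".
# Not the real load-metrics.rb; the values are passed in instead of measured.
def metric_line(path, value, timestamp = Time.now.to_i)
  "#{path} #{value} #{timestamp}"
end

# Builds one line per load average under the given hierarchical prefix.
def load_metrics(scheme, one, five, fifteen, timestamp = Time.now.to_i)
  { 'one' => one, 'five' => five, 'fifteen' => fifteen }.map do |period, value|
    metric_line("#{scheme}.load_avg.#{period}", value, timestamp)
  end
end

puts load_metrics('srv3.local', 0.89, 1.01, 1.06, 1365270842)
```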

The event handlers on the server evaluate the event once the Sensu client has run the check and returned the results on the message bus. As soon as a new event arrives on the bus, Sensu passes it on (as usual in JSON format) to the relevant event handler.

Sensu distinguishes the following types of event handlers:

Pipe: This type executes a system command and passes the event data to it via stdin.
TCP, UDP: These two types write the event data to a TCP or UDP socket.
Transport: This type internally publishes event data on a transport channel in Sensu, typically RabbitMQ.
Group: An event handler group sends the event data to a group of event handlers. Adding a single event handler to a group thereby effectively defines an alias name.

Sensu can associate a wide range of actions with an event. Possible actions include:

Notification via email or text message
Messages on chat channels
Alerting via PagerDuty
Forwarding of run-time metrics to Graphite
Generating log entries for evaluation in Logstash

Listing 9 shows how easy it is to process a monitoring event in an event handler. This simple Ruby script is stored in /etc/sensu/handlers/file.rb and receives events in JSON format, which it writes to files that are formatted to be readable by humans. The new event handler is configured in /etc/sensu/conf.d/handlers/default.json as a Pipe plugin (Listing 10). It might be easy to build your own event handler, but you can save yourself the trouble in most cases. The Sensu community has collected an extensive repository of ready-to-use plugins on GitHub [6]. The repository contains more than 600 checks, event handlers, and other Sensu extensions.

Listing 9

Event Handler

~~~ruby
#!/usr/bin/env ruby

require 'rubygems'
require 'json'

# Read event data
event = JSON.parse(STDIN.read, :symbolize_names => true)
# Write the event data to a file
file_name = "/tmp/sensu_#{event[:client][:name]}_" +
            "#{event[:check][:name]}"
File.open(file_name, 'w') do |file|
  file.write(JSON.pretty_generate(event))
end
~~~

Listing 10

Integrating the Event Handler

~~~json
{
  "handlers": {
    "file": {
      "type": "pipe",
      "command": "/etc/sensu/handlers/file.rb"
    }
  }
}
~~~

Automatic Remedies

Would it not be cool if your monitoring system could fix errors as well as detect and report them? Writing an event handler that initiates appropriate measures is not too difficult. However, because the event handler runs on the Sensu server and the error occurs on a client, you need a mechanism to bridge this gap.

At freistil IT, we experimented with the remote execution tool Serf for freistilbox.com. However, smart Sensu users realized that it was not necessary to use two different applications that both ultimately use their own messaging systems to transport actions and events. This realization led to the Sensu Remediator plugin.

Using this plugin, I could assign a three-stage repair strategy to a check. A suitable command is executed on the client at each stage; to do so, the plugin smartly "misappropriates" Sensu checks. In the example (Listing 11), the plugin first triggers a reload when the check enters a WARNING status. If the status remains unchanged after two occurrences, the plugin tries a restart instead. If a CRITICAL status occurs, the system responds by rebooting.

Listing 11

Self-Healing Infrastructure

```json
{
  "checks": {
    "check_foo": {
      "command": "check-procs.rb ...",
      "interval": 30,
      "subscribers": ["application_server"],
      "handlers": ["debug", "slack", "remediator"],
      "remediation": {
        "light_remediation": {
          "occurrences": [1, 2],
          "severities": [1]
        },
        "medium_remediation": {
          "occurrences": ["3-5"],
          "severities": [1]
        },
        "heavy_remediation": {
          "occurrences": ["1+"],
          "severities": [2]
        }
      }
    },
    "light_remediation": {
      "command": "service foo reload",
      "subscribers": [],
      "handlers": ["debug"],
      "publish": false
    },
    "medium_remediation": {
      "command": "service foo restart",
      "subscribers": [],
      "handlers": ["debug", "slack"],
      "publish": false
    },
    "heavy_remediation": {
      "command": "sudo reboot",
      "subscribers": [],
      "handlers": ["debug", "slack"],
      "publish": false
    }
  }
}
```
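How the occurrences and severities entries in Listing 11 select a repair stage can be sketched in Ruby. This is one reading of the configuration, not the Remediator plugin's actual code; the method names occurrence_match? and remediations_for are assumptions.

```ruby
# Illustrative sketch of matching occurrence specs such as 1, "3-5", or "1+".
# The Remediator plugin's real code may differ.
def occurrence_match?(spec, occurrences)
  case spec
  when Integer then occurrences == spec
  when /\A(\d+)-(\d+)\z/ then occurrences.between?($1.to_i, $2.to_i)
  when /\A(\d+)\+\z/ then occurrences >= $1.to_i
  else false
  end
end

# Returns the names of the remediation checks that apply to an event.
def remediations_for(remediation, occurrences, severity)
  remediation.select do |_name, rule|
    rule['severities'].include?(severity) &&
      rule['occurrences'].any? { |spec| occurrence_match?(spec, occurrences) }
  end.keys
end

# The remediation section from Listing 11.
remediation = {
  'light_remediation'  => { 'occurrences' => [1, 2],  'severities' => [1] },
  'medium_remediation' => { 'occurrences' => ['3-5'], 'severities' => [1] },
  'heavy_remediation'  => { 'occurrences' => ['1+'],  'severities' => [2] }
}

p remediations_for(remediation, 1, 1) # first WARNING
p remediations_for(remediation, 4, 1) # fourth WARNING in a row
p remediations_for(remediation, 1, 2) # first CRITICAL
```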

The three repair "checks" are deliberately defined without subscribers; the plugin always prompts the affected client to run them. For this approach to work, the client must have a subscription that matches its own host name (Listing 12).

Listing 12

Self-Subscription

```json
{
  "client": {
    "name":"client1.example.com",
    "address":"10.0.10.1",
    "subscriptions":[
      "all",
      "test",
      "client1.example.com"
    ]
  }
}
```

Sensu is very unobtrusive in day-to-day operations, at least as long as no errors occur. As a sysadmin, you hardly ever interact with Sensu directly, especially if you use external services such as PagerDuty for alerting. If you do need to interact with the monitoring system, doing so via the web dashboard is simple and efficient. You can acknowledge alerts or even shut them off for a while using the silence function.

Anyone who prefers to use Sensu without a mouse should try sensu-cli [7]. This command-line application can acknowledge alerts:

sensu-cli resolve server3 apache_http

or temporarily silence them:

sensu-cli silence server3 --reason "Shut up already" --expire 3600

Because a new Sensu client registers with the server automatically, this registration must be deleted if the client no longer exists:

sensu-cli client delete server3

This step avoids unnecessary alerts and is easy to do.

ChatOps

Many companies, especially if their employees are geographically dispersed, use chat for team communication. The chat system becomes the central source of information if you enrich team messages with system messages. In this way, everyone finds out about new Git commits or changes in the wiki without delay, and team members can exchange information on the spot. Sensu comes with event handlers for several common chat systems (IRC, Slack, Campfire, etc.).

The leap from a central source of information to ChatOps is achieved by implementing a back channel in the form of a chatbot. This bot is tasked with receiving instructions from the chat and interacting with various ops services.

GitHub's Hubot [8] is the classic chatbot; on freistilbox, the team had great fun with Lita [9]. Besides simply acknowledging an alert with a simple pagerbot ack 1234 or quickly taking over standby duties for a colleague with pagerbot put me on firstlevel for 1 hour, members were also able to communicate these actions instantly to the rest of the team.
