Parallel shell with pdsh

Specifying Hosts

I won't list every way to specify a list of nodes, because the pdsh website [9] discusses virtually all of them; however, some of the methods are pretty handy. The simplest way to specify the nodes on the command line is to use the -w option:

$ pdsh -w 192.168.1.4,192.168.1.250 uname -r
192.168.1.4: 2.6.32-431.17.1.el6.x86_64
192.168.1.250: 2.6.32-431.11.2.el6.x86_64

In this case, I specified the nodes (here, by IP address) separated by commas. You can also use a range of hosts as follows:

pdsh -w host[1-11]
pdsh -w host[1-4,8-11]

In the first case, pdsh expands the host range to host1, host2, host3, …, host11. In the second case, it expands the hosts similarly (host1, host2, host3, host4, host8, host9, host10, host11). You can go to the pdsh website for more information on hostlist expressions [10].
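To make the expansion concrete, here is a minimal sketch in plain POSIX shell that mimics what happens to an expression such as host[1-4,8-11]. The function name expand_hostlist is hypothetical and is not part of pdsh. It handles only a single trailing bracket group of plain numbers and ranges (no zero padding), and it uses global shell variables for brevity. The real hostlist syntax described at [10] supports more than this.

```shell
# expand_hostlist 'PREFIX[RANGES]' prints one hostname per line.
# Hypothetical helper for illustration only; it is not part of pdsh.
expand_hostlist() {
    case $1 in
        *\[*\]) ;;                          # bracket group present: expand below
        *) printf '%s\n' "$1"; return ;;    # plain hostname: print unchanged
    esac
    prefix=${1%%\[*}                        # text before "["
    body=${1#*\[}                           # text after "["
    body=${body%\]}                         # drop the trailing "]"
    # Split the bracket contents on commas into positional parameters.
    old_ifs=$IFS; IFS=,
    set -- $body
    IFS=$old_ifs
    for part in "$@"; do
        case $part in
            *-*)                            # a range such as 8-11
                i=${part%-*}; last=${part#*-}
                while [ "$i" -le "$last" ]; do
                    printf '%s%s\n' "$prefix" "$i"
                    i=$((i + 1))
                done ;;
            *) printf '%s%s\n' "$prefix" "$part" ;;   # a single number
        esac
    done
}

expand_hostlist 'host[1-4,8-11]'
```

Run as is, the last line prints host1 through host4 followed by host8 through host11, one name per line, which matches the second expansion described above.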

Another option is to have pdsh read the hosts from a file other than the one to which WCOLL points. The command shown in Listing 2 tells pdsh to take the hostnames from the file /tmp/hosts, which is listed after -w ^ (with no space between the "^" and the filename).

Listing 2

Read Hosts from File

[laytonjb@home4 ~]$ pdsh -w ^/tmp/hosts uptime
192.168.1.4:  15:51:39 up  8:35, 12 users,  load average: 0.64, 0.38, 0.20
192.168.1.250:  15:47:53 up 2 min,  0 users,  load average: 0.10, 0.10, 0.04
[laytonjb@home4 ~]$ more /tmp/hosts
192.168.1.4
192.168.1.250

You can also use several host files,

$ more /tmp/hosts
192.168.1.4
$ more /tmp/hosts2
192.168.1.250
$ pdsh -w ^/tmp/hosts,^/tmp/hosts2 uname -r
192.168.1.4: 2.6.32-431.17.1.el6.x86_64
192.168.1.250: 2.6.32-431.11.2.el6.x86_64

or you can exclude hosts from a list:

$ pdsh -w -192.168.1.250 uname -r
192.168.1.4: 2.6.32-431.17.1.el6.x86_64

The option -w -192.168.1.250 excluded node 192.168.1.250 from the list and only output the information for 192.168.1.4. You can also exclude nodes using a node file:

$ pdsh -w -^/tmp/hosts2 uname -r
192.168.1.4: 2.6.32-431.17.1.el6.x86_64

In this case, /tmp/hosts2 contains 192.168.1.250, which isn't included in the output. The -x option also excludes hosts from the run, whether you give it a hostname or a file that lists the hostnames to exclude:

$ pdsh -x 192.168.1.4 uname -r
192.168.1.250: 2.6.32-431.11.2.el6.x86_64
$ pdsh -x ^/tmp/hosts uname -r
192.168.1.250: 2.6.32-431.11.2.el6.x86_64
$ more /tmp/hosts
192.168.1.4
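Before running a command with exclusions, it can be reassuring to preview which hosts remain. The following sketch does that locally with grep, without involving pdsh at all. The function name remaining_hosts and the temporary files are assumptions made for this illustration. It compares whole lines literally, so it only mirrors the simple case of files with one hostname per line.

```shell
# remaining_hosts INCLUDE_FILE EXCLUDE_FILE
# Print the hosts in INCLUDE_FILE that are not listed in EXCLUDE_FILE.
# Hypothetical helper; this is a local preview, not something pdsh provides.
remaining_hosts() {
    # -F: treat hostnames as fixed strings (dots are not wildcards)
    # -x: match whole lines only, so 192.168.1.25 does not hide 192.168.1.250
    # -v: keep the lines that do NOT match any entry of the exclude file
    grep -v -x -F -f "$2" "$1"
}

# Demonstration with throwaway files that mirror the examples above.
tmpdir=$(mktemp -d)
printf '%s\n' 192.168.1.4 192.168.1.250 > "$tmpdir/hosts"
printf '%s\n' 192.168.1.250 > "$tmpdir/exclude"
remaining_hosts "$tmpdir/hosts" "$tmpdir/exclude"
rm -r "$tmpdir"
```

The demonstration prints 192.168.1.4, the only host left after the exclusion.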

More Useful pdsh Commands

Now I can shift into second gear and try some fancier pdsh tricks. First, I want to run a more complicated command on all of the nodes (Listing 3). Notice that I put the entire command in quotes. This means the entire command is run on each node, including the first (cat /proc/cpuinfo) and second (grep bogomips) parts.

Listing 3

Quotation Marks 1

[laytonjb@home4 ~]$ pdsh 'cat /proc/cpuinfo | grep bogomips'
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.250: bogomips : 5624.23
192.168.1.250: bogomips : 5624.23
192.168.1.250: bogomips : 5624.23
192.168.1.250: bogomips : 5624.23

In the output, the node precedes the command results, so you can tell what output is associated with which node. Notice that the BogoMips values are different on the two nodes, which is perfectly understandable because the systems are different. The first node reports eight logical cores (four physical cores with Hyper-Threading), and the second node has four cores.

You can use this command across a homogeneous cluster to make sure all the nodes are reporting back the same BogoMips value. If the cluster is truly homogeneous, this value should be the same. If it's not, then I would take the offending node out of production and check it.
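On a cluster with many nodes, scanning that output by eye gets tedious. One way to condense it is sketched below with awk: the filter reduces the per-core lines to one line per distinct node and value pair. The function name bogomips_per_node is an assumption for this example, and the sample input is copied from Listing 3 so the snippet runs without a cluster.

```shell
# bogomips_per_node: read "node: bogomips : value" lines on stdin and print
# each distinct "node value" pair once, in order of first appearance.
# Hypothetical helper for illustration; it is not part of pdsh.
bogomips_per_node() {
    awk '{
        node = $1
        sub(/:$/, "", node)            # strip the colon that follows the node
        pair = node " " $NF            # $NF is the BogoMips value
        if (!(pair in seen)) { seen[pair] = 1; print pair }
    }'
}

# Sample input taken from Listing 3 (shortened).
printf '%s\n' \
    '192.168.1.4: bogomips   : 6997.39' \
    '192.168.1.4: bogomips   : 6997.39' \
    '192.168.1.250: bogomips : 5624.23' \
    '192.168.1.250: bogomips : 5624.23' |
    bogomips_per_node
```

With live nodes, the input would come from the quoted pdsh command in Listing 3 piped into the function. More than one distinct value in the second column points to a node that differs from the rest.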

A slightly different command shown in Listing 4 runs the first part contained in quotes, cat /proc/cpuinfo, on each node and the second part of the command, grep bogomips, on the node on which you issue the pdsh command.

Listing 4

Quotation Marks 2

[laytonjb@home4 ~]$ pdsh 'cat /proc/cpuinfo' | grep bogomips
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.4: bogomips   : 6997.39
192.168.1.250: bogomips : 5624.23
192.168.1.250: bogomips : 5624.23
192.168.1.250: bogomips : 5624.23
192.168.1.250: bogomips : 5624.23

The point here is that you need to be careful on the command line. In this example, the differences are trivial, but other commands could have differences that might be difficult to notice.
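The difference between Listings 3 and 4 comes down to which shell sees the pipe. The sketch below makes that visible without touching any node: show_remote is a hypothetical stand-in for pdsh that simply prints the command string it was handed, and tr plays the part of the local filter so the local step is easy to spot.

```shell
# show_remote is a hypothetical stand-in for pdsh: instead of contacting any
# node, it prints the command string that would be sent to the nodes.
show_remote() {
    printf 'sent to nodes: %s\n' "$*"
}

# Pipe inside the quotes: the whole pipeline is part of the remote command.
show_remote 'cat /proc/cpuinfo | grep bogomips'

# Pipe outside the quotes: only cat is sent to the nodes; the local shell
# runs tr on whatever comes back.
show_remote 'cat /proc/cpuinfo' | tr 'a-z' 'A-Z'
```

The first call prints the full pipeline as the remote command. The second prints an uppercased line that mentions only cat, which shows that the filter after the closing quote ran locally on the returned text.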

One very important thing to note is that pdsh does not guarantee a return of output in any particular order. If you have a list of 20 nodes, the output does not necessarily start with node 1 and increase incrementally to node 20. For example, in Listing 5, I run vmstat on each node and get four lines of output from each node (two header lines and two samples).

Listing 5

Order of Output

[laytonjb@home4 ~]$ pdsh vmstat 1 2
192.168.1.4: procs  ------------memory------------   ---swap-- -----io---- --system--  -----cpu-----
192.168.1.4:  r  b     swpd   free    buff   cache     si   so    bi    bo   in    cs  us sy id wa st
192.168.1.4:  1  0        0 30198704  286340  751652    0    0     2     3   48    66   1  0 98  0  0
192.168.1.250: procs -----------memory------------   ---swap-- -----io---- --system-- ------cpu------
192.168.1.250:  r  b   swpd   free    buff   cache     si   so    bi    bo   in    cs us sy  id wa st
192.168.1.250:  0  0      0 7248836   25632  79268      0    0    14     2   22    21  0  0  99  0  0
192.168.1.4:    1  0      0 30198100  286340 751668     0    0     0     0  412   735  1  0  99  0  0
192.168.1.250:  0  0      0 7249076   25632  79284      0    0     0     0   90    39  0  0 100  0  0

At first, it looks like the results from the first node are output first, but then the second node creeps in with its results. Expect the output from any command that returns more than one line per node to be interleaved. My best advice is to grab the output, put it into an editor, and rearrange the lines, remembering that the lines for any specific node are in the correct order.

Maybe someone with some serious pdsh-fu has a simple solution (please let me know if you have a technique). The other option is to issue only commands that return a single line of output. The results might not return in node order, but it would be easier to sort them.
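Because the lines for any one node arrive in the correct order, the rearranging can be automated by collecting lines per node and printing them node by node. The awk filter below is a minimal sketch of that idea. The name group_by_node is an assumption for this example, and the sample input stands in for real pdsh output. The pdsh distribution also ships a filter called dshbak that is meant for tidying pdsh output, so check its man page on your system before rolling your own.

```shell
# group_by_node: read "node: text" lines on stdin and print them grouped by
# node, keeping nodes in first-seen order and each node's lines in arrival
# order. Hypothetical helper for illustration only.
group_by_node() {
    awk '{
        if (!($1 in lines)) order[++n] = $1    # remember first-seen node order
        lines[$1] = lines[$1] $0 "\n"          # append this line to its node
    }
    END { for (i = 1; i <= n; i++) printf "%s", lines[order[i]] }'
}

# Interleaved sample input standing in for output like Listing 5.
printf '%s\n' \
    'node1: line 1' \
    'node2: line 1' \
    'node1: line 2' \
    'node2: line 2' |
    group_by_node
```

The sample prints both node1 lines first and then both node2 lines. The filter buffers everything until the command finishes, which is fine for short reports but not for long-running output you want to watch live.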

You can easily use pdsh to run scripts or commands on each node. For example, if you have read my past articles on processor and memory metrics [11] or processes, networks, and disk metrics [12], you can use those scripts to gather metrics quickly and easily on each node. However, you might want to modify the scripts so you only get one line of output (or maybe add switches in the code so you can specify the output) to make it easier to sort the results.

pdsh Modules

Previously, I mentioned that pdsh uses rcmd modules to access nodes. The authors have extended this to create modules for various specific situations. The pdsh modules page [13] lists other modules that can be built as part of pdsh, including:

  • machines
  • genders
  • nodeupdown
  • slurm
  • torque
  • dshgroup
  • netgroup

These modules extend the functionality of pdsh. For example, the SLURM module allows you to run the command only on nodes specified by currently running SLURM jobs. When pdsh is run with the SLURM module, it uses the job ID in the SLURM_JOBID environment variable to look up the list of nodes. Running pdsh with the -j jobid option gets the list of hosts from the job ID specified.
