Parallel shell with pdsh

Shell Games

Article from Issue 166/2014

Author(s): Jeff Layton

The most fundamental tool needed to administer a cluster is a parallel shell, which allows you to run the same command on a series of nodes. In this article, we look at pdsh.

A parallel shell allows you to run the same command on designated nodes in the cluster, so you don't have to log in to each node to run the command. This tool can be useful in many ways, but I like to use it when performing administrative tasks, such as:

Checking the versions of particular software packages on each node
Checking the OS version on all nodes
Checking the kernel version on all nodes
Searching the system logs on each node (if you don't store them centrally)
Examining the CPU usage on each node
Examining local I/O (if the nodes are doing local I/O)
Checking whether any nodes are swapping
Spot-monitoring the compute nodes

The list of possible tasks is extensive; in general, anything you can do on a single node can be done on a large number of nodes with a parallel shell tool.

If you try to use a parallel shell on a 50,000-node cluster, however, the time skew between the first and the last node to run the command could be large enough to make the results meaningless. Although certain techniques can allow the use of parallel commands on a large number of nodes, parallel shells are better used on a modest number of nodes or to gather information on slowly varying data. Parallel shells are also well suited to administering instances in the cloud, for example on Amazon Web Services (AWS).

Many parallel shells are available, including DSH [1], PyDSH [2], PPSS [3], PSSH [4], pdsh [5], PuSSH [6], sshpt [7], and mqsh [8], and each tool has its pros and cons. (Note: I have not tested all of these tools, so I can't vouch for them.) Several of these tools are written in Python, which has become a very popular language for DevOps.

In this article, I'll select one of the parallel shells to illustrate its possibilities. Other tools are fairly similar, with some syntactical differences and various sets of features. The tool I'm going to talk about here is pdsh.

Introduction to pdsh

Pdsh is arguably one of the most popular parallel shell tools. The most recent version on SourceForge as of this writing is 2.26, dated 2011-05-01. Code development appears to have moved to Google Code, where the most recent version is 2.29, updated February 2013. I'll be using that version in this article.

Pdsh is very interesting because it allows you to run commands on multiple nodes using only ssh. The compute nodes only need ssh, which is generally present on systems already, so you don't need to install any extra software on them. However, you do need the ability to SSH to any node without a password ("passwordless SSH").
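One way to confirm the passwordless requirement before involving pdsh is to attempt a non-interactive login to each node. The following is only a sketch: the function name check_passwordless and the SSH_CMD override are hypothetical names introduced for illustration, whereas BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# Hypothetical helper: report which hosts accept key-based (passwordless) SSH.
# SSH_CMD is an illustrative override so the function can be tried without
# real nodes; it defaults to the ordinary ssh client.
check_passwordless() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
            </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    # Non-zero exit status if at least one host could not be reached.
    return "$failed"
}
```

A call such as check_passwordless 192.168.1.4 192.168.1.250 prints one line per host; any host marked FAILED would most likely also cause trouble for pdsh over ssh.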

Building and Installing pdsh

Building and installing pdsh is really simple if you've built code with a GNU Autoconf configure script before. The steps are quite easy:

./configure --with-ssh --without-rsh
make
make install

This installs pdsh under /usr/local/, which is fine for testing purposes. For production work, I would put it in /opt or something like that; just be sure the directory with the binaries is in your PATH.
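For a different install location, Autoconf-based builds normally accept the standard --prefix option at configure time. The sketch below assumes a prefix of /opt/pdsh, which is an illustrative choice and not something the article prescribes, and shows a small hypothetical helper (path_prepend) that puts the matching bin directory on PATH without duplicating it.

```shell
#!/bin/sh
# Assumed install location (not from the article); with Autoconf-based builds
# it would normally be selected at configure time, for example:
#   ./configure --prefix=/opt/pdsh --with-ssh --without-rsh
PDSH_PREFIX=${PDSH_PREFIX:-/opt/pdsh}

# Hypothetical helper: prepend a directory to PATH only if it is not there
# yet, so sourcing .bashrc repeatedly does not keep growing PATH.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend "$PDSH_PREFIX/bin"
export PATH
```

Placed in .bashrc, these lines would make a pdsh binary under the assumed prefix visible to new shells.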

You might notice that I used the --without-rsh option in the configure command. By default, pdsh uses rsh, which is not really secure, so I chose to exclude it from the configuration. In the output in Listing 1, you can see the pdsh rcmd modules (rcmd is the remote command used by pdsh). Notice that the "available rcmd modules" at the end of the output lists only ssh and exec. If I hadn't excluded rsh, it would be listed here, too, and it would be the default.

Listing 1: rcmd Modules

To override rsh and make ssh the default, you just add the following line to your .bashrc file:

export PDSH_RCMD_TYPE=ssh

Be sure to source your .bashrc file (i.e., source ~/.bashrc) to set the environment variable in your current shell; alternatively, log out and log back in. If, for some reason, you see the following when you try running pdsh,

$ pdsh -w 192.168.1.250 ls -s
pdsh@home4: 192.168.1.250: rcmd: socket: Permission denied

then pdsh was built with rsh and is using it as the default rcmd module. You can rebuild pdsh without rsh, set the environment variable in your .bashrc file, or do both.
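If you go the environment-variable route, it can help to add the export line in a way that is safe to repeat. The following sketch uses a hypothetical helper named ensure_line; only the PDSH_RCMD_TYPE variable itself comes from pdsh.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line is
# not already present, so the setup can be rerun without creating duplicates.
ensure_line() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example call (commented out so nothing is modified by accident):
# ensure_line "$HOME/.bashrc" 'export PDSH_RCMD_TYPE=ssh'
```

For a one-off run, the pdsh -R option (for example, -R ssh) selects the rcmd module on the command line instead of through the environment.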

First pdsh Commands

To begin, I'll try to get the kernel version of a node by using its IP address:

$ pdsh -w 192.168.1.250 uname -r
192.168.1.250: 2.6.32-431.11.2.el6.x86_64

The -w option means I am specifying the node(s) that will run the command. In this case, I specified the IP address of the node (192.168.1.250). After the list of nodes, I add the command I want to run, which is uname -r in this case. Notice that pdsh prefixes each output line with the name of the node that produced it.
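Because every output line carries the node name as a prefix, results from many nodes are easy to post-process with standard tools. As a sketch, the hypothetical helper group_by_output below reads lines in that "node: output" form and counts how many nodes reported each distinct value, which is one way to spot a node running a different kernel. It assumes node names without colons.

```shell
#!/bin/sh
# Hypothetical helper: read "node: output" lines (the format pdsh prints)
# on stdin and print each distinct output with the number of nodes that
# reported it. Assumes node names contain no colon.
group_by_output() {
    awk '{
        sub(/^[^:]*: /, "")      # strip the "node: " prefix
        count[$0]++              # tally identical outputs
    }
    END {
        for (v in count) print count[v] " node(s): " v
    }' | sort
}
```

It could be used as pdsh -w 192.168.1.4,192.168.1.250 uname -r | group_by_output, with the two addresses being the test nodes from this article.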

If you need to mix rcmd modules in a single command, you can specify which module to use in the command line,

$ pdsh -w ssh:laytonjb@192.168.1.250 uname -r
192.168.1.250: 2.6.32-431.11.2.el6.x86_64

by putting the rcmd module before the node name. In this case, I used ssh and typical ssh syntax.

A very common way of using pdsh is to set the environment variable WCOLL to point to a file that contains the list of hosts you want to use in the pdsh command. For example, I created a subdirectory PDSH, in which I created a file named hosts that lists the hosts I want to use:

[laytonjb@home4 ~]$ mkdir PDSH
[laytonjb@home4 ~]$ cd PDSH
[laytonjb@home4 PDSH]$ vi hosts
[laytonjb@home4 PDSH]$ more hosts
192.168.1.4
192.168.1.250

I'm only using two nodes: 192.168.1.4 and 192.168.1.250. The first is my test system (like a cluster head node), and the second is my test compute node. You can also list hosts in the file separated by commas, as you would on the command line. Be sure not to put a blank line at the end of the file, because pdsh will treat it as a host and try to connect to it. You can put the environment variable WCOLL in your .bashrc file:

export WCOLL=/home/laytonjb/PDSH/hosts

As before, you can source your .bashrc file, or you can log out and log back in.
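Because a stray blank line in the hosts file leads to an unwanted connection attempt, it can be worth normalizing the file before pointing WCOLL at it. The sketch below uses a hypothetical helper named clean_hosts, and the output path in the commented example is likewise only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print a hosts file with surrounding whitespace trimmed
# and empty lines dropped, so no blank entry is left for pdsh to act on.
clean_hosts() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Example (commented out; the input path is the one used in this article,
# the output path is an arbitrary choice):
# clean_hosts /home/laytonjb/PDSH/hosts > /tmp/hosts.clean
```

With WCOLL exported and pointing at a clean file, a command such as pdsh uname -r runs on every listed host without the -w option.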
