Setting up a file server cluster with Samba and CTDB

CTDB Hands-On

For a CTDB setup, the Samba developers recommend (at least) two – preferably physically separated – networks: a public network from which the clients will access the available services (Samba, NFS, ftp, …) and a private network, which CTDB uses to handle internal communications within the cluster.

The network for the cluster filesystem can be a separate network, or it can be the same network that CTDB uses internally. A separate management network can turn out to be a good thing for, say, SSH logins on the nodes. Figure 1 shows the basic configuration of a CTDB cluster.

See the box titled "Downloading and Compiling CTDB" for information on how to add CTDB to your own Samba implementation. CTDB's central configuration file is /etc/sysconfig/ctdb. The most important setting is the recovery lock file, which you specify via the CTDB_RECOVERY_LOCK variable. In addition, the admin user has to populate the /etc/ctdb/nodes file with the IP addresses of all the CTDB nodes on the private network (Listing 1). This file has to be identical on all nodes.
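As a rough illustration, the relevant part of /etc/sysconfig/ctdb might look like the following sketch. The file consists of shell-style variable assignments. The lock file path is a hypothetical location on the cluster filesystem, and the CTDB_NODES variable is an assumption that is not covered in this article, so compare it against the sample file shipped with your CTDB version.

```shell
# /etc/sysconfig/ctdb (excerpt) -- illustrative values, not defaults

# Recovery lock file. The path is a hypothetical example; it is assumed
# to sit on the shared cluster filesystem so every node sees the same file.
CTDB_RECOVERY_LOCK=/clusterfs/.ctdb/recovery.lock

# Location of the list of private cluster addresses (assumed variable
# name; the path is the one used in this article).
CTDB_NODES=/etc/ctdb/nodes
```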

Downloading and Compiling CTDB

The Samba project has been using the decentralized Git [10] code management system since late in 2007. The developers maintain Samba and CTDB on the server at git://git.samba.org or on the web front end [11]. The branches for the official Samba versions and the master developer branch are available from the git://git.samba.org/samba.git repository. The mirror [12] will even give you tarball snapshots of every single revision.

The official CTDB sources are available from Ronnie Sahlberg's repository [7]. The repository at git://git.samba.org/obnox/samba-ctdb.git contains Samba versions with cluster extensions based on the official release branches – in particular, a production-ready cluster variant of Samba version 3.2 (v3-2-ctdb). As of this writing, the CTDB software will run on Linux and AIX. The normal sequence of commands will build and install the software:

cd ctdb/
./autogen.sh
./configure [options]
make
make install

You don't need any special configure options. The normal --prefix allows the administrator to customize the installation directories. On RPM systems, you can generate a package directly from a Git checkout:

cd ctdb/

Prebuilt CTDB and v3-2-ctdb RPMs are available for Red Hat [13] and other distributions [14].

Listing 1

/etc/ctdb/nodes

192.168.46.70
192.168.46.71
192.168.46.72
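Because the nodes file must contain one private address per line and be identical everywhere, a small sanity check before starting CTDB can save some debugging. The following is only a sketch: check_nodes and same_nodes are hypothetical helper names, the pattern test is deliberately loose (it does not verify that each octet is below 256), and the demo works on scratch copies rather than on /etc/ctdb/nodes.

```shell
#!/bin/sh
# check_nodes FILE: succeed only if every line looks like a dotted-quad
# IPv4 address and no address is listed twice. (Hypothetical helper.)
check_nodes() {
    file=$1
    # Fail if any line is not exactly four dot-separated numbers.
    if grep -Evq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' "$file"; then
        return 1
    fi
    # Fail if an address appears more than once.
    if [ -n "$(sort "$file" | uniq -d)" ]; then
        return 1
    fi
    return 0
}

# same_nodes FILE1 FILE2: succeed if the two copies are byte-identical.
same_nodes() {
    cmp -s "$1" "$2"
}

# Demo on scratch copies instead of the real /etc/ctdb/nodes.
tmp=$(mktemp -d)
printf '%s\n' 192.168.46.70 192.168.46.71 192.168.46.72 > "$tmp/nodes.a"
cp "$tmp/nodes.a" "$tmp/nodes.b"
if check_nodes "$tmp/nodes.a" && same_nodes "$tmp/nodes.a" "$tmp/nodes.b"; then
    echo "nodes file looks consistent"
fi
```

In practice you would fetch each node's copy first (for example, over the management network) and compare the copies pairwise with same_nodes.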

Samba Configuration

If you have Samba with cluster support (see the box "How-to Build Your Own Samba"), you will want to configure it with your own smb.conf parameters. The clustering = yes parameter enables clustering at run time. Without this parameter, the Samba clustering version will work like any old version of Samba without cluster support.

Despite what various pages of the Samba wiki say [23], you won't need to locate private dir on the cluster filesystem (well, maybe for a local smbpasswd). This information only applies to earlier versions of CTDB that could not handle persistent TDB databases, such as secrets.tdb and passdb.tdb in private dir. Current versions of CTDB automatically distribute the persistent TDBs over the cluster.

If you need group mappings, you must change the back end from the default of ldb to tdb with groupdb:backend = tdb.

Samba uses an identification code to store the locking information: smbd typically creates this code by calling stat() on the file and combining its device and inode numbers. However, the cluster setup needs an ID that is valid on multiple nodes, and the device number of a given file is not necessarily the same on every node in the cluster. The VFS fileid module provides an alternative approach to forming a file ID that is valid throughout the cluster. The vfs objects = fileid parameter in the corresponding configuration section enables the fileid module either globally or for a share. The value of the fileid:algorithm option in the [global] section configures the method, as in

vfs objects = fileid
fileid:algorithm = fsid
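Putting the parameters from this section together, a cluster-related [global] section could be drafted as in the following sketch. It writes the fragment to a scratch file for review instead of touching the real smb.conf, and it contains only the parameters named in this article; everything else a working server needs (security settings, shares, and so on) is deliberately left out.

```shell
#!/bin/sh
# Write a candidate [global] section to a scratch file for review.
# Only parameters discussed in this article appear here.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[global]
    clustering = yes
    groupdb:backend = tdb
    vfs objects = fileid
    fileid:algorithm = fsid
EOF
echo "wrote $conf"
```

You could then run testparm -s on the scratch file for a syntax check before merging the settings into the real configuration.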

How-to Build Your Own Samba

If you cannot, or prefer not to, use prebuilt packages, you can build and install a cluster-capable Samba 3.3 from the source code using the standard sequence of commands:

cd samba/source
./autogen.sh
./configure --with-cluster-support --with-ctdb=/usr/include --with-shared-modules=idmap_tdb2 [options]
make everything
make install

You only need to call autogen.sh if you are using a Git repository snapshot instead of the release tarball. The --with-ctdb= configure parameter specifies where the CTDB headers are on the system. Samba needs them to compile the code for communications with CTDB. If you have already installed CTDB from a package, /usr/include is normally okay.

To build idmap_tdb2, the cluster variant of the standard ID mapping module idmap_tdb, add it to the list of modules in --with-shared-modules=. The Samba team is currently working on merging idmap_tdb and idmap_tdb2 so that idmap_tdb itself works in a cluster. One of the next Samba versions will probably resolve this issue.

The commands

cd samba/source
./packaging/RHEL-CTDB/makerpms.sh

generate RPMs for Red Hat and SUSE systems directly from a Git checkout.

IP Addressing

To distribute public IP addresses across the cluster nodes, you can use any of three options. First, you can assign addresses statically without involving CTDB. In this case, CTDB can't play its high-availability card. Second, you can use a single IP address as the public cluster address in what is known as LVS mode and let the LVS master node distribute the incoming traffic for that address to the participating nodes. Setting the CTDB_LVS_PUBLIC_IP and CTDB_PUBLIC_INTERFACE variables in /etc/sysconfig/ctdb enables this mode.
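For LVS mode, the two variables might be set as in the following sketch. The address and the interface name are made-up example values that you would replace with your own.

```shell
# /etc/sysconfig/ctdb (excerpt) -- LVS mode, illustrative values only

# The single public address that all clients connect to (example value).
CTDB_LVS_PUBLIC_IP=10.0.0.100

# The interface that carries the public address (example value).
CTDB_PUBLIC_INTERFACE=eth0
```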

The third method is to allow CTDB to dynamically assign multiple public IP addresses to the nodes. In combination with upstream round-robin DNS, this option adds load balancing and high availability to your CTDB cluster. To enable it, set the CTDB_PUBLIC_ADDRESSES variable in /etc/sysconfig/ctdb on each node to the name of a file, typically /etc/ctdb/public_addresses. This file contains the pool of addresses, together with their netmasks and network interfaces, that CTDB will assign to the nodes.
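The following sketch builds an example address pool and runs a loose format check on it. The layout assumed here is one entry per line in the form address/prefix followed by an interface name; the addresses and interface are invented example values, and the file is written to a scratch directory rather than to /etc/ctdb.

```shell
#!/bin/sh
# Build an example address pool in a scratch directory.
dir=$(mktemp -d)
pool="$dir/public_addresses"
cat > "$pool" <<'EOF'
10.0.0.101/24 eth0
10.0.0.102/24 eth0
10.0.0.103/24 eth0
EOF

# In /etc/sysconfig/ctdb you would then point CTDB at the real file:
#   CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses

# Count the lines that do NOT look like "a.b.c.d/prefix interface".
bad=$(grep -Evc '^[0-9]{1,3}(\.[0-9]{1,3}){3}/[0-9]{1,2} [A-Za-z0-9._-]+$' "$pool")
echo "malformed entries: $bad"
```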

The address list does not need to be on every node, and it does not need to be the same on each node. Instead, you can take the network topology of your public network into consideration and create partitions. If a node fails, CTDB transfers its public IP addresses to other cluster nodes, which have these addresses in their public_addresses lists.
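To see how partitioned address lists limit failover, the sketch below lists which nodes could take over a given address. It is not CTDB's actual takeover logic; candidates is a hypothetical helper that only mirrors the rule stated above, namely that a node is eligible for an address if that address appears in its own public_addresses list. The per-node files and addresses are invented for the demo.

```shell
#!/bin/sh
# candidates ADDR FILE...: print each file (one per node) whose address
# pool contains ADDR. Hypothetical helper, not part of CTDB.
candidates() {
    addr=$1
    shift
    for f in "$@"; do
        # Compare ADDR with the part of the first field before the "/".
        if awk -v a="$addr" \
            '{ split($1, p, "/"); if (p[1] == a) hit = 1 } END { exit !hit }' "$f"
        then
            echo "$f"
        fi
    done
}

# Demo: nodes a and b share one pool; node c serves a different subnet.
dir=$(mktemp -d)
printf '%s\n' '10.0.0.101/24 eth0' '10.0.0.102/24 eth0' > "$dir/node_a"
cp "$dir/node_a" "$dir/node_b"
printf '%s\n' '10.0.1.101/24 eth1' > "$dir/node_c"

# If node a fails, which of the remaining nodes could host 10.0.0.101?
candidates 10.0.0.101 "$dir/node_b" "$dir/node_c"
```

In this layout, only node b is a candidate for node a's addresses, and node c's address has no fallback at all, which is worth keeping in mind when you design the partitions.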

It is important to understand that load balancing and client distribution across the cluster nodes are connection oriented. If an IP address is switched from one node to another, all the connections actively using this IP address are dropped and the clients have to reconnect.

To avoid delays, CTDB uses a trick: When an IP is switched, the new CTDB node "tickles" the client with an illegal TCP ACK packet (tickle ACK) containing an invalid sequence number of 0 and an ACK number of 0. The client responds with a valid ACK packet, allowing the new IP address owner to close the connection with an RST packet, thus forcing the client to reestablish the connection to the new node.
