The Kosmos distributed FS

Launching KFS

The next step distributes the binary files to the meta- and chunk servers. A Python script in the ~/kfs-0.1.1/scripts directory takes care of this, creating a customized program package for each server and then performing the installation over SSH.

To allow this to happen, all of your servers should run the same Linux environment, or at least the distributions should not be wildly different. Configuring SSH with keypairs removes the need to keep entering multiple passwords.

Topology

The only thing missing now is the configuration file that tells the script which computers on the network will be handling which task. Listing 1 shows a sample configuration file.

Listing 1

Kosmos FS Sample Configuration

[metaserver]
node: 192.168.1.100
rundir: /home/tim/kfs/metaserver
baseport: 20000
[chunkserver1]
node: 192.168.1.101
rundir: /home/tim/kfs/chunk1
baseport: 30000
space: 30 G
[chunkserver2]
node: 192.168.1.102
rundir: /home/tim/kfs/chunk2
baseport: 30000
space: 18000 M

The file has a separate section for each server involved, headed by the server name in square brackets. The minimal requirement is a [metaserver] section.

Following is a section for each chunk server, which typically takes the form of [chunkserver1] through [chunkserverN]. The KFS cluster in this example comprises a metaserver and two chunk servers. Each section contains the settings for one server.

node: is followed by the hostname or IP address of the server. rundir: is followed by the directory in which the binaries will be stored (in the example in Listing 1, this is a directory below the home directory of the tim user account on each server). The baseport: keyword specifies the TCP port that the server will use to communicate with the other nodes.

The computer names do not need to be different. In fact, Kosmos FS will let you run all the servers on a single machine – and this can be localhost – but in cases like this, you must assign unique TCP ports to the metaserver and the chunk servers.
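Following the format from Listing 1, a single-machine test setup could look like the fragment below. The paths, port numbers, and space values are hypothetical examples; all that matters is that every server on localhost gets a distinct baseport:

```
[metaserver]
node: localhost
rundir: /home/tim/kfs-test/metaserver
baseport: 20000
[chunkserver1]
node: localhost
rundir: /home/tim/kfs-test/chunk1
baseport: 30000
space: 5 G
[chunkserver2]
node: localhost
rundir: /home/tim/kfs-test/chunk2
baseport: 30100
space: 5 G
```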

Each chunk server has a space: option that specifies how much disk space the server will use to save data. In the example here, the first chunk server provides 30GB, the second slightly less, 18,000MB. Sample configuration files are available in the conf directory below the source code archive.
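Because the configuration uses an INI-style layout, Python's standard configparser can read it. The following sketch is a hypothetical helper, not part of the KFS scripts, that checks a configuration for the mandatory sections and keys described above before handing it to kfssetup.py:

```python
# Hypothetical sanity check for a KFS machines configuration (not part
# of the KFS distribution). It verifies that a [metaserver] section is
# present, that every section has node:, rundir:, and baseport:, and
# that every chunk server also declares space:.
from configparser import ConfigParser

SAMPLE = """
[metaserver]
node: 192.168.1.100
rundir: /home/tim/kfs/metaserver
baseport: 20000
[chunkserver1]
node: 192.168.1.101
rundir: /home/tim/kfs/chunk1
baseport: 30000
space: 30 G
"""

def check_kfs_config(text):
    cfg = ConfigParser()          # default delimiters include ':'
    cfg.read_string(text)
    assert "metaserver" in cfg, "a [metaserver] section is mandatory"
    for section in cfg.sections():
        for key in ("node", "rundir", "baseport"):
            assert cfg.has_option(section, key), f"{section} lacks {key}"
        if section.startswith("chunkserver"):
            assert cfg.has_option(section, "space"), f"{section} lacks space"
    return cfg.sections()

print(check_kfs_config(SAMPLE))  # ['metaserver', 'chunkserver1']
```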

Command Center

Now that the configuration file is complete, the next step is to change directory to scripts and execute the following:

python kfssetup.py -f configuration_file.cfg -b ../build/bin

Thanks to the configuration file, all the servers can be launched centrally, via SSH, from the current machine:

python kfslaunch.py -f configuration_file.cfg --start

The following call shuts the system down:

python kfslaunch.py -f configuration_file.cfg --stop

Specifying the configuration file is important and lets users manage different KFS clusters from a single console.

Now that the servers are running, users can start moving data onto the enormous new filesystem using either the special KFS Shell (see the box titled "Toolbox" for more details) or via the API. A simple example of a C++ program that stores its data in KFS is given in Listing 2.

Listing 2

Creating a File

...
#include "libkfsClient/KfsClient.h"

using namespace KFS; // KFS namespace

int main(int argc, char **argv)
{
    string serverHost = "localhost";
    int port = 20000;

    KfsClient *gKfsClient;

    // Get access to the filesystem:
    gKfsClient = KfsClient::Instance();
    gKfsClient->Init(serverHost, port);

    // Create a subdirectory:
    gKfsClient->Mkdirs("testdir");

    // Open a file; "fd" is the handle:
    int fd = gKfsClient->Create("testdir/foo.1");

    // Write junk:
    int numBytes = 2048;
    char *buffer = new char[numBytes];
    gKfsClient->Write(fd, buffer, numBytes);

    // Flush changes:
    gKfsClient->Sync(fd);

    // Close the file:
    gKfsClient->Close(fd);

    delete [] buffer;
    return 0;
}

Unfortunately, the header files are hidden away in the depths of the source code archive in src/cc. This also applies to the libraries, which are located in build/lib:

g++ test.cpp -I ~/kfs-0.1.1/src/cc -L ~/kfs-0.1.1/build/lib/ -lkfsClient -lkfsIO -lkfsCommon

Before running the resulting binary, LD_LIBRARY_PATH has to be set:

export LD_LIBRARY_PATH=~/kfs-0.1.1/build/lib

To save the linker the trouble of searching for the dynamic libraries, you can link your own programs with the static variants, which are located in ~/kfs-0.1.1/build/lib-static.
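A static build would follow the same pattern as the dynamic compile command above, simply pointing the linker at the lib-static directory instead. This is an untested sketch that assumes the library names match their dynamic counterparts:

```
g++ test.cpp -I ~/kfs-0.1.1/src/cc -L ~/kfs-0.1.1/build/lib-static -lkfsClient -lkfsIO -lkfsCommon
```

With no shared libraries to resolve at run time, the resulting binary no longer needs LD_LIBRARY_PATH.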

To handle huge volumes of data, a KFS application simply opens a new file via the client library.
