The Kosmos distributed FS

Metastases

Chunk servers themselves do not keep track of which parts of which file are stored on which server. For this reason, a metadata server (or metaserver, for short) is deployed to monitor a number of chunk servers (the Google File System refers to these metaservers as masters). As the name suggests, the metaservers store the metadata: which chunk server holds which part of a file, the corresponding file sizes and file names, and which processes are currently accessing each file.
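The bookkeeping described above can be pictured with a small Python sketch. It is only an illustration: the class names, the 64MB chunk size, and the server addresses are assumptions made for this example and are not taken from the KFS code or its API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Assumed chunk size for this illustration only.
CHUNK_SIZE = 64 * 1024 * 1024


@dataclass
class FileEntry:
    """Metadata a metaserver might keep for one file (hypothetical layout)."""
    name: str
    size: int
    # chunk index -> addresses of the chunk servers holding that chunk
    chunk_locations: Dict[int, List[str]] = field(default_factory=dict)
    # identifiers of the processes currently accessing the file
    open_by: Set[str] = field(default_factory=set)


class MetaServer:
    """Toy in-memory metadata table; not the real KFS metaserver."""

    def __init__(self) -> None:
        self.files: Dict[str, FileEntry] = {}

    def add_file(self, entry: FileEntry) -> None:
        self.files[entry.name] = entry

    def locate(self, name: str, offset: int) -> List[str]:
        """Return the chunk servers that hold the byte at `offset`."""
        entry = self.files[name]
        if not 0 <= offset < entry.size:
            raise ValueError("offset lies outside the file")
        return entry.chunk_locations[offset // CHUNK_SIZE]


meta = MetaServer()
meta.add_file(FileEntry(
    name="/logs/web.log",
    size=100 * 1024 * 1024,
    chunk_locations={0: ["chunk-a:30000"], 1: ["chunk-b:30000"]},
))
# Byte 70MB falls into the second chunk (index 1).
print(meta.locate("/logs/web.log", 70 * 1024 * 1024))
```

A client would ask the metaserver for the location first and then talk to the chunk server directly, which is why the lookup returns server addresses rather than data.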

At regular intervals, the metaserver checks the capacity of the chunk servers assigned to it. If necessary, it will migrate chunks from a heavily loaded server to a less busy machine (rebalancing). This makes better use of the available capacity and improves overall performance.
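A rough idea of what such a rebalancing pass could look like is sketched below in Python. This is not the algorithm KFS uses; the 80 percent threshold, the data layout, and the rule of moving the smallest chunk to the emptiest server are all assumptions chosen to keep the example short.

```python
def used_fraction(server):
    """Share of a server's capacity that is occupied by chunks."""
    return sum(server["chunks"].values()) / server["capacity"]


def rebalance(servers, threshold=0.8):
    """Move chunks off servers above `threshold`; return the moves made.

    `servers` maps a server name to a dict with a "capacity" value and a
    "chunks" dict of chunk id -> size (hypothetical layout for this sketch).
    """
    moves = []
    for src_name, src in servers.items():
        while used_fraction(src) > threshold:
            candidates = [n for n in servers if n != src_name]
            if not candidates:
                break
            # Pick the least-used server as the migration target.
            dst_name = min(candidates, key=lambda n: used_fraction(servers[n]))
            dst = servers[dst_name]
            # Move the smallest chunk first.
            chunk_id, size = min(src["chunks"].items(), key=lambda kv: kv[1])
            new_dst_use = (sum(dst["chunks"].values()) + size) / dst["capacity"]
            if new_dst_use > threshold:
                break  # the move would only shift the problem elsewhere
            del src["chunks"][chunk_id]
            dst["chunks"][chunk_id] = size
            moves.append((chunk_id, src_name, dst_name))
    return moves


servers = {
    "chunk-a": {"capacity": 100, "chunks": {"c1": 30, "c2": 30, "c3": 30}},
    "chunk-b": {"capacity": 100, "chunks": {"c4": 10}},
}
moves = rebalance(servers)
print(moves)  # [('c1', 'chunk-a', 'chunk-b')]
```

In a real deployment the metaserver would also have to update its chunk location records after each move; the sketch leaves that step out.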

Clients

Applications use the client library to access this infrastructure (Figure 2). The library includes a complete filesystem API that allows clients to store (large) files on KFS and to manipulate and read existing files in the normal way.

In contrast to its competitor HDFS, KFS supports writing to multiple arbitrary positions in a file or appending data to existing files.
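The difference between the two access patterns is easy to see on an ordinary local file. The following Python sketch uses only the standard file API on a temporary file; it does not use the KFS client library, and the file name is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.dat")

    # Create a file with some initial content.
    with open(path, "wb") as f:
        f.write(b"0123456789")

    # Overwrite data at two arbitrary positions inside the file.
    with open(path, "r+b") as f:
        f.seek(2)
        f.write(b"AB")
        f.seek(7)
        f.write(b"Z")

    # Append data to the end of the existing file.
    with open(path, "ab") as f:
        f.write(b"END")

    with open(path, "rb") as f:
        result = f.read()

print(result)  # b'01AB456Z89END'
```

A filesystem that only supports sequential writes can offer the first step but neither of the other two.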

Unfortunately, the client library is the only door to the distributed filesystem, except for a couple of minimal tools (see the box titled "Toolbox"). Consequently, you cannot avoid modifying your own programs, and the choice of programming languages is restricted to C++ or Python. Java programmers can go through the Java Native Interface (JNI). In a clever move, the KFS developers have added an API for the HDFS filesystem, a competitor to KFS, so programs written for HDFS can be ported easily to KFS.

Quickstart

Kosmos FS is provided in the form of a handy source code archive that you can only build on a 64-bit system. Apart from this, Kosmos is fairly frugal in its requirements: besides CMake, you just need the log4cpp and Boost libraries. After fulfilling the requirements, just unpack the archive and open the CMakeLists.txt file in an editor.

By default, the compiler will build the KFS programs and libraries with debug information. If you prefer to do without debugging, change the value in quotes that follows CMAKE_BUILD_TYPE from Debug to Release. If you need FUSE support (see the "Toolbox" box for details), uncomment the

# set (Fuse_LIBRARY_DIR "")

line and add the path to the FUSE library in quotes.

The administrator needs to enter a couple of commands to build and install KFS. To start, change to the KFS source code directory, which is ~/kfs-0.1.1 in this example. When you get there, enter the following commands:

mkdir build
cd build
cmake ~/kfs-0.1.1
gmake
gmake install

The last command suggests a system installation, but what actually happens is that the programs created in the previous step are moved to ~/kfs-0.1.1/build/bin and the corresponding libraries to ~/kfs-0.1.1/build/lib or ~/kfs-0.1.1/build/lib-static.

If you need a Java interface, you can change to the KFS directory, ~/kfs-0.1.1, and launch ant jar.

If everything has worked out okay, the kfs.jar file should be in the build subdirectory. This package contains everything you need to develop Java programs that use KFS.

A Python interface is slightly more complex. Start by changing directory to ~/kfs-0.1.1/src/cc/access, then open the file kfs_setup.py in an editor and modify the include paths.

Next, run the python kfs_setup.py ~/kfs-0.1.1/build/lib build command. This creates kfs.so in the build directory, which you can then integrate with your Python system by typing python kfs_setup.py ~/kfs-0.1.1/build/lib/ install.

Toolbox

The client library gives applications convenient access to filesystem functionality, but even checking the contents of a directory would mean writing a tool for the task. To remove the need for extra programming, the KFS package includes a special shell. The shell provides counterparts to popular Unix tools, including ls, cp, and mv, so users can navigate the KFS tree in the normal way. To launch the shell, you need to execute a script in the scripts directory below the source code archive:

python kfsshell.py -f configfile.cfg -b ~/kfs-0.1.1/build/bin/KfsPing

KfsPing is a ping-like tool that is useful for monitoring KFS servers. Typing KfsPing -h displays help. Other useful tools are located in the build/bin/tools directory.

If you do not like the idea of special commands, your alternative on Linux is FUSE (Filesystem in Userspace), a kernel module that lets filesystem drivers run in user space. With FUSE support, users can mount KFS like a normal hard disk partition and then deploy the full range of Linux tools.
