The Kosmos distributed FS
Unfortunately, chunk servers do not bother remembering which parts of which file are stored on which member server. For this reason, a metadata server (or metaserver, for short) is deployed to monitor a number of chunk servers (the Google filesystem refers to these metaservers as masters). As the name suggests, the metaservers store the metadata, including details of which chunk server has which part of a file, the corresponding file sizes and file names, and information on which processes are currently accessing each file.
At regular intervals, the metaserver checks the capacity of the chunk servers assigned to it. If necessary, it will migrate chunks from a server with a heavy load to a less busy machine (rebalancing). This optimizes use of available capacities, thus improving the performance in general.
Applications use the client library to access this infrastructure (Figure 2). The library includes a complete filesystem API that allows clients to store (large) files on KFS and to manipulate and read existing files in the normal way.
In contrast to its competitor HDFS, KFS supports writing to multiple arbitrary positions in a file or appending data to existing files.
Unfortunately, the client library is the only door to the distributed filesystem, except for a couple of minimal tools (see the box titled "Toolbox"). Consequently, there is no escaping modifying your own programs, and the choice of programming languages is restricted to C++ or Python. Java programmers can use the JNI native interface. In a clever move, the KFS developers have added an API for the HDFS filesystem, a competitor to KFS; programs written for HDFS can be ported easily to KFS.
Kosmos FS is provided in the form of a handy source code archive that you can only build on a 64-bit system. Apart from this, Kosmos is fairly frugal in its requirements: besides CMake, you just need the log4cpp and Boost libraries. After fulfilling the requirements, just unpack the archive and open the CmakeLists.txt file.
By default, the compiler will build the KFS programs and libraries with debug information. If you prefer to do without debugging, change the value in quotes that follows CMAKE_BUILD_TYPE from Debug to Release. If you need FUSE support (see the "Toolbox" box for details), uncomment the
# set (Fuse_LIBRARY_DIR "")
line and add the path to the FUSE library in quotes.
The administrator needs to enter a couple of commands to build and install KFS. To start, change to the KFS source code directory, which is ~/kfs-0.1.1 in this example. When you get there, enter the following commands:
mkdir build cd build cmake ~/kfs-0.1.1 gmake gmake install
The last command suggests a system installation, but what actually happens is that the programs created in the previous step are moved to ~/kfs-0.1.1/build/bin and the corresponding libraries to ~/kfs-0.1.1/build/lib or ~/kfs-0.1.1/build/lib-static.
If you need a Java interface, you can change to the KFS directory, ~/kfs-0.1.1, and launch ant jar.
If everything has worked out okay, the kfs.jar file should be in the build subdirectory. This package contains everything you need to develop Java programs that use KFS.
A Python interface is slightly more complex. Start by changing directory to ~/kfs-0.1.1/src/cc/access, then open the file kfs_setup.py in an editor and modify the include paths.
Next, give the python kfs_setup.py ~/kfs-0.1.1/build/lib build command. This creates kfs.so in the build directory, which you can then integrate with your Python system by typing python kfs_setup.py ~/kfs-0.1.1/build/lib/ install.
The client library gives applications convenient access to filesystem functionality, but to check the content of a directory would mean programming a tool for the task. The KFS package has a special Shell to remove the need for extra programming. The Shell provides counterparts to popular Unix tools, including ls, cp, and mv. Thanks to the Shell, users can navigate the KFS tree in the normal way. To launch the Shell, you need to execute a script in the scripts directory below the source code archive:
python kfsshell.py -f Konfigurationsdatei.cfg -b ~/kfs-0.1.1/build/bin/KfsPing
KfsPing is an advanced ping that provides a useful service monitoring KFS servers. Typing KfsPing -h displays help. Other useful tools are located in the build/bin/tools directory.
If you do not like the idea of special commands, your alternative on Linux is FUSE support (Filesystem in Userspace), a kernel module that migrates a filesystem driver to user mode. FUSE allows users to mount KFS like a normal hard disk partition and then deploy the full range of Linux tools.
Buy this article as PDF
Linux Magazine will include the best of both magazines.
Popular open source encryption tool is vulnerable to attack
New “Yakkety Yak” edition emphasizes cloud and servers
Google finally enters the phone hardware business.
Innovative system adds a hard drive and Ubuntu Core to the RPi for an IoT hub.
Linux is two weeks younger than we thought!
The Apache Software Foundation considers retiring OpenOffice
Adobe won’t kill the plugin in 2017
Linux Foundation's big event celebrates the 25th anniversary of Linux