The Kosmos distributed FS
Distributed filesystems effortlessly juggle enormous files in the gigabyte and terabyte ranges. The Kosmos filesystem plans to impress its competitors.
Modern computer programs handle increasingly large volumes of data. Whereas data-mining applications are content to sift through mountains of existing data, Internet search engines constantly hoard new information. Users who access this data regularly encounter files of several gigabytes or more.
Legacy filesystems soon reach their limits with this kind of data and throughput. Consequently, organizations that manage huge volumes of data need an alternative solution for fast and safe access. Having redundant data storage is useful; after all, who wants to lose the valuable data gained by several days of number crunching because of a banal disk error?
Distributed filesystems fulfill these requirements. A distributed filesystem splits the data into manageable chunks and stores the chunks on a scalable cluster of computers. By virtualizing storage on the cluster, the filesystem then tricks applications into believing that they are talking to an enormous hard disk.
The Kosmos filesystem (KFS) is a promising new entry into this field. Kosmix Corporation developed KFS and released the source code under the Apache license. The first alpha version 0.1 appeared in September 2007. KFS's relative youth shows during setup: the filesystem requires 64-bit Linux. If possible, the Linux version and distribution should be identical on all the computers involved in data storage.
KFS is up against a number of renowned competitors, including the Google filesystem (GFS), which Google uses as the underpinnings for its search engine, and the Hadoop project's HDFS. The KFS developers lifted much of the structure and functionality from Google, but they have removed a number of limitations. KFS, like GFS, is optimized for scenarios in which many large files are created once but read many times.
The Kosmos filesystem consists of three components:
- one or multiple chunk servers that store the data on their own hard disks,
- a metaserver that keeps an eye on the chunk servers, and
- a client application that wants to hand off a single large file quickly.
KFS thus works much like a database that resides between a computer program and the traditional filesystem (see Figure 1).
KFS first splits a file into handy 64MB blocks. The filesystem distributes these chunks evenly over all attached servers, aptly referred to as block or chunk servers. The servers store the blocks on normal filesystems that belong to the host operating systems.
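The splitting step can be pictured with a short Python sketch. It is an illustration only, not KFS code: the function name `plan_chunks`, the server names, and the round-robin placement are assumptions made for this example; the article only states that blocks are 64MB and are distributed evenly.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, the block size named in the article


def plan_chunks(file_size, servers, chunk_size=CHUNK_SIZE):
    """Return a list of (index, offset, length, server) tuples for one file.

    Round-robin placement is an assumption used here to model
    "evenly distributed"; the real KFS placement policy may differ.
    """
    plan = []
    offset = 0
    index = 0
    while offset < file_size:
        # The last chunk may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        server = servers[index % len(servers)]
        plan.append((index, offset, length, server))
        offset += length
        index += 1
    return plan


if __name__ == "__main__":
    # A hypothetical 200MB file on three hypothetical chunk servers.
    for entry in plan_chunks(200 * 1024 * 1024, ["cs1", "cs2", "cs3"]):
        print(entry)
```

In this sketch a 200MB file yields three full 64MB chunks plus one 8MB remainder, and the fourth chunk wraps around to the first server.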
If the chunk servers start to run out of storage capacity, the administrator can simply add a new computer to the cluster. KFS automatically integrates the new storage node, which keeps the whole system scalable and helps it keep pace with increasing storage demands.
KFS mitigates hardware errors by storing the blocks from every single file redundantly on multiple chunk servers; typically, three copies of each stored block exist.
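One way to think about redundant storage is as a placement decision made for every block. The following Python sketch is a simplified model under stated assumptions: the function `place_replicas`, the `load` dictionary, and the "least-loaded servers first" rule are invented for illustration and are not taken from KFS.

```python
def place_replicas(load, replicas=3):
    """Pick `replicas` distinct chunk servers for one block.

    `load` maps a server name to the number of blocks it already holds.
    Choosing the least-loaded servers is an assumed policy, not the
    documented KFS behavior. The chosen servers' counters are updated.
    """
    if replicas > len(load):
        raise ValueError("not enough chunk servers for the requested replication")
    # Sort by current load, then by name so the result is deterministic.
    chosen = sorted(load, key=lambda server: (load[server], server))[:replicas]
    for server in chosen:
        load[server] += 1
    return chosen


if __name__ == "__main__":
    load = {"cs1": 0, "cs2": 0, "cs3": 0, "cs4": 0}
    print(place_replicas(load))
    print(place_replicas(load))
```

Because every copy lands on a different machine, the loss of a single disk or server never removes all copies of a block in this model.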
This safety net allows administrators to deploy standard PCs as cheap, but reliable, data repositories. Google FS proves that this works day after day. If a disk or server fails, you just replace it with a new one. KFS detects the replacement and automatically integrates the newcomer into the cluster.
As another preventive measure against data loss, each block has both a version number and a checksum. KFS evaluates the checksum on each read operation. In case of irregularity, the distributed filesystem deletes the defective chunk and replaces it immediately with an intact copy (re-replication).
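The verify-on-read idea can be sketched in a few lines of Python. This is a toy model, not the KFS implementation: CRC-32 from the standard `zlib` module stands in for whatever checksum KFS actually uses, and the names `store`, `read_verified`, and `CorruptChunkError` are hypothetical.

```python
import zlib


class CorruptChunkError(Exception):
    """Raised when no replica of a chunk passes its checksum."""


def store(data):
    """Create one replica record holding the data and its checksum."""
    return {"data": data, "checksum": zlib.crc32(data)}


def read_verified(replicas):
    """Return the chunk data, repairing defective replicas on the way.

    Each replica is checked against its stored checksum. Defective
    replicas are overwritten with an intact copy, which models the
    re-replication step described in the text.
    """
    good = [r for r in replicas if zlib.crc32(r["data"]) == r["checksum"]]
    bad = [r for r in replicas if zlib.crc32(r["data"]) != r["checksum"]]
    if not good:
        raise CorruptChunkError("all replicas failed verification")
    for replica in bad:
        replica["data"] = good[0]["data"]
        replica["checksum"] = good[0]["checksum"]
    return good[0]["data"]


if __name__ == "__main__":
    first, second = store(b"payload"), store(b"payload")
    first["data"] = b"pAyload"  # simulate a disk error on one replica
    print(read_verified([first, second]))
```

After the call, the damaged replica again holds the same bytes as the intact one.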
Version numbers help to identify obsolete chunks: If a poor connection temporarily separates one server from the cluster, that server can quickly spot its outdated chunks once the connection is reestablished and retrieve the more recent versions from the other servers in the cluster.
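The version comparison amounts to a simple lookup, as the following Python sketch suggests. The function `stale_chunks` and the two dictionaries are assumptions made for this example; how KFS actually exchanges version information between servers is not described in the article.

```python
def stale_chunks(local_versions, cluster_versions):
    """Return the IDs of chunks whose local copy is out of date.

    Both arguments map a chunk ID to a version number. A chunk that the
    cluster does not know about is not reported as stale in this sketch.
    """
    return sorted(
        chunk_id
        for chunk_id, version in local_versions.items()
        if version < cluster_versions.get(chunk_id, version)
    )


if __name__ == "__main__":
    # Hypothetical state of a server that has just rejoined the cluster.
    local = {"c1": 3, "c2": 5, "c3": 1}
    cluster = {"c1": 4, "c2": 5, "c3": 2}
    print(stale_chunks(local, cluster))
```

The returning server would then fetch fresh copies of exactly the chunks in this list.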