The Kosmos distributed FS
Distributed filesystems effortlessly juggle enormous files in the gigabyte and terabyte ranges. The Kosmos filesystem plans to impress its competitors.
Modern computer programs handle increasingly large volumes of data. Whereas data-mining applications are content to sift through mountains of existing data, Internet search engines constantly horde new information. Users who access this data regularly encounter files of several gigabytes or more.
Legacy filesystems soon reach their limits with this kind of data and throughput. Consequently, organizations that manage huge volumes of data need an alternative solution for fast and safe access. Having redundant data storage is useful; after all, who wants to lose the valuable data gained by several days of number crunching because of a banal disk error?
Distributed filesystems fulfill these requirements. A distributed filesystem splits the data into manageable chunks and stores the chunks on a scalable cluster of computers. By virtualizing storage on the cluster, the filesystem then tricks applications into believing that they are talking to an enormous hard disk.
The Kosmos filesystem (KFS)  is a promising new entry into this field. Kosmix Corporation developed KFS and released the source code under the Apache license. The first alpha version 0.1 appeared in September 2007. KFS's relative youth shows when setting up the filesystem: KFS requires 64-bit Linux. If possible, the Linux version and distribution should be identical on all the computers involved in data storage.
KFS is up against a number of renowned competitors, including Google filesystem (GFS), which Google uses as the underpinnings for its search engine, and Hadoop project's HDFS . The KFS developers lifted much of the structure and functionality from Google, but they have removed a number of limitations. KFS – like GFS – is optimized for scenarios in which many large files are created once but read many times .
The Kosmos filesystem consists of three components:
- one or multiple chunk servers that store the data on their own hard disks,
- a metaserver that keeps an eye on the chunk servers, and
- an application that quickly gets rid of a single large file.
KFS thus works much like a database that resides between a computer program and the traditional filesystem (see Figure 1).
KFS first splits a file into handy 64MB blocks. The filesystem distributes these chunks evenly over all attached servers, aptly referred to as block or chunk servers. The servers store the blocks on normal filesystems that belong to the host operating systems.
If the chunk servers start to run out of storage capacity, the administrator can simply add a new computer to the cluster. KFS automatically adapts the new storage node, which keeps the whole system scalable and helps it keep pace with increasing storage demands.
KFS mitigates hardware errors by storing the blocks from every single file redundantly on multiple chunk servers; typically, three instances of each file placed in storage exist.
This safety net allows administrators to deploy standard PCs as cheap, but reliable, data repositories. Google FS proves that this works day after day. If a disk or server fails, you just replace it with a new one. KFS detects the replacement and automatically integrates the newcomer into the cluster.
As another preventive measure against data loss, each block has both a version number and a checksum. KFS evaluates the checksum on each read operation. In case of irregularity, the distributed filesystem deletes the defective chunk and replaces it immediately with an intact copy (re-replication).
Version numbers help to identify obsolete chunks: If a poor Internet connection temporarily separates one server from the cluster, it can identify obsolete chunks quickly when the connection is reestablished and retrieve the more recent variant from the other servers in the cluster.
Version 16 of the popular Linux desktop reveals new tools, edge-snapping, and performance improvements.
Symantec says Linux-Darlioz burrows in through PHP.
Dell renews its quest for the ultimate developer machine.
Innovative back door looks like normal SSH traffic.
One of CeBITs most successful forums opens the new year with a new name. The popular Open Source Forum continues in 2014 under the name Special Conference: Open Source. This year, the forum will be bigger and offer a wider range of possibilities for sponsors.
New release offers better graphics drivers and expands filesystem support.
New mail protocol will shut out the NSA and prevent snooping on metadata.
A new web application helps users visualize distributed denial-of-service attacks.
Ubuntu 13.10 takes a step toward convergence, with lots of mobility, but Mir only partly here.
Galileo board is targeted to embedded developers and educational institutions.