HDF5 for efficient I/O
Fast Containers

© Lead Image © Kirsty Pargeter, 123RF.com
HDF5 is a flexible, self-describing, and portable hierarchical filesystem supported by a number of languages and tools, with the ability to run processes in parallel.
Input/output operations are a very important part of many applications, sometimes involving a huge amount of data and a large number of reads and writes. Therefore, applications can use a very significant portion of their total run time to perform I/O, which becomes critical in Big Data, machine learning, and high-performance computing (HPC).
In a previous article [1], I discussed options for improving I/O performance, focusing on parallel I/O. One of the options mentioned was to use a high-level library to perform the I/O. A great example of such a library is the Hierarchical Data Format (HDF) [2], a standard library used primarily for scientific computing.
In this article, I introduce HDF5 and focus on the concepts and its strengths in performing I/O; then, I look at some simple Python and Fortran code examples, before ending with an example of parallel I/O with HDF5 and Fortran.
[...]
Buy this article as PDF
(incl. VAT)