
Blosc/HDF5-Blosc2


Blosc2 filter for HDF5


This is a filter for HDF5 that uses the Blosc2 compressor; by installing this filter, you can read and write HDF5 files with Blosc2-compressed datasets.

Be careful with shuffling when using this filter: do not enable the shuffle filter in HDF5 itself; enable shuffling in Blosc2 instead. Blosc2 performs the shuffle internally using SIMD instructions, which is much faster.
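To show what "shuffle" means here, the following is a minimal scalar sketch of the byte-shuffle transform: it regroups the bytes of fixed-size items so that all first bytes come first, then all second bytes, and so on. The function names shuffle_bytes and unshuffle_bytes are hypothetical and are not part of the Blosc2 API; Blosc2's own implementation is SIMD-accelerated, while this loop is not.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar byte shuffle (hypothetical helper, not the Blosc2 API).
 * For `nelems` items of `typesize` bytes each, write byte 0 of every item
 * first, then byte 1 of every item, and so on. Bytes of equal significance
 * end up adjacent, which often creates long runs that compress well. */
void shuffle_bytes(const uint8_t *src, uint8_t *dst,
                   size_t nelems, size_t typesize)
{
    for (size_t i = 0; i < nelems; i++) {
        for (size_t b = 0; b < typesize; b++) {
            dst[b * nelems + i] = src[i * typesize + b];
        }
    }
}

/* Inverse transform: restore the original item-by-item byte order. */
void unshuffle_bytes(const uint8_t *src, uint8_t *dst,
                     size_t nelems, size_t typesize)
{
    for (size_t i = 0; i < nelems; i++) {
        for (size_t b = 0; b < typesize; b++) {
            dst[i * typesize + b] = src[b * nelems + i];
        }
    }
}
```

For example, shuffling the two 4-byte items {1, 2, 3, 4} and {5, 6, 7, 8} yields the byte sequence 1 5 2 6 3 7 4 8, and unshuffling that sequence restores the original order.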

Installing the Blosc2 filter plugin

Instead of just linking this Blosc2 filter into your HDF5 application, it is possible to install it as a system-wide HDF5 plugin (with HDF5 1.8.11 or later). This is useful because it allows every HDF5-using program on your system to transparently read Blosc2-compressed HDF5 files.

As described in the HDF5 plugin documentation, you just need to compile the Blosc2 plugin into a shared library and copy it to the plugin directory (which defaults to /usr/local/hdf5/lib/plugin on non-Windows systems).

Following the cmake instructions in the Compiling section below produces a libH5Zblosc2.so shared library file (or .dylib/.dll on macOS/Windows) that you can copy to the HDF5 plugin directory.
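Besides the default directory, HDF5 also consults the HDF5_PLUGIN_PATH environment variable (a colon-separated list of directories on non-Windows systems) when looking for plugins. The sketch below shows one way an install script might pick a destination for libH5Zblosc2.so under that assumption. The helper choose_plugin_dir is hypothetical, handles only the non-Windows case, and does not reproduce HDF5's full search logic.

```c
#include <stddef.h>
#include <string.h>

/* Default HDF5 plugin directory on non-Windows systems. */
#define DEFAULT_PLUGIN_DIR "/usr/local/hdf5/lib/plugin"

/* Hypothetical helper: choose where to copy the plugin shared library.
 * `env` is the value of HDF5_PLUGIN_PATH and may be NULL. If it has a
 * non-empty first colon-separated entry, that entry is used; otherwise the
 * default directory is used. Returns 0 on success, -1 if `out` is too small. */
int choose_plugin_dir(const char *env, char *out, size_t outlen)
{
    const char *start = DEFAULT_PLUGIN_DIR;
    size_t len = strlen(DEFAULT_PLUGIN_DIR);

    if (env != NULL) {
        const char *colon = strchr(env, ':');
        size_t first = colon ? (size_t)(colon - env) : strlen(env);
        if (first > 0) {          /* ignore an empty first entry */
            start = env;
            len = first;
        }
    }
    if (len + 1 > outlen) {       /* need room for the terminating NUL */
        return -1;
    }
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

A caller would typically pass getenv("HDF5_PLUGIN_PATH") (from <stdlib.h>) as `env`.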

To write Blosc2-compressed HDF5 files, on the other hand, an HDF5-using program must be modified to enable the Blosc2 filter when writing HDF5 datasets, as described below.

Linking the Blosc2 filter directly into your program

Instead of (or in addition to) installing the Blosc2 plugin system-wide as described above, you can also link the Blosc2 filter directly into your application. This makes the filter available only to your application (not to other HDF5-using programs), but it is useful when installing the plugin is inconvenient. Compile the Blosc2 filter as described in the Compiling section below, but link libblosc2_filter.a (generated by make) directly into your program.

To register Blosc2 in your HDF5 application, call the following function, declared in blosc2_filter.h:

int register_blosc2(char **version, char **date)

Calling this function registers the filter with the HDF5 library and returns information about the Blosc2 release through the version and date arguments (each a char **).

A non-negative return value indicates success. If the registration fails, an error is pushed onto the current error stack and a negative value is returned.
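The following sketch shows one way to call the registration function once at startup and check its result. The wrapper init_blosc2_filter and the typedef blosc2_register_fn are hypothetical; the wrapper takes the registration function as a parameter only so that this sketch compiles without HDF5. In a real program you would pass register_blosc2 from blosc2_filter.h. This README does not say whether the caller owns (and must free) the returned version and date strings, so the sketch does not free them; check blosc2_filter.c before relying on either behavior.

```c
#include <stdio.h>

/* Signature of register_blosc2() as documented in blosc2_filter.h. */
typedef int (*blosc2_register_fn)(char **version, char **date);

/* Hypothetical wrapper: register the filter and log the Blosc2 release info.
 * Returns 0 on success, -1 if registration reported failure (a negative
 * return value, in which case HDF5's error stack holds the details). */
int init_blosc2_filter(blosc2_register_fn reg, FILE *log)
{
    char *version = NULL;
    char *date = NULL;

    if (reg(&version, &date) < 0) {
        fprintf(log, "Blosc2 filter registration failed\n");
        return -1;
    }
    fprintf(log, "Blosc2 version info: %s (%s)\n",
            version ? version : "?", date ? date : "?");
    return 0;
}
```

With the real filter linked in, the call would be init_blosc2_filter(register_blosc2, stderr).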

An example C program ('src/example.c') is included that demonstrates the proper use of the filter.

This filter has been tested against HDF5 versions 1.6.5 through 1.8.10. It is released under the MIT license (see LICENSE.txt for details).

Using the Blosc2 filter in your application

Assuming the filter is installed (either as a system-wide plugin or registered directly in your program, as described above), your application can transparently read HDF5 files with Blosc2-compressed datasets. The HDF5 library detects that a dataset is Blosc2-compressed and invokes the filter automatically.

To write an HDF5 file with a Blosc2-compressed dataset, call the H5Pset_filter function on the creation property list of the dataset you are creating, passing FILTER_BLOSC2 (defined in blosc2_filter.h) as the filter_id parameter. In addition, HDF5 supports compression only for "chunked" datasets, so you must also call H5Pset_chunk to specify a chunk size (e.g. 1 MB chunks); HDF5 then performs the chunking of the dataset I/O transparently.
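H5Pset_chunk takes chunk dimensions in elements, not bytes, so a target such as 1 MB has to be converted. The sketch below does this for a hypothetical 2-D dataset of nrows x ncols elements by choosing how many whole rows fit in the byte budget. The helper chunk_rows is hypothetical, and unsigned long long stands in for HDF5's hsize_t so the sketch compiles without HDF5.

```c
#include <stddef.h>

/* Hypothetical helper: rows per chunk for an nrows x ncols dataset whose
 * elements are elem_size bytes, aiming for chunks of about target_bytes.
 * The result is clamped to [1, nrows]; chunk dimensions must be positive,
 * and a fixed-size dataset gains nothing from chunks taller than itself. */
unsigned long long chunk_rows(unsigned long long nrows,
                              unsigned long long ncols,
                              size_t elem_size,
                              size_t target_bytes)
{
    unsigned long long row_bytes = ncols * (unsigned long long)elem_size;
    unsigned long long rows;

    if (row_bytes == 0 || nrows == 0) {
        return 1;
    }
    rows = target_bytes / row_bytes;   /* whole rows that fit the budget */
    if (rows < 1) {
        rows = 1;                      /* a single row exceeds the budget */
    }
    if (rows > nrows) {
        rows = nrows;
    }
    return rows;
}
```

For a 1000 x 1024 dataset of 8-byte doubles and a 1 MB (1048576-byte) target, each row is 8192 bytes, so this returns 128 rows. The pair {128, 1024} would then serve as the chunk dimensions passed to H5Pset_chunk on the same property list that receives H5Pset_filter with FILTER_BLOSC2.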

Compiling

The filter consists of a single source file, 'src/blosc2_filter.c', and a header, 'src/blosc2_filter.h'; it requires the Blosc2 library to work. The simplest approach is to use the provided cmake build scripts, which compile both the filter and the Blosc2 library and build them into a library for you.

Assuming you have cmake and other standard Unix build tools installed, do:

mkdir build
cd build
cmake ..
make

This generates the library/plugin files required above in the build directory.

Acknowledgments

See THANKS.rst.


Enjoy data!