Quick Start Guide


With Anaconda or Miniconda:

conda install h5py

With Enthought Canopy, use the GUI package manager or:

enpkg h5py

With pip or setup.py, see Installation.

Core concepts

An HDF5 file is a container for two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers that hold datasets and other groups. The most fundamental thing to remember when using h5py is:

Groups work like dictionaries, and datasets work like NumPy arrays

The very first thing you’ll need to do is create a new file:

>>> import h5py
>>> import numpy as np
>>> f = h5py.File("mytestfile.hdf5", "w")

The File object is your starting point. It has a couple of methods which look interesting. One of them is create_dataset:

>>> dset = f.create_dataset("mydataset", (100,), dtype='i')

The object we created isn’t an array, but an HDF5 dataset. Like NumPy arrays, datasets have both a shape and a data type:

>>> dset.shape
>>> dset.dtype

They also support array-style slicing. This is how you read and write data from a dataset in the file:

>>> dset[...] = np.arange(100)
>>> dset[0]
>>> dset[10]
>>> dset[0:100:10]
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

For more, see File Objects and Datasets.

Groups and hierarchical organization

“HDF” stands for “Hierarchical Data Format”. Every object in an HDF5 file has a name, and they’re arranged in a POSIX-style hierarchy with /-separators:

>>> dset.name

The “folders” in this system are called groups. The File object we created is itself a group, in this case the root group, named /:

>>> f.name

Creating a subgroup is accomplished via the aptly-named create_group:

>>> grp = f.create_group("subgroup")

All Group objects also have the create_* methods like File:

>>> dset2 = grp.create_dataset("another_dataset", (50,), dtype='f')
>>> dset2.name

By the way, you don’t have to create all the intermediate groups manually. Specifying a full path works just fine:

>>> dset3 = f.create_dataset('subgroup2/dataset_three', (10,), dtype='i')
>>> dset3.name

Groups support most of the Python dictionary-style interface. You retrieve objects in the file using the item-retrieval syntax:

>>> dataset_three = f['subgroup2/dataset_three']

Iterating over a group provides the names of its members:

>>> for name in f:
...     print name

Containership testing also uses names:

>>> "mydataset" in f
>>> "somethingelse" in f

You can even use full path names:

>>> "subgroup/another_dataset" in f

There are also the familiar keys(), values(), items() and iter() methods, as well as get().

Since iterating over a group only yields its directly-attached members, iterating over an entire file is accomplished with the Group methods visit() and visititems(), which take a callable:

>>> def printname(name):
...     print name
>>> f.visit(printname)

For more, see Groups.


One of the best features of HDF5 is that you can store metadata right next to the data it describes. All groups and datasets support attached named bits of data called attributes.

Attributes are accessed through the attrs proxy object, which again implements the dictionary interface:

>>> dset.attrs['temperature'] = 99.5
>>> dset.attrs['temperature']
>>> 'temperature' in dset.attrs

For more, see Attributes.