nbodykit.io.hdf module

class nbodykit.io.hdf.ColumnInfo(size, dtype, dset)

Bases: tuple

Attributes

`dset`	Alias for field number 2
`dtype`	Alias for field number 1
`size`	Alias for field number 0

Methods

`count`(value)	Return the number of occurrences of value.
`index`(value[, start[, stop]])	Return the first index of value; raises ValueError if the value is not present.

__getnewargs__(): Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, size, dtype, dset): Create new instance of ColumnInfo(size, dtype, dset)

__repr__(): Return a nicely formatted representation string

count(value) → integer: Return the number of occurrences of value.

dset: Alias for field number 2

dtype: Alias for field number 1

index(value[, start[, stop]]) → integer: Return the first index of value. Raises ValueError if the value is not present.

size: Alias for field number 0
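
Because ColumnInfo is a named tuple, its fields can be read by name or by position, and the inherited tuple methods operate on the field values. The following is a minimal sketch that rebuilds an equivalent named tuple with the standard library instead of importing nbodykit; the field values (row count, dtype string, dataset path) are hypothetical placeholders.

```python
from collections import namedtuple

# Stand-in with the same field order as the documented class (size, dtype, dset)
ColumnInfo = namedtuple("ColumnInfo", ["size", "dtype", "dset"])

# Hypothetical values: 1000 rows of 32-bit floats in a dataset at "/Position"
info = ColumnInfo(size=1000, dtype="f4", dset="/Position")

# Fields are reachable by name or by position
by_name = (info.size, info.dtype, info.dset)
by_index = (info[0], info[1], info[2])

# The inherited tuple methods search the field values
position_of_dtype = info.index("f4")  # field number of the first match
occurrences = info.count(1000)        # how many fields equal 1000
```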

class nbodykit.io.hdf.HDFFile(path, root='/', exclude=[])

Bases: nbodykit.io.base.FileType

A file object to handle the reading of columns of data from a h5py HDF5 file.

See http://docs.h5py.org for documentation on h5py.

Parameters:
path (str) – the file path to load
root (str, optional) – the start path in the HDF file, loading all data below this path
exclude (list of str, optional) – list of path names to exclude; these can be absolute paths, or paths relative to `root`
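
To illustrate the two forms that entries in exclude may take, here is a small sketch that turns each entry into an absolute HDF5 path. The helper name resolve_exclude and the dataset names are hypothetical, and this is not necessarily how nbodykit resolves the paths internally.

```python
import posixpath

def resolve_exclude(root, exclude):
    """Hypothetical helper: express every excluded name as an absolute path."""
    resolved = []
    for name in exclude:
        if name.startswith("/"):
            # already absolute: use as given
            resolved.append(posixpath.normpath(name))
        else:
            # relative: interpret with respect to the root path
            resolved.append(posixpath.normpath(posixpath.join(root, name)))
    return resolved

# One relative entry and one absolute entry (hypothetical names)
paths = resolve_exclude("/PartType1", ["Velocity", "/Header"])
```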

Attributes

`columns`	A list of the names of the columns in the file.
`dtype`	A `numpy.dtype` object holding the data types of each column in the file.
`ncol`	The number of data columns in the file.
`shape`	The shape of the file, which defaults to `(size, )`
`size`	The size of the file, i.e., number of rows

Methods

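`asarray`()	Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array
`get_dask`(column[, blocksize])	Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called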
`keys`()	Aliased function to return `columns`
`read`(columns, start, stop[, step])	Read the specified column(s) over the given range

__getitem__(s)

This function provides numpy-like array indexing of the file object.

It supports:

integer and slice indexing, similar to arrays
string indexing using column names in keys()
array-like indexing using integer lists or boolean arrays

Note

If a single column is being returned, a numpy array holding the data is returned, rather than a structured array with only a single field.
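
The indexing forms listed above mirror those of a numpy structured array. The sketch below uses a plain in-memory structured array as a stand-in for the file object (an assumption made so the example runs without an HDF5 file); the column names follow the example used elsewhere on this page.

```python
import numpy as np

# In-memory stand-in for a file with three float columns
data = np.zeros(5, dtype=[("ra", "f4"), ("dec", "f4"), ("z", "f4")])
data["ra"] = np.arange(5)

first_row = data[0]            # integer index: a single record
first_three = data[:3]         # slice: the first three rows
ra = data["ra"]                # one column name: a plain (unstructured) array
picked = data[[0, 2, 4]]       # list of integers: selected rows
masked = data[data["ra"] > 2]  # boolean array: rows where the mask is True
```

Note that selecting the single column "ra" gives an array whose dtype has no named fields, which matches the behavior described in the note above.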

asarray()

Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array.

Examples

Start with a file object with three named columns, ra, dec, and z

>>> ff.dtype
dtype([('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')])
>>> ff.shape
(1000,)
>>> ff.columns
['ra', 'dec', 'z']
>>> ff[:3]
array([(235.63442993164062, 59.39099884033203, 0.6225500106811523),
       (140.36181640625, -1.162310004234314, 0.5026500225067139),
       (129.96627807617188, 45.970130920410156, 0.4990200102329254)],
      dtype=(numpy.record, [('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')]))

Select a subset of columns, switch their order, and convert the output to a single numpy array

>>> x = ff[['dec', 'ra']].asarray()
>>> x.dtype
dtype('float32')
>>> x.shape
(1000, 2)
>>> x.columns
['dec', 'ra']
>>> x[:3]
array([[  59.39099884,  235.63442993],
       [  -1.16231   ,  140.36181641],
       [  45.97013092,  129.96627808]], dtype=float32)

Now, select only the first column (dec)

>>> dec = x[:,0]
>>> dec[:3]
array([ 59.39099884,  -1.16231   ,  45.97013092], dtype=float32)

Returns:	a file object that will return a numpy array with the columns representing the fields
Return type:	FileType

columns

A list of the names of the columns in the file.

This defaults to the named fields in the file’s dtype attribute, but may differ from this if a view of the file has been returned with asarray().

dtype: A numpy.dtype object holding the data types of each column in the file.

get_dask(column, blocksize=100000)

Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called.

The dask array is chunked into blocks of size blocksize.

Parameters:
column (str) – the name of the column to return
blocksize (int, optional) – the size of the chunks in the dask array
Returns:	the dask array holding the column; it records the operations needed to read the data but delays evaluating them until the user requests the result
Return type:	`dask.array.Array`
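
To make the role of blocksize concrete, the sketch below computes the row ranges that blocks of a given size would cover. It does not use dask; chunk_bounds is a hypothetical helper, and the assumption is that each block corresponds to one delayed read of that row range.

```python
def chunk_bounds(size, blocksize=100000):
    """Hypothetical helper: (start, stop) row ranges covering `size` rows."""
    return [
        (start, min(start + blocksize, size))
        for start in range(0, size, blocksize)
    ]

# A column of 250000 rows with the default blocksize gives three blocks,
# the last of which is shorter than the others
bounds = chunk_bounds(250000)
```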

keys(): Aliased function to return columns

logger = <logging.Logger object>

ncol: The number of data columns in the file.

read(columns, start, stop, step=1)

Read the specified column(s) over the given range.

‘start’ and ‘stop’ should be between 0 and size, which is the total size of the file.

Parameters:
columns (str, list of str) – the name of the column(s) to return
start (int) – the row integer to start reading at
stop (int) – the row integer to stop reading at
step (int, optional) – the step size to use when reading; default is 1
Returns:	structured array holding the requested columns over the specified range of rows
Return type:	numpy.ndarray
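
The sketch below shows which rows a given (start, stop, step) combination would select. It assumes the range follows Python slice semantics, with stop exclusive; rows_to_read is a hypothetical helper, and raising ValueError for out-of-range bounds is an illustration rather than documented library behavior.

```python
def rows_to_read(size, start, stop, step=1):
    """Hypothetical helper: row numbers selected by (start, stop, step)."""
    # Assumption: bounds outside [0, size] are rejected
    if not (0 <= start <= size and 0 <= stop <= size):
        raise ValueError("start and stop must lie between 0 and size")
    # Assumption: stop is exclusive, as in a Python slice
    return list(range(start, stop, step))

# Every third row between rows 10 and 20 of a 1000-row file
rows = rows_to_read(size=1000, start=10, stop=20, step=3)
```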

shape

The shape of the file, which defaults to (size, )

Multiple dimensions can be introduced into the shape if a view of the file has been returned with asarray()

size: The size of the file, i.e., number of rows

nbodykit.io.hdf.find_datasets(info, attrs, name, obj)

Recursively add a ColumnInfo named tuple to the info dict if obj is a Dataset.

When obj is a structured array with named fields, a ColumnInfo tuple will be added for each of the named fields.
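
The traversal described above can be sketched without an HDF5 file by letting nested dicts stand in for groups and numpy arrays stand in for datasets. Everything here is an assumption made for illustration: collect_columns is a hypothetical function, it omits the attrs argument, and the key naming scheme (dataset path plus field name) may differ from what nbodykit actually stores.

```python
from collections import namedtuple
import numpy as np

ColumnInfo = namedtuple("ColumnInfo", ["size", "dtype", "dset"])

def collect_columns(info, name, obj):
    """Hypothetical stand-in: dicts act as groups, arrays act as datasets."""
    if isinstance(obj, dict):
        # group: descend into each child
        for key, child in obj.items():
            collect_columns(info, name + "/" + key, child)
    elif obj.dtype.names is not None:
        # structured dataset: one entry per named field
        for field in obj.dtype.names:
            info[name + "/" + field] = ColumnInfo(len(obj), obj.dtype[field], obj)
    else:
        # plain dataset: a single entry
        info[name] = ColumnInfo(len(obj), obj.dtype, obj)

# Hypothetical layout: one plain dataset and one structured dataset in a group
tree = {
    "Mass": np.zeros(4, dtype="f4"),
    "Halos": {"Catalog": np.zeros(3, dtype=[("x", "f4"), ("id", "i8")])},
}
info = {}
collect_columns(info, "", tree)
```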