nbodykit.io.bigfile module

class nbodykit.io.bigfile.BigFile(path, exclude=None, header='.', dataset='./')[source]

Bases: nbodykit.io.base.FileType

A file object to handle the reading of columns of data from a bigfile file.

bigfile is a reproducible, massively parallel IO library for large, hierarchical datasets, and it is the default format of the FastPM and MP-Gadget simulations.

See also: https://github.com/rainwoodman/bigfile

Parameters:
  • path (str) – the name of the directory holding the bigfile data
  • exclude (list of str, optional) – the data sets to exclude from loading within bigfile; default is the header
  • header (str, optional) – the path to the header
  • dataset (str, optional) – load a specific dataset from the bigfile; default is the root dataset './'
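
For example, a bigfile snapshot can be opened by pointing at its directory; the path below is hypothetical:

>>> from nbodykit.io.bigfile import BigFile
>>> ff = BigFile('output/fastpm_1.0000')  # hypothetical snapshot directory
>>> ff.size, ff.columns  # inspect the number of rows and the available columns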

Attributes

columns A list of the names of the columns in the file.
dtype A numpy.dtype object holding the data types of each column in the file.
ncol The number of data columns in the file.
shape The shape of the file, which defaults to (size, )
size The size of the file, i.e., number of rows

Methods

asarray() Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array.
get_dask(column[, blocksize]) Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called.
keys() Aliased function to return columns.
read(columns, start, stop[, step]) Read the specified column(s) over the given range, as a dictionary.
__getitem__(s)

This function provides numpy-like array indexing of the file object.

It supports:

  1. integer, slice-indexing similar to arrays
  2. string indexing using column names in keys()
  3. array-like indexing using integer lists or boolean arrays

Note

If a single column is being returned, a numpy array holding the data is returned, rather than a structured array with only a single field.
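
As a brief sketch, using the three-column file object from the Examples below, each indexing style looks like:

>>> first = ff[:10]        # slice indexing reads the first ten rows
>>> picks = ff[[0, 5, 9]]  # array-like indexing with a list of integers
>>> ra = ff['ra'][:10]     # a single column is returned as a plain numpy array (see the Note)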

asarray()

Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array

Examples

Start with a file object with three named columns, ra, dec, and z

>>> ff.dtype
dtype([('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')])
>>> ff.shape
(1000,)
>>> ff.columns
['ra', 'dec', 'z']
>>> ff[:3]
array([(235.63442993164062, 59.39099884033203, 0.6225500106811523),
       (140.36181640625, -1.162310004234314, 0.5026500225067139),
       (129.96627807617188, 45.970130920410156, 0.4990200102329254)],
      dtype=(numpy.record, [('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')]))

Select a subset of columns, switch their ordering, and convert the output to a single numpy array

>>> x = ff[['dec', 'ra']].asarray()
>>> x.dtype
dtype('float32')
>>> x.shape
(1000, 2)
>>> x.columns
['dec', 'ra']
>>> x[:3]
array([[  59.39099884,  235.63442993],
       [  -1.16231   ,  140.36181641],
       [  45.97013092,  129.96627808]], dtype=float32)

Now, select only the first column (dec)

>>> dec = x[:,0]
>>> dec[:3]
array([ 59.39099884,  -1.16231   ,  45.97013092], dtype=float32)

Returns: a file object that will return a numpy array with the columns representing the fields
Return type: FileType

columns

A list of the names of the columns in the file.

This defaults to the named fields in the file’s dtype attribute, but may differ from this if a view of the file has been returned with asarray()

dtype

A numpy.dtype object holding the data types of each column in the file.

get_dask(column, blocksize=100000)

Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called

The dask array is chunked into blocks of size blocksize

Parameters:
  • column (str) – the name of the column to return
  • blocksize (int, optional) – the size of the chunks in the dask array
Returns: the dask array holding the column, which encodes the operations needed to read the data but delays evaluating them until the user calls dask.compute()
Return type: dask.array.Array
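
For instance, as a sketch using the ff object from the Examples above, a column can be reduced lazily so that the file is only read when compute() is called:

>>> zcol = ff.get_dask('z', blocksize=50000)
>>> zmean = zcol.mean().compute()  # reading happens here, block by block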

keys()

Aliased function to return columns

logger = <logging.Logger object>
ncol

The number of data columns in the file.

read(columns, start, stop, step=1)[source]

Read the specified column(s) over the given range, as a dictionary

‘start’ and ‘stop’ should be between 0 and size, which is the total size of the binary file (in particles)
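
For instance, a sketch of reading the first 100 rows of two columns, using the ff object from the Examples above:

>>> chunk = ff.read(['ra', 'dec'], 0, 100)
>>> chunk['ra'][:3]  # access the 'ra' column of the result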

shape

The shape of the file, which defaults to (size, )

Multiple dimensions can be introduced into the shape if a view of the file has been returned with asarray()

size

The size of the file, i.e., number of rows