nbodykit.io.base¶

Functions

find_slice_chunks(index)

A generator to yield (start, stop, step) tuples which will correspond to the input selection index

Classes

FileType(dtype, size)

An abstract base class representing a file object.

class nbodykit.io.base.FileType(dtype, size)[source]¶

An abstract base class representing a file object.

Users should subclass this class and implement the read() function, responsible for reading data from the specific file type.

Attributes

columns: A list of the names of the columns in the file.
dtype: A numpy.dtype object holding the data types of each column in the file.
ncol: The number of data columns in the file.
ndim
shape: The shape of the file, which defaults to (size, )
size: The size of the file, i.e., number of rows

Methods

asarray(): Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array
get_dask(column[, blocksize]): Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called
keys(): Aliased function to return columns
read(columns, start, stop[, step]): Read the specified column(s) over the given range, returning a structured numpy array

__getitem__(s)[source]¶

This function provides numpy-like array indexing of the file object.

It supports:

integer and slice indexing, similar to numpy arrays
string indexing using column names in keys()
array-like indexing using integer lists or boolean arrays

Note

If a single column is being returned, a numpy array holding the data is returned, rather than a structured array with only a single field.
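
The three kinds of index listed above can be told apart with a few type checks. The following is only a sketch of that classification, not the library's implementation; the helper name describe_index and the labels it returns are hypothetical.

```python
def describe_index(s, columns):
    """Return a label for the kind of selection `s` represents (hypothetical helper)."""
    if isinstance(s, str):
        # A single column name must be one of the known columns.
        if s not in columns:
            raise KeyError(s)
        return "single column"
    if isinstance(s, slice):
        return "row slice"
    if isinstance(s, int):
        return "single row"
    items = list(s)
    if items and all(isinstance(i, str) for i in items):
        return "column subset"
    # bool is a subclass of int, so check for a boolean mask first.
    if items and all(isinstance(i, bool) for i in items):
        return "boolean mask"
    if all(isinstance(i, int) for i in items):
        return "integer rows"
    raise TypeError("unsupported index: %r" % (s,))


cols = ['ra', 'dec', 'z']
kinds = [
    describe_index('ra', cols),
    describe_index(slice(0, 3), cols),
    describe_index(['dec', 'ra'], cols),
    describe_index([True, False, True], cols),
    describe_index([0, 5, 7], cols),
]
```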

asarray()[source]¶

Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array

Examples

Start with a file object with three named columns, ra, dec, and z

>>> ff.dtype
dtype([('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')])
>>> ff.shape
(1000,)
>>> ff.columns
['ra', 'dec', 'z']
>>> ff[:3]
array([(235.63442993164062, 59.39099884033203, 0.6225500106811523),
       (140.36181640625, -1.162310004234314, 0.5026500225067139),
       (129.96627807617188, 45.970130920410156, 0.4990200102329254)],
      dtype=(numpy.record, [('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')]))

Select a subset of columns, switch their order, and convert the output to a single numpy array

>>> x = ff[['dec', 'ra']].asarray()
>>> x.dtype
dtype('float32')
>>> x.shape
(1000, 2)
>>> x.columns
['dec', 'ra']
>>> x[:3]
array([[  59.39099884,  235.63442993],
       [  -1.16231   ,  140.36181641],
       [  45.97013092,  129.96627808]], dtype=float32)

Now, select only the first column (dec)

>>> dec = x[:,0]
>>> dec[:3]
array([ 59.39099884,  -1.16231   ,  45.97013092], dtype=float32)

Returns: a file object that will return a numpy array with the columns representing the fields
Return type: FileType
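
For comparison, the stacking that asarray() describes can be reproduced on an in-memory structured array with plain numpy. This is a sketch under the assumption that all selected fields share one data type; the helper stack_fields and the sample values are hypothetical and not part of nbodykit.

```python
import numpy as np

# Hypothetical data standing in for the contents of a file.
data = np.array(
    [(235.63, 59.39, 0.62), (140.36, -1.16, 0.50), (129.97, 45.97, 0.50)],
    dtype=[('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')],
)


def stack_fields(arr, columns):
    """Stack the named fields of a structured array into columns of a 2D array."""
    return np.stack([arr[c] for c in columns], axis=1)


# Select two fields, in swapped order, as a single (nrows, 2) array.
x = stack_fields(data, ['dec', 'ra'])
```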

property columns¶

A list of the names of the columns in the file.

This defaults to the named fields in the file’s dtype attribute, but can differ from this if a view of the file has been returned with asarray()

property dtype¶: A numpy.dtype object holding the data types of each column in the file.

get_dask(column, blocksize=None)[source]¶

Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called

The dask array is chunked into blocks of size blocksize

Parameters

column (str) – the name of the column to return
blocksize (int, optional) – the size of the chunks in the dask array

Returns

the dask array holding the column, which records the operations needed to read the data but delays evaluating them until the user explicitly requests it

Return type

dask.array.Array
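
To illustrate what chunking by blocksize means, the sketch below splits a file of a given size into consecutive row ranges. It assumes, without confirmation from the source, that each block corresponds to one deferred read over a (start, stop) range; the helper block_ranges is hypothetical.

```python
def block_ranges(size, blocksize):
    """Return the (start, stop) row range of each block (hypothetical helper)."""
    return [
        (start, min(start + blocksize, size))
        for start in range(0, size, blocksize)
    ]


# A file with 1000 rows and a block size of 300 gives four blocks,
# the last one shorter than the others.
ranges = block_ranges(1000, 300)
```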

keys()[source]¶: Aliased function to return columns

property ncol¶: The number of data columns in the file.

abstract read(columns, start, stop, step=1)[source]¶

Read the specified column(s) over the given range, returning a structured numpy array

Parameters

columns (str, list of str) – the name of the column(s) to return
start (int) – the row integer to start reading at
stop (int) – the row integer to stop reading at
step (int, optional) – the step size to use when reading; default is 1

Returns

data – a numpy structured array holding the requested data

Return type

array_like
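
The contract of read() can be sketched with a stand-in class that serves data from memory. This is a minimal illustration, not a real FileType subclass: the class InMemoryFile is hypothetical and does not inherit from nbodykit, so it only mirrors the signature and return value described above.

```python
import numpy as np


class InMemoryFile:
    """Hypothetical stand-in that follows the read() contract described above."""

    def __init__(self, data):
        self._data = data        # a structured numpy array
        self.dtype = data.dtype
        self.size = len(data)

    def read(self, columns, start, stop, step=1):
        # Accept a single column name as well as a list of names.
        if isinstance(columns, str):
            columns = [columns]
        rows = self._data[start:stop:step]
        # Build a new structured array holding only the requested fields.
        out = np.empty(len(rows), dtype=[(c, self.dtype[c]) for c in columns])
        for c in columns:
            out[c] = rows[c]
        return out


data = np.zeros(10, dtype=[('ra', 'f4'), ('dec', 'f4'), ('z', 'f4')])
data['ra'] = np.arange(10)
ff = InMemoryFile(data)

# Rows 2, 4 and 6 of the 'ra' and 'z' columns.
chunk = ff.read(['ra', 'z'], 2, 8, step=2)
```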

property shape¶

The shape of the file, which defaults to (size, )

Multiple dimensions can be introduced into the shape if a view of the file has been returned with asarray()

property size¶: The size of the file, i.e., number of rows

nbodykit.io.base.find_slice_chunks(index)[source]¶

A generator to yield (start, stop, step) tuples which will correspond to the input selection index

index can be either a boolean index, or a list of integers specifying the rows to include

Parameters: index (array_like) – either a boolean array, indicating which rows to select, or integers specifying which rows to include
Yields: (start, stop, step) (tuple of int) – the slice integers to read, corresponding to a valid part of the selection index
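
The idea of turning a selection index into slice tuples can be sketched in pure Python. The generator below is not the library's implementation: it is a simplified illustration that only merges consecutive rows into runs with a step of 1, and the name slice_chunks_sketch is hypothetical.

```python
def slice_chunks_sketch(index):
    """Yield (start, stop, step) tuples covering the selected rows."""
    index = list(index)
    # A boolean mask is first converted to the positions of its True entries.
    if index and all(isinstance(i, bool) for i in index):
        index = [pos for pos, keep in enumerate(index) if keep]
    if not index:
        return
    start = prev = index[0]
    for i in index[1:]:
        # A gap ends the current run of consecutive rows.
        if i != prev + 1:
            yield (start, prev + 1, 1)
            start = i
        prev = i
    yield (start, prev + 1, 1)


from_rows = list(slice_chunks_sketch([0, 1, 2, 5, 6, 9]))
from_mask = list(slice_chunks_sketch([True, True, False, True]))
```

Each yielded tuple can be passed on as the start, stop, and step arguments of a range read, so a scattered selection becomes a small number of contiguous reads.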