nbodykit.io.stack module

class nbodykit.io.stack.FileStack(filetype, path, *args, **kwargs)

Bases: nbodykit.io.base.FileType

A file object that offers a continuous view of a stack of instances of a FileType subclass.

This allows data to be accessed across multiple files from a single file object. The “stack” is a concatenation of one file to the end of the previous file.
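The concatenation described above implies a mapping from a global row index to a particular file and a row within it. The following is a minimal sketch of that bookkeeping, assuming only that the number of rows in each file is known. The helper `locate_row` is hypothetical and is not part of nbodykit.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_row(sizes, row):
    """Map a global row index onto (file index, local row index).

    `sizes` holds the number of rows in each file, in stack order.
    Hypothetical helper for illustration; not part of nbodykit.
    """
    # offsets[i] is the global index of the first row of file i;
    # the final entry is the total number of rows in the stack
    offsets = [0] + list(accumulate(sizes))
    if not 0 <= row < offsets[-1]:
        raise IndexError("row %d is outside the stack" % row)
    # the last offset that is <= row identifies the file holding it
    filenum = bisect_right(offsets, row) - 1
    return filenum, row - offsets[filenum]


sizes = [100, 50, 200]          # three files, 350 rows in total
print(locate_row(sizes, 0))     # first row of the first file
print(locate_row(sizes, 120))   # falls inside the second file
print(locate_row(sizes, 349))   # last row of the last file
```

Here global row 120 resolves to local row 20 of the second file, because the first file contributes rows 0 through 99.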

Parameters:
filetype (subclass of `FileType`) – the type of file class to initialize
path (str or list of str) – a list of file names, or a string that either names a single file or contains a glob-like ‘*’ pattern
*args – additional arguments to pass to the `filetype` instance during initialization
**kwargs – additional keyword arguments to pass to the `filetype` instance during initialization

Attributes

`attrs`	Dictionary of meta-data for the stack
`columns`	A list of the names of the columns in the file.
`dtype`	A `numpy.dtype` object holding the data types of each column in the file.
`ncol`	The number of data columns in the file.
`nfiles`	The number of files in the FileStack
`shape`	The shape of the file, which defaults to `(size, )`
`size`	The size of the file, i.e., number of rows

Methods

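`asarray`()	Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array
`get_dask`(column[, blocksize])	Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called
`keys`()	Aliased function to return `columns`
`read`(columns, start, stop[, step])	Read the specified column(s) over the given range, returning a structured numpy array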

__getitem__(s)

This function provides numpy-like array indexing of the file object.

It supports:

integer and slice indexing, similar to arrays
string indexing using column names in keys()
array-like indexing using integer lists or boolean arrays

Note

If a single column is being returned, a numpy array holding the data is returned, rather than a structured array with only a single field.
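The indexing modes listed above mirror those of a numpy structured array. The sketch below uses a plain in-memory numpy array as a stand-in, so it illustrates the numpy semantics that the file object is described as following rather than nbodykit itself. The array `data` and its values are made up for illustration.

```python
import numpy as np

# In-memory stand-in for a file with three named columns (illustrative values)
data = np.zeros(5, dtype=[('ra', 'f4'), ('dec', 'f4'), ('z', 'f4')])
data['ra'] = np.arange(5)
data['dec'] = np.arange(5) * 10
data['z'] = np.arange(5) * 0.1

first = data[0]                 # integer index: a single row
head = data[:3]                 # slice: the first three rows
ra = data['ra']                 # single column name: a plain float32 array
sub = data[['dec', 'ra']]       # list of names: structured array with two fields
picked = data[[0, 2, 4]]        # integer list: the selected rows
masked = data[data['ra'] > 2]   # boolean mask: rows where ra > 2

print(ra.dtype, ra.shape)
print(sub.dtype.names)
print(len(picked), len(masked))
```

Note that `ra` has no named fields, which corresponds to the single-column case described in the note above.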

asarray()

Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array.

Examples

Start with a file object with three named columns, ra, dec, and z

>>> ff.dtype
dtype([('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')])
>>> ff.shape
(1000,)
>>> ff.columns
['ra', 'dec', 'z']
>>> ff[:3]
array([(235.63442993164062, 59.39099884033203, 0.6225500106811523),
       (140.36181640625, -1.162310004234314, 0.5026500225067139),
       (129.96627807617188, 45.970130920410156, 0.4990200102329254)],
      dtype=(numpy.record, [('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')]))

Select a subset of columns in a different order, and convert the output to a single numpy array

>>> x = ff[['dec', 'ra']].asarray()
>>> x.dtype
dtype('float32')
>>> x.shape
(1000, 2)
>>> x.columns
['dec', 'ra']
>>> x[:3]
array([[  59.39099884,  235.63442993],
       [  -1.16231   ,  140.36181641],
       [  45.97013092,  129.96627808]], dtype=float32)

Now, select only the first column (dec)

>>> dec = x[:,0]
>>> dec[:3]
array([ 59.39099884,  -1.16231   ,  45.97013092], dtype=float32)

Returns:	a file object that will return a numpy array with the columns representing the fields
Return type:	FileType
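For comparison, the same kind of conversion can be reproduced on an ordinary in-memory structured array with numpy alone. This is only a sketch of the resulting layout, using `numpy.lib.recfunctions.structured_to_unstructured` on made-up values; it is not how `asarray()` is implemented.

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Small structured array with the same three columns (illustrative values)
data = np.array(
    [(235.6, 59.4, 0.62), (140.4, -1.2, 0.50), (130.0, 46.0, 0.50)],
    dtype=[('ra', 'f4'), ('dec', 'f4'), ('z', 'f4')],
)

# Pick two fields in a new order, then stack them as columns of one 2D array
x = structured_to_unstructured(data[['dec', 'ra']])

print(x.dtype)   # all selected fields share one dtype
print(x.shape)   # one row per record, one column per selected field
dec = x[:, 0]    # the first column holds the 'dec' values
```

Stacking fields this way only makes sense when the selected columns share a compatible data type, as the three `<f4` columns do here.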

attrs: Dictionary of meta-data for the stack

columns

A list of the names of the columns in the file.

This defaults to the named fields in the file’s dtype attribute, but may differ if a view of the file has been returned with asarray().

dtype: A numpy.dtype object holding the data types of each column in the file.

get_dask(column, blocksize=100000)

Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called.

The dask array is chunked into blocks of size blocksize.

Parameters:
column (str) – the name of the column to return
blocksize (int, optional) – the size of the chunks in the dask array
Returns:	the dask array holding the column; it encodes the operations needed to read the data but defers evaluation until the user explicitly requests it
Return type:	`dask.array.Array`
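To make the role of blocksize concrete, the sketch below computes the row range that each block would cover if the rows are split into consecutive blocks of at most blocksize rows, with a shorter final block. That splitting rule is an assumption based on how uniform chunking usually works, and `block_ranges` is a hypothetical helper, not an nbodykit or dask function.

```python
def block_ranges(size, blocksize=100000):
    """Return the (start, stop) row range covered by each block.

    Assumes consecutive blocks of `blocksize` rows, with the last block
    holding whatever remains. Hypothetical helper for illustration only.
    """
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    return [(start, min(start + blocksize, size))
            for start in range(0, size, blocksize)]


# A column of 250000 rows with the default blocksize yields three blocks,
# the last of which holds the remaining 50000 rows
for start, stop in block_ranges(250000):
    print(start, stop)
```

Under this assumption, each block corresponds to one deferred read of the rows from start to stop, and nothing is loaded until the result is computed.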

keys(): Aliased function to return columns

logger = <logging.Logger object>

ncol: The number of data columns in the file.

nfiles: The number of files in the FileStack

read(columns, start, stop, step=1)

Read the specified column(s) over the given range, returning a structured numpy array.

Parameters:
columns (str, list of str) – the name of the column(s) to return
start (int) – the row index to start reading at
stop (int) – the row index to stop reading at
step (int, optional) – the step size to use when reading; default is 1
Returns:	data – a numpy structured array holding the requested data
Return type:	array_like
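Reading a stepped range from a stack requires the step to stay aligned when the range crosses a file boundary. The following is a minimal sketch of that logic, using a list of in-memory structured arrays in place of files. The function `read_stacked` and the sample data are hypothetical and do not reproduce the nbodykit implementation.

```python
import numpy as np


def read_stacked(arrays, columns, start, stop, step=1):
    """Read rows start:stop:step from the concatenation of `arrays`.

    `arrays` is a list of structured arrays sharing one dtype, standing in
    for the files of a stack. Hypothetical helper for illustration only.
    """
    if isinstance(columns, str):
        columns = [columns]
    pieces = []
    offset = 0  # global index of the first row of the current array
    for arr in arrays:
        n = len(arr)
        # part of [start, stop) that falls in this array, in local indices
        lo = max(start - offset, 0)
        hi = min(stop - offset, n)
        if lo < hi:
            # advance lo so its global index is start plus a multiple of step
            lo += (-(offset + lo - start)) % step
            pieces.append(arr[lo:hi:step])
        offset += n
    if not pieces:
        # nothing selected: return an empty array with the requested fields
        return arrays[0][:0][columns]
    # join the per-array pieces, then keep only the requested columns
    return np.concatenate(pieces)[columns]


# Twelve rows split over three "files" of 4, 3, and 5 rows
full = np.zeros(12, dtype=[('id', 'i8'), ('z', 'f4')])
full['id'] = np.arange(12)
arrays = [full[:4], full[4:7], full[7:]]

out = read_stacked(arrays, 'id', 1, 11, step=3)
print(out['id'])
```

The result matches what slicing the fully concatenated data with `full[1:11:3]` would give, even though the selected rows come from all three arrays.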

shape

The shape of the file, which defaults to (size, ).

Multiple dimensions can be introduced into the shape if a view of the file has been returned with asarray().

size: The size of the file, i.e., number of rows