nbodykit.io.stack
Classes
FileStack(filetype, path, *args, **kwargs) – A file object that offers a continuous view of a stack of instances of FileType subclasses.
class nbodykit.io.stack.FileStack(filetype, path, *args, **kwargs)
A file object that offers a continuous view of a stack of instances of FileType subclasses.
This allows data to be accessed across multiple files from a single file object. The “stack” is formed by appending each file to the end of the previous one.
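A continuous view over several files needs some bookkeeping to translate a global row number into a file and a row within that file. The sketch below shows one way to do this, assuming only that each file reports its number of rows. The helpers `stack_offsets` and `locate` are hypothetical names for illustration and are not part of nbodykit.

```python
from bisect import bisect_right
from itertools import accumulate


def stack_offsets(sizes):
    """Return the first global row of each file, followed by the total size.

    Hypothetical helper: offsets[i] is where file i starts in the stack,
    and offsets[-1] is the size of the whole stack.
    """
    return [0] + list(accumulate(sizes))


def locate(offsets, index):
    """Map a global row index to (file number, row within that file)."""
    size = offsets[-1]
    if index < 0:
        index += size  # support negative indices, as numpy does
    if not 0 <= index < size:
        raise IndexError("index %d out of range for stack of size %d" % (index, size))
    # bisect_right skips over empty files that share the same offset
    ifile = bisect_right(offsets, index) - 1
    return ifile, index - offsets[ifile]


# three hypothetical files with 100, 250 and 50 rows
offsets = stack_offsets([100, 250, 50])
print(offsets[-1])              # 400
print(locate(offsets, 100))     # (1, 0)
print(locate(offsets, 399))     # (2, 49)
```

Using `bisect_right` keeps each lookup logarithmic in the number of files, which matters little for a handful of files but keeps the mapping cheap for large glob patterns.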
- Parameters
filetype (subclass of FileType) – the type of file class to initialize
path (str or list of str) – a list of file names, or a string specifying a single file or containing a glob-like ‘*’ pattern
*args – additional arguments to pass to the filetype instance during initialization
**kwargs – additional keyword arguments to pass to the filetype instance during initialization
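The path argument accepts three forms. The sketch below shows how they could be normalized into a list of file names with the standard `glob` module. The helper name `resolve_paths` is hypothetical, and sorting the glob matches (so the stacking order is deterministic) is an assumption rather than documented FileStack behavior.

```python
import glob


def resolve_paths(path):
    """Hypothetical helper: turn a ``path`` argument into a list of file names."""
    if isinstance(path, str):
        if '*' in path:
            # expand the glob-like pattern; sorting (an assumption) fixes the stack order
            filenames = sorted(glob.glob(path))
        else:
            # a single file name
            filenames = [path]
    else:
        # an explicit list of file names, stacked in the order given
        filenames = list(path)
    if not filenames:
        raise ValueError("no files found for path %r" % (path,))
    return filenames


print(resolve_paths(['part1.bin', 'part0.bin']))  # ['part1.bin', 'part0.bin']
print(resolve_paths('single.bin'))                # ['single.bin']
```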
- Attributes
attrs – Dictionary of meta-data for the stack
columns – A list of the names of the columns in the file.
dtype – A numpy.dtype object holding the data types of each column in the file.
ncol – The number of data columns in the file.
ndim
nfiles – The number of files in the FileStack
shape – The shape of the file, which defaults to (size, )
size – The size of the file, i.e., the number of rows
Methods
asarray(self) – Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array
get_dask(self, column[, blocksize]) – Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called
keys(self) – Aliased function to return columns
read(self, columns, start, stop[, step]) – Read the specified column(s) over the given range, returning a structured numpy array
__getitem__(self, s)
This function provides numpy-like array indexing of the file object. It supports:
- integer and slice indexing, similar to arrays
- string indexing using the column names in keys()
- array-like indexing using integer lists or boolean arrays
Note
If a single column is requested, a plain numpy array holding the data is returned rather than a structured array with a single field.
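Because the indexing is numpy-like, its behavior can be previewed with an ordinary numpy structured array. The sketch below uses a small in-memory array as a stand-in for the file object (an assumption made so it runs without any files) and exercises each supported form, including the single-column case from the note above.

```python
import numpy as np

# in-memory stand-in for a file with three named float32 columns
data = np.zeros(5, dtype=[('ra', 'f4'), ('dec', 'f4'), ('z', 'f4')])
data['ra'] = np.arange(5)
data['dec'] = np.arange(5) * 10
data['z'] = np.linspace(0.1, 0.5, 5)

row = data[2]                   # integer index: a single record
first3 = data[:3]               # slice: structured array with 3 rows
ra = data['ra']                 # one column name: plain float32 array, no fields
pair = data[['dec', 'ra']]      # list of names: structured array with 2 fields
picked = data[[0, 4]]           # integer list: rows 0 and 4
highz = data[data['z'] > 0.25]  # boolean array: rows with z > 0.25

print(ra.dtype, ra.dtype.names)  # float32 None
print(pair.dtype.names)          # ('dec', 'ra')
```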
asarray(self)
Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array.
Examples
Start with a file object with three named columns, ra, dec, and z:
>>> ff.dtype
dtype([('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')])
>>> ff.shape
(1000,)
>>> ff.columns
['ra', 'dec', 'z']
>>> ff[:3]
array([(235.63442993164062, 59.39099884033203, 0.6225500106811523),
       (140.36181640625, -1.162310004234314, 0.5026500225067139),
       (129.96627807617188, 45.970130920410156, 0.4990200102329254)],
      dtype=(numpy.record, [('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')]))
Select a subset of columns, switch their order, and convert the output to a single numpy array:
>>> x = ff[['dec', 'ra']].asarray()
>>> x.dtype
dtype('float32')
>>> x.shape
(1000, 2)
>>> x.columns
['dec', 'ra']
>>> x[:3]
array([[ 59.39099884, 235.63442993],
       [ -1.16231 , 140.36181641],
       [ 45.97013092, 129.96627808]], dtype=float32)
Now, select only the first column (dec):
>>> dec = x[:,0]
>>> dec[:3]
array([ 59.39099884, -1.16231 , 45.97013092], dtype=float32)
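The result of asarray() can be pictured with plain numpy: the selected fields become the columns of one 2-D array. The sketch below builds that array eagerly with `numpy.column_stack`, whereas asarray() returns a file view that reads lazily. The helper name `stack_fields` is hypothetical, and requiring all selected columns to share one dtype is an assumption that matches the float32 example above.

```python
import numpy as np


def stack_fields(arr, columns):
    """Hypothetical helper: stack named fields of a structured array as 2-D columns.

    Assumes, as in the float32 example, that every requested field has the same dtype.
    """
    dtypes = {arr.dtype[name] for name in columns}
    if len(dtypes) != 1:
        raise ValueError("all columns must share a single dtype")
    # each field becomes one column, in the order requested
    return np.column_stack([arr[name] for name in columns])


data = np.zeros(4, dtype=[('ra', 'f4'), ('dec', 'f4'), ('z', 'f4')])
data['ra'] = [235.6, 140.4, 130.0, 10.0]
data['dec'] = [59.4, -1.2, 46.0, 5.0]

x = stack_fields(data, ['dec', 'ra'])
print(x.shape, x.dtype)  # (4, 2) float32
```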
- Returns
a file object that will return a numpy array with the columns representing the fields
- Return type
property attrs
Dictionary of meta-data for the stack.
property columns
A list of the names of the columns in the file.
This defaults to the named fields in the file’s dtype attribute, but may differ from them if a view of the file has been returned with asarray().
property dtype
A numpy.dtype object holding the data types of each column in the file.
get_dask(self, column, blocksize=None)
Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called.
The dask array is chunked into blocks of size blocksize.
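Chunking a column into blocks of size blocksize is simple arithmetic. The sketch below computes the block lengths (in the same form as a 1-D dask `chunks` tuple) and the matching (start, stop) row ranges that a lazy reader could load one block at a time. The helper names are hypothetical, and treating blocksize=None as a single block is an assumption, not the documented default.

```python
def block_chunks(size, blocksize):
    """Hypothetical helper: split ``size`` rows into blocks of at most ``blocksize`` rows."""
    if blocksize is not None and blocksize <= 0:
        raise ValueError("blocksize must be positive")
    if blocksize is None or blocksize >= size:
        return (size,)  # assumption: one block covering the whole column
    full, rest = divmod(size, blocksize)
    # full-size blocks, plus one shorter trailing block if rows remain
    return (blocksize,) * full + ((rest,) if rest else ())


def block_ranges(size, blocksize):
    """Yield the (start, stop) row range covered by each block."""
    start = 0
    for n in block_chunks(size, blocksize):
        yield start, start + n
        start += n


print(block_chunks(1000, 300))      # (300, 300, 300, 100)
print(list(block_ranges(10, 4)))    # [(0, 4), (4, 8), (8, 10)]
```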
- Parameters
- Returns
the dask array holding the column, which computes the necessary functions to read the data, but delays evaluation until the user calls dask.compute()
- Return type
property ncol
The number of data columns in the file.
property nfiles
The number of files in the FileStack.
read(self, columns, start, stop, step=1)
Read the specified column(s) over the given range, returning a structured numpy array.
- Parameters
- Returns
data – a numpy structured array holding the requested data
- Return type
array_like
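Reading a range from a stack means visiting only the files that overlap [start, stop) and keeping every step-th row on the global grid, even across file boundaries. The sketch below shows that logic with Python lists standing in for individual files; `read_stacked` is a hypothetical name, it returns a list rather than a structured numpy array, and it assumes a positive step.

```python
from itertools import accumulate


def read_stacked(files, start, stop, step=1):
    """Hypothetical helper: rows [start:stop:step] of the concatenation of ``files``."""
    if step <= 0:
        raise ValueError("step must be positive")
    offsets = [0] + list(accumulate(len(f) for f in files))
    out = []
    for i, f in enumerate(files):
        lo, hi = offsets[i], offsets[i + 1]
        if hi <= start or lo >= stop:
            continue  # file lies entirely outside the requested range
        # first global row >= max(start, lo) that lies on the start + k*step grid
        first = max(start, lo)
        first += (start - first) % step
        end = min(stop, hi)
        if first >= end:
            continue
        # translate global rows to local indices and read this file's piece
        out.extend(f[first - lo:end - lo:step])
    return out


# three stand-in "files" whose concatenation is 0, 1, ..., 14
files = [list(range(0, 5)), list(range(5, 12)), list(range(12, 15))]
print(read_stacked(files, 2, 14, 3))  # [2, 5, 8, 11]
```

Computing `first` from the global start keeps the stride aligned across file boundaries, so the result matches slicing the fully concatenated data.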
property shape
The shape of the file, which defaults to (size, ). Multiple dimensions can be introduced into the shape if a view of the file has been returned with asarray().
property size
The size of the file, i.e., the number of rows.