nbodykit.io.stack
Classes
FileStack(filetype, path, *args, **kwargs)
A file object that offers a continuous view of a stack of instances of FileType subclasses.
-
class nbodykit.io.stack.FileStack(filetype, path, *args, **kwargs)
A file object that offers a continuous view of a stack of instances of FileType subclasses.
This allows data to be accessed across multiple files from a single file object. The “stack” is a concatenation of each file onto the end of the previous file.
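To make the concatenation concrete, here is a minimal sketch of how a global row range could be mapped onto per-file row ranges. It is an illustration only, not nbodykit's implementation; the helper name `locate_rows` and the example file sizes are hypothetical.

```python
import numpy as np

def locate_rows(sizes, start, stop):
    """Map a global row range [start, stop) onto per-file ranges.

    `sizes` holds the number of rows in each file, in stack order.
    Returns a list of (file_index, local_start, local_stop) tuples.
    """
    # offsets[i] is the global index of the first row of file i
    offsets = np.concatenate([[0], np.cumsum(sizes)])
    pieces = []
    for i in range(len(sizes)):
        # Clip the requested range to the rows owned by file i
        lo = max(start, offsets[i])
        hi = min(stop, offsets[i + 1])
        if lo < hi:
            pieces.append((i, int(lo - offsets[i]), int(hi - offsets[i])))
    return pieces

# Three hypothetical files holding 100, 50 and 200 rows
pieces = locate_rows([100, 50, 200], start=90, stop=160)
print(pieces)  # [(0, 90, 100), (1, 0, 50), (2, 0, 10)]
```

A request that spans file boundaries is split into one piece per file, which is what lets a single object present many files as one continuous array.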
- Parameters
filetype (subclass of FileType) – the type of file class to initialize
path (str or list of str) – a list of file names, or a string specifying a single file or containing a glob-like ‘*’ pattern
*args – additional arguments to pass to the filetype instance during initialization
**kwargs – additional keyword arguments to pass to the filetype instance during initialization
- Attributes
attrs
Dictionary of meta-data for the stack
columns
A list of the names of the columns in the file.
dtype
A numpy.dtype object holding the data types of each column in the file.
ncol
The number of data columns in the file.
ndim
nfiles
The number of files in the FileStack
shape
The shape of the file, which defaults to (size, )
size
The size of the file, i.e., number of rows
Methods
asarray(self)
Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array
get_dask(self, column[, blocksize])
Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called
keys(self)
Aliased function to return columns
read(self, columns, start, stop[, step])
Read the specified column(s) over the given range, returning a structured numpy array
-
__getitem__(self, s)
This function provides numpy-like array indexing of the file object.
It supports:
integer and slice indexing, similar to arrays
string indexing using column names in keys()
array-like indexing using integer lists or boolean arrays
Note
If a single column is being returned, a numpy array holding the data is returned, rather than a structured array with only a single field.
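The indexing modes listed above mirror those of a plain numpy structured array. The sketch below uses an in-memory array as a stand-in for the file object, so it runs without nbodykit; the column values are made up for illustration.

```python
import numpy as np

# In-memory stand-in for a file with three named columns
data = np.zeros(5, dtype=[('ra', 'f4'), ('dec', 'f4'), ('z', 'f4')])
data['ra'] = np.arange(5)
data['dec'] = np.arange(5) * 10
data['z'] = 0.5

row = data[0]               # integer index: a single record
first = data[:3]            # slice: structured array with 3 rows
ra = data['ra']             # one column name: plain float32 array
sub = data[['dec', 'ra']]   # list of names: structured array, reordered
picked = data[[0, 2, 4]]    # integer list: selected rows
masked = data[data['ra'] > 2]  # boolean array: rows where the mask is True
```

Note that `ra` has no named fields, matching the behavior described in the note: a single column comes back as an ordinary array rather than a one-field structured array.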
-
asarray(self)
Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array
Examples
Start with a file object with three named columns, ra, dec, and z
>>> ff.dtype
dtype([('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')])
>>> ff.shape
(1000,)
>>> ff.columns
['ra', 'dec', 'z']
>>> ff[:3]
array([(235.63442993164062, 59.39099884033203, 0.6225500106811523),
       (140.36181640625, -1.162310004234314, 0.5026500225067139),
       (129.96627807617188, 45.970130920410156, 0.4990200102329254)],
      dtype=(numpy.record, [('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')]))
Select a subset of columns, switch their ordering, and convert the output to a single numpy array
>>> x = ff[['dec', 'ra']].asarray()
>>> x.dtype
dtype('float32')
>>> x.shape
(1000, 2)
>>> x.columns
['dec', 'ra']
>>> x[:3]
array([[ 59.39099884, 235.63442993],
       [ -1.16231   , 140.36181641],
       [ 45.97013092, 129.96627808]], dtype=float32)
Now, select only the first column (dec)
>>> dec = x[:,0]
>>> dec[:3]
array([ 59.39099884,  -1.16231   ,  45.97013092], dtype=float32)
- Returns
a file object that will return a numpy array with the columns representing the fields
- Return type
-
property attrs
Dictionary of meta-data for the stack
-
property columns
A list of the names of the columns in the file.
This defaults to the named fields in the file’s dtype attribute, but may differ from this if a view of the file has been returned with asarray()
-
property dtype
A numpy.dtype object holding the data types of each column in the file.
-
get_dask(self, column, blocksize=None)
Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called
The dask array is chunked into blocks of size blocksize
- Parameters
- Returns
the dask array holding the column, which computes the necessary functions to read the data, but delays evaluating until the user specifies
- Return type
-
property ncol
The number of data columns in the file.
-
property nfiles
The number of files in the FileStack
-
read(self, columns, start, stop, step=1)
Read the specified column(s) over the given range, returning a structured numpy array
- Parameters
- Returns
data – a numpy structured array holding the requested data
- Return type
array_like
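The semantics of a ranged, column-selective read can be sketched with in-memory structured arrays standing in for the individual files. This is an eager illustration under that assumption, not how nbodykit reads from disk; the function name `read_range` and the sample arrays are hypothetical.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

def read_range(chunks, columns, start, stop, step=1):
    """Return the requested columns for rows start:stop:step of the stack."""
    # Concatenate the per-file arrays into one continuous array
    stacked = np.concatenate(chunks)
    # Slice the rows, then select the named columns (a multi-field view)
    selected = stacked[start:stop:step][list(columns)]
    # Repack into a compact structured array holding only those columns
    return rfn.repack_fields(selected)

dtype = [('ra', 'f4'), ('dec', 'f4'), ('z', 'f4')]
a = np.zeros(4, dtype=dtype)
a['ra'] = [0, 1, 2, 3]
b = np.zeros(3, dtype=dtype)
b['ra'] = [4, 5, 6]

# Rows 2, 4 and 6 of the 7-row stack; the range crosses the file boundary
out = read_range([a, b], ['ra', 'z'], 2, 7, step=2)
```

The result is a structured array with only the `ra` and `z` fields, drawn from both underlying arrays.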
-
property shape
The shape of the file, which defaults to (size, )
Multiple dimensions can be introduced into the shape if a view of the file has been returned with asarray()
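The change in shape can be reproduced with plain numpy, which offers a comparable conversion from a structured array to a regular 2D array. The sketch below uses numpy's `structured_to_unstructured` as an analog of the stacked view; whether nbodykit uses it internally is not stated here.

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Structured array with one record per row: shape is (size,)
records = np.zeros(1000, dtype=[('ra', 'f4'), ('dec', 'f4'), ('z', 'f4')])
print(records.shape)  # (1000,)

# Stack two selected fields as columns of a regular array: shape is (size, 2)
table = structured_to_unstructured(records[['dec', 'ra']])
print(table.shape)    # (1000, 2)
print(table.dtype)    # float32
```

The second dimension equals the number of selected columns, which is why the shape is no longer `(size, )` once the fields are stacked.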
-
property size
The size of the file, i.e., number of rows