class nbodykit.io.stack.FileStack(filetype, path, *args, **kwargs)
Bases: nbodykit.io.base.FileType
A file object that offers a continuous view of a stack of instances of FileType subclasses.
This allows data to be accessed across multiple files from a single file object. The “stack” is formed by concatenating each file to the end of the previous one.
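One way to picture the concatenation is as a cumulative row offset per file. The sketch below maps a global row index of the stack to a (file, local row) pair, assuming the row count of each file is known. `locate_row` is a hypothetical helper for illustration, not part of nbodykit.

```python
import numpy as np

def locate_row(sizes, row):
    """Map a global row index in a stack of files to (file index, local row).

    Hypothetical helper (not nbodykit API): `sizes` holds the number of rows
    in each file, in stack order.
    """
    # offsets[i] is the global index of the first row of file i
    offsets = np.concatenate([[0], np.cumsum(sizes)])
    total = int(offsets[-1])
    if not 0 <= row < total:
        raise IndexError(f"row {row} out of range for stack of size {total}")
    # the containing file is the last one whose offset is <= row;
    # side="right" also skips over empty files
    ifile = int(np.searchsorted(offsets, row, side="right")) - 1
    return ifile, int(row - offsets[ifile])

print(locate_row([100, 50, 200], 120))  # -> (1, 20)
```

With files of 100, 50 and 200 rows, global row 120 falls 20 rows into the second file, which is what concatenating each file onto the end of the previous one implies.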
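Parameters: | filetype (FileType subclass): the file class used to load each file in the stack; path: the path(s) of the file(s) to load; *args: additional positional arguments passed to filetype when each file is initialized; **kwargs: additional keyword arguments passed to filetype when each file is initialized |
---|---|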
Attributes
attrs | Dictionary of meta-data for the stack. |
---|---|
columns | A list of the names of the columns in the file. |
dtype | A numpy.dtype object holding the data types of each column in the file. |
ncol | The number of data columns in the file. |
nfiles | The number of files in the FileStack. |
shape | The shape of the file, which defaults to (size, ). |
size | The size of the file, i.e., the number of rows. |
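Methods
asarray() | Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array. |
---|---|
get_dask(column[, blocksize]) | Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called. |
keys() | Alias that returns columns. |
read(columns, start, stop[, step]) | Read the specified column(s) over the given range, returning a structured numpy array. |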
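__getitem__(s)
This function provides numpy-like array indexing of the file object.
It supports slice indexing over rows, indexing by a single column name (the valid names are returned by keys()), and indexing by a list of column names.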
Note
If a single column is being returned, a numpy array holding the data is returned, rather than a structured array with only a single field.
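Because the indexing is numpy-like, a NumPy structured array behaves analogously and can stand in to illustrate the three access patterns, including the single-column case from the note. The array `data` and its contents are made up for this sketch; a real file object would read the values from disk instead.

```python
import numpy as np

# Illustrative stand-in for a file with three float32 columns
data = np.zeros(5, dtype=[('ra', 'f4'), ('dec', 'f4'), ('z', 'f4')])
data['ra'] = np.arange(5)

rows = data[:3]            # slice over rows -> structured array with 3 rows
ra = data['ra']            # single column -> plain array, no named fields
sub = data[['dec', 'ra']]  # list of columns -> structured array, fields in the given order
```

The single-column case returns an array whose dtype has no field names, which mirrors the behavior described in the note.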
asarray()
Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array.
Examples
Start with a file object with three named columns, ra, dec, and z:
>>> ff.dtype
dtype([('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')])
>>> ff.shape
(1000,)
>>> ff.columns
['ra', 'dec', 'z']
>>> ff[:3]
array([(235.63442993164062, 59.39099884033203, 0.6225500106811523),
(140.36181640625, -1.162310004234314, 0.5026500225067139),
(129.96627807617188, 45.970130920410156, 0.4990200102329254)],
dtype=(numpy.record, [('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')]))
Select a subset of columns, switch their order, and convert the output to a single numpy array:
>>> x = ff[['dec', 'ra']].asarray()
>>> x.dtype
dtype('float32')
>>> x.shape
(1000, 2)
>>> x.columns
['dec', 'ra']
>>> x[:3]
array([[ 59.39099884, 235.63442993],
[ -1.16231 , 140.36181641],
[ 45.97013092, 129.96627808]], dtype=float32)
Now, select only the first column (dec):
>>> dec = x[:,0]
>>> dec[:3]
array([ 59.39099884, -1.16231 , 45.97013092], dtype=float32)
Returns: | a file object that will return a numpy array with the columns representing the fields |
---|---|
Return type: | FileType |
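For data already in memory, the column layout produced by asarray() can be approximated with np.column_stack. This sketch only reproduces the resulting shape and column ordering, assuming all selected fields share one dtype; `stack_fields` is a hypothetical helper, and it does not reproduce the lazy file-object view that asarray() returns.

```python
import numpy as np

def stack_fields(arr, columns):
    """Stack named fields of a structured array into a 2D array.

    Hypothetical helper: column i of the result holds arr[columns[i]].
    Assumes all selected fields share a common dtype (e.g. float32).
    """
    return np.column_stack([arr[name] for name in columns])

# Two illustrative rows with the same dtype as the example above
data = np.array([(235.6, 59.4, 0.62), (140.4, -1.16, 0.50)],
                dtype=[('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')])
x = stack_fields(data, ['dec', 'ra'])  # shape (2, 2), dtype float32
```

As in the doctest, x[:, 0] then holds the dec values and x[:, 1] the ra values.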
attrs
Dictionary of meta-data for the stack.
columns
A list of the names of the columns in the file.
This defaults to the named fields in the file’s dtype attribute, but may differ from these if a view of the file has been returned with asarray().
dtype
A numpy.dtype object holding the data types of each column in the file.
get_dask(column, blocksize=100000)
Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called.
The dask array is chunked into blocks of size blocksize.
Parameters: | column (str): the name of the column to return; blocksize (int, optional): the number of rows in each chunk of the dask array (default 100000) |
---|---|
Returns: | the dask array holding the column, which sets up the functions needed to read the data but delays evaluating them until the user requests it (e.g. by calling dask.compute()) |
Return type: | dask.array.Array |
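The blocking can be sketched as a list of [start, stop) row ranges, one per chunk, with the final chunk holding any remainder (the layout dask uses when a single integer chunk size is given). `block_bounds` is a hypothetical helper; it only computes the ranges and does not build a dask array.

```python
def block_bounds(size, blocksize=100000):
    """Return [start, stop) row ranges for chunks of at most `blocksize` rows.

    Hypothetical helper for illustration; the last range holds the remainder.
    """
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    return [(start, min(start + blocksize, size))
            for start in range(0, size, blocksize)]

print(block_bounds(250000))
# -> [(0, 100000), (100000, 200000), (200000, 250000)]
```

In a lazy scheme, each range would correspond to one deferred read of that slice of the column, performed only at compute time.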
logger = <logging.Logger object>
ncol
The number of data columns in the file.
nfiles
The number of files in the FileStack.
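read(columns, start, stop, step=1)
Read the specified column(s) over the given range, returning a structured numpy array.
Parameters: | columns (str or list of str): the name(s) of the column(s) to read; start (int): the first row to read; stop (int): the row at which to stop reading (exclusive, as in a Python slice); step (int, optional): the step between rows to read (default 1) |
---|---|
Returns: | data: a numpy structured array holding the requested data |
Return type: | array_like |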
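Stepping across file boundaries is the subtle part of reading from a stack. The sketch below reads over range(start, stop, step) in global row coordinates by converting the requested rows to local indices in each file. It assumes a non-negative start and a positive step, and uses in-memory structured arrays with identical dtypes as stand-ins for the stacked files; `read_stack` is hypothetical and is not FileStack.read().

```python
import numpy as np

def read_stack(files, columns, start, stop, step=1):
    """Read `columns` over global rows range(start, stop, step) of a stack.

    Hypothetical helper: `files` is a list of structured NumPy arrays with
    identical dtypes, standing in for the files of a stack.
    Assumes 0 <= start and step > 0.
    """
    if isinstance(columns, str):
        columns = [columns]
    # offsets[i] is the global index of the first row of file i
    offsets = np.concatenate([[0], np.cumsum([len(f) for f in files])])
    stop = min(stop, int(offsets[-1]))   # clip to the total size of the stack
    rows = np.arange(start, stop, step)  # requested global row indices
    # compact structured output holding only the requested fields
    out = np.empty(len(rows), dtype=[(c, files[0].dtype[c]) for c in columns])
    for f, lo, hi in zip(files, offsets[:-1], offsets[1:]):
        mask = (rows >= lo) & (rows < hi)  # requested rows stored in this file
        local = rows[mask] - lo            # convert to file-local indices
        for c in columns:
            out[c][mask] = f[c][local]     # out[c] is a view, so this writes into out
    return out
```

Computing the global indices first keeps the step consistent across boundaries: with files of 3 and 4 rows, range(1, 7, 2) picks row 1 from the first file and local rows 0 and 2 from the second.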
shape
The shape of the file, which defaults to (size, ).
Multiple dimensions can be introduced into the shape if a view of the file has been returned with asarray().
size
The size of the file, i.e., the number of rows.