nbodykit.io.bigfile
Classes

BigFile(path[, exclude, header, dataset]): A file object to handle the reading of columns of data from a bigfile file.
- class nbodykit.io.bigfile.BigFile(path, exclude=None, header=<class 'nbodykit.io.bigfile.Automatic'>, dataset='./')[source]
A file object to handle the reading of columns of data from a bigfile file. bigfile is a reproducible, massively parallel IO library for large, hierarchical datasets, and it is the default format of the FastPM and the MP-Gadget simulations.
See also: https://github.com/rainwoodman/bigfile
- Parameters
path (str) – the name of the directory holding the bigfile data
exclude (list of str, optional) – the data sets to exclude from loading within bigfile; default is the header. If any list is given, the name of the header column must also be given if it is not part of the data set. The names are shell glob patterns.
header (str, or list, optional) – the path to the header; default is to use a column ‘Header’. It is relative to the file, not the dataset. If a list is provided, the attributes are updated from the first entry to the last.
dataset (str) – load columns from a specific dataset in the bigfile; the default is to start looking for columns from the root.
- Attributes
columns: A list of the names of the columns in the file
dtype: A numpy.dtype object holding the data types of each column in the file
ncol: The number of data columns in the file
shape: The shape of the file
size: The size of the file, i.e., number of rows
Methods
asarray(): Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array
get_dask(column[, blocksize]): Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called
keys(): Aliased function to return columns
read(columns, start, stop[, step]): Read the specified column(s) over the given range, as a dictionary
- __getitem__(s)
This function provides numpy-like array indexing of the file object.
It supports:
integer, slice-indexing similar to arrays
string indexing using column names in keys()
array-like indexing using integer lists or boolean arrays
Note
If a single column is being returned, a numpy array holding the data is returned, rather than a structured array with only a single field.
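The indexing behavior described above can be sketched with a plain numpy record array standing in for the file object (an illustration only; a real BigFile requires an on-disk bigfile directory): slicing preserves the structured dtype, while selecting a single column by name yields an unstructured numpy array, as the note describes.

```python
import numpy as np

# In-memory stand-in for the file's data; the column names mirror
# the 'ra', 'dec', 'z' example used elsewhere on this page.
data = np.zeros(5, dtype=[('ra', 'f4'), ('dec', 'f4'), ('z', 'f4')])

# Slice-indexing keeps the structured dtype, like indexing the file object.
subset = data[:3]
print(subset.dtype.names)   # ('ra', 'dec', 'z')

# Selecting a single column yields a plain (unstructured) numpy array,
# which is the behavior the note above describes for the file object.
ra = data['ra']
print(ra.dtype)             # float32

# Boolean-array indexing also works, as with ordinary numpy arrays.
mask = np.array([True, False, True, False, True])
print(data[mask].shape)     # (3,)
```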
- asarray()
Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array
Examples
Start with a file object with three named columns, ra, dec, and z

>>> ff.dtype
dtype([('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')])
>>> ff.shape
(1000,)
>>> ff.columns
['ra', 'dec', 'z']
>>> ff[:3]
array([(235.63442993164062, 59.39099884033203, 0.6225500106811523),
       (140.36181640625, -1.162310004234314, 0.5026500225067139),
       (129.96627807617188, 45.970130920410156, 0.4990200102329254)],
      dtype=(numpy.record, [('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')]))
Select a subset of columns, switch the ordering, and convert the output to a single numpy array

>>> x = ff[['dec', 'ra']].asarray()
>>> x.dtype
dtype('float32')
>>> x.shape
(1000, 2)
>>> x.columns
['dec', 'ra']
>>> x[:3]
array([[ 59.39099884, 235.63442993],
       [ -1.16231   , 140.36181641],
       [ 45.97013092, 129.96627808]], dtype=float32)

Now, select only the first column (dec)

>>> dec = x[:,0]
>>> dec[:3]
array([ 59.39099884,  -1.16231   ,  45.97013092], dtype=float32)
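The column stacking that asarray provides has a direct analogue for in-memory structured arrays: numpy.lib.recfunctions.structured_to_unstructured. The sketch below illustrates the same select-reorder-stack result on a small array (it is not how BigFile implements the view, which stays lazy until the data is read).

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# In-memory stand-in for a file with columns 'ra', 'dec', 'z'.
ff = np.array([(235.6, 59.4, 0.62), (140.4, -1.2, 0.50)],
              dtype=[('ra', 'f4'), ('dec', 'f4'), ('z', 'f4')])

# Select a subset of fields (reordered), then stack them as the columns
# of a single unstructured float32 array, like ff[['dec', 'ra']].asarray().
x = structured_to_unstructured(ff[['dec', 'ra']])
print(x.shape)   # (2, 2)
print(x.dtype)   # float32
```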
- Returns
a file object that will return a numpy array with the columns representing the fields
- Return type
- property columns
A list of the names of the columns in the file.
This defaults to the named fields in the file’s dtype attribute, but may differ from it if a view of the file has been returned with asarray()
- property dtype
A numpy.dtype object holding the data types of each column in the file.
- get_dask(column, blocksize=None)
Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called
The dask array is chunked into blocks of size blocksize
- Parameters
column (str) – the name of the column to return
blocksize (int, optional) – the size of the chunks in the dask array
- Returns
the dask array holding the column, which computes the necessary functions to read the data, but delays evaluation until dask.compute() is called
- Return type
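The delayed, block-wise evaluation that get_dask provides can be sketched without dask itself. In the toy illustration below (assumed names, not nbodykit's implementation), a column is split into blocksize-sized chunks whose reads only happen when compute() is actually called, mirroring the lazy semantics described above.

```python
import numpy as np

def make_lazy_column(read_block, nrows, blocksize):
    """Return a zero-argument compute() that reads the column in
    blocksize-sized chunks only when it is actually called."""
    def compute():
        blocks = [read_block(start, min(start + blocksize, nrows))
                  for start in range(0, nrows, blocksize)]
        return np.concatenate(blocks)
    return compute

# A stand-in "file": reading a block just slices an in-memory array,
# and we record which blocks were touched.
column = np.arange(10, dtype='f4')
reads = []
def read_block(start, stop):
    reads.append((start, stop))
    return column[start:stop]

lazy = make_lazy_column(read_block, nrows=10, blocksize=4)
print(reads)        # [] -- nothing has been read yet
result = lazy()     # evaluation happens here, block by block
print(reads)        # [(0, 4), (4, 8), (8, 10)]
print(result.shape) # (10,)
```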
- property ncol¶
The number of data columns in the file.
- read(columns, start, stop, step=1)[source]
Read the specified column(s) over the given range, as a dictionary
‘start’ and ‘stop’ should be between 0 and size, which is the total size of the binary file (in particles)
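The read-as-dictionary semantics can be sketched on an in-memory structured array (a hypothetical helper for illustration, not the BigFile implementation): slice the requested [start, stop, step] range, then return each named column as a numpy array keyed by its name.

```python
import numpy as np

def read(data, columns, start, stop, step=1):
    """Sketch of read() semantics on an in-memory structured array:
    slice the requested range, then return the named columns as a
    dictionary of numpy arrays."""
    sel = data[start:stop:step]
    return {col: np.asarray(sel[col]) for col in columns}

# 100 rows with the 'ra', 'dec', 'z' columns used in the examples above.
data = np.zeros(100, dtype=[('ra', 'f4'), ('dec', 'f4'), ('z', 'f4')])
out = read(data, ['ra', 'z'], 10, 20)
print(sorted(out))      # ['ra', 'z']
print(out['ra'].shape)  # (10,)
```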
- property shape
The shape of the file, which defaults to (size, )
Multiple dimensions can be introduced into the shape if a view of the file has been returned with asarray()
- property size
The size of the file, i.e., number of rows