nbodykit.base.catalog¶
Functions
|
Decorator that defines the decorated function as a column in a CatalogSource. |
Classes
|
An abstract base class representing a catalog of discrete particles. |
|
An abstract base class that implements most of the functionality in |
|
Provides access to a Column from a Catalog. |
|
A meta-class that will register all columns of a class that have been marked with the |
- class nbodykit.base.catalog.CatalogSource(comm)[source]¶
An abstract base class representing a catalog of discrete particles.
This objects behaves like a structured numpy array – it must have a well-defined size when initialized. The
size
here represents the number of particles in the source on the local rank.The information about each particle is stored as a series of columns in the format of dask arrays. These columns can be accessed in a dict-like fashion.
All subclasses of this class contain the following default columns:
Weight
Value
Selection
For a full description of these default columns, see the documentation.
Important
Subclasses of this class must set the
_size
attribute.- Parameters
comm – the MPI communicator to use for this object
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
A list of the hard-coded columns in the CatalogSource.
size
The number of objects in the CatalogSource on the local rank.
Methods
A boolean column that selects a subset slice of the CatalogSource.
Value
()When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight
()The column giving the weight to use for each particle on the mesh.
compute
(*args, **kwargs)Our version of
dask.compute()
that computes multiple delayed dask collections at once.copy
()Return a shallow copy of the object, where each column is a reference of the corresponding column in
self
.get_hardcolumn
(col)Construct and return a hard-coded column.
gslice
(start, stop[, end, redistribute])Execute a global slice of a CatalogSource.
make_column
(array)Utility function to convert an array-like object to a
dask.array.Array
.persist
([columns])Return a CatalogSource, where the selected columns are computed and persist in memory.
read
(columns)Return the requested columns as dask arrays.
save
(output[, columns, dataset, datasets, ...])Save the CatalogSource to a
bigfile.BigFile
.sort
(keys[, reverse, usecols])Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh
([Nmesh, BoxSize, dtype, interlaced, ...])Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes
([domain, position, columns])Domain Decompose a catalog, sending items to the ranks according to the supplied domain object.
view
([type])Return a "view" of the CatalogSource object, with the returned type set by
type
.create_instance
- property Index¶
The attribute giving the global index rank of each particle in the list. It is an integer from 0 to
self.csize
.Note that slicing changes this index value.
- Selection()[source]¶
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to
True
for all particles, and all CatalogSource objects will contain this column.
- Value()[source]¶
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of
Value
, with the weights given byWeight
.By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- Weight()[source]¶
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of
Value
, with the weights given byWeight
.By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- __delitem__(col)¶
Delete a column; cannot delete a “hard-coded” column.
Note
If the
base
attribute is set, columns will be deleted frombase
instead of fromself
.
- __finalize__(other)¶
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to only copy over attributes that are similar to meta-data, so we do not copy some of the core attributes of the
CatalogSource
object.- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourcBase for attributes to be copied
- Returns
return
self
, with the added attributes- Return type
- __getitem__(sel)¶
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the revelant slice
slice object specifying which particles to select
list of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation
If the
base
attribute is set, columns will be returned frombase
instead of fromself
.
- __setitem__(col, value)[source]¶
Add columns to the CatalogSource, overriding any existing columns with the name
col
.
- property attrs¶
A dictionary storing relevant meta-data about the CatalogSource.
- property columns¶
All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user.
Note
If the
base
attribute is set, the value ofbase.columns
will be returned.
- compute(*args, **kwargs)¶
Our version of
dask.compute()
that computes multiple delayed dask collections at once.This should be called on the return value of
read()
to converts any dask arrays to numpy arrays.- . note::
If the
base
attribute is set,compute()
will called usingbase
instead ofself
.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()¶
Return a shallow copy of the object, where each column is a reference of the corresponding column in
self
.Note
No copy of data is made.
Note
This is different from view in that the attributes dictionary of the copy no longer related to
self
.- Returns
a new CatalogSource that holds all of the data columns of
self
- Return type
- property csize¶
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of
size
across all available ranks.If the
base
attribute is set, thebase.csize
attribute will be returned.
- get_hardcolumn(col)¶
Construct and return a hard-coded column.
These are usually produced by calling member functions marked by the
@column
decorator.Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.
Note
If the
base
attribute is set,get_hardcolumn()
will called usingbase
instead ofself
.
- gslice(start, stop, end=1, redistribute=True)[source]¶
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
step (int, optional) – the default step size of the global size
redistribute (bool, optional) – if
True
, evenly re-distribute the sliced data across all ranks, otherwise just return any local data part of the global slice
- property hardcolumns¶
A list of the hard-coded columns in the CatalogSource.
These columns are usually member functions marked by
@column
decorator. Subclasses may override this method and useget_hardcolumn()
to bypass the decorator logic.Note
If the
base
attribute is set, the value ofbase.hardcolumns
will be returned.
- static make_column(array)¶
Utility function to convert an array-like object to a
dask.array.Array
.Note
The dask array chunk size is controlled via the
dask_chunk_size
global option. Seeset_options
.- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from
array
- Return type
- persist(columns=None)[source]¶
Return a CatalogSource, where the selected columns are computed and persist in memory.
- read(columns)¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of
dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶
Save the CatalogSource to a
bigfile.BigFile
.Only the selected columns are saved and
attrs
are saved inheader
. The attrs of columns are stored in the datasets.- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – dataset to store the columns under.
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where
attrs
is stored if header is None, do not save the header.compute (boolean, default True) – if True, wait till the store operations finish if False, return a dictionary with column name and a future object for the store. use dask.compute() to wait for the store operations on the result.
- property size¶
The number of objects in the CatalogSource on the local rank.
If the
base
attribute is set, thebase.size
attribute will be returned.Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)[source]¶
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be floating or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided
reverse (bool, optional) – if
True
, perform descending sort operationsusecols (list, optional) – the name of the columns to include in the returned CatalogSource
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in
attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in
attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
- to_subvolumes(domain=None, position='Position', columns=None)¶
Domain Decompose a catalog, sending items to the ranks according to the supplied domain object. Using the position column as the Position.
This will read in the full position array and all of the requested columns.
- Parameters
domain (
pmesh.domain.GridND
object, or None) – The domain to distribute the catalog. If None, try to evenly divide spatially. An easiest way to find a domain object is to use pm.domain, where pm is apmesh.pm.ParticleMesh
object.position (string_like) – column to use to compute the position.
columns (list of string_like) – columns to include in the new catalog, if not supplied, all catalogs will be exchanged.
- Returns
A decomposed catalog source, where each rank only contains objects belongs to the rank as claimed by the domain object.
self.attrs are carried over as a shallow copy to the returned object.
- Return type
- view(type=None)¶
Return a “view” of the CatalogSource object, with the returned type set by
type
.This initializes a new empty class of type
type
and attaches attributes to it via the__finalize__()
mechanism.- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.base.catalog.CatalogSourceBase(comm)[source]¶
An abstract base class that implements most of the functionality in
CatalogSource
.The main difference between this class and
CatalogSource
is that this base class does not assume the object has asize
attribute.Note
See the docstring for
CatalogSource
. Most often, users should implement custom sources as subclasses ofCatalogSource
.The names of hard-coded columns, i.e., those defined through member functions of the class, are stored in the
_defaults
and_hardcolumns
attributes. These attributes are computed by theColumnFinder
meta-class.- Parameters
comm – the MPI communicator to use for this object
- Attributes
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user.
hardcolumns
A list of the hard-coded columns in the CatalogSource.
Methods
compute
(*args, **kwargs)Our version of
dask.compute()
that computes multiple delayed dask collections at once.copy
()Return a shallow copy of the object, where each column is a reference of the corresponding column in
self
.get_hardcolumn
(col)Construct and return a hard-coded column.
make_column
(array)Utility function to convert an array-like object to a
dask.array.Array
.read
(columns)Return the requested columns as dask arrays.
save
(output[, columns, dataset, datasets, ...])Save the CatalogSource to a
bigfile.BigFile
.to_mesh
([Nmesh, BoxSize, dtype, interlaced, ...])Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes
([domain, position, columns])Domain Decompose a catalog, sending items to the ranks according to the supplied domain object.
view
([type])Return a "view" of the CatalogSource object, with the returned type set by
type
.create_instance
- __delitem__(col)[source]¶
Delete a column; cannot delete a “hard-coded” column.
Note
If the
base
attribute is set, columns will be deleted frombase
instead of fromself
.
- __finalize__(other)[source]¶
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to only copy over attributes that are similar to meta-data, so we do not copy some of the core attributes of the
CatalogSource
object.- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourcBase for attributes to be copied
- Returns
return
self
, with the added attributes- Return type
- __getitem__(sel)[source]¶
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the revelant slice
slice object specifying which particles to select
list of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation
If the
base
attribute is set, columns will be returned frombase
instead of fromself
.
- __setitem__(col, value)[source]¶
Add new columns to the CatalogSource, overriding any existing columns with the name
col
.Note
If the
base
attribute is set, columns will be added tobase
instead of toself
.
- property attrs¶
A dictionary storing relevant meta-data about the CatalogSource.
- property columns¶
All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user.
Note
If the
base
attribute is set, the value ofbase.columns
will be returned.
- compute(*args, **kwargs)[source]¶
Our version of
dask.compute()
that computes multiple delayed dask collections at once.This should be called on the return value of
read()
to converts any dask arrays to numpy arrays.- . note::
If the
base
attribute is set,compute()
will called usingbase
instead ofself
.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()[source]¶
Return a shallow copy of the object, where each column is a reference of the corresponding column in
self
.Note
No copy of data is made.
Note
This is different from view in that the attributes dictionary of the copy no longer related to
self
.- Returns
a new CatalogSource that holds all of the data columns of
self
- Return type
- get_hardcolumn(col)[source]¶
Construct and return a hard-coded column.
These are usually produced by calling member functions marked by the
@column
decorator.Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.
Note
If the
base
attribute is set,get_hardcolumn()
will called usingbase
instead ofself
.
- property hardcolumns¶
A list of the hard-coded columns in the CatalogSource.
These columns are usually member functions marked by
@column
decorator. Subclasses may override this method and useget_hardcolumn()
to bypass the decorator logic.Note
If the
base
attribute is set, the value ofbase.hardcolumns
will be returned.
- static make_column(array)[source]¶
Utility function to convert an array-like object to a
dask.array.Array
.Note
The dask array chunk size is controlled via the
dask_chunk_size
global option. Seeset_options
.- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from
array
- Return type
- read(columns)[source]¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of
dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)[source]¶
Save the CatalogSource to a
bigfile.BigFile
.Only the selected columns are saved and
attrs
are saved inheader
. The attrs of columns are stored in the datasets.- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – dataset to store the columns under.
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where
attrs
is stored if header is None, do not save the header.compute (boolean, default True) – if True, wait till the store operations finish if False, return a dictionary with column name and a future object for the store. use dask.compute() to wait for the store operations on the result.
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)[source]¶
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in
attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in
attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
- to_subvolumes(domain=None, position='Position', columns=None)[source]¶
Domain Decompose a catalog, sending items to the ranks according to the supplied domain object. Using the position column as the Position.
This will read in the full position array and all of the requested columns.
- Parameters
domain (
pmesh.domain.GridND
object, or None) – The domain to distribute the catalog. If None, try to evenly divide spatially. An easiest way to find a domain object is to use pm.domain, where pm is apmesh.pm.ParticleMesh
object.position (string_like) – column to use to compute the position.
columns (list of string_like) – columns to include in the new catalog, if not supplied, all catalogs will be exchanged.
- Returns
A decomposed catalog source, where each rank only contains objects belongs to the rank as claimed by the domain object.
self.attrs are carried over as a shallow copy to the returned object.
- Return type
- view(type=None)[source]¶
Return a “view” of the CatalogSource object, with the returned type set by
type
.This initializes a new empty class of type
type
and attaches attributes to it via the__finalize__()
mechanism.- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.base.catalog.ColumnAccessor(catalog, daskarray, is_default=False)[source]¶
Provides access to a Column from a Catalog.
This is a thin subclass of
dask.array.Array
to provide a reference to the catalog object, an additionalattrs
attribute (for recording the reproducible meta-data), and some pretty print support.Due to particularity of
dask
, any transformation that is not explicitly in-place will return adask.array.Array
, and losing the pointer to the original catalog and the meta data attrs.- Parameters
catalog (CatalogSource) – the catalog from which the column was accessed
daskarray (dask.array.Array) – the column in dask array form
is_default (bool, optional) – whether this column is a default column; default columns are not serialized to disk, as they are automatically available as columns
- Attributes
- A
- T
blocks
Slice an array by blocks
chunks
Chunks property.
- chunksize
- dask
- dtype
- imag
itemsize
Length of one array element in bytes
- name
nbytes
Number of bytes in array
- ndim
- npartitions
- numblocks
partitions
Slice an array by partitions.
- real
- shape
- size
Number of elements in array
vindex
Vectorized indexing with broadcasting.
Methods
all
([axis, out, keepdims])This docstring was copied from numpy.ndarray.all.
any
([axis, out, keepdims])This docstring was copied from numpy.ndarray.any.
argmax
([axis, out])This docstring was copied from numpy.ndarray.argmax.
argmin
([axis, out])This docstring was copied from numpy.ndarray.argmin.
argtopk
(k[, axis, split_every])The indices of the top k elements of an array.
astype
(dtype, **kwargs)Copy of the array, cast to a specified type.
choose
(choices[, out, mode])This docstring was copied from numpy.ndarray.choose.
clip
([min, max, out])This docstring was copied from numpy.ndarray.clip.
compute
()Compute this dask collection
Compute the chunk sizes for a Dask array.
copy
()Copy array.
cumprod
([axis, dtype, out])This docstring was copied from numpy.ndarray.cumprod.
cumsum
([axis, dtype, out])This docstring was copied from numpy.ndarray.cumsum.
dot
(b[, out])This docstring was copied from numpy.ndarray.dot.
flatten
([order])This docstring was copied from numpy.ndarray.ravel.
map_blocks
(*args[, name, token, dtype, ...])Map a function across all blocks of a dask array.
map_overlap
(func, depth[, boundary, trim])Map a function over blocks of the array with some overlap
max
([axis, out, keepdims, initial, where])This docstring was copied from numpy.ndarray.max.
mean
([axis, dtype, out, keepdims])This docstring was copied from numpy.ndarray.mean.
min
([axis, out, keepdims, initial, where])This docstring was copied from numpy.ndarray.min.
moment
(order[, axis, dtype, keepdims, ddof, ...])Calculate the nth centralized moment.
nonzero
()This docstring was copied from numpy.ndarray.nonzero.
persist
(**kwargs)Persist this dask collection into memory
prod
([axis, dtype, out, keepdims, initial, ...])This docstring was copied from numpy.ndarray.prod.
ravel
([order])This docstring was copied from numpy.ndarray.ravel.
rechunk
([chunks, threshold, ...])See da.rechunk for docstring
repeat
(repeats[, axis])This docstring was copied from numpy.ndarray.repeat.
reshape
(shape[, order])This docstring was copied from numpy.ndarray.reshape.
round
([decimals, out])This docstring was copied from numpy.ndarray.round.
squeeze
([axis])This docstring was copied from numpy.ndarray.squeeze.
std
([axis, dtype, out, ddof, keepdims])This docstring was copied from numpy.ndarray.std.
store
(targets[, lock, regions, compute, ...])Store dask arrays in array-like objects, overwrite data in target
sum
([axis, dtype, out, keepdims, initial, where])This docstring was copied from numpy.ndarray.sum.
swapaxes
(axis1, axis2)This docstring was copied from numpy.ndarray.swapaxes.
to_dask_dataframe
([columns, index, meta])Convert dask Array to dask Dataframe
to_delayed
([optimize_graph])Convert into an array of
dask.delayed
objects, one per chunk.to_hdf5
(filename, datapath, **kwargs)Store array in HDF5 file
to_svg
([size])Convert chunks from Dask Array into an SVG Image
to_tiledb
(uri, *args, **kwargs)Save array to the TileDB storage manager
to_zarr
(*args, **kwargs)Save array to the zarr storage format
topk
(k[, axis, split_every])The top k elements of an array.
trace
([offset, axis1, axis2, dtype, out])This docstring was copied from numpy.ndarray.trace.
transpose
(*axes)This docstring was copied from numpy.ndarray.transpose.
var
([axis, dtype, out, ddof, keepdims])This docstring was copied from numpy.ndarray.var.
view
([dtype, order])Get a view of the array as a new data type
visualize
([filename, format, optimize_graph])Render the computation of this object's task graph using graphviz.
as_daskarray
conj
- static __dask_optimize__(dsk, keys, **kwargs)[source]¶
Optimize the dask object.
Note
The dask default optimizer induces too many (unnecesarry) IO calls. We turn this feature off by default, and only apply a culling.
- __repr__()¶
>>> import dask.array as da >>> da.ones((10, 10), chunks=(5, 5), dtype='i4') dask.array<..., shape=(10, 10), dtype=int32, chunksize=(5, 5), chunktype=numpy.ndarray>
- all(axis=None, out=None, keepdims=False)¶
This docstring was copied from numpy.ndarray.all.
Some inconsistencies with the Dask version may exist.
Returns True if all elements evaluate to True.
Refer to numpy.all for full documentation.
See also
numpy.all
equivalent function
- any(axis=None, out=None, keepdims=False)¶
This docstring was copied from numpy.ndarray.any.
Some inconsistencies with the Dask version may exist.
Returns True if any of the elements of a evaluate to True.
Refer to numpy.any for full documentation.
See also
numpy.any
equivalent function
- argmax(axis=None, out=None)¶
This docstring was copied from numpy.ndarray.argmax.
Some inconsistencies with the Dask version may exist.
Return indices of the maximum values along the given axis.
Refer to numpy.argmax for full documentation.
See also
numpy.argmax
equivalent function
- argmin(axis=None, out=None)¶
This docstring was copied from numpy.ndarray.argmin.
Some inconsistencies with the Dask version may exist.
Return indices of the minimum values along the given axis of a.
Refer to numpy.argmin for detailed documentation.
See also
numpy.argmin
equivalent function
- argtopk(k, axis=- 1, split_every=None)¶
The indices of the top k elements of an array.
See
da.argtopk
for docstring
- astype(dtype, **kwargs)¶
Copy of the array, cast to a specified type.
- Parameters
dtype (str or dtype) – Typecode or data-type to which the array is cast.
casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) –
Controls what kind of data casting may occur. Defaults to ‘unsafe’ for backwards compatibility.
’no’ means the data types should not be cast at all.
’equiv’ means only byte-order changes are allowed.
’safe’ means only casts which can preserve values are allowed.
- ’same_kind’ means only safe casts or casts within a kind,
like float64 to float32, are allowed.
’unsafe’ means any data conversions may be done.
copy (bool, optional) – By default, astype always returns a newly allocated array. If this is set to False and the dtype requirement is satisfied, the input array is returned instead of a copy.
- property blocks¶
Slice an array by blocks
This allows blockwise slicing of a Dask array. You can perform normal Numpy-style slicing but now rather than slice elements of the array you slice along blocks so, for example,
x.blocks[0, ::2]
produces a new dask array with every other block in the first row of blocks.You can index blocks in any way that could index a numpy array of shape equal to the number of blocks in each dimension, (available as array.numblocks). The dimension of the output array will be the same as the dimension of this array, even if integer indices are passed. This does not support slicing with
np.newaxis
or multiple lists.Examples
>>> import dask.array as da >>> x = da.arange(10, chunks=2) >>> x.blocks[0].compute() array([0, 1]) >>> x.blocks[:3].compute() array([0, 1, 2, 3, 4, 5]) >>> x.blocks[::2].compute() array([0, 1, 4, 5, 8, 9]) >>> x.blocks[[-1, 0]].compute() array([8, 9, 0, 1])
- Return type
A Dask array
- choose(choices, out=None, mode='raise')¶
This docstring was copied from numpy.ndarray.choose.
Some inconsistencies with the Dask version may exist.
Use an index array to construct a new array from a set of choices.
Refer to numpy.choose for full documentation.
See also
numpy.choose
equivalent function
- property chunks¶
Chunks property.
- clip(min=None, max=None, out=None, **kwargs)¶
This docstring was copied from numpy.ndarray.clip.
Some inconsistencies with the Dask version may exist.
Return an array whose values are limited to
[min, max]
. One of max or min must be given.Refer to numpy.clip for full documentation.
See also
numpy.clip
equivalent function
- compute()[source]¶
Compute this dask collection
This turns a lazy Dask collection into its in-memory equivalent. For example a Dask array turns into a NumPy array and a Dask dataframe turns into a Pandas dataframe. The entire dataset must fit into memory before calling this operation.
- Parameters
scheduler (string, optional) – Which scheduler to use like “threads”, “synchronous” or “processes”. If not provided, the default is to check the global settings first, and then fall back to the collection defaults.
optimize_graph (bool, optional) – If True [default], the graph is optimized before computation. Otherwise the graph is run as is. This can be useful for debugging.
kwargs – Extra keywords to forward to the scheduler function.
See also
dask.base.compute
- compute_chunk_sizes()¶
Compute the chunk sizes for a Dask array. This is especially useful when the chunk sizes are unknown (e.g., when indexing one Dask array with another).
Notes
This function modifies the Dask array in-place.
Examples
>>> import dask.array as da >>> import numpy as np >>> x = da.from_array([-2, -1, 0, 1, 2], chunks=2) >>> x.chunks ((2, 2, 1),) >>> y = x[x <= 0] >>> y.chunks ((nan, nan, nan),) >>> y.compute_chunk_sizes() # in-place computation dask.array<getitem, shape=(3,), dtype=int64, chunksize=(2,), chunktype=numpy.ndarray> >>> y.chunks ((2, 1, 0),)
- copy()¶
Copy array. This is a no-op for dask.arrays, which are immutable
- cumprod(axis=None, dtype=None, out=None)¶
This docstring was copied from numpy.ndarray.cumprod.
Some inconsistencies with the Dask version may exist.
Dask added an additional keyword-only argument
method
.- method{‘sequential’, ‘blelloch’}, optional
Choose which method to use to perform the cumprod. Default is ‘sequential’.
‘sequential’ performs the cumprod of each prior block before the current block.
‘blelloch’ is a work-efficient parallel cumprod. It exposes parallelism by first taking the product of each block and combines the products via a binary tree. This method may be faster or more memory efficient depending on workload, scheduler, and hardware. More benchmarking is necessary.
Return the cumulative product of the elements along the given axis.
Refer to numpy.cumprod for full documentation.
See also
numpy.cumprod
equivalent function
- cumsum(axis=None, dtype=None, out=None)¶
This docstring was copied from numpy.ndarray.cumsum.
Some inconsistencies with the Dask version may exist.
Dask added an additional keyword-only argument
method
.- method{‘sequential’, ‘blelloch’}, optional
Choose which method to use to perform the cumsum. Default is ‘sequential’.
‘sequential’ performs the cumsum of each prior block before the current block.
‘blelloch’ is a work-efficient parallel cumsum. It exposes parallelism by first taking the sum of each block and combines the sums via a binary tree. This method may be faster or more memory efficient depending on workload, scheduler, and hardware. More benchmarking is necessary.
Return the cumulative sum of the elements along the given axis.
Refer to numpy.cumsum for full documentation.
See also
numpy.cumsum
equivalent function
- dot(b, out=None)¶
This docstring was copied from numpy.ndarray.dot.
Some inconsistencies with the Dask version may exist.
Dot product of two arrays.
Refer to numpy.dot for full documentation.
See also
numpy.dot
equivalent function
Examples
>>> a = np.eye(2) >>> b = np.ones((2, 2)) * 2 >>> a.dot(b) array([[2., 2.], [2., 2.]])
This array method can be conveniently chained:
>>> a.dot(b).dot(b) array([[8., 8.], [8., 8.]])
- flatten([order])¶
This docstring was copied from numpy.ndarray.ravel.
Some inconsistencies with the Dask version may exist.
Return a flattened array.
Refer to numpy.ravel for full documentation.
See also
numpy.ravel
equivalent function
ndarray.flat
a flat iterator on the array.
- property itemsize¶
Length of one array element in bytes
- map_blocks(*args, name=None, token=None, dtype=None, chunks=None, drop_axis=[], new_axis=None, meta=None, **kwargs)¶
Map a function across all blocks of a dask array.
Note that
map_blocks
will attempt to automatically determine the output array type by callingfunc
on 0-d versions of the inputs. Please refer to themeta
keyword argument below if you expect that the function will not succeed when operating on 0-d arrays.- Parameters
func (callable) – Function to apply to every block in the array.
args (dask arrays or other objects) –
dtype (np.dtype, optional) – The
dtype
of the output array. It is recommended to provide this. If not provided, will be inferred by applying the function to a small set of fake data.chunks (tuple, optional) – Chunk shape of resulting blocks if the function does not preserve shape. If not provided, the resulting array is assumed to have the same block structure as the first input array.
drop_axis (number or iterable, optional) – Dimensions lost by the function.
new_axis (number or iterable, optional) – New dimensions created by the function. Note that these are applied after
drop_axis
(if present).token (string, optional) – The key prefix to use for the output array. If not provided, will be determined from the function name.
name (string, optional) – The key name to use for the output array. Note that this fully specifies the output key name, and must be unique. If not provided, will be determined by a hash of the arguments.
meta (array-like, optional) – The
meta
of the output array, when specified is expected to be an array of the same type and dtype of that returned when calling.compute()
on the array returned by this function. When not provided,meta
will be inferred by applying the function to a small set of fake data, usually a 0-d array. It’s important to ensure thatfunc
can successfully complete computation without raising exceptions when 0-d is passed to it, providingmeta
will be required otherwise. If the output type is known beforehand (e.g.,np.ndarray
,cupy.ndarray
), an empty array of such type dtype can be passed, for example:meta=np.array((), dtype=np.int32)
.**kwargs – Other keyword arguments to pass to function. Values must be constants (not dask.arrays)
See also
dask.array.blockwise
Generalized operation with control over block alignment.
Examples
>>> import dask.array as da >>> x = da.arange(6, chunks=3)
>>> x.map_blocks(lambda x: x * 2).compute() array([ 0, 2, 4, 6, 8, 10])
The
da.map_blocks
function can also accept multiple arrays.>>> d = da.arange(5, chunks=2) >>> e = da.arange(5, chunks=2)
>>> f = da.map_blocks(lambda a, b: a + b**2, d, e) >>> f.compute() array([ 0, 2, 6, 12, 20])
If the function changes shape of the blocks then you must provide chunks explicitly.
>>> y = x.map_blocks(lambda x: x[::2], chunks=((2, 2),))
You have a bit of freedom in specifying chunks. If all of the output chunk sizes are the same, you can provide just that chunk size as a single tuple.
>>> a = da.arange(18, chunks=(6,)) >>> b = a.map_blocks(lambda x: x[:3], chunks=(3,))
If the function changes the dimension of the blocks you must specify the created or destroyed dimensions.
>>> b = a.map_blocks(lambda x: x[None, :, None], chunks=(1, 6, 1), ... new_axis=[0, 2])
If
chunks
is specified butnew_axis
is not, then it is inferred to add the necessary number of axes on the left.Map_blocks aligns blocks by block positions without regard to shape. In the following example we have two arrays with the same number of blocks but with different shape and chunk sizes.
>>> x = da.arange(1000, chunks=(100,)) >>> y = da.arange(100, chunks=(10,))
The relevant attribute to match is numblocks.
>>> x.numblocks (10,) >>> y.numblocks (10,)
If these match (up to broadcasting rules) then we can map arbitrary functions across blocks
>>> def func(a, b): ... return np.array([a.max(), b.max()])
>>> da.map_blocks(func, x, y, chunks=(2,), dtype='i8') dask.array<func, shape=(20,), dtype=int64, chunksize=(2,), chunktype=numpy.ndarray>
>>> _.compute() array([ 99, 9, 199, 19, 299, 29, 399, 39, 499, 49, 599, 59, 699, 69, 799, 79, 899, 89, 999, 99])
Your block function get information about where it is in the array by accepting a special
block_info
orblock_id
keyword argument.>>> def func(block_info=None): ... pass
This will receive the following information:
>>> block_info {0: {'shape': (1000,), 'num-chunks': (10,), 'chunk-location': (4,), 'array-location': [(400, 500)]}, None: {'shape': (1000,), 'num-chunks': (10,), 'chunk-location': (4,), 'array-location': [(400, 500)], 'chunk-shape': (100,), 'dtype': dtype('float64')}}
For each argument and keyword arguments that are dask arrays (the positions of which are the first index), you will receive the shape of the full array, the number of chunks of the full array in each dimension, the chunk location (for example the fourth chunk over in the first dimension), and the array location (for example the slice corresponding to
40:50
). The same information is provided for the output, with the keyNone
, plus the shape and dtype that should be returned.These features can be combined to synthesize an array from scratch, for example:
>>> def func(block_info=None): ... loc = block_info[None]['array-location'][0] ... return np.arange(loc[0], loc[1])
>>> da.map_blocks(func, chunks=((4, 4),), dtype=np.float_) dask.array<func, shape=(8,), dtype=float64, chunksize=(4,), chunktype=numpy.ndarray>
>>> _.compute() array([0, 1, 2, 3, 4, 5, 6, 7])
block_id
is similar toblock_info
but contains only thechunk_location
:>>> def func(block_id=None): ... pass
This will receive the following information:
>>> block_id (4, 3)
You may specify the key name prefix of the resulting task in the graph with the optional
token
keyword argument.>>> x.map_blocks(lambda x: x + 1, name='increment') dask.array<increment, shape=(100,), dtype=int64, chunksize=(10,), chunktype=numpy.ndarray>
For functions that may not handle 0-d arrays, it’s also possible to specify
meta
with an empty array matching the type of the expected result. In the example below,func
will result in anIndexError
when computingmeta
:>>> da.map_blocks(lambda x: x[2], da.random.random(5), meta=np.array(())) dask.array<lambda, shape=(5,), dtype=float64, chunksize=(5,), chunktype=numpy.ndarray>
Similarly, it’s possible to specify a non-NumPy array to
meta
, and provide adtype
:>>> import cupy >>> rs = da.random.RandomState(RandomState=cupy.random.RandomState) >>> dt = np.float32 >>> da.map_blocks(lambda x: x[2], rs.random(5, dtype=dt), meta=cupy.array((), dtype=dt)) dask.array<lambda, shape=(5,), dtype=float32, chunksize=(5,), chunktype=cupy.ndarray>
- map_overlap(func, depth, boundary=None, trim=True, **kwargs)¶
Map a function over blocks of the array with some overlap
We share neighboring zones between blocks of the array, then map a function, then trim away the neighboring strips.
Note that this function will attempt to automatically determine the output array type before computing it, please refer to the
meta
keyword argument inmap_blocks
if you expect that the function will not succeed when operating on 0-d arrays.- Parameters
func (function) – The function to apply to each extended block
depth (int, tuple, or dict) – The number of elements that each block should share with its neighbors If a tuple or dict then this can be different per axis
boundary (str, tuple, dict) – How to handle the boundaries. Values include ‘reflect’, ‘periodic’, ‘nearest’, ‘none’, or any constant value like 0 or np.nan
trim (bool) – Whether or not to trim
depth
elements from each block after calling the map function. Set this to False if your mapping function already does this for you**kwargs – Other keyword arguments valid in
map_blocks
Examples
>>> import dask.array as da >>> x = np.array([1, 1, 2, 3, 3, 3, 2, 1, 1]) >>> x = da.from_array(x, chunks=5) >>> def derivative(x): ... return x - np.roll(x, 1)
>>> y = x.map_overlap(derivative, depth=1, boundary=0) >>> y.compute() array([ 1, 0, 1, 1, 0, 0, -1, -1, 0])
>>> import dask.array as da >>> x = np.arange(16).reshape((4, 4)) >>> d = da.from_array(x, chunks=(2, 2)) >>> d.map_overlap(lambda x: x + x.size, depth=1).compute() array([[16, 17, 18, 19], [20, 21, 22, 23], [24, 25, 26, 27], [28, 29, 30, 31]])
>>> func = lambda x: x + x.size >>> depth = {0: 1, 1: 1} >>> boundary = {0: 'reflect', 1: 'none'} >>> d.map_overlap(func, depth, boundary).compute() array([[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23], [24, 25, 26, 27]])
>>> x = np.arange(16).reshape((4, 4)) >>> d = da.from_array(x, chunks=(2, 2)) >>> y = d.map_overlap(lambda x: x + x[2], depth=1, meta=np.array(())) >>> y dask.array<_trim, shape=(4, 4), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray> >>> y.compute() array([[ 4, 6, 8, 10], [ 8, 10, 12, 14], [20, 22, 24, 26], [24, 26, 28, 30]])
>>> import cupy >>> x = cupy.arange(16).reshape((5, 4)) >>> d = da.from_array(x, chunks=(2, 2)) >>> y = d.map_overlap(lambda x: x + x[2], depth=1, meta=cupy.array(())) >>> y dask.array<_trim, shape=(4, 4), dtype=float64, chunksize=(2, 2), chunktype=cupy.ndarray> >>> y.compute() array([[ 4, 6, 8, 10], [ 8, 10, 12, 14], [20, 22, 24, 26], [24, 26, 28, 30]])
- max(axis=None, out=None, keepdims=False, initial=<no value>, where=True)¶
This docstring was copied from numpy.ndarray.max.
Some inconsistencies with the Dask version may exist.
Return the maximum along a given axis.
Refer to numpy.amax for full documentation.
See also
numpy.amax
equivalent function
- mean(axis=None, dtype=None, out=None, keepdims=False)¶
This docstring was copied from numpy.ndarray.mean.
Some inconsistencies with the Dask version may exist.
Returns the average of the array elements along given axis.
Refer to numpy.mean for full documentation.
See also
numpy.mean
equivalent function
- min(axis=None, out=None, keepdims=False, initial=<no value>, where=True)¶
This docstring was copied from numpy.ndarray.min.
Some inconsistencies with the Dask version may exist.
Return the minimum along a given axis.
Refer to numpy.amin for full documentation.
See also
numpy.amin
equivalent function
- moment(order, axis=None, dtype=None, keepdims=False, ddof=0, split_every=None, out=None)¶
Calculate the nth centralized moment.
- Parameters
order (int) – Order of the moment that is returned, must be >= 2.
axis (int, optional) – Axis along which the central moment is computed. The default is to compute the moment of the flattened array.
dtype (data-type, optional) – Type to use in computing the moment. For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.
keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.
ddof (int, optional) – “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero.
- Returns
moment
- Return type
ndarray
References
- 1
Pebay, Philippe (2008), “Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments”, Technical Report SAND2008-6212, Sandia National Laboratories.
- property nbytes¶
Number of bytes in array
- nonzero()¶
This docstring was copied from numpy.ndarray.nonzero.
Some inconsistencies with the Dask version may exist.
Return the indices of the elements that are non-zero.
Refer to numpy.nonzero for full documentation.
See also
numpy.nonzero
equivalent function
- property partitions¶
Slice an array by partitions. Alias of dask array .blocks attribute.
This alias allows you to write agnostic code that works with both dask arrays and dask dataframes.
This allows blockwise slicing of a Dask array. You can perform normal Numpy-style slicing but now rather than slice elements of the array you slice along blocks so, for example,
x.blocks[0, ::2]
produces a new dask array with every other block in the first row of blocks.You can index blocks in any way that could index a numpy array of shape equal to the number of blocks in each dimension, (available as array.numblocks). The dimension of the output array will be the same as the dimension of this array, even if integer indices are passed. This does not support slicing with
np.newaxis
or multiple lists.Examples
>>> import dask.array as da >>> x = da.arange(10, chunks=2) >>> x.partitions[0].compute() array([0, 1]) >>> x.partitions[:3].compute() array([0, 1, 2, 3, 4, 5]) >>> x.partitions[::2].compute() array([0, 1, 4, 5, 8, 9]) >>> x.partitions[[-1, 0]].compute() array([8, 9, 0, 1]) >>> all(x.partitions[:].compute() == x.blocks[:].compute()) True
- Return type
A Dask array
- persist(**kwargs)¶
Persist this dask collection into memory
This turns a lazy Dask collection into a Dask collection with the same metadata, but now with the results fully computed or actively computing in the background.
The action of function differs significantly depending on the active task scheduler. If the task scheduler supports asynchronous computing, such as is the case of the dask.distributed scheduler, then persist will return immediately and the return value’s task graph will contain Dask Future objects. However if the task scheduler only supports blocking computation then the call to persist will block and the return value’s task graph will contain concrete Python results.
This function is particularly useful when using distributed systems, because the results will be kept in distributed memory, rather than returned to the local process as with compute.
- Parameters
scheduler (string, optional) – Which scheduler to use like “threads”, “synchronous” or “processes”. If not provided, the default is to check the global settings first, and then fall back to the collection defaults.
optimize_graph (bool, optional) – If True [default], the graph is optimized before computation. Otherwise the graph is run as is. This can be useful for debugging.
**kwargs – Extra keywords to forward to the scheduler function.
- Return type
New dask collections backed by in-memory data
See also
dask.base.persist
- prod(axis=None, dtype=None, out=None, keepdims=False, initial=1, where=True)¶
This docstring was copied from numpy.ndarray.prod.
Some inconsistencies with the Dask version may exist.
Return the product of the array elements over the given axis
Refer to numpy.prod for full documentation.
See also
numpy.prod
equivalent function
- ravel([order])¶
This docstring was copied from numpy.ndarray.ravel.
Some inconsistencies with the Dask version may exist.
Return a flattened array.
Refer to numpy.ravel for full documentation.
See also
numpy.ravel
equivalent function
ndarray.flat
a flat iterator on the array.
- rechunk(chunks='auto', threshold=None, block_size_limit=None, balance=False)¶
See da.rechunk for docstring
- repeat(repeats, axis=None)¶
This docstring was copied from numpy.ndarray.repeat.
Some inconsistencies with the Dask version may exist.
Repeat elements of an array.
Refer to numpy.repeat for full documentation.
See also
numpy.repeat
equivalent function
- reshape(shape, order='C')¶
This docstring was copied from numpy.ndarray.reshape.
Some inconsistencies with the Dask version may exist.
Note
See
dask.array.reshape()
for an explanation of themerge_chunks
keyword.Returns an array containing the same data with a new shape.
Refer to numpy.reshape for full documentation.
See also
numpy.reshape
equivalent function
Notes
Unlike the free function numpy.reshape, this method on ndarray allows the elements of the shape parameter to be passed in as separate arguments. For example,
a.reshape(10, 11)
is equivalent toa.reshape((10, 11))
.
- round(decimals=0, out=None)¶
This docstring was copied from numpy.ndarray.round.
Some inconsistencies with the Dask version may exist.
Return a with each element rounded to the given number of decimals.
Refer to numpy.around for full documentation.
See also
numpy.around
equivalent function
- size¶
Number of elements in array
- squeeze(axis=None)¶
This docstring was copied from numpy.ndarray.squeeze.
Some inconsistencies with the Dask version may exist.
Remove single-dimensional entries from the shape of a.
Refer to numpy.squeeze for full documentation.
See also
numpy.squeeze
equivalent function
- std(axis=None, dtype=None, out=None, ddof=0, keepdims=False)¶
This docstring was copied from numpy.ndarray.std.
Some inconsistencies with the Dask version may exist.
Returns the standard deviation of the array elements along given axis.
Refer to numpy.std for full documentation.
See also
numpy.std
equivalent function
- store(targets, lock=True, regions=None, compute=True, return_stored=False, **kwargs)¶
Store dask arrays in array-like objects, overwrite data in target
This stores dask arrays into object that supports numpy-style setitem indexing. It stores values chunk by chunk so that it does not have to fill up memory. For best performance you can align the block size of the storage target with the block size of your array.
If your data fits in memory then you may prefer calling
np.array(myarray)
instead.- Parameters
sources (Array or iterable of Arrays) –
targets (array-like or Delayed or iterable of array-likes and/or Delayeds) – These should support setitem syntax
target[10:20] = ...
lock (boolean or threading.Lock, optional) – Whether or not to lock the data stores while storing. Pass True (lock each file individually), False (don’t lock) or a particular
threading.Lock
object to be shared among all writes.regions (tuple of slices or list of tuples of slices) – Each
region
tuple inregions
should be such thattarget[region].shape = source.shape
for the corresponding source and target in sources and targets, respectively. If this is a tuple, the contents will be assumed to be slices, so do not provide a tuple of tuples.compute (boolean, optional) – If true compute immediately, return
dask.delayed.Delayed
otherwisereturn_stored (boolean, optional) – Optionally return the stored result (default False).
Examples
>>> x = ...
>>> import h5py >>> f = h5py.File('myfile.hdf5', mode='a') >>> dset = f.create_dataset('/data', shape=x.shape, ... chunks=x.chunks, ... dtype='f8')
>>> store(x, dset)
Alternatively store many arrays at the same time
>>> store([x, y, z], [dset1, dset2, dset3])
- sum(axis=None, dtype=None, out=None, keepdims=False, initial=0, where=True)¶
This docstring was copied from numpy.ndarray.sum.
Some inconsistencies with the Dask version may exist.
Return the sum of the array elements over the given axis.
Refer to numpy.sum for full documentation.
See also
numpy.sum
equivalent function
- swapaxes(axis1, axis2)¶
This docstring was copied from numpy.ndarray.swapaxes.
Some inconsistencies with the Dask version may exist.
Return a view of the array with axis1 and axis2 interchanged.
Refer to numpy.swapaxes for full documentation.
See also
numpy.swapaxes
equivalent function
- to_dask_dataframe(columns=None, index=None, meta=None)¶
Convert dask Array to dask Dataframe
- Parameters
columns (list or string) – list of column names if DataFrame, single string if Series
index (dask.dataframe.Index, optional) –
An optional dask Index to use for the output Series or DataFrame.
The default output index depends on whether the array has any unknown chunks. If there are any unknown chunks, the output has
None
for all the divisions (one per chunk). If all the chunks are known, a default index with known divsions is created.Specifying
index
can be useful if you’re conforming a Dask Array to an existing dask Series or DataFrame, and you would like the indices to match.meta (object, optional) – An optional meta parameter can be passed for dask to specify the concrete dataframe type to use for partitions of the Dask dataframe. By default, pandas DataFrame is used.
See also
- to_delayed(optimize_graph=True)¶
Convert into an array of
dask.delayed
objects, one per chunk.- Parameters
optimize_graph (bool, optional) – If True [default], the graph is optimized before converting into
dask.delayed
objects.
See also
- to_hdf5(filename, datapath, **kwargs)¶
Store array in HDF5 file
>>> x.to_hdf5('myfile.hdf5', '/x')
Optionally provide arguments as though to
h5py.File.create_dataset
>>> x.to_hdf5('myfile.hdf5', '/x', compression='lzf', shuffle=True)
See also
da.store
,h5py.File.create_dataset
- to_svg(size=500)¶
Convert chunks from Dask Array into an SVG Image
- Parameters
chunks (tuple) –
size (int) – Rough size of the image
Examples
>>> x.to_svg(size=500)
- Returns
text
- Return type
An svg string depicting the array as a grid of chunks
- to_tiledb(uri, *args, **kwargs)¶
Save array to the TileDB storage manager
See function
to_tiledb()
for argument documentation.See https://docs.tiledb.io for details about the format and engine.
- to_zarr(*args, **kwargs)¶
Save array to the zarr storage format
See https://zarr.readthedocs.io for details about the format.
See function
to_zarr()
for parameters.
- topk(k, axis=- 1, split_every=None)¶
The top k elements of an array.
See
da.topk
for docstring
- trace(offset=0, axis1=0, axis2=1, dtype=None, out=None)¶
This docstring was copied from numpy.ndarray.trace.
Some inconsistencies with the Dask version may exist.
Return the sum along diagonals of the array.
Refer to numpy.trace for full documentation.
See also
numpy.trace
equivalent function
- transpose(*axes)¶
This docstring was copied from numpy.ndarray.transpose.
Some inconsistencies with the Dask version may exist.
Returns a view of the array with axes transposed.
For a 1-D array this has no effect, as a transposed vector is simply the same vector. To convert a 1-D array into a 2D column vector, an additional dimension must be added. np.atleast2d(a).T achieves this, as does a[:, np.newaxis]. For a 2-D array, this is a standard matrix transpose. For an n-D array, if axes are given, their order indicates how the axes are permuted (see Examples). If axes are not provided and
a.shape = (i[0], i[1], ... i[n-2], i[n-1])
, thena.transpose().shape = (i[n-1], i[n-2], ... i[1], i[0])
.- Parameters
axes (None, tuple of ints, or n ints) –
None or no argument: reverses the order of the axes.
tuple of ints: i in the j-th place in the tuple means a’s i-th axis becomes a.transpose()’s j-th axis.
n ints: same as an n-tuple of the same ints (this form is intended simply as a “convenience” alternative to the tuple form)
- Returns
out – View of a, with axes suitably permuted.
- Return type
ndarray
See also
ndarray.T
Array property returning the array transposed.
ndarray.reshape
Give a new shape to an array without changing its data.
Examples
>>> a = np.array([[1, 2], [3, 4]]) >>> a array([[1, 2], [3, 4]]) >>> a.transpose() array([[1, 3], [2, 4]]) >>> a.transpose((1, 0)) array([[1, 3], [2, 4]]) >>> a.transpose(1, 0) array([[1, 3], [2, 4]])
- var(axis=None, dtype=None, out=None, ddof=0, keepdims=False)¶
This docstring was copied from numpy.ndarray.var.
Some inconsistencies with the Dask version may exist.
Returns the variance of the array elements, along given axis.
Refer to numpy.var for full documentation.
See also
numpy.var
equivalent function
- view(dtype=None, order='C')¶
Get a view of the array as a new data type
- Parameters
dtype – The dtype by which to view the array. The default, None, results in the view having the same data-type as the original array.
order (string) – ‘C’ or ‘F’ (Fortran) ordering
that (This reinterprets the bytes of the array under a new dtype. If) –
shape (dtype does not have the same size as the original array then the) –
change. (will) –
taking (Beware that both numpy and dask.array can behave oddly when) –
some (shape-changing views of arrays under Fortran ordering. Under) –
shape-changing (versions of NumPy this function will fail when taking) –
of (views of Fortran ordered arrays if the first dimension has chunks) –
one. (size) –
- property vindex¶
Vectorized indexing with broadcasting.
This is equivalent to numpy’s advanced indexing, using arrays that are broadcast against each other. This allows for pointwise indexing:
>>> import dask.array as da >>> x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> x = da.from_array(x, chunks=2) >>> x.vindex[[0, 1, 2], [0, 1, 2]].compute() array([1, 5, 9])
Mixed basic/advanced indexing with slices/arrays is also supported. The order of dimensions in the result follows those proposed for ndarray.vindex: the subspace spanned by arrays is followed by all slices.
Note:
vindex
provides more general functionality than standard indexing, but it also has fewer optimizations and can be significantly slower.
- visualize(filename='mydask', format=None, optimize_graph=False, **kwargs)¶
Render the computation of this object’s task graph using graphviz.
Requires
graphviz
to be installed.- Parameters
filename (str or None, optional) – The name of the file to write to disk. If the provided filename doesn’t include an extension, ‘.png’ will be used by default. If filename is None, no file will be written, and we communicate with dot using only pipes.
format ({'png', 'pdf', 'dot', 'svg', 'jpeg', 'jpg'}, optional) – Format in which to write output file. Default is ‘png’.
optimize_graph (bool, optional) – If True, the graph is optimized before rendering. Otherwise, the graph is displayed as is. Default is False.
color ({None, 'order'}, optional) – Options to color nodes. Provide
cmap=
keyword for additional colormap**kwargs – Additional keyword arguments to forward to
to_graphviz
.
Examples
>>> x.visualize(filename='dask.pdf') >>> x.visualize(filename='dask.pdf', color='order')
- Returns
result – See dask.dot.dot_graph for more information.
- Return type
IPython.diplay.Image, IPython.display.SVG, or None
See also
dask.base.visualize
,dask.dot.dot_graph
Notes
For more information on optimization see here:
- class nbodykit.base.catalog.ColumnFinder(name, bases, namespace, **kwargs)[source]¶
A meta-class that will register all columns of a class that have been marked with the
column()
decorator.This adds the following attributes to the class definition:
1.
_defaults
: default columns, specified by passingdefault=True
to thecolumn()
decorator._hardcolumns
: non-default, hard-coded columns
Note
This is a subclass of
abc.ABCMeta
so subclasses can define abstract properties, if they need to.Methods
__call__
(*args, **kwargs)Call self as a function.
mro
()return a type's method resolution order
register
(subclass)Register a virtual subclass of an ABC.
- __instancecheck__(instance)¶
Override for isinstance(instance, cls).
- __subclasscheck__(subclass)¶
Override for issubclass(subclass, cls).
- mro() list ¶
return a type’s method resolution order
- register(subclass)¶
Register a virtual subclass of an ABC.
Returns the subclass, to allow usage as a class decorator.