class nbodykit.base.catalog.CatalogCopy(size, comm, use_cache=False, **columns)
Bases: nbodykit.base.catalog.CatalogSource
A CatalogSource object that holds column data copied from an original source.
Attributes
attrs – A dictionary storing relevant meta-data about the CatalogSource.
columns – All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize – The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns – A list of the hard-coded columns in the CatalogSource.
size – The number of particles in the CatalogSource on the local rank.
use_cache – If set to True, use the built-in caching features of dask to cache data in memory.
Methods
Selection() – A boolean column that selects a subset slice of the CatalogSource.
Value() – When interpolating a CatalogSource onto a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() – The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) – Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() – Return a copy of the CatalogSource object.
get_hardcolumn(col) – Construct and return a hard-coded column.
make_column(array) – Utility function to convert a numpy array to a dask.array.Array.
read(columns) – Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) – Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) – Convert the CatalogSource to a MeshSource, using the specified parameters.
update_csize() – Set the collective size, csize.
Selection()
A boolean column that selects a subset slice of the CatalogSource. By default, this column is set to True for all particles.
Value()
When interpolating a CatalogSource onto a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell. The mesh field is a weighted average of Value, with the weights given by Weight. By default, this array is set to unity for all particles.
Weight()
The column giving the weight to use for each particle on the mesh. The mesh field is a weighted average of Value, with the weights given by Weight. By default, this array is set to unity for all particles.
__delitem__(col)
Delete a column; a “hard-coded” column cannot be deleted.
__getitem__(sel)
Index the CatalogSource; several types of indexing are supported.
__len__()
The local size of the CatalogSource on a given rank.
__setitem__(col, value)
Add columns to the CatalogSource, overriding any existing columns with the name col.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once. This should be called on the return value of read() to convert any dask arrays to numpy arrays. If use_cache is True, this internally caches data, using dask’s built-in cache features.
Parameters:
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
Notes
The dask default optimizer induces too many (unnecessary) IO calls, so we turn this feature off by default. Eventually we will probably want our own optimizer.
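For instance, a minimal sketch of pairing read() with compute() (assuming cat is an existing CatalogSource whose columns include Position and Velocity):
>>> pos, vel = cat.read(['Position', 'Velocity'])  # lazy dask arrays
>>> pos, vel = cat.compute(pos, vel)               # evaluated numpy arrays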
copy()
Return a copy of the CatalogSource object.
Returns: the new CatalogSource object holding the copied data columns
Return type: CatalogCopy
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks. It is the sum of size across all available ranks.
get_hardcolumn(col)
Construct and return a hard-coded column. These are usually produced by calling member functions marked by the @column decorator. Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.
hardcolumns
A list of the hard-coded columns in the CatalogSource. These columns are usually member functions marked by the @column decorator. Subclasses may override this method and use get_hardcolumn() to bypass the decorator logic.
logger = <logging.Logger object>
make_column(array)
Utility function to convert a numpy array to a dask.array.Array.
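A short sketch (the array contents and the cat instance are illustrative):
>>> import numpy as np
>>> pos = np.random.uniform(size=(100, 3))  # plain numpy data
>>> cat['Position'] = cat.make_column(pos)  # stored as a dask.array.Array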
read(columns)
Return the requested columns as dask arrays.
Parameters:
columns (list of str) – the names of the requested columns
Returns: the list of column data, in the form of dask arrays
Return type: list of dask.array.Array
save(output, columns, datasets=None, header='Header')
Save the CatalogSource to a bigfile.BigFile. Only the selected columns are saved and attrs are saved in header. The attrs of columns are stored in the datasets.
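As a hedged sketch (the output path and column names are illustrative):
>>> cat.save('catalog.bigfile', ['Position', 'Velocity'])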
size
The number of particles in the CatalogSource on the local rank.
to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')
Convert the CatalogSource to a MeshSource, using the specified parameters.
Returns: mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
Return type: MeshSource
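For example, a minimal sketch (the mesh size and box size are illustrative values):
>>> mesh = cat.to_mesh(Nmesh=256, BoxSize=1000.0, window='cic', compensated=True)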
update_csize()
Set the collective size, csize. This function should be called in the __init__() of a subclass, after size has been set to a valid value (not NotImplemented).
use_cache
If set to True, use the built-in caching features of dask to cache data in memory.
class nbodykit.base.catalog.CatalogSource(comm, use_cache=False)
Bases: nbodykit.base.catalog.CatalogSourceBase
An abstract base class representing a catalog of discrete particles. This object behaves like a structured numpy array – it must have a well-defined size when initialized. The size here represents the number of particles in the source on the local rank. Subclasses of this class must define a size attribute.
The information about each particle is stored as a series of columns in the format of dask arrays. These columns can be accessed in a dict-like fashion.
All subclasses of this class contain the following default columns:
Weight
Value
Selection
For a full description of these default columns, see the documentation.
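A brief sketch of the dict-like column access (the Mass column and the cat instance are hypothetical):
>>> import numpy as np
>>> pos = cat['Position']                             # a dask-backed column
>>> cat['Mass'] = cat.make_column(np.ones(cat.size))  # add or override a column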
Attributes
attrs – A dictionary storing relevant meta-data about the CatalogSource.
columns – All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize – The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns – A list of the hard-coded columns in the CatalogSource.
size – The number of particles in the CatalogSource on the local rank.
use_cache – If set to True, use the built-in caching features of dask to cache data in memory.
Methods
Selection() – A boolean column that selects a subset slice of the CatalogSource.
Value() – When interpolating a CatalogSource onto a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() – The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) – Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() – Return a copy of the CatalogSource object.
get_hardcolumn(col) – Construct and return a hard-coded column.
make_column(array) – Utility function to convert a numpy array to a dask.array.Array.
read(columns) – Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) – Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) – Convert the CatalogSource to a MeshSource, using the specified parameters.
update_csize() – Set the collective size, csize.
Selection()
A boolean column that selects a subset slice of the CatalogSource. By default, this column is set to True for all particles.
Value()
When interpolating a CatalogSource onto a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell. The mesh field is a weighted average of Value, with the weights given by Weight. By default, this array is set to unity for all particles.
Weight()
The column giving the weight to use for each particle on the mesh. The mesh field is a weighted average of Value, with the weights given by Weight. By default, this array is set to unity for all particles.
__delitem__(col)
Delete a column; a “hard-coded” column cannot be deleted.
__getitem__(sel)
Index the CatalogSource; several types of indexing are supported.
__setitem__(col, value)
Add columns to the CatalogSource, overriding any existing columns with the name col.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once. This should be called on the return value of read() to convert any dask arrays to numpy arrays. If use_cache is True, this internally caches data, using dask’s built-in cache features.
Parameters:
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
Notes
The dask default optimizer induces too many (unnecessary) IO calls, so we turn this feature off by default. Eventually we will probably want our own optimizer.
copy()
Return a copy of the CatalogSource object.
Returns: the new CatalogSource object holding the copied data columns
Return type: CatalogCopy
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks. It is the sum of size across all available ranks.
get_hardcolumn(col)
Construct and return a hard-coded column. These are usually produced by calling member functions marked by the @column decorator. Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.
hardcolumns
A list of the hard-coded columns in the CatalogSource. These columns are usually member functions marked by the @column decorator. Subclasses may override this method and use get_hardcolumn() to bypass the decorator logic.
logger = <logging.Logger object>
make_column(array)
Utility function to convert a numpy array to a dask.array.Array.
read(columns)
Return the requested columns as dask arrays.
Parameters:
columns (list of str) – the names of the requested columns
Returns: the list of column data, in the form of dask arrays
Return type: list of dask.array.Array
save(output, columns, datasets=None, header='Header')
Save the CatalogSource to a bigfile.BigFile. Only the selected columns are saved and attrs are saved in header. The attrs of columns are stored in the datasets.
size
The number of particles in the CatalogSource on the local rank. This property must be defined for all subclasses.
to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')
Convert the CatalogSource to a MeshSource, using the specified parameters.
Returns: mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
Return type: MeshSource
update_csize()
Set the collective size, csize. This function should be called in the __init__() of a subclass, after size has been set to a valid value (not NotImplemented).
use_cache
If set to True, use the built-in caching features of dask to cache data in memory.
class nbodykit.base.catalog.CatalogSourceBase(comm, use_cache=False)
Bases: object
An abstract base class representing a catalog of discrete particles. This object behaves like a structured numpy array – it must have a well-defined size when initialized. The size here represents the number of particles in the source on the local rank. Subclasses of this class must define a size attribute.
The information about each particle is stored as a series of columns in the format of dask arrays. These columns can be accessed in a dict-like fashion.
Attributes
attrs – A dictionary storing relevant meta-data about the CatalogSource.
columns – All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
hardcolumns – A list of the hard-coded columns in the CatalogSource.
use_cache – If set to True, use the built-in caching features of dask to cache data in memory.
Methods
compute(*args, **kwargs) – Our version of dask.compute() that computes multiple delayed dask collections at once.
get_hardcolumn(col) – Construct and return a hard-coded column.
make_column(array) – Utility function to convert a numpy array to a dask.array.Array.
read(columns) – Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) – Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) – Convert the CatalogSource to a MeshSource, using the specified parameters.
__getitem__(sel)
Index the CatalogSource; several types of indexing are supported.
__setitem__(col, value)
Add new columns to the CatalogSource, overriding any existing columns with the name col.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once. This should be called on the return value of read() to convert any dask arrays to numpy arrays. If use_cache is True, this internally caches data, using dask’s built-in cache features.
Parameters:
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
Notes
The dask default optimizer induces too many (unnecessary) IO calls, so we turn this feature off by default. Eventually we will probably want our own optimizer.
get_hardcolumn(col)
Construct and return a hard-coded column. These are usually produced by calling member functions marked by the @column decorator. Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.
hardcolumns
A list of the hard-coded columns in the CatalogSource. These columns are usually member functions marked by the @column decorator. Subclasses may override this method and use get_hardcolumn() to bypass the decorator logic.
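A short sketch of inspecting and constructing hard-coded columns (the cat instance and its column names are illustrative):
>>> cat.hardcolumns                # e.g. ['Selection', 'Value', 'Weight']
>>> sel = cat.get_hardcolumn('Selection')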
logger = <logging.Logger object>
make_column(array)
Utility function to convert a numpy array to a dask.array.Array.
read(columns)
Return the requested columns as dask arrays.
Parameters:
columns (list of str) – the names of the requested columns
Returns: the list of column data, in the form of dask arrays
Return type: list of dask.array.Array
save(output, columns, datasets=None, header='Header')
Save the CatalogSource to a bigfile.BigFile. Only the selected columns are saved and attrs are saved in header. The attrs of columns are stored in the datasets.
to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')
Convert the CatalogSource to a MeshSource, using the specified parameters.
Returns: mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
Return type: MeshSource
use_cache
If set to True, use the built-in caching features of dask to cache data in memory.
class nbodykit.base.catalog.ColumnAccessor
Bases: dask.array.core.Array
Provides access to a Column from a Catalog.
This is a thin subclass of dask.array.Array that provides a reference to the catalog object, an additional attrs attribute (for recording the reproducible meta-data), and some pretty-print support.
Due to a particularity of dask, any transformation that is not explicitly in-place will return a dask.array.Array, losing the pointer to the original catalog and the meta-data attrs.
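A brief sketch of this behavior (the cat instance is illustrative):
>>> col = cat['Position']  # ColumnAccessor, carries .attrs and the catalog reference
>>> shifted = col + 1.0    # plain dask.array.Array; attrs and catalog pointer are lost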
Attributes
A
T
chunks
imag
itemsize – Length of one array element in bytes
name
nbytes – Number of bytes in array
ndim
npartitions
numblocks
real
shape
size – Number of elements in array
vindex – Vectorized indexing with broadcasting.
Methods
all([axis, out, keepdims]) – Returns True if all elements evaluate to True.
any([axis, out, keepdims]) – Returns True if any of the elements of a evaluate to True.
argmax([axis, out]) – Return indices of the maximum values along the given axis.
argmin([axis, out]) – Return indices of the minimum values along the given axis of a.
as_daskarray()
astype(dtype, **kwargs) – Copy of the array, cast to a specified type.
choose(choices[, out, mode]) – Use an index array to construct a new array from a set of choices.
clip([min, max, out]) – Return an array whose values are limited to [min, max].
compute()
conj()
copy() – Copy array.
cumprod(axis[, dtype, out]) – See da.cumprod for docstring.
cumsum(axis[, dtype, out]) – See da.cumsum for docstring.
dot(b[, out]) – Dot product of two arrays.
flatten([order]) – Return a flattened array.
map_blocks(func, *args, **kwargs) – Map a function across all blocks of a dask array.
map_overlap(func, depth[, boundary, trim]) – Map a function over blocks of the array with some overlap.
max([axis, out]) – Return the maximum along a given axis.
mean([axis, dtype, out, keepdims]) – Returns the average of the array elements along given axis.
min([axis, out, keepdims]) – Return the minimum along a given axis.
moment(order[, axis, dtype, keepdims, ddof, …]) – Calculate the nth centralized moment.
nonzero() – Return the indices of the elements that are non-zero.
persist(**kwargs) – Persist multiple Dask collections into memory.
prod([axis, dtype, out, keepdims]) – Return the product of the array elements over the given axis.
ravel([order]) – Return a flattened array.
rechunk(chunks[, threshold, block_size_limit]) – See da.rechunk for docstring.
repeat(repeats[, axis]) – Repeat elements of an array.
reshape(shape[, order]) – Returns an array containing the same data with a new shape.
round([decimals, out]) – Return a with each element rounded to the given number of decimals.
squeeze([axis]) – Remove single-dimensional entries from the shape of a.
std([axis, dtype, out, ddof, keepdims]) – Returns the standard deviation of the array elements along given axis.
store(sources, targets[, lock, regions, compute]) – Store dask arrays in array-like objects, overwrite data in target.
sum([axis, dtype, out, keepdims]) – Return the sum of the array elements over the given axis.
swapaxes(axis1, axis2) – Return a view of the array with axis1 and axis2 interchanged.
to_dask_dataframe([columns]) – Convert dask Array to dask DataFrame.
to_delayed() – Convert Array into dask Delayed objects.
to_hdf5(filename, datapath, **kwargs) – Store array in HDF5 file.
topk(k) – The top k elements of an array.
transpose(*axes) – Returns a view of the array with axes transposed.
var([axis, dtype, out, ddof, keepdims]) – Returns the variance of the array elements, along given axis.
view(dtype[, order]) – Get a view of the array as a new data type.
visualize([filename, format, optimize_graph]) – Render the computation of this object’s task graph using graphviz.
vnorm([ord, axis, keepdims, split_every, out]) – Vector norm.
A
T
__repr__()
>>> import dask.array as da
>>> da.ones((10, 10), chunks=(5, 5), dtype='i4')
dask.array<..., shape=(10, 10), dtype=int32, chunksize=(5, 5)>
all(axis=None, out=None, keepdims=False)
Returns True if all elements evaluate to True.
Refer to numpy.all for full documentation.
See also
numpy.all()
any(axis=None, out=None, keepdims=False)
Returns True if any of the elements of a evaluate to True.
Refer to numpy.any for full documentation.
See also
numpy.any()
argmax(axis=None, out=None)
Return indices of the maximum values along the given axis.
Refer to numpy.argmax for full documentation.
See also
numpy.argmax()
argmin(axis=None, out=None)
Return indices of the minimum values along the given axis of a.
Refer to numpy.argmin for detailed documentation.
See also
numpy.argmin()
astype(dtype, **kwargs)
Copy of the array, cast to a specified type.
choose(choices, out=None, mode='raise')
Use an index array to construct a new array from a set of choices.
Refer to numpy.choose for full documentation.
See also
numpy.choose()
chunks
clip(min=None, max=None, out=None)
Return an array whose values are limited to [min, max]. One of max or min must be given.
Refer to numpy.clip for full documentation.
See also
numpy.clip()
conj()
copy()
Copy array. This is a no-op for dask arrays, which are immutable.
cumprod(axis, dtype=None, out=None)
See da.cumprod for docstring.
cumsum(axis, dtype=None, out=None)
See da.cumsum for docstring.
dask
dot(b, out=None)
Dot product of two arrays.
Refer to numpy.dot for full documentation.
See also
numpy.dot()
Examples
>>> import numpy as np
>>> a = np.eye(2)
>>> b = np.ones((2, 2)) * 2
>>> a.dot(b)
array([[ 2., 2.],
[ 2., 2.]])
This array method can be conveniently chained:
>>> a.dot(b).dot(b)
array([[ 8., 8.],
[ 8., 8.]])
dtype
flatten([order])
Return a flattened array.
Refer to numpy.ravel for full documentation.
See also
numpy.ravel()
ndarray.flat()
imag
itemsize
Length of one array element in bytes
map_blocks(func, *args, **kwargs)
Map a function across all blocks of a dask array.
Examples
>>> import dask.array as da
>>> x = da.arange(6, chunks=3)
>>> x.map_blocks(lambda x: x * 2).compute()
array([ 0, 2, 4, 6, 8, 10])
The da.map_blocks function can also accept multiple arrays.
>>> d = da.arange(5, chunks=2)
>>> e = da.arange(5, chunks=2)
>>> f = da.map_blocks(lambda a, b: a + b**2, d, e)
>>> f.compute()
array([ 0, 2, 6, 12, 20])
If the function changes shape of the blocks then you must provide chunks explicitly.
>>> y = x.map_blocks(lambda x: x[::2], chunks=((2, 2),))
You have a bit of freedom in specifying chunks. If all of the output chunk sizes are the same, you can provide just that chunk size as a single tuple.
>>> a = da.arange(18, chunks=(6,))
>>> b = a.map_blocks(lambda x: x[:3], chunks=(3,))
If the function changes the dimension of the blocks you must specify the created or destroyed dimensions.
>>> b = a.map_blocks(lambda x: x[None, :, None], chunks=(1, 6, 1),
... new_axis=[0, 2])
Map_blocks aligns blocks by block positions without regard to shape. In the following example we have two arrays with the same number of blocks but with different shape and chunk sizes.
>>> x = da.arange(1000, chunks=(100,))
>>> y = da.arange(100, chunks=(10,))
The relevant attribute to match is numblocks.
>>> x.numblocks
(10,)
>>> y.numblocks
(10,)
If these match (up to broadcasting rules) then we can map arbitrary functions across blocks
>>> def func(a, b):
... return np.array([a.max(), b.max()])
>>> da.map_blocks(func, x, y, chunks=(2,), dtype='i8')
dask.array<func, shape=(20,), dtype=int64, chunksize=(2,)>
>>> _.compute()
array([ 99, 9, 199, 19, 299, 29, 399, 39, 499, 49, 599, 59, 699,
69, 799, 79, 899, 89, 999, 99])
Your block function can learn where in the array it is if it supports a block_id keyword argument. This will receive entries like (2, 0, 1), the position of the block in the dask array.
>>> def func(block, block_id=None):
... pass
You may specify the key name prefix of the resulting task in the graph with the optional token keyword argument.
>>> x.map_blocks(lambda x: x + 1, token='increment')
dask.array<increment, shape=(100,), dtype=int64, chunksize=(10,)>
map_overlap(func, depth, boundary=None, trim=True, **kwargs)
Map a function over blocks of the array with some overlap.
We share neighboring zones between blocks of the array, then map a function, then trim away the neighboring strips.
Examples
>>> import dask.array as da
>>> import numpy as np
>>> x = np.array([1, 1, 2, 3, 3, 3, 2, 1, 1])
>>> x = da.from_array(x, chunks=5)
>>> def derivative(x):
...     return x - np.roll(x, 1)
>>> y = x.map_overlap(derivative, depth=1, boundary=0)
>>> y.compute()
array([ 1, 0, 1, 1, 0, 0, -1, -1, 0])
>>> x = np.arange(16).reshape((4, 4))
>>> d = da.from_array(x, chunks=(2, 2))
>>> d.map_overlap(lambda x: x + x.size, depth=1).compute()
array([[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]])
>>> func = lambda x: x + x.size
>>> depth = {0: 1, 1: 1}
>>> boundary = {0: 'reflect', 1: 'none'}
>>> d.map_overlap(func, depth, boundary).compute()
array([[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27]])
max(axis=None, out=None)
Return the maximum along a given axis.
Refer to numpy.amax for full documentation.
See also
numpy.amax()
mean(axis=None, dtype=None, out=None, keepdims=False)
Returns the average of the array elements along given axis.
Refer to numpy.mean for full documentation.
See also
numpy.mean()
min(axis=None, out=None, keepdims=False)
Return the minimum along a given axis.
Refer to numpy.amin for full documentation.
See also
numpy.amin()
moment(order, axis=None, dtype=None, keepdims=False, ddof=0, split_every=None, out=None)
Calculate the nth centralized moment.
Returns: moment
Return type: ndarray
References
[R114] Pebay, Philippe (2008), “Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments” (PDF), Technical Report SAND2008-6212, Sandia National Laboratories.
name
nbytes
Number of bytes in array
ndim
nonzero()
Return the indices of the elements that are non-zero.
Refer to numpy.nonzero for full documentation.
See also
numpy.nonzero()
npartitions
numblocks
persist(**kwargs)
Persist multiple Dask collections into memory.
This turns lazy Dask collections into Dask collections with the same metadata, but now with their results fully computed or actively computing in the background.
For example a lazy dask.array built up from many lazy calls will now be a dask.array of the same shape, dtype, chunks, etc., but now with all of those previously lazy tasks either computed in memory as many small NumPy arrays (in the single-machine case) or asynchronously running in the background on a cluster (in the distributed case).
This function operates differently if a dask.distributed.Client exists and is connected to a distributed scheduler. In this case this function will return as soon as the task graph has been submitted to the cluster, but before the computations have completed. Computations will continue asynchronously in the background. When using this function with the single-machine scheduler, it blocks until the computations have finished.
When using Dask on a single machine you should ensure that the dataset fits entirely within memory.
Examples
>>> import dask.dataframe as dd
>>> df = dd.read_csv('/path/to/*.csv')
>>> df = df[df.name == 'Alice']
>>> df['in-debt'] = df.balance < 0
>>> df = df.persist()  # triggers computation
>>> df.value().min()  # future computations are now fast
-10
>>> df.value().max()
100
>>> from dask import persist  # use persist function on multiple collections
>>> a, b = persist(a, b)
Return type: New dask collections backed by in-memory data
prod(axis=None, dtype=None, out=None, keepdims=False)
Return the product of the array elements over the given axis.
Refer to numpy.prod for full documentation.
See also
numpy.prod()
ravel([order])
Return a flattened array.
Refer to numpy.ravel for full documentation.
See also
numpy.ravel()
ndarray.flat()
real
rechunk(chunks, threshold=None, block_size_limit=None)
See da.rechunk for docstring.
repeat(repeats, axis=None)
Repeat elements of an array.
Refer to numpy.repeat for full documentation.
See also
numpy.repeat()
reshape(shape, order='C')
Returns an array containing the same data with a new shape.
Refer to numpy.reshape for full documentation.
See also
numpy.reshape()
round(decimals=0, out=None)
Return a with each element rounded to the given number of decimals.
Refer to numpy.around for full documentation.
See also
numpy.around()
shape
size
Number of elements in array
squeeze(axis=None)
Remove single-dimensional entries from the shape of a.
Refer to numpy.squeeze for full documentation.
See also
numpy.squeeze()
std(axis=None, dtype=None, out=None, ddof=0, keepdims=False)
Returns the standard deviation of the array elements along given axis.
Refer to numpy.std for full documentation.
See also
numpy.std()
store(sources, targets, lock=True, regions=None, compute=True, **kwargs)
Store dask arrays in array-like objects, overwriting data in target.
This stores dask arrays into an object that supports numpy-style setitem indexing. It stores values chunk by chunk so that it does not have to fill up memory. For best performance you can align the block size of the storage target with the block size of your array.
If your data fits in memory then you may prefer calling np.array(myarray) instead.
Examples
>>> import dask.array as da
>>> import h5py
>>> x = ...
>>> f = h5py.File('myfile.hdf5')
>>> dset = f.create_dataset('/data', shape=x.shape,
...                         chunks=x.chunks,
...                         dtype='f8')
>>> da.store(x, dset)
Alternatively, store many arrays at the same time:
>>> da.store([x, y, z], [dset1, dset2, dset3])
sum(axis=None, dtype=None, out=None, keepdims=False)
Return the sum of the array elements over the given axis.
Refer to numpy.sum for full documentation.
See also
numpy.sum()
swapaxes(axis1, axis2)
Return a view of the array with axis1 and axis2 interchanged.
Refer to numpy.swapaxes for full documentation.
See also
numpy.swapaxes()
to_dask_dataframe(columns=None)
Convert dask Array to dask DataFrame.
Parameters:
columns (list or string) – list of column names if DataFrame, single string if Series
See also
dask.dataframe.from_dask_array()
to_delayed()
Convert Array into dask Delayed objects.
Returns an array of values, one value per chunk.
See also
dask.array.from_delayed()
to_hdf5(filename, datapath, **kwargs)
Store array in HDF5 file.
>>> x.to_hdf5('myfile.hdf5', '/x')
Optionally provide arguments as though to h5py.File.create_dataset:
>>> x.to_hdf5('myfile.hdf5', '/x', compression='lzf', shuffle=True)
See also
da.store(), h5py.File.create_dataset()
topk(k)
The top k elements of an array.
See da.topk for docstring.
transpose(*axes)
Returns a view of the array with axes transposed.
For a 1-D array, this has no effect. (To change between column and row vectors, first cast the 1-D array into a matrix object.) For a 2-D array, this is the usual matrix transpose. For an n-D array, if axes are given, their order indicates how the axes are permuted (see Examples). If axes are not provided and a.shape = (i[0], i[1], ... i[n-2], i[n-1]), then a.transpose().shape = (i[n-1], i[n-2], ... i[1], i[0]).
Parameters:
axes (None, tuple of ints, or n ints)
Returns: out – View of a, with axes suitably permuted.
Return type: ndarray
See also
ndarray.T()
Examples
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
[3, 4]])
>>> a.transpose()
array([[1, 3],
[2, 4]])
>>> a.transpose((1, 0))
array([[1, 3],
[2, 4]])
>>> a.transpose(1, 0)
array([[1, 3],
[2, 4]])
var(axis=None, dtype=None, out=None, ddof=0, keepdims=False)
Returns the variance of the array elements, along given axis.
Refer to numpy.var for full documentation.
See also
numpy.var()
view(dtype, order='C')
Get a view of the array as a new data type.
vindex
Vectorized indexing with broadcasting.
This is equivalent to numpy’s advanced indexing, using arrays that are broadcast against each other. This allows for pointwise indexing:
>>> import dask.array as da
>>> import numpy as np
>>> x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> x = da.from_array(x, chunks=2)
>>> x.vindex[[0, 1, 2], [0, 1, 2]].compute()
array([1, 5, 9])
Mixed basic/advanced indexing with slices/arrays is also supported. The order of dimensions in the result follows those proposed for ndarray.vindex [1]_: the subspace spanned by arrays is followed by all slices.
Note: vindex provides more general functionality than standard indexing, but it also has fewer optimizations and can be significantly slower.
visualize(filename='mydask', format=None, optimize_graph=False, **kwargs)
Render the computation of this object’s task graph using graphviz.
Requires graphviz to be installed.
Returns: result – See dask.dot.dot_graph for more information.
Return type: IPython.display.Image, IPython.display.SVG, or None
See also
dask.base.visualize(), dask.dot.dot_graph()
Notes
For more information on optimization, see the dask documentation on graph optimization.
vnorm(ord=None, axis=None, keepdims=False, split_every=None, out=None)
Vector norm.
nbodykit.base.catalog.column(name=None)
Decorator that defines a function as a column in a CatalogSource.
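A hedged sketch of the decorator on a subclass (the MyCatalog class and its Mass column are hypothetical):
>>> import numpy as np
>>> from nbodykit.base.catalog import CatalogSource, column
>>> class MyCatalog(CatalogSource):
...     @column
...     def Mass(self):
...         # hypothetical hard-coded column, returned as a dask array
...         return self.make_column(np.ones(self.size))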
nbodykit.base.catalog.find_column(cls, name)
Find a specific column name of an input class, or raise an exception if it does not exist.
Returns: column – the callable that returns the column data
Return type: callable
nbodykit.base.catalog.find_columns(cls)
Find all hard-coded column names associated with the input class.
Returns: hardcolumns – a set of the names of all hard-coded columns for the input class cls
Return type: set
nbodykit.base.catalog.get_catalog_subset(parent, index)
Select a subset of a CatalogSource according to a boolean index array.
Returns a CatalogCopy holding only the data that satisfies the slice criterion.
Returns: subset – the particle source with the same meta-data as parent, and with the sliced data arrays
Return type: CatalogCopy
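A hedged sketch (assuming cat is an existing CatalogSource with a Position column):
>>> from nbodykit.base.catalog import get_catalog_subset
>>> keep = cat['Position'][:, 0] > 0  # boolean selection, evaluated lazily
>>> subset = get_catalog_subset(cat, cat.compute(keep))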