nbodykit.source.catalog
- class nbodykit.source.catalog.ArrayCatalog(data, comm=None, **kwargs)
A CatalogSource initialized from an in-memory dict, structured numpy.ndarray, or astropy.table.Table.
See the documentation for examples.
- Parameters
data (dict, numpy.ndarray, or astropy.table.Table) – a dictionary, structured ndarray, or astropy Table; items are interpreted as the columns of the catalog, and the length of any item is used as the size of the catalog.
comm (MPI Communicator, optional) – the MPI communicator instance; the default (None) uses the current communicator
**kwargs – additional keywords to store as meta-data in attrs
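The mapping from input to catalog can be sketched with NumPy alone. This is a minimal illustration, not a call into nbodykit: it builds a hypothetical structured array, shows the equivalent dict input, and derives the column names and size as the parameter description above states (field or key names become columns, the common length becomes the size). The default Selection, Value and Weight columns that every CatalogSource adds are not modelled here.

```python
import numpy as np

# Hypothetical input: 4 objects with a 3-vector Position and a scalar Mass.
data = np.zeros(4, dtype=[("Position", "f8", (3,)), ("Mass", "f4")])
data["Position"] = np.arange(12, dtype="f8").reshape(4, 3)
data["Mass"] = [1.0, 2.0, 3.0, 4.0]

# Equivalent dict input: every value must have the same length.
as_dict = {name: data[name] for name in data.dtype.names}

# Field/key names become the catalog columns; the common length is the size.
columns = list(data.dtype.names)
size = len(data)
assert all(len(v) == size for v in as_dict.values())
print(columns, size)  # ['Position', 'Mass'] 4
```

With nbodykit available, either `data` or `as_dict` would be passed as the `data` argument, for example `ArrayCatalog(data, BoxSize=100.0)`; per the `**kwargs` description, the extra keyword should then be stored in `attrs`.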
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
The union of the columns in the file and any transformed columns.
size
The number of objects in the CatalogSource on the local rank.
Methods
Selection()
A boolean column that selects a subset slice of the CatalogSource.
Value()
When interpolating a CatalogSource onto a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight()
The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once.
copy()
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
get_hardcolumn(col)
Return a column from the underlying data array/dict.
gslice(start, stop[, end, redistribute])
Execute a global slice of a CatalogSource.
make_column(array)
Utility function to convert an array-like object to a dask.array.Array.
persist([columns])
Return a CatalogSource, where the selected columns are computed and persisted in memory.
read(columns)
Return the requested columns as dask arrays.
save(output[, columns, dataset, datasets, ...])
Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols])
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, ...])
Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes([domain, position, columns])
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view([type])
Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- property Index
The attribute giving the global index rank of each particle in the list. It is an integer from 0 to self.csize - 1.
Note that slicing changes this index value.
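How a global index of this kind can be derived is sketched below in plain Python, with MPI ranks simulated by a list of hypothetical local sizes. Each rank's first index is the total size of all lower ranks (an exclusive prefix sum), which is also why the index is renumbered after slicing. In a real MPI run the offsets would come from a collective operation; this is an illustration, not nbodykit's code.

```python
from itertools import accumulate

def global_index(local_sizes, rank):
    """Return the global indices owned by `rank`, given every rank's local size."""
    # Offset = number of objects held by all lower ranks (exclusive prefix sum).
    offsets = [0] + list(accumulate(local_sizes))[:-1]
    start = offsets[rank]
    return list(range(start, start + local_sizes[rank]))

local_sizes = [3, 2, 4]                  # hypothetical `size` on ranks 0, 1, 2
csize = sum(local_sizes)                 # collective size
per_rank = [global_index(local_sizes, r) for r in range(len(local_sizes))]
# per_rank -> [[0, 1, 2], [3, 4], [5, 6, 7, 8]]

# If a selection removes one object from rank 0, every later index shifts down.
sliced = [global_index([2, 2, 4], r) for r in range(3)]
# sliced -> [[0, 1], [2, 3], [4, 5, 6, 7]]
```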
- Selection()
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True for all particles, and all CatalogSource objects will contain this column.
- Value()
When interpolating a CatalogSource onto a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- Weight()
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
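The weighted-average definition given for Value and Weight can be made concrete with a literal 1D reading: each cell holds sum(Weight * Value) / sum(Weight) over the objects it contains. The sketch below uses nearest-grid-point assignment with hypothetical positions; it illustrates only the arithmetic of the definition, not how nbodykit or pmesh actually paint a mesh (to_mesh defaults to the 'cic' resampler in 3D).

```python
import numpy as np

def weighted_average_ngp(pos, value, weight, nmesh, boxsize):
    """1D nearest-grid-point sketch: field[c] = sum(w*v) / sum(w) over objects in cell c."""
    cell = np.floor(pos / boxsize * nmesh).astype(int) % nmesh   # periodic wrap
    num = np.bincount(cell, weights=weight * value, minlength=nmesh)
    den = np.bincount(cell, weights=weight, minlength=nmesh)
    field = np.zeros(nmesh)
    nonempty = den > 0                   # empty cells are left at zero
    field[nonempty] = num[nonempty] / den[nonempty]
    return field

pos = np.array([0.5, 1.5, 1.7, 3.2])
value = np.array([2.0, 1.0, 3.0, 5.0])
weight = np.array([1.0, 1.0, 3.0, 2.0])
field = weighted_average_ngp(pos, value, weight, nmesh=4, boxsize=4.0)
# cell 1 holds two objects: (1*1.0 + 3*3.0) / (1 + 3) = 2.5
```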
- __delitem__(col)
Delete a column; a “hard-coded” column cannot be deleted.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
- __finalize__(other)
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to only copy over attributes that are similar to meta-data, so we do not copy some of the core attributes of the CatalogSource object.
- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
CatalogSource
- __getitem__(sel)
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the relevant slice
slice objects specifying which particles to select
lists of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation.
If the base attribute is set, columns will be returned from base instead of from self.
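The four indexing forms listed above can be mimicked on a single process with a small hypothetical class backed by NumPy arrays. `MiniCatalog` is not part of nbodykit: it returns NumPy arrays rather than dask arrays and ignores MPI, so slicing here is not a collective operation. It only shows which selection type yields a column and which yields a new catalog.

```python
import numpy as np

class MiniCatalog:
    """Hypothetical single-rank stand-in illustrating CatalogSource-style indexing."""

    def __init__(self, columns):
        self._columns = {k: np.asarray(v) for k, v in columns.items()}

    @property
    def columns(self):
        return list(self._columns)

    def __len__(self):
        return len(next(iter(self._columns.values())))

    def __getitem__(self, sel):
        if isinstance(sel, str):                          # column name -> array
            return self._columns[sel]
        if isinstance(sel, list) and all(isinstance(s, str) for s in sel):
            return MiniCatalog({k: self._columns[k] for k in sel})  # column subset
        if isinstance(sel, slice) or (
            isinstance(sel, np.ndarray) and sel.dtype == bool
        ):                                                # row subset -> new catalog
            return MiniCatalog({k: v[sel] for k, v in self._columns.items()})
        raise KeyError(f"unsupported selection: {sel!r}")

cat = MiniCatalog({"Mass": [1.0, 5.0, 2.0, 8.0], "ID": [10, 11, 12, 13]})
heavy = cat[cat["Mass"] > 3.0]       # boolean array -> catalog with 2 rows
first_two = cat[0:2]                 # slice object -> catalog with 2 rows
ids_only = cat[["ID"]]               # list of names -> catalog with 1 column
```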
- __len__()
The local size of the CatalogSource on a given rank.
- __setitem__(col, value)
Add columns to the CatalogSource, overriding any existing columns with the name col.
- property attrs
A dictionary storing relevant meta-data about the CatalogSource.
- property columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
- compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
Note
No copy of data is made.
Note
This is different from view() in that the attributes dictionary of the copy is no longer tied to self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
CatalogSource
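The copy semantics described above (columns shared, attributes dictionary independent) can be illustrated with plain dictionaries. The sketch uses hypothetical NumPy-array stand-ins so that sharing is visible through an in-place write; real nbodykit columns are dask arrays, which are not normally modified in place.

```python
import numpy as np

# Hypothetical stand-ins for a catalog's column table and attrs dictionary.
columns = {"Mass": np.array([1.0, 2.0, 3.0])}
attrs = {"BoxSize": 100.0}

# Shallow copy: new containers, but each column refers to the same array.
copy_columns = dict(columns)
copy_attrs = dict(attrs)

copy_columns["Mass"][0] = 99.0   # the array is shared, so the original sees this
copy_attrs["BoxSize"] = 50.0     # the attrs dict is separate; the original is unchanged
```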
- property csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks. If the base attribute is set, the base.csize attribute will be returned.
- get_hardcolumn(col)
Return a column from the underlying data array/dict.
Columns are returned as dask arrays.
- gslice(start, stop, end=1, redistribute=True)
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
end (int, optional) – the step size of the global slice; default 1
redistribute (bool, optional) – if True, evenly redistribute the sliced data across all ranks; otherwise, just return the local data that is part of the global slice
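The behaviour described for gslice can be simulated with NumPy, treating each list entry as one rank's local data. This is a sketch of the stated behaviour, not nbodykit's code: a boolean mask over all csize items is built in one place (the non-scaling step the note mentions), each "rank" keeps its selected items, and redistribute=True spreads the result evenly. The `step` name in the sketch corresponds to the third argument of gslice.

```python
import numpy as np

def gslice_sketch(local_parts, start, stop, step=1, redistribute=True):
    """Simulate a global slice of data split across ranks (ranks = list entries)."""
    sizes = [len(p) for p in local_parts]
    csize = sum(sizes)
    # "Root rank": a boolean mask over all csize items; this global step does not scale.
    mask = np.zeros(csize, dtype=bool)
    mask[np.arange(csize)[start:stop:step]] = True
    # Each rank keeps its own items whose global positions are selected.
    bounds = np.cumsum([0] + sizes)
    local = [p[mask[lo:hi]] for p, lo, hi in zip(local_parts, bounds[:-1], bounds[1:])]
    if not redistribute:
        return local
    # redistribute=True: spread the selected items evenly over the ranks.
    return np.array_split(np.concatenate(local), len(local_parts))

parts = [np.arange(0, 6), np.arange(6, 8), np.arange(8, 10)]   # 3 hypothetical ranks
kept_locally = gslice_sketch(parts, 0, 6, redistribute=False)  # all on rank 0
balanced = gslice_sketch(parts, 0, 6)                          # 2 items per rank
```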
- property hardcolumns
The union of the columns in the file and any transformed columns.
- static make_column(array)
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
dask.array.Array
- persist(columns=None)
Return a CatalogSource, where the selected columns are computed and persisted in memory.
- read(columns)
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of
dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of the columns are stored in the datasets.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – the dataset to store the columns under
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, the header is not saved
compute (bool, default True) – if True, wait until the store operations finish; if False, return a dictionary mapping each column name to a future object for its store operation, and use dask.compute() on the result to wait for the store operations
- property size
The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be of floating-point or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of the columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided
reverse (bool, optional) – if True, perform a descending sort
usecols (list, optional) – the names of the columns to include in the returned CatalogSource
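A single-key version of the behaviour described above can be simulated with NumPy, treating a list of arrays as the per-rank data. The sketch makes simplifying assumptions: it gathers every key into one array (a real distributed sort avoids that), it handles one key only because the multi-key ordering convention is not spelled out here, and it enforces the integer/floating-point requirement from the text.

```python
import numpy as np

def sort_sketch(local_parts, reverse=False):
    """Simulate a global sort of one numeric key, then spread the result evenly over ranks."""
    key = np.concatenate(local_parts)
    if key.dtype.kind not in "iuf":
        raise TypeError("sort keys must be integer or floating point")
    order = np.argsort(key, kind="stable")
    if reverse:
        order = order[::-1]              # descending order
    # After sorting, the data is scattered evenly across all ranks.
    return np.array_split(key[order], len(local_parts))

parts = [np.array([5.0, 1.0, 4.0]), np.array([9.0]), np.array([2.0, 7.0])]
asc = sort_sketch(parts)                 # [1, 2], [4, 5], [7, 9]
desc = sort_sketch(parts, reverse=True)  # [9, 7], [5, 4], [2, 1]
```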
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in
attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in
attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
MeshSource
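The default resampler='cic' refers to cloud-in-cell interpolation. The 1D sketch below shows the idea: each particle's weight is split linearly between its two nearest mesh points, so the total weight is conserved. It assumes mesh points at integer multiples of BoxSize / Nmesh and periodic boundaries; pmesh's actual 3D resampler may use different conventions, and interlacing and compensation are not shown.

```python
import numpy as np

def cic_paint_1d(pos, weight, nmesh, boxsize):
    """1D cloud-in-cell: share each weight between the two nearest mesh points (periodic)."""
    field = np.zeros(nmesh)
    x = pos / boxsize * nmesh          # position in units of the cell size
    left = np.floor(x).astype(int)     # nearest mesh point at or below x
    frac = x - left                    # fractional distance past that point, in [0, 1)
    np.add.at(field, left % nmesh, weight * (1.0 - frac))
    np.add.at(field, (left + 1) % nmesh, weight * frac)
    return field

pos = np.array([0.25, 2.0, 3.5])       # the particle at 3.5 wraps onto mesh point 0
weight = np.ones_like(pos)
field = cic_paint_1d(pos, weight, nmesh=4, boxsize=4.0)
```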
- to_subvolumes(domain=None, position='Position', columns=None)
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object and using the position column to determine where each item lies.
This will read in the full position array and all of the requested columns.
- Parameters
domain (pmesh.domain.GridND object, or None) – the domain used to distribute the catalog. If None, try to divide the volume evenly in space. The easiest way to obtain a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – the column to use to compute the position.
columns (list of string_like) – the columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank only contains the objects belonging to it, as claimed by the domain object.
self.attrs are carried over as a shallow copy to the returned object.
- Return type
CatalogSource
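The idea of spatial decomposition can be illustrated with a hypothetical slab split: the box is cut into equal slabs along x, one per rank, and each object goes to the rank whose slab contains it. This is only a sketch; a pmesh.domain.GridND domain may divide the box along several dimensions and differently from this, and the sketch moves no data between processes.

```python
import numpy as np

def slab_decompose(position, boxsize, nranks):
    """Assign each object to a rank by slicing the box into equal slabs along x."""
    slab = np.floor(position[:, 0] / boxsize * nranks).astype(int)
    slab = np.clip(slab, 0, nranks - 1)      # guard the x == boxsize edge
    # Each "rank" receives the indices of the objects inside its slab.
    return [np.flatnonzero(slab == r) for r in range(nranks)]

pos = np.array([[10.0, 0.0, 0.0], [60.0, 5.0, 5.0], [95.0, 1.0, 1.0], [30.0, 2.0, 2.0]])
owned = slab_decompose(pos, boxsize=100.0, nranks=4)
# owned -> [array([0]), array([3]), array([1]), array([2])]
```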
- view(type=None)
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new, empty instance of type type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.source.catalog.BigFileCatalog(path, *args, **kwargs)
A CatalogSource that uses BigFile to read data from disk.
Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.
- Parameters
path (str) – the name of the directory holding the bigfile data
exclude (list of str, optional) – the data sets to exclude from loading within bigfile; by default, the header is excluded. If a list is given, it must also include the name of the header column unless the header should be loaded as part of the data set. The names are shell glob patterns.
comm (MPI Communicator, optional) – the MPI communicator instance; the default (None) uses the current communicator
attrs (dict, optional) – dictionary of meta-data to store in attrs
Examples
Please see the documentation for examples.
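The interaction between exclude and the header can be illustrated with the standard library's fnmatch module. `datasets_to_load` is a hypothetical helper, not part of nbodykit; it assumes the header data set is named "Header" and that a user-supplied list replaces the default, which is why the header must then be listed explicitly.

```python
from fnmatch import fnmatchcase

def datasets_to_load(datasets, exclude=None, header="Header"):
    """Return the data sets left after removing those matching any glob in `exclude`."""
    if exclude is None:
        exclude = [header]               # default: only the header is excluded
    return [d for d in datasets if not any(fnmatchcase(d, pat) for pat in exclude)]

names = ["Header", "Position", "Velocity", "ID", "VelocityDispersion"]
default = datasets_to_load(names)
no_vel = datasets_to_load(names, exclude=["Velocity*"])             # Header is kept
no_vel_no_header = datasets_to_load(names, exclude=["Velocity*", "Header"])
```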
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
The union of the columns in the file and any transformed columns.
size
The number of objects in the CatalogSource on the local rank.
Methods
Selection()
A boolean column that selects a subset slice of the CatalogSource.
Value()
When interpolating a CatalogSource onto a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight()
The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once.
copy()
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
get_hardcolumn(col)
Return a column from the underlying file source.
gslice(start, stop[, end, redistribute])
Execute a global slice of a CatalogSource.
make_column(array)
Utility function to convert an array-like object to a dask.array.Array.
persist([columns])
Return a CatalogSource, where the selected columns are computed and persisted in memory.
query_range(start, end)
Seek to a range in the file catalog.
read(columns)
Return the requested columns as dask arrays.
save(output[, columns, dataset, datasets, ...])
Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols])
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, ...])
Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes([domain, position, columns])
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view([type])
Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- property Index
The attribute giving the global index rank of each particle in the list. It is an integer from 0 to self.csize - 1.
Note that slicing changes this index value.
- Selection()
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True for all particles, and all CatalogSource objects will contain this column.
- Value()
When interpolating a CatalogSource onto a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- Weight()
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- __delitem__(col)
Delete a column; a “hard-coded” column cannot be deleted.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
- __finalize__(other)
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to only copy over attributes that are similar to meta-data, so we do not copy some of the core attributes of the CatalogSource object.
- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
CatalogSource
- __getitem__(sel)
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the relevant slice
slice objects specifying which particles to select
lists of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation.
If the base attribute is set, columns will be returned from base instead of from self.
- __len__()
The local size of the CatalogSource on a given rank.
- __setitem__(col, value)
Add columns to the CatalogSource, overriding any existing columns with the name col.
- property attrs
A dictionary storing relevant meta-data about the CatalogSource.
- property columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
- compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
Note
No copy of data is made.
Note
This is different from view() in that the attributes dictionary of the copy is no longer tied to self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
CatalogSource
- property csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks. If the base attribute is set, the base.csize attribute will be returned.
- get_hardcolumn(col)
Return a column from the underlying file source.
Columns are returned as dask arrays.
- gslice(start, stop, end=1, redistribute=True)
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
end (int, optional) – the step size of the global slice; default 1
redistribute (bool, optional) – if True, evenly redistribute the sliced data across all ranks; otherwise, just return the local data that is part of the global slice
- property hardcolumns
The union of the columns in the file and any transformed columns.
- static make_column(array)
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
dask.array.Array
- persist(columns=None)
Return a CatalogSource, where the selected columns are computed and persisted in memory.
- query_range(start, end)
Seek to a range in the file catalog.
- Parameters
start (int) – the start index of the range
end (int) – the end index of the range
- Returns
A new catalog that only accesses the given region of the file.
If the original catalog (self) contains any assigned columns not directly obtained from the file, the function raises ValueError, since the operation is not well defined in that case.
- read(columns)
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of
dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of the columns are stored in the datasets.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – the dataset to store the columns under
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, the header is not saved
compute (bool, default True) – if True, wait until the store operations finish; if False, return a dictionary mapping each column name to a future object for its store operation, and use dask.compute() on the result to wait for the store operations
- property size
The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be of floating-point or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of the columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided
reverse (bool, optional) – if True, perform a descending sort
usecols (list, optional) – the names of the columns to include in the returned CatalogSource
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in
attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in
attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
MeshSource
- to_subvolumes(domain=None, position='Position', columns=None)
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object and using the position column to determine where each item lies.
This will read in the full position array and all of the requested columns.
- Parameters
domain (pmesh.domain.GridND object, or None) – the domain used to distribute the catalog. If None, try to divide the volume evenly in space. The easiest way to obtain a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – the column to use to compute the position.
columns (list of string_like) – the columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank only contains the objects belonging to it, as claimed by the domain object.
self.attrs are carried over as a shallow copy to the returned object.
- Return type
CatalogSource
- view(type=None)
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new, empty instance of type type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.source.catalog.BinaryCatalog(path, *args, **kwargs)
A CatalogSource that uses BinaryFile to read data from disk.
Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.
- Parameters
path (str) – the name of the binary file to load
dtype (numpy.dtype or list of tuples) – the dtypes of the columns to load; this should be either a numpy.dtype or convertible to one via a numpy.dtype() call
offsets (dict, optional) – a dictionary specifying the byte offsets of each column in the binary file; if not supplied, the offsets are inferred from the dtype size of each column, assuming a fixed header size and contiguous storage
header_size (int, optional) – the size of the header in bytes
size (int, optional) – the number of objects in the binary file; if not provided, the value is inferred from the dtype and the total size of the file in bytes
comm (MPI Communicator, optional) – the MPI communicator instance; the default (None) uses the current communicator
attrs (dict, optional) – dictionary of meta-data to store in attrs
Examples
Please see the documentation for examples.
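How size and offsets can be inferred is sketched below with NumPy. The sketch assumes that "contiguous storage" means the header comes first and each column is then stored in full, one after another; under that assumption the size is (file size - header_size) divided by the dtype's item size, and each column starts where the previous one ends. `infer_layout` is a hypothetical helper, and nbodykit's BinaryFile reader may differ in its details.

```python
import os
import tempfile
import numpy as np

def infer_layout(dtype, header_size, file_nbytes):
    """Infer `size` and per-column byte offsets, assuming column-contiguous storage."""
    dtype = np.dtype(dtype)
    size = (file_nbytes - header_size) // dtype.itemsize
    offsets, current = {}, header_size
    for name in dtype.names:
        offsets[name] = current
        current += dtype[name].itemsize * size   # the whole column is stored back-to-back
    return size, offsets

dtype = [("Position", "f4", (3,)), ("Mass", "f8")]
header = b"\x00" * 16                            # hypothetical 16-byte header
position = np.arange(15, dtype="f4").reshape(5, 3)
mass = np.linspace(1.0, 5.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "catalog.bin")
    with open(path, "wb") as f:                  # header, then each column in turn
        f.write(header)
        f.write(position.tobytes())
        f.write(mass.tobytes())
    size, offsets = infer_layout(dtype, len(header), os.path.getsize(path))
    mass_back = np.fromfile(path, dtype="f8", count=size, offset=offsets["Mass"])
```

If a file interleaves fields record by record instead, the inferred offsets would be wrong, which is one reason to pass offsets explicitly.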
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
The union of the columns in the file and any transformed columns.
size
The number of objects in the CatalogSource on the local rank.
Methods
Selection()
A boolean column that selects a subset slice of the CatalogSource.
Value()
When interpolating a CatalogSource onto a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight()
The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once.
copy()
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
get_hardcolumn(col)
Return a column from the underlying file source.
gslice(start, stop[, end, redistribute])
Execute a global slice of a CatalogSource.
make_column(array)
Utility function to convert an array-like object to a dask.array.Array.
persist([columns])
Return a CatalogSource, where the selected columns are computed and persisted in memory.
query_range(start, end)
Seek to a range in the file catalog.
read(columns)
Return the requested columns as dask arrays.
save(output[, columns, dataset, datasets, ...])
Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols])
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, ...])
Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes([domain, position, columns])
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view([type])
Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- property Index
The attribute giving the global index rank of each particle in the list. It is an integer from 0 to self.csize - 1.
Note that slicing changes this index value.
- Selection()
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True for all particles, and all CatalogSource objects will contain this column.
- Value()
When interpolating a CatalogSource onto a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- Weight()
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- __delitem__(col)
Delete a column; a “hard-coded” column cannot be deleted.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
- __finalize__(other)
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to only copy over attributes that are similar to meta-data, so we do not copy some of the core attributes of the CatalogSource object.
- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
CatalogSource
- __getitem__(sel)
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the relevant slice
slice objects specifying which particles to select
lists of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation.
If the base attribute is set, columns will be returned from base instead of from self.
- __len__()
The local size of the CatalogSource on a given rank.
- __setitem__(col, value)
Add columns to the CatalogSource, overriding any existing columns with the name col.
- property attrs
A dictionary storing relevant meta-data about the CatalogSource.
- property columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
- compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()¶
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
Note
No copy of data is made.
Note
This is different from view in that the attributes dictionary of the copy is no longer related to self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
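The difference between copy() and view() described in the notes can be sketched with a hypothetical `ToySource` class. That a view keeps sharing the attrs dictionary with the original is inferred from the note above and is an assumption of this sketch; it is not nbodykit's implementation.

```python
class ToySource:
    """Hypothetical stand-in contrasting copy() and view()."""

    def __init__(self, columns, attrs=None):
        self.columns = columns                       # name -> data (shared by reference)
        self.attrs = {} if attrs is None else attrs  # meta-data dictionary

    def copy(self):
        # new column dict whose values are the same objects (no data copied),
        # plus an independent copy of the attrs dictionary
        return ToySource(dict(self.columns), dict(self.attrs))

    def view(self):
        # new empty object that receives self's attributes by reference,
        # so the attrs dictionary stays shared with self
        obj = ToySource.__new__(ToySource)
        obj.__dict__.update(self.__dict__)
        return obj


src = ToySource({"Mass": [1.0, 2.0]}, {"BoxSize": 100.0})
c, v = src.copy(), src.view()
print(c.columns["Mass"] is src.columns["Mass"])  # True: no data copied
c.attrs["BoxSize"] = 50.0
print(src.attrs["BoxSize"])                      # 100.0: copy's attrs are independent
v.attrs["BoxSize"] = 25.0
print(src.attrs["BoxSize"])                      # 25.0: view shares attrs with src
```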
- property csize¶
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks.
If the base attribute is set, the base.csize attribute will be returned.
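As a sketch of the relation between size and csize, the snippet below simulates several ranks inside one process; the local sizes are made-up numbers and `allreduce_sum` is a hypothetical helper. In a real MPI run each rank would obtain the same total from mpi4py's comm.allreduce; whether nbodykit computes csize in exactly this way is an assumption.

```python
def allreduce_sum(local_values):
    """Simulate an MPI allreduce with SUM: every rank receives the global total.

    With mpi4py, each rank would instead call comm.allreduce(local_size).
    """
    total = sum(local_values)
    return [total] * len(local_values)


local_sizes = [4, 3, 5]             # hypothetical `size` on ranks 0, 1, 2
print(allreduce_sum(local_sizes))   # [12, 12, 12]: csize is identical on every rank
```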
- get_hardcolumn(col)¶
Return a column from the underlying file source.
Columns are returned as dask arrays.
- gslice(start, stop, end=1, redistribute=True)¶
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
step (int, optional) – the step size of the global slice (default 1); this is the third argument, named end in the signature above
redistribute (bool, optional) – if True, evenly re-distribute the sliced data across all ranks; otherwise, just return any local data that is part of the global slice
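The bookkeeping behind a global slice can be sketched by simulating ranks as a list of local sizes. `local_gslice` shows which local rows each rank keeps when redistribute is False; `even_split` shows one plausible way to spread the selected rows evenly. Both helpers are hypothetical, and nbodykit's exact split (which, per the note above, builds an index on the root rank) may differ.

```python
def local_gslice(local_sizes, start, stop, step=1):
    """Return, per simulated rank, the local row indices that fall inside the
    global slice [start:stop:step] (the redistribute=False case)."""
    csize = sum(local_sizes)
    selected = set(range(csize)[start:stop:step])  # global indices in the slice
    result, offset = [], 0
    for n in local_sizes:
        # global rows offset .. offset+n-1 live on this rank
        result.append([g - offset for g in range(offset, offset + n) if g in selected])
        offset += n
    return result


def even_split(total, nranks):
    """Per-rank counts when `total` rows are spread as evenly as possible."""
    base, extra = divmod(total, nranks)
    return [base + (1 if r < extra else 0) for r in range(nranks)]


parts = local_gslice([4, 3, 5], start=2, stop=9)
print(parts)                                      # [[2, 3], [0, 1, 2], [0, 1]]
print(even_split(sum(len(p) for p in parts), 3))  # [3, 2, 2]
```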
- property hardcolumns¶
The union of the columns in the file and any transformed columns.
- static make_column(array)¶
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
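To make the effect of a fixed chunk size concrete, the hypothetical helper below computes the chunk lengths of a 1-D array split into regular chunks. This is the pattern dask.array.from_array produces for 1-D input with an integer chunks argument (10 elements with chunks=4 give chunks ((4, 4, 2),)); whether make_column calls from_array in exactly this way is an assumption.

```python
def chunk_sizes(n, chunk_size):
    """Chunk lengths for a 1-D array of length n split into fixed-size chunks;
    the final chunk holds any remainder."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    full, rem = divmod(n, chunk_size)
    return tuple([chunk_size] * full + ([rem] if rem else []))


print(chunk_sizes(10, 4))  # (4, 4, 2)
print(chunk_sizes(8, 4))   # (4, 4)
```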
- persist(columns=None)¶
Return a CatalogSource, where the selected columns are computed and persisted in memory.
- query_range(start, end)¶
Seek to a range in the file catalog.
- Parameters
- Returns
A new catalog that only accesses the given region of the file.
If the original catalog (self) contains any assigned columns not directly obtained from the file, then the function will raise ValueError, since the operation in that case is not well defined.
- read(columns)¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of columns are stored in the datasets.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – the dataset to store the columns under
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, the header is not saved
compute (boolean, default True) – if True, wait until the store operations finish; if False, return a dictionary mapping each column name to a future object for its store, and call dask.compute() on the result to wait for the store operations
- property size¶
The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)¶
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be floating or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided
reverse (bool, optional) – if True, perform descending sort operations
usecols (list, optional) – the names of the columns to include in the returned CatalogSource
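The multi-key behaviour of sort can be sketched serially on a list of row dictionaries. `sort_rows` is a hypothetical helper: it does not distribute data across MPI ranks, and treating the first key as the primary key (later keys only break ties) is this sketch's reading of "sorted consecutively in the order provided", not a confirmed detail of nbodykit's algorithm.

```python
from numbers import Real


def sort_rows(rows, keys, reverse=False, usecols=None):
    """Sort a list of row dictionaries by one or more numeric columns.

    Assumption: the first key is the primary key and later keys break ties.
    """
    for row in rows:
        for k in keys:
            # mirror the documented restriction to integer/floating sort columns
            if isinstance(row[k], bool) or not isinstance(row[k], Real):
                raise TypeError("sort column %r must be integer or floating" % k)
    ordered = sorted(rows, key=lambda r: tuple(r[k] for k in keys), reverse=reverse)
    if usecols is not None:
        # keep only the requested columns in the output
        ordered = [{c: r[c] for c in usecols} for r in ordered]
    return ordered


rows = [{"a": 2, "b": 1.0}, {"a": 1, "b": 5.0}, {"a": 2, "b": 0.0}]
print(sort_rows(rows, ["a", "b"], usecols=["b"]))  # [{'b': 5.0}, {'b': 0.0}, {'b': 1.0}]
```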
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
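The docs describe the mesh field as a weighted average of Value with weights Weight, and the default resampler is 'cic' (cloud-in-cell). The sketch below illustrates both ideas in one periodic dimension with plain Python: each particle splits weight*value and weight linearly between its two nearest grid points, and each cell's field is the ratio of the two sums. `cic_paint_1d` is hypothetical; nbodykit delegates painting to pmesh, whose normalization and 3-D implementation may differ.

```python
def cic_paint_1d(positions, weights, values, nmesh, boxsize):
    """Periodic 1-D cloud-in-cell deposit of weight*value and weight; returns the
    per-cell weighted average of value (cells that receive no weight give 0.0)."""
    wv = [0.0] * nmesh  # accumulates weight * value
    w = [0.0] * nmesh   # accumulates weight
    cell = boxsize / nmesh
    for x, wt, val in zip(positions, weights, values):
        s = (x / cell) % nmesh  # position in grid units, wrapped periodically
        i = int(s)              # index of the grid point to the left
        f = s - i               # fractional distance past grid point i
        for j, frac in ((i % nmesh, 1.0 - f), ((i + 1) % nmesh, f)):
            wv[j] += frac * wt * val
            w[j] += frac * wt
    return [a / b if b > 0 else 0.0 for a, b in zip(wv, w)]


field = cic_paint_1d([0.0, 0.0, 2.5], [1.0, 3.0, 1.0], [1.0, 5.0, 7.0],
                     nmesh=4, boxsize=4.0)
print(field)  # [4.0, 0.0, 7.0, 7.0]
```

Cell 0 receives (1*1 + 3*5) / (1 + 3) = 4.0, while the particle at 2.5 is split evenly between cells 2 and 3.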
- to_subvolumes(domain=None, position='Position', columns=None)¶
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object and using the position column as the Position.
This will read in the full position array and all of the requested columns.
- Parameters
domain (pmesh.domain.GridND object, or None) – the domain used to distribute the catalog. If None, try to divide the catalog evenly in space. The easiest way to obtain a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – the column to use to compute the position.
columns (list of string_like) – the columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank only contains the objects that belong to it, as claimed by the domain object.
self.attrs are carried over as a shallow copy to the returned object.
- Return type
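The idea of a spatial domain decomposition can be sketched by cutting the box into a regular grid of equal subvolumes and assigning each object to the rank that owns its subvolume. This is a simplified stand-in for the pmesh.domain.GridND object mentioned above: the helpers are hypothetical, the row-major rank numbering is an assumption, and real decompositions may also handle ghost regions and load balancing, which are ignored here.

```python
def rank_of_position(pos, boxsize, proc_grid):
    """Return the rank that owns `pos` when the box is cut into a regular grid
    of equal subvolumes, numbered in row-major (C) order."""
    rank = 0
    for x, length, n in zip(pos, boxsize, proc_grid):
        i = int((x % length) / length * n)  # subvolume index along this axis
        i = min(i, n - 1)                   # guard against round-up at the upper edge
        rank = rank * n + i
    return rank


def decompose(positions, boxsize, proc_grid):
    """Group object indices by owning rank (one list per rank)."""
    nranks = 1
    for n in proc_grid:
        nranks *= n
    buckets = [[] for _ in range(nranks)]
    for idx, pos in enumerate(positions):
        buckets[rank_of_position(pos, boxsize, proc_grid)].append(idx)
    return buckets


positions = [(1.0, 5.0, 5.0), (7.0, 5.0, 5.0), (9.9, 0.1, 0.1)]
print(decompose(positions, boxsize=(10.0, 10.0, 10.0), proc_grid=(2, 1, 1)))  # [[0], [1, 2]]
```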
- view(type=None)¶
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new empty class of type type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.source.catalog.CSVCatalog(path, *args, **kwargs)¶
A CatalogSource that uses CSVFile to read data from disk.
Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.
- Parameters
path (str) – the name of the file to load
names (list of str) – the names of the columns of the csv file; this should give the names of all the columns in the file; pass usecols to select a subset of columns
blocksize (int, optional) – the file will be partitioned into blocks of roughly this many bytes
dtype (dict, str, optional) – if specified as a string, assume all columns have this dtype; otherwise, each column can have a dtype entry in the dict; if not specified, the data types will be inferred from the file
usecols (list, optional) – a pandas.read_csv keyword; a subset of names to store, ignoring all other columns
delim_whitespace (bool, optional) – a pandas.read_csv keyword; if the CSV file is space-separated, set this to True
**config – additional keyword arguments that will be passed to pandas.read_csv(); see the documentation of that function for a full list of possible options
comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
attrs (dict, optional) – dictionary of meta-data to store in attrs
Examples
Please see the documentation for examples.
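CSVCatalog hands parsing to pandas.read_csv; the stdlib sketch below only illustrates how names, usecols and whitespace delimiting relate: names must cover every column in the file, while usecols picks the subset that is stored. `read_whitespace_table` is hypothetical and reads a single in-memory string, with no blocksize partitioning and no per-column dtype dictionary.

```python
def read_whitespace_table(text, names, usecols=None, dtype=float):
    """Parse whitespace-separated text into a dict of columns.

    names must list every column in the file, in order; usecols keeps a subset.
    """
    keep = list(names) if usecols is None else list(usecols)
    unknown = [c for c in keep if c not in names]
    if unknown:
        raise ValueError("usecols must be a subset of names: %s" % unknown)
    columns = {c: [] for c in keep}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # split on any run of whitespace
        if not fields:
            continue           # skip blank lines
        if len(fields) != len(names):
            raise ValueError("line %d: expected %d fields, got %d"
                             % (lineno, len(names), len(fields)))
        row = dict(zip(names, fields))
        for c in keep:
            columns[c].append(dtype(row[c]))
    return columns


data = "1.0  2.0 3.0\n4.0 5.0  6.0\n"
print(read_whitespace_table(data, names=["x", "y", "z"], usecols=["x", "z"]))
# {'x': [1.0, 4.0], 'z': [3.0, 6.0]}
```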
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
The union of the columns in the file and any transformed columns.
size
The number of objects in the CatalogSource on the local rank.
Methods
Selection() – A boolean column that selects a subset slice of the CatalogSource.
Value() – When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() – The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) – Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() – Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
get_hardcolumn(col) – Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) – Execute a global slice of a CatalogSource.
make_column(array) – Utility function to convert an array-like object to a dask.array.Array.
persist([columns]) – Return a CatalogSource, where the selected columns are computed and persisted in memory.
query_range(start, end) – Seek to a range in the file catalog.
read(columns) – Return the requested columns as dask arrays.
save(output[, columns, dataset, datasets, ...]) – Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) – Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, ...]) – Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes([domain, position, columns]) – Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view([type]) – Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- property Index¶
The attribute giving the global index rank of each particle in the list. It is an integer running from 0 up to, but not including, self.csize.
Note that slicing changes this index value.
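A sketch of how a global index like Index can be assigned: each rank starts numbering at the total number of rows held by the ranks before it (an exclusive prefix sum), which is why slicing renumbers the values. Ranks are simulated as a list of local sizes; `global_index` is hypothetical, and the assumption that indices follow rank order is this sketch's, not a documented detail.

```python
def global_index(local_sizes):
    """Return, per simulated rank, the global Index values of its local rows.

    Each rank's first index is the exclusive prefix sum of the sizes on the
    ranks before it; with mpi4py this offset is comm.scan(size) - size.
    """
    out, offset = [], 0
    for n in local_sizes:
        out.append(list(range(offset, offset + n)))
        offset += n
    return out


print(global_index([2, 0, 3]))  # [[0, 1], [], [2, 3, 4]]
# after a slice keeps 1 row on rank 0 and 2 rows on rank 2, Index is renumbered
print(global_index([1, 0, 2]))  # [[0], [], [1, 2]]
```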
- Selection()¶
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True for all particles, and all CatalogSource objects will contain this column.
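The role of the Selection column can be sketched with plain lists: a missing selection behaves like the default all-True column, and a boolean mask keeps only the flagged rows. `apply_selection` is a hypothetical helper. In nbodykit itself, assigning a boolean column named Selection (through __setitem__) would override the default, and the selection argument of to_mesh names the column that is used.

```python
def apply_selection(columns, selection=None):
    """Keep only rows whose selection flag is True.

    A missing selection behaves like the default Selection column: all True.
    """
    n = len(next(iter(columns.values()), []))
    if selection is None:
        selection = [True] * n
    if len(selection) != n:
        raise ValueError("selection length must match the number of rows")
    return {k: [x for x, keep in zip(v, selection) if keep] for k, v in columns.items()}


cols = {"Mass": [1.0, 5.0, 10.0], "ID": [0, 1, 2]}
print(apply_selection(cols))                                   # {'Mass': [1.0, 5.0, 10.0], 'ID': [0, 1, 2]}
print(apply_selection(cols, [m > 2.0 for m in cols["Mass"]]))  # {'Mass': [5.0, 10.0], 'ID': [1, 2]}
```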
- Value()¶
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- Weight()¶
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- __delitem__(col)¶
Delete a column; cannot delete a “hard-coded” column.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
- __finalize__(other)¶
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to copy over only attributes that are similar to meta-data, so we do not copy some of the core attributes of the CatalogSource object.
- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
- __getitem__(sel)¶
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the relevant slice
slice objects specifying which particles to select
lists of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation.
If the base attribute is set, columns will be returned from base instead of from self.
- __len__()¶
The local size of the CatalogSource on a given rank.
- __setitem__(col, value)¶
Add columns to the CatalogSource, overriding any existing columns with the name col.
- property attrs¶
A dictionary storing relevant meta-data about the CatalogSource.
- property columns¶
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
- compute(*args, **kwargs)¶
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()¶
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
Note
No copy of data is made.
Note
This is different from view in that the attributes dictionary of the copy is no longer related to self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
- property csize¶
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks.
If the base attribute is set, the base.csize attribute will be returned.
- get_hardcolumn(col)¶
Return a column from the underlying file source.
Columns are returned as dask arrays.
- gslice(start, stop, end=1, redistribute=True)¶
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
step (int, optional) – the step size of the global slice (default 1); this is the third argument, named end in the signature above
redistribute (bool, optional) – if True, evenly re-distribute the sliced data across all ranks; otherwise, just return any local data that is part of the global slice
- property hardcolumns¶
The union of the columns in the file and any transformed columns.
- static make_column(array)¶
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
- persist(columns=None)¶
Return a CatalogSource, where the selected columns are computed and persisted in memory.
- query_range(start, end)¶
Seek to a range in the file catalog.
- Parameters
- Returns
A new catalog that only accesses the given region of the file.
If the original catalog (self) contains any assigned columns not directly obtained from the file, then the function will raise ValueError, since the operation in that case is not well defined.
- read(columns)¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of columns are stored in the datasets.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – the dataset to store the columns under
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, the header is not saved
compute (boolean, default True) – if True, wait until the store operations finish; if False, return a dictionary mapping each column name to a future object for its store, and call dask.compute() on the result to wait for the store operations
- property size¶
The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)¶
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be floating or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided
reverse (bool, optional) – if True, perform descending sort operations
usecols (list, optional) – the names of the columns to include in the returned CatalogSource
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
- to_subvolumes(domain=None, position='Position', columns=None)¶
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object and using the position column as the Position.
This will read in the full position array and all of the requested columns.
- Parameters
domain (pmesh.domain.GridND object, or None) – the domain used to distribute the catalog. If None, try to divide the catalog evenly in space. The easiest way to obtain a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – the column to use to compute the position.
columns (list of string_like) – the columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank only contains the objects that belong to it, as claimed by the domain object.
self.attrs are carried over as a shallow copy to the returned object.
- Return type
- view(type=None)¶
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new empty class of type type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.source.catalog.FITSCatalog(path, *args, **kwargs)¶
A CatalogSource that uses FITSFile to read data from disk.
Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.
- Parameters
path (str) – the file path to load
ext (number or string, optional) – the extension: either the numerical extension, counting from zero, or a string extension name. If not given, data is read from the first HDU that has data.
comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
attrs (dict, optional) – dictionary of meta-data to store in attrs
Examples
Please see the documentation for examples.
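The ext rule can be sketched on a simplified list of HDUs: an integer selects by position, a string selects by name, and when ext is omitted the first HDU that actually holds data is used. Many FITS catalogs keep an empty primary HDU, so the default then lands on extension 1. `choose_hdu` and the (name, has_data) layout are hypothetical stand-ins, not the FITSFile API.

```python
def choose_hdu(hdus, ext=None):
    """Pick the HDU to read from a list of (name, has_data) pairs in file order.

    ext may be an integer position (counting from zero), a string name, or None,
    in which case the first HDU that has data is used.
    """
    if ext is None:
        for i, (_, has_data) in enumerate(hdus):
            if has_data:
                return i
        raise ValueError("no HDU contains data")
    if isinstance(ext, int):
        if not 0 <= ext < len(hdus):
            raise IndexError("extension %d out of range" % ext)
        return ext
    for i, (name, _) in enumerate(hdus):
        if name == ext:
            return i
    raise KeyError(ext)


hdus = [("PRIMARY", False), ("CATALOG", True)]  # hypothetical file layout
print(choose_hdu(hdus))             # 1: the primary HDU holds no data
print(choose_hdu(hdus, "CATALOG"))  # 1
```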
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
The union of the columns in the file and any transformed columns.
size
The number of objects in the CatalogSource on the local rank.
Methods
Selection() – A boolean column that selects a subset slice of the CatalogSource.
Value() – When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() – The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) – Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() – Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
get_hardcolumn(col) – Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) – Execute a global slice of a CatalogSource.
make_column(array) – Utility function to convert an array-like object to a dask.array.Array.
persist([columns]) – Return a CatalogSource, where the selected columns are computed and persisted in memory.
query_range(start, end) – Seek to a range in the file catalog.
read(columns) – Return the requested columns as dask arrays.
save(output[, columns, dataset, datasets, ...]) – Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) – Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, ...]) – Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes([domain, position, columns]) – Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view([type]) – Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- property Index¶
The attribute giving the global index rank of each particle in the list. It is an integer running from 0 up to, but not including, self.csize.
Note that slicing changes this index value.
- Selection()¶
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True for all particles, and all CatalogSource objects will contain this column.
- Value()¶
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- Weight()¶
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- __delitem__(col)¶
Delete a column; cannot delete a “hard-coded” column.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
- __finalize__(other)¶
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to copy over only attributes that are similar to meta-data, so we do not copy some of the core attributes of the CatalogSource object.
- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
- __getitem__(sel)¶
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the relevant slice
slice objects specifying which particles to select
lists of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation.
If the base attribute is set, columns will be returned from base instead of from self.
- __len__()¶
The local size of the CatalogSource on a given rank.
- __setitem__(col, value)¶
Add columns to the CatalogSource, overriding any existing columns with the name col.
- property attrs¶
A dictionary storing relevant meta-data about the CatalogSource.
- property columns¶
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
- compute(*args, **kwargs)¶
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()¶
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
Note
No copy of data is made.
Note
This is different from view in that the attributes dictionary of the copy is no longer related to self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
- property csize¶
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks.
If the base attribute is set, the base.csize attribute will be returned.
- get_hardcolumn(col)¶
Return a column from the underlying file source.
Columns are returned as dask arrays.
- gslice(start, stop, end=1, redistribute=True)¶
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
step (int, optional) – the step size of the global slice (default 1); this is the third argument, named end in the signature above
redistribute (bool, optional) – if True, evenly re-distribute the sliced data across all ranks; otherwise, just return any local data that is part of the global slice
- property hardcolumns¶
The union of the columns in the file and any transformed columns.
- static make_column(array)¶
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
- persist(columns=None)¶
Return a CatalogSource, where the selected columns are computed and persisted in memory.
- query_range(start, end)¶
Seek to a range in the file catalog.
- Parameters
- Returns
A new catalog that only accesses the given region of the file.
If the original catalog (self) contains any assigned columns not directly obtained from the file, then the function will raise ValueError, since the operation in that case is not well defined.
- read(columns)¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of columns are stored in the datasets.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – the dataset to store the columns under
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, the header is not saved
compute (boolean, default True) – if True, wait until the store operations finish; if False, return a dictionary mapping each column name to a future object for its store, and call dask.compute() on the result to wait for the store operations
- property size¶
The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)¶
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be floating or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided
reverse (bool, optional) – if True, perform descending sort operations
usecols (list, optional) – the names of the columns to include in the returned CatalogSource
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
- to_subvolumes(domain=None, position='Position', columns=None)¶
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object and using the position column as the Position.
This will read in the full position array and all of the requested columns.
- Parameters
domain (pmesh.domain.GridND object, or None) – the domain used to distribute the catalog. If None, try to divide the catalog evenly in space. The easiest way to obtain a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – the column to use to compute the position.
columns (list of string_like) – the columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank contains only the objects that belong to it according to the domain object.
self.attrs are carried over as a shallow copy to the returned object.
- Return type
- view(type=None)¶
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new, empty instance of the class given by type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.source.catalog.Gadget1Catalog(path, *args, **kwargs)¶
A CatalogSource that uses Gadget1File to read data from disk.
Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.
- Parameters
path (str) – the path to the binary file to load
columndefs (list) – a list of triplets (columnname, element_dtype, particle_types)
ptype (int) – type of particle of interest.
hdtype (list, dtype) – dtype of the header; must define Massarr and Npart
comm (MPI Communicator, optional) – the MPI communicator instance; the default (None) uses the current communicator
attrs (dict, optional) – dictionary of meta-data to store in attrs
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
The union of the columns in the file and any transformed columns.
size
The number of objects in the CatalogSource on the local rank.
Methods
Selection()
A boolean column that selects a subset slice of the CatalogSource.
Value()
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight()
The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once.
copy()
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
get_hardcolumn(col)
Return a column from the underlying file source.
gslice(start, stop[, end, redistribute])
Execute a global slice of a CatalogSource.
make_column(array)
Utility function to convert an array-like object to a dask.array.Array.
persist([columns])
Return a CatalogSource, where the selected columns are computed and persisted in memory.
query_range(start, end)
Seek to a range in the file catalog.
read(columns)
Return the requested columns as dask arrays.
save(output[, columns, dataset, datasets, ...])
Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols])
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, ...])
Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes([domain, position, columns])
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view([type])
Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- property Index¶
The attribute giving the global index rank of each particle in the list. It is an integer from 0 to self.csize - 1.
Note that slicing changes this index value.
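The global index can be derived from the local sizes alone: the first index on each rank is the total size of all lower-numbered ranks. In this serial sketch a plain list stands in for the per-rank sizes that an MPI allgather would supply; the helper name global_index is hypothetical.

```python
from itertools import accumulate


def global_index(local_sizes):
    """Return the global index values held by each rank.

    local_sizes[r] plays the role of `size` on rank r; their sum is `csize`.
    """
    # Exclusive prefix sum: the offset of rank r is the sum of sizes on ranks 0..r-1.
    offsets = [0] + list(accumulate(local_sizes))[:-1]
    return [list(range(off, off + n)) for off, n in zip(offsets, local_sizes)]


sizes = [3, 0, 2]            # local sizes on three ranks
print(sum(sizes))            # csize -> 5
print(global_index(sizes))   # [[0, 1, 2], [], [3, 4]]
```

The largest index is csize - 1, and any operation that changes a rank's local size shifts the offsets of every later rank, which is why slicing changes Index.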
- Selection()¶
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True for all particles, and all CatalogSource objects will contain this column.
- Value()¶
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
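As a toy illustration of a weighted average of Value with weights Weight, the sketch below deposits particles onto a 1D periodic mesh with cloud-in-cell (CIC) fractions and divides the kernel-weighted sum of Weight times Value by the kernel-weighted sum of Weight in each cell. The function is hypothetical and is not the pmesh resampler; nbodykit's own normalization of the painted field may differ.

```python
import math


def cic_weighted_average(positions, values, weights, nmesh, boxsize):
    """Toy 1D CIC painting of a weighted average of `values`.

    Each particle is split between its two nearest mesh nodes (node i sits
    at x = i * boxsize / nmesh) with linear CIC fractions; the grid wraps
    periodically. Cells that receive no weight are returned as None.
    """
    cell = boxsize / nmesh
    num = [0.0] * nmesh   # sum of kernel * Weight * Value per cell
    den = [0.0] * nmesh   # sum of kernel * Weight per cell
    for x, v, w in zip(positions, values, weights):
        s = (x % boxsize) / cell
        i = math.floor(s)
        frac = s - i
        for j, kernel in ((i % nmesh, 1.0 - frac), ((i + 1) % nmesh, frac)):
            num[j] += kernel * w * v
            den[j] += kernel * w
    return [n / d if d > 0 else None for n, d in zip(num, den)]


# Two particles at x = 1.5 on a 4-cell mesh of length 4.0 (cell size 1.0).
field = cic_weighted_average(positions=[1.5, 1.5], values=[2.0, 4.0],
                             weights=[1.0, 3.0], nmesh=4, boxsize=4.0)
print(field)  # [None, 3.5, 3.5, None]
```

With equal weights the two touched cells would hold 3.0; the heavier second particle pulls them to 3.5.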
- Weight()¶
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- __delitem__(col)¶
Delete a column; cannot delete a “hard-coded” column.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
- __finalize__(other)¶
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to copy over only attributes that are similar to meta-data, so some of the core attributes of the CatalogSource object are not copied.
- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
- __getitem__(sel)¶
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the relevant slice
slice object specifying which particles to select
list of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation.
If the base attribute is set, columns will be returned from base instead of from self.
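To make the four indexing modes concrete, here is a hypothetical, single-process ToyCatalog that mimics the dispatch described above using plain lists. It is not nbodykit's class: real column access returns dask arrays, and slicing a real CatalogSource is collective across ranks.

```python
class ToyCatalog:
    """Hypothetical, single-process stand-in for CatalogSource indexing."""

    def __init__(self, columns):
        self.columns = dict(columns)   # column name -> list of values
        if len({len(v) for v in self.columns.values()}) > 1:
            raise ValueError("all columns must have the same length")

    def __len__(self):
        return len(next(iter(self.columns.values()), []))

    def __getitem__(self, sel):
        if isinstance(sel, str):                    # column name -> column data
            return self.columns[sel]
        if isinstance(sel, slice):                  # slice -> subset of rows
            return ToyCatalog({k: v[sel] for k, v in self.columns.items()})
        if isinstance(sel, list) and all(isinstance(s, str) for s in sel):
            return ToyCatalog({k: self.columns[k] for k in sel})  # column subset
        if isinstance(sel, list) and all(isinstance(s, bool) for s in sel):
            if len(sel) != len(self):
                raise ValueError("boolean index must match the catalog size")
            return ToyCatalog({k: [x for x, keep in zip(v, sel) if keep]
                               for k, v in self.columns.items()})
        raise KeyError("unsupported index: %r" % (sel,))


cat = ToyCatalog({"Mass": [1.0, 2.0, 3.0], "ID": [10, 11, 12]})
print(cat["Mass"])                      # [1.0, 2.0, 3.0]
print(cat[[True, False, True]]["ID"])   # [10, 12]
print(cat[1:]["Mass"])                  # [2.0, 3.0]
print(list(cat[["ID"]].columns))        # ['ID']
```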
- __len__()¶
The local size of the CatalogSource on a given rank.
- __setitem__(col, value)¶
Add columns to the CatalogSource, overriding any existing columns with the name col.
- property attrs¶
A dictionary storing relevant meta-data about the CatalogSource.
- property columns¶
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
- compute(*args, **kwargs)¶
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()¶
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
Note
No copy of data is made.
Note
This is different from view in that the attributes dictionary of the copy is no longer related to self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
- property csize¶
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks.
If the base attribute is set, the base.csize attribute will be returned.
- get_hardcolumn(col)¶
Return a column from the underlying file source.
Columns are returned as dask arrays.
- gslice(start, stop, end=1, redistribute=True)¶
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
end (int, optional) – the step size of the global slice (default 1); despite its name, this argument acts as the slice step
redistribute (bool, optional) – if True, evenly re-distribute the sliced data across all ranks; otherwise, just return any local data that is part of the global slice
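The root-rank index mentioned in the note can be pictured with a serial toy model, in which each "rank" is a Python list. The hypothetical gslice_toy builds the global selection range(csize)[start:stop:step] in one place, lets each rank keep the items whose global index was selected, and then optionally spreads the kept items evenly. This is an illustration only, not nbodykit's code.

```python
def gslice_toy(ranks, start, stop, step=1, redistribute=True):
    """Toy model of a global slice over data split across "ranks" (lists)."""
    csize = sum(len(local) for local in ranks)
    # "Root rank": the global indices selected by the slice.
    selected = set(range(csize)[start:stop:step])
    # Each rank keeps the items whose global index was selected.
    out, offset = [], 0
    for local in ranks:
        out.append([x for i, x in enumerate(local, start=offset) if i in selected])
        offset += len(local)
    if not redistribute:
        return out
    # Even re-distribution; the remainder goes to the lowest ranks (an assumption).
    flat = [x for local in out for x in local]
    base, extra = divmod(len(flat), len(ranks))
    result, pos = [], 0
    for r in range(len(ranks)):
        n = base + (1 if r < extra else 0)
        result.append(flat[pos:pos + n])
        pos += n
    return result


ranks = [["a", "b", "c", "d"], ["e", "f"], ["g", "h", "i", "j"]]
print(gslice_toy(ranks, 0, 4, redistribute=False))  # [['a', 'b', 'c', 'd'], [], []]
print(gslice_toy(ranks, 0, 4))                      # [['a', 'b'], ['c'], ['d']]
```

Materializing every selected index in one place is what makes this approach scale poorly, as the note above warns.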
- property hardcolumns¶
The union of the columns in the file and any transformed columns.
- static make_column(array)¶
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
- persist(columns=None)¶
Return a CatalogSource, where the selected columns are computed and persisted in memory.
- query_range(start, end)¶
Seek to a range in the file catalog.
- Parameters
- Returns
A new catalog that only accesses the given region of the file.
If the original catalog (self) contains any assigned columns not directly obtained from the file, then the function will raise ValueError, since the operation in that case is not well defined.
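When a catalog is backed by several files read at once, a global row range has to be mapped onto pieces of individual files. The helper below is a hypothetical sketch of that bookkeeping, assuming the files are stacked in order and the range is half-open, [start, end); it is not nbodykit's implementation.

```python
def split_range(file_sizes, start, end):
    """Map a global half-open row range [start, end) onto per-file pieces.

    Returns (file_number, local_start, local_end) triplets for the files that
    overlap the range, assuming the files are concatenated in order.
    """
    pieces, offset = [], 0
    for number, size in enumerate(file_sizes):
        lo = max(start, offset)
        hi = min(end, offset + size)
        if lo < hi:                       # this file overlaps the range
            pieces.append((number, lo - offset, hi - offset))
        offset += size
    return pieces


print(split_range([100, 50, 200], 80, 170))
# [(0, 80, 100), (1, 0, 50), (2, 0, 20)]
```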
- read(columns)¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of the columns are stored in the datasets.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – dataset to store the columns under.
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, the header is not saved.
compute (bool, default True) – if True, wait until the store operations finish; if False, return a dictionary mapping each column name to a future object for its store operation. Use dask.compute() on the result to wait for the store operations to finish.
- property size¶
The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)¶
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be of floating-point or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of the columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided.
reverse (bool, optional) – if True, perform a descending sort
usecols (list, optional) – the names of the columns to include in the returned CatalogSource
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
- to_subvolumes(domain=None, position='Position', columns=None)¶
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object and using the position column as the Position.
This will read in the full position array and all of the requested columns.
- Parameters
domain (pmesh.domain.GridND object, or None) – the domain used to distribute the catalog. If None, try to divide the catalog evenly in space. The easiest way to obtain a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – the column to use to compute the position.
columns (list of string_like) – the columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank contains only the objects that belong to it according to the domain object.
self.attrs are carried over as a shallow copy to the returned object.
- Return type
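A toy picture of dividing a catalog evenly in space: the sketch assumes the simplest such scheme, equal-width slabs along x in a periodic box, whereas a pmesh.domain.GridND generally partitions along several dimensions. The helper name slab_decompose and the slab layout are hypothetical.

```python
def slab_decompose(positions, boxsize, nranks):
    """Assign each particle to a rank by equal-width slabs along the x axis.

    Returns one list of particle indices per rank. Positions are wrapped
    into [0, boxsize) first, as in a periodic box.
    """
    owners = [[] for _ in range(nranks)]
    for i, (x, _y, _z) in enumerate(positions):
        slab = int((x % boxsize) / boxsize * nranks)
        owners[min(slab, nranks - 1)].append(i)   # guard against rounding at the edge
    return owners


positions = [(0.5, 1.0, 2.0), (9.9, 0.0, 0.0), (5.0, 3.0, 3.0), (-0.1, 2.0, 2.0)]
print(slab_decompose(positions, boxsize=10.0, nranks=2))  # [[0], [1, 2, 3]]
```

In a real decomposition, the rows listed for each rank would then be exchanged over MPI so that every rank holds only the objects it owns.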
- view(type=None)¶
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new, empty instance of the class given by type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.source.catalog.HDFCatalog(path, *args, **kwargs)¶
A CatalogSource that uses HDFFile to read data from disk.
Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.
- Parameters
path (str) – the file path to load
root (str, optional) – the start path in the HDF file, loading all data below this path
exclude (list of str, optional) – list of path names to exclude; these can be absolute paths, or paths relative to root
comm (MPI Communicator, optional) – the MPI communicator instance; the default (None) uses the current communicator
attrs (dict, optional) – dictionary of meta-data to store in attrs
Examples
Please see the documentation for examples.
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
The union of the columns in the file and any transformed columns.
size
The number of objects in the CatalogSource on the local rank.
Methods
Selection()
A boolean column that selects a subset slice of the CatalogSource.
Value()
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight()
The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once.
copy()
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
get_hardcolumn(col)
Return a column from the underlying file source.
gslice(start, stop[, end, redistribute])
Execute a global slice of a CatalogSource.
make_column(array)
Utility function to convert an array-like object to a dask.array.Array.
persist([columns])
Return a CatalogSource, where the selected columns are computed and persisted in memory.
query_range(start, end)
Seek to a range in the file catalog.
read(columns)
Return the requested columns as dask arrays.
save(output[, columns, dataset, datasets, ...])
Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols])
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, ...])
Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes([domain, position, columns])
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view([type])
Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- property Index¶
The attribute giving the global index rank of each particle in the list. It is an integer from 0 to self.csize - 1.
Note that slicing changes this index value.
- Selection()¶
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True for all particles, and all CatalogSource objects will contain this column.
- Value()¶
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- Weight()¶
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- __delitem__(col)¶
Delete a column; cannot delete a “hard-coded” column.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
- __finalize__(other)¶
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to copy over only attributes that are similar to meta-data, so some of the core attributes of the CatalogSource object are not copied.
- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
- __getitem__(sel)¶
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the relevant slice
slice object specifying which particles to select
list of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation.
If the base attribute is set, columns will be returned from base instead of from self.
- __len__()¶
The local size of the CatalogSource on a given rank.
- __setitem__(col, value)¶
Add columns to the CatalogSource, overriding any existing columns with the name col.
- property attrs¶
A dictionary storing relevant meta-data about the CatalogSource.
- property columns¶
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
- compute(*args, **kwargs)¶
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()¶
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
Note
No copy of data is made.
Note
This is different from view in that the attributes dictionary of the copy is no longer related to self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
- property csize¶
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks.
If the base attribute is set, the base.csize attribute will be returned.
- get_hardcolumn(col)¶
Return a column from the underlying file source.
Columns are returned as dask arrays.
- gslice(start, stop, end=1, redistribute=True)¶
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
end (int, optional) – the step size of the global slice (default 1); despite its name, this argument acts as the slice step
redistribute (bool, optional) – if True, evenly re-distribute the sliced data across all ranks; otherwise, just return any local data that is part of the global slice
- property hardcolumns¶
The union of the columns in the file and any transformed columns.
- static make_column(array)¶
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
- persist(columns=None)¶
Return a CatalogSource, where the selected columns are computed and persisted in memory.
- query_range(start, end)¶
Seek to a range in the file catalog.
- Parameters
- Returns
A new catalog that only accesses the given region of the file.
If the original catalog (self) contains any assigned columns not directly obtained from the file, then the function will raise ValueError, since the operation in that case is not well defined.
- read(columns)¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of the columns are stored in the datasets.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – dataset to store the columns under.
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, the header is not saved.
compute (bool, default True) – if True, wait until the store operations finish; if False, return a dictionary mapping each column name to a future object for its store operation. Use dask.compute() on the result to wait for the store operations to finish.
- property size¶
The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)¶
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be of floating-point or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of the columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided.
reverse (bool, optional) – if True, perform a descending sort
usecols (list, optional) – the names of the columns to include in the returned CatalogSource
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
- to_subvolumes(domain=None, position='Position', columns=None)¶
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object and using the position column as the Position.
This will read in the full position array and all of the requested columns.
- Parameters
domain (pmesh.domain.GridND object, or None) – the domain used to distribute the catalog. If None, try to divide the catalog evenly in space. The easiest way to obtain a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – the column to use to compute the position.
columns (list of string_like) – the columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank contains only the objects that belong to it according to the domain object.
self.attrs are carried over as a shallow copy to the returned object.
- Return type
- view(type=None)¶
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new, empty instance of the class given by type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.source.catalog.HaloCatalog(source, cosmo, redshift, mdef='vir', mass='Mass', position='Position', velocity='Velocity')[source]¶
A CatalogSource of objects that represent halos, which can be populated using analytic models from halotools.
- Parameters
source (CatalogSource) – the source holding the particles to be interpreted as halos
cosmo (Cosmology) – the cosmology instance
redshift (float) – the redshift of the halo catalog
mdef (str, optional) – string specifying mass definition, used for computing default halo radii and concentration; should be ‘vir’ or ‘XXXc’ or ‘XXXm’ where ‘XXX’ is an int specifying the overdensity
mass (str, optional) – the column name specifying the mass of each halo
position (str, optional) – the column name specifying the position of each halo
velocity (str, optional) – the column name specifying the velocity of each halo
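The mdef naming convention can be checked with a small parser. parse_mdef is a hypothetical helper, not part of nbodykit; it follows the usual halotools reading, in which a trailing c denotes an overdensity relative to the critical density and a trailing m an overdensity relative to the mean matter density.

```python
import re


def parse_mdef(mdef):
    """Split a mass-definition string into (overdensity, reference).

    'vir' -> (None, 'vir'); '200c' -> (200, 'critical'); '500m' -> (500, 'mean').
    """
    if mdef == "vir":
        return None, "vir"
    match = re.fullmatch(r"(\d+)([cm])", mdef)
    if match is None:
        raise ValueError("mdef should be 'vir', 'XXXc' or 'XXXm', got %r" % mdef)
    overdensity, ref = match.groups()
    return int(overdensity), {"c": "critical", "m": "mean"}[ref]


print(parse_mdef("200c"))  # (200, 'critical')
print(parse_mdef("vir"))   # (None, 'vir')
```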
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
A list of the hard-coded columns in the CatalogSource.
size
The number of objects in the CatalogSource on the local rank.
Methods
Concentration()
The halo concentration, computed using nbodykit.transform.HaloConcentration().
Mass()
The halo mass column, assumed to be in units of \(M_\odot/h\).
Position()
The halo position column, assumed to be in units of \(\mathrm{Mpc}/h\).
Radius()
The halo radius, computed using nbodykit.transform.HaloRadius().
Selection()
A boolean column that selects a subset slice of the CatalogSource.
Value()
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Velocity()
The halo velocity column, assumed to be in units of km/s.
VelocityOffset()
The redshift-space distance offset due to the velocity, in units of distance.
Weight()
The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once.
copy()
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
get_hardcolumn(col)
Construct and return a hard-coded column.
gslice(start, stop[, end, redistribute])
Execute a global slice of a CatalogSource.
make_column(array)
Utility function to convert an array-like object to a dask.array.Array.
persist([columns])
Return a CatalogSource, where the selected columns are computed and persisted in memory.
populate(model[, BoxSize, seed])
Populate the HaloCatalog using a halotools model.
read(columns)
Return the requested columns as dask arrays.
save(output[, columns, dataset, datasets, ...])
Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols])
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_halotools([BoxSize])
Return the HaloCatalog as a halotools.sim_manager.UserSuppliedHaloCatalog.
to_mesh([Nmesh, BoxSize, dtype, interlaced, ...])
Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes([domain, position, columns])
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view([type])
Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- Concentration()[source]¶
The halo concentration, computed using nbodykit.transform.HaloConcentration().
This uses the analytic formulas for concentration from Dutton and Maccio 2014.
Users can override this column to implement custom mass-concentration relations.
- property Index¶
The attribute giving the global index rank of each particle in the list. It is an integer from 0 to self.csize - 1.
Note that slicing changes this index value.
- Radius()[source]¶
The halo radius, computed using nbodykit.transform.HaloRadius().
Assumed units of \(\mathrm{Mpc}/h\).
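For a spherical-overdensity halo, the radius is the one that encloses a mean density of \(\Delta\) times a reference density, \(M = \frac{4\pi}{3} R^3 \Delta \rho_\mathrm{ref}\). The sketch below simply inverts that relation; the helper so_radius is hypothetical, and the overdensity \(\Delta\) and reference density are inputs here because the exact mass definition used by nbodykit.transform.HaloRadius() is not stated in this section.

```python
import numpy as np

RHO_CRIT0 = 2.775e11  # critical density today, in (Msun/h) / (Mpc/h)^3

def so_radius(mass, delta, rho_ref):
    """Hypothetical helper: radius in Mpc/h enclosing mean density delta * rho_ref.

    mass in Msun/h, rho_ref in (Msun/h) / (Mpc/h)^3.
    """
    mass = np.asarray(mass, dtype=float)
    return (3.0 * mass / (4.0 * np.pi * delta * rho_ref)) ** (1.0 / 3.0)

# e.g. a 1e14 Msun/h halo defined at 200 times the mean density, with Omega_m = 0.3
R = so_radius(1e14, delta=200.0, rho_ref=0.3 * RHO_CRIT0)
print(R)  # roughly 1.13 Mpc/h
```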
- Selection()¶
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True for all particles, and all CatalogSource objects will contain this column.
- Value()¶
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
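A per-cell reading of "weighted average of Value, with the weights given by Weight" can be illustrated in one dimension with nearest-grid-point assignment: each cell holds \(\sum w v / \sum w\) over the particles that fall in it. This is only a conceptual sketch with a hypothetical helper; nbodykit paints with pmesh resamplers (CIC by default) and its normalization conventions are not reproduced here.

```python
import numpy as np

def ngp_weighted_average(pos, value, weight, nmesh, boxsize):
    """Hypothetical 1D sketch: per-cell weighted average of `value`
    using nearest-grid-point assignment; empty cells are set to 0."""
    cell = np.floor(np.asarray(pos) / boxsize * nmesh).astype(int) % nmesh
    weight = np.asarray(weight, dtype=float)
    # numerator: sum of weight * value per cell; denominator: sum of weights
    num = np.bincount(cell, weights=weight * np.asarray(value), minlength=nmesh)
    den = np.bincount(cell, weights=weight, minlength=nmesh)
    out = np.zeros(nmesh)
    np.divide(num, den, out=out, where=den > 0)
    return out

pos = np.array([0.1, 0.2, 0.7])
value = np.array([1.0, 3.0, 5.0])
weight = np.array([1.0, 1.0, 2.0])
print(ngp_weighted_average(pos, value, weight, nmesh=2, boxsize=1.0))  # [2. 5.]
```

With the default unit weights, the same function reduces to a plain per-cell mean of Value.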
- VelocityOffset()[source]¶
The redshift-space distance offset due to the velocity in units of distance. The assumed units are \(\mathrm{Mpc}/h\).
This multiplies Velocity by \(1 / (100\, a\, E(z)) = 1 / (a H(z)/h)\).
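The conversion above can be sketched directly: a velocity in km/s divided by \(a H(z)/h = 100\, a\, E(z)\) gives a displacement in \(\mathrm{Mpc}/h\). The sketch assumes a flat ΛCDM model without radiation, \(E(z) = \sqrt{\Omega_m (1+z)^3 + 1 - \Omega_m}\); nbodykit takes \(E(z)\) from its cosmology object instead, and the helper rsd_offset is hypothetical.

```python
import numpy as np

def rsd_offset(velocity, z, omega_m=0.3):
    """Hypothetical helper: velocity in km/s -> RSD displacement in Mpc/h.

    Assumes flat LambdaCDM with no radiation when computing E(z).
    """
    a = 1.0 / (1.0 + z)
    E = np.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))
    # 100 * E(z) is H(z)/h in km/s per Mpc/h
    return np.asarray(velocity) / (a * 100.0 * E)

# at z = 0, 100 km/s corresponds to an offset of 1 Mpc/h
print(rsd_offset([100.0, 0.0, -50.0], z=0.0))  # approximately [1.0, 0.0, -0.5]
```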
- Weight()¶
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- __delitem__(col)¶
Delete a column; cannot delete a “hard-coded” column.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
- __finalize__(other)¶
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to only copy over attributes that are similar to meta-data, so we do not copy some of the core attributes of the CatalogSource object.
- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
CatalogSource
- __getitem__(sel)¶
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the relevant slice
slice object specifying which particles to select
list of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation.
If the base attribute is set, columns will be returned from base instead of from self.
- __len__()¶
The local size of the CatalogSource on a given rank.
- __setitem__(col, value)¶
Add columns to the CatalogSource, overriding any existing columns with the name col.
- property attrs¶
A dictionary storing relevant meta-data about the CatalogSource.
- property columns¶
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
- compute(*args, **kwargs)¶
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()¶
Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
Note
No copy of data is made.
Note
This is different from view in that the attributes dictionary of the copy is no longer related to self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
CatalogSource
- property csize¶
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks.
If the base attribute is set, the base.csize attribute will be returned.
- get_hardcolumn(col)¶
Construct and return a hard-coded column.
These are usually produced by calling member functions marked by the @column decorator.
Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.
Note
If the base attribute is set, get_hardcolumn() will be called using base instead of self.
- gslice(start, stop, end=1, redistribute=True)¶
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
step (int, optional) – the step size of the global slice; this argument appears as end in the signature above
redistribute (bool, optional) – if True, evenly re-distribute the sliced data across all ranks; otherwise just return the local data that is part of the global slice
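The behaviour described above (a global index built in one place, then either kept local or redistributed evenly) can be simulated without MPI by treating a list of numpy arrays as the ranks. This is a hypothetical sketch, not nbodykit's implementation; it names the step argument step for readability, and the even split uses numpy.array_split, which may differ from nbodykit's exact distribution.

```python
import numpy as np

def simulated_gslice(rank_data, start, stop, step=1, redistribute=True):
    """Hypothetical sketch of a global slice over data split across simulated ranks.

    rank_data: list of 1D numpy arrays, one per simulated rank.
    """
    nranks = len(rank_data)
    sizes = [len(d) for d in rank_data]
    csize = sum(sizes)
    # boolean mask over the global index (built once, as on the root rank)
    selected = np.zeros(csize, dtype=bool)
    selected[np.arange(csize)[start:stop:step]] = True
    if not redistribute:
        # each rank keeps only its local part of the global slice
        bounds = np.cumsum([0] + sizes)
        return [d[selected[lo:hi]] for d, lo, hi in zip(rank_data, bounds[:-1], bounds[1:])]
    # otherwise gather the slice and split it as evenly as possible
    gathered = np.concatenate(rank_data)[selected]
    return np.array_split(gathered, nranks)

ranks = [np.arange(0, 4), np.arange(4, 6), np.arange(6, 10)]
print([r.tolist() for r in simulated_gslice(ranks, 1, 9, 2)])  # [[1, 3], [5], [7]]
```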
- property hardcolumns¶
A list of the hard-coded columns in the CatalogSource.
These columns are usually member functions marked by the @column decorator. Subclasses may override this method and use get_hardcolumn() to bypass the decorator logic.
Note
If the base attribute is set, the value of base.hardcolumns will be returned.
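The decorator pattern described above can be illustrated with a small, self-contained toy class: a decorator marks methods as columns, hardcolumns collects the marked names, get_hardcolumn() calls the method, and user-assigned columns take precedence. Everything here (column, ToyCatalog, the _is_column flag) is hypothetical and is not nbodykit's actual implementation.

```python
import numpy as np

def column(func):
    """Hypothetical decorator: mark a method as a hard-coded column."""
    func._is_column = True
    return func

class ToyCatalog:
    """Illustrative stand-in for a catalog; not nbodykit's implementation."""

    def __init__(self, size):
        self.size = size
        self._overrides = {}

    @property
    def hardcolumns(self):
        # collect the names of all methods marked by @column
        cls = type(self)
        return sorted(name for name in dir(cls)
                      if getattr(getattr(cls, name), "_is_column", False))

    def get_hardcolumn(self, col):
        # construct the column by calling the decorated method
        return getattr(self, col)()

    def __setitem__(self, col, value):
        self._overrides[col] = np.asarray(value)

    def __getitem__(self, col):
        # user-supplied columns override hard-coded ones
        if col in self._overrides:
            return self._overrides[col]
        if col in self.hardcolumns:
            return self.get_hardcolumn(col)
        raise KeyError(col)

    @column
    def Weight(self):
        return np.ones(self.size)

    @column
    def Selection(self):
        return np.ones(self.size, dtype=bool)

cat = ToyCatalog(3)
print(cat.hardcolumns)          # ['Selection', 'Weight']
cat["Weight"] = [1.0, 2.0, 3.0]  # override the hard-coded default
print(cat["Weight"])            # [1. 2. 3.]
```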
- static make_column(array)¶
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
dask.array.Array
- persist(columns=None)¶
Return a CatalogSource, where the selected columns are computed and persist in memory.
- populate(model, BoxSize=None, seed=None, **params)[source]¶
Populate the HaloCatalog using a halotools model.
The model can be a built-in model from nbodykit.hod (which will be converted to a Halotools model) or directly a Halotools model instance.
This assumes that this is the first time this catalog has been populated with the input model. To re-populate using the same model (but different parameters), call the repopulate() function of the returned PopulatedHaloCatalog.
- Parameters
model (nbodykit.hod.HODModel or halotools model object) – the model instance to use to populate; model types from nbodykit.hod will automatically be converted
BoxSize (float, 3-vector, optional) – the box size of the catalog; this must be supplied if ‘BoxSize’ is not in attrs
seed (int, optional) – the random seed to use when populating the mock
**params – key/value pairs specifying the model parameters to use
- Returns
cat – the catalog object storing information about the populated objects
- Return type
PopulatedHaloCatalog
Examples
Initialize a demo halo catalog:
>>> from nbodykit.tutorials import DemoHaloCatalog
>>> cat = DemoHaloCatalog('bolshoi', 'rockstar', 0.5)
Populate with the built-in Zheng07 model:
>>> from nbodykit.hod import Zheng07Model
>>> galcat = cat.populate(Zheng07Model, seed=42)
And then re-populate galaxy catalog with new parameters:
>>> galcat.repopulate(alpha=0.9, logMmin=13.5, seed=42)
- read(columns)¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of
dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of columns are stored in the datasets.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – dataset to store the columns under.
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, the header is not saved.
compute (boolean, default True) – if True, wait until the store operations finish; if False, return a dictionary with each column name and a future object for its store operation. Use dask.compute() on the result to wait for the store operations.
- property size¶
The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)¶
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be floating or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided
reverse (bool, optional) – if True, perform descending sort operations
usecols (list, optional) – the names of the columns to include in the returned CatalogSource
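The net effect of a global sort followed by an even redistribution can be simulated for a single sort key by gathering the per-rank arrays, sorting, and splitting the result back into as many pieces as there are ranks. This hypothetical sketch moves no data between processes (a real implementation uses a parallel sort over MPI) and does not reproduce nbodykit's handling of multiple keys.

```python
import numpy as np

def simulated_global_sort(rank_keys, reverse=False):
    """Hypothetical sketch: globally sort keys spread over simulated ranks,
    then scatter the result evenly back across the same number of ranks."""
    keys = np.concatenate(rank_keys)
    order = np.argsort(keys, kind="stable")
    if reverse:
        # descending order, mirroring reverse=True
        order = order[::-1]
    return np.array_split(keys[order], len(rank_keys))

ranks = [np.array([5.0, 1.0, 4.0]), np.array([3.0]), np.array([2.0, 0.0])]
print([r.tolist() for r in simulated_global_sort(ranks)])
# [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
```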
- to_halotools(BoxSize=None)[source]¶
Return the HaloCatalog as a halotools.sim_manager.UserSuppliedHaloCatalog.
The Halotools catalog only holds the local data, although halos are labeled via the halo_id column using the global index.
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶
Convert the CatalogSource to a MeshSource, using the specified parameters.
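To give a feel for what the default resampler='cic' does, here is a one-dimensional cloud-in-cell painter on a periodic grid: each particle's weight is shared between the two nearest grid points in proportion to its distance from them, so the total weight is conserved. It is a hypothetical sketch, not pmesh's implementation; it assumes grid points at \(x = i \cdot \mathrm{BoxSize}/N_\mathrm{mesh}\), and the Fourier-space window correction applied by compensated=True is not shown.

```python
import numpy as np

def cic_paint_1d(pos, weight, nmesh, boxsize):
    """Hypothetical 1D cloud-in-cell painter on a periodic grid.

    Grid points are assumed to sit at x = i * boxsize / nmesh."""
    pos = np.asarray(pos, dtype=float)
    weight = np.asarray(weight, dtype=float)
    # position in units of the grid spacing
    x = pos / boxsize * nmesh
    left = np.floor(x).astype(int)
    frac = x - left
    field = np.zeros(nmesh)
    # split each weight linearly between the left and right grid points
    np.add.at(field, left % nmesh, weight * (1.0 - frac))
    np.add.at(field, (left + 1) % nmesh, weight * frac)
    return field

field = cic_paint_1d([0.3, 0.9], [1.0, 1.0], nmesh=4, boxsize=1.0)
print(field)  # approximately [0.6, 0.8, 0.2, 0.4]; field.sum() is 2.0
```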
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
- to_subvolumes(domain=None, position='Position', columns=None)¶
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object, using the position column as the Position.
This will read in the full position array and all of the requested columns.
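The idea of a domain decomposition (each rank ends up holding only the objects inside its own region of the box) can be sketched with the simplest possible domain: equal slabs along the x axis. This is a hypothetical illustration; a pmesh.domain.GridND object can divide the box along several dimensions, and the actual exchange of data between ranks is not shown.

```python
import numpy as np

def slab_decompose(position, boxsize, nranks):
    """Hypothetical sketch: assign each particle to a rank by slicing the box
    into `nranks` equal slabs along the x axis.

    Returns a list of index arrays, one per rank."""
    position = np.asarray(position, dtype=float)
    # slab number of each particle; the modulo keeps x == boxsize in range
    owner = np.floor(position[:, 0] / boxsize * nranks).astype(int) % nranks
    return [np.nonzero(owner == r)[0] for r in range(nranks)]

pos = np.array([[10.0, 0, 0], [90.0, 5, 5], [40.0, 1, 1], [60.0, 2, 2]])
print([idx.tolist() for idx in slab_decompose(pos, boxsize=100.0, nranks=2)])
# [[0, 2], [1, 3]]
```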
- Parameters
domain (pmesh.domain.GridND object, or None) – The domain to distribute the catalog. If None, try to evenly divide spatially. The easiest way to find a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – column to use to compute the position.
columns (list of string_like) – columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank only contains objects that belong to the rank as claimed by the domain object.
self.attrs are carried over as a shallow copy to the returned object.
- Return type
- view(type=None)¶
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new empty class of type type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.source.catalog.LogNormalCatalog(Plin, nbar, BoxSize, Nmesh, bias=2.0, seed=None, cosmo=None, redshift=None, unitary_amplitude=False, inverted_phase=False, comm=None)[source]¶
A CatalogSource containing biased particles that have been Poisson-sampled from a log-normal density field.
- Parameters
Plin (callable) – callable specifying the linear power spectrum at the desired redshift.
nbar (float) – the number density of the particles in the box, assumed constant across the box; this is used when Poisson sampling the density field
BoxSize (float, 3-vector of floats) – the size of the box to generate the grid on
Nmesh (int) – the mesh size to use when generating the density and displacement fields, which are Poisson-sampled to particles
bias (float, optional) – the desired bias of the particles; applied during the log-normal transformation of the density field
seed (int, optional) – the global random seed; if set to None, the seed will be set randomly
cosmo (nbodykit.cosmology.core.Cosmology, optional) – this must be supplied if Plin does not carry a cosmo attribute
redshift (float, optional) – this must be supplied if Plin does not carry a redshift attribute
comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
References
Coles and Jones, 1991
Agrawal et al. 2017
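The two ingredients named in the class description, a log-normal density field and Poisson sampling, can be sketched on an uncorrelated toy field: exponentiating a Gaussian field \(\delta_G\) as \(1 + \delta = \exp(\delta_G - \sigma_G^2/2)\) gives a strictly positive density with unit mean, and the particle count in each cell is then drawn from a Poisson distribution with mean \(\bar{n} V_\mathrm{cell} (1 + \delta)\). This hypothetical sketch omits the steps that depend on Plin (a correlated Gaussian field on a mesh), the bias, and the particle displacements, so it is not LogNormalCatalog's implementation.

```python
import numpy as np

def lognormal_poisson_sketch(sigma_g, nbar, cell_volume, ncells, seed=42):
    """Hypothetical sketch: log-normal density from an uncorrelated Gaussian
    field, followed by Poisson sampling of the particle count in each cell."""
    rng = np.random.default_rng(seed)
    delta_g = rng.normal(0.0, sigma_g, size=ncells)
    # log-normal transform; subtracting sigma^2 / 2 makes <1 + delta> = 1
    one_plus_delta = np.exp(delta_g - 0.5 * sigma_g**2)
    # expected counts per cell, then Poisson-sample the actual counts
    counts = rng.poisson(nbar * cell_volume * one_plus_delta)
    return one_plus_delta, counts

rho, counts = lognormal_poisson_sketch(0.5, nbar=1e-3, cell_volume=1000.0, ncells=100_000)
print(rho.min() > 0, rho.mean(), counts.mean())  # density stays positive; both means are close to 1
```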
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
A list of the hard-coded columns in the CatalogSource.
size
The number of objects in the CatalogSource on the local rank.
Methods
Position
() Position assumed to be in Mpc/h.
Selection
() A boolean column that selects a subset slice of the CatalogSource.
Value
() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Velocity
() Velocity in km/s.
VelocityOffset
() The corresponding RSD offset, in Mpc/h.
Weight
() The column giving the weight to use for each particle on the mesh.
compute
(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy
() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn
(col) Construct and return a hard-coded column.
gslice
(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column
(array) Utility function to convert an array-like object to a dask.array.Array.
persist
([columns]) Return a CatalogSource, where the selected columns are computed and persist in memory.
read
(columns) Return the requested columns as dask arrays.
save
(output[, columns, dataset, datasets, ...]) Save the CatalogSource to a bigfile.BigFile.
sort
(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh
([Nmesh, BoxSize, dtype, interlaced, ...]) Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes
([domain, position, columns]) Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view
([type]) Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- property Index¶
The attribute giving the global index rank of each particle in the list. It is an integer from 0 to self.csize - 1.
Note that slicing changes this index value.
- Selection()¶
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True for all particles, and all CatalogSource objects will contain this column.
- Value()¶
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- Weight()¶
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- __delitem__(col)¶
Delete a column; cannot delete a “hard-coded” column.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
- __finalize__(other)¶
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to only copy over attributes that are similar to meta-data, so we do not copy some of the core attributes of the CatalogSource object.
- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
CatalogSource
- __getitem__(sel)¶
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the relevant slice
slice object specifying which particles to select
list of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation.
If the base attribute is set, columns will be returned from base instead of from self.
- __len__()¶
The local size of the CatalogSource on a given rank.
- __setitem__(col, value)¶
Add columns to the CatalogSource, overriding any existing columns with the name col.
- property attrs¶
A dictionary storing relevant meta-data about the CatalogSource.
- property columns¶
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
- compute(*args, **kwargs)¶
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()¶
Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
Note
No copy of data is made.
Note
This is different from view in that the attributes dictionary of the copy is no longer related to self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
CatalogSource
- property csize¶
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks.
If the base attribute is set, the base.csize attribute will be returned.
- get_hardcolumn(col)¶
Construct and return a hard-coded column.
These are usually produced by calling member functions marked by the @column decorator.
Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.
Note
If the base attribute is set, get_hardcolumn() will be called using base instead of self.
- gslice(start, stop, end=1, redistribute=True)¶
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
step (int, optional) – the step size of the global slice; this argument appears as end in the signature above
redistribute (bool, optional) – if True, evenly re-distribute the sliced data across all ranks; otherwise just return the local data that is part of the global slice
- property hardcolumns¶
A list of the hard-coded columns in the CatalogSource.
These columns are usually member functions marked by the @column decorator. Subclasses may override this method and use get_hardcolumn() to bypass the decorator logic.
Note
If the base attribute is set, the value of base.hardcolumns will be returned.
- static make_column(array)¶
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
dask.array.Array
- persist(columns=None)¶
Return a CatalogSource, where the selected columns are computed and persist in memory.
- read(columns)¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of
dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of columns are stored in the datasets.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – dataset to store the columns under.
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, the header is not saved.
compute (boolean, default True) – if True, wait until the store operations finish; if False, return a dictionary with each column name and a future object for its store operation. Use dask.compute() on the result to wait for the store operations.
- property size¶
The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)¶
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be floating or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided
reverse (bool, optional) – if True, perform descending sort operations
usecols (list, optional) – the names of the columns to include in the returned CatalogSource
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
- to_subvolumes(domain=None, position='Position', columns=None)¶
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object, using the position column as the Position.
This will read in the full position array and all of the requested columns.
- Parameters
domain (pmesh.domain.GridND object, or None) – The domain to distribute the catalog. If None, try to evenly divide spatially. The easiest way to find a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – column to use to compute the position.
columns (list of string_like) – columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank only contains objects that belong to the rank as claimed by the domain object.
self.attrs are carried over as a shallow copy to the returned object.
- Return type
- view(type=None)¶
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new empty class of type type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.source.catalog.MultipleSpeciesCatalog(names, *species, **kwargs)[source]¶
A CatalogSource interface for handling multiple species of particles.
This CatalogSource stores a copy of the original CatalogSource objects for each species, providing access to the columns via the format species/column, where “species” is one of the species names provided.
- Parameters
names (list of str) – list of strings specifying the names of the various species; data columns are prefixed with “species/” where “species” is in names
*species (two or more CatalogSource objects) – catalogs to be combined into a single catalog, which give the data for different species of particles; as many catalogs as names must be provided
Examples
Initialization:
>>> data = UniformCatalog(nbar=3e-5, BoxSize=512., seed=42)
>>> randoms = UniformCatalog(nbar=3e-5, BoxSize=512., seed=84)
>>> cat = MultipleSpeciesCatalog(['data', 'randoms'], data, randoms)
Accessing the Catalogs for individual species:
>>> data = cat["data"] # a copy of the original "data" object
Accessing individual columns:
>>> data_pos = cat["data/Position"]
Setting new columns:
>>> cat["data"]["new_column"] = 1.0
>>> assert "data/new_column" in cat
- Attributes
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
Columns for individual species can be accessed using a species/ prefix and the column name, i.e., data/Position.
hardcolumns
Hardcolumns of the form species/name
species
List of species names
Methods
compute
(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy
() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn
(col) Construct and return a hard-coded column.
make_column
(array) Utility function to convert an array-like object to a dask.array.Array.
read
(columns) Return the requested columns as dask arrays.
save
(output[, columns, dataset, datasets, ...]) Save the CatalogSource to a bigfile.BigFile.
to_mesh
([Nmesh, BoxSize, dtype, interlaced, ...]) Convert the catalog to a mesh, which knows how to "paint" the combined density field, summed over all particle species.
to_subvolumes
([domain, position, columns]) Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view
([type]) Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- __finalize__(other)¶
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to only copy over attributes that are similar to meta-data, so we do not copy some of the core attributes of the CatalogSource object.
- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
CatalogSource
- __getitem__(key)[source]¶
This provides access to the underlying data in two ways:
The CatalogSource object for a species can be accessed if key is a species name.
Individual columns for a species can be accessed using the format: species/column.
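The two access patterns above can be illustrated with a toy container in which each species is a plain dict of numpy arrays: a bare species name returns that species' data, and a species/column key is split at the first "/" and routed to the named species. ToySpeciesCatalog is hypothetical and only mirrors the documented key format; it is not MultipleSpeciesCatalog's implementation.

```python
import numpy as np

class ToySpeciesCatalog:
    """Hypothetical stand-in showing 'species/column' key routing;
    each species is represented by a plain dict of numpy arrays."""

    def __init__(self, names, *species):
        if len(names) != len(species):
            raise ValueError("need one catalog per species name")
        self._species = dict(zip(names, species))

    def __getitem__(self, key):
        if key in self._species:
            # a bare species name returns that species' catalog
            return self._species[key]
        # otherwise split "species/column" at the first slash
        name, sep, col = key.partition("/")
        if not sep or name not in self._species:
            raise KeyError(key)
        return self._species[name][col]

data = {"Position": np.zeros((2, 3))}
randoms = {"Position": np.ones((4, 3))}
cat = ToySpeciesCatalog(["data", "randoms"], data, randoms)
print(cat["randoms/Position"].shape)  # (4, 3)
```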
- __setitem__(col, value)[source]¶
Add columns to any of the species catalogs.
Note
New column names should be prefixed by ‘species/’ where ‘species’ is a name in the species attribute.
- property attrs¶
A dictionary storing relevant meta-data about the CatalogSource.
- property columns¶
Columns for individual species can be accessed using a species/ prefix and the column name, i.e., data/Position.
- compute(*args, **kwargs)¶
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()¶
Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
Note
No copy of data is made.
Note
This is different from view in that the attributes dictionary of the copy is no longer related to self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
CatalogSource
- get_hardcolumn(col)¶
Construct and return a hard-coded column.
These are usually produced by calling member functions marked by the @column decorator.
Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.
Note
If the base attribute is set, get_hardcolumn() will be called using base instead of self.
- property hardcolumns¶
Hardcolumns of the form species/name
- static make_column(array)¶
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
dask.array.Array
- read(columns)¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of
dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of columns are stored in the datasets.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – dataset to store the columns under.
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, the header is not saved.
compute (boolean, default True) – if True, wait until the store operations finish; if False, return a dictionary with each column name and a future object for its store operation. Use dask.compute() on the result to wait for the store operations.
- property species¶
List of species names
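The species/name naming of hard-coded columns can be illustrated with plain dictionaries. This is a minimal sketch, assuming each species maps to a list of column names; species_hardcolumns and split_hardcolumn are hypothetical helpers, not part of nbodykit.

```python
def species_hardcolumns(species_columns):
    """Build 'species/name' labels from a mapping of species -> column names."""
    return [f"{species}/{name}"
            for species, names in species_columns.items()
            for name in names]


def split_hardcolumn(label):
    """Split a 'species/name' label back into (species, name)."""
    species, name = label.split("/", 1)
    return species, name


cols = species_hardcolumns({"data": ["Position", "Weight"], "randoms": ["Position"]})
print(cols)  # ['data/Position', 'data/Weight', 'randoms/Position']
```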
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)[source]¶
Convert the catalog to a mesh, which knows how to “paint” the combined density field, summed over all particle species.
- Parameters
Nmesh (int, 3-vector, optional) – the number of cells per box side; can be inferred from attrs if the value is the same for all species
BoxSize (float, 3-vector, optional) – the size of the box; can be inferred from attrs if the value is the same for all species
dtype (str, dtype, optional) – the data type of the mesh when painting
interlaced (bool, optional) – whether to use interlacing to reduce aliasing when painting the particles on the mesh
compensated (bool, optional) – whether to apply a Fourier-space transfer function to account for the effects of the gridding + aliasing
resampler (str, optional) – the string name of the resampler to use when interpolating the particles to the mesh
weight (str, optional) – the name of the column specifying the weight for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
value (str, optional) – the name of the column specifying the field value for each particle
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, optional) – the string name of the window to use when interpolating (deprecated; use resampler)
- to_subvolumes(domain=None, position='Position', columns=None)¶
Domain-decompose a catalog, sending items to ranks according to the supplied domain object and using the position column as the Position.
This reads in the full position array and all of the requested columns.
- Parameters
domain (pmesh.domain.GridND object, or None) – the domain used to distribute the catalog. If None, try to divide the volume evenly in space. The easiest way to obtain a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – column to use to compute the position.
columns (list of string_like) – columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank contains only the objects that the domain object assigns to it.
self.attrs is carried over as a shallow copy to the returned object.
- Return type
- view(type=None)¶
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new, empty instance of type type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.source.catalog.RandomCatalog(csize, seed=None, comm=None)[source]¶
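A CatalogSource that can have columns added via a collective random number generator.
The random number generator, stored as rng, behaves as numpy.random.RandomState but generates random numbers only on the local rank, in a manner independent of the number of ranks.
- Parameters
csize (int) – the total, collective size of the catalog, summed across all ranks
seed (int, optional) – the seed for the random number generator
comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator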
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
A list of the hard-coded columns in the CatalogSource.
rng
A
MPIRandomState
that behaves as numpy.random.RandomState
but generates random numbers in a manner independent of the number of ranks.
size
The number of objects in the CatalogSource on the local rank.
Methods
Selection()
A boolean column that selects a subset slice of the CatalogSource.
Value()
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight()
The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once.
copy()
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
get_hardcolumn(col)
Construct and return a hard-coded column.
gslice(start, stop[, end, redistribute])
Execute a global slice of a CatalogSource.
make_column(array)
Utility function to convert an array-like object to a dask.array.Array.
persist([columns])
Return a CatalogSource, where the selected columns are computed and persist in memory.
read(columns)
Return the requested columns as dask arrays.
save(output[, columns, dataset, datasets, ...])
Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols])
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, ...])
Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes([domain, position, columns])
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view([type])
Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- property Index¶
The attribute giving the global index rank of each particle in the list. It is an integer from 0 to self.csize - 1.
Note that slicing changes this index value.
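Conceptually, each rank's Index values start at the number of objects held by all lower ranks, and csize is the total of the local sizes. The sketch below simulates ranks as a list of local sizes in pure Python; with MPI the offset would typically come from an exclusive scan, and nothing here is taken from nbodykit's implementation. global_index is a hypothetical helper.

```python
from itertools import accumulate


def global_index(local_sizes):
    """Return the global indices held by each simulated rank.

    The offset of rank r is the sum of the sizes of ranks 0..r-1
    (an exclusive prefix sum), so indices run from 0 to csize - 1.
    """
    offsets = [0] + list(accumulate(local_sizes))[:-1]
    return [list(range(off, off + n)) for off, n in zip(offsets, local_sizes)]


sizes = [3, 0, 2, 4]   # local sizes on four simulated ranks
csize = sum(sizes)     # the collective size
index = global_index(sizes)
print(index)  # [[0, 1, 2], [], [3, 4], [5, 6, 7, 8]]
```

Slicing changes which objects each rank holds, so the offsets, and hence the Index values, are recomputed for the new catalog.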
- Selection()¶
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to
True
for all particles, and all CatalogSource objects will contain this column.
- Value()¶
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- Weight()¶
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
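The relationship between Value and Weight can be checked on a toy 1D grid. This sketch assumes nearest-grid-point assignment with periodic wrapping; nbodykit's painting uses the configured resampler (cic by default) and its own normalization, so treat paint_weighted_average, a hypothetical helper, only as an illustration of the averaging described above.

```python
import math


def paint_weighted_average(positions, values, weights, nmesh, boxsize):
    """Toy 1D painter: each cell holds sum(Weight * Value) / sum(Weight).

    Uses nearest-grid-point assignment with periodic wrapping;
    empty cells are set to 0.0.
    """
    numerator = [0.0] * nmesh
    denominator = [0.0] * nmesh
    for x, v, w in zip(positions, values, weights):
        cell = math.floor(x / boxsize * nmesh) % nmesh
        numerator[cell] += w * v
        denominator[cell] += w
    return [n / d if d else 0.0 for n, d in zip(numerator, denominator)]


field = paint_weighted_average(
    positions=[0.1, 0.2, 0.7],
    values=[1.0, 3.0, 5.0],
    weights=[1.0, 1.0, 2.0],
    nmesh=2,
    boxsize=1.0,
)
print(field)  # [2.0, 5.0]
```

With the default unity Value and Weight, every occupied cell in this toy would simply hold 1.0.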
- __delitem__(col)¶
Delete a column; cannot delete a “hard-coded” column.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
- __finalize__(other)¶
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to copy over only attributes that resemble meta-data, so some of the core attributes of the CatalogSource object are not copied.
- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
CatalogSource
- __getitem__(sel)¶
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the relevant slice
slice object specifying which particles to select
list of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation.
If the base attribute is set, columns will be returned from base instead of from self.
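The four indexing modes listed above can be sketched with a single-rank, in-memory stand-in. ToyCatalog is hypothetical: it stores plain Python lists instead of dask arrays and ignores MPI, so it shows only how the type of sel selects the behavior.

```python
class ToyCatalog:
    """Hypothetical single-rank stand-in for a CatalogSource (lists, no dask)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __len__(self):
        # all columns share one length; an empty catalog has length 0
        return len(next(iter(self._columns.values()), []))

    def __getitem__(self, sel):
        if isinstance(sel, str):
            # a column name -> that column's data
            return self._columns[sel]
        if isinstance(sel, slice):
            # a slice object -> a new catalog with the selected rows
            return ToyCatalog({k: v[sel] for k, v in self._columns.items()})
        if isinstance(sel, list) and all(isinstance(s, str) for s in sel):
            # a list of names -> a new catalog with only those columns
            return ToyCatalog({k: self._columns[k] for k in sel})
        if isinstance(sel, list) and all(isinstance(s, bool) for s in sel):
            # a boolean mask -> a new catalog with the rows where the mask is True
            if len(sel) != len(self):
                raise ValueError("boolean mask must match the catalog size")
            return ToyCatalog({k: [x for x, keep in zip(v, sel) if keep]
                               for k, v in self._columns.items()})
        raise KeyError(sel)


cat = ToyCatalog({"Position": [0.1, 0.5, 0.9], "Mass": [1.0, 2.0, 3.0]})
print(cat["Mass"])                     # [1.0, 2.0, 3.0]
print(len(cat[[True, False, True]]))   # 2
print(cat[1:]["Position"])             # [0.5, 0.9]
```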
- __len__()¶
The local size of the CatalogSource on a given rank.
- __setitem__(col, value)¶
Add columns to the CatalogSource, overriding any existing columns with the name
col
.
- property attrs¶
A dictionary storing relevant meta-data about the CatalogSource.
- property columns¶
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
- compute(*args, **kwargs)¶
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()¶
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
Note
No copy of data is made.
Note
This differs from view in that the attributes dictionary of the copy is no longer linked to that of self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
CatalogSource
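The difference between copy() and view() described above concerns the attributes dictionary. The hypothetical ToySource below sketches those semantics with plain dicts, assuming that a view keeps sharing attrs with its parent; it is not nbodykit's implementation.

```python
class ToySource:
    """Hypothetical stand-in showing shallow-copy versus view semantics."""

    def __init__(self, columns=None, attrs=None):
        self.columns = {} if columns is None else columns
        self.attrs = {} if attrs is None else attrs

    def copy(self):
        # new column mapping whose entries reference the same data objects,
        # plus a detached attrs dictionary
        return ToySource(dict(self.columns), dict(self.attrs))

    def view(self):
        # assumption: a view keeps the very same attrs dictionary
        return ToySource(self.columns, self.attrs)


src = ToySource({"Mass": [1.0, 2.0]}, {"BoxSize": 1.0})
c, v = src.copy(), src.view()
src.attrs["BoxSize"] = 2.0
print(c.attrs["BoxSize"], v.attrs["BoxSize"])    # 1.0 2.0
print(c.columns["Mass"] is src.columns["Mass"])  # True: no data was copied
```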
- property csize¶
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks. If the base attribute is set, the base.csize attribute will be returned.
- get_hardcolumn(col)¶
Construct and return a hard-coded column.
These are usually produced by calling member functions marked by the @column decorator. Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.
Note
If the base attribute is set, get_hardcolumn() will be called using base instead of self.
- gslice(start, stop, end=1, redistribute=True)¶
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
end (int, optional) – the step size of the global slice; default 1
redistribute (bool, optional) – if True, evenly re-distribute the sliced data across all ranks; otherwise, just return any local data that is part of the global slice
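The effect of redistribute can be sketched by simulating ranks as Python lists. This toy version gathers everything in one place, loosely mirroring the note that the current algorithm builds the index on a single rank; the gslice function below is hypothetical, and its step argument plays the role of the argument named end in the signature above.

```python
def gslice(local_data, start, stop, step=1, redistribute=True):
    """Toy global slice; ranks are simulated as a list of per-rank lists."""
    nranks = len(local_data)
    csize = sum(len(rank) for rank in local_data)
    # global positions of the selected items, as in the Index column
    wanted = range(csize)[start:stop:step]
    if redistribute:
        flat = [x for rank in local_data for x in rank]
        selected = [flat[i] for i in wanted]
        # scatter the selection evenly across the simulated ranks
        bounds = [r * len(selected) // nranks for r in range(nranks + 1)]
        return [selected[bounds[r]:bounds[r + 1]] for r in range(nranks)]
    # otherwise each rank keeps only its own items that fall inside the slice
    keep, out, offset = set(wanted), [], 0
    for rank in local_data:
        out.append([x for i, x in enumerate(rank, offset) if i in keep])
        offset += len(rank)
    return out


data = [[0, 1, 2, 3, 4, 5], [6, 7], []]
print(gslice(data, 2, 8))                      # [[2, 3], [4, 5], [6, 7]]
print(gslice(data, 2, 8, redistribute=False))  # [[2, 3, 4, 5], [6, 7], []]
```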
- property hardcolumns¶
A list of the hard-coded columns in the CatalogSource.
These columns are usually member functions marked by the @column decorator. Subclasses may override this method and use get_hardcolumn() to bypass the decorator logic.
Note
If the base attribute is set, the value of base.hardcolumns will be returned.
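A decorator-based registry like the one described for @column can be sketched in a few lines. The column decorator, the _is_column marker, and the Toy classes below are hypothetical and not nbodykit's code; they only show how marked methods can be discovered as hardcolumns and produced by get_hardcolumn().

```python
def column(func):
    """Hypothetical marker decorator: tag a method as a hard-coded column."""
    func._is_column = True
    return func


class ToyBase:
    @property
    def hardcolumns(self):
        # collect every class attribute tagged by @column
        cls = type(self)
        return sorted(name for name in dir(cls)
                      if getattr(getattr(cls, name), "_is_column", False))

    def get_hardcolumn(self, col):
        if col not in self.hardcolumns:
            raise ValueError(f"{col!r} is not a hard-coded column")
        return getattr(self, col)()


class ToyCatalog(ToyBase):
    size = 3

    @column
    def Weight(self):
        return [1.0] * self.size

    @column
    def Selection(self):
        return [True] * self.size


cat = ToyCatalog()
print(cat.hardcolumns)               # ['Selection', 'Weight']
print(cat.get_hardcolumn("Weight"))  # [1.0, 1.0, 1.0]
```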
- static make_column(array)¶
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
dask.array.Array
- persist(columns=None)¶
Return a CatalogSource, where the selected columns are computed and persist in memory.
- read(columns)¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of
dask.array.Array
- property rng¶
An MPIRandomState that behaves as numpy.random.RandomState but generates random numbers in a manner independent of the number of ranks.
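One way to make random numbers independent of the number of ranks is to tie each value to its global index, for example by seeding a generator per fixed-size chunk of the index space. The sketch below uses Python's random module and a hypothetical chunking scheme (local_uniform, split_evenly, and the seed mixing are all assumptions); it illustrates the property claimed for MPIRandomState, not how that class is implemented.

```python
import random


def local_uniform(seed, start, stop, chunk_size=4):
    """Uniform randoms for the global index range [start, stop).

    Each fixed-size chunk of the global index space gets its own generator,
    so the value at a given global index does not depend on how the range
    is split among ranks.
    """
    values = []
    i = start
    while i < stop:
        chunk = i // chunk_size
        rng = random.Random(seed * 1_000_003 + chunk)  # hypothetical per-chunk seeding
        draws = [rng.random() for _ in range(chunk_size)]
        chunk_start = chunk * chunk_size
        # keep only the part of this chunk that lies inside [start, stop)
        lo = i - chunk_start
        hi = min(stop, chunk_start + chunk_size) - chunk_start
        values.extend(draws[lo:hi])
        i = chunk_start + hi
    return values


def split_evenly(csize, nranks):
    """Global [start, stop) ranges when csize items are divided among nranks."""
    bounds = [r * csize // nranks for r in range(nranks + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


csize, seed = 10, 42
one_rank = local_uniform(seed, 0, csize)
three_ranks = [x for start, stop in split_evenly(csize, 3)
               for x in local_uniform(seed, start, stop)]
print(one_rank == three_ranks)  # True
```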
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs is saved in header. The attrs of each column are stored in its dataset.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – dataset to store the columns under.
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, the header is not saved.
compute (bool, default True) – if True, wait until the store operations finish; if False, return a dictionary mapping each column name to a future object for its store operation, and call dask.compute() on the result to wait for the store operations.
- property size¶
The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)¶
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be floating or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided
reverse (bool, optional) – if True, perform a descending sort
usecols (list, optional) – the names of the columns to include in the returned CatalogSource
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in
attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in
attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
- to_subvolumes(domain=None, position='Position', columns=None)¶
Domain-decompose a catalog, sending items to ranks according to the supplied domain object and using the position column as the Position.
This reads in the full position array and all of the requested columns.
- Parameters
domain (pmesh.domain.GridND object, or None) – the domain used to distribute the catalog. If None, try to divide the volume evenly in space. The easiest way to obtain a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – column to use to compute the position.
columns (list of string_like) – columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank contains only the objects that the domain object assigns to it.
self.attrs is carried over as a shallow copy to the returned object.
- Return type
- view(type=None)¶
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new, empty instance of type type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.source.catalog.SubVolumesCatalog(source, domain=None, position='Position', columns=None)[source]¶
A catalog that distributes the particles spatially into subvolumes per MPI rank.
- domain¶
The domain object used for decomposition. If None, a domain is generated that decomposes the catalog into a 3D grid.
- Type
pmesh.domain.GridND or None
- layout¶
A large object that records which rank each particle belongs to.
- source¶
The original source object.
- Parameters
columns (list) – a list of columns to exchange up front
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
A list of the hard-coded columns in the CatalogSource.
size
The number of objects in the CatalogSource on the local rank.
Methods
Selection()
A boolean column that selects a subset slice of the CatalogSource.
Value()
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight()
The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once.
copy()
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
get_hardcolumn(col)
Construct and return a hard-coded column.
gslice(start, stop[, end, redistribute])
Execute a global slice of a CatalogSource.
make_column(array)
Utility function to convert an array-like object to a dask.array.Array.
persist([columns])
Return a CatalogSource, where the selected columns are computed and persist in memory.
read(columns)
Return the requested columns as dask arrays.
save(output[, columns, dataset, datasets, ...])
Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols])
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, ...])
Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes([domain, position, columns])
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view([type])
Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- property Index¶
The attribute giving the global index rank of each particle in the list. It is an integer from 0 to self.csize - 1.
Note that slicing changes this index value.
- Selection()¶
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to
True
for all particles, and all CatalogSource objects will contain this column.
- Value()¶
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- Weight()¶
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- __delitem__(col)¶
Delete a column; cannot delete a “hard-coded” column.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
- __finalize__(other)¶
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to copy over only attributes that resemble meta-data, so some of the core attributes of the CatalogSource object are not copied.
- Parameters
other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
CatalogSource
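The meta-data copying performed by __finalize__() can be sketched as copying an object's instance dictionary while skipping a set of core attributes. The NOCOPY names and the ToyCatalog class are hypothetical, chosen only to illustrate the idea; the real set of protected attributes is not listed in this documentation.

```python
NOCOPY = {"_columns", "base"}  # hypothetical "core" attributes that are never copied


class ToyCatalog:
    def __init__(self, columns=None):
        self._columns = dict(columns or {})
        self.base = None

    def __finalize__(self, other):
        # only copy from objects of a compatible type
        if isinstance(other, ToyCatalog):
            for key, value in vars(other).items():
                if key not in NOCOPY:
                    setattr(self, key, value)  # shallow: values are shared
        return self


a = ToyCatalog({"Mass": [1.0]})
a.attrs = {"BoxSize": 1.0}
a.label = "run-1"
b = ToyCatalog().__finalize__(a)
print(b.attrs, b._columns)  # {'BoxSize': 1.0} {}
```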
- __getitem__(sel)¶
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the relevant slice
slice object specifying which particles to select
list of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation.
If the base attribute is set, columns will be returned from base instead of from self.
- __len__()¶
The local size of the CatalogSource on a given rank.
- __setitem__(col, value)¶
Add columns to the CatalogSource, overriding any existing columns with the name
col
.
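The interplay of __setitem__() and __delitem__() can be sketched with two dictionaries: hard-coded columns and user-supplied overrides. The storage layout and the ToyCatalog class below are assumptions for illustration only; the sketch mirrors the documented rules that setting a column overrides any existing one and that a hard-coded column cannot be deleted.

```python
class ToyCatalog:
    """Hypothetical catalog with hard-coded columns and user overrides."""

    def __init__(self, size):
        self.size = size
        self._hard = {"Selection": [True] * size, "Weight": [1.0] * size}
        self._overrides = {}

    def __setitem__(self, col, value):
        if len(value) != self.size:
            raise ValueError("column length must match the catalog size")
        self._overrides[col] = value  # overrides shadow hard-coded columns

    def __getitem__(self, col):
        return self._overrides[col] if col in self._overrides else self._hard[col]

    def __delitem__(self, col):
        if col in self._hard:
            raise ValueError(f"cannot delete hard-coded column {col!r}")
        del self._overrides[col]  # raises KeyError for unknown columns


cat = ToyCatalog(2)
cat["Weight"] = [0.5, 0.5]   # override a hard-coded column
cat["Mass"] = [1.0, 2.0]     # add a new column
del cat["Mass"]
print(cat["Weight"])  # [0.5, 0.5]
```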
- property attrs¶
A dictionary storing relevant meta-data about the CatalogSource.
- property columns¶
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
- compute(*args, **kwargs)¶
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
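The pass-through behavior of compute() can be sketched with a hypothetical Lazy class standing in for a dask collection. Unlike dask.compute(), this toy evaluates each object separately and does not merge or optimize task graphs; it only shows that collections are evaluated while other objects are returned unchanged.

```python
class Lazy:
    """Hypothetical stand-in for a dask collection holding a deferred call."""

    def __init__(self, func, *args):
        self.func, self.args = func, args

    def compute(self):
        return self.func(*self.args)


def compute(*args):
    """Evaluate Lazy objects and pass everything else through unchanged."""
    return tuple(a.compute() if isinstance(a, Lazy) else a for a in args)


total, label, n = compute(Lazy(sum, [1, 2, 3]), "Mass", 5)
print(total, label, n)  # 6 Mass 5
```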
- copy()¶
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
Note
No copy of data is made.
Note
This differs from view in that the attributes dictionary of the copy is no longer linked to that of self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
CatalogSource
- property csize¶
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks. If the base attribute is set, the base.csize attribute will be returned.
- get_hardcolumn(col)[source]¶
Construct and return a hard-coded column.
These are usually produced by calling member functions marked by the @column decorator. Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.
Note
If the base attribute is set, get_hardcolumn() will be called using base instead of self.
- gslice(start, stop, end=1, redistribute=True)¶
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
end (int, optional) – the step size of the global slice; default 1
redistribute (bool, optional) – if True, evenly re-distribute the sliced data across all ranks; otherwise, just return any local data that is part of the global slice
- property hardcolumns¶
A list of the hard-coded columns in the CatalogSource.
These columns are usually member functions marked by the @column decorator. Subclasses may override this method and use get_hardcolumn() to bypass the decorator logic.
Note
If the base attribute is set, the value of base.hardcolumns will be returned.
- static make_column(array)¶
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
dask.array.Array
- persist(columns=None)¶
Return a CatalogSource, where the selected columns are computed and persist in memory.
- read(columns)¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of
dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs is saved in header. The attrs of each column are stored in its dataset.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – dataset to store the columns under.
datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, the header is not saved.
compute (bool, default True) – if True, wait until the store operations finish; if False, return a dictionary mapping each column name to a future object for its store operation, and call dask.compute() on the result to wait for the store operations.
- property size¶
The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)¶
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be floating or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided
reverse (bool, optional) – if True, perform a descending sort
usecols (list, optional) – the names of the columns to include in the returned CatalogSource
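A global sort followed by an even scatter can be sketched by simulating ranks as lists of row dictionaries. This toy gathers all rows in one place, which a real distributed sort avoids; global_sort is hypothetical, and treating the first key as the highest-priority key is an assumption about how multiple keys are combined.

```python
def global_sort(local_rows, keys, reverse=False):
    """Toy global sort; ranks are simulated as lists of row dicts.

    Assumption: the first key in keys has the highest priority.
    """
    nranks = len(local_rows)
    rows = [row for rank in local_rows for row in rank]  # gather (toy only)
    for row in rows:
        for k in keys:
            if not isinstance(row[k], (int, float)):
                raise TypeError("sort columns must be integer or floating point")
    rows.sort(key=lambda row: tuple(row[k] for k in keys), reverse=reverse)
    # scatter the sorted rows evenly across the simulated ranks
    bounds = [r * len(rows) // nranks for r in range(nranks + 1)]
    return [rows[bounds[r]:bounds[r + 1]] for r in range(nranks)]


data = [[{"Mass": 3}, {"Mass": 1}], [{"Mass": 2}], [{"Mass": 5}, {"Mass": 4}, {"Mass": 0}]]
out = global_sort(data, ["Mass"])
print([[row["Mass"] for row in rank] for rank in out])  # [[0, 1], [2, 3], [4, 5]]
```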
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in
attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in
attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
- to_subvolumes(domain=None, position='Position', columns=None)¶
Domain-decompose a catalog, sending items to ranks according to the supplied domain object and using the position column as the Position.
This reads in the full position array and all of the requested columns.
- Parameters
domain (pmesh.domain.GridND object, or None) – the domain used to distribute the catalog. If None, try to divide the volume evenly in space. The easiest way to obtain a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – column to use to compute the position.
columns (list of string_like) – columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank contains only the objects that the domain object assigns to it.
self.attrs is carried over as a shallow copy to the returned object.
- Return type
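The idea behind to_subvolumes() is that each object is sent to the rank whose spatial subvolume contains its position. This sketch assumes a regular processor grid over a cubic box with positions in [0, BoxSize), a simplification of what a pmesh.domain.GridND describes; owner_rank and exchange are hypothetical helpers, and no MPI communication takes place.

```python
def owner_rank(pos, boxsize, proc_grid):
    """Rank owning a 3D position on a regular grid of subvolumes.

    Assumes 0 <= x < boxsize in every dimension (no periodic wrapping).
    """
    cell = [min(int(x / boxsize * n), n - 1) for x, n in zip(pos, proc_grid)]
    # flatten the (i, j, k) cell into a rank number, row-major order
    return (cell[0] * proc_grid[1] + cell[1]) * proc_grid[2] + cell[2]


def exchange(positions, boxsize, proc_grid):
    """Group item indices by owning rank, as a domain exchange would."""
    nranks = proc_grid[0] * proc_grid[1] * proc_grid[2]
    per_rank = [[] for _ in range(nranks)]
    for i, pos in enumerate(positions):
        per_rank[owner_rank(pos, boxsize, proc_grid)].append(i)
    return per_rank


positions = [(0.1, 0.2, 0.3), (0.9, 0.2, 0.3), (0.6, 0.7, 0.1), (0.2, 0.8, 0.9)]
print(exchange(positions, 1.0, (2, 2, 1)))  # [[0], [3], [1], [2]]
```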
- view(type=None)¶
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new, empty instance of type type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.source.catalog.TPMBinaryCatalog(path, *args, **kwargs)¶
A CatalogSource that uses TPMBinaryFile to read data from disk.
Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.
- Parameters
path (str) – the path to the binary file to load
precision ({'f4', 'f8'}, optional) – the string dtype specifying the precision
comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
attrs (dict, optional) – dictionary of meta-data to store in attrs
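Accepting a single file, a list, or a glob pattern usually comes down to normalizing the path argument into a list of files. expand_paths below is a hypothetical helper built on the standard glob module, not nbodykit's code; sorting the matches is an assumption made so that the read order is deterministic.

```python
import glob
import os
import tempfile


def expand_paths(path):
    """Normalize a path argument into a list of file names.

    Hypothetical helper: accepts one file name, a list of names,
    or a pattern containing '*'.
    """
    if isinstance(path, (list, tuple)):
        return list(path)
    if "*" in path:
        matches = sorted(glob.glob(path))
        if not matches:
            raise FileNotFoundError(f"no files match {path!r}")
        return matches
    return [path]


with tempfile.TemporaryDirectory() as tmp:
    for name in ("tpm.002", "tpm.000", "tpm.001"):
        open(os.path.join(tmp, name), "wb").close()
    files = expand_paths(os.path.join(tmp, "tpm.*"))
    print([os.path.basename(f) for f in files])  # ['tpm.000', 'tpm.001', 'tpm.002']
```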
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
The union of the columns in the file and any transformed columns.
size
The number of objects in the CatalogSource on the local rank.
Methods
Selection()
A boolean column that selects a subset slice of the CatalogSource.
Value()
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight()
The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once.
copy()
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
get_hardcolumn(col)
Return a column from the underlying file source.
gslice(start, stop[, end, redistribute])
Execute a global slice of a CatalogSource.
make_column(array)
Utility function to convert an array-like object to a dask.array.Array.
persist([columns])
Return a CatalogSource, where the selected columns are computed and persist in memory.
query_range(start, end)
Seek to a range in the file catalog.
read(columns)
Return the requested columns as dask arrays.
save(output[, columns, dataset, datasets, ...])
Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols])
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, ...])
Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes([domain, position, columns])
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view([type])
Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- property Index¶
The attribute giving the global index rank of each particle in the list. It is an integer from 0 to self.csize - 1.
Note that slicing changes this index value.
- Selection()¶
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to
True
for all particles, and all CatalogSource objects will contain this column.
- Value()¶
When interpolating a CatalogSource onto a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- Weight()¶
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
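To make the roles of the Selection, Value and Weight columns concrete, here is a simplified one-dimensional, nearest-grid-point sketch. It is an assumption for illustration only: nbodykit itself uses pmesh resamplers (such as 'cic') and applies its own normalization, which is omitted here. Only particles whose Selection is True are deposited, each adding Weight * Value to its cell, so with the defaults (Selection=True, Value=1, Weight=1) the result is simply the particle count per cell. `paint_ngp` is a hypothetical name.

```python
def paint_ngp(positions, box_size, nmesh, selection=None, value=None, weight=None):
    """Hypothetical sketch: deposit Weight * Value onto a 1D periodic mesh.

    Uses nearest-grid-point assignment; missing columns fall back to the
    CatalogSource defaults (Selection=True, Value=1, Weight=1).
    Normalization of the resulting field is deliberately left out.
    """
    n = len(positions)
    selection = [True] * n if selection is None else selection
    value = [1.0] * n if value is None else value
    weight = [1.0] * n if weight is None else weight

    cell_size = box_size / nmesh
    mesh = [0.0] * nmesh
    for x, sel, v, w in zip(positions, selection, value, weight):
        if not sel:
            continue  # unselected particles contribute nothing
        cell = int(x // cell_size) % nmesh  # periodic wrap
        mesh[cell] += w * v
    return mesh


pos = [0.1, 0.2, 2.6, 3.9]
print(paint_ngp(pos, box_size=4.0, nmesh=4))  # [2.0, 0.0, 1.0, 1.0]
print(paint_ngp(pos, 4.0, 4, selection=[True, False, True, True],
                weight=[2.0, 1.0, 1.0, 0.5]))  # [2.0, 0.0, 1.0, 0.5]
```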
- __delitem__(col)¶
Delete a column; cannot delete a “hard-coded” column.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
- __finalize__(other)¶
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to copy over only attributes that are similar to meta-data, so some of the core attributes of the CatalogSource object are not copied.
- Parameters
other – the second object to copy attributes from; it must be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
CatalogSource
- __getitem__(sel)¶
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the relevant slice
slice object specifying which particles to select
list of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation.
If the base attribute is set, columns will be returned from base instead of from self.
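The dispatch on the type of sel can be sketched with a toy class. `MiniCatalog` is hypothetical and deliberately simplified: it stores plain lists instead of dask arrays and ignores MPI, so slicing here is purely local, whereas the real operation is collective.

```python
class MiniCatalog:
    """Hypothetical toy catalog (not nbodykit): a dict of equal-length lists.

    Real CatalogSource columns are dask arrays spread over MPI ranks; here
    everything is local, so slicing is not a collective operation.
    """

    def __init__(self, columns):
        self.columns = dict(columns)

    def __len__(self):
        # local size: the length of any column (all have the same length)
        return len(next(iter(self.columns.values()), []))

    def __getitem__(self, sel):
        if isinstance(sel, str):
            # a single column name -> the column data itself
            return self.columns[sel]
        if isinstance(sel, slice):
            # a slice object -> the same slice applied to every column
            return MiniCatalog({k: v[sel] for k, v in self.columns.items()})
        if isinstance(sel, list) and sel and all(isinstance(s, str) for s in sel):
            # a list of column names -> a catalog holding only those columns
            return MiniCatalog({name: self.columns[name] for name in sel})
        if (isinstance(sel, list) and len(sel) == len(self)
                and all(isinstance(s, bool) for s in sel)):
            # a boolean mask -> a catalog holding only the rows where it is True
            return MiniCatalog({k: [x for x, keep in zip(v, sel) if keep]
                                for k, v in self.columns.items()})
        raise KeyError(f"unsupported selection: {sel!r}")


cat = MiniCatalog({"Position": [0.1, 0.5, 0.9], "Mass": [1.0, 2.0, 3.0]})
print(cat["Mass"])                       # [1.0, 2.0, 3.0]
print(cat[["Mass"]].columns)             # {'Mass': [1.0, 2.0, 3.0]}
print(cat[1:]["Position"])               # [0.5, 0.9]
print(cat[[True, False, True]]["Mass"])  # [1.0, 3.0]
```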
- __len__()¶
The local size of the CatalogSource on a given rank.
- __setitem__(col, value)¶
Add a column to the CatalogSource, overriding any existing column with the name col.
- property attrs¶
A dictionary storing relevant meta-data about the CatalogSource.
- property columns¶
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
- compute(*args, **kwargs)¶
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()¶
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
Note
No copy of data is made.
Note
This differs from view() in that the attributes dictionary of the copy is no longer related to that of self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
CatalogSource
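The two promises made by copy(), shared column data but an independent attributes dictionary, can be illustrated with plain dictionaries. This is a hypothetical sketch of the semantics, not nbodykit code; in particular, whether nbodykit copies nested attribute values deeply is not specified here, so the sketch uses a shallow dict copy.

```python
def shallow_copy(columns, attrs):
    """Hypothetical sketch of the copy() semantics described above.

    The column objects are shared by reference (no data is copied), while
    the attrs dictionary is duplicated so later edits do not propagate back.
    """
    new_columns = dict(columns)  # new mapping, same column objects
    new_attrs = dict(attrs)      # independent (shallow) copy of the meta-data
    return new_columns, new_attrs


columns = {"Mass": [1.0, 2.0]}
attrs = {"BoxSize": 100.0}
cols2, attrs2 = shallow_copy(columns, attrs)

attrs2["BoxSize"] = 50.0
print(attrs["BoxSize"])                  # 100.0: the original attrs are untouched
print(cols2["Mass"] is columns["Mass"])  # True: the column data is shared
```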
- property csize¶
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks.
If the base attribute is set, the base.csize attribute will be returned.
- get_hardcolumn(col)¶
Return a column from the underlying file source.
Columns are returned as dask arrays.
- gslice(start, stop, end=1, redistribute=True)¶
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
step (int, optional) – the step size of the global slice, passed as the third argument (named end in the signature); defaults to 1
redistribute (bool, optional) – if True, evenly redistribute the sliced data across all ranks; otherwise, return only the local data that is part of the global slice
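The effect of redistribute can be sketched by tracking which global rows each rank ends up with. This is a hypothetical illustration: like the algorithm described above, it builds the full index list in one place, which is exactly why the approach does not scale; the contiguous even split used for redistribute=True is an assumption about how the data is balanced.

```python
def gslice_plan(local_sizes, start, stop, step=1, redistribute=True):
    """Hypothetical sketch: global rows held by each rank after a global slice.

    The whole index list is built in one place (as on a root rank), then
    either kept where it already lives or split evenly across the ranks.
    """
    csize = sum(local_sizes)
    selected = list(range(csize))[start:stop:step]
    nranks = len(local_sizes)

    if not redistribute:
        # each rank keeps only the selected rows it already owns
        plan, offset = [], 0
        for size in local_sizes:
            owned = range(offset, offset + size)
            plan.append([i for i in selected if i in owned])
            offset += size
        return plan

    # evenly re-distribute: rank r gets a contiguous block of the selection
    plan = []
    for r in range(nranks):
        lo = r * len(selected) // nranks
        hi = (r + 1) * len(selected) // nranks
        plan.append(selected[lo:hi])
    return plan


print(gslice_plan([4, 0, 4], 1, 7, 2, redistribute=False))  # [[1, 3], [], [5]]
print(gslice_plan([4, 0, 4], 1, 7, 2))                      # [[1], [3], [5]]
```

Without redistribution, a rank that held no matching rows (here the middle one) stays empty, which can leave the result badly balanced.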
- property hardcolumns¶
The union of the columns in the file and any transformed columns.
- static make_column(array)¶
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
dask.array.Array
- persist(columns=None)¶
Return a CatalogSource in which the selected columns are computed and persisted in memory.
- query_range(start, end)¶
Seek to a range in the file catalog.
- Parameters
- Returns
A new catalog that only accesses the given region of the file.
If the original catalog (self) contains any assigned columns not directly obtained from the file, then the function will raise ValueError, since the operation in that case is not well defined.
- read(columns)¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of dask.array.Array
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of the columns are stored in their datasets.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – the dataset to store the columns under
datasets (list of str, optional) – the names of the datasets where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the dataset holding the header information, where attrs is stored; if header is None, the header is not saved
compute (boolean, default True) – if True, wait until the store operations finish; if False, return a dictionary mapping each column name to a future object for its store operation, and use dask.compute() on the result to wait for the store operations
- property size¶
The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)¶
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
The sort columns must be of floating-point or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of the columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided
reverse (bool, optional) – if True, perform a descending sort
usecols (list, optional) – the names of the columns to include in the returned CatalogSource
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
CatalogMesh
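As a rough illustration of what the default resampler='cic' (cloud-in-cell) does, here is a one-dimensional periodic sketch. It is an assumption, not pmesh's implementation: grid points are placed at integer multiples of the cell size, and each particle's weight is split linearly between the two nearest grid points. The compensated option is meant to correct for the smoothing window this kind of assignment introduces. `paint_cic_1d` is a hypothetical name.

```python
import math


def paint_cic_1d(positions, box_size, nmesh, weights=None):
    """Hypothetical cloud-in-cell deposit onto a 1D periodic mesh.

    Grid points are assumed at x = i * box_size / nmesh; a particle at
    fractional distance d past grid point i gives (1 - d) to i and d to i + 1.
    """
    weights = [1.0] * len(positions) if weights is None else weights
    mesh = [0.0] * nmesh
    for x, w in zip(positions, weights):
        u = x / box_size * nmesh  # position in units of cells
        i = math.floor(u)
        d = u - i
        mesh[i % nmesh] += w * (1.0 - d)
        mesh[(i + 1) % nmesh] += w * d  # wraps around the periodic box
    return mesh


print(paint_cic_1d([1.25, 3.5], box_size=4.0, nmesh=4))  # [0.5, 0.75, 0.25, 0.5]
```

The total deposited weight is conserved (here 2.0 for two unit-weight particles), which is the property that lets the mesh stand in for the particle density.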
- to_subvolumes(domain=None, position='Position', columns=None)¶
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object, using the position column as the Position.
This will read in the full position array and all of the requested columns.
- Parameters
domain (pmesh.domain.GridND object, or None) – the domain used to distribute the catalog. If None, try to divide the catalog evenly in space. The easiest way to obtain a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – the column to use to compute the position.
columns (list of string_like) – the columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank contains only the objects that belong to it, as claimed by the domain object.
self.attrs are carried over as a shallow copy to the returned object.
- Return type
CatalogSource
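The idea of spatial domain decomposition can be sketched with the simplest possible domain: slabs along the x axis, one per rank. This is a hypothetical stand-in for a pmesh.domain.GridND object, which in general splits space over a multi-dimensional grid of ranks; `decompose_slabs` is not an nbodykit function.

```python
def decompose_slabs(positions, box_size, nranks):
    """Hypothetical sketch: assign each 3D position to a rank by x-slab.

    Rank r owns r * box_size / nranks <= x < (r + 1) * box_size / nranks;
    positions are wrapped into the periodic box first.
    """
    buckets = [[] for _ in range(nranks)]
    for pos in positions:
        x = pos[0] % box_size
        rank = min(int(x / box_size * nranks), nranks - 1)
        buckets[rank].append(pos)
    return buckets


pos = [(0.5, 1.0, 2.0), (9.9, 0.0, 0.0), (5.0, 5.0, 5.0), (2.4, 3.0, 1.0)]
for rank, items in enumerate(decompose_slabs(pos, box_size=10.0, nranks=4)):
    print(rank, items)
```

In the real method each bucket would be sent to its rank over MPI, together with the requested columns of the same objects.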
- view(type=None)¶
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new, empty instance of type type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.
- class nbodykit.source.catalog.UniformCatalog(nbar, BoxSize, seed=None, dtype='f8', comm=None)[source]¶
A CatalogSource that has uniformly distributed Position and Velocity columns.
The random numbers generated do not depend on the number of available ranks.
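A serial sketch of the two columns described above, assuming a fixed particle count n and Python's random module as a stand-in for nbodykit's MPIRandomState. Each Position component is drawn uniformly from [0, BoxSize) and each Velocity component from [0, 0.01 * BoxSize). How nbodykit turns nbar into a particle count is not shown; for a cubic box the expected count is nbar * BoxSize**3. `uniform_particles` is a hypothetical helper.

```python
import random


def uniform_particles(n, box_size, seed=None):
    """Hypothetical sketch: n uniform positions and velocities in a cubic box.

    Positions lie in [0, box_size) and velocities in [0, 0.01 * box_size)
    along each axis, mirroring the UniformCatalog column descriptions.
    """
    rng = random.Random(seed)  # stand-in for MPIRandomState (serial only)
    position = [[box_size * rng.random() for _ in range(3)] for _ in range(n)]
    velocity = [[0.01 * box_size * rng.random() for _ in range(3)] for _ in range(n)]
    return position, velocity


pos, vel = uniform_particles(1000, box_size=100.0, seed=42)
print(len(pos), max(c for p in pos for c in p) < 100.0)  # 1000 True
```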
- Parameters
- Attributes
Index
The attribute giving the global index rank of each particle in the list.
attrs
A dictionary storing relevant meta-data about the CatalogSource.
columns
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize
The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns
A list of the hard-coded columns in the CatalogSource.
rng
An MPIRandomState that behaves as numpy.random.RandomState but generates random numbers in a manner independent of the number of ranks.
size
The number of objects in the CatalogSource on the local rank.
Methods
Position()
The position of particles, uniformly distributed in BoxSize.
Selection()
A boolean column that selects a subset slice of the CatalogSource.
Value()
When interpolating a CatalogSource onto a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Velocity()
The velocity of particles, uniformly distributed in 0.01 x BoxSize.
Weight()
The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs)
Our version of dask.compute() that computes multiple delayed dask collections at once.
copy()
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
get_hardcolumn(col)
Construct and return a hard-coded column.
gslice(start, stop[, end, redistribute])
Execute a global slice of a CatalogSource.
make_column(array)
Utility function to convert an array-like object to a dask.array.Array.
persist([columns])
Return a CatalogSource in which the selected columns are computed and persisted in memory.
read(columns)
Return the requested columns as dask arrays.
save(output[, columns, dataset, datasets, ...])
Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols])
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, ...])
Convert the CatalogSource to a MeshSource, using the specified parameters.
to_subvolumes([domain, position, columns])
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object.
view([type])
Return a "view" of the CatalogSource object, with the returned type set by type.
create_instance
- property Index¶
The attribute giving the global index rank of each particle in the list. It is an integer in the range [0, self.csize), i.e., from 0 to self.csize - 1.
Note that slicing changes this index value.
- Selection()¶
A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True for all particles, and all CatalogSource objects will contain this column.
- Value()¶
When interpolating a CatalogSource onto a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- Weight()¶
The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
- __delitem__(col)¶
Delete a column; cannot delete a “hard-coded” column.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
- __finalize__(other)¶
Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to copy over only attributes that are similar to meta-data, so some of the core attributes of the CatalogSource object are not copied.
- Parameters
other – the second object to copy attributes from; it must be a subclass of CatalogSourceBase for attributes to be copied
- Returns
self, with the added attributes
- Return type
CatalogSource
- __getitem__(sel)¶
The following types of indexing are supported:
strings specifying a column in the CatalogSource; returns a dask array holding the column data
boolean arrays specifying a slice of the CatalogSource; returns a CatalogSource holding only the relevant slice
slice object specifying which particles to select
list of strings specifying column names; returns a CatalogSource holding only the selected columns
Notes
Slicing is a collective operation.
If the base attribute is set, columns will be returned from base instead of from self.
- __len__()¶
The local size of the CatalogSource on a given rank.
- __setitem__(col, value)¶
Add a column to the CatalogSource, overriding any existing column with the name col.
- property attrs¶
A dictionary storing relevant meta-data about the CatalogSource.
- property columns¶
All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
- compute(*args, **kwargs)¶
Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
- Parameters
args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
- copy()¶
Return a shallow copy of the object, where each column is a reference to the corresponding column in self.
Note
No copy of data is made.
Note
This differs from view() in that the attributes dictionary of the copy is no longer related to that of self.
- Returns
a new CatalogSource that holds all of the data columns of self
- Return type
CatalogSource
- property csize¶
The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks.
If the base attribute is set, the base.csize attribute will be returned.
- get_hardcolumn(col)¶
Construct and return a hard-coded column.
These are usually produced by calling member functions marked by the @column decorator.
Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.
Note
If the base attribute is set, get_hardcolumn() will be called using base instead of self.
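The decorator mechanism can be sketched as follows. This is a hypothetical re-implementation for illustration only, and nbodykit's own @column decorator differs in detail: here `column` merely tags a method with an attribute, hardcolumns collects the tagged method names by introspection, and get_hardcolumn calls the matching method. `ToyCatalog` and `ToyUniform` are made-up names.

```python
def column(func):
    """Hypothetical decorator: mark a method as a hard-coded column."""
    func.is_column = True
    return func


class ToyCatalog:
    """Toy base class that discovers @column methods by introspection."""

    @property
    def hardcolumns(self):
        # every class attribute whose function was tagged by @column
        cls = type(self)
        return sorted(
            name for name in dir(cls)
            if getattr(getattr(cls, name, None), "is_column", False)
        )

    def get_hardcolumn(self, col):
        if col not in self.hardcolumns:
            raise ValueError(f"{col!r} is not a hard-coded column")
        return getattr(self, col)()  # call the decorated method


class ToyUniform(ToyCatalog):
    def __init__(self, n):
        self.n = n

    @column
    def Weight(self):
        return [1.0] * self.n

    @column
    def Selection(self):
        return [True] * self.n


cat = ToyUniform(2)
print(cat.hardcolumns)                # ['Selection', 'Weight']
print(cat.get_hardcolumn("Weight"))   # [1.0, 1.0]
```

Because discovery happens through the decorator, a subclass that wants a different source of columns (for example, a file) can override hardcolumns and get_hardcolumn instead, as the text above describes.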
- gslice(start, stop, end=1, redistribute=True)¶
Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
- Parameters
start (int) – the start index of the global slice
stop (int) – the stop index of the global slice
step (int, optional) – the step size of the global slice, passed as the third argument (named end in the signature); defaults to 1
redistribute (bool, optional) – if True, evenly redistribute the sliced data across all ranks; otherwise, return only the local data that is part of the global slice
- property hardcolumns¶
A list of the hard-coded columns in the CatalogSource.
These columns are usually member functions marked by the @column decorator. Subclasses may override this property and use get_hardcolumn() to bypass the decorator logic.
Note
If the base attribute is set, the value of base.hardcolumns will be returned.
- static make_column(array)¶
Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
- Parameters
array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object
- Returns
a dask array initialized from array
- Return type
dask.array.Array
- persist(columns=None)¶
Return a CatalogSource in which the selected columns are computed and persisted in memory.
- read(columns)¶
Return the requested columns as dask arrays.
- Parameters
columns (list of str) – the names of the requested columns
- Returns
the list of column data, in the form of dask arrays
- Return type
list of dask.array.Array
- property rng¶
An MPIRandomState that behaves as numpy.random.RandomState but generates random numbers in a manner independent of the number of ranks.
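One common way to make random numbers independent of the number of ranks is sketched below, under the assumption that the global index range is cut into fixed-size chunks with one seed per chunk. We do not claim this is exactly how MPIRandomState works. Because every value depends only on its global row index, splitting the rows differently across ranks leaves the concatenated stream unchanged. `CHUNK`, `draw` and `draw_on_ranks` are hypothetical names.

```python
import random

CHUNK = 4  # hypothetical fixed chunk size, independent of the number of ranks


def draw(seed, start, stop):
    """Uniform numbers for global rows [start, stop), independent of layout."""
    out = []
    for chunk in range(start // CHUNK, (stop + CHUNK - 1) // CHUNK):
        # one generator per chunk, seeded only from (seed, chunk id)
        rng = random.Random(f"{seed}:{chunk}")
        values = [rng.random() for _ in range(CHUNK)]
        lo = max(start, chunk * CHUNK)
        hi = min(stop, (chunk + 1) * CHUNK)
        out.extend(values[lo - chunk * CHUNK:hi - chunk * CHUNK])
    return out


def draw_on_ranks(seed, csize, nranks):
    """Simulate nranks ranks each drawing its own contiguous block of rows."""
    stream = []
    for r in range(nranks):
        start, stop = r * csize // nranks, (r + 1) * csize // nranks
        stream.extend(draw(seed, start, stop))
    return stream


print(draw_on_ranks(7, 10, 1) == draw_on_ranks(7, 10, 3))  # True
```

The price of this design is that each rank may generate a few values it then discards at chunk boundaries, in exchange for results that are reproducible regardless of how the job is launched.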
- save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶
Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of the columns are stored in their datasets.
- Parameters
output (str) – the name of the file to write to
columns (list of str) – the names of the columns to save in the file, or None to use all columns
dataset (str, optional) – the dataset to store the columns under
datasets (list of str, optional) – the names of the datasets where each column is stored; defaults to the name of the column (deprecated)
header (str, optional, or None) – the name of the dataset holding the header information, where attrs is stored; if header is None, the header is not saved
compute (boolean, default True) – if True, wait until the store operations finish; if False, return a dictionary mapping each column name to a future object for its store operation, and use dask.compute() on the result to wait for the store operations
- property size¶
The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
- sort(keys, reverse=False, usecols=None)¶
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
The sort columns must be of floating-point or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
- Parameters
keys (list, tuple) – the names of the columns to sort by. If multiple columns are provided, the data is sorted consecutively in the order provided
reverse (bool, optional) – if True, perform a descending sort
usecols (list, optional) – the names of the columns to include in the returned CatalogSource
- to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶
Convert the CatalogSource to a MeshSource, using the specified parameters.
- Parameters
Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
dtype (string, optional) – the data type of the mesh array
interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
compensated (bool, optional) – whether to correct for the resampler window introduced by the grid interpolation scheme
resampler (str, optional) – the string specifying which resampler interpolation scheme to use; see pmesh.resampler.methods
weight (str, optional) – the name of the column specifying the weight for each particle
value (str, optional) – the name of the column specifying the field value for each particle
selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
window (str, deprecated) – use resampler instead.
- Returns
mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
- Return type
CatalogMesh
- to_subvolumes(domain=None, position='Position', columns=None)¶
Domain-decompose a catalog, sending items to the ranks according to the supplied domain object, using the position column as the Position.
This will read in the full position array and all of the requested columns.
- Parameters
domain (pmesh.domain.GridND object, or None) – the domain used to distribute the catalog. If None, try to divide the catalog evenly in space. The easiest way to obtain a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.
position (string_like) – the column to use to compute the position.
columns (list of string_like) – the columns to include in the new catalog; if not supplied, all columns will be exchanged.
- Returns
A decomposed catalog source, where each rank contains only the objects that belong to it, as claimed by the domain object.
self.attrs are carried over as a shallow copy to the returned object.
- Return type
CatalogSource
- view(type=None)¶
Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new, empty instance of type type and attaches attributes to it via the __finalize__() mechanism.
- Parameters
type (Python type) – the desired class type of the returned object.