Classes
HaloCatalog (source, cosmo, redshift[, mdef, …]) |
A CatalogSource of objects that represent halos, which can be populated using analytic models from halotools . |
PopulatedHaloCatalog (data, model, cosmo[, comm]) |
A CatalogSource to represent a set of objects populated into a HaloCatalog . |
nbodykit.source.catalog.halos.
HaloCatalog
(source, cosmo, redshift, mdef='vir', mass='Mass', position='Position', velocity='Velocity')[source]¶A CatalogSource of objects that represent halos, which can be populated using analytic models from halotools.
Parameters: |
|
---|---|
Attributes: |
|
Methods
Concentration () |
The halo concentration, computed using nbodykit.transform.HaloConcentration() . |
Mass () |
The halo mass column, assumed to be in units of \(M_\odot/h\). |
Position () |
The halo position column, assumed to be in units of \(\mathrm{Mpc}/h\). |
Radius () |
The halo radius, computed using nbodykit.transform.HaloRadius() . |
Selection () |
A boolean column that selects a subset slice of the CatalogSource. |
Value () |
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell. |
Velocity () |
The halo velocity column, assumed to be in units of km/s. |
VelocityOffset () |
The redshift-space distance offset due to the velocity in units of distance. |
Weight () |
The column giving the weight to use for each particle on the mesh. |
compute (*args, **kwargs) |
Our version of dask.compute() that computes multiple delayed dask collections at once. |
copy () |
Return a shallow copy of the object, where each column is a reference of the corresponding column in self . |
get_hardcolumn (col) |
Construct and return a hard-coded column. |
gslice (start, stop[, end, redistribute]) |
Execute a global slice of a CatalogSource. |
make_column (array) |
Utility function to convert an array-like object to a dask.array.Array . |
persist ([columns]) |
Return a CatalogSource, where the selected columns are computed and persist in memory. |
populate (model[, BoxSize, seed]) |
Populate the HaloCatalog using a halotools model. |
read (columns) |
Return the requested columns as dask arrays. |
save (output[, columns, dataset, datasets, …]) |
Save the CatalogSource to a bigfile.BigFile . |
sort (keys[, reverse, usecols]) |
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys. |
to_halotools ([BoxSize]) |
Return the HaloCatalog as a halotools.sim_manager.UserSuppliedHaloCatalog . |
to_mesh ([Nmesh, BoxSize, dtype, interlaced, …]) |
Convert the CatalogSource to a MeshSource, using the specified parameters. |
to_subvolumes ([domain, position, columns]) |
Domain Decompose a catalog, sending items to the ranks according to the supplied domain object. |
view ([type]) |
Return a “view” of the CatalogSource object, with the returned type set by type . |
create_instance |
Concentration
()[source]¶The halo concentration, computed using nbodykit.transform.HaloConcentration().
This uses the analytic formulas for concentration from Dutton and Maccio 2014.
Users can override this column to implement custom mass-concentration relations.
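As a rough illustration of the kind of relation involved, the sketch below evaluates a Dutton and Maccio (2014)-style power law, \(\log_{10} c = a + b \log_{10}(M / 10^{12} M_\odot/h)\), in plain Python. The coefficients are the virial-mass fit as commonly quoted and should be treated as an assumption to check against the paper; the function name is hypothetical and this is not the nbodykit implementation.

```python
import math

def dutton_maccio14_vir(mass, redshift):
    """Concentration for a virial mass in Msun/h (coefficients are assumed)."""
    # Redshift-dependent normalization and slope of the power law.
    a = 0.537 + (1.025 - 0.537) * math.exp(-0.718 * redshift ** 1.08)
    b = -0.097 + 0.024 * redshift
    return 10.0 ** (a + b * math.log10(mass / 1e12))

c_pivot = dutton_maccio14_vir(1e12, 0.0)    # pivot mass at z = 0
c_cluster = dutton_maccio14_vir(1e14, 0.0)  # a more massive halo at z = 0
```

With these coefficients the slope is negative, so more massive halos come out less concentrated. A custom relation would replace the body of such a function with a different fit.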
Index
¶The attribute giving the global index rank of each particle in the list. It is an integer from 0 to self.csize.
Note that slicing changes this index value.
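A minimal sketch of how such a global index can be built from per-rank sizes, assuming each rank's objects are numbered contiguously after those of all lower ranks. The helper name and the list of sizes are hypothetical, and no MPI is involved.

```python
from itertools import accumulate

def global_index(sizes, rank):
    """Global indices of the objects held by `rank`, given all local sizes."""
    # Exclusive prefix sum: a rank's offset is the total size of lower ranks.
    offsets = [0] + list(accumulate(sizes))[:-1]
    start = offsets[rank]
    return list(range(start, start + sizes[rank]))

sizes = [3, 2, 4]                     # local size on ranks 0, 1, 2
csize = sum(sizes)                    # collective size across all ranks
index_rank1 = global_index(sizes, 1)  # indices owned by rank 1
```

In this sketch the indices run from 0 up to, but not including, the collective size.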
Radius
()[source]¶The halo radius, computed using nbodykit.transform.HaloRadius().
Assumed units of \(\mathrm{Mpc}/h\).
Selection
()¶A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True for all particles, and all CatalogSource objects will contain this column.
Value
()¶When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
VelocityOffset
()[source]¶The redshift-space distance offset due to the velocity in units of distance. The assumed units are \(\mathrm{Mpc}/h\).
This multiplies Velocity by \(1 / (a 100 E(z)) = 1 / (a H(z)/h)\).
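The conversion can be sketched in plain Python as follows. The flat LCDM form of \(E(z)\) and the default matter density are assumptions for illustration; in practice these quantities would come from the cosmo object. The function name is hypothetical.

```python
import math

def velocity_offset(velocity_kms, redshift, omega_m=0.3):
    """Convert a velocity in km/s to a distance offset in Mpc/h."""
    a = 1.0 / (1.0 + redshift)  # scale factor
    # E(z) = H(z) / H0 for a flat LCDM cosmology (assumed here).
    E = math.sqrt(omega_m * (1.0 + redshift) ** 3 + (1.0 - omega_m))
    return velocity_kms / (a * 100.0 * E)

offset_z0 = velocity_offset(300.0, 0.0)  # a = 1 and E = 1 at z = 0
offset_z1 = velocity_offset(300.0, 1.0)
```

At z = 0 the factor reduces to 1/100, so a velocity of 300 km/s corresponds to an offset of 3 Mpc/h.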
Weight
()¶The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
__delitem__
(col)¶Delete a column; cannot delete a “hard-coded” column.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
__finalize__
(other)¶Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to only copy over attributes that are similar to meta-data, so we do not copy some of the core attributes of the CatalogSource object.
Parameters: | other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied |
---|---|
Returns: | self, with the added attributes |
Return type: | CatalogSource |
__getitem__
(sel)¶The following types of indexing are supported:
Notes
If the base attribute is set, columns will be returned from base instead of from self.
__len__
()¶The local size of the CatalogSource on a given rank.
__setitem__
(col, value)¶Add columns to the CatalogSource, overriding any existing columns with the name col.
attrs
¶A dictionary storing relevant meta-data about the CatalogSource.
columns
¶All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
compute
(*args, **kwargs)¶Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
Parameters: | args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged. |
---|
copy
()¶Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
Note
No copy of data is made.
Note
This is different from view in that the attributes dictionary of the copy is no longer related to self.
Returns: | a new CatalogSource that holds all of the data columns of self |
---|---|
Return type: | CatalogSource |
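The distinction between a shallow copy and a view can be illustrated with a toy container. TinyCatalog below is a hypothetical stand-in, not the nbodykit class: its copy() shares the column objects but gets its own attrs dictionary, while its view() is assumed to share both.

```python
class TinyCatalog:
    """Hypothetical stand-in for a catalog: named columns plus attrs."""

    def __init__(self, columns, attrs):
        self.columns = columns
        self.attrs = attrs

    def copy(self):
        # New column mapping, same column objects; independent attrs dict.
        return TinyCatalog(dict(self.columns), dict(self.attrs))

    def view(self):
        # Same column objects and the very same attrs dict.
        return TinyCatalog(dict(self.columns), self.attrs)

cat = TinyCatalog({"Mass": [1e12, 5e13]}, {"BoxSize": 250.0})
copied, viewed = cat.copy(), cat.view()

cat.attrs["BoxSize"] = 500.0  # visible through the view, not the copy
```

Because no column data is duplicated, the copy is cheap; only the small attrs dictionary is detached from the original.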
csize
¶The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks.
If the base attribute is set, the base.csize attribute will be returned.
get_hardcolumn
(col)¶Construct and return a hard-coded column.
These are usually produced by calling member functions marked by the @column decorator.
Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.
Note
If the base attribute is set, get_hardcolumn() will be called using base instead of self.
gslice
(start, stop, end=1, redistribute=True)¶Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
Parameters: |
|
---|
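The behavior described in the notes above can be mimicked without MPI. In the sketch below, each inner list stands for the data held by one rank; the function name, the step argument, and the even-split rule are assumptions for illustration, not the library's algorithm.

```python
def global_slice(per_rank, start, stop, step=1):
    """Slice the concatenated data, then spread it evenly over the ranks."""
    nranks = len(per_rank)
    # Build the global view of the data, in rank order.
    flat = [item for local in per_rank for item in local]
    selected = flat[start:stop:step]
    # Even scatter: the first `extra` ranks receive one additional item.
    base, extra = divmod(len(selected), nranks)
    out, pos = [], 0
    for rank in range(nranks):
        n = base + (1 if rank < extra else 0)
        out.append(selected[pos:pos + n])
        pos += n
    return out

per_rank = [[0, 1, 2, 3, 4, 5], [6], [7, 8, 9]]  # uneven input
result = global_slice(per_rank, 1, 9)
```

The input is deliberately unbalanced: after the slice, every rank holds either two or three items regardless of how the data started out.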
hardcolumns
¶A list of the hard-coded columns in the CatalogSource.
These columns are usually member functions marked by the @column decorator. Subclasses may override this method and use get_hardcolumn() to bypass the decorator logic.
Note
If the base attribute is set, the value of base.hardcolumns will be returned.
make_column
(array)¶Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
Parameters: | array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object |
---|---|
Returns: | a dask array initialized from array |
Return type: | dask.array.Array |
persist
(columns=None)¶Return a CatalogSource, where the selected columns are computed and persist in memory.
populate
(model, BoxSize=None, seed=None, **params)[source]¶Populate the HaloCatalog using a halotools model.
The model can be a built-in model from nbodykit.hod (which will be converted to a Halotools model) or directly a Halotools model instance.
This assumes that this is the first time this catalog has been populated with the input model. To re-populate using the same model (but different parameters), call the repopulate() function of the returned PopulatedHaloCatalog.
Parameters: |
|
---|---|
Returns: | cat – the catalog object storing information about the populated objects |
Return type: |
Examples
Initialize a demo halo catalog:
>>> from nbodykit.tutorials import DemoHaloCatalog
>>> cat = DemoHaloCatalog('bolshoi', 'rockstar', 0.5)
Populate with the built-in Zheng07 model:
>>> from nbodykit.hod import Zheng07Model
>>> galcat = cat.populate(Zheng07Model, seed=42)
And then re-populate the galaxy catalog with new parameters:
>>> galcat.repopulate(alpha=0.9, logMmin=13.5, seed=42)
read
(columns)¶Return the requested columns as dask arrays.
Parameters: | columns (list of str) – the names of the requested columns |
---|---|
Returns: | the list of column data, in the form of dask arrays |
Return type: | list of dask.array.Array |
save
(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of columns are stored in the datasets.
Parameters: |
|
---|
size
¶The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
sort
(keys, reverse=False, usecols=None)¶Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be floating or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
Parameters: |
|
---|
to_halotools
(BoxSize=None)[source]¶Return the HaloCatalog as a halotools.sim_manager.UserSuppliedHaloCatalog.
The Halotools catalog only holds the local data, although halos are labeled via the halo_id column using the global index.
Parameters: | BoxSize (float, array_like, optional) – the size of the box; must be supplied if ‘BoxSize’ is not in the attrs dict |
---|---|
Returns: | cat – the Halotools halo catalog, storing the local halo data |
Return type: | halotools.sim_manager.UserSuppliedHaloCatalog |
to_mesh
(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶Convert the CatalogSource to a MeshSource, using the specified parameters.
Parameters: |
|
---|---|
Returns: | mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh |
Return type: |
to_subvolumes
(domain=None, position='Position', columns=None)¶Domain-decompose a catalog, sending items to the ranks according to the supplied domain object and using the position column as the Position.
This will read in the full position array and all of the requested columns.
Parameters: |
|
---|---|
Returns: | A decomposed catalog source, where each rank contains only the objects that belong to it, as claimed by the domain object. self.attrs are carried over as a shallow copy to the returned object. |
Return type: |
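A minimal sketch of the idea behind domain decomposition, assuming a one-dimensional slab domain along x in a periodic box. The function and its arguments are hypothetical; the real method takes a domain object and exchanges data between MPI ranks.

```python
def decompose_slabs(positions, box_size, nranks):
    """Assign each (x, y, z) position to the rank owning its x-slab."""
    width = box_size / nranks
    owned = [[] for _ in range(nranks)]
    for pos in positions:
        x = pos[0] % box_size  # wrap into the periodic box
        # Clamp the slab number to guard against round-off at the upper edge.
        rank = min(int(x // width), nranks - 1)
        owned[rank].append(pos)
    return owned

positions = [(10.0, 5.0, 5.0), (60.0, 1.0, 2.0), (99.0, 0.0, 0.0), (-5.0, 3.0, 3.0)]
owned = decompose_slabs(positions, box_size=100.0, nranks=4)
```

Each rank ends up with only the objects inside its own slab; the object at x = -5 wraps around to the last slab.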
view
(type=None)¶Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new empty class of type type and attaches attributes to it via the __finalize__() mechanism.
Parameters: | type (Python type) – the desired class type of the returned object. |
---|
nbodykit.source.catalog.halos.
PopulatedHaloCatalog
(data, model, cosmo, comm=None)[source]¶A CatalogSource to represent a set of objects populated into a HaloCatalog.
Note
Users should not access this class directly, but rather call HaloCatalog.populate() to generate a PopulatedHaloCatalog.
Parameters: |
|
---|---|
Attributes: |
|
Methods
Selection () |
A boolean column that selects a subset slice of the CatalogSource. |
Value () |
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell. |
Weight () |
The column giving the weight to use for each particle on the mesh. |
compute (*args, **kwargs) |
Our version of dask.compute() that computes multiple delayed dask collections at once. |
copy () |
Return a shallow copy of the object, where each column is a reference of the corresponding column in self . |
get_hardcolumn (col) |
Return a column from the underlying data array/dict. |
gslice (start, stop[, end, redistribute]) |
Execute a global slice of a CatalogSource. |
make_column (array) |
Utility function to convert an array-like object to a dask.array.Array . |
persist ([columns]) |
Return a CatalogSource, where the selected columns are computed and persist in memory. |
read (columns) |
Return the requested columns as dask arrays. |
repopulate ([seed]) |
Re-populate the catalog in-place, using the specified seed or model parameters. |
save (output[, columns, dataset, datasets, …]) |
Save the CatalogSource to a bigfile.BigFile . |
sort (keys[, reverse, usecols]) |
Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys. |
to_mesh ([Nmesh, BoxSize, dtype, interlaced, …]) |
Convert the CatalogSource to a MeshSource, using the specified parameters. |
to_subvolumes ([domain, position, columns]) |
Domain Decompose a catalog, sending items to the ranks according to the supplied domain object. |
view ([type]) |
Return a “view” of the CatalogSource object, with the returned type set by type . |
create_instance |
Index
¶The attribute giving the global index rank of each particle in the list. It is an integer from 0 to self.csize.
Note that slicing changes this index value.
Selection
()¶A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True for all particles, and all CatalogSource objects will contain this column.
Value
()¶When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
Weight
()¶The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value, with the weights given by Weight.
By default, this array is set to unity for all particles, and all CatalogSource objects will contain this column.
__delitem__
(col)¶Delete a column; cannot delete a “hard-coded” column.
Note
If the base attribute is set, columns will be deleted from base instead of from self.
__finalize__
(other)¶Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.
The idea here is to only copy over attributes that are similar to meta-data, so we do not copy some of the core attributes of the CatalogSource object.
Parameters: | other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied |
---|---|
Returns: | self, with the added attributes |
Return type: | CatalogSource |
__getitem__
(sel)¶The following types of indexing are supported:
Notes
If the base attribute is set, columns will be returned from base instead of from self.
__len__
()¶The local size of the CatalogSource on a given rank.
__setitem__
(col, value)¶Add columns to the CatalogSource, overriding any existing columns with the name col.
attrs
¶A dictionary storing relevant meta-data about the CatalogSource.
columns
¶All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
Note
If the base attribute is set, the value of base.columns will be returned.
compute
(*args, **kwargs)¶Our version of dask.compute() that computes multiple delayed dask collections at once.
This should be called on the return value of read() to convert any dask arrays to numpy arrays.
Note
If the base attribute is set, compute() will be called using base instead of self.
Parameters: | args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged. |
---|
copy
()¶Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
Note
No copy of data is made.
Note
This is different from view in that the attributes dictionary of the copy is no longer related to self.
Returns: | a new CatalogSource that holds all of the data columns of self |
---|---|
Return type: | CatalogSource |
csize
¶The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size across all available ranks.
If the base attribute is set, the base.csize attribute will be returned.
get_hardcolumn
(col)¶Return a column from the underlying data array/dict.
Columns are returned as dask arrays.
gslice
(start, stop, end=1, redistribute=True)¶Execute a global slice of a CatalogSource.
Note
After the global slice is performed, the data is scattered evenly across all ranks.
Note
The current algorithm generates an index on the root rank and does not scale well.
Parameters: |
|
---|
hardcolumns
¶The union of the columns in the file and any transformed columns.
make_column
(array)¶Utility function to convert an array-like object to a dask.array.Array.
Note
The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.
Parameters: | array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object |
---|---|
Returns: | a dask array initialized from array |
Return type: | dask.array.Array |
persist
(columns=None)¶Return a CatalogSource, where the selected columns are computed and persist in memory.
read
(columns)¶Return the requested columns as dask arrays.
Parameters: | columns (list of str) – the names of the requested columns |
---|---|
Returns: | the list of column data, in the form of dask arrays |
Return type: | list of dask.array.Array |
repopulate
(seed=None, **params)[source]¶Re-populate the catalog in-place, using the specified seed or model parameters.
This re-uses the model that was last used to create this catalog.
It is faster than HaloCatalog.populate() as it avoids initialization steps. It is intended to be used when looping over different parameter sets, e.g., when performing parameter optimization.
Note
This operation is performed in-place.
Parameters: |
|
---|
save
(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)¶Save the CatalogSource to a bigfile.BigFile.
Only the selected columns are saved, and attrs are saved in header. The attrs of columns are stored in the datasets.
Parameters: |
|
---|
size
¶The number of objects in the CatalogSource on the local rank.
If the base attribute is set, the base.size attribute will be returned.
Important
This property must be defined for all subclasses.
sort
(keys, reverse=False, usecols=None)¶Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
Sort columns must be floating or integer type.
Note
After the sort operation, the data is scattered evenly across all ranks.
Parameters: |
|
---|
to_mesh
(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)¶Convert the CatalogSource to a MeshSource, using the specified parameters.
Parameters: |
|
---|---|
Returns: | mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh |
Return type: |
to_subvolumes
(domain=None, position='Position', columns=None)¶Domain-decompose a catalog, sending items to the ranks according to the supplied domain object and using the position column as the Position.
This will read in the full position array and all of the requested columns.
Parameters: |
|
---|---|
Returns: | A decomposed catalog source, where each rank contains only the objects that belong to it, as claimed by the domain object. self.attrs are carried over as a shallow copy to the returned object. |
Return type: |
view
(type=None)¶Return a “view” of the CatalogSource object, with the returned type set by type.
This initializes a new empty class of type type and attaches attributes to it via the __finalize__() mechanism.
Parameters: | type (Python type) – the desired class type of the returned object. |
---|