nbodykit.source.catalog.hod module

class nbodykit.source.catalog.hod.HODBase(halos, seed=None, use_cache=False, comm=None, **params)[source]

Bases: nbodykit.source.catalog.array.ArrayCatalog

A base class to be used for HOD population of a halo catalog.

The user must supply the __makemodel__() function, which returns the halotools composite HOD model.

This abstraction allows the user to potentially implement several different types of HOD models quickly, while using the population framework of this base class.

Attributes

attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Position() Galaxy positions, in units of Mpc/h
Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Velocity() Galaxy velocity, in units of km/s
VelocityOffset() The RSD velocity offset, in units of Mpc/h
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a copy of the CatalogSource object
get_hardcolumn(col) Return a column from the underlying data array/dict.
make_column(array) Utility function to convert a numpy array to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
repopulate([seed]) Update the HOD parameters and then re-populate the mock catalog
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
update_csize() Set the collective size, csize.
Position()[source]

Galaxy positions, in units of Mpc/h

Selection()

A boolean column that selects a subset slice of the CatalogSource.

By default, this column is set to True for all particles.

Value()

When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

Velocity()[source]

Galaxy velocity, in units of km/s

VelocityOffset()[source]

The RSD velocity offset, in units of Mpc/h

This multiplies Velocity by 1 / (a*100*E(z)) = 1 / (a H(z)/h)
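
A common use of this column is to add redshift-space distortions along a chosen line of sight. A minimal sketch, assuming cat is an HOD catalog instance and using a hypothetical RSDPosition column name:

    line_of_sight = [0, 0, 1]
    cat['RSDPosition'] = cat['Position'] + cat['VelocityOffset'] * line_of_sight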

Weight()

The column giving the weight to use for each particle on the mesh.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

__delitem__(col)

Delete a column; cannot delete a “hard-coded” column

__getitem__(sel)

The following types of indexing are supported:

  1. strings specifying a column in the CatalogSource; returns a dask array holding the column data
  2. boolean arrays specifying a slice of the CatalogSource; returns a CatalogCopy holding only the relevant slice
  3. slice object specifying which particles to select
  4. list of strings specifying column names; returns a CatalogCopy holding only the selected columns
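
A minimal sketch of these four indexing modes (cat is a hypothetical CatalogSource instance; the column names and mass threshold are illustrative):

    pos  = cat['Position']                  # 1. column name -> dask array
    big  = cat[cat['halo_mvir'] > 1e13]     # 2. boolean array -> sliced catalog
    head = cat[:100]                        # 3. slice -> subset of particles
    sub  = cat[['Position', 'Velocity']]    # 4. list of names -> selected columns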
__len__()

The local size of the CatalogSource on a given rank.

__makemodel__()[source]

Abstract method to be overridden by the user; this should return the HOD model instance that will be used to do the mock population.

See the documentation for more details.

Returns:the halotools object implementing the HOD model
Return type:HodModelFactory
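A minimal sketch of a subclass, assuming the standard halotools components (Zheng07Cens, Zheng07Sats, TrivialPhaseSpace, NFWPhaseSpace); the class name is illustrative, and a real implementation would also forward the catalog’s cosmology and redshift to these components:

    from halotools.empirical_models import (HodModelFactory, Zheng07Cens, Zheng07Sats,
                                            TrivialPhaseSpace, NFWPhaseSpace)
    from nbodykit.source.catalog.hod import HODBase

    class MyHOD(HODBase):

        def __makemodel__(self):
            # occupation statistics for centrals and satellites
            cens_occ = Zheng07Cens(prim_haloprop_key='halo_mvir')
            sats_occ = Zheng07Sats(prim_haloprop_key='halo_mvir')

            # intra-halo phase-space distributions
            cens_prof = TrivialPhaseSpace()
            sats_prof = NFWPhaseSpace()

            # assemble the composite HOD model
            return HodModelFactory(centrals_occupation=cens_occ,
                                   centrals_profile=cens_prof,
                                   satellites_occupation=sats_occ,
                                   satellites_profile=sats_prof)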
__makesource__()[source]

Make the source of galaxies by performing the halo HOD population

Note

The mock population is performed only by the root rank, and the resulting catalog is then distributed evenly amongst the available ranks

__setitem__(col, value)

Add columns to the CatalogSource, overriding any existing columns with the name col.

attrs

A dictionary storing relevant meta-data about the CatalogSource.

columns

All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.

compute(*args, **kwargs)

Our version of dask.compute() that computes multiple delayed dask collections at once.

This should be called on the return value of read() to convert any dask arrays to numpy arrays.

If use_cache is True, this internally caches data, using dask’s built-in cache features.

Parameters:args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.

Notes

The default dask optimizer induces too many (unnecessary) IO calls, so we turn this feature off by default. Eventually, we will likely want our own optimizer.

copy()

Return a copy of the CatalogSource object

Returns:the new CatalogSource object holding the copied data columns
Return type:CatalogCopy
csize

The total, collective size of the CatalogSource, i.e., summed across all ranks.

It is the sum of size across all available ranks.

get_hardcolumn(col)

Return a column from the underlying data array/dict.

Columns are returned as dask arrays.

hardcolumns

The union of the columns in the file and any transformed columns.

logger = <logging.Logger object>
make_column(array)

Utility function to convert a numpy array to a dask.array.Array.
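
For example, a derived column can be added as a dask array (the Mass column name and values here are illustrative):

    import numpy
    mass = numpy.ones(cat.size)           # a per-particle numpy array
    cat['Mass'] = cat.make_column(mass)   # store it as a dask-backed column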

read(columns)

Return the requested columns as dask arrays.

Parameters:columns (list of str) – the names of the requested columns
Returns:the list of column data, in the form of dask arrays
Return type:list of dask.array.Array
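A minimal sketch of reading columns lazily and then materializing them (cat is a hypothetical catalog instance):

    pos, vel = cat.read(['Position', 'Velocity'])   # dask arrays (lazy)
    pos, vel = cat.compute(pos, vel)                # numpy arrays (evaluated)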
repopulate(seed=None, **params)[source]

Update the HOD parameters and then re-populate the mock catalog

Warning

This operation is done in-place, so the size of the Source changes

Parameters:
  • seed (int, optional) – the new seed to use when populating the mock
  • params – key/value pairs of HOD parameters to update
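
For example, to draw a new realization with updated parameters (a sketch; the parameter names follow the Zheng et al. 2007 HOD used by HODCatalog below):

    cat.repopulate(seed=42, alpha=0.9, logMmin=13.1)
    print(cat.csize)   # the collective size generally changes after repopulation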
save(output, columns, datasets=None, header='Header')

Save the CatalogSource to a bigfile.BigFile.

Only the selected columns are saved, and attrs are saved in the header. The attrs of each column are stored in its dataset.

Parameters:
  • output (str) – the name of the file to write to
  • columns (list of str) – the names of the columns to save in the file
  • datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column
  • header (str, optional) – the name of the data set holding the header information, where attrs is stored
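
For example (the file name and column choice are illustrative):

    cat.save('galaxies.bigfile', ['Position', 'Velocity', 'gal_type'])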
size
to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')

Convert the CatalogSource to a MeshSource, using the specified parameters.

Parameters:
  • Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
  • BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
  • dtype (string, optional) – the data type of the mesh array
  • interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
  • compensated (bool, optional) – whether to correct for the window introduced by the grid interpolation scheme
  • window (str, optional) – the string specifying which window interpolation scheme to use; see pmesh.window.methods
  • weight (str, optional) – the name of the column specifying the weight for each particle
  • value (str, optional) – the name of the column specifying the field value for each particle
  • selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
  • position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
Returns:mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
Return type:CatalogMesh
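
A minimal sketch, assuming BoxSize is already stored in attrs so only Nmesh must be given; the resulting mesh can then be passed to an algorithm such as FFTPower:

    from nbodykit.lab import FFTPower

    mesh = cat.to_mesh(Nmesh=256, window='tsc', compensated=True, interlaced=True)
    result = FFTPower(mesh, mode='1d')   # e.g. measure the 1d power spectrum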

update_csize()

Set the collective size, csize.

This function should be called in __init__() of a subclass, after size has been set to a valid value (not NotImplemented)

use_cache

If set to True, use the built-in caching features of dask to cache data in memory.

class nbodykit.source.catalog.hod.HODCatalog(halos, logMmin=13.031, sigma_logM=0.38, alpha=0.76, logM0=13.27, logM1=14.08, seed=None, use_cache=False, comm=None)[source]

Bases: nbodykit.source.catalog.hod.HODBase

A CatalogSource that uses the HOD prescription of Zheng et al. 2007 to populate an input halo catalog with galaxies.

The mock population is done using halotools. See the documentation for halotools.empirical_models.Zheng07Cens and halotools.empirical_models.Zheng07Sats for further details regarding the HOD.

The columns generated in this catalog are:

  1. Position: the galaxy position
  2. Velocity: the galaxy velocity
  3. VelocityOffset: the RSD velocity offset, in units of distance
  4. conc_NFWmodel: the concentration of the halo
  5. gal_type: the galaxy type, 0 for centrals and 1 for satellites
  6. halo_id: the global ID of the halo this galaxy belongs to, between 0 and csize
  7. halo_local_id: the local ID of the halo this galaxy belongs to, between 0 and size
  8. halo_mvir: the halo mass
  9. halo_nfw_conc: alias of conc_NFWmodel
  10. halo_num_centrals: the number of centrals that this halo hosts, either 0 or 1
  11. halo_num_satellites: the number of satellites that this halo hosts
  12. halo_rvir: the halo radius
  13. halo_upid: equal to -1; should be ignored by the user
  14. halo_vx, halo_vy, halo_vz: the three components of the halo velocity
  15. halo_x, halo_y, halo_z: the three components of the halo position
  16. host_centric_distance: the distance from this galaxy to the center of the halo
  17. vx, vy, vz: the three components of the galaxy velocity, equal to Velocity
  18. x,y,z: the three components of the galaxy position, equal to Position

For further details, please see the documentation.

Note

Default HOD values are from Reid et al. 2014

Parameters:
  • halos (UserSuppliedHaloCatalog) – the halotools table holding the halo data; this object must have the following attributes: cosmology, Lbox, redshift
  • logMmin (float, optional) – Minimum mass required for a halo to host a central galaxy
  • sigma_logM (float, optional) – Rate of transition from <Ncen>=0 to <Ncen>=1
  • alpha (float, optional) – Power law slope of the relation between halo mass and <Nsat>
  • logM0 (float, optional) – Low-mass cutoff in <Nsat>
  • logM1 (float, optional) – Characteristic halo mass where <Nsat> begins to assume a power law form
  • seed (int, optional) – the random seed to generate deterministic mocks

References

Zheng et al. (2007), arXiv:astro-ph/0703457
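
A minimal sketch of constructing the catalog, assuming halos is an nbodykit HaloCatalog whose to_halotools() method provides the required UserSuppliedHaloCatalog (the parameter values are illustrative):

    from nbodykit.lab import HODCatalog

    hod = HODCatalog(halos.to_halotools(), seed=42, logMmin=13.0, sigma_logM=0.4)

    print(hod.csize)                        # total number of galaxies across all ranks
    pos = hod.compute(hod['Position'])      # galaxy positions as a numpy array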

Attributes

attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Position() Galaxy positions, in units of Mpc/h
Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Velocity() Galaxy velocity, in units of km/s
VelocityOffset() The RSD velocity offset, in units of Mpc/h
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a copy of the CatalogSource object
get_hardcolumn(col) Return a column from the underlying data array/dict.
make_column(array) Utility function to convert a numpy array to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
repopulate([seed]) Update the HOD parameters and then re-populate the mock catalog
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
update_csize() Set the collective size, csize.
Position()

Galaxy positions, in units of Mpc/h

Selection()

A boolean column that selects a subset slice of the CatalogSource.

By default, this column is set to True for all particles.

Value()

When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

Velocity()

Galaxy velocity, in units of km/s

VelocityOffset()

The RSD velocity offset, in units of Mpc/h

This multiplies Velocity by 1 / (a*100*E(z)) = 1 / (a H(z)/h)

Weight()

The column giving the weight to use for each particle on the mesh.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

__delitem__(col)

Delete a column; cannot delete a “hard-coded” column

__getitem__(sel)

The following types of indexing are supported:

  1. strings specifying a column in the CatalogSource; returns a dask array holding the column data
  2. boolean arrays specifying a slice of the CatalogSource; returns a CatalogCopy holding only the relevant slice
  3. slice object specifying which particles to select
  4. list of strings specifying column names; returns a CatalogCopy holding only the selected columns
__len__()

The local size of the CatalogSource on a given rank.

__makemodel__()[source]

Return the Zheng 07 HOD model.

This model evaluates Eqs. 2 and 5 of Zheng et al. 2007
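
For reference, the standard forms of these occupation functions are sketched below in LaTeX notation, with M_min, sigma_logM, M_0, M_1, and alpha corresponding to the logMmin, sigma_logM, logM0, logM1, and alpha parameters above:

    \langle N_{\mathrm{cen}}(M) \rangle = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{\log M - \log M_{\mathrm{min}}}{\sigma_{\log M}}\right)\right]

    \langle N_{\mathrm{sat}}(M) \rangle = \langle N_{\mathrm{cen}}(M) \rangle \left(\frac{M - M_0}{M_1}\right)^{\alpha}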

__makesource__()

Make the source of galaxies by performing the halo HOD population

Note

The mock population is performed only by the root rank, and the resulting catalog is then distributed evenly amongst the available ranks

__setitem__(col, value)

Add columns to the CatalogSource, overriding any existing columns with the name col.

attrs

A dictionary storing relevant meta-data about the CatalogSource.

columns

All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.

compute(*args, **kwargs)

Our version of dask.compute() that computes multiple delayed dask collections at once.

This should be called on the return value of read() to convert any dask arrays to numpy arrays.

If use_cache is True, this internally caches data, using dask’s built-in cache features.

Parameters:args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.

Notes

The default dask optimizer induces too many (unnecessary) IO calls, so we turn this feature off by default. Eventually, we will likely want our own optimizer.

copy()

Return a copy of the CatalogSource object

Returns:the new CatalogSource object holding the copied data columns
Return type:CatalogCopy
csize

The total, collective size of the CatalogSource, i.e., summed across all ranks.

It is the sum of size across all available ranks.

get_hardcolumn(col)

Return a column from the underlying data array/dict.

Columns are returned as dask arrays.

hardcolumns

The union of the columns in the file and any transformed columns.

logger = <logging.Logger object>
make_column(array)

Utility function to convert a numpy array to a dask.array.Array.

read(columns)

Return the requested columns as dask arrays.

Parameters:columns (list of str) – the names of the requested columns
Returns:the list of column data, in the form of dask arrays
Return type:list of dask.array.Array
repopulate(seed=None, **params)

Update the HOD parameters and then re-populate the mock catalog

Warning

This operation is done in-place, so the size of the Source changes

Parameters:
  • seed (int, optional) – the new seed to use when populating the mock
  • params – key/value pairs of HOD parameters to update
save(output, columns, datasets=None, header='Header')

Save the CatalogSource to a bigfile.BigFile.

Only the selected columns are saved, and attrs are saved in the header. The attrs of each column are stored in its dataset.

Parameters:
  • output (str) – the name of the file to write to
  • columns (list of str) – the names of the columns to save in the file
  • datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column
  • header (str, optional) – the name of the data set holding the header information, where attrs is stored
size
to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')

Convert the CatalogSource to a MeshSource, using the specified parameters.

Parameters:
  • Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
  • BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
  • dtype (string, optional) – the data type of the mesh array
  • interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
  • compensated (bool, optional) – whether to correct for the window introduced by the grid interpolation scheme
  • window (str, optional) – the string specifying which window interpolation scheme to use; see pmesh.window.methods
  • weight (str, optional) – the name of the column specifying the weight for each particle
  • value (str, optional) – the name of the column specifying the field value for each particle
  • selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
  • position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
Returns:mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh
Return type:CatalogMesh

update_csize()

Set the collective size, csize.

This function should be called in __init__() of a subclass, after size has been set to a valid value (not NotImplemented)

use_cache

If set to True, use the built-in caching features of dask to cache data in memory.

nbodykit.source.catalog.hod.find_object_dtypes(data)[source]

Utility function to convert ‘O’ data types to strings

nbodykit.source.catalog.hod.gal_type_integers(galtab)[source]

Convert centrals to 0 and satellites to 1 in the input galaxy Table

This operation is done in place.