nbodykit includes several methods for generating mock catalogs, with varying
levels of sophistication. These CatalogSource
objects allow users to create catalogs of objects at run time and include:
nbodykit includes two subclasses of
CatalogSource
that
generate particles randomly in a box: RandomCatalog
and UniformCatalog
. While these catalogs do not produce
realistic cosmological distributions of objects, they are especially useful
for generating catalogs quickly and for testing purposes.
RandomCatalog
¶The RandomCatalog
class includes a random number generator
with the functionality of numpy.random.RandomState
that generates
random numbers in parallel and in a manner that is independent of the number
of MPI ranks being used. This property is especially useful for running
reproducible tests where the number of CPUs might vary. The random number
generator is stored as the rng
attribute.
Users can use this random number generator to add columns to the catalog,
using the syntax to add columns.
For example,
In [1]:
from nbodykit.lab import RandomCatalog
import numpy
# initialize a catalog with only the default columns
cat = RandomCatalog(csize=100) # collective size of 100
print("columns = ", cat.columns) # only the default columns present
# add mass uniformly distributed in log10
cat['Mass'] = 10**(cat.rng.uniform(12, 15, size=cat.size))
# add normally distributed velocity
cat['Velocity'] = cat.rng.normal(loc=0, scale=500, size=cat.size)
print(cat.columns)
columns = ['Selection', 'Value', 'Weight']
['Mass', 'Selection', 'Value', 'Velocity', 'Weight']
Caveats
rng
attribute, please see the API documentation for MPIRandomState
.size
attribute. Most functions
of the rng
attribute accept the size
keyword to generate an array of the correct size.UniformCatalog
¶The UniformCatalog
is a subclass of
RandomCatalog
that includes Position
and Velocity
columns that are uniformly distributed. The positions of the particles are
uniformly distributed between zero and the size of the box (as specified by
the user), with an input number density. The velocities are also
uniformly distributed but on a scale that is 1% of the size of the box.
For example,
In [2]:
from nbodykit.lab import UniformCatalog
cat = UniformCatalog(nbar=100, BoxSize=1.0, seed=42)
print("columns = ", cat.columns)
# min must be greater than 0
print("position minimum = ", cat.compute(cat['Position'].min()))
# max must be less than 1.0
print("position maximum = ", cat.compute(cat['Position'].max()))
# min must be greater than 0
print("velocity minimum = ", cat.compute(cat['Velocity'].min()))
# max must be less than 0.01
print("velocity maximum = ", cat.compute(cat['Velocity'].max()))
columns = ['Position', 'Selection', 'Value', 'Velocity', 'Weight']
position minimum = 0.000156849663905
position maximum = 0.99925086034
velocity minimum = 1.56849663905e-06
velocity maximum = 0.0099925086034
Note that because UniformCatalog
is a subclass of
RandomCatalog
users can also use the
rng
attribute to add new columns to
a UniformCatalog
object.
The LogNormalCatalog
offers a more realistic
approximation of cosmological large-scale structure. The class
generates a set of objects by Poisson sampling a log-normal density field,
using the Zel’dovich approximation to model non-linear evolution.
Given a linear power spectrum function, redshift, and linear bias
supplied by the user, this class performs the following steps:
Generate Gaussian initial conditions
First, a Gaussian overdensity field \(\delta_L(\vk)\) is generated in Fourier space with a power spectrum given by a function specified by the user. We also generate linear velocity fields from the overdensity field in Fourier space, using
where \(f\) is the logarithmic growth rate, \(a\) is the scale factor, and \(H\) is the Hubble parameter at \(a\). Note that bold variables reflect vector quantities.
Finally, we Fourier transform \(\delta_L(\vk)\) and \(\vv(\vk)\) to configuration space. These fields serve as the “initial conditions”. They will be evolved forward in time and Poisson sampled to create the final catalog.
Perform the log-normal transformation
Next, we perform a log-normal transformation on the density field \(\delta\). As first discussed in Coles and Jones 1991, the distribution of galaxies on intermediate to large scales can be well-approximated by a log-normal distribution. An additional useful property of a log-normal field is that it follows the natural constraint \(\delta(\vx) \ge -1\) of density contrasts by definition. This property does not generally hold true for Gaussian realizations of the overdensity field.
The new, transformed density field is given by
where the normalization factor \(\sigma^2\) ensures that the mean of
the \(\delta(\vx)\) vanishes and \(b_L\) is the Lagrangian bias
factor, which is related to the final, linear bias as \(b_L = b_1 - 1\).
Here, \(b_1\) is the value input by the user as the bias
keyword
to the LogNormalCatalog
class.
Poisson sample the density field
We then generate discrete positions of objects by Poisson sampling the
overdensity field in each cell of the mesh. We assign each object the
velocity of the mesh cell that it is located in, and objects are placed randomly
inside their cells. The desired number density of objects in the box is
specified by the user as the nbar
parameter to the
LogNormalCatalog
class.
Apply the Zel’dovich approximation
Finally, we evolve the overdensity field according to the Zel’dovich approximation, which is 1st order Lagrangian perturbation theory. To do this, we move the positions of each object according to the linear velocity field,
where \(\vr(\vx)\) is the final position of the objects, \(\vr_0(\vx)\) is the initial position, and \(\vv(\vx)\) is the velocity assigned in the previous step.
After this step, we have a catalog of discrete objects, with a Position
column in units of \(\mathrm{Mpc}/h\), a Velocity
columns in units
of km/s, and a VelocityOffset
column in units of \(\mathrm{Mpc}/h\).
The VelocityOffset
is a convenience function for adding redshift-space
distortions (see Adding Redshift-space Distortions), such that RSD can be added using:
line_of_sight = [0,0,1]
src['Position'] = src['Position'] + src['VelocityOffset'] * line_of_sight
Note
For examples using log-normal mocks, see the Log-normal Mocks recipe in The Cookbook.
nbodykit includes functionality to generate mock galaxy catalogs using the
Halo Occupation Distribution (HOD) technique via the HODCatalog
class. The HOD technique populates a catalog of halos with galaxies based on
a functional form for the probability that a halo of mass \(M\) hosts
\(N\) objects, \(P(N|M)\). The functional form of the HOD used by
HODCatalog
is the form used in Zheng et al 2007.
The average number of galaxies in a halo of mass \(M\) is
where the occupation functions for centrals and satellites are given by
This HOD parametrization has 5 parameters, which can be summarized as:
Parameter | Name | Default | Description |
\(\log_{10}M_\mathrm{min}\) | logMmin |
13.031 | Minimum mass required for a halo to host a central galaxy |
\(\sigma_{\log_{10}M}\) | sigma_logM |
0.38 | Rate of transition from \(N_\mathrm{cen}=0\) to \(N_\mathrm{cen}=1\) |
\(\alpha\) | alpha |
0.76 | Power law slope of the relation between halo mass and \(N_\mathrm{sat}\) |
\(\log_{10}M_0\) | logM0 |
13.27 | Low-mass cutoff in \(N_\mathrm{sat}\) |
\(\log_{10}M_1\) | logM1 |
14.08 | Characteristic halo mass where \(N_\mathrm{sat}\) begins to assume a power law form |
The default values of the HOD parameters are taken from Reid et al. 2014.
This form of the HOD clustering description assumes the galaxy – halo connection depends only on the halo mass. Thus, given a catalog of halo objects, with associated mass values, users can quickly generate realistic galaxy catalogs using this class.
halotools
via a HaloCatalog
¶Internally, the HODCatalog
class uses the halotools
package to perform the halo population step. For further details,
see the documentation for halotools.empirical_models.Zheng07Cens
and
halotools.empirical_models.Zheng07Sats
, as well as
this tutorial on the Zheng 07 HOD model.
The catalog of halos input to the HODCatalog
class must be of type
halotools.sim_manager.UserSuppliedHaloCatalog
, the tabular
data format preferred by halotools
. nbodykit includes the
HaloCatalog
class in order to interface nicely with
halotools
. In particular, this catalog object
includes a to_halotools()
function to create a
UserSuppliedHaloCatalog
from the data columns
in the HaloCatalog
object.
Given a CatalogSource
object, the HaloCatalog
object
interprets the objects as halos, using a specified redshift, cosmology,
and mass definition, to add several analytic columns to the catalog, including
Radius()
and Concentration()
.
For example, below we generate uniform particles in a box and then interpret them as halos by specifying a redshift and cosmology:
In [3]:
from nbodykit.lab import HaloCatalog, cosmology
# uniform objects in a box
cat = UniformCatalog(nbar=100, BoxSize=1.0, seed=42)
# add a Mass column to the objects
cat['Mass'] = 10**(cat.rng.uniform(12, 15, size=cat.size))
# initialize the halos
halos = HaloCatalog(cat, cosmo=cosmology.Planck15, redshift=0., mdef='vir', position='Position', velocity='Velocity', mass='Mass')
print(halos.columns)
['Concentration', 'Mass', 'Position', 'Radius', 'Selection', 'Value', 'Velocity', 'VelocityOffset', 'Weight']
And using the to_halotools()
function, we can create
the halotools.sim_manager.UserSuppliedHaloCatalog
object needed
to initialize the HODCatalog
object.
In [4]:
halocat = halos.to_halotools()
print(halocat)
print(halocat.halo_table[:10])
<halotools.sim_manager.user_supplied_halo_catalog.UserSuppliedHaloCatalog object at 0x10e4f05c0>
halo_x halo_y halo_z ... halo_upid halo_local_id
-------------- -------------- --------------- ... --------- -------------
0.454701052221 0.832632032315 0.0690513408736 ... -1.0 0
0.31944725104 0.485187192976 0.298261631365 ... -1.0 1
0.212426273992 0.16674683797 0.176221306605 ... -1.0 2
0.318545243222 0.349067663211 0.99925086034 ... -1.0 3
0.506684614224 0.237059489969 0.389253212345 ... -1.0 4
0.472737136136 0.856967930036 0.973742710042 ... -1.0 5
0.459341161505 0.952380079848 0.456999189045 ... -1.0 6
0.924477185104 0.884377985707 0.0908425359374 ... -1.0 7
0.455365520004 0.808608669796 0.827823954099 ... -1.0 8
0.523110562052 0.100086947087 0.700826727159 ... -1.0 9
Caveats
HaloCatalog
are assumed to be \(\mathrm{Mpc}/h\), km/s,
and \(M_\odot/h\), respectively. These units are necessary to interface
with halotools
.HaloCatalog
can be “vir”
to use virial masses, or an overdensity factor with respect to the critical
or mean density, i.e. “200c”, “500c”, or “200m”, “500m”.FOF
, to identify halos, the user can use
the to_halos()
function
to directly produce a HaloCatalog
from the result of running
the FOF algorithm.Concentration
column of a HaloCatalog
object are generated using the input mass
definition and the analytic formulas from
Dutton and Maccio 2014. Users can
overwrite this column with their own values if they wish to use custom
concentration values when generating HOD catalogs.HODCatalog
Class¶Users can initialize the HOD catalog directly from the
UserSuppliedHaloCatalog
object and the desired
HOD parameters. The HODCatalog
object will include all of the
columns from the UserSuppliedHaloCatalog
object,
with the usual columns Position
, Velocity
, and VelocityOffset
for the generated galaxies. The additional columns are:
csize
size
conc_NFWmodel
Velocity
, in units of km/sPosition
, in units of \(\mathrm{Mpc}/h\)Below we initialize a HODCatalog
of galaxies and compute the
number of centrals and satellites in the catalog, using the gal_type
column.
In [5]:
from nbodykit.lab import HODCatalog
hod = HODCatalog(halocat, alpha=0.5, sigma_logM=0.40, seed=42)
print("total number of HOD galaxies = ", hod.csize)
print(hod.columns)
print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
total number of HOD galaxies = 127
['Position', 'Selection', 'Value', 'Velocity', 'VelocityOffset', 'Weight', 'conc_NFWmodel', 'gal_type', 'halo_id', 'halo_local_id', 'halo_mvir', 'halo_nfw_conc', 'halo_num_centrals', 'halo_num_satellites', 'halo_rvir', 'halo_upid', 'halo_vx', 'halo_vy', 'halo_vz', 'halo_x', 'halo_y', 'halo_z', 'host_centric_distance', 'vx', 'vy', 'vz', 'x', 'y', 'z']
number of centrals = 66
number of satellites = 61
Caveats
UserSuppliedHaloCatalog
table should contain
a halo_nfw_conc
column. Otherwise, the analytic prescriptions from
Dutton and Maccio 2014 are used.We can also quickly repopulate a HOD catalog in place, generating a new set of galaxies for the same set of halos, either changing the random seed or the HOD parameters. For example,
In [6]:
# repopulate, just changing the random seed
hod.repopulate(seed=84)
print("total number of HOD galaxies = ", hod.csize)
print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
# re-populate with new parameters
hod.repopulate(logM0=13.2, logM1=14.5)
print("total number of HOD galaxies = ", hod.csize)
print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
total number of HOD galaxies = 118
number of centrals = 60
number of satellites = 58
total number of HOD galaxies = 108
number of centrals = 63
number of satellites = 45
Note
For examples using HOD mocks, see the Halo Occupation Distribution (HOD) Mocks recipe in The Cookbook.
Users can implement catalogs that use custom HOD modeling by subclassing
the HODBase
class. This base class is abstract, and subclasses must
implement the __makemodel__()
function. This function
returns a HodModelFactory
object, which is the halotools
object responsible for supporting
custom HOD models. For more information on designing your own HOD model
using halotools
, see
this series of halotools tutorials.