nbodykit includes several methods for generating mock catalogs, with varying
levels of sophistication. These CatalogSource
objects allow users to create catalogs of objects at run time and include:
nbodykit includes two subclasses of
CatalogSource that
generate particles randomly in a box: RandomCatalog
and UniformCatalog. While these catalogs do not produce
realistic cosmological distributions of objects, they are especially useful
for generating catalogs quickly and for testing purposes.
RandomCatalog¶The RandomCatalog class includes a random number generator
with the functionality of numpy.random.RandomState that generates
random numbers in parallel and in a manner that is independent of the number
of MPI ranks being used. This property is especially useful for running
reproducible tests where the number of CPUs might vary. The random number
generator is stored as the rng attribute.
Users can use this random number generator to add columns to the catalog,
using the syntax to add columns.
For example,
In [1]:
from nbodykit.lab import RandomCatalog
import numpy
# initialize a catalog with only the default columns
cat = RandomCatalog(csize=100) # collective size of 100
print("columns = ", cat.columns) # only the default columns present
# add mass uniformly distributed in log10
cat['Mass'] = 10**(cat.rng.uniform(12, 15, size=cat.size))
# add normally distributed velocity
cat['Velocity'] = cat.rng.normal(loc=0, scale=500, size=cat.size)
print(cat.columns)
columns = ['Selection', 'Value', 'Weight']
['Mass', 'Selection', 'Value', 'Velocity', 'Weight']
Caveats
rng
attribute, please see the API documentation for MPIRandomState.size attribute. Most functions
of the rng attribute accept the size
keyword to generate an array of the correct size.UniformCatalog¶The UniformCatalog is a subclass of
RandomCatalog that includes Position and Velocity
columns that are uniformly distributed. The positions of the particles are
uniformly distributed between zero and the size of the box (as specified by
the user), with an input number density. The velocities are also
uniformly distributed but on a scale that is 1% of the size of the box.
For example,
In [2]:
from nbodykit.lab import UniformCatalog
cat = UniformCatalog(nbar=100, BoxSize=1.0, seed=42)
print("columns = ", cat.columns)
# min must be greater than 0
print("position minimum = ", cat.compute(cat['Position'].min()))
# max must be less than 1.0
print("position maximum = ", cat.compute(cat['Position'].max()))
# min must be greater than 0
print("velocity minimum = ", cat.compute(cat['Velocity'].min()))
# max must be less than 0.01
print("velocity maximum = ", cat.compute(cat['Velocity'].max()))
columns = ['Position', 'Selection', 'Value', 'Velocity', 'Weight']
position minimum = 0.000156849663905
position maximum = 0.99925086034
velocity minimum = 1.56849663905e-06
velocity maximum = 0.0099925086034
Note that because UniformCatalog is a subclass of
RandomCatalog users can also use the
rng attribute to add new columns to
a UniformCatalog object.
The LogNormalCatalog offers a more realistic
approximation of cosmological large-scale structure. The class
generates a set of objects by Poisson sampling a log-normal density field,
using the Zel’dovich approximation to model non-linear evolution.
Given a linear power spectrum function, redshift, and linear bias
supplied by the user, this class performs the following steps:
Generate Gaussian initial conditions
First, a Gaussian overdensity field \(\delta_L(\vk)\) is generated in Fourier space with a power spectrum given by a function specified by the user. We also generate linear velocity fields from the overdensity field in Fourier space, using
where \(f\) is the logarithmic growth rate, \(a\) is the scale factor, and \(H\) is the Hubble parameter at \(a\). Note that bold variables reflect vector quantities.
Finally, we Fourier transform \(\delta_L(\vk)\) and \(\vv(\vk)\) to configuration space. These fields serve as the “initial conditions”. They will be evolved forward in time and Poisson sampled to create the final catalog.
Perform the log-normal transformation
Next, we perform a log-normal transformation on the density field \(\delta\). As first discussed in Coles and Jones 1991, the distribution of galaxies on intermediate to large scales can be well-approximated by a log-normal distribution. An additional useful property of a log-normal field is that it follows the natural constraint \(\delta(\vx) \ge -1\) of density contrasts by definition. This property does not generally hold true for Gaussian realizations of the overdensity field.
The new, transformed density field is given by
where the normalization factor \(\sigma^2\) ensures that the mean of
the \(\delta(\vx)\) vanishes and \(b_L\) is the Lagrangian bias
factor, which is related to the final, linear bias as \(b_L = b_1 - 1\).
Here, \(b_1\) is the value input by the user as the bias keyword
to the LogNormalCatalog class.
Poisson sample the density field
We then generate discrete positions of objects by Poisson sampling the
overdensity field in each cell of the mesh. We assign each object the
velocity of the mesh cell that it is located in, and objects are placed randomly
inside their cells. The desired number density of objects in the box is
specified by the user as the nbar parameter to the
LogNormalCatalog class.
Apply the Zel’dovich approximation
Finally, we evolve the overdensity field according to the Zel’dovich approximation, which is 1st order Lagrangian perturbation theory. To do this, we move the positions of each object according to the linear velocity field,
where \(\vr(\vx)\) is the final position of the objects, \(\vr_0(\vx)\) is the initial position, and \(\vv(\vx)\) is the velocity assigned in the previous step.
After this step, we have a catalog of discrete objects, with a Position
column in units of \(\mathrm{Mpc}/h\), a Velocity columns in units
of km/s, and a VelocityOffset column in units of \(\mathrm{Mpc}/h\).
The VelocityOffset is a convenience function for adding redshift-space
distortions (see Adding Redshift-space Distortions), such that RSD can be added using:
line_of_sight = [0,0,1]
src['Position'] = src['Position'] + src['VelocityOffset'] * line_of_sight
Note
For examples using log-normal mocks, see the Log-normal Mocks recipe in The Cookbook.
nbodykit includes functionality to generate mock galaxy catalogs using the
Halo Occupation Distribution (HOD) technique via the HODCatalog
class. The HOD technique populates a catalog of halos with galaxies based on
a functional form for the probability that a halo of mass \(M\) hosts
\(N\) objects, \(P(N|M)\). The functional form of the HOD used by
HODCatalog is the form used in Zheng et al 2007.
The average number of galaxies in a halo of mass \(M\) is
where the occupation functions for centrals and satellites are given by
This HOD parametrization has 5 parameters, which can be summarized as:
| Parameter | Name | Default | Description |
| \(\log_{10}M_\mathrm{min}\) | logMmin |
13.031 | Minimum mass required for a halo to host a central galaxy |
| \(\sigma_{\log_{10}M}\) | sigma_logM |
0.38 | Rate of transition from \(N_\mathrm{cen}=0\) to \(N_\mathrm{cen}=1\) |
| \(\alpha\) | alpha |
0.76 | Power law slope of the relation between halo mass and \(N_\mathrm{sat}\) |
| \(\log_{10}M_0\) | logM0 |
13.27 | Low-mass cutoff in \(N_\mathrm{sat}\) |
| \(\log_{10}M_1\) | logM1 |
14.08 | Characteristic halo mass where \(N_\mathrm{sat}\) begins to assume a power law form |
The default values of the HOD parameters are taken from Reid et al. 2014.
This form of the HOD clustering description assumes the galaxy – halo connection depends only on the halo mass. Thus, given a catalog of halo objects, with associated mass values, users can quickly generate realistic galaxy catalogs using this class.
halotools via a HaloCatalog¶Internally, the HODCatalog class uses the halotools
package to perform the halo population step. For further details,
see the documentation for halotools.empirical_models.Zheng07Cens and
halotools.empirical_models.Zheng07Sats, as well as
this tutorial on the Zheng 07 HOD model.
The catalog of halos input to the HODCatalog class must be of type
halotools.sim_manager.UserSuppliedHaloCatalog, the tabular
data format preferred by halotools. nbodykit includes the
HaloCatalog class in order to interface nicely with
halotools. In particular, this catalog object
includes a to_halotools() function to create a
UserSuppliedHaloCatalog from the data columns
in the HaloCatalog object.
Given a CatalogSource object, the HaloCatalog object
interprets the objects as halos, using a specified redshift, cosmology,
and mass definition, to add several analytic columns to the catalog, including
Radius() and Concentration().
For example, below we generate uniform particles in a box and then interpret them as halos by specifying a redshift and cosmology:
In [3]:
from nbodykit.lab import HaloCatalog, cosmology
# uniform objects in a box
cat = UniformCatalog(nbar=100, BoxSize=1.0, seed=42)
# add a Mass column to the objects
cat['Mass'] = 10**(cat.rng.uniform(12, 15, size=cat.size))
# initialize the halos
halos = HaloCatalog(cat, cosmo=cosmology.Planck15, redshift=0., mdef='vir', position='Position', velocity='Velocity', mass='Mass')
print(halos.columns)
['Concentration', 'Mass', 'Position', 'Radius', 'Selection', 'Value', 'Velocity', 'VelocityOffset', 'Weight']
And using the to_halotools() function, we can create
the halotools.sim_manager.UserSuppliedHaloCatalog object needed
to initialize the HODCatalog object.
In [4]:
halocat = halos.to_halotools()
print(halocat)
print(halocat.halo_table[:10])
<halotools.sim_manager.user_supplied_halo_catalog.UserSuppliedHaloCatalog object at 0x10e4f05c0>
halo_x halo_y halo_z ... halo_upid halo_local_id
-------------- -------------- --------------- ... --------- -------------
0.454701052221 0.832632032315 0.0690513408736 ... -1.0 0
0.31944725104 0.485187192976 0.298261631365 ... -1.0 1
0.212426273992 0.16674683797 0.176221306605 ... -1.0 2
0.318545243222 0.349067663211 0.99925086034 ... -1.0 3
0.506684614224 0.237059489969 0.389253212345 ... -1.0 4
0.472737136136 0.856967930036 0.973742710042 ... -1.0 5
0.459341161505 0.952380079848 0.456999189045 ... -1.0 6
0.924477185104 0.884377985707 0.0908425359374 ... -1.0 7
0.455365520004 0.808608669796 0.827823954099 ... -1.0 8
0.523110562052 0.100086947087 0.700826727159 ... -1.0 9
Caveats
HaloCatalog are assumed to be \(\mathrm{Mpc}/h\), km/s,
and \(M_\odot/h\), respectively. These units are necessary to interface
with halotools.HaloCatalog can be “vir”
to use virial masses, or an overdensity factor with respect to the critical
or mean density, i.e. “200c”, “500c”, or “200m”, “500m”.FOF, to identify halos, the user can use
the to_halos() function
to directly produce a HaloCatalog from the result of running
the FOF algorithm.Concentration
column of a HaloCatalog object are generated using the input mass
definition and the analytic formulas from
Dutton and Maccio 2014. Users can
overwrite this column with their own values if they wish to use custom
concentration values when generating HOD catalogs.HODCatalog Class¶Users can initialize the HOD catalog directly from the
UserSuppliedHaloCatalog object and the desired
HOD parameters. The HODCatalog object will include all of the
columns from the UserSuppliedHaloCatalog object,
with the usual columns Position, Velocity, and VelocityOffset
for the generated galaxies. The additional columns are:
csizesizeconc_NFWmodelVelocity, in units of km/sPosition, in units of \(\mathrm{Mpc}/h\)Below we initialize a HODCatalog of galaxies and compute the
number of centrals and satellites in the catalog, using the gal_type
column.
In [5]:
from nbodykit.lab import HODCatalog
hod = HODCatalog(halocat, alpha=0.5, sigma_logM=0.40, seed=42)
print("total number of HOD galaxies = ", hod.csize)
print(hod.columns)
print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
total number of HOD galaxies = 127
['Position', 'Selection', 'Value', 'Velocity', 'VelocityOffset', 'Weight', 'conc_NFWmodel', 'gal_type', 'halo_id', 'halo_local_id', 'halo_mvir', 'halo_nfw_conc', 'halo_num_centrals', 'halo_num_satellites', 'halo_rvir', 'halo_upid', 'halo_vx', 'halo_vy', 'halo_vz', 'halo_x', 'halo_y', 'halo_z', 'host_centric_distance', 'vx', 'vy', 'vz', 'x', 'y', 'z']
number of centrals = 66
number of satellites = 61
Caveats
UserSuppliedHaloCatalog table should contain
a halo_nfw_conc column. Otherwise, the analytic prescriptions from
Dutton and Maccio 2014 are used.We can also quickly repopulate a HOD catalog in place, generating a new set of galaxies for the same set of halos, either changing the random seed or the HOD parameters. For example,
In [6]:
# repopulate, just changing the random seed
hod.repopulate(seed=84)
print("total number of HOD galaxies = ", hod.csize)
print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
# re-populate with new parameters
hod.repopulate(logM0=13.2, logM1=14.5)
print("total number of HOD galaxies = ", hod.csize)
print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
total number of HOD galaxies = 118
number of centrals = 60
number of satellites = 58
total number of HOD galaxies = 108
number of centrals = 63
number of satellites = 45
Note
For examples using HOD mocks, see the Halo Occupation Distribution (HOD) Mocks recipe in The Cookbook.
Users can implement catalogs that use custom HOD modeling by subclassing
the HODBase class. This base class is abstract, and subclasses must
implement the __makemodel__() function. This function
returns a HodModelFactory
object, which is the halotools object responsible for supporting
custom HOD models. For more information on designing your own HOD model
using halotools, see
this series of halotools tutorials.