Generating Catalogs of Mock Data¶
nbodykit includes several methods for generating mock catalogs, with varying
levels of sophistication. These CatalogSource
objects allow users to create catalogs of objects at run time and include:
Randomly Distributed Objects¶
nbodykit includes two subclasses of
CatalogSource
that
generate particles randomly in a box: RandomCatalog
and UniformCatalog
. While these catalogs do not produce
realistic cosmological distributions of objects, they are especially useful
for generating catalogs quickly and for testing purposes.
RandomCatalog
¶
The RandomCatalog
class includes a random number generator
with the functionality of numpy.random.RandomState
that generates
random numbers in parallel and in a manner that is independent of the number
of MPI ranks being used. This property is especially useful for running
reproducible tests where the number of CPUs might vary. The random number
generator is stored as the rng
attribute.
Users can use this random number generator to add columns to the catalog,
using the syntax to add columns.
For example,
[1]:
from nbodykit.lab import RandomCatalog
import numpy
# initialize a catalog with only the default columns
cat = RandomCatalog(csize=100) # collective size of 100
print("columns = ", cat.columns) # only the default columns present
# add mass uniformly distributed in log10
cat['Mass'] = 10**(cat.rng.uniform(12, 15))
# add normally distributed velocity
cat['Velocity'] = cat.rng.normal(loc=0, scale=500)
print(cat.columns)
columns = ['Selection', 'Value', 'Weight']
['Mass', 'Selection', 'Value', 'Velocity', 'Weight']
/home/yfeng1/anaconda3/install/lib/python3.6/sitepackages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Caveats
For a list of the full functionality of the
rng
attribute, please see the API documentation forMPIRandomState
.When adding columns, the new column must have the same length as the local size of the catalog, as specified by the
size
attribute. Most functions of therng
attribute accept thesize
keyword to generate an array of the correct size.
UniformCatalog
¶
The UniformCatalog
is a subclass of
RandomCatalog
that includes Position
and Velocity
columns that are uniformly distributed. The positions of the particles are
uniformly distributed between zero and the size of the box (as specified by
the user), with an input number density. The velocities are also
uniformly distributed but on a scale that is 1% of the size of the box.
For example,
[2]:
from nbodykit.lab import UniformCatalog
cat = UniformCatalog(nbar=100, BoxSize=1.0, seed=42)
print("columns = ", cat.columns)
# min must be greater than 0
print("position minimum = ", cat.compute(cat['Position'].min()))
# max must be less than 1.0
print("position maximum = ", cat.compute(cat['Position'].max()))
# min must be greater than 0
print("velocity minimum = ", cat.compute(cat['Velocity'].min()))
# max must be less than 0.01
print("velocity maximum = ", cat.compute(cat['Velocity'].max()))
columns = ['Position', 'Selection', 'Value', 'Velocity', 'Weight']
position minimum = 0.00015684966390538957
position maximum = 0.9992508603400948
velocity minimum = 2.4238347118806793e05
velocity maximum = 0.00999869916263692
Note that because UniformCatalog
is a subclass of
RandomCatalog
users can also use the
rng
attribute to add new columns to
a UniformCatalog
object.
Lognormal Mocks¶
The LogNormalCatalog
offers a more realistic
approximation of cosmological largescale structure. The class
generates a set of objects by Poisson sampling a lognormal density field,
using the Zel’dovich approximation to model nonlinear evolution.
Given a linear power spectrum function, redshift, and linear bias
supplied by the user, this class performs the following steps:
Generate Gaussian initial conditions
First, a Gaussian overdensity field \(\delta_L(\vk)\) is generated in Fourier space with a power spectrum given by a function specified by the user. We also generate linear velocity fields from the overdensity field in Fourier space, using
where \(f\) is the logarithmic growth rate, \(a\) is the scale factor, and \(H\) is the Hubble parameter at \(a\). Note that bold variables reflect vector quantities.
Finally, we Fourier transform \(\delta_L(\vk)\) and \(\vv(\vk)\) to configuration space. These fields serve as the “initial conditions”. They will be evolved forward in time and Poisson sampled to create the final catalog.
Perform the lognormal transformation
Next, we perform a lognormal transformation on the density field \(\delta\). As first discussed in Coles and Jones 1991, the distribution of galaxies on intermediate to large scales can be wellapproximated by a lognormal distribution. An additional useful property of a lognormal field is that it follows the natural constraint \(\delta(\vx) \ge 1\) of density contrasts by definition. This property does not generally hold true for Gaussian realizations of the overdensity field.
The new, transformed density field is given by
where the normalization factor \(\sigma^2\) ensures that the mean of
the \(\delta(\vx)\) vanishes and \(b_L\) is the Lagrangian bias
factor, which is related to the final, linear bias as \(b_L = b_1  1\).
Here, \(b_1\) is the value input by the user as the bias
keyword
to the LogNormalCatalog
class.
Poisson sample the density field
We then generate discrete positions of objects by Poisson sampling the
overdensity field in each cell of the mesh. We assign each object the
velocity of the mesh cell that it is located in, and objects are placed randomly
inside their cells. The desired number density of objects in the box is
specified by the user as the nbar
parameter to the
LogNormalCatalog
class.
Apply the Zel’dovich approximation
Finally, we evolve the overdensity field according to the Zel’dovich approximation, which is 1st order Lagrangian perturbation theory. To do this, we move the positions of each object according to the linear velocity field,
where \(\vr(\vx)\) is the final position of the objects, \(\vr_0(\vx)\) is the initial position, and \(\vv(\vx)\) is the velocity assigned in the previous step.
After this step, we have a catalog of discrete objects, with a Position
column in units of \(\mathrm{Mpc}/h\), a Velocity
columns in units
of km/s, and a VelocityOffset
column in units of \(\mathrm{Mpc}/h\).
The VelocityOffset
is a convenience function for adding redshiftspace
distortions (see Adding Redshiftspace Distortions), such that RSD can be added using:
line_of_sight = [0,0,1]
src['Position'] = src['Position'] + transform.VectorProjection(src['VelocityOffset'], line_of_sight)
Note
For examples using lognormal mocks, see the cookbook/lognormalmocks.ipynb recipe in The Cookbook.
Halo Occupation Distribution Mocks¶
nbodykit includes functionality to generate mock galaxy catalogs using the
Halo Occupation Distribution (HOD) technique via the HODCatalog
class. The HOD technique populates a catalog of halos with galaxies based on
a functional form for the probability that a halo of mass \(M\) hosts
\(N\) objects, \(P(NM)\). The functional form of the HOD used by
HODCatalog
is the form used in Zheng et al 2007.
The average number of galaxies in a halo of mass \(M\) is
where the occupation functions for centrals and satellites are given by
This HOD parametrization has 5 parameters, which can be summarized as:
Parameter 
Name 
Default 
Description 
\(\log_{10}M_\mathrm{min}\) 

13.031 
Minimum mass required for a halo to host a central galaxy 
\(\sigma_{\log_{10}M}\) 

0.38 
Rate of transition from \(N_\mathrm{cen}=0\) to \(N_\mathrm{cen}=1\) 
\(\alpha\) 

0.76 
Power law slope of the relation between halo mass and \(N_\mathrm{sat}\) 
\(\log_{10}M_0\) 

13.27 
Lowmass cutoff in \(N_\mathrm{sat}\) 
\(\log_{10}M_1\) 

14.08 
Characteristic halo mass where \(N_\mathrm{sat}\) begins to assume a power law form 
The default values of the HOD parameters are taken from Reid et al. 2014.
This form of the HOD clustering description assumes the galaxy – halo connection depends only on the halo mass. Thus, given a catalog of halo objects, with associated mass values, users can quickly generate realistic galaxy catalogs using this class.
Interfacing with halotools
via a HaloCatalog
¶
Internally, the HODCatalog
class uses the halotools
package to perform the halo population step. For further details,
see the documentation for halotools.empirical_models.Zheng07Cens
and
halotools.empirical_models.Zheng07Sats
, as well as
this tutorial on the Zheng 07 HOD model.
The catalog of halos input to the HODCatalog
class must be of type
halotools.sim_manager.UserSuppliedHaloCatalog
, the tabular
data format preferred by halotools
. nbodykit includes the
HaloCatalog
class in order to interface nicely with
halotools
. In particular, this catalog object
includes a to_halotools()
function to create a
UserSuppliedHaloCatalog
from the data columns
in the HaloCatalog
object.
Given a CatalogSource
object, the HaloCatalog
object
interprets the objects as halos, using a specified redshift, cosmology,
and mass definition, to add several analytic columns to the catalog, including
Radius()
and Concentration()
.
For example, below we generate uniform particles in a box and then interpret them as halos by specifying a redshift and cosmology:
[3]:
from nbodykit.lab import HaloCatalog, cosmology
# uniform objects in a box
cat = UniformCatalog(nbar=100, BoxSize=1.0, seed=42)
# add a Mass column to the objects
cat['Mass'] = 10**(cat.rng.uniform(12, 15))
# initialize the halos
halos = HaloCatalog(cat, cosmo=cosmology.Planck15, redshift=0., mdef='vir', position='Position', velocity='Velocity', mass='Mass')
print(halos.columns)
['Concentration', 'Mass', 'Position', 'Radius', 'Selection', 'Value', 'Velocity', 'VelocityOffset', 'Weight']
And using the to_halotools()
function, we can create
the halotools.sim_manager.UserSuppliedHaloCatalog
object needed
to initialize the HODCatalog
object.
[4]:
halocat = halos.to_halotools()
print(halocat)
print(halocat.halo_table[:10])
<halotools.sim_manager.user_supplied_halo_catalog.UserSuppliedHaloCatalog object at 0x7f78f96c0198>
halo_x halo_y ... halo_upid halo_local_id
  ...  
0.45470105222137214 0.8326320323149597 ... 1.0 0
0.31944725103976734 0.48518719297596336 ... 1.0 1
0.21242627399222258 0.16674683796980805 ... 1.0 2
0.3185452432219371 0.3490676632113582 ... 1.0 3
0.5066846142238558 0.23705948996927073 ... 1.0 4
0.47273713613639634 0.8569679300356892 ... 1.0 5
0.45934116150501314 0.9523800798479475 ... 1.0 6
0.9244771851038425 0.8843779857073643 ... 1.0 7
0.4553655200041342 0.8086086697963959 ... 1.0 8
0.5231105620516908 0.10008694708748456 ... 1.0 9
Caveats
The units of the halo position, velocity, and mass input to
HaloCatalog
are assumed to be \(\mathrm{Mpc}/h\), km/s, and \(M_\odot/h\), respectively. These units are necessary to interface withhalotools
.The mass definition input to
HaloCatalog
can be “vir” to use virial masses, or an overdensity factor with respect to the critical or mean density, i.e. “200c”, “500c”, or “200m”, “500m”.If using the builtin friendsoffriends (FOF) finder class,
FOF
, to identify halos, the user can use theto_halos()
function to directly produce aHaloCatalog
from the result of running the FOF algorithm.By default, the halo concentration values stored in the
Concentration
column of aHaloCatalog
object are generated using the input mass definition and the analytic formulas from Dutton and Maccio 2014. Users can overwrite this column with their own values if they wish to use custom concentration values when generating HOD catalogs.
The HODCatalog
Class¶
Users can initialize the HOD catalog directly from the
UserSuppliedHaloCatalog
object and the desired
HOD parameters. The HODCatalog
object will include all of the
columns from the UserSuppliedHaloCatalog
object,
with the usual columns Position
, Velocity
, and VelocityOffset
for the generated galaxies. The additional columns are:
conc_NFWmodel: the concentration of the halo
gal_type: the galaxy type, 0 for centrals and 1 for satellites
halo_id: the global ID of the halo that this galaxy belongs to, between 0 and
csize
halo_local_id: the local ID of the halo that this galaxy belongs to, between 0 and
size
halo_mvir: the halo mass, in units of \(M_\odot/h\)
halo_nfw_conc: alias of
conc_NFWmodel
halo_num_centrals: the number of centrals that this halo hosts, either 0 or 1
halo_num_satellites: the number of satellites that this halo hosts
halo_rvir: the halo radius, in units of \(\mathrm{Mpc}/h\)
halo_upid: equal to 1; should be ignored by the user
halo_vx, halo_vy, halo_vz: the three components of the halo velocity, in units of km/s
halo_x, halo_y, halo_z: the three components of the halo position, in units of \(\mathrm{Mpc}/h\)
host_centric_distance: the distance from this galaxy to the center of the halo, in units of \(\mathrm{Mpc}/h\)
vx, vy, vz: the three components of the galaxy velocity, equal to
Velocity
, in units of km/sx,y,z: the three components of the galaxy position, equal to
Position
, in units of \(\mathrm{Mpc}/h\)
Below we populate galaxies with Zheng07Model, and compute the
number of centrals and satellites in the catalog, using the gal_type
column.
[5]:
from nbodykit.hod import Zheng07Model
hod = halos.populate(Zheng07Model, alpha=0.5, sigma_logM=0.40, seed=42)
print("total number of HOD galaxies = ", hod.csize)
print(hod.columns)
print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
total number of HOD galaxies = 291
['Position', 'Selection', 'Value', 'Velocity', 'VelocityOffset', 'Weight', 'conc_NFWmodel', 'gal_type', 'halo_hostid', 'halo_id', 'halo_mvir', 'halo_num_centrals', 'halo_num_satellites', 'halo_rvir', 'halo_upid', 'halo_vx', 'halo_vy', 'halo_vz', 'halo_x', 'halo_y', 'halo_z', 'host_centric_distance', 'vx', 'vy', 'vz', 'x', 'y', 'z']
number of centrals = 94
number of satellites = 197
Caveats
The HOD population step requires halo concentration. If the user wishes to uses custom concentration values, the
UserSuppliedHaloCatalog
table should contain ahalo_nfw_conc
column. Otherwise, the analytic prescriptions from Dutton and Maccio 2014 are used.
Repopulating a HOD Catalog¶
We can also quickly repopulate a HOD catalog in place, generating a new set of galaxies for the same set of halos, either changing the random seed or the HOD parameters. For example,
[6]:
# repopulate, just changing the random seed
hod.repopulate(seed=84)
print("total number of HOD galaxies = ", hod.csize)
print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
# repopulate with new parameters
hod.repopulate(logM0=13.2, logM1=14.5)
print("total number of HOD galaxies = ", hod.csize)
print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
total number of HOD galaxies = 282
number of centrals = 95
number of satellites = 187
total number of HOD galaxies = 141
number of centrals = 93
number of satellites = 48
Note
For examples using HOD mocks, see the cookbook/hodmocks.ipynb recipe in The Cookbook.
Using a Custom HOD Model¶
Users can implement catalogs that use custom HOD modeling by subclassing
the HODBase
class. This base class is abstract, and subclasses must
implement the __makemodel__()
function. This function
returns a HodModelFactory
object, which is the halotools
object responsible for supporting
custom HOD models. For more information on designing your own HOD model
using halotools
, see
this series of halotools tutorials.
[7]: