Generating Catalogs of Mock Data

nbodykit includes several methods for generating mock catalogs, with varying levels of sophistication. These CatalogSource objects allow users to create catalogs of objects at run time and include:

Randomly Distributed Objects

nbodykit includes two subclasses of CatalogSource that generate particles randomly in a box: RandomCatalog and UniformCatalog. While these catalogs do not produce realistic cosmological distributions of objects, they are especially useful for generating catalogs quickly and for testing purposes.


The RandomCatalog class includes a random number generator with the functionality of numpy.random.RandomState that generates random numbers in parallel and in a manner that is independent of the number of MPI ranks being used. This property is especially useful for running reproducible tests where the number of CPUs might vary. The random number generator is stored as the rng attribute. Users can use this random number generator to add columns to the catalog, using the syntax to add columns.

For example,

In [1]:
from nbodykit.lab import RandomCatalog
import numpy

# initialize a catalog with only the default columns
cat = RandomCatalog(csize=100) # collective size of 100
print("columns = ", cat.columns) # only the default columns present

# add mass uniformly distributed in log10
cat['Mass'] = 10**(cat.rng.uniform(12, 15, size=cat.size))

# add normally distributed velocity
cat['Velocity'] = cat.rng.normal(loc=0, scale=500, size=cat.size)

columns =  ['Selection', 'Value', 'Weight']
['Mass', 'Selection', 'Value', 'Velocity', 'Weight']


  • For a list of the full functionality of the rng attribute, please see the API documentation for MPIRandomState.
  • When adding columns, the new column must have the same length as the local size of the catalog, as specified by the size attribute. Most functions of the rng attribute accept the size keyword to generate an array of the correct size.


The UniformCatalog is a subclass of RandomCatalog that includes Position and Velocity columns that are uniformly distributed. The positions of the particles are uniformly distributed between zero and the size of the box (as specified by the user), with an input number density. The velocities are also uniformly distributed but on a scale that is 1% of the size of the box.

For example,

In [2]:
from nbodykit.lab import UniformCatalog

cat = UniformCatalog(nbar=100, BoxSize=1.0, seed=42)
print("columns = ", cat.columns)

# min must be greater than 0
print("position minimum = ", cat.compute(cat['Position'].min()))

# max must be less than 1.0
print("position maximum = ", cat.compute(cat['Position'].max()))

# min must be greater than 0
print("velocity minimum = ", cat.compute(cat['Velocity'].min()))

# max must be less than 0.01
print("velocity maximum = ", cat.compute(cat['Velocity'].max()))
columns =  ['Position', 'Selection', 'Value', 'Velocity', 'Weight']
position minimum =  0.000156849663905
position maximum =  0.99925086034
velocity minimum =  1.56849663905e-06
velocity maximum =  0.0099925086034

Note that because UniformCatalog is a subclass of RandomCatalog users can also use the rng attribute to add new columns to a UniformCatalog object.

Log-normal Mocks

The LogNormalCatalog offers a more realistic approximation of cosmological large-scale structure. The class generates a set of objects by Poisson sampling a log-normal density field, using the Zel’dovich approximation to model non-linear evolution. Given a linear power spectrum function, redshift, and linear bias supplied by the user, this class performs the following steps:

Generate Gaussian initial conditions

First, a Gaussian overdensity field \(\delta_L(\vk)\) is generated in Fourier space with a power spectrum given by a function specified by the user. We also generate linear velocity fields from the overdensity field in Fourier space, using

\[\vv(\vk) = i f a H \delta_L(\vk) \frac{\vk}{k^2}.\]

where \(f\) is the logarithmic growth rate, \(a\) is the scale factor, and \(H\) is the Hubble parameter at \(a\). Note that bold variables reflect vector quantities.

Finally, we Fourier transform \(\delta_L(\vk)\) and \(\vv(\vk)\) to configuration space. These fields serve as the “initial conditions”. They will be evolved forward in time and Poisson sampled to create the final catalog.

Perform the log-normal transformation

Next, we perform a log-normal transformation on the density field \(\delta\). As first discussed in Coles and Jones 1991, the distribution of galaxies on intermediate to large scales can be well-approximated by a log-normal distribution. An additional useful property of a log-normal field is that it follows the natural constraint \(\delta(\vx) \ge -1\) of density contrasts by definition. This property does not generally hold true for Gaussian realizations of the overdensity field.

The new, transformed density field is given by

\[\delta(\vx) = e^{-\sigma^2 + b_L \delta_L(\vx)} - 1,\]

where the normalization factor \(\sigma^2\) ensures that the mean of the \(\delta(\vx)\) vanishes and \(b_L\) is the Lagrangian bias factor, which is related to the final, linear bias as \(b_L = b_1 - 1\). Here, \(b_1\) is the value input by the user as the bias keyword to the LogNormalCatalog class.

Poisson sample the density field

We then generate discrete positions of objects by Poisson sampling the overdensity field in each cell of the mesh. We assign each object the velocity of the mesh cell that it is located in, and objects are placed randomly inside their cells. The desired number density of objects in the box is specified by the user as the nbar parameter to the LogNormalCatalog class.

Apply the Zel’dovich approximation

Finally, we evolve the overdensity field according to the Zel’dovich approximation, which is 1st order Lagrangian perturbation theory. To do this, we move the positions of each object according to the linear velocity field,

\[\vr(\vx) = \vr_0(\vx) + \frac{\vv(\vx)}{f a H},\]

where \(\vr(\vx)\) is the final position of the objects, \(\vr_0(\vx)\) is the initial position, and \(\vv(\vx)\) is the velocity assigned in the previous step.

After this step, we have a catalog of discrete objects, with a Position column in units of \(\mathrm{Mpc}/h\), a Velocity columns in units of km/s, and a VelocityOffset column in units of \(\mathrm{Mpc}/h\). The VelocityOffset is a convenience function for adding redshift-space distortions (see Adding Redshift-space Distortions), such that RSD can be added using:

line_of_sight = [0,0,1]
src['Position'] = src['Position'] + src['VelocityOffset'] * line_of_sight


For examples using log-normal mocks, see the Log-normal Mocks recipe in The Cookbook.

Halo Occupation Distribution Mocks

nbodykit includes functionality to generate mock galaxy catalogs using the Halo Occupation Distribution (HOD) technique via the HODCatalog class. The HOD technique populates a catalog of halos with galaxies based on a functional form for the probability that a halo of mass \(M\) hosts \(N\) objects, \(P(N|M)\). The functional form of the HOD used by HODCatalog is the form used in Zheng et al 2007. The average number of galaxies in a halo of mass \(M\) is

\[\langle N_\mathrm{gal}(M) \rangle = N_\mathrm{cen}(M) \left [ 1 + N_\mathrm{sat}(M) \right],\]

where the occupation functions for centrals and satellites are given by

\[ \begin{align}\begin{aligned}N_\mathrm{cen}(M) &= \frac{1}{2} \left (1 + \mathrm{erf} \left[ \frac{\log_{10}M - \log_{10}M_\mathrm{min}}{\sigma_{\log_{10}M}} \right] \right),\\N_\mathrm{sat}(M) &= \left ( \frac{M - M_0}{M_1} \right )^\alpha.\end{aligned}\end{align} \]

This HOD parametrization has 5 parameters, which can be summarized as:

Parameter Name Default Description
\(\log_{10}M_\mathrm{min}\) logMmin 13.031 Minimum mass required for a halo to host a central galaxy
\(\sigma_{\log_{10}M}\) sigma_logM 0.38 Rate of transition from \(N_\mathrm{cen}=0\) to \(N_\mathrm{cen}=1\)
\(\alpha\) alpha 0.76 Power law slope of the relation between halo mass and \(N_\mathrm{sat}\)
\(\log_{10}M_0\) logM0 13.27 Low-mass cutoff in \(N_\mathrm{sat}\)
\(\log_{10}M_1\) logM1 14.08 Characteristic halo mass where \(N_\mathrm{sat}\) begins to assume a power law form

The default values of the HOD parameters are taken from Reid et al. 2014.

This form of the HOD clustering description assumes the galaxy – halo connection depends only on the halo mass. Thus, given a catalog of halo objects, with associated mass values, users can quickly generate realistic galaxy catalogs using this class.

Interfacing with halotools via a HaloCatalog

Internally, the HODCatalog class uses the halotools package to perform the halo population step. For further details, see the documentation for halotools.empirical_models.Zheng07Cens and halotools.empirical_models.Zheng07Sats, as well as this tutorial on the Zheng 07 HOD model.

The catalog of halos input to the HODCatalog class must be of type halotools.sim_manager.UserSuppliedHaloCatalog, the tabular data format preferred by halotools. nbodykit includes the HaloCatalog class in order to interface nicely with halotools. In particular, this catalog object includes a to_halotools() function to create a UserSuppliedHaloCatalog from the data columns in the HaloCatalog object.

Given a CatalogSource object, the HaloCatalog object interprets the objects as halos, using a specified redshift, cosmology, and mass definition, to add several analytic columns to the catalog, including Radius() and Concentration().

For example, below we generate uniform particles in a box and then interpret them as halos by specifying a redshift and cosmology:

In [3]:
from nbodykit.lab import HaloCatalog, cosmology

# uniform objects in a box
cat = UniformCatalog(nbar=100, BoxSize=1.0, seed=42)

# add a Mass column to the objects
cat['Mass'] = 10**(cat.rng.uniform(12, 15, size=cat.size))

# initialize the halos
halos = HaloCatalog(cat, cosmo=cosmology.Planck15, redshift=0., mdef='vir', position='Position', velocity='Velocity', mass='Mass')

['Concentration', 'Mass', 'Position', 'Radius', 'Selection', 'Value', 'Velocity', 'VelocityOffset', 'Weight']

And using the to_halotools() function, we can create the halotools.sim_manager.UserSuppliedHaloCatalog object needed to initialize the HODCatalog object.

In [4]:
halocat = halos.to_halotools()


<halotools.sim_manager.user_supplied_halo_catalog.UserSuppliedHaloCatalog object at 0x112c9f3c8>
    halo_x         halo_y          halo_z     ... halo_upid halo_local_id
-------------- -------------- --------------- ... --------- -------------
0.454701052221 0.832632032315 0.0690513408736 ...      -1.0             0
 0.31944725104 0.485187192976  0.298261631365 ...      -1.0             1
0.212426273992  0.16674683797  0.176221306605 ...      -1.0             2
0.318545243222 0.349067663211   0.99925086034 ...      -1.0             3
0.506684614224 0.237059489969  0.389253212345 ...      -1.0             4
0.472737136136 0.856967930036  0.973742710042 ...      -1.0             5
0.459341161505 0.952380079848  0.456999189045 ...      -1.0             6
0.924477185104 0.884377985707 0.0908425359374 ...      -1.0             7
0.455365520004 0.808608669796  0.827823954099 ...      -1.0             8
0.523110562052 0.100086947087  0.700826727159 ...      -1.0             9


  • The units of the halo position, velocity, and mass input to HaloCatalog are assumed to be \(\mathrm{Mpc}/h\), km/s, and \(M_\odot/h\), respectively. These units are necessary to interface with halotools.
  • The mass definition input to HaloCatalog can be “vir” to use virial masses, or an overdensity factor with respect to the critical or mean density, i.e. “200c”, “500c”, or “200m”, “500m”.
  • If using the built-in friends-of-friends (FOF) finder class, FOF, to identify halos, the user can use the to_halos() function to directly produce a HaloCatalog from the result of running the FOF algorithm.
  • By default, the halo concentration values stored in the Concentration column of a HaloCatalog object are generated using the input mass definition and the analytic formulas from Dutton and Maccio 2014. Users can overwrite this column with their own values if they wish to use custom concentration values when generating HOD catalogs.

The HODCatalog Class

Users can initialize the HOD catalog directly from the UserSuppliedHaloCatalog object and the desired HOD parameters. The HODCatalog object will include all of the columns from the UserSuppliedHaloCatalog object, with the usual columns Position, Velocity, and VelocityOffset for the generated galaxies. The additional columns are:

  1. conc_NFWmodel: the concentration of the halo
  2. gal_type: the galaxy type, 0 for centrals and 1 for satellites
  3. halo_id: the global ID of the halo that this galaxy belongs to, between 0 and csize
  4. halo_local_id: the local ID of the halo that this galaxy belongs to, between 0 and size
  5. halo_mvir: the halo mass, in units of \(M_\odot/h\)
  6. halo_nfw_conc: alias of conc_NFWmodel
  7. halo_num_centrals: the number of centrals that this halo hosts, either 0 or 1
  8. halo_num_satellites: the number of satellites that this halo hosts
  9. halo_rvir: the halo radius, in units of \(\mathrm{Mpc}/h\)
  10. halo_upid: equal to -1; should be ignored by the user
  11. halo_vx, halo_vy, halo_vz: the three components of the halo velocity, in units of km/s
  12. halo_x, halo_y, halo_z: the three components of the halo position, in units of \(\mathrm{Mpc}/h\)
  13. host_centric_distance: the distance from this galaxy to the center of the halo, in units of \(\mathrm{Mpc}/h\)
  14. vx, vy, vz: the three components of the galaxy velocity, equal to Velocity, in units of km/s
  15. x,y,z: the three components of the galaxy position, equal to Position, in units of \(\mathrm{Mpc}/h\)

Below we initialize a HODCatalog of galaxies and compute the number of centrals and satellites in the catalog, using the gal_type column.

In [5]:
from nbodykit.lab import HODCatalog
hod = HODCatalog(halocat, alpha=0.5, sigma_logM=0.40, seed=42)

print("total number of HOD galaxies = ", hod.csize)

print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
total number of HOD galaxies =  127
['Position', 'Selection', 'Value', 'Velocity', 'VelocityOffset', 'Weight', 'conc_NFWmodel', 'gal_type', 'halo_id', 'halo_local_id', 'halo_mvir', 'halo_nfw_conc', 'halo_num_centrals', 'halo_num_satellites', 'halo_rvir', 'halo_upid', 'halo_vx', 'halo_vy', 'halo_vz', 'halo_x', 'halo_y', 'halo_z', 'host_centric_distance', 'vx', 'vy', 'vz', 'x', 'y', 'z']
number of centrals =  66
number of satellites =  61


  • The HOD population step requires halo concentration. If the user wishes to uses custom concentration values, the UserSuppliedHaloCatalog table should contain a halo_nfw_conc column. Otherwise, the analytic prescriptions from Dutton and Maccio 2014 are used.

Repopulating a HOD Catalog

We can also quickly repopulate a HOD catalog in place, generating a new set of galaxies for the same set of halos, either changing the random seed or the HOD parameters. For example,

In [6]:
# repopulate, just changing the random seed
print("total number of HOD galaxies = ", hod.csize)

print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))

# re-populate with new parameters
hod.repopulate(logM0=13.2, logM1=14.5)
print("total number of HOD galaxies = ", hod.csize)

print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
total number of HOD galaxies =  118
number of centrals =  60
number of satellites =  58
total number of HOD galaxies =  101
number of centrals =  66
number of satellites =  35


For examples using HOD mocks, see the Halo Occupation Distribution (HOD) Mocks recipe in The Cookbook.

Using a Custom HOD Model

Users can implement catalogs that use custom HOD modeling by subclassing the HODBase class. This base class is abstract, and subclasses must implement the __makemodel__() function. This function returns a HodModelFactory object, which is the halotools object responsible for supporting custom HOD models. For more information on designing your own HOD model using halotools, see this series of halotools tutorials.