nbodykit includes several methods for generating mock catalogs, with varying
levels of sophistication. These `CatalogSource`

objects allow users to create catalogs of objects at run time and include:

nbodykit includes two subclasses of
`CatalogSource`

that
generate particles randomly in a box: `RandomCatalog`

and `UniformCatalog`

. While these catalogs do not produce
realistic cosmological distributions of objects, they are especially useful
for generating catalogs quickly and for testing purposes.

`RandomCatalog`

¶The `RandomCatalog`

class includes a random number generator
with the functionality of `numpy.random.RandomState`

that generates
random numbers in parallel and in a manner that is independent of the number
of MPI ranks being used. This property is especially useful for running
reproducible tests where the number of CPUs might vary. The random number
generator is stored as the `rng`

attribute.
Users can use this random number generator to add columns to the catalog,
using the syntax to add columns.

For example,

```
In [1]: from nbodykit.lab import RandomCatalog
In [2]: import numpy
# initialize a catalog with only the default columns
In [3]: cat = RandomCatalog(csize=100) # collective size of 100
In [4]: print("columns = ", cat.columns) # only the default columns present
columns = ['Selection', 'Value', 'Weight']
# add mass uniformly distributed in log10
In [5]: cat['Mass'] = 10**(cat.rng.uniform(12, 15, size=cat.size))
# add normally distributed velocity
In [6]: cat['Velocity'] = cat.rng.normal(loc=0, scale=500, size=cat.size)
In [7]: print(cat.columns)
['Mass', 'Selection', 'Value', 'Velocity', 'Weight']
```

**Caveats**

- For a list of the full functionality of the
`rng`

attribute, please see the API documentation for`MPIRandomState`

. - When adding columns, the new column must have the same length as the local
size of the catalog, as specified by the
`size`

attribute. Most functions of the`rng`

attribute accept the`size`

keyword to generate an array of the correct size.

`UniformCatalog`

¶The `UniformCatalog`

is a subclass of
`RandomCatalog`

that includes `Position`

and `Velocity`

columns that are uniformly distributed. The positions of the particles are
uniformly distributed between zero and the size of the box (as specified by
the user), with an input number density. The velocities are also
uniformly distributed but on a scale that is 1% of the size of the box.

For example,

```
In [8]: from nbodykit.lab import UniformCatalog
In [9]: cat = UniformCatalog(nbar=100, BoxSize=1.0, seed=42)
In [10]: print("columns = ", cat.columns)
columns = ['Position', 'Selection', 'Value', 'Velocity', 'Weight']
# min must be greater than 0
In [11]: print("position minimum = ", cat.compute(cat['Position'].min()))
position minimum = 0.000156849663905
# max must be less than 1.0
In [12]: print("position maximum = ", cat.compute(cat['Position'].max()))
position maximum = 0.99925086034
# min must be greater than 0
In [13]: print("velocity minimum = ", cat.compute(cat['Velocity'].min()))
velocity minimum = 1.56849663905e-06
# max must be less than 0.01
In [14]: print("velocity maximum = ", cat.compute(cat['Velocity'].max()))
velocity maximum = 0.0099925086034
```

Note that because `UniformCatalog`

is a subclass of
`RandomCatalog`

users can also use the
`rng`

attribute to add new columns to
a `UniformCatalog`

object.

The `LogNormalCatalog`

offers a more realistic
approximation of cosmological large-scale structure. The class
generates a set of objects by Poisson sampling a log-normal density field,
using the Zel’dovich approximation to model non-linear evolution.
Given a linear power spectrum function, redshift, and linear bias
supplied by the user, this class performs the following steps:

**Generate Gaussian initial conditions**

First, a Gaussian overdensity field \(\delta_L(\vk)\) is generated in Fourier space with a power spectrum given by a function specified by the user. We also generate linear velocity fields from the overdensity field in Fourier space, using

\[\vv(\vk) = i f a H \delta_L(\vk) \frac{\vk}{k^2}.\]

where \(f\) is the logarithmic growth rate, \(a\) is the scale factor, and \(H\) is the Hubble parameter at \(a\). Note that bold variables reflect vector quantities.

Finally, we Fourier transform \(\delta_L(\vk)\) and \(\vv(\vk)\) to configuration space. These fields serve as the “initial conditions”. They will be evolved forward in time and Poisson sampled to create the final catalog.

**Perform the log-normal transformation**

Next, we perform a log-normal transformation on the density field \(\delta\). As first discussed in Coles and Jones 1991, the distribution of galaxies on intermediate to large scales can be well-approximated by a log-normal distribution. An additional useful property of a log-normal field is that it follows the natural constraint \(\delta(\vx) \ge -1\) of density contrasts by definition. This property does not generally hold true for Gaussian realizations of the overdensity field.

The new, transformed density field is given by

\[\delta(\vx) = e^{-\sigma^2 + b_L \delta_L(\vx)} - 1,\]

where the normalization factor \(\sigma^2\) ensures that the mean of
the \(\delta(\vx)\) vanishes and \(b_L\) is the Lagrangian bias
factor, which is related to the final, linear bias as \(b_L = b_1 - 1\).
Here, \(b_1\) is the value input by the user as the `bias`

keyword
to the `LogNormalCatalog`

class.

**Poisson sample the density field**

We then generate discrete positions of objects by Poisson sampling the
overdensity field in each cell of the mesh. We assign each object the
velocity of the mesh cell that it is located in, and objects are placed randomly
inside their cells. The desired number density of objects in the box is
specified by the user as the `nbar`

parameter to the
`LogNormalCatalog`

class.

**Apply the Zel’dovich approximation**

Finally, we evolve the overdensity field according to the Zel’dovich approximation, which is 1st order Lagrangian perturbation theory. To do this, we move the positions of each object according to the linear velocity field,

\[\vr(\vx) = \vr_0(\vx) + \frac{\vv(\vx)}{f a H},\]

where \(\vr(\vx)\) is the final position of the objects, \(\vr_0(\vx)\) is the initial position, and \(\vv(\vx)\) is the velocity assigned in the previous step.

After this step, we have a catalog of discrete objects, with a `Position`

column in units of \(\mathrm{Mpc}/h\), a `Velocity`

columns in units
of km/s, and a `VelocityOffset`

column in units of \(\mathrm{Mpc}/h\).
The `VelocityOffset`

is a convenience function for adding redshift-space
distortions (see Adding Redshift-space Distortions), such that RSD can be added using:

```
line_of_sight = [0,0,1]
src['Position'] = src['Position'] + src['VelocityOffset'] * line_of_sight
```

Note

For examples using log-normal mocks, see the Log-normal Mocks recipe in The Cookbook.

nbodykit includes functionality to generate mock galaxy catalogs using the
Halo Occupation Distribution (HOD) technique via the `HODCatalog`

class. The HOD technique populates a catalog of halos with galaxies based on
a functional form for the probability that a halo of mass \(M\) hosts
\(N\) objects, \(P(N|M)\). The functional form of the HOD used by
`HODCatalog`

is the form used in Zheng et al 2007.
The average number of galaxies in a halo of mass \(M\) is

\[\langle N_\mathrm{gal}(M) \rangle = N_\mathrm{cen}(M) \left [ 1 + N_\mathrm{sat}(M) \right],\]

where the occupation functions for centrals and satellites are given by

\[ \begin{align}\begin{aligned}N_\mathrm{cen}(M) &= \frac{1}{2} \left (1 + \mathrm{erf}
\left[ \frac{\log_{10}M - \log_{10}M_\mathrm{min}}{\sigma_{\log_{10}M}}
\right]
\right),\\N_\mathrm{sat}(M) &= \left ( \frac{M - M_0}{M_1} \right )^\alpha.\end{aligned}\end{align} \]

This HOD parametrization has 5 parameters, which can be summarized as:

Parameter |
Name |
Default |
Description |

\(\log_{10}M_\mathrm{min}\) | `logMmin` |
13.031 | Minimum mass required for a halo to host a central galaxy |

\(\sigma_{\log_{10}M}\) | `sigma_logM` |
0.38 | Rate of transition from \(N_\mathrm{cen}=0\) to \(N_\mathrm{cen}=1\) |

\(\alpha\) | `alpha` |
0.76 | Power law slope of the relation between halo mass and \(N_\mathrm{sat}\) |

\(\log_{10}M_0\) | `logM0` |
13.27 | Low-mass cutoff in \(N_\mathrm{sat}\) |

\(\log_{10}M_1\) | `logM1` |
14.08 | Characteristic halo mass where \(N_\mathrm{sat}\) begins to assume a power law form |

The default values of the HOD parameters are taken from Reid et al. 2014.

This form of the HOD clustering description assumes the galaxy – halo connection depends only on the halo mass. Thus, given a catalog of halo objects, with associated mass values, users can quickly generate realistic galaxy catalogs using this class.

`halotools`

via a `HaloCatalog`

¶Internally, the `HODCatalog`

class uses the `halotools`

package to perform the halo population step. For further details,
see the documentation for `halotools.empirical_models.Zheng07Cens`

and
`halotools.empirical_models.Zheng07Sats`

, as well as
this tutorial on the Zheng 07 HOD model.

The catalog of halos input to the `HODCatalog`

class must be of type
`halotools.sim_manager.UserSuppliedHaloCatalog`

, the tabular
data format preferred by `halotools`

. nbodykit includes the
`HaloCatalog`

class in order to interface nicely with
`halotools`

. In particular, this catalog object
includes a `to_halotools()`

function to create a
`UserSuppliedHaloCatalog`

from the data columns
in the `HaloCatalog`

object.

Given a `CatalogSource`

object, the `HaloCatalog`

object
interprets the objects as halos, using a specified redshift, cosmology,
and mass definition, to add several analytic columns to the catalog, including
`Radius()`

and `Concentration()`

.

For example, below we generate uniform particles in a box and then interpret them as halos by specifying a redshift and cosmology:

```
In [15]: from nbodykit.lab import HaloCatalog, cosmology
# uniform objects in a box
In [16]: cat = UniformCatalog(nbar=100, BoxSize=1.0, seed=42)
# add a Mass column to the objects
In [17]: cat['Mass'] = 10**(cat.rng.uniform(12, 15, size=cat.size))
# initialize the halos
In [18]: halos = HaloCatalog(cat, cosmo=cosmology.Planck15, redshift=0., mdef='vir', position='Position', velocity='Velocity', mass='Mass')
In [19]: print(halos.columns)
['Concentration', 'Mass', 'Position', 'Radius', 'Selection', 'Value', 'Velocity', 'VelocityOffset', 'Weight']
```

And using the `to_halotools()`

function, we can create
the `halotools.sim_manager.UserSuppliedHaloCatalog`

object needed
to initialize the `HODCatalog`

object.

```
In [20]: halocat = halos.to_halotools()
In [21]: print(halocat)
<halotools.sim_manager.user_supplied_halo_catalog.UserSuppliedHaloCatalog object at 0x7fd572fc2208>
In [22]: print(halocat.halo_table[:10])
halo_x halo_vz halo_y ... halo_local_id halo_upid
-------------- ----------------- -------------- ... ------------- ---------
0.454701052221 0.000690513408736 0.832632032315 ... 0 -1.0
0.31944725104 0.00298261631365 0.485187192976 ... 1 -1.0
0.212426273992 0.00176221306605 0.16674683797 ... 2 -1.0
0.318545243222 0.0099925086034 0.349067663211 ... 3 -1.0
0.506684614224 0.00389253212345 0.237059489969 ... 4 -1.0
0.472737136136 0.00973742710042 0.856967930036 ... 5 -1.0
0.459341161505 0.00456999189045 0.952380079848 ... 6 -1.0
0.924477185104 0.000908425359374 0.884377985707 ... 7 -1.0
0.455365520004 0.00827823954099 0.808608669796 ... 8 -1.0
0.523110562052 0.00700826727159 0.100086947087 ... 9 -1.0
```

**Caveats**

- The units of the halo position, velocity, and mass input to
`HaloCatalog`

are assumed to be \(\mathrm{Mpc}/h\), km/s, and \(M_\odot/h\), respectively. These units are necessary to interface with`halotools`

. - The mass definition input to
`HaloCatalog`

can be “vir” to use virial masses, or an overdensity factor with respect to the critical or mean density, i.e. “200c”, “500c”, or “200m”, “500m”. - If using the built-in friends-of-friends (FOF) finder class,
`FOF`

, to identify halos, the user can use the`to_halos()`

function to directly produce a`HaloCatalog`

from the result of running the FOF algorithm. - By default, the halo concentration values stored in the
`Concentration`

column of a`HaloCatalog`

object are generated using the input mass definition and the analytic formulas from Dutton and Maccio 2014. Users can overwrite this column with their own values if they wish to use custom concentration values when generating HOD catalogs.

`HODCatalog`

Class¶Users can initialize the HOD catalog directly from the
`UserSuppliedHaloCatalog`

object and the desired
HOD parameters. The `HODCatalog`

object will include all of the
columns from the `UserSuppliedHaloCatalog`

object,
with the usual columns `Position`

, `Velocity`

, and `VelocityOffset`

for the generated galaxies. The additional columns are:

**conc_NFWmodel**: the concentration of the halo**gal_type**: the galaxy type, 0 for centrals and 1 for satellites**halo_id**: the global ID of the halo that this galaxy belongs to, between 0 and`csize`

**halo_local_id**: the local ID of the halo that this galaxy belongs to, between 0 and`size`

**halo_mvir**: the halo mass, in units of \(M_\odot/h\)**halo_nfw_conc**: alias of`conc_NFWmodel`

**halo_num_centrals**: the number of centrals that this halo hosts, either 0 or 1**halo_num_satellites**: the number of satellites that this halo hosts**halo_rvir**: the halo radius, in units of \(\mathrm{Mpc}/h\)**halo_upid**: equal to -1; should be ignored by the user**halo_vx, halo_vy, halo_vz**: the three components of the halo velocity, in units of km/s**halo_x, halo_y, halo_z**: the three components of the halo position, in units of \(\mathrm{Mpc}/h\)**host_centric_distance**: the distance from this galaxy to the center of the halo, in units of \(\mathrm{Mpc}/h\)**vx, vy, vz**: the three components of the galaxy velocity, equal to`Velocity`

, in units of km/s**x,y,z**: the three components of the galaxy position, equal to`Position`

, in units of \(\mathrm{Mpc}/h\)

Below we initialize a `HODCatalog`

of galaxies and compute the
number of centrals and satellites in the catalog, using the `gal_type`

column.

```
In [23]: from nbodykit.lab import HODCatalog
In [24]: hod = HODCatalog(halocat, alpha=0.5, sigma_logM=0.40, seed=42)
In [25]: print("total number of HOD galaxies = ", hod.csize)
total number of HOD galaxies = 127
In [26]: print(hod.columns)
['Position', 'Selection', 'Value', 'Velocity', 'VelocityOffset', 'Weight', 'conc_NFWmodel', 'gal_type', 'halo_id', 'halo_local_id', 'halo_mvir', 'halo_nfw_conc', 'halo_num_centrals', 'halo_num_satellites', 'halo_rvir', 'halo_upid', 'halo_vx', 'halo_vy', 'halo_vz', 'halo_x', 'halo_y', 'halo_z', 'host_centric_distance', 'vx', 'vy', 'vz', 'x', 'y', 'z']
In [27]: print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
number of centrals = 66
In [28]: print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
number of satellites = 61
```

**Caveats**

- The HOD population step requires halo concentration. If the user wishes to
uses custom concentration values, the
`UserSuppliedHaloCatalog`

table should contain a`halo_nfw_conc`

column. Otherwise, the analytic prescriptions from Dutton and Maccio 2014 are used.

We can also quickly repopulate a HOD catalog in place, generating a new set of galaxies for the same set of halos, either changing the random seed or the HOD parameters. For example,

```
# repopulate, just changing the random seed
In [29]: hod.repopulate(seed=84)
In [30]: print("total number of HOD galaxies = ", hod.csize)
total number of HOD galaxies = 118
In [31]: print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
number of centrals = 60
In [32]: print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
number of satellites = 58
# re-populate with new parameters
In [33]: hod.repopulate(logM0=13.2, logM1=14.5)
In [34]: print("total number of HOD galaxies = ", hod.csize)
total number of HOD galaxies = 114
In [35]: print("number of centrals = ", hod.compute((hod['gal_type']==0).sum()))
number of centrals = 65
In [36]: print("number of satellites = ", hod.compute((hod['gal_type']==1).sum()))
number of satellites = 49
```

Note

For examples using HOD mocks, see the Halo Occupation Distribution (HOD) Mocks recipe in The Cookbook.

Users can implement catalogs that use custom HOD modeling by subclassing
the `HODBase`

class. This base class is abstract, and subclasses must
implement the `__makemodel__()`

function. This function
returns a `HodModelFactory`

object, which is the `halotools`

object responsible for supporting
custom HOD models. For more information on designing your own HOD model
using `halotools`

, see
this series of halotools tutorials.