In this section, we provide a brief overview of the major functionality of nbodykit, as well as an introduction to some of the technical jargon needed to get up and running quickly. We aim to familiarize the user with the aspects of nbodykit needed to take full advantage of its computing power. This section also serves as an outline of the documentation, with links to more detailed descriptions included throughout.
lab framework
A core design goal of nbodykit is maintaining an interactive user experience, allowing the user to quickly experiment and play around with data sets and statistics, while still leveraging the power of parallel processing when necessary. Motivated by the power of Jupyter notebooks, we adopt a “lab” framework for nbodykit, where all of the necessary data containers and algorithms can be imported from a single module:
from nbodykit.lab import *
[insert cool science here]
See the documentation for nbodykit.lab for a full list of the imported members in this module.
With all of the necessary tools now in hand, the user can easily load a data set, compute statistics of that data via one of the built-in algorithms, and save the results in just a few lines. The end result is a reproducible scientific result, generated from clear and concise code that flows from step to step.
We use the logging module throughout nbodykit to provide the user with output as scripts progress. This is especially helpful when problems are encountered when using nbodykit in parallel. Users can turn on logging via the nbodykit.setup_logging() function, optionally providing the “debug” argument to increase the logging level.
We typically begin our nbodykit scripts using:
from nbodykit.lab import *
from nbodykit import setup_logging
setup_logging() # log output to stdout
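For readers curious about what a helper of this kind typically does, the sketch below configures Python's standard logging module to write time-stamped messages to stdout. It is not nbodykit's implementation: the function name configure_stdout_logging, the logger name "demo", and the message format are all assumptions made for illustration.

```python
import logging
import sys


def configure_stdout_logging(level="info"):
    """Hypothetical helper: send log records to stdout at the given level."""
    levels = {"info": logging.INFO, "debug": logging.DEBUG}
    logger = logging.getLogger("demo")  # named logger, so the root logger is untouched
    logger.setLevel(levels[level])
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("[ %(asctime)s ] %(name)s %(levelname)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger


logger = configure_stdout_logging("debug")
logger.debug("reading catalog")       # emitted only at the "debug" level
logger.info("computing statistics")   # emitted at both levels
```

Passing "debug" lowers the threshold so that more detailed messages are shown, which is usually what one wants when diagnosing a problem in a parallel run.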
The nbodykit package is fully parallelized using the Python bindings of the Message Passing Interface (MPI) available in mpi4py. While we aim to hide most of the complexities of MPI from the top-level user interface, it is helpful to know some basic aspects of the MPI framework for understanding how nbodykit works to compute its results. If you are unfamiliar with MPI, a good place to start is the documentation for mpi4py. Briefly, MPI allows nbodykit to use a specified number of CPUs, which work independently to achieve a common goal and pass messages back and forth to coordinate their work.
We provide a more in-depth discussion of the key MPI-related features of nbodykit in the Parallel Computation with nbodykit section. This section also includes a guide on how to execute nbodykit scripts in parallel using MPI.
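The idea of independent workers that combine their partial results can be illustrated without MPI at all. The sketch below is a plain-Python simulation: a loop over slices stands in for separate processes, and a final sum stands in for a reduce operation. The even split of the data and the helper name split_range are assumptions for illustration, not a description of how nbodykit distributes data.

```python
def split_range(n_items, size):
    """Return (start, stop) pairs assigning n_items as evenly as possible to `size` ranks."""
    base, extra = divmod(n_items, size)
    bounds = []
    start = 0
    for rank in range(size):
        # The first `extra` ranks each take one additional item.
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


data = list(range(10))  # stand-in for a column of catalog data
size = 4                # number of simulated ranks

# Each simulated rank works only on its own slice of the data.
partial_sums = [sum(data[start:stop]) for start, stop in split_range(len(data), size)]

# The "message passing" step: combine the per-rank results into one global answer.
total = sum(partial_sums)
```

In a real MPI program each rank would run the same script in its own process, and the final combination would be a collective call rather than a local sum.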
nbodykit includes a cosmology calculator, Cosmology, as well as several built-in cosmologies, in the nbodykit.cosmology module. This class uses the CLASS CMB Boltzmann code for the majority of its cosmology calculations by using the Python binding of the CLASS code provided by the classylss package. As such, the syntax used in the Cosmology class largely follows that of the CLASS code.
To best interface with CLASS, and avoid unnecessary confusion, nbodykit assumes a default set of units:
We choose to define quantities with respect to the dimensionless Hubble parameter \(h\) when appropriate. Users should always take care when loading data to verify that the units follow the conventions defined here. Also, note that when simulated data is generated by nbodykit, e.g., in HODCatalog, the units of quantities such as position and velocity will follow the above conventions.
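As a worked example of what defining a quantity with respect to \(h\) means in practice, the sketch below converts a length between Mpc and Mpc/h. It assumes distances are expressed in Mpc/h, and the value h = 0.7 and the function names are illustrative only. A length of d Mpc has the numerical value d times h when quoted in Mpc/h.

```python
def mpc_to_mpc_over_h(distance_mpc, h):
    """Express a distance given in Mpc in units of Mpc/h."""
    return distance_mpc * h


def mpc_over_h_to_mpc(distance_mpc_over_h, h):
    """Express a distance given in Mpc/h in units of Mpc."""
    return distance_mpc_over_h / h


h = 0.7            # illustrative value of the dimensionless Hubble parameter
box_mpc = 1000.0   # a box side of 1000 Mpc

box = mpc_to_mpc_over_h(box_mpc, h)   # the same length in Mpc/h
back = mpc_over_h_to_mpc(box, h)      # round trip back to Mpc
```

A check of this kind is worth doing whenever data from another code is loaded, since a missing factor of h is easy to overlook.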
The nbodykit.cosmology module also includes functionality for computing the theoretical linear power spectrum (using CLASS or analytic transfer functions), correlation functions, and the Zel’dovich power spectrum. See the Cosmological Calculations section for more details.
The algorithms in nbodykit interface with user data in two main ways: “object catalogs” and “mesh fields”.
Catalogs hold columns of data for a set of discrete objects, typically galaxies. The columns typically include the three-dimensional positions of the objects, as well as properties of the object, e.g., mass, luminosity, etc. The catalog container represents the attributes of the objects as columns in the catalog. A catalog object behaves much like a structured NumPy array, with a fixed size and named data type fields, except that the data is provided by the random-access interface.
Catalog objects are subclasses of the CatalogSource base class and live in the nbodykit.source.catalog module. We provide several different subclasses that are capable of loading data from a variety of file formats on disk. We also provide catalog classes that can generate a simulated set of particles. Users can find a more in-depth discussion of catalog data in Discrete Data Catalogs. For a full list of available catalogs, see the API docs.
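To make the comparison with a structured NumPy array concrete, the sketch below builds one by hand and accesses its named fields. This is plain NumPy rather than an nbodykit catalog, and the column names and values are illustrative.

```python
import numpy as np

# Three objects, each with a 3-vector position and a scalar mass.
dtype = [("Position", "f8", (3,)), ("Mass", "f8")]
objects = np.zeros(3, dtype=dtype)
objects["Position"] = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
objects["Mass"] = [1.0e12, 2.0e12, 5.0e11]

print(objects.dtype.names)         # the named fields, i.e. the "columns"
print(objects["Position"].shape)   # one row per object, three coordinates each

# Select rows by a condition on one column.
heavy = objects[objects["Mass"] > 8.0e11]
```

The array has a fixed number of rows, and each column is retrieved by name, which is the behaviour the catalog container mirrors.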
The mesh container is fundamentally different from the catalog object. It stores a discrete representation of a continuous fluid field on a uniform mesh. The array values on the mesh are generated via a process referred to as “painting” in nbodykit. During the painting step, the positions of the discrete objects in a catalog are interpolated onto a uniform mesh. The fluid field on the mesh is often the density field, as sampled by the discrete galaxy positions.
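The painting step can be sketched in a few lines of NumPy. The example below uses nearest-grid-point assignment, the simplest interpolation scheme, in a periodic box. It is an illustration of the concept and not nbodykit's painter, which may use a different interpolation window. The function name paint_ngp and all parameter values are assumptions.

```python
import numpy as np


def paint_ngp(positions, box_size, n_mesh):
    """Count objects per cell on an n_mesh^3 grid (nearest-grid-point assignment)."""
    cell = box_size / n_mesh
    # Integer cell index along each axis; the modulo wraps positions periodically.
    idx = np.floor(positions / cell).astype(int) % n_mesh
    mesh = np.zeros((n_mesh, n_mesh, n_mesh))
    # np.add.at accumulates correctly when several objects fall in the same cell.
    np.add.at(mesh, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return mesh


rng = np.random.default_rng(42)
positions = rng.uniform(0.0, 100.0, size=(1000, 3))  # random points in a box of side 100
counts = paint_ngp(positions, box_size=100.0, n_mesh=8)

# Normalize by the mean count to obtain the density relative to the mean.
one_plus_delta = counts / counts.mean()
```

Every object contributes to exactly one cell here, so the mesh sums to the number of objects. Higher-order schemes spread each object over neighbouring cells but conserve the same total.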
Mesh objects are subclasses of the MeshSource base class and live in the nbodykit.source.mesh module. We provide subclasses that are capable of loading mesh data from disk or from a NumPy array, as well as classes that can generate simulated meshes.
Furthermore, any catalog object can be converted to a mesh object via the to_mesh() function. This function returns a CatalogMesh object, which is a view of a CatalogSource as a MeshSource. A CatalogMesh “knows” how to generate the mesh data from the catalog data, i.e., the user has specified the desired size of the mesh, etc. using the to_mesh() function.
The Data on a Mesh section describes mesh objects in more detail. In particular, more details regarding the creation of mesh objects from catalogs can be found in Creating a Mesh. See the API docs for a full list of available meshes.
The design of nbodykit focuses on a component-based approach. The components are exposed to the Python language as a set of classes and interfaces, and users can combine these components to construct complex applications. This design differs from the more commonly used alternative in cosmology software, which is a monolithic application controlled by a single configuration file (e.g., as in CLASS, CAMB, Gadget). From experience, we have found that a component-based approach offers the user greater freedom and flexibility to build complex applications with nbodykit.
In the figure above, we diagram the important interfaces and components of nbodykit. There are a few items worth highlighting in more detail:
Catalog: as discussed in the previous section, catalog objects derive from the CatalogSource class and hold information about discrete objects. Catalogs also implement a random-read interface that allows the user to access individual columns of data. The random-read nature of the column access makes use of the high throughput of a parallel file system when nbodykit is executed in parallel. However, the backend of the random-read interface does not have to be a file on disk at all. As an example, the ArrayCatalog simply converts a dictionary or a NumPy array object to a CatalogSource.
Mesh: as discussed in the previous section, mesh objects derive from the MeshSource class and store a discrete representation of a continuous quantity on a uniform mesh. These objects expose a “paintable” interface to the user via the paint() function. Calling this function re-samples the fluid field represented by the mesh object to a distributed three-dimensional array (returning either a RealField or ComplexField, as implemented by the pmesh package). See the Dealing with Data on a Mesh section for more details.
Serialization: most objects in nbodykit are serializable via a save() function. For a more in-depth discussion of serialization, see Saving your Results.
Algorithm classes not only save the result of the algorithm but also input parameters and meta-data stored in the attrs dictionary. Algorithms typically implement both a save() and load() function, such that the algorithm result can be de-serialized into an object of the same type. For example, the result of the FFTPower algorithm can be serialized with the save() function and the algorithm re-initialized with the load() function.
The two main data containers, catalogs and meshes, can be serialized using nbodykit’s intrinsic format, which relies on bigfile. The relevant functions are save() for catalogs and save() for meshes. These serialized results can later be loaded from disk by nbodykit as a BigFileCatalog or BigFileMesh object.
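The save and load pattern described above for algorithm results can be sketched with a toy class. Everything here is hypothetical: the class ToyResult, the JSON layout, and the attribute values are assumptions chosen for illustration, and nbodykit's own on-disk formats may differ. The point is only that the result and its attrs metadata travel together and are restored into an object of the same type.

```python
import json
import os
import tempfile


class ToyResult:
    """Hypothetical container: a result plus the metadata needed to reproduce it."""

    def __init__(self, values, attrs):
        self.values = list(values)
        self.attrs = dict(attrs)

    def save(self, path):
        # Write the result and its metadata side by side in one file.
        with open(path, "w") as f:
            json.dump({"values": self.values, "attrs": self.attrs}, f)

    @classmethod
    def load(cls, path):
        # Rebuild an object of the same type from the saved state.
        with open(path) as f:
            state = json.load(f)
        return cls(state["values"], state["attrs"])


result = ToyResult([1.0, 0.5, 0.25], attrs={"Nmesh": 64, "mode": "1d"})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.json")
    result.save(path)
    restored = ToyResult.load(path)
```

Storing the input parameters next to the output means a saved file is enough to tell how the result was produced.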
dask
The data columns of catalog objects are stored as dask arrays rather than the similar, more traditional NumPy arrays. Users unfamiliar with the dask package should start with the On Demand IO via dask.array section of the docs. Briefly, there are two main features to keep in mind when dealing with dask arrays:
1. Unlike operations on NumPy arrays, operations on a dask array are not evaluated immediately but are instead stored internally in a task graph. Thus, the usual array manipulations on dask arrays are nearly immediate.
2. A dask array can be evaluated, returning a NumPy array, via a call to the compute() function of the dask array. This operation can be time-consuming, since it evaluates all of the operations in the array’s task graph.
In most situations, users should manipulate catalog columns as they would NumPy arrays and allow the nbodykit internals to call the necessary compute() function to get the final result. When possible, users should opt to use the functions defined in the dask.array module instead of the equivalent functions defined in numpy. The dask.array module is designed to provide the same functionality as the numpy package but for dask arrays.
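The two features listed above can be seen in a short example that uses dask.array directly, without nbodykit. The array contents and chunk size are arbitrary choices for illustration.

```python
import numpy as np
import dask.array as da

# Wrap a small NumPy array as a dask array split into two chunks.
x = da.from_array(np.arange(6, dtype="f8"), chunks=3)

# Building the expression is cheap: nothing is computed yet, only recorded.
y = da.sqrt(x ** 2 + 1.0)
print(type(y))        # still a dask array

# compute() evaluates the task graph and returns a NumPy array.
result = y.compute()
print(type(result))
```

Note the use of da.sqrt rather than the NumPy function of the same name, in line with the recommendation above to prefer the dask.array module for operations on dask arrays.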
nbodykit aims to implement a canonical set of algorithms in the field of large-scale structure. The goal is to provide open source, state-of-the-art implementations of the most well-known algorithms used in the analysis of large-scale structure data. We have a wide and growing range of algorithms implemented so far. Briefly, nbodykit includes functionality for:
For a full list of the available algorithms, see this section of the docs. We also aim to provide examples of many of the algorithms in The Cookbook.
The algorithms in nbodykit couple to data through the catalog and mesh objects described in the previous sections. Algorithms in nbodykit are implemented as Python classes. When the class is initialized, the algorithm is run and the returned instance holds the corresponding results via attributes. The specific attributes that hold the results vary from algorithm to algorithm, and we direct users to the API docs to determine the specifics for a particular algorithm. Furthermore, the algorithm result can be serialized to disk for archiving. We also ensure that the appropriate meta-data is serialized to disk in order to sufficiently describe the input parameters for reproducibility.
As open source software, we hope community contributions will help to maximize the utility of the nbodykit package for its users. We believe community contributions and review can help increase scientific productivity for all researchers. If your favorite algorithm isn’t yet implemented, we encourage contributions and feature requests from the community (see our contributing guidelines).
We’ve created a cookbook of recipes for users to learn nbodykit by example. These recipes are designed to illustrate interesting and common uses of nbodykit for users to learn from. The goal is to have working examples for most of the algorithms in nbodykit, as well as some of the more common data tasks.
The recipes are provided as Jupyter notebooks. Each notebook is available for download by clicking the “Source” link in the navigation bar at the top of the page.
We welcome contributions of new recipes! See our contributing guidelines.
If you’ve run into problems with nbodykit, do not hesitate to get in touch with us. See our Contact and Support section for details on how to best contact us.
User contributions are also very welcome! Please see our contributing guidelines if you’d like to help grow the nbodykit project!