a massively parallel, large-scale structure toolkit

nbodykit is an open source project written in Python that provides a set of state-of-the-art, large-scale structure algorithms useful in the analysis of cosmological datasets from N-body simulations and observational surveys. All algorithms are massively parallel and run using the Message Passing Interface (MPI).

Driven by optimism about the abundance and availability of large-scale computing resources in the future, the development of nbodykit distinguishes itself from similar software packages (e.g., nbodyshop, pynbody, yt, xi) by focusing on:

  • a unified treatment of simulation and observational datasets by insulating algorithms from data containers
  • support for a wide variety of data formats, as well as large volumes of data
  • the ability to reduce wall-clock time by scaling to thousands of cores
  • deployment and availability on large supercomputing facilities
  • an interactive user interface that performs as well in a Jupyter notebook as on a supercomputing machine

Learning By Example

For users who wish to dive right in, an interactive environment containing our cookbook recipes is available via the BinderHub service.

See The Cookbook for descriptions of the various notebook recipes.

Getting nbodykit

To get up and running with your own copy of nbodykit, please follow the installation instructions. nbodykit is currently supported on macOS and Linux. The recommended installation method uses the Anaconda Python distribution.

nbodykit is compatible with Python versions 2.7, 3.5, and 3.6, and the source code is publicly available at https://github.com/bccp/nbodykit.

A 1-minute introduction to nbodykit

To start, we initialize the nbodykit “lab”:

from nbodykit.lab import *

There are two core data structures in nbodykit: catalogs and meshes. These represent the two main ways astronomers interact with data in large-scale structure analysis. Catalogs hold information describing a set of discrete objects, storing the data in columns. nbodykit includes functionality for initializing catalogs from a variety of file formats as well as more advanced techniques for generating catalogs of simulated particles.

Below, we create a very simple catalog of uniformly distributed particles in a box of side length \(L = 1 \ h^{-1} \mathrm{Mpc}\):

catalog = UniformCatalog(nbar=100, BoxSize=1.0)
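
As a rough picture of what such a catalog contains, the NumPy-only sketch below draws a random number of particles and places them uniformly in a cubic box. It assumes that nbar is a number density, so that the expected particle count is nbar times the box volume, and that the count is Poisson-sampled around that value. The helper name `make_uniform_positions` and its `seed` argument are hypothetical and not part of nbodykit.

```python
import numpy as np

def make_uniform_positions(nbar, box_size, seed=42):
    """Draw uniformly distributed 3D positions in a cubic box.

    Hypothetical helper for illustration only: the particle count is
    Poisson-sampled around nbar * box_size**3 (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n_expected = nbar * box_size**3
    n_particles = rng.poisson(n_expected)
    # one row per particle, one column per Cartesian coordinate
    return rng.uniform(0.0, box_size, size=(n_particles, 3))

positions = make_uniform_positions(nbar=100, box_size=1.0)
print(positions.shape)  # (number of particles, 3); the count varies with the seed
```

With nbar=100 and a unit box, the catalog holds roughly one hundred particles.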

Catalogs have a fixed size and a set of columns describing the particle data. In this case, our catalog has “Position” and “Velocity” columns. Users can easily manipulate the existing column data or add new columns:

BoxSize = 2500.
catalog['Position'] *= BoxSize # re-normalize units of Position
catalog['Mass'] = 10**(numpy.random.uniform(12, 15, size=len(catalog))) # add random masses, log-uniform between 1e12 and 1e15

We can generate a representation of the density field on a mesh using our catalog of objects. Here, we interpolate the particles onto a mesh of size \(64^3\):

mesh = catalog.to_mesh(Nmesh=64, BoxSize=BoxSize)
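
To make the interpolation step concrete, here is a minimal NumPy sketch of the simplest possible scheme, in which every particle is deposited entirely into the cell that contains it. This is not nbodykit's implementation, which may use smoother interpolation windows. The function name `paint_ngp` and the choice to normalize the field to unit mean are assumptions made for this illustration.

```python
import numpy as np

def paint_ngp(positions, box_size, n_mesh):
    """Deposit particles onto a periodic n_mesh^3 grid (hypothetical helper).

    Returns the particle counts per cell divided by the mean count, so the
    output field has unit mean.
    """
    # integer index of the cell containing each particle, wrapped periodically
    cells = np.floor(positions / box_size * n_mesh).astype(int) % n_mesh
    counts = np.zeros((n_mesh,) * 3)
    # np.add.at accumulates correctly when several particles share a cell
    np.add.at(counts, (cells[:, 0], cells[:, 1], cells[:, 2]), 1.0)
    # normalize by the mean count per cell
    return counts / counts.mean()

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 2500.0, size=(1000, 3))
field = paint_ngp(positions, box_size=2500.0, n_mesh=8)
```

Coarser meshes smooth the field more strongly. Finer meshes resolve smaller scales at the cost of memory that grows as the cube of the mesh size.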

We can save our mesh to disk to later re-load using nbodykit:

mesh.save('mesh.bigfile') # the output file name is arbitrary

or preview a low-resolution, 2D projection of the mesh to make sure everything looks as expected:

import matplotlib.pyplot as plt
plt.imshow(mesh.preview(axes=[0,1], Nmesh=32))

Finally, we can feed our density field mesh into one of the nbodykit algorithms. For example, below we use the FFTPower algorithm to compute the power spectrum \(P(k,\mu)\) of the density mesh using a fast Fourier transform via

result = FFTPower(mesh, mode='2d', Nmu=5)

with the measured power stored as the power attribute of the result variable. The algorithm result, along with meta-data such as the input parameters, can then be saved to disk as a JSON file:

result.save('power-result.json') # the output file name is arbitrary
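
For readers curious about what an FFT-based power spectrum estimate involves, the following is a simplified NumPy sketch of a spherically averaged \(P(k)\). It is not the FFTPower implementation: it omits the \(\mu\) binning, any correction for the mesh interpolation window, and shot-noise handling. The function name `power_spectrum_1d`, the binning choices, and the normalization \(P(k) = V |\delta_k|^2\) are assumptions of this sketch.

```python
import numpy as np

def power_spectrum_1d(field, box_size, n_bins=8):
    """Spherically averaged power spectrum of a periodic cubic field (sketch)."""
    n_mesh = field.shape[0]
    delta = field / field.mean() - 1.0                  # overdensity
    delta_k = np.fft.fftn(delta) / n_mesh**3            # normalized Fourier modes
    power = box_size**3 * np.abs(delta_k) ** 2          # P(k) = V |delta_k|^2

    # magnitude of the wavevector for every Fourier mode
    k_axis = 2.0 * np.pi * np.fft.fftfreq(n_mesh, d=box_size / n_mesh)
    kx, ky, kz = np.meshgrid(k_axis, k_axis, k_axis, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2 + kz**2)

    # linear bins from half the fundamental mode up to the Nyquist frequency;
    # the k = 0 mode and modes beyond Nyquist fall outside the bins
    k_fund = 2.0 * np.pi / box_size
    k_nyq = np.pi * n_mesh / box_size
    edges = np.linspace(0.5 * k_fund, k_nyq, n_bins + 1)

    n_modes, _ = np.histogram(k_mag, bins=edges)
    k_sum, _ = np.histogram(k_mag, bins=edges, weights=k_mag)
    p_sum, _ = np.histogram(k_mag, bins=edges, weights=power)
    n_safe = np.maximum(n_modes, 1)                     # guard against empty bins
    return k_sum / n_safe, p_sum / n_safe, n_modes

# test field: a single plane wave along the first axis with k = 2 * (2 pi / L)
n_mesh, box_size, amp = 16, 1.0, 0.1
grid = np.arange(n_mesh)
wave = amp * np.cos(2.0 * np.pi * 2 * grid / n_mesh)
field = 1.0 + wave[:, None, None] * np.ones((n_mesh,) * 3)
k_mean, pk, n_modes = power_spectrum_1d(field, box_size)
```

For this single plane wave, all of the power lands in the bin containing its wavenumber, which makes it a convenient check of the normalization.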


It is important to remember that nbodykit is fully parallelized using MPI. This means that the above code snippets can be executed in a Jupyter notebook with only a single CPU or in a standalone Python script with an arbitrary number of MPI workers. We aim to hide as much of the parallel abstraction from users as possible. When executing in parallel, data will automatically be divided amongst the available MPI workers, and each worker computes its own smaller portion of the algorithm result before these partial calculations are combined into the final result.
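
The divide-and-combine pattern described above can be illustrated with a serial stand-in that needs no MPI installation. In this sketch, a list of array chunks plays the role of the data held by each worker, and a plain Python sum plays the role of the collective reduction. The names `partial_sums` and `n_workers` are hypothetical. In a real parallel run each chunk would live on a separate MPI process.

```python
import numpy as np

def partial_sums(chunk):
    """Local work of one simulated worker: sum and count of its own data."""
    return chunk.sum(), chunk.size

rng = np.random.default_rng(1)
masses = 10 ** rng.uniform(12, 15, size=1000)

n_workers = 4
chunks = np.array_split(masses, n_workers)        # stand-in for distributing data across workers
partials = [partial_sums(c) for c in chunks]      # each worker computes its local result
total = sum(p[0] for p in partials)               # stand-in for a reduction across workers
count = sum(p[1] for p in partials)
mean_mass = total / count                         # agrees with the single-process answer
```

Because only the small partial results are combined, the amount of data exchanged between workers stays tiny compared with the data each worker holds.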

Getting Started

We also provide detailed overviews of the two main data containers in nbodykit, catalogs and meshes, and we walk through the necessary background information for each of the available algorithms in nbodykit. The main areas of the documentation can be broken down into the following sub-sections: