a massively parallel, large-scale structure toolkit¶
nbodykit is an open source project written in Python that provides a set of state-of-the-art, large-scale structure algorithms useful in the analysis of cosmological datasets from N-body simulations and observational surveys. All algorithms are massively parallel and run using the Message Passing Interface (MPI).
Driven by optimism about the future abundance and availability of large-scale computing resources, nbodykit distinguishes itself from other similar software packages (e.g., nbodyshop, pynbody, yt, xi) by focusing on:
a unified treatment of simulation and observational datasets by insulating algorithms from data containers
support for a wide variety of data formats, as well as large volumes of data
the ability to reduce wall-clock time by scaling to thousands of cores
deployment and availability on large, supercomputing facilities
an interactive user interface that performs as well in a Jupyter notebook as on a supercomputing machine
Learning By Example¶
For users who wish to dive right in, an interactive environment containing our cookbook recipes is available via the BinderHub service. Just click the launch button below to get started!
See The Cookbook for descriptions of the various notebook recipes.
Getting nbodykit¶
To get up and running with your own copy of nbodykit, please follow the installation instructions. nbodykit is currently supported on macOS and Linux architectures. The recommended installation method uses the Anaconda Python distribution.
nbodykit is compatible with Python versions 2.7, 3.5, and 3.6, and the source code is publicly available at https://github.com/bccp/nbodykit.
A 1 minute introduction to nbodykit¶
To start, we initialize the nbodykit “lab”:
from nbodykit.lab import *
There are two core data structures in nbodykit: catalogs and meshes. These represent the two main ways astronomers interact with data in large-scale structure analysis. Catalogs hold information describing a set of discrete objects, storing the data in columns. nbodykit includes functionality for initializing catalogs from a variety of file formats as well as more advanced techniques for generating catalogs of simulated particles.
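As a rough mental model of the column-oriented idea (and not nbodykit's actual implementation), a catalog can be pictured as a fixed-length table of named columns, where row i across all columns describes one object:

```python
import numpy

# A toy stand-in for the catalog concept: a dict of equal-length numpy
# arrays. The column names mirror the example below; the layout is
# illustrative only and is not nbodykit's internal data structure.
rng = numpy.random.default_rng(42)
n = 100
toy_catalog = {
    "Position": rng.uniform(0.0, 1.0, size=(n, 3)),  # box coordinates
    "Velocity": rng.normal(0.0, 1.0, size=(n, 3)),   # peculiar velocities
}

# every column shares one length, so row i describes one particle
assert all(len(col) == n for col in toy_catalog.values())
```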
Below, we create a very simple catalog of uniformly distributed particles in a box of side length \(L = 1 \ h^{-1} \mathrm{Mpc}\):
catalog = UniformCatalog(nbar=100, BoxSize=1.0)
Catalogs have a fixed size and a set of columns describing the particle data. In this case, our catalog has “Position” and “Velocity” columns. Users can easily manipulate the existing column data or add new columns:
import numpy

BoxSize = 2500.
catalog['Position'] *= BoxSize # re-normalize units of Position
catalog['Mass'] = 10**(numpy.random.uniform(12, 15, size=len(catalog))) # add some random mass values
We can generate a representation of the density field on a mesh using our catalog of objects. Here, we interpolate the particles onto a mesh of size \(64^3\):
mesh = catalog.to_mesh(Nmesh=64, BoxSize=BoxSize)
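The interpolation step can be sketched with plain numpy. The toy version below uses nearest-grid-point (NGP) assignment, the simplest scheme, to show the particles-to-grid idea; nbodykit's `to_mesh` uses higher-order windows (CIC by default) and handles the bookkeeping in parallel:

```python
import numpy

# NGP painting: count particles in each mesh cell, then form the
# density contrast. A simplified sketch, not nbodykit's implementation.
rng = numpy.random.default_rng(0)
box_size = 2500.0
nmesh = 64
positions = rng.uniform(0.0, box_size, size=(1000, 3))

edges = numpy.linspace(0.0, box_size, nmesh + 1)
counts, _ = numpy.histogramdd(positions, bins=(edges, edges, edges))

# density contrast: delta = n / nbar - 1, which averages to zero
delta = counts / counts.mean() - 1.0
```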
We can save our mesh to disk to later re-load using nbodykit:
mesh.save('mesh.bigfile')
or preview a low-resolution, 2D projection of the mesh to make sure everything looks as expected:
import matplotlib.pyplot as plt
plt.imshow(mesh.preview(axes=[0,1], Nmesh=32))
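The projection itself amounts to collapsing the 3D grid along the axes that are not requested. A minimal numpy sketch of that idea (the real `preview` method also downsamples to the requested `Nmesh`):

```python
import numpy

# Collapse a 3D grid to 2D by averaging along the line of sight
# (axis 2), keeping axes 0 and 1 as in preview(axes=[0, 1]).
# Illustrative only; not nbodykit's implementation.
grid = numpy.arange(4 * 4 * 4, dtype=float).reshape(4, 4, 4)
projection = grid.mean(axis=2)
```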
Finally, we can feed our density field mesh into one of the nbodykit algorithms. For example, below we use the FFTPower algorithm to compute the power spectrum \(P(k,\mu)\) of the density mesh using a fast Fourier transform via
result = FFTPower(mesh, Nmu=5)
The measured power is stored as the power attribute of the result variable. The algorithm result and meta-data, input parameters, etc. can then be saved to disk as a JSON file:
result.save("power-result.json")
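To make the FFT-based estimator less of a black box, here is a bare-bones isotropic \(P(k)\) estimator on a periodic grid, written with numpy only. It is a sketch of the underlying idea; FFTPower additionally corrects for the mesh interpolation window, subtracts shot noise, and bins in \(\mu\) for \(P(k,\mu)\):

```python
import numpy

# Toy P(k) estimator: Fourier transform a density contrast grid,
# square the modes, and average in spherical k-bins.
# Normalization: P(k) = |delta(k)|^2 / V with delta(k) = (V/N^3) * FFT,
# giving |FFT|^2 * V / N^6. Illustrative only.
nmesh, box_size = 32, 1000.0
rng = numpy.random.default_rng(1)
delta = rng.normal(size=(nmesh, nmesh, nmesh))  # white-noise test field
delta -= delta.mean()

delta_k = numpy.fft.rfftn(delta)
pk3d = numpy.abs(delta_k) ** 2 * box_size**3 / nmesh**6

# wavenumber magnitude of every retained mode
kf = 2 * numpy.pi / box_size  # fundamental frequency of the box
kx = numpy.fft.fftfreq(nmesh, d=1.0 / nmesh) * kf
kz = numpy.fft.rfftfreq(nmesh, d=1.0 / nmesh) * kf
kmag = numpy.sqrt(kx[:, None, None] ** 2
                  + kx[None, :, None] ** 2
                  + kz[None, None, :] ** 2)

# spherical binning, starting above k = 0 to skip the mean mode
nbins = 10
edges = numpy.linspace(kf, kmag.max(), nbins + 1)
counts, _ = numpy.histogram(kmag, bins=edges)
psum, _ = numpy.histogram(kmag, bins=edges, weights=pk3d)
pk = psum / numpy.maximum(counts, 1)
```

For a unit-variance white-noise field, each binned value should scatter around the constant \(V/N^3\), which is a quick sanity check on the normalization.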
It is important to remember that nbodykit is fully parallelized using MPI. This means that the above code snippets can be executed in a Jupyter notebook with only a single CPU or in a standalone Python script with an arbitrary number of MPI workers. We aim to hide as much of the parallel abstraction from users as possible. When executing in parallel, data is automatically divided amongst the available MPI workers; each worker computes its own portion of the result, and these partial results are then combined into the final answer.
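The even-split bookkeeping behind that data division can be sketched in plain Python, no MPI required. This is a generic illustration of domain decomposition, not nbodykit's actual partitioning code:

```python
# Each MPI rank would compute which contiguous slice of the global
# catalog it owns; nbodykit performs this (and the collective
# reductions) automatically. Hypothetical helper for illustration.
def local_slice(total, size, rank):
    """Return the [start, stop) row range owned by `rank` of `size` workers."""
    base, extra = divmod(total, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# with 10 rows over 4 ranks, the slices tile the catalog exactly once
slices = [local_slice(10, 4, r) for r in range(4)]
```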
Getting Started¶
We also provide detailed overviews of the two main data containers in nbodykit, catalogs and meshes, and we walk through the necessary background information for each of the available algorithms in nbodykit. The main areas of the documentation can be broken down into the following sub-sections:
Introduction: an introduction to key nbodykit concepts and things to know
Cosmological Calculations: a guide to the cosmology-related functionality in nbodykit
Discrete Data Catalogs: a guide to working with catalogs of discrete data objects
Data on a Mesh: an overview of data on a discrete mesh
Getting Results: an introduction to the available algorithms, parallel computation, and saving/analyzing results