
a massively parallel large-scale structure toolkit¶
nbodykit is an open source project and Python package providing a set of algorithms useful in the analysis of cosmological datasets from N-body simulations and large-scale structure surveys.
Driven by optimism regarding the future abundance and availability of large-scale computing resources, the development of nbodykit distinguishes itself from similar software packages (e.g., nbodyshop, pynbody, yt, xi) by focusing on:
- a unified treatment of simulation and observational datasets by insulating algorithms from data containers
- reducing wall-clock time by scaling to thousands of cores
- deployment and availability on large supercomputing facilities
All algorithms are parallel and run with Message Passing Interface (MPI).
For users of the NERSC supercomputers, we provide a ready-to-use tarball of nbodykit and its dependencies; see Using nbodykit on NERSC for more details.
Documentation¶
Installation¶
Required dependencies¶
The well-established dependencies are:
- Python 2.7 or 3.4
- scipy, numpy : the foundations for scientific Python
- mpi4py : MPI for Python
- h5py : support for HDF5 files in Python
with a suite of additional tools:
- astropy : a community Python library for astronomy
- pfft-python : a Python binding of pfft, a massively parallel Fast Fourier Transform implementation with pencil domains
- pmesh : a particle mesh framework in Python
- kdcount : pair-counting and Friends-of-Friends clustering with KD-Tree
- bigfile : a reproducible, massively parallel IO library for hierarchical data
- MP-sort : massively parallel sorting
- sharedmem : in-node parallelism with fork and copy-on-write
Optional dependencies¶
For reading data using pandas¶
- pandas and pytables are also required
Instructions¶
The software is designed to be installed with the pip utility like a regular Python package. The first step on all supported platforms is to check out the source code via:
git clone http://github.com/bccp/nbodykit
cd nbodykit
Linux¶
The steps listed below are intended for a commodity Linux-based cluster (e.g., a Rocks cluster) or a Linux-based workstation / laptop.
To install the main nbodykit package, as well as the external dependencies listed above, into the default Python installation directory:
pip install -r requirements.txt
pip install -U --force --no-deps .
A different installation directory can be specified via the --user or --root <dir> options of the pip install command.
Mac OS X¶
The autotools software is needed on Mac:
sudo port install autoconf automake libtool
Using recent versions of MacPorts, we also need to tell mpicc to use gcc rather than the default clang compiler, which does not compile fftw correctly due to its lack of openmp support. Additionally, the LDSHARED environment variable must be set explicitly.
In bash, the installation command is:
export OMPI_CC=gcc
export LDSHARED="mpicc -bundle -undefined dynamic_lookup -DOMPI_IMPORTS"; pip install -r requirements.txt
pip install -U --force --no-deps .
Development Mode¶
nbodykit can be installed with the development mode (-e) of pip:
pip install -r requirements.txt -e .
In addition to the dependency packages, the ‘development’ installation of nbodykit may require a forced update from time to time:
pip install -U --force --no-deps -e .
If the above command does not appear to update the installation as expected, it is sometimes necessary to manually remove the nbodykit directory in site-packages.
Final Notes¶
The dependencies of nbodykit are not fully stable, so we recommend occasionally updating the external dependencies via the -U option of pip install. The --force option ensures that the version from the current source checkout is installed:
pip install -U -r requirements.txt
pip install -U --force --no-deps .
To confirm that nbodykit is working, we can type, in an interactive Python session:
import nbodykit
print(nbodykit)
import kdcount
print(kdcount)
import pmesh
print(pmesh)
Or try the scripts in the bin directory:
cd bin/
mpirun -n 4 python nbkit.py -h
To run the test suite after installing nbodykit, install pytest and pytest-pipeline and run pytest nbodykit from the base directory of the source code:
pip install pytest pytest-pipeline
pytest nbodykit
Using nbodykit on NERSC¶
In this section, we give instructions for using the latest stable build of nbodykit on NERSC machines (Edison and Cori), which is provided ready-to-use and is recommended for first-time users. For more advanced users, we also provide instructions for performing active development of the source code on NERSC.
When using nbodykit on NERSC, we need to ensure that the Python environment is set up to work efficiently on the computing nodes. The default Python start-up time scales badly with the number of processes, so we employ the python-mpi-bcast tool to ensure fast and reliable start-up times when using nbodykit. This tool can be accessed on both the Cori and Edison machines.
General Usage¶
We maintain a daily build of the latest stable version of nbodykit on NERSC systems that works with the 2.7-anaconda Python module and uses the python-mpi-bcast helper tool for fast startup of Python. Please see this tutorial for further details about using python-mpi-bcast to launch Python applications on NERSC.
In addition to up-to-date builds of nbodykit, we provide a tool (/usr/common/contrib/bccp/nbodykit/activate.sh) designed to be used in job scripts to automatically load nbodykit and ensure a fast startup time with python-mpi-bcast. Below is an example job script that prints the help message of the FFTPower algorithm:
#!/bin/bash
#SBATCH -p debug
#SBATCH -o nbkit-example
#SBATCH -n 16
# You can also allocate the nodes with salloc
#
# salloc -n 16
#
# and type the commands in the shell obtained from salloc
module unload python
module load python/2.7-anaconda
source /usr/common/contrib/bccp/nbodykit/activate.sh
# regular nbodykit command lines
# replace nbkit.py with srun-nbkit
srun-nbkit -n 16 FFTPower --help
# You can also run this in an interactive shell obtained with salloc
Active development¶
If you would like to use your own development version of nbodykit directly on NERSC, more installation work is required, although we also provide tools to simplify this process.
We can divide the additional work into 3 separate steps:
1. When building nbodykit on a NERSC machine, we need to ensure the Python environment is set up to work efficiently on the computing nodes.
If darshan or altd are loaded by default, be sure to unload them before installing, as they tend to interfere with Python:
module unload darshan
module unload altd
and, preferably, use the GNU compilers from PrgEnv-gnu:
module unload PrgEnv-intel
module unload PrgEnv-cray
module load PrgEnv-gnu
Then, load the Anaconda Python distribution:
module load python/2.7-anaconda
For convenience, these lines can be included in the shell profile configuration file on NERSC (i.e., ~/.bash_profile.ext).
2. For easy loading of nbodykit on the compute nodes, we provide tools to create separate bundles (tarballs) of the nbodykit source code and dependencies. This can be done using the build.sh script in the nbodykit/nersc directory of the source code tree.
cd nbodykit/nersc;
# build the dependencies into a bundle
# this creates the file `$NERSC_HOST/nbodykit-dep.tar.gz`
bash build.sh deps
# build the source code into a separate bundle
# this creates the file `$NERSC_HOST/nbodykit.tar.gz`
bash build.sh source
When the source code changes or the dependencies need to be updated, simply repeat the relevant build.sh command given above to regenerate the bundle.
3. Finally, in the job script, we must explicitly activate python-mpi-bcast and load the nbodykit bundles.
#!/bin/bash
#SBATCH -p debug
#SBATCH -o nbkit-dev-example
#SBATCH -n 16
# load anaconda
module unload python
module load python/2.7-anaconda
# activate python-mpi-bcast
source /usr/common/contrib/bccp/python-mpi-bcast/nersc/activate.sh
# go to the nbodykit source directory
cd /path/to/nbodykit
# bcast the nbodykit tarballs
bcast nersc/$NERSC_HOST/nbodykit-dep.tar.gz nersc/$NERSC_HOST/nbodykit.tar.gz
# run the main nbodykit executable
srun -n 16 python-mpi /dev/shm/local/bin/nbkit.py FFTPower --help
Overview¶
nbodykit aims to take advantage of the wealth of large-scale computing resources by providing a massively-parallel toolkit to tackle a wide range of problems that arise in the analysis of large-scale structure datasets.
A major goal of the project is to provide a unified treatment of both simulation and observational datasets, allowing nbodykit to be used in the analysis of not only N-body simulations, but also data from current and future large-scale structure surveys.
nbodykit implements a framework that insulates analysis algorithms from data containers by relying on plugins that interact with the core of the code base through distinct extension points. Such a framework allows the user to create plugins designed for a specific task, which can then be easily loaded by nbodykit, provided that the plugin implements the minimal interface required by the desired extension point.
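To make the mount-point idea concrete, below is a minimal, hypothetical sketch of such a plugin registry. The names are invented for illustration and do not reflect nbodykit's actual implementation, which is described in Extending nbodykit:

# a minimal sketch of the plugin/extension-point pattern
# (hypothetical names, not nbodykit's actual implementation)
registry = {}

def register_plugin(cls):
    """Record a plugin class in the extension point's registry."""
    registry[cls.__name__] = cls
    return cls

@register_plugin
class MyDataSource(object):
    """A toy plugin implementing a minimal DataSource-like interface."""
    def read(self):
        return [1.0, 2.0, 3.0]

# the core code can now look up and instantiate plugins by name
plugin = registry["MyDataSource"]()
print(plugin.read())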
We provide several built-in extension points and plugins, which we outline below. For more detailed instructions on how to add new plugins to nbodykit, see Extending nbodykit.
Extension Points¶
There are several built-in extension points, which can be found in the nbodykit.core module. These classes serve as the mount points for plugins, connecting the core of the nbodykit package to the individual plugin classes. Each extension point defines a specific interface that all plugins of that type must implement.
There are four built-in extension points. Each extension point carries a registry, which stores all plugins of that type that have been successfully loaded by the main nbodykit code.
Algorithm
- location: nbodykit.core.Algorithm
- registry: nbodykit.algorithms
- description: the mount point for plugins that run one of the high-level algorithms, e.g., a power spectrum calculation or Friends-of-Friends halo finder

DataSource
- location: nbodykit.core.DataSource
- registry: nbodykit.datasources
- description: the mount point for plugins that handle the reading of input data files

Painter
- location: nbodykit.core.Painter
- registry: nbodykit.painters
- description: the mount point for plugins that “paint” input data files, where painting refers to the process of gridding a desired quantity on a mesh; the most common example is gridding the density field of a catalog of objects

Transfer
- location: nbodykit.core.Transfer
- registry: nbodykit.transfers
- description: the mount point for plugins that apply a kernel to the painted field in Fourier space during power spectrum calculations
Plugins¶
Plugins are subclasses of an extension point that are designed to handle a specific task, such as reading a certain type of data file, or computing a specific type of algorithm.
The core of the nbodykit functionality comes from the built-in plugins, of which there are many. Below, we list each of the built-in plugins along with a brief description of the class. For further details, the name of each plugin links to the API reference for that class.
- Algorithms
  - BianchiFFTPower : power spectrum multipoles using FFTs for a data survey with non-trivial geometry, as detailed in Bianchi et al. 2015 (1505.05341)
  - Describe : describe a specific column of the input DataSource
  - FFTCorrelation : correlation function calculator via FFT in a periodic box
  - FFTPower : periodic power spectrum calculator via FFT
  - FOF : a Friends-of-Friends (FOF) halo finder
  - FOF6D : finding subhalos from FOF groups; a variant of FOF6D
  - FiberCollisions : the application of fiber collisions to a galaxy survey
  - PaintGrid : periodic power spectrum calculator via FFT
  - PairCountCorrelation : correlation function calculator via pair counting
  - Play : describe a specific column of the input DataSource
  - RedshiftHistogram : compute n(z) from the input DataSource
  - Subsample : create a subsample from a DataSource, and evaluate the density (1 + delta) smoothed at the given scale
  - TestBoxSize : test if all objects in a DataSource fit within a specified BoxSize
  - TidalTensor : compute the tidal force tensor
  - TraceHalo : calculate halo properties based on a different set of halo labels
- DataSource
  - FOFGroups : read data from a HDF5 FOFGroup file
  - FastPM : read snapshot files of the FastPM simulation
  - Gadget : read a flavor of Gadget 2 files (experimental)
  - GadgetGroupTab : read a flavor of Gadget 2 FOF catalogs (experimental)
  - HaloLabel : read a file of halo labels (halo id per particle), as generated by the FOF algorithm
  - MultiFile : read snapshot files of a multitype file
  - Pandas : read data from a plaintext or HDF5 file using pandas
  - PlainText : read data from a plaintext file using numpy
  - RaDecRedshift : read (ra, dec, z) from a plaintext file, returning Cartesian coordinates
  - ShiftedObserver : establish an explicit observer (outside the box) for a periodic box
  - Subsample : read data from a HDF5 Subsample file
  - TPMLabel : read a file of halo labels as generated from Martin White’s TPM
  - TPMSnapshot : read snapshot files from Martin White’s TPM
  - UniformBox : data particles with uniform positions and velocities
  - ZeldovichSim : simulated particles using the Zel’dovich approximation
  - Zheng07Hod : populate an input halo catalog with galaxies using the Zheng et al. 2007 HOD
- Painter
  - DefaultPainter : grid the density field of an input DataSource of objects, optionally using a weight for each object
  - MomentumPainter : grid the velocity-weighted density (momentum) field of an input DataSource of objects
- Transfer
  - AnisotropicCIC : divide by a Fourier-space kernel to account for the CIC gridding window function; see Jing et al 2005 (arxiv:0409240)
  - AnisotropicTSC : divide by a Fourier-space kernel to account for the TSC gridding window function; see Jing et al 2005 (arxiv:0409240)
  - CICWindow : divide by a Fourier-space kernel to account for the CIC gridding window function; see Jing et al 2005 (arxiv:0409240)
  - NormalizeDC : normalize the DC amplitude in Fourier space, which effectively divides by the mean in configuration space
  - RemoveDC : remove the DC amplitude in Fourier space, which sets the mean of the field in configuration space to zero
  - TSCWindow : divide by a Fourier-space kernel to account for the TSC gridding window function; see Jing et al 2005 (arxiv:0409240)
Running an Algorithm¶
An nbodykit Algorithm can be run using the nbkit.py executable in the bin directory. The user can ask for help with the calling signature of the script in the usual way:
python bin/nbkit.py -h
The intended usage is:
python bin/nbkit.py AlgorithmName ConfigFilename
The first argument gives the name of the algorithm plugin that the user wishes to execute, while the second argument gives the name of the file to read configuration parameters from (if no file name is given, the script will read from standard input). For a discussion of the parsing of configuration files, see Writing configuration files.
Note
If no configuration file is supplied to nbkit.py, the code will attempt to read the configuration from standard input. See Reading configuration from stdin for further details.
The nbkit.py script also provides an interface for getting help on extension points and individual plugins. A list of the configuration parameters for the built-in plugins of each extension point can be accessed by:
# prints help for all DataSource plugins
python bin/nbkit.py --list-datasources
# prints help for all Algorithm plugins
python bin/nbkit.py --list-algorithms
# prints help for all Painter plugins
python bin/nbkit.py --list-painters
# prints help for all Transfer plugins
python bin/nbkit.py --list-transfers
and the help message for an individual plugin can be printed by passing the plugin name to the --list-* option, i.e.,
# prints help message for only the FFTPower algorithm
python bin/nbkit.py --list-algorithms FFTPower
will print the help message for the FFTPower algorithm. Similarly, the help message for a specific algorithm can also be accessed by passing the algorithm name and the -h option:
python bin/nbkit.py FFTPower -h
Using MPI¶
nbodykit is designed to be run in parallel using the Message Passing Interface (MPI) and the Python package mpi4py. The executable nbkit.py can take advantage of multiple processors to run algorithms in parallel. The usage for running with n processors is:
mpirun -n [n] python bin/nbkit.py ...
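As a quick sanity check of the MPI setup, the following minimal mpi4py script (an illustration only, not an nbodykit algorithm) can be launched the same way:

# save as hello_mpi.py and run with: mpirun -n 4 python hello_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD  # the global communicator
print("hello from rank %d of %d" % (comm.rank, comm.size))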
Writing configuration files¶
The parameters needed to execute the desired algorithm should be stored in a file and passed to nbkit.py as the second argument. The configuration file should be written using YAML, which relies on the name: value syntax to parse (key, value) pairs into dictionaries in Python.
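For instance, using the PyYAML package (shown here purely to illustrate the syntax), a configuration snippet maps directly to a Python dictionary:

import yaml  # the PyYAML package

text = """
mode: 1d
Nmesh: 256
cosmo: {Om0: 0.27, H0: 100}
"""

# name: value pairs at a common indent level become one dictionary
params = yaml.safe_load(text)
print(params)
# {'mode': '1d', 'Nmesh': 256, 'cosmo': {'Om0': 0.27, 'H0': 100}}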
By example¶
The YAML syntax is best learned by example. Let’s consider the FFTPower algorithm, which computes the power spectrum of two data fields using a Fast Fourier Transform in a periodic box. The necessary parameters to initialize and run this algorithm can be accessed from the schema attribute of the FFTPower class:
# import the NameSpace holding the loaded algorithms
In [1]: from nbodykit import algorithms
# can also use algorithms.FFTPower? in IPython
In [2]: print(algorithms.FFTPower.schema)
periodic power spectrum calculator via FFT
Parameters
----------
mode : { '1d', '2d' }
compute the power as a function of `k` or `k` and `mu`
Nmesh : int
the number of cells in the gridded mesh
field : FieldType
first data field; a tuple of (DataSource, Painter, Transfer)
The 3 subfields are:
DataSource : DataSource.from_config, GridSource.from_config
the 1st DataSource; run `nbkit.py --list-datasources` for all options
Painter : Painter.from_config
the 1st Painter; run `nbkit.py --list-painters` for all options
Transfer : Transfer.from_config
the 1st Transfer chain; run `nbkit.py --list-transfers` for all options
other : FieldType, optional
the other data field; a tuple of (DataSource, Painter, Transfer)
The 3 subfields are:
DataSource : DataSource.from_config, GridSource.from_config
the 2nd DataSource; run `nbkit.py --list-datasources` for all options
Painter : Painter.from_config
the 2nd Painter; run `nbkit.py --list-painters` for all options
Transfer : Transfer.from_config
the 2nd Transfer chain; run `nbkit.py --list-transfers` for all options
los : { 'x', 'y', 'z' }, optional
the line-of-sight direction -- the angle `mu` is defined with respect to (default: z)
Nmu : int, optional
the number of mu bins to use from mu=[0,1]; if `mode = 1d`, then `Nmu` is set to 1 (default: 5)
dk : float, optional
the spacing of k bins to use; if not provided, the fundamental mode of the box is used
kmin : float, optional
the edge of the first `k` bin to use; default is 0 (default: 0.0)
quiet : bool, optional
silence the logging output (default: False)
poles : int, optional
if specified, also compute these multipoles from P(k,mu) (default: [])
paintbrush : { 'cic', 'tsc' }, optional
the density assignment kernel to use when painting; CIC (2nd order) or TSC (3rd order) (default: cic)
comm : optional
the global MPI communicator
An example configuration file for this algorithm is given below. The algorithm reads in two data files using the FastPM and FOFGroups DataSource classes and computes the cross power spectrum of the two density fields.
mode: 1d
Nmesh: 256

cosmo: {Om0: 0.27, H0: 100}

# the first field
field:
    DataSource:
        plugin: FastPM
        path: ${NBKIT_CACHE}/data/fastpm_1.0000
    Painter:
        DefaultPainter
    Transfer:
        [NormalizeDC, RemoveDC, AnisotropicCIC]

# the second field to cross-correlate with
other:
    # datasource
    DataSource:
        FOFGroups:
            path: ${NBKIT_CACHE}/data/fof_ll0.200_1.0000.hdf5
            m0: 10.0
    # painter (can omit this and get same value)
    Painter: DefaultPainter
    # transfers (can omit and get this sequence)
    Transfer: [NormalizeDC, RemoveDC, AnisotropicCIC]

output: ${NBKIT_HOME}/examples/output/test_power_cross.dat
The key aspect of YAML syntax for nbodykit configuration files is that parameters listed at a common indent level will be parsed together into a dictionary. This is illustrated explicitly with the cosmo keyword in line 4, which could have been equivalently expressed as:
cosmo:
    Om0: 0.27
    H0: 100
A few other things to note:
- The names of the parameters given in the configuration file must exactly match the names of the attributes listed in the algorithm’s schema.
- All required parameters must be listed in the configuration file, otherwise the code will raise an exception.
- The field and other parameters in this example have subfields, named DataSource, Painter, and Transfer. Parameters that are subfields must be indented from their parent parameters to indicate the relationship.
- Environment variables can be used in configuration files, using the syntax ${ENV_VAR} or $ENV_VAR. In the above file, both NBKIT_CACHE and NBKIT_HOME are assumed to be environment variables; the sketch below illustrates this kind of expansion.
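For illustration, this kind of expansion can be reproduced with Python’s os.path.expandvars; this is a sketch of the behavior, not necessarily the exact mechanism nbodykit uses:

import os

os.environ["NBKIT_CACHE"] = "/tmp/nbkit_cache"  # hypothetical value
path = "${NBKIT_CACHE}/data/fastpm_1.0000"
print(os.path.expandvars(path))
# /tmp/nbkit_cache/data/fastpm_1.0000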
Plugin representations¶
A key aspect of the nbodykit code is the use of plugins; representing them properly in configuration files is an important step in becoming a successful nbodykit user.
The function responsible for initializing plugins from their configuration file representation is from_config(). This function accepts several different ways of representing plugins, and we illustrate these methods using the previous configuration file.
1. The parameters needed to initialize a plugin can be given at a common indent level, with the keyword plugin giving the name of the plugin to load. This is illustrated for the field.DataSource parameter, which will be loaded into a FastPM DataSource:

field:
    DataSource:
        plugin: FastPM
        path: ${NBKIT_CACHE}/data/fastpm_1.0000
2. Rather than using the plugin parameter to give the name of the plugin to load, the user can indent the plugin arguments under the name of the plugin, as is illustrated below for the FOFGroups DataSource:

other:
    # datasource
    DataSource:
        FOFGroups:
            path: ${NBKIT_CACHE}/data/fof_ll0.200_1.0000.hdf5
            m0: 10.0
3. If the plugin needs no arguments to be initialized, the user can simply give the name of the plugin, as is illustrated below for the field.Painter parameter:

Painter:
    DefaultPainter
For more examples of how to accurately represent plugins in configuration files, see the myriad configuration files in the examples directory of the source code.
Specifying the output file¶
All configuration files must include the output parameter. This parameter gives the name of the file to which the results of the algorithm will be saved. The nbkit.py script will raise an exception when the output parameter is not present in the input configuration file.
Specifying the cosmology¶
For the successful reading of data using some nbodykit DataSource classes, cosmological parameters must be specified. The desired cosmology should be set in the configuration file, as is done in line 4 of the previous example. A single, global cosmology class will be initialized and passed to all DataSource objects that are created while running the nbodykit code.
The cosmology class is located at nbodykit.cosmology.Cosmology, and the syntax for the class is borrowed from the astropy.cosmology.wCDM class. The constructor arguments are:
In [3]: from nbodykit.cosmology import Cosmology
# can also do ``Cosmology?`` in IPython
In [4]: help(Cosmology.__init__)
Help on function __init__ in module nbodykit.cosmology:
__init__(self, H0=67.6, Om0=0.31, Ob0=0.0486, Ode0=0.69, w0=-1.0, Tcmb0=2.7255, Neff=3.04, m_nu=0.0, flat=False)
Initialize self. See help(type(self)) for accurate signature.
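For example, the cosmology used in the earlier configuration file could be constructed directly with this constructor:

from nbodykit.cosmology import Cosmology

# matches `cosmo: {Om0: 0.27, H0: 100}` from the earlier example file
cosmo = Cosmology(H0=100.0, Om0=0.27)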
Reading configuration from stdin¶
If no configuration file name is supplied to nbkit.py, the code will attempt to read the configuration from standard input. Note that the syntax for passing information via standard input varies by operating system and shell type, and may not be supported on all systems.
An example of such usage is given in the examples/batch directory and is listed below:
DIR=`dirname $0`
cd $DIR
[ -d ../output ] || mkdir ../output
echo testing nbkit.py from STDIN ...
echo Some openmpi implementations are buggy causing this test to hang
echo https://bugzilla.redhat.com/show_bug.cgi?id=1235044
echo use Control-C to stop this one if it hangs.
mpirun -np 2 python ../../bin/nbkit.py FFTPower <<EOF
mode: 1d
Nmesh: 256
output: ${NBKIT_HOME}/examples/output/test_stdin.dat
field:
DataSource:
plugin: FastPM
path: ${NBKIT_CACHE}/data/fastpm_1.0000
Transfer: [NormalizeDC, RemoveDC, AnisotropicCIC]
EOF
Running in batch mode¶
The nbodykit code also provides a tool to run a specific Algorithm for a set of configuration files, possibly executing the algorithms in parallel. We refer to this as “batch mode” and provide the nbkit-batch.py script in the bin directory for this purpose.
Once again, the -h flag will print the help message for this script; the intended usage is:
mpirun -n [n] python bin/nbkit-batch.py [--extras EXTRAS] [--debug] [--use_all_cpus] -i TASKS -c CONFIG AlgorithmName cpus_per_worker
The idea here is that a “template” configuration file can be passed to nbkit-batch.py via the -c option, and this file should contain special keys that will be formatted using str.format() syntax when iterating through a set of configuration files. The names of these keys and the desired values for the keys to take when iterating can be specified by the -i option.
Note
The configuration template file in “batch” mode using nbkit-batch.py should be passed explicitly with the -c option, while for normal usage of nbkit.py, the configuration file should be passed as the second positional argument.
By example¶
Let’s consider the following invocation of the nbkit-batch.py script:
mpirun -np 7 python bin/nbkit-batch.py FFTPower 2 -c examples/batch/test_power_batch.template -i "los: [x, y, z]" --extras examples/batch/extra.template
In this example, the code is executed using MPI with 7 available processors, and we have set cpus_per_worker to 2. The nbkit-batch.py script reserves one processor to keep track of the task scheduling (the “master” processor), which leaves 6 processors available for computation. With 2 cpus per worker, the script can use 3 workers to execute FFTPower algorithms in parallel. Furthermore, we have asked for 3 task values: the input configuration template will have its los key updated with the values ‘x’, ‘y’, and ‘z’. With exactly 3 tasks and 3 workers, all tasks can be computed in parallel simultaneously.
For a closer look at how the task values are updated in the template configuration file, let’s examine the template file:
cosmo: {Om0: 0.28, H0: 70}
mode: 1d
Nmesh: 256
field:
    DataSource:
        plugin: FastPM
        path: ${NBKIT_CACHE}/data/fastpm_1.0000
        rsd: {los}
    Transfer: [NormalizeDC, RemoveDC, AnisotropicCIC]
los: {los}
output: ${NBKIT_HOME}/examples/output/test_batch_power_fastpm_1d_{los}los_{tag}.dat
In this file, we see that there is exactly one task key: los. The {los} string will be updated with the values given on the command line (‘x’, ‘y’, and ‘z’), and the FFTPower algorithm will be executed for each of the resulting configuration files. The task keys are filled in using the Python string formatting syntax of str.format(), as the sketch below illustrates.
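The following simplified sketch (with a shortened, hypothetical template) shows the effect of this substitution:

# a sketch of the task-key substitution performed by nbkit-batch.py
# (shortened, hypothetical template for illustration)
template = "los: {los}\noutput: test_power_{los}los.dat"

for los in ['x', 'y', 'z']:
    # each task value yields one complete configuration
    print(template.format(los=los))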
Lastly, we have also passed a file to the nbkit-batch.py script using the --extras option. This option allows an arbitrary number of extra string keys to be formatted for each task iteration. In this example, the only “extra” key provided is {tag}, and the extra.template file looks like:
tag = ['task_1', 'task_2', 'task_3']
So, when los is updated to the first task value (‘x’), the tag key is updated to ‘task_1’, and the pattern continues for the other tasks. With this configuration, nbkit-batch.py will output 3 separate files, named:
- test_batch_power_fastpm_1d_xlos_task_1.dat
- test_batch_power_fastpm_1d_ylos_task_2.dat
- test_batch_power_fastpm_1d_zlos_task_3.dat
Multiple task keys¶
The -i flag can be passed multiple times to the nbkit-batch.py script. For example, imagine that in addition to the los task key, we also wanted to iterate over a box key. If we had two boxes, labeled 1 and 2, then we could also specify -i "box: ['1', '2']" on the command line. The task values that would be iterated over are:
(`los`, `box`) = ('x', '1'), ('x', '2'), ('y', '1'), ('y', '2'), ('z', '1'), ('z', '2')
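These combinations are the Cartesian product of the task values, as this short sketch illustrates:

from itertools import product

los_values = ['x', 'y', 'z']
box_values = ['1', '2']

# one task per element of the Cartesian product of the key values
for los, box in product(los_values, box_values):
    print("(%s, %s)" % (los, box))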
DataSet for Algorithm results¶
Several nbodykit algorithms compute two-point clustering statistics, and we provide the DataSet class for analyzing these results. The class is designed to hold data variables at fixed coordinates, i.e., a grid of \((r, \mu)\) or \((k, \mu)\) bins. The DataSet class is modeled after the syntax of xarray.Dataset, and there are several subclasses of DataSet specifically designed to hold correlation function or power spectrum results (in 1D or 2D).
For algorithms that compute power spectra, we have:
FFTPower
- computes: \(P(k, \mu)\) or \(P(k)\)
- results class: nbodykit.dataset.Power2dDataSet or nbodykit.dataset.Power1dDataSet

BianchiFFTPower
- computes: \(P(k)\)
- results class: nbodykit.dataset.Power1dDataSet
And for algorithms computing correlation functions:
FFTCorrelation, PairCountCorrelation
- computes: \(\xi(r, \mu)\) or \(\xi(r)\)
- results class: nbodykit.dataset.Corr2dDataSet or nbodykit.dataset.Corr1dDataSet
Loading results¶
To load power spectrum or correlation function results, the user must first read the plaintext files and then initialize the relevant subclass of DataSet. The functions nbodykit.files.Read2DPlainText() and nbodykit.files.Read1DPlainText() should be used for reading 2D and 1D result files, respectively. The reading and DataSet initialization can be performed in one step, taking advantage of from_nbkit():
In [1]: import os; from nbodykit import dataset, files
# `cache_dir` gives the path to the cached results (defined beforehand)
# output file of 'examples/power/test_plaintext.params'
In [2]: filename_2d = os.path.join(cache_dir, 'results', 'test_power_plaintext.dat')
# load a 2D power result
In [3]: power_2d = dataset.Power2dDataSet.from_nbkit(*files.Read2DPlainText(filename_2d))
In [4]: power_2d
Out[4]: <Power2dDataSet: dims: (k_cen: 128, mu_cen: 5), variables: ('modes', 'k', 'power', 'mu')>
# output file of 'examples/power/test_cross_power.params'
In [5]: filename_1d = os.path.join(cache_dir, 'results', 'test_power_cross.dat')
# load a 1D power result
In [6]: power_1d = dataset.Power1dDataSet.from_nbkit(*files.Read1DPlainText(filename_1d))
In [7]: power_1d
Out[7]: <Power1dDataSet: dims: (k_cen: 128), variables: ('modes', 'power.imag', 'power.real', 'k')>
Coordinate grid¶
The clustering statistics are measured for fixed bins, and the DataSet class has several attributes to access the coordinate grid defined by these bins:
- shape : the shape of the coordinate grid
- dims : the names of each dimension of the coordinate grid
- coords : a dictionary that gives the center bin values for each dimension of the grid
- edges : a dictionary giving the edges of the bins for each coordinate dimension
In [8]: print(power_1d.shape, power_2d.shape)
(128,) (128, 5)
In [9]: print(power_1d.dims, power_2d.dims)