nbodykit.source.catalog

class nbodykit.source.catalog.CSVCatalog(*args, **kwargs)

A CatalogSource that uses CSVFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the name of the file to load
  • names (list of str) – the names of the columns of the csv file; this should give names of all the columns in the file – pass usecols to select a subset of columns
  • blocksize (int, optional) – the file will be partitioned into blocks of bytes roughly of this size
  • dtype (dict, str, optional) – if specified as a string, all columns are assumed to have this dtype; if specified as a dict, each column can have its own dtype entry; if not specified, the data types will be inferred from the file
  • usecols (list, optional) – a pandas.read_csv keyword; a subset of names to store, ignoring all other columns
  • delim_whitespace (bool, optional) – a pandas.read_csv keyword; if the CSV file is space-separated, set this to True
  • **config – additional keyword arguments that will be passed to pandas.read_csv(); see the documentation of that function for a full list of possible options
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs
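
As a rough illustration of how names, usecols, and dtype relate to each other, the sketch below parses space-separated text using only the Python standard library. It is not how CSVCatalog or pandas.read_csv is implemented: the helper read_columns and its small dtype-to-cast table are hypothetical, and columns without a dtype entry are simply read as floats here rather than having their types inferred.

```python
def read_columns(text, names, usecols=None, dtype=None):
    """Parse whitespace-separated text into a dict of column name -> values.

    Hypothetical helper: it only mimics the meaning of the names, usecols
    and dtype parameters and is not part of nbodykit or pandas.
    """
    casts = {"f4": float, "f8": float, "i4": int, "i8": int}
    # a single string applies to every column; a dict gives per-column entries
    if dtype is None:
        dtype = {}
    elif isinstance(dtype, str):
        dtype = {name: dtype for name in names}
    # names must describe every column in the file; usecols selects a subset
    keep = [name for name in names if usecols is None or name in usecols]
    columns = {name: [] for name in keep}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(names):
            raise ValueError("line %d has %d fields, expected %d"
                             % (lineno, len(fields), len(names)))
        for name, field in zip(names, fields):
            if name in columns:
                # simplification: columns without a dtype entry become floats
                columns[name].append(casts[dtype.get(name, "f8")](field))
    return columns


text = "0.5 1.5 2.5 1\n3.0 4.0 5.0 2\n"
subset = read_columns(text, names=["x", "y", "z", "id"],
                      usecols=["x", "id"], dtype={"id": "i8"})
```

Here names lists all four columns of the text, while usecols keeps only two of them and the dict form of dtype sets the type of a single column.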

Examples

Please see the documentation for examples.

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
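
To make the roles of Selection, Value, and Weight more concrete, here is a minimal one-dimensional nearest-grid-point sketch in plain Python. It assumes that each selected particle deposits Weight times Value into the cell that contains it. The function paint_ngp, the periodic wrapping, and the absence of any normalization are illustrative assumptions and do not describe what to_mesh() does internally.

```python
def paint_ngp(positions, box_size, nmesh, selection=None, value=None, weight=None):
    """Deposit particles on a 1D periodic mesh with nearest-grid-point assignment.

    Hypothetical helper; defaults mimic "select everything, unit value and weight".
    """
    n = len(positions)
    selection = [True] * n if selection is None else selection
    value = [1.0] * n if value is None else value
    weight = [1.0] * n if weight is None else weight

    mesh = [0.0] * nmesh
    cell = box_size / nmesh
    for x, sel, val, wgt in zip(positions, selection, value, weight):
        if not sel:
            continue  # particles with Selection == False do not contribute
        index = int((x % box_size) / cell) % nmesh
        mesh[index] += wgt * val
    return mesh


mesh = paint_ngp([0.1, 0.6, 0.7, 3.9], box_size=4.0, nmesh=4,
                 selection=[True, True, False, True],
                 weight=[1.0, 2.0, 5.0, 1.0])
```

The third particle is dropped by the selection, so only the remaining three contribute to the mesh.
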

class nbodykit.source.catalog.BinaryCatalog(*args, **kwargs)

A CatalogSource that uses BinaryFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the name of the binary file to load
  • dtype (numpy.dtype or list of tuples) – the dtypes of the columns to load; this should be either a numpy.dtype or be able to be converted to one via a numpy.dtype() call
  • offsets (dict, optional) – a dictionary specifying the byte offsets of each column in the binary file; if not supplied, the offsets are inferred from the dtype size of each column, assuming a fixed header size and contiguous storage
  • header_size (int, optional) – the size of the header in bytes
  • size (int, optional) – the number of objects in the binary file; if not provided, the value is inferred from the dtype and the total size of the file in bytes
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs
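
The inference of size and offsets described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: it assumes each column is stored as one contiguous block, one column after another, directly following the header. The helpers infer_size and infer_offsets are hypothetical, and struct format strings stand in for numpy dtypes.

```python
import struct


def infer_size(file_bytes, columns, header_size=0):
    """Number of objects implied by the file size and the bytes per object."""
    bytes_per_object = sum(struct.calcsize(fmt) for _, fmt in columns)
    payload = file_bytes - header_size
    if payload < 0 or payload % bytes_per_object:
        raise ValueError("file size is inconsistent with the column layout")
    return payload // bytes_per_object


def infer_offsets(columns, size, header_size=0):
    """Byte offset of each column, assuming contiguous column blocks."""
    offsets = {}
    offset = header_size
    for name, fmt in columns:
        offsets[name] = offset
        offset += size * struct.calcsize(fmt)  # skip over this column's block
    return offsets


# two 3-vectors of 4-byte floats and one 8-byte float per object
columns = [("Position", "<3f"), ("Velocity", "<3f"), ("Mass", "<d")]
size = infer_size(336, columns, header_size=16)
offsets = infer_offsets(columns, size, header_size=16)
```

With a 16-byte header and 32 bytes per object, a 336-byte file holds 10 objects, and each column starts where the previous column's block ends.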

Examples

Please see the documentation for examples.

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
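
The relationship between size, csize, and a global slice can be illustrated without MPI. The sketch below assumes the catalog is stored as consecutive chunks, one per rank, and maps a global slice onto the local indices each rank would have to select. The function local_indices is hypothetical, and the redistribute option of gslice() is not modeled.

```python
def local_indices(sizes, start, stop, step=1):
    """Map a global slice onto per-rank lists of local indices.

    sizes[r] plays the role of `size` on rank r; sum(sizes) is `csize`.
    """
    csize = sum(sizes)
    # normalize the slice against the collective size, as Python slicing does
    wanted = range(*slice(start, stop, step).indices(csize))
    result = []
    first = 0  # global index of the first object held by the current rank
    for size in sizes:
        last = first + size
        result.append([g - first for g in wanted if first <= g < last])
        first = last
    return result


sizes = [4, 3, 3]  # three ranks, csize = 10
selected = local_indices(sizes, start=2, stop=9, step=3)
```

Global indices 2, 5, and 8 are selected here, and each lands on a different rank.
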

class nbodykit.source.catalog.BigFileCatalog(*args, **kwargs)

A CatalogSource that uses BigFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the name of the directory holding the bigfile data
  • exclude (list of str, optional) – the data sets to exclude from loading within bigfile; default is the header
  • header (str, optional) – the path to the header; default is to use a column ‘Header’. It is relative to the file, not the dataset.
  • dataset (str) – load a specific dataset from the bigfile; default is to start from the root.
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Examples

Please see the documentation for examples.

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.

class nbodykit.source.catalog.HDFCatalog(*args, **kwargs)

A CatalogSource that uses HDFFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the file path to load
  • root (str, optional) – the start path in the HDF file, loading all data below this path
  • exclude (list of str, optional) – list of path names to exclude; these can be absolute paths, or paths relative to root
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Examples

Please see the documentation for examples.

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.

class nbodykit.source.catalog.TPMBinaryCatalog(*args, **kwargs)

A CatalogSource that uses TPMBinaryFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the path to the binary file to load
  • precision ({'f4', 'f8'}, optional) – the string dtype specifying the precision
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.

class nbodykit.source.catalog.FITSCatalog(*args, **kwargs)

A CatalogSource that uses FITSFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the file path to load
  • ext (number or string, optional) – the extension, given either as a zero-based numerical index or as a string extension name; if not provided, data is read from the first HDU that has data
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Examples

Please see the documentation for examples.

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.

class nbodykit.source.catalog.Gadget1Catalog(*args, **kwargs)

A CatalogSource that uses Gadget1File to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the path to the binary file to load
  • columndefs (list) – a list of triplets (columnname, element_dtype, particle_types)
  • ptype (int) – type of particle of interest.
  • hdtype (list, dtype) – dtype of the header; must define Massarr and Npart
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.

class nbodykit.source.catalog.ArrayCatalog(data, comm=None, use_cache=False, **kwargs)

A CatalogSource initialized from a dictionary or structured ndarray.

Parameters:
  • data (dict or numpy.ndarray) – a dictionary or structured ndarray; items are interpreted as the columns of the catalog; the length of any item is used as the size of the catalog.
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • **kwargs – additional keywords to store as meta-data in attrs

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying data array/dict.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.

get_hardcolumn(col)

Return a column from the underlying data array/dict.

Columns are returned as dask arrays.

hardcolumns

The union of the columns in the file and any transformed columns.

class nbodykit.source.catalog.LogNormalCatalog(Plin, nbar, BoxSize, Nmesh, bias=2.0, seed=None, cosmo=None, redshift=None, unitary_amplitude=False, inverted_phase=False, comm=None, use_cache=False)

A CatalogSource containing biased particles that have been Poisson-sampled from a log-normal density field.

Parameters:
  • Plin (callable) – callable specifying the linear power spectrum
  • nbar (float) – the number density of the particles in the box, assumed constant across the box; this is used when Poisson sampling the density field
  • BoxSize (float, 3-vector of floats) – the size of the box to generate the grid on
  • Nmesh (int) – the mesh size to use when generating the density and displacement fields, which are Poisson-sampled to particles
  • bias (float, optional) – the desired bias of the particles; applied while applying a log-normal transformation to the density field
  • seed (int, optional) – the global random seed; if set to None, the seed will be set randomly
  • cosmo (nbodykit.cosmology.core.Cosmology, optional) – this must be supplied if Plin does not carry a cosmo attribute
  • redshift (float, optional) – this must be supplied if Plin does not carry a redshift attribute
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
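
The idea of Poisson-sampling a biased log-normal field can be sketched in a few lines of standard-library Python. This is a simplified illustration under explicit assumptions, not the class's implementation: the Gaussian field is taken as given rather than generated from Plin, the log-normal transform is assumed to be exp(bias × δ) normalized to unit mean, no displacement field or velocities are produced, and the helper names are hypothetical.

```python
import math
import random


def lognormal_overdensity(delta_gauss, bias):
    """Return 1 + delta for a log-normal transform with unit mean (assumed form)."""
    field = [math.exp(bias * d) for d in delta_gauss]
    mean = sum(field) / len(field)
    return [f / mean for f in field]


def poisson(rng, rate):
    """Draw a Poisson variate with Knuth's method; adequate for small rates."""
    limit = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_counts(delta_gauss, nbar, cell_volume, bias=2.0, seed=None):
    """Number of particles per cell, Poisson-sampled from the biased field."""
    rng = random.Random(seed)
    one_plus_delta = lognormal_overdensity(delta_gauss, bias)
    return [poisson(rng, nbar * cell_volume * w) for w in one_plus_delta]


field_rng = random.Random(0)
delta = [field_rng.gauss(0.0, 0.5) for _ in range(64)]  # stand-in Gaussian field
counts = sample_counts(delta, nbar=3.0, cell_volume=1.0, bias=2.0, seed=42)
```

Normalizing to unit mean keeps the average number density equal to nbar, so bias changes the clustering strength rather than the expected particle count.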

References

Cole and Jones, 1991
Agrawal et al., 2017

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns A list of the hard-coded columns in the CatalogSource.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Position() Position assumed to be in Mpc/h
Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Velocity() Velocity in km/s
VelocityOffset() The corresponding RSD offset, in Mpc/h
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Construct and return a hard-coded column.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.

Position()

Position assumed to be in Mpc/h

Velocity()

Velocity in km/s

VelocityOffset()

The corresponding RSD offset, in Mpc/h

class nbodykit.source.catalog.UniformCatalog(nbar, BoxSize, seed=None, comm=None, use_cache=False)

A CatalogSource that has uniformly-distributed Position and Velocity columns.

The random numbers generated do not depend on the number of available ranks.

Parameters:
  • nbar (float) – the desired number density of particles in the box
  • BoxSize (float, 3-vector) – the size of the box
  • seed (int, optional) – the random seed
  • comm – the MPI communicator
  • use_cache (bool, optional) – whether to cache data; default is False
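
A single-process sketch of what the parameters describe is given below, using only the standard library. It assumes the particle count is the expected value nbar × volume rounded to an integer; whether the class draws the count at random is not modeled. The function uniform_particles is hypothetical, and the velocity range follows the "0.01 x BoxSize" description of the Velocity column.

```python
import random


def uniform_particles(nbar, box_size, seed=None):
    """Return (position, velocity) lists for a uniform box.

    Hypothetical helper; box_size may be a scalar or a 3-vector.
    """
    if isinstance(box_size, (int, float)):
        box_size = [float(box_size)] * 3
    rng = random.Random(seed)
    volume = box_size[0] * box_size[1] * box_size[2]
    # assumption: use the expected count directly, rounded to an integer
    count = int(round(nbar * volume))
    position = [[rng.random() * length for length in box_size]
                for _ in range(count)]
    velocity = [[rng.random() * 0.01 * length for length in box_size]
                for _ in range(count)]
    return position, velocity


position, velocity = uniform_particles(nbar=0.01, box_size=10.0, seed=1)
```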

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns A list of the hard-coded columns in the CatalogSource.
rng A MPIRandomState that behaves as numpy.random.RandomState but generates random numbers in a manner independent of the number of ranks.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Position() The position of particles, uniformly distributed in BoxSize
Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Velocity() The velocity of particles, uniformly distributed in 0.01 x BoxSize
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Construct and return a hard-coded column.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.

Position()

The position of particles, uniformly distributed in BoxSize

Velocity()

The velocity of particles, uniformly distributed in 0.01 x BoxSize

class nbodykit.source.catalog.RandomCatalog(csize, seed=None, comm=None, use_cache=False)

A CatalogSource that can have columns added via a collective random number generator.

The random number generator stored as rng behaves as numpy.random.RandomState but generates random numbers only on the local rank in a manner independent of the number of ranks.

Parameters:
  • csize (int) – the desired collective size of the Source
  • seed (int, optional) – the global seed for the random number generator
  • comm (MPI communicator) – the MPI communicator; set automatically if None
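
One general way to obtain random numbers that do not depend on the number of ranks is to generate them in fixed-size chunks, each seeded from the global seed and the chunk index, and let every rank produce only the part of the global sequence it owns. The sketch below illustrates that idea with the standard library; it is an assumption about the general approach, not a description of how MPIRandomState is implemented, and the names CHUNK, chunk_values, and local_uniform are hypothetical.

```python
import random

CHUNK = 4  # deliberately tiny chunk size, for illustration only


def chunk_values(seed, chunk_index):
    """Uniform numbers of one chunk; depends only on the seed and chunk index."""
    rng = random.Random("%d:%d" % (seed, chunk_index))
    return [rng.random() for _ in range(CHUNK)]


def local_uniform(seed, csize, rank, nranks):
    """The slice of the global random sequence owned by one rank."""
    start = rank * csize // nranks
    stop = (rank + 1) * csize // nranks
    out = []
    for chunk in range(start // CHUNK, (stop + CHUNK - 1) // CHUNK):
        values = chunk_values(seed, chunk)
        # keep only the part of this chunk that falls inside [start, stop)
        lo = max(start, chunk * CHUNK) - chunk * CHUNK
        hi = min(stop, (chunk + 1) * CHUNK) - chunk * CHUNK
        out.extend(values[lo:hi])
    return out


serial = local_uniform(seed=7, csize=10, rank=0, nranks=1)
parallel = []
for rank in range(3):
    parallel.extend(local_uniform(seed=7, csize=10, rank=rank, nranks=3))
```

Concatenating the per-rank pieces reproduces the single-rank sequence, which is the property the rng attribute is described as providing.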

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns A list of the hard-coded columns in the CatalogSource.
rng A MPIRandomState that behaves as numpy.random.RandomState but generates random numbers in a manner independent of the number of ranks.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Construct and return a hard-coded column.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
rng

A MPIRandomState that behaves as numpy.random.RandomState but generates random numbers in a manner independent of the number of ranks.
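
The rank-independence property described above can be illustrated with a small sketch. This is not the MPIRandomState implementation. It is a toy, pure-Python illustration that assumes the global stream is cut into fixed-size chunks, each seeded from the global seed and its chunk index, with every rank drawing only the chunks that overlap its own slice. The names CHUNK, chunk_values, local_uniform and gather are hypothetical.

```python
import random

CHUNK = 4  # hypothetical chunk size; any fixed value works


def chunk_values(seed, chunk_index, chunk=CHUNK):
    """Draw the values of one chunk from a generator seeded per chunk."""
    rng = random.Random(seed * 1_000_003 + chunk_index)  # toy seed mixing
    return [rng.random() for _ in range(chunk)]


def local_uniform(seed, csize, rank, nranks, chunk=CHUNK):
    """Return the uniform draws owned by `rank` out of `csize` in total."""
    # contiguous slice of global indices assigned to this rank
    start = csize * rank // nranks
    stop = csize * (rank + 1) // nranks
    values = []
    # visit only the chunks that overlap [start, stop)
    for c in range(start // chunk, (stop + chunk - 1) // chunk):
        block = chunk_values(seed, c, chunk)
        lo = max(start, c * chunk) - c * chunk
        hi = min(stop, (c + 1) * chunk) - c * chunk
        values.extend(block[lo:hi])
    return values


def gather(seed, csize, nranks):
    """Concatenate the local draws of all ranks, in rank order."""
    out = []
    for rank in range(nranks):
        out.extend(local_uniform(seed, csize, rank, nranks))
    return out
```

Because each value depends only on the seed and its global index, gather(42, 10, 1) and gather(42, 10, 3) return the same list.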

class nbodykit.source.catalog.FKPCatalog(data, randoms, BoxSize=None, BoxPad=0.02, use_cache=True)[source]

An interface for simultaneous modeling of a data CatalogSource and a randoms CatalogSource, in the spirit of Feldman, Kaiser, and Peacock, 1994.

The main functionality of this class is to:

  • provide a uniform interface to accessing columns from the data CatalogSource and randoms CatalogSource, using column names prefixed with “data/” or “randoms/”
  • compute the shared BoxSize of the source, by finding the maximum Cartesian extent of the randoms
  • provide an interface to a mesh object, which knows how to paint the FKP density field from the data and randoms
Parameters:
  • data (CatalogSource) – the CatalogSource of particles representing the data catalog
  • randoms (CatalogSource) – the CatalogSource of particles representing the randoms catalog
  • BoxSize (float, 3-vector, optional) – the size of the Cartesian box to use for the unified data and randoms; if not provided, the maximum Cartesian extent of the randoms defines the box
  • BoxPad (float, 3-vector, optional) – optionally apply this additional buffer to the extent of the Cartesian box
  • use_cache (bool, optional) – if True, use the built-in dask cache system to cache data, providing significant speed-ups; requires cachey
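
As a rough illustration of how a box can be derived from the maximum Cartesian extent of the randoms, the sketch below computes a padded box from a list of positions. It assumes that BoxPad acts as a fractional buffer, which the text above does not state explicitly, and the helper name box_from_randoms is hypothetical.

```python
def box_from_randoms(positions, box_pad=0.02):
    """Return (box_size, box_center) enclosing `positions`, one value per axis."""
    ndim = len(positions[0])
    mins = [min(p[i] for p in positions) for i in range(ndim)]
    maxs = [max(p[i] for p in positions) for i in range(ndim)]
    # maximum Cartesian extent along each axis
    extent = [hi - lo for lo, hi in zip(mins, maxs)]
    # ASSUMPTION: the pad is fractional, so the box grows by (1 + box_pad)
    box_size = [e * (1.0 + box_pad) for e in extent]
    box_center = [0.5 * (lo + hi) for lo, hi in zip(mins, maxs)]
    return box_size, box_center


# three hypothetical random points
randoms = [(0.0, 10.0, -5.0), (100.0, 60.0, 5.0), (40.0, 20.0, 0.0)]
box_size, box_center = box_from_randoms(randoms)
```

In a parallel run the per-axis minima and maxima would additionally have to be reduced across all ranks before computing the extent.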

Attributes

attrs A dictionary storing relevant meta-data about the CatalogSource.
columns Columns for individual species can be accessed using a species/ prefix and the column name, e.g., data/Position.
hardcolumns Hardcolumn of the form species/name
species List of species names
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Construct and return a hard-coded column.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the FKPCatalog to a mesh, which knows how to “paint” the FKP density field.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', fkp_weight='FKPWeight', comp_weight='Weight', nbar='NZ', selection='Selection', position='Position')[source]

Convert the FKPCatalog to a mesh, which knows how to “paint” the FKP density field.

Additional keywords to the to_mesh() function include the FKP weight column, completeness weight column, and the column specifying the number density as a function of redshift.

Parameters:
  • Nmesh (int, 3-vector, optional) – the number of cells per box side; if not specified in attrs, this must be provided
  • BoxSize (float, 3-vector, optional) – the size of the box; if not provided, the default value in attrs is used
  • dtype (str, dtype, optional) – the data type of the mesh when painting
  • interlaced (bool, optional) – whether to use interlacing to reduce aliasing when painting the particles on the mesh
  • compensated (bool, optional) – whether to apply a Fourier-space transfer function to account for the effects of the gridding + aliasing
  • window (str, optional) – the string name of the window to use when interpolating the particles to the mesh; see pmesh.window.methods for choices
  • fkp_weight (str, optional) – the name of the column in the source specifying the FKP weight; this weight is applied to the FKP density field: n_data - alpha*n_randoms
  • comp_weight (str, optional) – the name of the column in the source specifying the completeness weight; this weight is applied to the individual fields, either n_data or n_random
  • selection (str, optional) – the name of the column used to select a subset of the source when painting
  • nbar (str, optional) – the name of the column specifying the number density as a function of redshift
  • position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
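
The role of the two weight columns can be sketched on a toy one-dimensional grid. This is a simplified illustration, not the painting code used by the mesh object: it uses nearest-cell deposit instead of a window, and it assumes that alpha is the ratio of completeness-weighted data and randoms counts, a common FKP convention that is not spelled out above. All names are hypothetical.

```python
def paint(cells, weights, ncells):
    """Nearest-cell deposit: add each particle's weight to its cell."""
    grid = [0.0] * ncells
    for cell, w in zip(cells, weights):
        grid[cell] += w
    return grid


def total_weight(cat):
    """Per-particle product of completeness weight and FKP weight."""
    return [c * f for c, f in zip(cat["comp"], cat["fkp"])]


def fkp_field(data, randoms, ncells):
    """
    Return (field, alpha) with field = n_data - alpha * n_randoms.

    `data` and `randoms` are dicts with keys 'cell', 'comp' and 'fkp',
    hypothetical stand-ins for the position, comp_weight and fkp_weight columns.
    """
    # ASSUMPTION: alpha normalizes the randoms to the data
    alpha = sum(data["comp"]) / sum(randoms["comp"])
    n_data = paint(data["cell"], total_weight(data), ncells)
    n_randoms = paint(randoms["cell"], total_weight(randoms), ncells)
    return [d - alpha * r for d, r in zip(n_data, n_randoms)], alpha


data = {"cell": [0, 1, 1], "comp": [1.0] * 3, "fkp": [1.0] * 3}
randoms = {"cell": [0, 0, 1, 1, 2, 2], "comp": [1.0] * 6, "fkp": [1.0] * 6}
field, alpha = fkp_field(data, randoms, ncells=3)
```

With uniform randoms the field sums to zero, as expected for a density contrast weighted by the survey selection.
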
class nbodykit.source.catalog.HaloCatalog(source, cosmo, redshift, mdef='vir', mass='Mass', position='Position', velocity='Velocity')[source]

A wrapper CatalogSource of halo objects to interface nicely with halotools.sim_manager.UserSuppliedHaloCatalog.

Parameters:
  • source (CatalogSource) – the source holding the particles to be interpreted as halos
  • cosmo (Cosmology) – the cosmology instance
  • redshift (float) – the redshift of the halo catalog
  • mdef (str, optional) – string specifying mass definition, used for computing default halo radii and concentration; should be ‘vir’ or ‘XXXc’ or ‘XXXm’ where ‘XXX’ is an int specifying the overdensity
  • mass (str, optional) – the column name specifying the mass of each halo
  • position (str, optional) – the column name specifying the position of each halo
  • velocity (str, optional) – the column name specifying the velocity of each halo
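
The accepted mdef strings can be illustrated with a small parser. This is a sketch, not the parsing done by the library. It assumes the usual convention that the suffix 'c' refers to the critical density and 'm' to the mean matter density. The function name parse_mdef is hypothetical.

```python
import re


def parse_mdef(mdef):
    """Split a mass-definition string into (overdensity, reference density)."""
    if mdef == "vir":
        return None, "virial"
    match = re.fullmatch(r"(\d+)([cm])", mdef)
    if match is None:
        raise ValueError("mdef should be 'vir', 'XXXc' or 'XXXm', got %r" % mdef)
    # ASSUMPTION: 'c' means critical density, 'm' means mean matter density
    reference = {"c": "critical", "m": "mean"}[match.group(2)]
    return int(match.group(1)), reference
```

For example, parse_mdef("200c") splits the string into the integer overdensity 200 and the reference label "critical".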

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns A list of the hard-coded columns in the CatalogSource.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Concentration() The halo concentration, computed using nbodykit.transform.HaloConcentration().
Mass() The halo mass column, assumed to be in units of \(M_\odot/h\).
Position() The halo position column, assumed to be in units of \(\mathrm{Mpc}/h\).
Radius() The halo radius, computed using nbodykit.transform.HaloRadius().
Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Velocity() The halo velocity column, assumed to be in units of km/s.
VelocityOffset() The redshift-space distance offset due to the velocity in units of distance.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Construct and return a hard-coded column.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_halotools([BoxSize, selection]) Return the CatalogSource as a halotools.sim_manager.UserSuppliedHaloCatalog.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
Concentration()[source]

The halo concentration, computed using nbodykit.transform.HaloConcentration().

This uses the analytic formulas for concentration from Dutton and Maccio 2014.

Mass()[source]

The halo mass column, assumed to be in units of \(M_\odot/h\).

Position()[source]

The halo position column, assumed to be in units of \(\mathrm{Mpc}/h\).

Radius()[source]

The halo radius, computed using nbodykit.transform.HaloRadius().

Assumed units of \(\mathrm{Mpc}/h\).

Velocity()[source]

The halo velocity column, assumed to be in units of km/s.

VelocityOffset()[source]

The redshift-space distance offset due to the velocity in units of distance. The assumed units are \(\mathrm{Mpc}/h\).

This multiplies Velocity by \(1 / (a 100 E(z)) = 1 / (a H(z)/h)\).
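
The conversion factor can be checked numerically with a short sketch. It assumes a flat LCDM expansion history with radiation neglected and an illustrative matter density of 0.3; neither assumption comes from this page, and the function names are hypothetical.

```python
import math


def efunc(z, omega_m=0.3):
    """E(z) = H(z) / H0 for a flat LCDM model (radiation neglected)."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def velocity_offset(velocity, z, omega_m=0.3):
    """Convert a velocity in km/s into a distance offset in Mpc/h."""
    a = 1.0 / (1.0 + z)  # scale factor
    # 100 * E(z) is H(z)/h in km/s per Mpc/h
    return velocity / (a * 100.0 * efunc(z, omega_m))
```

At z = 0 both a and E(z) equal one, so a velocity of 100 km/s corresponds to an offset of 1 Mpc/h.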

to_halotools(BoxSize=None, selection='Selection')[source]

Return the CatalogSource as a halotools.sim_manager.UserSuppliedHaloCatalog.

The Halotools catalog only holds the local data, although halos are labeled via the halo_id column using the global index.

Parameters:
  • BoxSize (float, array_like, optional) – the size of the box; note that anisotropic boxes are currently not supported by halotools
  • selection (str, optional) – the name of the column to slice the data on before converting to a halotools catalog
Returns:

cat – the Halotools halo catalog, storing the local halo data

Return type:

halotools.sim_manager.UserSuppliedHaloCatalog

class nbodykit.source.catalog.HODCatalog(halos, logMmin=13.031, sigma_logM=0.38, alpha=0.76, logM0=13.27, logM1=14.08, seed=None, use_cache=False, comm=None)[source]

A CatalogSource that uses the HOD prescription of Zheng et al. 2007 to populate an input halo catalog with galaxies.

The mock population is done using halotools. See the documentation for halotools.empirical_models.Zheng07Cens and halotools.empirical_models.Zheng07Sats for further details regarding the HOD.

The columns generated in this catalog are:

  1. Position: the galaxy position
  2. Velocity: the galaxy velocity
  3. VelocityOffset: the RSD velocity offset, in units of distance
  4. conc_NFWmodel: the concentration of the halo
  5. gal_type: the galaxy type, 0 for centrals and 1 for satellites
  6. halo_id: the global ID of the halo this galaxy belongs to, between 0 and csize
  7. halo_local_id: the local ID of the halo this galaxy belongs to, between 0 and size
  8. halo_mvir: the halo mass
  9. halo_nfw_conc: alias of conc_NFWmodel
  10. halo_num_centrals: the number of centrals that this halo hosts, either 0 or 1
  11. halo_num_satellites: the number of satellites that this halo hosts
  12. halo_rvir: the halo radius
  13. halo_upid: equal to -1; should be ignored by the user
  14. halo_vx, halo_vy, halo_vz: the three components of the halo velocity
  15. halo_x, halo_y, halo_z: the three components of the halo position
  16. host_centric_distance: the distance from this galaxy to the center of the halo
  17. vx, vy, vz: the three components of the galaxy velocity, equal to Velocity
  18. x,y,z: the three components of the galaxy position, equal to Position
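
The relation between halo_local_id (between 0 and size) and halo_id (between 0 and csize) can be sketched as follows. The sketch assumes that global indices are assigned contiguously in rank order, which is not stated explicitly above; the function name global_ids is hypothetical.

```python
def global_ids(sizes, rank):
    """Global indices of the objects on `rank`, given the size of every rank."""
    # ASSUMPTION: ranks hold contiguous blocks of the global index range
    offset = sum(sizes[:rank])  # number of objects held by lower ranks
    return [offset + local_id for local_id in range(sizes[rank])]


sizes = [3, 2, 4]  # hypothetical local sizes on three ranks; csize is 9
ids_on_rank_1 = global_ids(sizes, 1)
```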

For further details, please see the documentation.

Note

Default HOD values are from Reid et al. 2014

Parameters:
  • halos (UserSuppliedHaloCatalog) – the halotools table holding the halo data; this object must have the following attributes: cosmology, Lbox, redshift
  • logMmin (float, optional) – Minimum mass required for a halo to host a central galaxy
  • sigma_logM (float, optional) – Rate of transition from <Ncen>=0 –> <Ncen>=1
  • alpha (float, optional) – Power law slope of the relation between halo mass and <Nsat>
  • logM0 (float, optional) – Low-mass cutoff in <Nsat>
  • logM1 (float, optional) – Characteristic halo mass where <Nsat> begins to assume a power law form
  • seed (int, optional) – the random seed to generate deterministic mocks

References

Zheng et al. (2007), arXiv:astro-ph/0703457

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Position() Galaxy positions, in units of Mpc/h
Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Velocity() Galaxy velocity, in units of km/s
VelocityOffset() The RSD velocity offset, in units of Mpc/h
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying data array/dict.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
repopulate([seed]) Update the HOD parameters and then re-populate the mock catalog
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
__makemodel__()[source]

Return the Zheng 07 HOD model.

This model evaluates Eqs. 2 and 5 of Zheng et al. 2007
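
For orientation, the mean occupation functions of Zheng et al. 2007 can be written out directly. This is a standalone sketch using the default parameter values of this class, not the halotools code that is actually run. It follows Eq. 5 as written in the paper, where the satellite term is multiplied by the central occupation; whether the underlying model applies that factor is an assumption here.

```python
import math


def mean_ncen(log_m, logMmin=13.031, sigma_logM=0.38):
    """Mean number of centrals in a halo of mass 10**log_m (Eq. 2)."""
    return 0.5 * (1.0 + math.erf((log_m - logMmin) / sigma_logM))


def mean_nsat(log_m, logMmin=13.031, sigma_logM=0.38,
              alpha=0.76, logM0=13.27, logM1=14.08):
    """Mean number of satellites in a halo of mass 10**log_m (Eq. 5)."""
    mass, m0, m1 = 10.0 ** log_m, 10.0 ** logM0, 10.0 ** logM1
    if mass <= m0:
        return 0.0  # no satellites below the low-mass cutoff
    # ASSUMPTION: satellites are modulated by the central occupation
    ncen = mean_ncen(log_m, logMmin, sigma_logM)
    return ncen * ((mass - m0) / m1) ** alpha
```

A halo with log mass equal to logMmin hosts a central half of the time on average, and halos below logM0 host no satellites.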

class nbodykit.source.catalog.MultipleSpeciesCatalog(names, *species, **kwargs)[source]

A CatalogSource interface for handling multiple species of particles.

This CatalogSource stores a copy of the original CatalogSource objects for each species, providing access to the columns via a species/ prefix, where “species” is one of the species names provided.

Parameters:
  • names (list of str) – list of strings specifying the names of the various species; data columns are prefixed with “species/” where “species” is in names
  • *species (two or more CatalogSource objects) – catalogs to be combined into a single catalog, which give the data for different species of particles; as many catalogs as names must be provided
  • use_cache (bool, optional) – whether to cache data when reading; default is True

Examples

Initialization:

>>> data = UniformCatalog(nbar=3e-5, BoxSize=512., seed=42)
>>> randoms = UniformCatalog(nbar=3e-5, BoxSize=512., seed=84)
>>> cat = MultipleSpeciesCatalog(['data', 'randoms'], data, randoms)

Accessing the Catalogs for individual species:

>>> data = cat["data"] # a copy of the original "data" object

Accessing individual columns:

>>> data_pos = cat["data/Position"]

Setting new columns:

>>> cat["data"]["new_column"] = 1.0
>>> assert "data/new_column" in cat

Attributes

attrs A dictionary storing relevant meta-data about the CatalogSource.
columns Columns for individual species can be accessed using a species/ prefix and the column name, e.g., data/Position.
hardcolumns Hardcolumn of the form species/name
species List of species names
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Construct and return a hard-coded column.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the catalog to a mesh, which knows how to “paint” the combined density field, summed over all particle species.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
__delitem__(col)[source]

Delete a column of the form species/column

__getitem__(key)[source]

This provides access to the underlying data in two ways:

  • The CatalogSource object for a species can be accessed if key is a species name.
  • Individual columns for a species can be accessed using the format: species/column.
__setitem__(col, value)[source]

Add columns to any of the species catalogs.

Note

New column names should be prefixed by ‘species/’ where ‘species’ is a name in the species attribute.
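
The species/column access convention used by the item methods above can be mimicked with a toy container built on plain dictionaries. This is only a sketch of the key dispatch, not the class's implementation; the name SpeciesStore and its internals are hypothetical.

```python
class SpeciesStore:
    """Toy container keyed by species name or by 'species/column'."""

    def __init__(self, names, *species):
        if len(names) != len(species):
            raise ValueError("need one catalog per species name")
        self.species = list(names)
        # store a copy of each input catalog (here: a dict of columns)
        self._cats = {name: dict(cat) for name, cat in zip(names, species)}

    def _split(self, key):
        """Return (species, column); column is None for a bare species name."""
        name, sep, column = key.partition("/")
        if name not in self._cats:
            raise KeyError("unknown species %r" % name)
        return name, (column if sep else None)

    def __getitem__(self, key):
        name, column = self._split(key)
        if column is None:
            return self._cats[name]
        return self._cats[name][column]

    def __setitem__(self, key, value):
        name, column = self._split(key)
        if column is None:
            raise ValueError("column names must look like 'species/column'")
        self._cats[name][column] = value

    def __delitem__(self, key):
        name, column = self._split(key)
        if column is None:
            raise ValueError("column names must look like 'species/column'")
        del self._cats[name][column]

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True


cat = SpeciesStore(["data", "randoms"],
                   {"Position": [1.0, 2.0]}, {"Position": [3.0]})
cat["data/new_column"] = 1.0
```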

columns

Columns for individual species can be accessed using a species/ prefix and the column name, e.g., data/Position.

hardcolumns

Hardcolumn of the form species/name

species

List of species names

to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', selection='Selection', value='Value', position='Position')[source]

Convert the catalog to a mesh, which knows how to “paint” the combined density field, summed over all particle species.

Parameters:
  • Nmesh (int, 3-vector, optional) – the number of cells per box side; can be inferred from attrs if the value is the same for all species
  • BoxSize (float, 3-vector, optional) – the size of the box; can be inferred from attrs if the value is the same for all species
  • dtype (str, dtype, optional) – the data type of the mesh when painting
  • interlaced (bool, optional) – whether to use interlacing to reduce aliasing when painting the particles on the mesh
  • compensated (bool, optional) – whether to apply a Fourier-space transfer function to account for the effects of the gridding + aliasing
  • window (str, optional) – the string name of the window to use when interpolating the particles to the mesh
  • weight (str, optional) – the name of the column specifying the weight for each particle
  • selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
  • value (str, optional) – the name of the column specifying the field value for each particle
  • position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
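
The rule that Nmesh and BoxSize can be inferred from attrs only when all species agree can be sketched as below. This is an illustration of the rule, not the library's code; the function name shared_attr and the dictionary layout are hypothetical.

```python
def shared_attr(name, explicit, species_attrs):
    """Return `explicit` if given, else the value of `name` shared by all species."""
    if explicit is not None:
        return explicit
    values = [attrs.get(name) for attrs in species_attrs.values()]
    first = values[0]
    # the value can only be inferred if every species defines the same one
    if first is None or any(v != first for v in values[1:]):
        raise ValueError("%s must be given explicitly" % name)
    return first


# hypothetical attrs dictionaries of two species
species_attrs = {
    "data": {"BoxSize": 512.0, "Nmesh": 128},
    "randoms": {"BoxSize": 512.0},
}
box_size = shared_attr("BoxSize", None, species_attrs)
```

Here BoxSize is inferred because both species agree, while Nmesh would have to be passed explicitly because only one species defines it.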