nbodykit.source.catalog.fkp module

class nbodykit.source.catalog.fkp.FKPCatalog(data, randoms, BoxSize=None, BoxPad=0.02, use_cache=True)

Bases: nbodykit.source.catalog.species.MultipleSpeciesCatalog

An interface for simultaneous modeling of a data CatalogSource and a randoms CatalogSource, in the spirit of Feldman, Kaiser, and Peacock, 1994.

The main functionality of this class is to:

  • provide a uniform interface to accessing columns from the data CatalogSource and randoms CatalogSource, using column names prefixed with “data/” or “randoms/”
  • compute the shared BoxSize of the source, by finding the maximum Cartesian extent of the randoms
  • provide an interface to a mesh object, which knows how to paint the FKP density field from the data and randoms
Parameters:
  • data (CatalogSource) – the CatalogSource of particles representing the data catalog
  • randoms (CatalogSource) – the CatalogSource of particles representing the randoms catalog
  • BoxSize (float, 3-vector, optional) – the size of the Cartesian box to use for the unified data and randoms; if not provided, the maximum Cartesian extent of the randoms defines the box
  • BoxPad (float, 3-vector, optional) – optionally apply this additional buffer to the extent of the Cartesian box
  • use_cache (bool, optional) – if True, use the built-in dask cache system to cache data, providing significant speed-ups; requires cachey
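The sketch below shows one way BoxSize and BoxPad could combine. It derives a padded box from the randoms positions in pure Python. It assumes that BoxPad is a fractional padding, so each side becomes (1 + BoxPad) times the extent along that axis, rounded up. It also assumes positions are (x, y, z) tuples. `padded_box` is a hypothetical helper for illustration, not part of nbodykit.

```python
import math


def padded_box(positions, box_pad=0.02):
    """Return (box_size, box_center) for a list of (x, y, z) positions.

    Assumption: box_pad is a fractional padding, so each side is
    (1 + box_pad) times the extent along that axis, rounded up.
    """
    if not positions:
        raise ValueError("need at least one position")
    # per-axis minimum and maximum over all objects
    mins = [min(p[i] for p in positions) for i in range(3)]
    maxs = [max(p[i] for p in positions) for i in range(3)]
    extent = [hi - lo for lo, hi in zip(mins, maxs)]
    # pad the extent and round up to a whole number
    box_size = [math.ceil((1.0 + box_pad) * e) for e in extent]
    # center of the unpadded extent
    box_center = [0.5 * (lo + hi) for lo, hi in zip(mins, maxs)]
    return box_size, box_center


randoms = [(0.0, -50.0, 10.0), (100.0, 50.0, 210.0), (40.0, 0.0, 60.0)]
size, center = padded_box(randoms)
```

Because the box is taken from the randoms alone, it is only valid if the data lie inside the footprint traced by the randoms.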

Attributes

attrs – A dictionary storing relevant meta-data about the CatalogSource.
columns – All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
hardcolumns – A list of the hard-coded columns in the CatalogSource.
use_cache – If set to True, use the built-in caching features of dask to cache data in memory.

Methods

compute(*args, **kwargs) – Our version of dask.compute() that computes multiple delayed dask collections at once.
get_hardcolumn(col) – Construct and return a hard-coded column.
make_column(array) – Utility function to convert a numpy array to a dask.array.Array.
read(columns) – Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) – Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) – Convert the FKPCatalog to a mesh, which knows how to “paint” the FKP density field.
__delitem__(col)

Delete a column; cannot delete a “hard-coded” column

__getitem__(key)

This modifies the behavior of CatalogSourceBase.__getitem__() so that if key is a species name, a CatalogCopy is returned that holds the data for that species only.

__setitem__(col, value)

Add columns to any of the species catalogs.

Note

New column names should be prefixed by ‘species/’ where ‘species’ is a name in the species attribute.
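The species-prefix convention used by __getitem__() and __setitem__() can be illustrated with a toy container. The `SpeciesColumns` class below is hypothetical and stores plain lists rather than dask arrays. It sketches only the name routing: a bare species name selects that species, and a "species/column" name selects one column. It does not model nbodykit's CatalogCopy machinery.

```python
class SpeciesColumns:
    """Toy stand-in for species-prefixed column access (hypothetical, not nbodykit)."""

    def __init__(self, **species):
        # species name -> {column name: list of values}
        self._species = {name: dict(cols) for name, cols in species.items()}

    @property
    def species(self):
        return list(self._species)

    def _split(self, key):
        # split "species/column" and check that the species exists
        name, sep, col = key.partition("/")
        if not sep or name not in self._species:
            raise KeyError(
                f"expected 'species/column' with species in {self.species}, got {key!r}"
            )
        return name, col

    def __getitem__(self, key):
        # a bare species name returns that species' columns only
        if key in self._species:
            return self._species[key]
        name, col = self._split(key)
        return self._species[name][col]

    def __setitem__(self, key, value):
        # new columns must carry a species prefix
        name, col = self._split(key)
        self._species[name][col] = value


cat = SpeciesColumns(data={"z": [0.5, 0.6]}, randoms={"z": [0.4, 0.55, 0.7]})
cat["randoms/NZ"] = [1e-4, 2e-4, 1e-4]
```

Rejecting unprefixed names on assignment keeps each column unambiguously attached to exactly one species.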

attrs

A dictionary storing relevant meta-data about the CatalogSource.

columns

All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.

compute(*args, **kwargs)

Our version of dask.compute() that computes multiple delayed dask collections at once.

This should be called on the return value of read() to convert any dask arrays to numpy arrays.

If use_cache is True, this internally caches data, using dask’s built-in cache features.

Parameters: args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.

Notes

The default dask optimizer induces too many (unnecessary) IO calls, so we turn this feature off by default. Eventually we will probably want our own optimizer.
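The following sketch shows the pass-through behavior described for compute(): lazy objects are evaluated, and anything else is returned unchanged. `Deferred` and `compute_all` are hypothetical stand-ins for a dask collection and for compute(). The per-object result cache only loosely mimics what use_cache provides; it is not dask's cache.

```python
class Deferred:
    """Toy lazy value standing in for a dask collection (hypothetical)."""

    def __init__(self, func, *args):
        self.func, self.args = func, args
        self.calls = 0          # number of times func has actually run
        self._done = False
        self._result = None

    def compute(self, use_cache=True):
        # reuse an earlier result when caching is enabled
        if use_cache and self._done:
            return self._result
        self.calls += 1
        self._result = self.func(*self.args)
        self._done = True
        return self._result


def compute_all(*args, use_cache=True):
    """Evaluate Deferred objects; pass every other object through unchanged.

    This sketch always returns a tuple with one entry per argument.
    """
    return tuple(a.compute(use_cache) if isinstance(a, Deferred) else a for a in args)


total = Deferred(sum, [1, 2, 3])
out = compute_all(total, "label", total)
```

Evaluating several lazy objects in one call is what lets shared work (here, the repeated `total`) be done only once.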

get_hardcolumn(col)

Construct and return a hard-coded column.

These are usually produced by calling member functions marked by the @column decorator.

Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.
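The decorator-based registration of hard-coded columns can be sketched as follows. The `column` decorator and `ToyCatalog` class are hypothetical stand-ins written for illustration. They are not nbodykit's own @column implementation. They show only the general pattern: marked methods are discovered by name, and get_hardcolumn() calls the matching method.

```python
def column(func):
    """Mark a method as a hard-coded column (hypothetical sketch)."""
    func._is_column = True
    return func


class ToyCatalog:
    def __init__(self, size):
        self.size = size

    @property
    def hardcolumns(self):
        # collect the names of class attributes marked with @column
        cls = type(self)
        return sorted(
            name for name in dir(cls)
            if getattr(getattr(cls, name), "_is_column", False)
        )

    def get_hardcolumn(self, col):
        # build the column by calling the marked method
        if col not in self.hardcolumns:
            raise KeyError(col)
        return getattr(self, col)()

    @column
    def Weight(self):
        return [1.0] * self.size

    @column
    def Selection(self):
        return [True] * self.size


cat = ToyCatalog(3)
```

A subclass that wants to skip the decorator can instead override both hardcolumns and get_hardcolumn() directly, as described above.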

hardcolumns

A list of the hard-coded columns in the CatalogSource.

These columns are usually member functions marked by the @column decorator. Subclasses may override this attribute and get_hardcolumn() to bypass the decorator logic.

logger = <logging.Logger object>
make_column(array)

Utility function to convert a numpy array to a dask.array.Array.

read(columns)

Return the requested columns as dask arrays.

Parameters: columns (list of str) – the names of the requested columns
Returns: the list of column data, in the form of dask arrays
Return type: list of dask.array.Array
save(output, columns, datasets=None, header='Header')

Save the CatalogSource to a bigfile.BigFile.

Only the selected columns are saved, and attrs is saved in the header data set. The attrs of each column are stored in the data set for that column.

Parameters:
  • output (str) – the name of the file to write to
  • columns (list of str) – the names of the columns to save in the file
  • datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column
  • header (str, optional) – the name of the data set holding the header information, where attrs is stored
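Here is a small sketch of how the columns and datasets arguments pair up. `resolve_datasets` is a hypothetical helper. The default mirrors the documented behavior: each column is stored under its own name. The length check is an assumption, not documented behavior.

```python
def resolve_datasets(columns, datasets=None):
    """Map each column name to the data set it would be written to (hypothetical helper)."""
    columns = list(columns)
    if datasets is None:
        # documented default: each column is stored under its own name
        datasets = columns
    datasets = list(datasets)
    # assumption: exactly one data set name per column
    if len(datasets) != len(columns):
        raise ValueError("expected one data set name per column")
    return dict(zip(columns, datasets))
```

Passing explicit data set names is useful when saving prefixed columns such as "data/Position" under a shorter name.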
to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', fkp_weight='FKPWeight', comp_weight='Weight', nbar='NZ', selection='Selection', position='Position')

Convert the FKPCatalog to a mesh, which knows how to “paint” the FKP density field.

Additional keywords to to_mesh() give the names of the FKP weight column, the completeness weight column, and the column specifying the number density as a function of redshift.
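For reference, the weighted FKP field that these columns feed into can be written as follows. This is a sketch in standard FKP notation, with the sums running over selected objects. The placement of the weights follows the fkp_weight and comp_weight descriptions in the parameter list that follows. Defining α as the ratio of summed completeness weights is an assumed convention. The w_FKP expression, with a fiducial power P_0, is the usual choice from Feldman, Kaiser, and Peacock and is shown only as one option for the values stored in the fkp_weight column.

```latex
F(\mathbf{r}) = \sum_{i \in \mathrm{data}} w_{\mathrm{FKP},i}\, w_{c,i}\, \delta_D(\mathbf{r} - \mathbf{r}_i)
  \;-\; \alpha \sum_{j \in \mathrm{randoms}} w_{\mathrm{FKP},j}\, w_{c,j}\, \delta_D(\mathbf{r} - \mathbf{r}_j),
\qquad
\alpha = \frac{\sum_{i \in \mathrm{data}} w_{c,i}}{\sum_{j \in \mathrm{randoms}} w_{c,j}},
\qquad
w_{\mathrm{FKP}} = \frac{1}{1 + \bar{n}(z)\, P_0}
```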

Parameters:
  • Nmesh (int, 3-vector, optional) – the number of cells per box side; if not specified in attrs, this must be provided
  • BoxSize (float, 3-vector, optional) – the size of the box; if not provided, the default value in attrs is used
  • dtype (str, dtype, optional) – the data type of the mesh when painting
  • interlaced (bool, optional) – whether to use interlacing to reduce aliasing when painting the particles on the mesh
  • compensated (bool, optional) – whether to apply a Fourier-space transfer function to account for the effects of the gridding + aliasing
  • window (str, optional) – the string name of the window to use when interpolating the particles to the mesh; see pmesh.window.methods for choices
  • fkp_weight (str, optional) – the name of the column in the source specifying the FKP weight; this weight is applied to the FKP density field: n_data - alpha*n_randoms
  • comp_weight (str, optional) – the name of the column in the source specifying the completeness weight; this weight is applied to the individual fields, either n_data or n_randoms
  • selection (str, optional) – the name of the column used to select a subset of the source when painting
  • nbar (str, optional) – the name of the column specifying the number density as a function of redshift
  • position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
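The painting step can be illustrated in one dimension. This sketch deposits completeness-weighted and FKP-weighted data and randoms onto a periodic grid with cloud-in-cell (CIC) assignment, matching the default window name. Several details are assumptions made for illustration: the function `paint_fkp_1d`, the (x, w_c, w_fkp) tuple layout, grid points at k * cell, and α as the ratio of summed completeness weights. This is not pmesh's or nbodykit's painting code.

```python
import math


def paint_fkp_1d(data, randoms, nmesh, box_size):
    """Paint a 1D FKP field with cloud-in-cell (CIC) assignment (illustrative sketch).

    data, randoms: lists of (x, w_c, w_fkp) tuples with 0 <= x < box_size
    (hypothetical layout). Grid point k sits at x = k * box_size / nmesh,
    and the grid is periodic.
    """
    # assumed convention: alpha = sum of data completeness weights / randoms'
    alpha = sum(wc for _, wc, _ in data) / sum(wc for _, wc, _ in randoms)
    cell = box_size / nmesh
    field = [0.0] * nmesh

    def deposit(x, mass):
        # CIC: share the mass linearly between the two neighboring grid points
        s = x / cell
        i = math.floor(s)
        frac = s - i
        field[i % nmesh] += mass * (1.0 - frac)
        field[(i + 1) % nmesh] += mass * frac

    for x, wc, wf in data:
        deposit(x, wf * wc)             # data enter with weight w_fkp * w_c
    for x, wc, wf in randoms:
        deposit(x, -alpha * wf * wc)    # randoms subtracted, scaled by alpha
    return field, alpha


data = [(0.0, 1.0, 1.0), (2.5, 1.0, 1.0)]
randoms = [(1.0, 1.0, 1.0), (3.0, 1.0, 1.0), (5.0, 1.0, 1.0), (7.0, 1.0, 1.0)]
field, alpha = paint_fkp_1d(data, randoms, nmesh=8, box_size=8.0)
```

With unit FKP weights the painted field sums to zero, because α rescales the randoms to the weighted data count.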
use_cache

If set to True, use the built-in caching features of dask to cache data in memory.