# nbodykit.algorithms

class nbodykit.algorithms.ConvolvedFFTPower(first, poles, second=None, Nmesh=None, kmin=0.0, kmax=None, dk=None, use_fkp_weights=None, P0_FKP=None)[source]

Algorithm to compute power spectrum multipoles using FFTs for a data survey with non-trivial geometry.

Due to the geometry, the estimator computes the true power spectrum convolved with the window function (FFT of the geometry).

The estimator implemented in this class is described in detail in Hand et al. 2017 (arxiv:1704.02357). It uses the spherical harmonic addition theorem such that only $$2\ell+1$$ FFTs are required to compute each multipole. This differs from the implementations in Bianchi et al. and Scoccimarro et al., which require $$(\ell+1)(\ell+2)/2$$ FFTs.
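To make the difference in cost concrete, the short sketch below evaluates the two FFT counts quoted above for the first few even multipoles. The function names are illustrative only and are not part of nbodykit.

```python
def ffts_addition_theorem(ell):
    """FFT count per multipole quoted for this estimator: 2*ell + 1."""
    return 2 * ell + 1


def ffts_cartesian(ell):
    """FFT count per multipole quoted for the Bianchi et al. and
    Scoccimarro implementations: (ell + 1)(ell + 2) / 2."""
    return (ell + 1) * (ell + 2) // 2


for ell in (0, 2, 4):
    print(ell, ffts_addition_theorem(ell), ffts_cartesian(ell))
```

The gap grows with multipole order: for the hexadecapole the two expressions give 9 and 15 FFTs, respectively.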

Results are computed when the object is initialized, and the result is stored in the poles attribute. Important meta-data computed during algorithm execution is stored in the attrs dict. See the documentation of run().

Note

A full tutorial on the class is available in the documentation here.

Note

Cross correlations are only supported when the FKP weight column differs between the two mesh objects, i.e., the underlying data and randoms must be the same. This allows users to compute the cross power spectrum of the same density field, weighted differently.

Parameters
• first (FKPCatalog, FKPCatalogMesh) – the first source to paint the data/randoms; FKPCatalog is automatically converted to a FKPCatalogMesh, using default painting parameters

• poles (list of int) – a list of integer multipole numbers ell to compute

• second (FKPCatalog, FKPCatalogMesh, optional) – the second source to paint the data/randoms; cross correlations are only supported when the weight column differs between the two mesh objects, i.e., the underlying data and randoms must be the same!

• kmin (float, optional) – the edge of the first wavenumber bin; default is 0

• kmax (float, optional) – the limit of the last wavenumber bin; default is None, no limit.

• dk (float, optional) – the spacing in wavenumber to use; if not provided, the fundamental mode of the box is used

References

• Hand, Nick et al. An optimal FFT-based anisotropic power spectrum estimator, 2017

• Bianchi, Davide et al., Measuring line-of-sight-dependent Fourier-space clustering using FFTs, MNRAS, 2015

• Scoccimarro, Roman, Fast estimators for redshift-space clustering, Phys. Review D, 2015

Methods

• load(output[, comm, format]) – Load a saved ConvolvedFFTPower result, which has been saved to disk with ConvolvedFFTPower.save().

• normalization(name, alpha) – Compute the power spectrum normalization, using either the data or randoms source.

• run() – Compute the power spectrum multipoles.

• save(output) – Save the ConvolvedFFTPower result to disk.

• shotnoise(alpha) – Compute the power spectrum shot noise, using either the data or randoms source.

• to_pkmu(mu_edges, max_ell) – Invert the measured multipoles $$P_\ell(k)$$ into power spectrum wedges, $$P(k,\mu)$$.

__setstate_pre000305__(state)[source]

compatible version of setstate for files generated before 0.3.5

classmethod load(output, comm=None, format='current')[source]

Load a saved ConvolvedFFTPower result, which has been saved to disk with ConvolvedFFTPower.save().

The current MPI communicator is automatically used if the comm keyword is None

format can be ‘current’, or ‘pre000305’ for files generated before 0.3.5.

normalization(name, alpha)[source]

Compute the power spectrum normalization, using either the data or randoms source.

The normalization is given by:

$A = \int d^3x \bar{n}'_1(x) \bar{n}'_2(x) w_{\mathrm{fkp},1} w_{\mathrm{fkp},2}.$

The mean densities are assumed to be the same, so this can be converted to a summation over objects in the source, as

$A = \sum w_{\mathrm{comp},1} \bar{n}_2 w_{\mathrm{fkp},1} w_{\mathrm{fkp},2}.$

References

see Eqs. 13,14 of Beutler et al. 2014, “The clustering of galaxies in the SDSS-III Baryon Oscillation Spectroscopic Survey: testing gravity with redshift space distortions using the power spectrum multipoles”
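As a minimal sketch of the summation form above, the function below accumulates the product of completeness weight, mean density and the two FKP weights over the objects of one source. It uses plain Python lists with made-up values. The name `normalization_sum` is hypothetical, and any additional rescaling that the real method applies (for example through its `alpha` argument) is not shown.

```python
def normalization_sum(comp_weight, nbar, fkp_weight_1, fkp_weight_2):
    """Evaluate A = sum(w_comp * nbar * w_fkp1 * w_fkp2) over objects."""
    return sum(
        wc * n * w1 * w2
        for wc, n, w1, w2 in zip(comp_weight, nbar, fkp_weight_1, fkp_weight_2)
    )


# Three hypothetical objects with identical density and FKP weights.
A = normalization_sum(
    comp_weight=[1.0, 1.0, 2.0],
    nbar=[1e-4, 1e-4, 1e-4],
    fkp_weight_1=[0.5, 0.5, 0.5],
    fkp_weight_2=[0.5, 0.5, 0.5],
)
print(A)
```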

run()[source]

Compute the power spectrum multipoles. This function does not return anything, but adds several attributes (see below).

edges

the edges of the wavenumber bins

Type

array_like

poles

a BinnedStatistic object that behaves similarly to a structured array, with fancy slicing and re-indexing; it holds the measured multipole results, as well as the number of modes (modes) and the average wavenumber value in each bin (k)

Type

BinnedStatistic

attrs

dictionary holding input parameters and several important quantities computed during execution:

1. data.N, randoms.N :

the unweighted number of data and randoms objects

2. data.W, randoms.W :

the weighted number of data and randoms objects, using the column specified as the completeness weights

3. alpha :

the ratio of data.W to randoms.W

4. data.norm, randoms.norm :

the normalization of the power spectrum, computed from either the “data” or “randoms” catalog (they should be similar). See equations 13 and 14 of arxiv:1312.4611.

5. data.shotnoise, randoms.shotnoise :

the shot noise values for the “data” and “random” catalogs; See equation 15 of arxiv:1312.4611.

6. shotnoise :

the total shot noise for the power spectrum, equal to data.shotnoise + randoms.shotnoise; this should be subtracted from the monopole.

7. BoxSize :

the size of the Cartesian box used to grid the data and randoms objects on a Cartesian mesh.

For further details on the meta-data, see the documentation.

Type

dict
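The first three entries of attrs can be illustrated with a few lines of plain Python. This is only a sketch with made-up completeness weights; the dictionary keys mirror the names listed above, but the way nbodykit actually gathers these values across MPI ranks is not shown.

```python
# Hypothetical completeness weights for a tiny data and randoms catalog.
data_weights = [1.0, 1.2, 0.9, 1.1]
randoms_weights = [1.0] * 40

attrs = {
    "data.N": len(data_weights),        # unweighted number of data objects
    "randoms.N": len(randoms_weights),  # unweighted number of randoms
    "data.W": sum(data_weights),        # weighted number of data objects
    "randoms.W": sum(randoms_weights),  # weighted number of randoms
}
# alpha is the ratio of the weighted data count to the weighted randoms count.
attrs["alpha"] = attrs["data.W"] / attrs["randoms.W"]
print(attrs)
```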

save(output)[source]

Save the ConvolvedFFTPower result to disk.

The format is currently json.

Parameters

output (str) – the name of the file to dump the JSON results to

shotnoise(alpha)[source]

Compute the power spectrum shot noise, using either the data or randoms source.

This computes:

$S = \sum (w_\mathrm{comp} w_\mathrm{fkp})^2$

References

see Eq. 15 of Beutler et al. 2014, “The clustering of galaxies in the SDSS-III Baryon Oscillation Spectroscopic Survey: testing gravity with redshift space distortions using the power spectrum multipoles”
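The sum above can be sketched directly in Python. The helper name `shotnoise_sum` is hypothetical, and the sketch evaluates only the bare sum $S$; any scaling by `alpha` or by the normalization that the real method applies is deliberately left out.

```python
def shotnoise_sum(comp_weight, fkp_weight):
    """Evaluate S = sum((w_comp * w_fkp)**2) over the objects of one source."""
    return sum((wc * wf) ** 2 for wc, wf in zip(comp_weight, fkp_weight))


# Two hypothetical objects.
S = shotnoise_sum(comp_weight=[1.0, 2.0], fkp_weight=[0.5, 0.5])
print(S)
```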

to_pkmu(mu_edges, max_ell)[source]

Invert the measured multipoles $$P_\ell(k)$$ into power spectrum wedges, $$P(k,\mu)$$.

Parameters
• mu_edges (array_like) – the edges of the $$\mu$$ bins

• max_ell (int) – the maximum multipole to use when computing the wedges; all even multipoles with $$\ell$$ less than or equal to this number are included

Returns

pkmu – a data set holding the $$P(k,\mu)$$ wedges

Return type

BinnedStatistic
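One way to picture the inversion is through the Legendre expansion $P(k,\mu) = \sum_\ell P_\ell(k) L_\ell(\mu)$, averaged over each $\mu$ bin. The sketch below does this for $\ell \le 4$ using plain Python lists. It is an illustration under that assumption, not nbodykit's implementation, and the names `legendre_integral` and `multipoles_to_wedges` are hypothetical.

```python
def legendre_integral(ell, mu):
    """Antiderivative of the Legendre polynomial L_ell, evaluated at mu."""
    if ell == 0:
        return mu
    if ell == 2:
        return (mu ** 3 - mu) / 2.0
    if ell == 4:
        return (7 * mu ** 5 - 10 * mu ** 3 + 3 * mu) / 8.0
    raise ValueError("only ell = 0, 2, 4 are supported in this sketch")


def multipoles_to_wedges(poles, mu_edges, max_ell):
    """poles maps ell -> list of P_ell(k); returns one list of P(k) per mu bin."""
    ells = [ell for ell in sorted(poles) if ell % 2 == 0 and ell <= max_ell]
    nk = len(poles[ells[0]])
    wedges = []
    for lo, hi in zip(mu_edges[:-1], mu_edges[1:]):
        # Bin-averaged Legendre polynomial for each multipole.
        avg = {
            ell: (legendre_integral(ell, hi) - legendre_integral(ell, lo)) / (hi - lo)
            for ell in ells
        }
        wedges.append([sum(poles[ell][i] * avg[ell] for ell in ells) for i in range(nk)])
    return wedges


poles = {0: [100.0, 50.0], 2: [10.0, 5.0]}
wedges = multipoles_to_wedges(poles, mu_edges=[0.0, 0.5, 1.0], max_ell=2)
print(wedges)
```

With two equal-width bins covering $\mu \in [0, 1]$, the mean of the two wedges recovers the monopole, since the quadrupole averages to zero over the full range.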

class nbodykit.algorithms.CylindricalGroups(source, rankby, rperp, rpar, flat_sky_los=None, periodic=False, BoxSize=None)[source]

Compute groups of objects using a cylindrical grouping method. We identify all satellites within a given cylindrical volume around a central object.

Results are computed when the object is initialized, and the result is stored in the groups attribute; see the documentation of run().

Input parameters are stored in the attrs attribute dictionary.

Parameters
• source (subclass of CatalogSource) – the input source of particles providing the ‘Position’ column; the grouping algorithm is run on this catalog

• rperp (float) – the radius of the cylinder in the sky plane (i.e., perpendicular to the line-of-sight)

• rpar (float) – the radius along the line-of-sight direction; this is 1/2 the height of the cylinder

• rankby (str, list, None) – a single or list of column names to rank order the input source by before computing the cylindrical groups, such that objects ranked first are marked as CGM centrals; if None is supplied, no sorting will be done

• flat_sky_los (array_like, optional) – a unit vector of length 3 providing the line-of-sight direction, assuming a fixed line-of-sight across the box, e.g., [0,0,1] to use the z-axis. If None, the observer at (0,0,0) is used to compute the line-of-sight for each pair

• periodic (bool, optional) – whether to use periodic boundary conditions

• BoxSize (float, 3-vector, optional) – the size of the box of the input data; must be provided as a keyword or in source.attrs if periodic=True

References

Okumura, Teppei, et al. “Reconstruction of halo power spectrum from redshift-space galaxy distribution: cylinder-grouping method and halo exclusion effect”, arXiv:1611.04165, 2016.

Methods

• run() – Compute the cylindrical groups, saving the results to the groups attribute

run()[source]

Compute the cylindrical groups, saving the results to the groups attribute

groups

a catalog holding the result of the grouping. The length of the catalog is equal to the length of the input source, i.e., to its size attribute. The relevant fields are:

1. cgm_type :

a flag specifying the type for each object, with 0 specifying CGM central and 1 denoting CGM satellite

2. cgm_haloid :

The index of the CGM object this object belongs to; an integer between 0 and the total number of CGM halos

3. num_cgm_sats :

The number of satellites in the CGM halo

Type

ArrayCatalog
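The geometric test behind the grouping can be sketched as follows: split the separation between a candidate and a central into components parallel and perpendicular to the line-of-sight, then compare them with rpar and rperp. This sketch assumes a fixed (flat-sky) line-of-sight and no periodic wrapping; the function name `in_cylinder` is hypothetical, and the ranking and assignment logic of the real algorithm is not shown.

```python
import math


def in_cylinder(central, other, los, rperp, rpar):
    """Return True if `other` lies inside the cylinder centered on `central`.

    `los` must be a unit vector; rpar is half the height of the cylinder.
    """
    sep = [o - c for o, c in zip(other, central)]
    # Component of the separation along the line-of-sight.
    s_par = sum(s * l for s, l in zip(sep, los))
    # Perpendicular component from Pythagoras (guard against round-off).
    s_perp = math.sqrt(max(sum(s * s for s in sep) - s_par ** 2, 0.0))
    return abs(s_par) <= rpar and s_perp <= rperp


central = (0.0, 0.0, 0.0)
los = (0.0, 0.0, 1.0)
print(in_cylinder(central, (0.5, 0.0, 1.5), los, rperp=1.0, rpar=2.0))
print(in_cylinder(central, (1.5, 0.0, 0.0), los, rperp=1.0, rpar=2.0))
```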

class nbodykit.algorithms.FFTCorr(first, mode, Nmesh=None, BoxSize=None, second=None, los=[0, 0, 1], Nmu=5, dr=None, rmin=0.0, rmax=None, poles=[])[source]

Algorithm to compute the 1d or 2d correlation and/or multipoles in a periodic box, using a Fast Fourier Transform (FFT).

This computes the power spectrum as the square of the Fourier modes of the density field, which are computed via a FFT. Then it is transformed back to obtain the correlation function.
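The relation used here, that the inverse transform of the squared Fourier modes gives the correlation function, can be checked on a tiny one-dimensional periodic field. The sketch below uses a naive discrete Fourier transform from the standard library so that it is self-contained; it illustrates the principle only and is not how nbodykit performs the transforms. All function names are hypothetical.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse is normalized by 1/N)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            v * cmath.exp(sign * 2j * cmath.pi * k * j / n)
            for j, v in enumerate(values)
        )
        out.append(total / n if inverse else total)
    return out


def correlation_via_fft(delta):
    """Square the Fourier modes, transform back, and average over positions."""
    power = [abs(m) ** 2 for m in dft(delta)]
    n = len(delta)
    return [c.real / n for c in dft(power, inverse=True)]


def correlation_direct(delta):
    """Periodic two-point correlation computed by explicit pair products."""
    n = len(delta)
    return [sum(delta[j] * delta[(j + r) % n] for j in range(n)) / n for r in range(n)]


delta = [0.5, -0.2, 0.1, -0.4, 0.3, -0.1, 0.2, -0.4]
xi_fft = correlation_via_fft(delta)
xi_direct = correlation_direct(delta)
print(max(abs(a - b) for a, b in zip(xi_fft, xi_direct)))
```

The two estimates agree to round-off, which is why an FFT-based power spectrum code can be reused to measure correlation functions in a periodic box.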

Results are computed when the object is initialized. See the documentation of run() for the attributes storing the results.

Note

This is very similar to FFTPower.

Parameters
• first (CatalogSource, MeshSource) – the source for the first field; if a CatalogSource is provided, it is automatically converted to MeshSource using the default painting parameters (via to_mesh())

• mode ({'1d', '2d'}) – compute either 1d or 2d correlation functions

• Nmesh (int, optional) – the number of cells per side in the particle mesh used to paint the source

• BoxSize (int, 3-vector, optional) – the size of the box

• second (CatalogSource, MeshSource, optional) – the second source for cross-correlations

• los (array_like , optional) – the direction to use as the line-of-sight; must be a unit vector

• Nmu (int, optional) – the number of mu bins to use from $$\mu=[0,1]$$; if mode = 1d, then Nmu is set to 1

• dr (float, optional) – the linear spacing of r bins to use; if not provided, the fundamental mode of the box is used; if dr=0, the bins are tight, such that each bin has a unique r value.

• rmin (float, optional) – the lower edge of the first r bin to use

• rmax (float, optional) – the upper limit of the last r bin to use

• poles (list of int, optional) – a list of multipole numbers ell to compute $$\xi_\ell(r)$$ from $$\xi(r,\mu)$$

Methods

• load(output[, comm]) – Load a saved result.

• run() – Compute the correlation function in a periodic box, using FFTs.

• save(output) – Save the result to disk.

load(output[, comm])

Load a saved result. The result has been saved to disk with save().

run()[source]

Compute the correlation function in a periodic box, using FFTs.

Returns

• corr (BinnedStatistic) – a BinnedStatistic object that holds the measured $$\xi(r)$$ or $$\xi(r,\mu)$$. It stores the following variables:

• r :

the mean value for each r bin

• mu :

mode=2d only; the mean value for each mu bin

• corr :

real array storing the correlation function

• modes :

the number of modes averaged together in each bin

• poles (BinnedStatistic or None) – a BinnedStatistic object to hold the multipole results $$\xi_\ell(r)$$; if no multipoles were requested by the user, this is None. It stores the following variables:

• r :

the mean value for each r bin

• power_L :

complex array storing the real and imaginary components for the $$\ell=L$$ multipole

• modes :

the number of modes averaged together in each bin

• corr.attrs, poles.attrs (dict) – dictionary of meta-data; in addition to storing the input parameters, it includes the following fields computed during the algorithm execution:

• shotnoise (float)

the power Poisson shot noise, equal to $$V/N$$, where $$V$$ is the volume of the box and $$N$$ is the total number of objects; if a cross-correlation is computed, this will be equal to zero

• N1 (int)

the total number of objects in the first source

• N2 (int)

the total number of objects in the second source

save(output)

Save the result to disk. The format is currently JSON.

class nbodykit.algorithms.FFTPower(first, mode, Nmesh=None, BoxSize=None, second=None, los=[0, 0, 1], Nmu=5, dk=None, kmin=0.0, kmax=None, poles=[])[source]

Algorithm to compute the 1d or 2d power spectrum and/or multipoles in a periodic box, using a Fast Fourier Transform (FFT).

This computes the power spectrum as the square of the Fourier modes of the density field, which are computed via a FFT.

Results are computed when the object is initialized. See the documentation of run() for the attributes storing the results.

Note

A full tutorial on the class is available in the documentation here.

Parameters
• first (CatalogSource, MeshSource) – the source for the first field; if a CatalogSource is provided, it is automatically converted to MeshSource using the default painting parameters (via to_mesh())

• mode ({'1d', '2d'}) – compute either 1d or 2d power spectra

• Nmesh (int, optional) – the number of cells per side in the particle mesh used to paint the source

• BoxSize (int, 3-vector, optional) – the size of the box

• second (CatalogSource, MeshSource, optional) – the second source for cross-correlations

• los (array_like , optional) – the direction to use as the line-of-sight; must be a unit vector

• Nmu (int, optional) – the number of mu bins to use from $$\mu=[0,1]$$; if mode = 1d, then Nmu is set to 1

• dk (float, optional) – the linear spacing of k bins to use; if not provided, the fundamental mode of the box is used; if dk=0 is set, use fine bins such that the modes contributing to each bin have identical modulus.

• kmin (float, optional) – the lower edge of the first k bin to use

• kmax (float, optional) – the upper limit of the last k bin to use (not exact)

• poles (list of int, optional) – a list of multipole numbers ell to compute $$P_\ell(k)$$ from $$P(k,\mu)$$
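The binning parameters above can be sketched as follows. The fundamental mode of a box of side $L$ is $2\pi/L$, which serves as the default dk. The choice of the Nyquist wavenumber $\pi N_\mathrm{mesh}/L$ as the upper limit when kmax is None is an assumption of this sketch, as are the function name `k_bin_edges` and the exact edge placement; the dk=0 case is not handled.

```python
import math


def k_bin_edges(box_size, nmesh, kmin=0.0, kmax=None, dk=None):
    """Linearly spaced k-bin edges for a cubic box (illustrative only)."""
    k_fund = 2.0 * math.pi / box_size  # fundamental mode of the box
    if dk is None:
        dk = k_fund
    if kmax is None:
        # Assumption of this sketch: stop at the Nyquist wavenumber.
        kmax = math.pi * nmesh / box_size
    # Small tolerance so that an exact multiple of dk is not lost to round-off.
    nbins = int(math.floor((kmax - kmin) / dk + 1e-9))
    return [kmin + i * dk for i in range(nbins + 1)]


edges = k_bin_edges(box_size=1000.0, nmesh=64)
print(len(edges) - 1, edges[0], edges[1])
```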

Methods

• load(output[, comm]) – Load a saved result.

• run() – Compute the power spectrum in a periodic box, using FFTs.

• save(output) – Save the result to disk.

load(output[, comm])

Load a saved result. The result has been saved to disk with save().

run()[source]

Compute the power spectrum in a periodic box, using FFTs.

Returns

• power (BinnedStatistic) – a BinnedStatistic object that holds the measured $$P(k)$$ or $$P(k,\mu)$$. It stores the following variables:

• k :

the mean value for each k bin

• mu :

mode=2d only; the mean value for each mu bin

• power :

complex array storing the real and imaginary components of the power

• modes :

the number of Fourier modes averaged together in each bin

• poles (BinnedStatistic or None) – a BinnedStatistic object to hold the multipole results $$P_\ell(k)$$; if no multipoles were requested by the user, this is None. It stores the following variables:

• k :

the mean value for each k bin

• power_L :

complex array storing the real and imaginary components for the $$\ell=L$$ multipole

• modes :

the number of Fourier modes averaged together in each bin

• power.attrs, poles.attrs (dict) – dictionary of meta-data; in addition to storing the input parameters, it includes the following fields computed during the algorithm execution:

• shotnoise (float)

the power Poisson shot noise, equal to $$V/N$$, where $$V$$ is the volume of the box and $$N$$ is the total number of objects; if a cross-correlation is computed, this will be equal to zero

• N1 (int)

the total number of objects in the first source

• N2 (int)

the total number of objects in the second source
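As a quick worked example of the shotnoise entry, the Poisson value $V/N$ follows directly from the box size and the object count. The helper name `poisson_shotnoise` and the numbers are illustrative only.

```python
def poisson_shotnoise(box_size, n_objects):
    """Poisson shot noise V / N for a box with side lengths `box_size`."""
    volume = box_size[0] * box_size[1] * box_size[2]
    return volume / n_objects


# A 1000-unit cube holding one million objects.
print(poisson_shotnoise((1000.0, 1000.0, 1000.0), 1_000_000))
```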

save(output)

Save the result to disk. The format is currently JSON.

class nbodykit.algorithms.FFTRecon(data, ran, Nmesh, bias=1.0, f=0.0, los=[0, 0, 1], R=20, position='Position', revert_rsd_random=False, scheme='LGS', BoxSize=None)[source]

FFT based Lagrangian reconstruction algorithm in a periodic box.

References

Eisenstein et al, 2007 http://adsabs.harvard.edu/abs/2007ApJ…664..675E Section 3, paragraph starting with ‘Restoring in full the …’

We follow the cleaner description in Schmittfull et al. 2015, Table I and the text below it. The schemes are LGS, LF2 and LRR.

A slight difference from the paper is that redshift-space distortions and bias are corrected at linear order. The shifting of the randoms follows Martin White’s suggestion to exclude the RSD by default (revert_rsd_random=False).

Parameters
• data (CatalogSource) – the data catalog, e.g. halos. data.attrs[‘BoxSize’] is used if argument BoxSize is not given.

• ran (CatalogSource) – the random catalog, e.g. from a UniformCatalog object.

• Nmesh (int) – The size of the FFT Mesh. Rule of thumb is that the size of a mesh cell shall be 2 ~ 4 times smaller than the smoothing length, R.

• revert_rsd_random (boolean) – whether to revert the RSD for the randoms as well as the data. There are two conventions: reverting the RSD displacement in the data only (False), or in both data and randoms (True). Default is False.

• R (float) – The radius of smoothing. 10 to 20 Mpc/h usually works well.

• bias (float) – The bias of the data catalog.

• f (float) – The growth rate; if non-zero, correct for RSD

• los (list) – The direction of the line of sight for RSD. Usually (default) [0, 0, 1].

• position (string) – column to use for picking up the Position of the objects.

• BoxSize (float or array_like) – the size of the periodic box, default is to infer from the data.

• scheme (string) – The reconstruction scheme. LGS is the standard reconstruction (Lagrangian growth shift). LF2 is the F2 Lagrangian reconstruction. LRR is the random-random Lagrangian reconstruction.
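The rule of thumb for Nmesh can be turned into a small calculation: if the cell size BoxSize / Nmesh should be 2 to 4 times smaller than R, then Nmesh lies between 2 BoxSize / R and 4 BoxSize / R. The helper below is a hypothetical convenience for a cubic box and is not part of nbodykit.

```python
import math


def nmesh_range(box_size, smoothing_radius):
    """Range of Nmesh for which the cell size is between R/2 and R/4."""
    nmin = math.ceil(2.0 * box_size / smoothing_radius)
    nmax = math.floor(4.0 * box_size / smoothing_radius)
    return nmin, nmax


# A 1000 Mpc/h box with R = 20 Mpc/h.
print(nmesh_range(1000.0, 20.0))
```

For this example the cell size ranges from 10 Mpc/h (Nmesh = 100) down to 5 Mpc/h (Nmesh = 200).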

Attributes
actions

A list of actions to apply to the density field when interpolating to the mesh.

attrs

A dictionary storing relevant meta-data about the CatalogSource.

Methods

• apply(func[, kind, mode]) – Return a view of the mesh, with actions updated to apply the specified function, either in Fourier space or configuration space, based on mode

• compute([mode, Nmesh]) – Compute / Fetch the mesh object into memory as a RealField or ComplexField object.

• preview([axes, Nmesh, root]) – Gather the mesh into a numpy array, with (reduced) resolution.

• save(output[, dataset, mode]) – Save the mesh as a BigFileMesh on disk, either in real or complex space.

• to_complex_field([out]) – Convert the mesh source to the Fourier-space field, returning a pmesh.pm.ComplexField object.

• to_field([mode, out]) – Return the mesh as a pmesh Field object, either in Fourier space or configuration space, based on mode.

• to_real_field() – Convert the mesh source to the configuration-space field, returning a pmesh.pm.RealField object.

• view() – Return a “view” of the MeshSource, in the spirit of numpy’s ndarray view.

• paint

• run

• work_with

__finalize__(other)

Finalize the creation of a MeshSource object by copying over attributes from a second MeshSource.

Parameters

other (MeshSource) – the second MeshSource to copy over attributes from

__len__()

Length of a mesh source is zero

property actions

A list of actions to apply to the density field when interpolating to the mesh.

This stores tuples of (mode, func, kind); see apply() for more details.

apply(func, kind='wavenumber', mode='complex')

Return a view of the mesh, with actions updated to apply the specified function, either in Fourier space or configuration space, based on mode

Parameters
• func (callable or a MeshFilter object) – func(x, y) where x is a list of r (k) values that broadcasts into a full array, when mode is ‘real’ (‘complex’); the value of x depends on kind. y is the value of the mesh field on the corresponding locations.

• kind (string, optional) –

if a MeshFilter object is given as func, this is ignored. The kind of value in x.

• When mode is ‘complex’:

• ’wavenumber’ means wavenumber from [- 2 pi / L * N / 2, 2 pi / L * N / 2).

• ’circular’ means circular frequency from [- pi, pi).

• ’index’ means [0, Nmesh )

• When mode is ‘real’:

• ’relative’ means distance from [-0.5 Boxsize, 0.5 BoxSize).

• ’index’ means [0, Nmesh )

• mode ('complex' or 'real', optional) – if a MeshFilter object is given as func, this is ignored. whether to apply the function to the mesh in configuration space or Fourier space

Returns

a view of the mesh object with the actions attribute updated to include the new action

Return type

MeshSource
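The coordinate ranges listed for kind can be illustrated for a one-dimensional mesh. The sketch below assumes an even Nmesh and the usual FFT ordering of signed offsets (0, 1, ..., N/2-1, -N/2, ..., -1); only the ranges themselves come from the description above, and the ordering used by the underlying mesh library may differ. The function name `mesh_coordinates` is hypothetical.

```python
import math


def mesh_coordinates(nmesh, box_size, kind, mode="complex"):
    """Coordinate values along one axis for a given `kind` (illustrative)."""
    if kind == "index":
        return list(range(nmesh))
    # Signed integer offsets in FFT ordering (assumption of this sketch).
    signed = [i if i < nmesh // 2 else i - nmesh for i in range(nmesh)]
    if mode == "complex" and kind == "wavenumber":
        return [2.0 * math.pi / box_size * s for s in signed]
    if mode == "complex" and kind == "circular":
        return [2.0 * math.pi / nmesh * s for s in signed]
    if mode == "real" and kind == "relative":
        return [box_size / nmesh * s for s in signed]
    raise ValueError("unsupported combination of mode and kind")


k = mesh_coordinates(8, 100.0, "wavenumber")
w = mesh_coordinates(8, 100.0, "circular")
x = mesh_coordinates(8, 100.0, "relative", mode="real")
print(min(k), max(k))
print(min(w), max(w))
print(min(x), max(x))
```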

property attrs

A dictionary storing relevant meta-data about the CatalogSource.

compute(mode='real', Nmesh=None)

Compute / Fetch the mesh object into memory as a RealField or ComplexField object.

preview(axes=None, Nmesh=None, root=0)

Gather the mesh into a numpy array, with (reduced) resolution. The result is broadcast to all ranks, so this uses $$\mathrm{Nmesh}^3$$ per rank.

Parameters
• Nmesh (int, array_like) – The desired Nmesh of the result. Be aware this function allocates memory to hold a full Nmesh on each rank.

• axes (int, array_like) – The axes to project the preview onto, e.g. (0, 1)

• root (int, optional) – the rank number to treat as root when gathering to a single rank

Returns

out – A numpy array holding the real density field.

Return type

array_like

save(output, dataset='Field', mode='real')

Save the mesh as a BigFileMesh on disk, either in real or complex space.

Parameters
• output (str) – name of the bigfile file

• dataset (str, optional) – name of the bigfile data set where the field is stored

• mode (str, optional) – real or complex; the form of the field to store

to_complex_field(out=None)

Convert the mesh source to the Fourier-space field, returning a pmesh.pm.ComplexField object.

Not implemented in the base class, unless object is a view.

to_field(mode='real', out=None)

Return the mesh as a pmesh Field object, either in Fourier space or configuration space, based on mode.

This will call to_real_field() or to_complex_field() based on mode.

Parameters

mode ('real' or 'complex') – the return type of the field

Returns

either a RealField or ComplexField, storing the value of the field on the mesh

Return type

RealField, ComplexField

to_real_field()[source]

Convert the mesh source to the configuration-space field, returning a pmesh.pm.RealField object.

Not implemented in the base class, unless object is a view.

view()

Return a “view” of the MeshSource, in the spirit of numpy’s ndarray view.

This returns a new MeshSource whose memory is owned by self.

Note that for CatalogMesh objects, this is overridden by the CatalogSource.view function.

class nbodykit.algorithms.FKPCatalog(data, randoms, BoxSize=None, BoxPad=0.02, P0=None, nbar='NZ')[source]

An interface for simultaneous modeling of a data CatalogSource and a randoms CatalogSource, in the spirit of Feldman, Kaiser, and Peacock, 1994.

The main functionality of this class is to:

• provide a uniform interface to accessing columns from the data CatalogSource and randoms CatalogSource, using column names prefixed with “data/” or “randoms/”

• compute the shared BoxSize of the source, by finding the maximum Cartesian extent of the randoms

• provide an interface to a mesh object, which knows how to paint the FKP density field from the data and randoms

Parameters
• data (CatalogSource) – the CatalogSource of particles representing the data catalog

• randoms (CatalogSource, or None) – the CatalogSource of particles representing the randoms catalog; if None is given, an empty catalog is used.

• BoxSize (float, 3-vector, optional) – the size of the Cartesian box to use for the unified data and randoms; if not provided, the maximum Cartesian extent of the randoms defines the box

• BoxPad (float, 3-vector, optional) – optionally apply this additional buffer to the extent of the Cartesian box

• nbar (str, optional) – the name of the column specifying the number density as a function of redshift. default is NZ.

• P0 (float or None) – if not None, a column named FKPWeight is added to data and random based on nbar.
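How a box can be derived from the randoms is sketched below: take the Cartesian extent along each axis and enlarge it by BoxPad. Treating the pad as a multiplicative factor (1 + BoxPad) is an assumption of this sketch, as is centering the box on the midpoint of the extent; the function name `box_from_positions` is hypothetical.

```python
def box_from_positions(positions, box_pad=0.02):
    """Padded Cartesian extent and center of a list of (x, y, z) positions."""
    mins = [min(p[i] for p in positions) for i in range(3)]
    maxs = [max(p[i] for p in positions) for i in range(3)]
    # Assumption: the pad enlarges each side length by a factor (1 + box_pad).
    box_size = [(hi - lo) * (1.0 + box_pad) for lo, hi in zip(mins, maxs)]
    box_center = [0.5 * (lo + hi) for lo, hi in zip(mins, maxs)]
    return box_size, box_center


randoms = [(0.0, 0.0, 0.0), (100.0, 200.0, 50.0), (40.0, 10.0, 25.0)]
box_size, box_center = box_from_positions(randoms)
print(box_size, box_center)
```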


Attributes
attrs

A dictionary storing relevant meta-data about the CatalogSource.

columns

Columns for individual species can be accessed using a species/ prefix and the column name, i.e., data/Position.

hardcolumns

Hardcolumn of the form species/name

species

List of species names

Methods

• compute(*args, **kwargs) – Our version of dask.compute() that computes multiple delayed dask collections at once.

• copy() – Return a shallow copy of the object, where each column is a reference of the corresponding column in self.

• get_hardcolumn(col) – Construct and return a hard-coded column.

• make_column(array) – Utility function to convert an array-like object to a dask.array.Array.

• read(columns) – Return the requested columns as dask arrays.

• save(output[, columns, dataset, datasets, …]) – Save the CatalogSource to a bigfile.BigFile.

• to_mesh([Nmesh, BoxSize, BoxCenter, dtype, …]) – Convert the FKPCatalog to a mesh, which knows how to “paint” the FKP density field.

• to_subvolumes([domain, position, columns]) – Domain Decompose a catalog, sending items to the ranks according to the supplied domain object.

• view([type]) – Return a “view” of the CatalogSource object, with the returned type set by type.

• create_instance

__delitem__(col)

Delete a column of the form species/column

__finalize__(other)

Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.

The idea here is to only copy over attributes that are similar to meta-data, so we do not copy some of the core attributes of the CatalogSource object.

Parameters

other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied

Returns

return self, with the added attributes

Return type

CatalogSource

__getitem__(key)

This provides access to the underlying data in two ways:

• The CatalogSource object for a species can be accessed if key is a species name.

• Individual columns for a species can be accessed using the format: species/column.

__setitem__(col, value)

Add columns to any of the species catalogs.

Note

New column names should be prefixed by ‘species/’ where ‘species’ is a name in the species attribute.

property attrs

A dictionary storing relevant meta-data about the CatalogSource.

property columns

Columns for individual species can be accessed using a species/ prefix and the column name, i.e., data/Position.

compute(*args, **kwargs)

Our version of dask.compute() that computes multiple delayed dask collections at once.

This should be called on the return value of read() to convert any dask arrays to numpy arrays.

Note

If the base attribute is set, compute() will be called using base instead of self.

Parameters

args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.

copy()

Return a shallow copy of the object, where each column is a reference of the corresponding column in self.

Note

No copy of data is made.

Note

This is different from view in that the attributes dictionary of the copy is no longer related to self.

Returns

a new CatalogSource that holds all of the data columns of self

Return type

CatalogSource

get_hardcolumn(col)

Construct and return a hard-coded column.

These are usually produced by calling member functions marked by the @column decorator.

Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.

Note

If the base attribute is set, get_hardcolumn() will be called using base instead of self.

property hardcolumns

Hardcolumn of the form species/name

static make_column(array)

Utility function to convert an array-like object to a dask.array.Array.

Note

The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.

Parameters

array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object

Returns

a dask array initialized from array

Return type

dask.array.Array

read(columns)

Return the requested columns as dask arrays.

Parameters

columns (list of str) – the names of the requested columns

Returns

the list of column data, in the form of dask arrays

Return type

list of dask.array.Array

save(output[, columns, dataset, datasets, …])

Save the CatalogSource to a bigfile.BigFile.

Only the selected columns are saved and attrs are saved in header. The attrs of columns are stored in the datasets.

Parameters
• output (str) – the name of the file to write to

• columns (list of str) – the names of the columns to save in the file, or None to use all columns

• dataset (str, optional) – dataset to store the columns under.

• datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)

• header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, do not save the header.

• compute (boolean, default True) – if True, wait until the store operations finish; if False, return a dictionary with column name and a future object for the store. Use dask.compute() to wait for the store operations on the result.

property species

List of species names

to_mesh(Nmesh=None, BoxSize=None, BoxCenter=None, dtype='c16', interlaced=False, compensated=False, resampler='cic', fkp_weight='FKPWeight', comp_weight='Weight', selection='Selection', position='Position', bbox_from_species=None, window=None, nbar=None)[source]

Convert the FKPCatalog to a mesh, which knows how to “paint” the FKP density field.

Additional keywords to the to_mesh() function include the FKP weight column, completeness weight column, and the column specifying the number density as a function of redshift.

Parameters
• Nmesh (int, 3-vector, optional) – the number of cells per box side; if not specified in attrs, this must be provided

• dtype (str, dtype, optional) – the data type of the mesh when painting. dtype=’f8’ or ‘f4’ assumes Hermitian symmetry of the input field (delta(x) = delta^{*}(-x)), and stores it as an N x N x N/2+1 real array. This speeds evaluation of even multipoles but yields incorrect odd multipoles in the presence of the wide-angle effect. dtype=’c16’ or ‘c8’ stores the field as an N x N x N complex array to correctly recover the odd multipoles.

• interlaced (bool, optional) – whether to use interlacing to reduce aliasing when painting the particles on the mesh

• compensated (bool, optional) – whether to apply a Fourier-space transfer function to account for the effects of the gridding + aliasing

• resampler (str, optional) – the string name of the resampler to use when interpolating the particles to the mesh; see pmesh.window.methods for choices

• fkp_weight (str, optional) – the name of the column in the source specifying the FKP weight; this weight is applied to the FKP density field: n_data - alpha*n_randoms

• comp_weight (str, optional) – the name of the column in the source specifying the completeness weight; this weight is applied to the individual fields, either n_data or n_random

• selection (str, optional) – the name of the column used to select a subset of the source when painting

• position (str, optional) – the name of the column that specifies the position data of the objects in the catalog

• bbox_from_species (str, optional) – if given, use the species to infer a bbox; if not given, will try random, then data (if random is empty)

• window (deprecated) – use resampler instead

• nbar (deprecated.) – deprecated. set nbar in the call to FKPCatalog()

to_subvolumes(domain=None, position='Position', columns=None)

Domain-decompose a catalog, sending items to the ranks according to the supplied domain object, using the position column as the position.

This will read in the full position array and all of the requested columns.

Parameters
• domain (pmesh.domain.GridND object, or None) – The domain over which to distribute the catalog. If None, try to evenly divide spatially. The easiest way to find a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.

• position (string_like) – column to use to compute the position.

• columns (list of string_like) – columns to include in the new catalog; if not supplied, all columns will be exchanged.

Returns

A decomposed catalog source, where each rank contains only the objects that belong to that rank according to the domain object.

self.attrs are carried over as a shallow copy to the returned object.

Return type

CatalogSource
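The idea of sending each object to the rank that owns its region of space can be sketched with a regular grid of ranks. This is an illustrative toy under stated assumptions, not the logic of pmesh.domain.GridND: the function `assign_ranks`, the `grid` tuple (ranks per axis), and the row-major rank numbering are all hypothetical.

```python
def assign_ranks(positions, box_size, grid):
    """Return the owning rank of each position for a regular grid of ranks.

    `grid` gives the number of ranks along each axis, e.g. (2, 1, 1).
    """
    ranks = []
    for pos in positions:
        # Index of the sub-volume along each axis, clipped to the last cell.
        cell = [min(int(c / box_size * g), g - 1) for c, g in zip(pos, grid)]
        # Row-major flattening of the 3D cell index into a single rank id.
        ranks.append((cell[0] * grid[1] + cell[1]) * grid[2] + cell[2])
    return ranks


positions = [(1.0, 1.0, 1.0), (6.0, 1.0, 1.0), (9.99, 9.0, 9.0)]
ranks = assign_ranks(positions, box_size=10.0, grid=(2, 1, 1))
```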

view(type=None)

Return a “view” of the CatalogSource object, with the returned type set by type.

This initializes a new empty class of type type and attaches attributes to it via the __finalize__() mechanism.

Parameters

type (Python type) – the desired class type of the returned object.

nbodykit.algorithms.FKPPower
nbodykit.algorithms.FKPWeightFromNbar(P0, nbar)[source]

Create FKPWeight from nbar, the number density of objects per redshift.

Parameters
• P0 (float) – the FKP normalization; when P0 == 0, returns 1.0, ignoring the size and shape of nbar.

• nbar (array_like) – the number density of objects per redshift

Returns
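For orientation, the standard FKP weight is $$w = 1 / (1 + \bar{n} P_0)$$. The sketch below evaluates that formula on a plain list; the function name `fkp_weight` is hypothetical, and whether nbodykit's function handles arrays, dtypes, or the P0 == 0 case in exactly this way is an assumption.

```python
def fkp_weight(P0, nbar):
    """Standard FKP weight, w = 1 / (1 + P0 * nbar), element by element."""
    if P0 == 0:
        # Mirrors the documented special case: P0 == 0 gives 1.0.
        return 1.0
    return [1.0 / (1.0 + P0 * n) for n in nbar]


nbar = [1e-4, 3e-4, 5e-4]          # number densities in (Mpc/h)^-3
weights = fkp_weight(1e4, nbar)    # P0 in (Mpc/h)^3
```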

class nbodykit.algorithms.FOF(source, linking_length, nmin, absolute=False, periodic=True, domain_factor=1)[source]

A friends-of-friends halo finder that computes the label for each particle, denoting which halo it belongs to.

Friends-of-friends was first used by Davis et al. 1985 to define halos in hierarchical structure formation in cosmological simulations. The algorithm is also known as DBSCAN in computer science. The subroutine here implements a parallel version of FOF.

The underlying local FOF algorithm is from kdcount.cluster, which is an adaptation of the implementation in Volker Springel’s Gadget and Martin White’s PM.

Results are computed when the object is initialized. See the documentation of run() for the attributes storing the results.

For returning a CatalogSource of the FOF halos, see find_features() and for computing a halo catalog with added analytic information for a specific redshift and cosmology, see to_halos().

Parameters
• source (CatalogSource) – the source to run the FOF algorithm on; must support ‘Position’

• linking_length (float) – the linking length, either in absolute units, or relative to the mean particle separation

• nmin (int) – halos with fewer particles are ignored

• absolute (bool, optional) – If True, the linking length is in absolute units, otherwise it is relative to the mean particle separation; default is False
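The linking step can be illustrated with a brute-force, single-process friends-of-friends sketch built on union-find. It is not the parallel kdcount-based implementation used by this class: the function `fof_labels` is hypothetical, the mean separation is assumed to be BoxSize / N^(1/3), the nmin cut is omitted, and the label numbering is arbitrary.

```python
import itertools


def fof_labels(positions, linking_length, box_size, absolute=False):
    """Label particles by FOF group using O(N^2) pair checks in a periodic box."""
    n = len(positions)
    if not absolute:
        # Relative linking length: scale by the mean inter-particle separation.
        linking_length *= box_size / n ** (1.0 / 3.0)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(n), 2):
        d2 = 0.0
        for a, b in zip(positions[i], positions[j]):
            d = abs(a - b)
            d = min(d, box_size - d)  # periodic minimum-image distance
            d2 += d * d
        if d2 <= linking_length ** 2:
            parent[find(i)] = find(j)  # merge the two groups

    # Relabel the roots as consecutive integers in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


positions = [(0.1, 0.0, 0.0), (0.3, 0.0, 0.0), (5.0, 5.0, 5.0), (9.9, 0.0, 0.0)]
labels = fof_labels(positions, 0.5, box_size=10.0, absolute=True)
```

The last particle joins the first group only because of the periodic wrap across the box boundary.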

Methods

• find_features([peakcolumn]) – Based on the particle labels, identify the groups, and return the center-of-mass CMPosition, CMVelocity, and Length of each feature.

• run() – Run the FOF algorithm.

• to_halos(particle_mass, cosmo, redshift[, …]) – Return a HaloCatalog, holding the center-of-mass position and velocity of each FOF halo, as well as the properly scaled mass, for a given cosmology and redshift.

find_features(peakcolumn=None)[source]

Based on the particle labels, identify the groups, and return the center-of-mass CMPosition, CMVelocity, and Length of each feature.

If a peakcolumn is given, the PeakPosition and PeakVelocity is also calculated for the particle at the peak value of the column.

Data is scattered evenly across all ranks.

Returns

a source holding the (‘CMPosition’, ‘CMVelocity’, ‘Length’) of each feature; optionally, PeakPosition and PeakVelocity are also included if peakcolumn is not None

Return type

ArrayCatalog
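The per-group reduction can be sketched as follows. This toy version assumes equal-mass particles and ignores periodic wrapping, so it is only an illustration of what CMPosition, CMVelocity, and Length mean; the function `center_of_mass_features` and its dictionary output are hypothetical.

```python
from collections import defaultdict


def center_of_mass_features(positions, velocities, labels):
    """Group particles by label and average their positions and velocities."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)

    features = {}
    for label, members in groups.items():
        n = len(members)
        features[label] = {
            # Equal-mass center of mass: a plain mean over group members.
            "CMPosition": tuple(sum(positions[i][k] for i in members) / n for k in range(3)),
            "CMVelocity": tuple(sum(velocities[i][k] for i in members) / n for k in range(3)),
            "Length": n,  # number of particles in the group
        }
    return features


positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
velocities = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
features = center_of_mass_features(positions, velocities, labels=[1, 1, 2])
```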

run()[source]

Run the FOF algorithm. This function returns nothing, but does attach several attributes to the class instance:

• labels

• max_label

Note

The labels array is scattered evenly across all ranks.

labels

an array of labels specifying which FOF halo each particle belongs to

Type

array_like, length: size

max_label

the maximum label across all ranks; this represents the total number of FOF halos found

Type

int

to_halos(particle_mass, cosmo, redshift, mdef='vir', posdef='cm', peakcolumn='Density')[source]

Return a HaloCatalog, holding the center-of-mass position and velocity of each FOF halo, as well as the properly scaled mass, for a given cosmology and redshift.

The returned catalog also has default analytic prescriptions for halo radius and concentration.

The data is scattered evenly across all ranks.

Parameters
• particle_mass (float) – the particle mass, which is used to convert the number of particles in each halo to a total mass

• cosmo (nbodykit.cosmology.core.Cosmology) – the cosmology of the catalog

• redshift (float) – the redshift of the catalog

• mdef (str, optional) – string specifying mass definition, used for computing default halo radii and concentration; should be ‘vir’ or ‘XXXc’ or ‘XXXm’ where ‘XXX’ is an int specifying the overdensity

• posdef (str, optional) – position, can be cm (center of mass) or peak (particle with maximum value on a column)

• peakcolumn (str , optional) – when posdef is ‘peak’, this is the column in source for identifying particles at the peak for the position and velocity.

Returns

a HaloCatalog at the specified cosmology and redshift

Return type

HaloCatalog

class nbodykit.algorithms.FiberCollisions(ra, dec, collision_radius=0.017222222222222226, seed=None, degrees=True, comm=None)[source]

Run an angular FOF algorithm to determine fiber collision groups from an input catalog, and then assign fibers such that the maximum number of objects receive a fiber.

This amounts to determining the following population of objects:

• population 1:

the maximal “clean” sample of objects in which each object is not angularly collided with any other object in this subsample

• population 2:

the potentially-collided objects; these objects are those that are fiber collided + those that have been “resolved” due to multiple coverage in tile overlap regions

Results are computed when the object is initialized. See the documentation of run() for the attributes storing the results.

Parameters
• ra (array_like) – the right ascension coordinate column

• dec (array_like) – the declination coordinate column

• collision_radius (float, optional) – the size of the angular collision radius (in degrees); default is 62 arcseconds

• seed (int, optional) – the random seed to use when determining which objects get fibers

• degrees (bool, optional) – set to True if the units of ra and dec are degrees
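The default collision_radius is simply 62 arcseconds expressed in degrees (62 / 3600). The sketch below checks that arithmetic and flags pairs closer than the radius using the haversine formula. It is a brute-force illustration only: the helper names are hypothetical, and the real class uses an angular FOF grouping followed by fiber assignment, not this pairwise scan.

```python
import math

collision_radius = 62.0 / 3600.0  # 62 arcseconds in degrees


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula), inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(math.sqrt(a)))


def collided_pairs(ra, dec, radius=collision_radius):
    """Return index pairs (i, j), i < j, separated by less than `radius` degrees."""
    pairs = []
    for i in range(len(ra)):
        for j in range(i + 1, len(ra)):
            if angular_separation_deg(ra[i], dec[i], ra[j], dec[j]) < radius:
                pairs.append((i, j))
    return pairs


pairs = collided_pairs(ra=[10.0, 10.01, 10.5], dec=[0.0, 0.0, 0.0])
```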

References

Methods

• run() – Run the fiber assignment algorithm.

run()[source]

Run the fiber assignment algorithm. This attaches the following attribute to the object:

Note

The labels attribute has a 1-to-1 correspondence with the rows in the input source.

labels

a CatalogSource that has the following columns:

• Label :

the group labels for each object in the input CatalogSource; label == 0 objects are not in a group

• Collided :

a flag array specifying which objects are collided, i.e., do not receive a fiber

• NeighborID :

for those objects that are collided, this gives the (global) index of the nearest neighbor on the sky (0-indexed) in the input catalog source, else it is set to -1

Type

ArrayCatalog; size: size

class nbodykit.algorithms.KDDensity(source, margin=1.0)[source]

Estimate a proxy density based on the distance to the nearest neighbor. The result is proportional to the density but the scale is unspecified.

Results are computed when the object is initialized. See the documentation of run() for the attributes storing the results.

Parameters
• source (CatalogSource) – the input source of particles to compute the proxy density on; must specify the ‘Position’ column

• margin (float, optional) – Padding region per parallel domain; relative to the mean separation

Methods

• run() – Compute the density proxy.

run()[source]

Compute the density proxy. This attaches the following attribute:

density

a unitless proxy density value for each object on the local rank. This is computed as the inverse cube of the distance to the nearest neighbor

Type

array_like, length: size
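The definition of the proxy can be checked on a handful of points with a brute-force search. This is a sketch only: the class itself uses a KD-tree and parallel domains, and the function name `proxy_density` is hypothetical.

```python
import math


def proxy_density(positions):
    """Inverse cube of the nearest-neighbor distance for each point (O(N^2))."""
    density = []
    for i, p in enumerate(positions):
        nearest = min(math.dist(p, q) for j, q in enumerate(positions) if j != i)
        density.append(nearest ** -3)
    return density


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
density = proxy_density(positions)
```

The isolated third point, whose nearest neighbor is twice as far away, receives a proxy density eight times lower than the other two.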

class nbodykit.algorithms.ProjectedFFTPower(first, Nmesh=None, BoxSize=None, second=None, axes=(0, 1), dk=None, kmin=0.0)[source]

The power spectrum of a field in a periodic box, projected over certain axes.

This is not always physically meaningful, but it is convenient for making sense of Lyman-alpha forest or lensing maps.

This is usually called the 1d power spectrum or 2d power spectrum.

Results are computed when the object is initialized. See the documentation of run() for the attributes storing the results.

Parameters
• first (CatalogSource, MeshSource) – the source for the first field; if a CatalogSource is provided, it is automatically converted to MeshSource using the default painting parameters (via to_mesh())

• Nmesh (int, optional) – the number of cells per side in the particle mesh used to paint the source

• BoxSize (int, 3-vector, optional) – the size of the box

• second (CatalogSource, MeshSource, optional) – the second source for cross-correlations

• axes (tuple) – axes to measure the power on. The axes not in the list will be averaged out. For example:

  - (0, 1) : project to x,y and measure power

  - (0) : project to x and measure power.

• dk (float, optional) – the linear spacing of k bins to use; if not provided, the fundamental mode of the box is used

• kmin (float, optional) – the lower edge of the first k bin to use
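The default bin spacing mentioned for dk, the fundamental mode of the box, is $$k_f = 2\pi / L$$. The sketch below builds linearly spaced bin edges from it. The helper `default_k_edges` is hypothetical, and stopping the bins at the Nyquist frequency $$\pi N_\mathrm{mesh} / L$$ is an assumption made for illustration rather than a statement about this class.

```python
import math


def default_k_edges(box_size, nmesh, dk=None, kmin=0.0):
    """Linearly spaced k-bin edges from kmin up to (at most) the Nyquist frequency."""
    k_fund = 2.0 * math.pi / box_size     # fundamental mode of the box
    k_nyq = math.pi * nmesh / box_size    # Nyquist frequency (assumed upper limit)
    if dk is None:
        dk = k_fund
    # The small epsilon guards against round-off when the ratio is an integer.
    nbins = int(math.floor((k_nyq - kmin) / dk + 1e-9))
    return [kmin + i * dk for i in range(nbins + 1)]


edges = default_k_edges(box_size=1000.0, nmesh=8)
```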

Methods

• load(output[, comm]) – Load a saved result.

• run() – Run the algorithm.

• save(output) – Save the result to disk.

load(output[, comm])

Load a saved result. The result must have been saved to disk with save().

run()[source]

Run the algorithm. This attaches the following attributes to the class:

edges

the edges of the wavenumber bins

Type

array_like

power

a BinnedStatistic object that holds the projected power. It stores the following variables:

• k :

the mean value for each k bin

• power :

complex array holding the real and imaginary components of the projected power

• modes :

the number of Fourier modes averaged together in each bin

Type

BinnedStatistic

save(output)

Save the result to disk. The format is currently JSON.

class nbodykit.algorithms.RedshiftHistogram(source, fsky, cosmo, bins=None, redshift='Redshift', weight=None)[source]

Compute the mean number density as a function of redshift $$n(z)$$ from an input CatalogSource of particles.

Results are computed when the object is initialized. See the documentation of run() for the attributes storing the results.

Note

The units of the number density are $$(\mathrm{Mpc}/h)^{-3}$$

Parameters
• source (CatalogSource) – the source of particles holding the redshift column to histogram

• fsky (float) – the sky area fraction, which is used in the volume calculation when normalizing $$n(z)$$

• cosmo (nbodykit.cosmology.core.Cosmology) – the cosmological parameters, which are used to compute the volume from redshift shells when normalizing $$n(z)$$

• bins (int or sequence of scalars, optional) – If bins is an int, it defines the number of equal-width bins in the given range. If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths. If not provided, Scott’s rule is used to estimate the optimal bin width from the input data (default)

• redshift (str, optional) – the name of the column specifying the redshift data

• weight (str, optional) – the name of the column specifying weights to use when histogramming the data
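The normalization described for fsky and cosmo amounts to dividing the (weighted) counts in each redshift bin by the volume of the corresponding shell, $$dV = f_\mathrm{sky} \, \frac{4\pi}{3} (r_\mathrm{hi}^3 - r_\mathrm{lo}^3)$$. The sketch below shows this with a toy distance-redshift relation. The names `toy_comoving_distance` and `nbar_of_z` are hypothetical, and the low-redshift Hubble-law distance is an assumption standing in for a real Cosmology object.

```python
import math

C_OVER_H0 = 2997.92458  # c / (100 km/s/Mpc), i.e. the Hubble distance in Mpc/h


def toy_comoving_distance(z):
    """Low-redshift approximation r = (c / H0) z; a stand-in for a real cosmology."""
    return C_OVER_H0 * z


def nbar_of_z(redshifts, bin_edges, fsky, distance=toy_comoving_distance, weights=None):
    """Histogram redshifts and normalize by the shell volume of each bin."""
    if weights is None:
        weights = [1.0] * len(redshifts)
    nbins = len(bin_edges) - 1
    counts = [0.0] * nbins
    for z, w in zip(redshifts, weights):
        for i in range(nbins):
            if bin_edges[i] <= z < bin_edges[i + 1]:
                counts[i] += w
                break
    # Volume of each redshift shell, scaled by the observed sky fraction.
    dV = [
        fsky * 4.0 / 3.0 * math.pi * (distance(hi) ** 3 - distance(lo) ** 3)
        for lo, hi in zip(bin_edges[:-1], bin_edges[1:])
    ]
    nbar = [c / v for c, v in zip(counts, dV)]
    return counts, dV, nbar


counts, dV, nbar = nbar_of_z([0.05, 0.15, 0.16], [0.0, 0.1, 0.2], fsky=0.5)
```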

Methods

• interpolate(z[, ext]) – Interpolate dndz as a function of redshift.

• load(output[, comm]) – Load a saved RedshiftHistogram result.

• run() – Run the algorithm, which computes the histogram.

• save(output) – Save the RedshiftHistogram result to disk.

interpolate(z, ext='zeros')[source]

Interpolate dndz as a function of redshift.

The interpolation acts as a low-pass filter, removing small-scale fluctuations in the estimator.

Parameters
• z (array_like) – redshift

• ext ('extrapolate', 'zeros', 'raise', 'const') – how to deal with values out of bound.

Returns

n

Return type

n(z)

load(output[, comm])

Load a saved RedshiftHistogram result.

The result has been saved to disk with RedshiftHistogram.save().

run()[source]

Run the algorithm, which computes the histogram. This function does not return anything, but adds the following attributes to the class:

Note

All ranks store the same result attributes.

bin_edges

the edges of the redshift bins

Type

array_like

bin_centers

the center values of each redshift bin

Type

array_like

dV

the volume of each redshift shell in units of $$(\mathrm{Mpc}/h)^3$$

Type

array_like

nbar

the values of the redshift histogram, normalized to number density (in units of $$(\mathrm{Mpc}/h)^{-3}$$)

Type

array_like

save(output)[source]

Save the RedshiftHistogram result to disk.

The format is JSON.

class nbodykit.algorithms.SimulationBox2PCF(mode, data1, edges, Nmu=None, pimax=None, data2=None, randoms1=None, randoms2=None, R1R2=None, periodic=True, BoxSize=None, los='z', weight='Weight', position='Position', show_progress=False, **config)[source]

Compute the two-point correlation function for data in a simulation box as a function of $$r$$, $$(r,\mu)$$, $$(r_p, \pi)$$, or $$\theta$$ using pair counting.

This uses analytic randoms when using periodic conditions, unless a randoms catalog is specified. The “natural” estimator (DD/RR-1) is used in the former case, and the Landy-Szalay estimator (DD/RR - 2DR/RR + 1) in the latter case.

Note

When using analytic randoms, the expected counts are assumed to be unweighted.
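For a periodic box, the analytic random counts in a spherical shell follow from the shell volume, so the natural estimator DD/RR - 1 needs no random catalog. The sketch below illustrates this for mode='1d'. The function `natural_estimator` is hypothetical, and it assumes that DD counts ordered pairs, so the expected count is N1 * N2 * V_shell / V_box.

```python
import math


def natural_estimator(dd_counts, edges, n1, n2, box_size):
    """xi(r) = DD / RR - 1 with analytic RR for a periodic box."""
    volume = box_size ** 3
    xi = []
    for dd, lo, hi in zip(dd_counts, edges[:-1], edges[1:]):
        shell = 4.0 / 3.0 * math.pi * (hi ** 3 - lo ** 3)
        rr = n1 * n2 * shell / volume  # expected unweighted random pair count
        xi.append(dd / rr - 1.0)
    return xi


# Hypothetical counts: the first bin matches the random expectation exactly,
# the second holds twice as many pairs as expected.
edges = [0.0, 1.0, 2.0]
expected = [100 * 100 * 4.0 / 3.0 * math.pi * (hi ** 3 - lo ** 3) / 10.0 ** 3
            for lo, hi in zip(edges[:-1], edges[1:])]
xi = natural_estimator([expected[0], 2.0 * expected[1]], edges, 100, 100, 10.0)
```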

Parameters
• mode ('1d', '2d', 'projected', 'angular') – the type of two-point correlation function to compute; see the Notes below

• data1 (CatalogSource) – the data catalog

• edges (array_like) – the separation bin edges along the first coordinate dimension; depending on mode, the options are $$r$$, $$r_p$$, or $$\theta$$. Expected units for distances are $$\mathrm{Mpc}/h$$ and degrees for angles. Length of nbins+1

• Nmu (int, optional) – the number of $$\mu$$ bins, ranging from 0 to 1; required if mode='2d'

• pimax (float, optional) – The maximum separation along the line-of-sight when mode='projected'. Distances along the $$\pi$$ direction are binned with unit depth. For instance, if pimax=40, then 40 bins will be created along the $$\pi$$ direction.

• data2 (CatalogSource, optional) – the second data catalog to cross-correlate

• randoms1 (CatalogSource, optional) – the catalog specifying the un-clustered, random distribution for data1; if not provided, analytic randoms will be used

• randoms2 (CatalogSource, optional) – the catalog specifying the un-clustered, random distribution for data2; if not provided, analytic randoms will be used

• R1R2 (SimulationBoxPairCount, optional) – if provided, random pairs R1R2 are not recalculated in the Landy-Szalay estimator

• periodic (bool, optional) – whether to use periodic boundary conditions

• BoxSize (float, 3-vector, optional) – the size of the box; if ‘BoxSize’ is not provided in the source ‘attrs’, it must be provided here

• los ('x', 'y', 'z'; int, optional) – the axis of the simulation box to treat as the line-of-sight direction; this can be provided as string identifying one of ‘x’, ‘y’, ‘z’ or the equivalent integer number of the axis

• weight (str, optional) – the name of the column in the source specifying the particle weights

• position (str, optional) – the name of the column in the source specifying the particle positions

• show_progress (bool, optional) – if True, perform the pair counting calculation in 10 iterations, logging the progress after each iteration; this is useful for understanding the scaling of the code

• **config (key/value pairs) – additional keywords to pass to the Corrfunc function

Notes

This class can compute correlation functions using several different coordinate choices, based on the value of the input argument mode. The choices are:

• mode='1d' : compute pairs as a function of the 3D separation $$r$$

• mode='2d' : compute pairs as a function of the 3D separation $$r$$ and the cosine of the angle to the line-of-sight, $$\mu$$

• mode='projected' : compute pairs as a function of distance perpendicular and parallel to the line-of-sight, $$r_p$$ and $$\pi$$

• mode='angular' : compute pairs as a function of angle on the sky, $$\theta$$

If mode='projected', the projected correlation function $$w_p(r_p)$$ is also computed, using the input $$\pi_\mathrm{max}$$ value.
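The projected correlation function is conventionally defined as $$w_p(r_p) = 2 \int_0^{\pi_\mathrm{max}} \xi(r_p, \pi) \, d\pi$$, which becomes a sum over the unit-depth $$\pi$$ bins. The sketch below applies that sum to a small table of $$\xi(r_p, \pi)$$ values; the function name `projected_corr` and the input values are hypothetical, and the exact discretization used by the class is an assumption.

```python
def projected_corr(xi_rp_pi, dpi=1.0):
    """w_p(r_p) = 2 * sum over pi bins of xi(r_p, pi) * dpi.

    `xi_rp_pi` is a list of rows, one per r_p bin, each holding xi in the pi bins.
    """
    return [2.0 * dpi * sum(row) for row in xi_rp_pi]


# Two r_p bins and two pi bins of unit depth (pimax=2).
xi_rp_pi = [[0.5, 0.25], [0.1, 0.0]]
wp = projected_corr(xi_rp_pi)
```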

Methods

• load(output[, comm]) – Load a result that has been saved to disk with save().

• run() – Run the two-point correlation function algorithm.

• save(output) – Save result as a JSON file with name output

load(output[, comm])

Load a result that has been saved to disk with save().

run()[source]

Run the two-point correlation function algorithm. This attaches the following attributes:

D1D2

the data1 - data2 pair counts

Type

BinnedStatistic

D1R2

the data1 - randoms2 pair counts

Type

BinnedStatistic

D2R1

the data2 - randoms1 pair counts

Type

BinnedStatistic

R1R2

the randoms1 - randoms2 pair counts

Type

BinnedStatistic

corr

the correlation function values, stored as the corr variable, computed from the pair counts

Type

BinnedStatistic

wp

the projected correlation function, $$w_p(r_p)$$, computed if mode='projected'; correlation is stored as the corr variable

Type

BinnedStatistic

Notes

The D1D2, D1R2, D2R1, and R1R2 attributes are identical to the pairs attribute of SimulationBoxPairCount.

save(output)

Save result as a JSON file with name output

class nbodykit.algorithms.SimulationBox3PCF(source, poles, edges, BoxSize=None, periodic=True, weight='Weight', position='Position')[source]

Compute the multipoles of the isotropic, three-point correlation function in configuration space for data in a simulation box.

This uses the algorithm of Slepian and Eisenstein, 2015 which scales as $$\mathcal{O}(N^2)$$, where $$N$$ is the number of objects.

Results are computed when the object is initialized. See the documentation of run() for the attributes storing the results.

Note

The algorithm expects the positions of objects in a simulation box to be the Cartesian x, y, and z vectors. For survey data, in the form of right ascension, declination, and redshift, see SurveyData3PCF.

Parameters
• source (CatalogSource) – the input source of particles providing the ‘Position’ column

• poles (list of int) – the list of multipole numbers to compute

• edges (array_like) – the edges of the bins of separation to use; length of nbins+1

• BoxSize (float, 3-vector, optional) – the size of the box; if periodic boundary conditions used, and ‘BoxSize’ not provided in the source attrs, it must be provided here

• periodic (bool, optional) – whether to use periodic boundary conditions when computing separations between objects

• weight (str, optional) – the name of the column in the source specifying the particle weights

References

Slepian and Eisenstein, MNRAS 454, 4142-4158 (2015)

Methods

• load(filename[, comm]) – Load a result from filename that has been saved to disk with save().

• run([pedantic]) – Compute the three-point CF multipoles.

• save(output) – Save the poles result to a JSON file with name output.

load(filename[, comm])

Load a result from filename that has been saved to disk with save().

run(pedantic=False)[source]

Compute the three-point CF multipoles. This attaches the following attributes to the class:

poles

a BinnedStatistic object to hold the multipole results; the binned statistic stores the multipoles as variables corr_0, corr_1, etc. for $$\ell=0,1,$$ etc. The coordinates of the binned statistic are r1 and r2, which give the separations between the three objects in the correlation function.

Type

BinnedStatistic

save(output)

Save the poles result to a JSON file with name output.

class nbodykit.algorithms.SimulationBoxPairCount(mode, first, edges, BoxSize=None, periodic=True, second=None, los='z', Nmu=None, pimax=None, weight='Weight', position='Position', show_progress=False, **config)[source]

Count (weighted) pairs of objects in a simulation box as a function of $$r$$, $$(r,\mu)$$, $$(r_p, \pi)$$, or $$\theta$$ using the Corrfunc package.

See the Notes below for the allowed coordinate dimensions.

The default weighting scheme uses the product of the weights for each object in a pair.

Results are computed when the object is initialized. See the documentation of run() for the attributes storing the results.

Note

The algorithm expects the positions of particles in a simulation box to be the Cartesian x, y, and z vectors. To compute pair counts on survey data, using right ascension, declination, and redshift, see SurveyDataPairCount.

Parameters
• mode ('1d', '2d', 'projected', 'angular') – compute pair counts as a function of the specified coordinate basis; see the Notes section below for specifics

• first (CatalogSource) – the first source of particles, providing the position column

• edges (array_like) – the separation bin edges along the first coordinate dimension; depending on mode, the options are $$r$$, $$r_p$$, or $$\theta$$. Expected units for distances are $$\mathrm{Mpc}/h$$ and degrees for angles. Length of nbins+1

• BoxSize (float, 3-vector, optional) – the size of the box; if ‘BoxSize’ is not provided in the source ‘attrs’, it must be provided here

• periodic (bool, optional) – whether to use periodic boundary conditions

• second (CatalogSource, optional) – the second source of particles to cross-correlate

• los ({'x', 'y', 'z'}, int, optional) – the axis of the simulation box to treat as the line-of-sight direction; this can be provided as string identifying one of ‘x’, ‘y’, ‘z’ or the equivalent integer number of the axis

• Nmu (int, optional) – the number of $$\mu$$ bins, ranging from 0 to 1; required if mode='2d'

• pimax (float, optional) – The maximum separation along the line-of-sight when mode='projected'. Distances along the $$\pi$$ direction are binned with unit depth. For instance, if pimax=40, then 40 bins will be created along the $$\pi$$ direction.

• weight (str, optional) – the name of the column in the source specifying the particle weights

• position (str, optional) – name of the column of the position of particles

• show_progress (bool, optional) – if True, perform the pair counting calculation in 10 iterations, logging the progress after each iteration; this is useful for understanding the scaling of the code

• **config (key/value pairs) – additional keywords to pass to the Corrfunc function

Notes

This class can compute pair counts using several different coordinate choices, based on the value of the input argument mode. The choices are:

• mode='1d' : compute pairs as a function of the 3D separation $$r$$

• mode='2d' : compute pairs as a function of the 3D separation $$r$$ and the cosine of the angle to the line-of-sight, $$\mu$$

• mode='projected' : compute pairs as a function of distance perpendicular and parallel to the line-of-sight, $$r_p$$ and $$\pi$$

• mode='angular' : compute pairs as a function of angle on the sky, $$\theta$$

For angular pair counts, the observer is placed at the center of the box when converting Cartesian coordinates to angular coordinates on the unit sphere.
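The coordinates used by the '1d', '2d', and 'projected' modes are all functions of a single pair-separation vector and the chosen line-of-sight axis. The sketch below computes them for one pair; the function `decompose_separation` is hypothetical and only illustrates the geometry, taking $$\mu$$ in the range 0 to 1 as stated for Nmu.

```python
import math


def decompose_separation(sep, los="z"):
    """Return (r, mu, rp, pi) for a separation vector and a line-of-sight axis."""
    axis = "xyz".index(los) if isinstance(los, str) else los
    r = math.sqrt(sum(c * c for c in sep))
    pi = abs(sep[axis])                       # separation along the line of sight
    rp = math.sqrt(max(r * r - pi * pi, 0.0)) # separation perpendicular to it
    mu = pi / r if r > 0 else 0.0             # cosine of the angle to the line of sight
    return r, mu, rp, pi


r, mu, rp, pi = decompose_separation((3.0, 0.0, 4.0), los="z")
```

With the line of sight along x instead (los='x' or los=0), the same pair would have pi = 3 and rp = 4.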

Methods

• load(output[, comm]) – Load a result that has been saved to disk with save().

• run() – Calculate the pair counts in a simulation box.

• save(output) – Save result as a JSON file with name output

load(output[, comm])

Load a result that has been saved to disk with save().

run()[source]

Calculate the pair counts in a simulation box. This adds the following attributes to the class:

pairs

a BinnedStatistic object holding the pair count results. The coordinate grid will be (r,), (r,mu), (rp, pi), or (theta,) when mode is ‘1d’, ‘2d’, ‘projected’, ‘angular’, respectively.

The BinnedStatistic stores the following variables:

• r, rp, or theta : the mean separation value in the bin

• npairs: the number of pairs in the bin

• wnpairs: the average weight value in the bin; each pair contributes the product of the individual weight values

Type

BinnedStatistic
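The weighting scheme, in which each pair contributes the product of the two object weights, can be illustrated with a brute-force count for mode='1d'. This is a sketch rather than the Corrfunc-based implementation: the function `pair_counts_1d` and its return names are hypothetical, and counting each unordered pair once is an assumption.

```python
import itertools
import math


def pair_counts_1d(positions, weights, edges, box_size=None):
    """Count pairs per separation bin, with the summed and mean weight product."""
    nbins = len(edges) - 1
    npairs = [0] * nbins
    weight_sum = [0.0] * nbins
    for (p, wp), (q, wq) in itertools.combinations(zip(positions, weights), 2):
        d2 = 0.0
        for a, b in zip(p, q):
            d = abs(a - b)
            if box_size is not None:
                d = min(d, box_size - d)  # periodic minimum-image distance
            d2 += d * d
        r = math.sqrt(d2)
        for i in range(nbins):
            if edges[i] <= r < edges[i + 1]:
                npairs[i] += 1
                weight_sum[i] += wp * wq  # product of the two object weights
                break
    weight_avg = [w / n if n else 0.0 for w, n in zip(weight_sum, npairs)]
    return npairs, weight_sum, weight_avg


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (9.5, 0.0, 0.0)]
npairs, weight_sum, weight_avg = pair_counts_1d(
    positions, weights=[1.0, 2.0, 3.0], edges=[0.0, 0.75, 1.6], box_size=10.0
)
```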

save(output)

Save result as a JSON file with name output

class nbodykit.algorithms.SurveyData2PCF(mode, data1, randoms1, edges, cosmo=None, Nmu=None, pimax=None, data2=None, randoms2=None, R1R2=None, ra='RA', dec='DEC', redshift='Redshift', weight='Weight', show_progress=False, **config)[source]

Compute the two-point correlation function for observational survey data as a function of $$r$$, $$(r,\mu)$$, $$(r_p, \pi)$$, or $$\theta$$ using pair counting.

The Landy-Szalay estimator (DD/RR - 2 DR/RR + 1) is used to transform pair counts into the correlation function.
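In the Landy-Szalay estimator, each pair count is first normalized by the total number of pairs it was drawn from, so that data and randoms of different sizes can be compared. The sketch below shows this for an auto-correlation; the function `landy_szalay` is hypothetical, and normalizing by N_d^2, N_d N_r, and N_r^2 (ordered, unweighted pairs) is an assumption made for illustration.

```python
def landy_szalay(dd, dr, rr, n_data, n_rand):
    """xi = DD/RR - 2 DR/RR + 1, using pair counts normalized by total pairs."""
    xi = []
    for d, x, r in zip(dd, dr, rr):
        dd_norm = d / (n_data * n_data)
        dr_norm = x / (n_data * n_rand)
        rr_norm = r / (n_rand * n_rand)
        xi.append(dd_norm / rr_norm - 2.0 * dr_norm / rr_norm + 1.0)
    return xi


# Hypothetical counts for 10 data objects and 100 randoms in two bins:
# the first bin is unclustered, the second has twice the random expectation in DD.
xi = landy_szalay(dd=[10.0, 20.0], dr=[100.0, 100.0], rr=[1000.0, 1000.0],
                  n_data=10, n_rand=100)
```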

Parameters
• mode ('1d', '2d', 'projected', 'angular') – the type of two-point correlation function to compute; see the Notes below

• data1 (CatalogSource) – the data catalog; must have ra, dec, and redshift columns

• randoms1 (CatalogSource) – the catalog specifying the un-clustered, random distribution for data1

• edges (array_like) – the separation bin edges along the first coordinate dimension; depending on mode, the options are $$r$$, $$r_p$$, or $$\theta$$. Expected units for distances are $$\mathrm{Mpc}/h$$ and degrees for angles. Length of nbins+1

• cosmo (Cosmology, optional) – the cosmology instance used to convert redshift into comoving distance; this is required for all cases except mode='angular'

• Nmu (int, optional) – the number of $$\mu$$ bins, ranging from 0 to 1; required if mode='2d'

• pimax (float, optional) – The maximum separation along the line-of-sight when mode='projected'. Distances along the $$\pi$$ direction are binned with unit depth. For instance, if pimax=40, then 40 bins will be created along the $$\pi$$ direction.

• data2 (CatalogSource, optional) – the second data catalog to cross-correlate;

• randoms2 (CatalogSource, optional) – the catalog specifying the un-clustered, random distribution for data2; if not specified and data2 is provided, then randoms1 will be used for both.

• R1R2 (SurveyDataPairCount, optional) – if provided, random pairs R1R2 are not recalculated in the Landy-Szalay estimator

• ra (str, optional) – the name of the column in the source specifying the right ascension coordinates in units of degrees; default is ‘RA’

• dec (str, optional) – the name of the column in the source specifying the declination coordinates; default is ‘DEC’

• redshift (str, optional) – the name of the column in the source specifying the redshift coordinates; default is ‘Redshift’

• weight (str, optional) – the name of the column in the source specifying the object weights

• show_progress (bool, optional) – if True, perform the pair counting calculation in 10 iterations, logging the progress after each iteration; this is useful for understanding the scaling of the code

• **config (key/value pairs) – additional keywords to pass to the Corrfunc function

Notes

This class can compute correlation functions using several different coordinate choices, based on the value of the input argument mode. The choices are:

• mode='1d' : compute pairs as a function of the 3D separation $$r$$

• mode='2d' : compute pairs as a function of the 3D separation $$r$$ and the cosine of the angle to the line-of-sight, $$\mu$$

• mode='projected' : compute pairs as a function of distance perpendicular and parallel to the line-of-sight, $$r_p$$ and $$\pi$$

• mode='angular' : compute pairs as a function of angle on the sky, $$\theta$$

If mode='projected', the projected correlation function $$w_p(r_p)$$ is also computed, using the input $$\pi_\mathrm{max}$$ value.

Methods

• load(output[, comm]) – Load a result that has been saved to disk with save().

• run() – Run the two-point correlation function algorithm.

• save(output) – Save result as a JSON file with name output

load(output[, comm])

Load a result that has been saved to disk with save().

run()[source]

Run the two-point correlation function algorithm. This attaches the following attributes:

D1D2

the data1 - data2 pair counts

Type

BinnedStatistic

D1R2

the data1 - randoms2 pair counts

Type

BinnedStatistic

D2R1

the data2 - randoms1 pair counts

Type

BinnedStatistic

R1R2

the randoms1 - randoms2 pair counts

Type

BinnedStatistic

corr

the correlation function values, stored as the corr variable, computed from the pair counts

Type

BinnedStatistic

wp

the projected correlation function, $$w_p(r_p)$$, computed if mode='projected'; correlation is stored as the corr variable

Type

BinnedStatistic

Notes

The D1D2, D1R2, D2R1, and R1R2 attributes are identical to the pairs attribute of SurveyDataPairCount.

save(output)

Save result as a JSON file with name output

class nbodykit.algorithms.SurveyData3PCF(source, poles, edges, cosmo, domain_factor=4, ra='RA', dec='DEC', redshift='Redshift', weight='Weight')[source]

Compute the multipoles of the isotropic, three-point correlation function in configuration space for observational survey data.

This uses the algorithm of Slepian and Eisenstein, 2015 which scales as $$\mathcal{O}(N^2)$$, where $$N$$ is the number of objects.

Results are computed when the object is initialized. See the documentation of run() for the attributes storing the results.

Note

The algorithm expects the positions of objects from a survey catalog to be given as sky coordinates (right ascension and declination) and redshift. For simulation box data in Cartesian coordinates, see SimulationBox3PCF.

Warning

The right ascension and declination columns should be specified in degrees.
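Why the units matter can be seen in the conversion from sky coordinates to Cartesian positions, which passes the angles through trigonometric functions. The sketch below assumes angles in degrees and a comoving distance already computed from the redshift; the function `sky_to_cartesian` and its axis convention are hypothetical and not necessarily those used internally.

```python
import math


def sky_to_cartesian(ra_deg, dec_deg, distance):
    """Convert (RA, Dec) in degrees plus a radial distance to Cartesian (x, y, z)."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    x = distance * math.cos(dec) * math.cos(ra)
    y = distance * math.cos(dec) * math.sin(ra)
    z = distance * math.sin(dec)
    return x, y, z


x, y, z = sky_to_cartesian(90.0, 0.0, 2.0)
```

Passing the same angles in radians by mistake would silently place the objects at entirely different positions, since the function cannot tell the two units apart.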

Parameters
• source (CatalogSource) – the input source of particles providing the ‘Position’ column

• poles (list of int) – the list of multipole numbers to compute

• edges (array_like) – the edges of the bins of separation to use; length of nbins+1

• cosmo (Cosmology) – the cosmology instance used to convert redshifts into comoving distances

• ra (str, optional) – the name of the column in the source specifying the right ascension coordinates in units of degrees; default is ‘RA’

• dec (str, optional) – the name of the column in the source specifying the declination coordinates; default is ‘DEC’

• redshift (str, optional) – the name of the column in the source specifying the redshift coordinates; default is ‘Redshift’

• weight (str, optional) – the name of the column in the source specifying the object weights

• domain_factor (int, optional) – the integer value by which to oversubscribe the domain decomposition mesh before balancing loads; this number can affect the distribution of loads on the ranks – an optimal value will lead to balanced loads

References

Slepian and Eisenstein, MNRAS 454, 4142-4158 (2015)

Methods

• load(filename[, comm]) – Load a result from filename that has been saved to disk with save().

• run() – Compute the three-point CF multipoles.

• save(output) – Save the poles result to a JSON file with name output.

load(filename[, comm])

Load a result from filename that has been saved to disk with save().

run()[source]

Compute the three-point CF multipoles. This attaches the following attributes to the class:

poles

a BinnedStatistic object to hold the multipole results; the binned statistic stores the multipoles as variables corr_0, corr_1, etc. for $$\ell=0,1,$$ etc. The coordinates of the binned statistic are r1 and r2, which give the separations between the three objects in the correlation function.

Type

BinnedStatistic

save(output)

Save the poles result to a JSON file with name output.

class nbodykit.algorithms.SurveyDataPairCount(mode, first, edges, cosmo=None, second=None, Nmu=None, pimax=None, ra='RA', dec='DEC', redshift='Redshift', weight='Weight', show_progress=False, domain_factor=4, **config)[source]

Count (weighted) pairs of objects from a survey data catalog as a function of $$r$$, $$(r,\mu)$$, $$(r_p, \pi)$$, or $$\theta$$ using the Corrfunc package.

See the Notes below for the allowed coordinate dimensions.

The default weighting scheme uses the product of the weights for each object in a pair.

Results are computed when the class is initialized. See the documentation of run() for the attributes storing the results.

Note

The algorithm expects the positions of particles from a survey catalog to be given as sky coordinates (right ascension and declination) and redshift. To compute pair counts in a simulation box using Cartesian coordinates, see SimulationBoxPairCount.

Warning

The right ascension and declination columns should be specified in degrees.
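To make the unit convention concrete, here is a minimal sketch that converts right ascension and declination in degrees to unit vectors and computes the angular separation of two objects. The helper names `sky_to_unit_vector` and `angular_separation_deg` are hypothetical and are not part of nbodykit or Corrfunc.

```python
import math

def sky_to_unit_vector(ra_deg, dec_deg):
    """Convert (RA, Dec) in degrees to a Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Angle on the sky between two objects, in degrees."""
    u = sky_to_unit_vector(ra1, dec1)
    v = sky_to_unit_vector(ra2, dec2)
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] to guard against round-off before calling acos.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))

theta = angular_separation_deg(10.0, 0.0, 10.0, 30.0)
print(theta)
```

Passing radians by mistake would silently yield separations that are wrong by a factor of about 57, which is why the degree convention matters.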

Parameters
• mode ('1d', '2d', 'projected', 'angular') – compute pair counts as a function of the specified coordinate basis; see the Notes section below for specifics

• first (CatalogSource) – the first source of particles, providing the sky coordinate columns named by ra, dec, and redshift

• edges (array_like) – the separation bin edges along the first coordinate dimension; depending on mode, the options are $$r$$, $$r_p$$, or $$\theta$$. Expected units are $$\mathrm{Mpc}/h$$ for distances and degrees for angles. The array has length nbins+1.

• cosmo (Cosmology, optional) – the cosmology instance used to convert redshift into comoving distance; this is required for all cases except mode='angular'

• second (CatalogSource, optional) – the second source of particles to cross-correlate

• Nmu (int, optional) – the number of $$\mu$$ bins, ranging from 0 to 1; required if mode='2d'

• pimax (float, optional) – The maximum separation along the line-of-sight when mode='projected'. Distances along the $$\pi$$ direction are binned with unit depth. For instance, if pimax=40, then 40 bins will be created along the $$\pi$$ direction.

• ra (str, optional) – the name of the column in the source specifying the right ascension coordinates in units of degrees; default is ‘RA’

• dec (str, optional) – the name of the column in the source specifying the declination coordinates; default is ‘DEC’

• redshift (str, optional) – the name of the column in the source specifying the redshift coordinates; default is ‘Redshift’

• weight (str, optional) – the name of the column in the source specifying the object weights

• show_progress (bool, optional) – if True, perform the pair counting calculation in 10 iterations, logging the progress after each iteration; this is useful for understanding the scaling of the code

• domain_factor (int, optional) – the integer value by which to oversubscribe the domain decomposition mesh before balancing loads; this number can affect the distribution of loads on the ranks – an optimal value will lead to balanced loads

• **config (key/value pairs) – additional keywords to pass to the Corrfunc function
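The binning conventions described for edges, Nmu, and pimax can be sketched as follows. The helper names are hypothetical and only illustrate the bookkeeping. The sketch assumes equal-width $$\mu$$ bins and an integer-valued pimax, since the text above does not say how a fractional pimax is handled.

```python
# Illustrative helper names; none of these are nbodykit functions.
def mu_bin_edges(Nmu):
    """Nmu equal-width bins in mu, spanning 0 to 1."""
    return [i / Nmu for i in range(Nmu + 1)]

def pi_bin_edges(pimax):
    """Unit-depth bins along the line of sight: 0, 1, ..., pimax.

    Assumes pimax is integer-valued.
    """
    return [float(i) for i in range(int(pimax) + 1)]

edges = [0.1, 1.0, 10.0, 100.0]   # nbins + 1 = 4 edges, so 3 separation bins
nbins = len(edges) - 1
mu_edges = mu_bin_edges(5)        # 5 bins between 0 and 1
pi_edges = pi_bin_edges(40)       # 40 bins between 0 and 40
print(nbins, len(mu_edges) - 1, len(pi_edges) - 1)
```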

Notes

This class can compute pair counts using several different coordinate choices, based on the value of the input argument mode. The choices are:

• mode='1d' : compute pairs as a function of the 3D separation $$r$$

• mode='2d' : compute pairs as a function of the 3D separation $$r$$ and the cosine of the angle to the line-of-sight, $$\mu$$

• mode='projected' : compute pairs as a function of distance perpendicular and parallel to the line-of-sight, $$r_p$$ and $$\pi$$

• mode='angular' : compute pairs as a function of angle on the sky, $$\theta$$
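The relation between the coordinates used by the '1d', '2d', and 'projected' modes can be sketched for a single pair of Cartesian positions. This sketch assumes the line of sight is taken along the midpoint of the pair, which is a common convention but is not stated in the text above. The function name `pair_coordinates` is hypothetical.

```python
import math

def pair_coordinates(p1, p2):
    """Return r, mu, rp, and pi for one pair of Cartesian positions.

    Assumes a midpoint line of sight and a non-zero separation.
    """
    s = [a - b for a, b in zip(p1, p2)]            # separation vector
    los = [0.5 * (a + b) for a, b in zip(p1, p2)]  # midpoint line of sight (assumed)
    r2 = sum(x * x for x in s)
    r = math.sqrt(r2)
    los_norm = math.sqrt(sum(x * x for x in los))
    # pi: separation component parallel to the line of sight
    pi = abs(sum(a * b for a, b in zip(s, los))) / los_norm
    # rp: separation component perpendicular to the line of sight
    rp = math.sqrt(max(r2 - pi * pi, 0.0))
    # mu: cosine of the angle between the separation and the line of sight
    mu = pi / r
    return {"r": r, "mu": mu, "rp": rp, "pi": pi}

coords = pair_coordinates((100.0, 3.0, 0.0), (104.0, -3.0, 0.0))
print(coords)
```

The four quantities are linked by $$r^2 = r_p^2 + \pi^2$$ and $$\mu = \pi / r$$, so the '2d' and 'projected' modes bin the same pair separation in different coordinates.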

Methods

• load(output[, comm]) – Load a result that has been saved to disk with save().

• run() – Calculate the pair counts of a survey data catalog.

• save(output) – Save result as a JSON file with name output.

load(output[, comm])

Load a result that has been saved to disk with save().

run()[source]

Calculate the pair counts of a survey data catalog. This adds the following attribute:

self.pairs.attrs['total_wnpairs']: The total of wnpairs.

pairs

a BinnedStatistic object holding the pair count results. The coordinate grid will be (r,), (r,mu), (rp, pi), or (theta,) when mode is '1d', '2d', 'projected', or 'angular', respectively.

The BinnedStatistic stores the following variables:

• r, rp, or theta : the mean separation value in the bin

• npairs: the number of pairs in the bin

• wnpairs: the weighted npairs in the bin; each pair contributes the product of the individual weight values

Type

BinnedStatistic

save(output)

Save the result as a JSON file with name output.