nbodykit.algorithms.paircount_tpcf.estimators¶
Functions

LandySzalayEstimator – Compute the correlation function from data/randoms using the Landy-Szalay estimator.

NaturalEstimator – Internal function for computing the correlation function using analytic randoms and the so-called "natural" correlation function estimator, \(DD/RR - 1\).

Classes

AnalyticUniformRandoms – Internal class to compute analytic pair counts for uniformly distributed randoms.

WedgeBinnedStatistic – A BinnedStatistic of wedges that can be converted to multipoles.
- class nbodykit.algorithms.paircount_tpcf.estimators.AnalyticUniformRandoms(mode, dims, edges, BoxSize)[source]¶
Internal class to compute analytic pair counts for uniformly distributed randoms.
This can compute the expected randoms counts for several coordinate choices, based on mode. The relevant volume/area calculations for the various coordinate modes are:

mode='1d': volume of sphere
mode='2d': volume of spherical sector
mode='projected': volume of cylinder
mode='angular': area of spherical cap
Methods
__call__(NR1[, NR2]) – Evaluate the expected randoms pair counts and the total_weight, returned as an object that looks like the result of paircount.
This gives the ratio of the volume (or area) occupied by each bin to the global volume (or area).
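The mode='1d' case can be sketched directly: for NR uniformly distributed randoms in a periodic box, the expected pair count in each separation bin is the total number of pairs times the ratio of the bin's shell volume to the box volume. A minimal sketch of that arithmetic (the function name and signature below are illustrative, not the nbodykit API):

```python
import numpy as np

def analytic_RR_1d(edges, NR, BoxSize):
    # For NR uniformly distributed randoms in a periodic box of side
    # BoxSize, the expected pair count in each separation bin is the
    # total number of ordered pairs, NR*(NR - 1), times the ratio of
    # the bin's spherical-shell volume to the box volume.
    V_box = BoxSize**3
    V_shell = 4.0 / 3.0 * np.pi * np.diff(np.asarray(edges)**3)
    return NR * (NR - 1) * V_shell / V_box

# e.g. bins of width 10 out to r = 50, in a box of side 1000
RR = analytic_RR_1d(np.arange(0., 60., 10.), NR=10000, BoxSize=1000.)
```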
- nbodykit.algorithms.paircount_tpcf.estimators.LandySzalayEstimator(pair_counter, data1, data2, randoms1, randoms2, R1R2=None, logger=None, **kwargs)[source]¶
Compute the correlation function from data/randoms using the Landy-Szalay estimator.
- Parameters
pair_counter (SimulationBoxPairCount, SurveyDataPairCount) – the pair counting algorithm class
data1 (CatalogSource) – the first data source
data2 (CatalogSource, None) – the second data catalog to cross-correlate; can be None for auto-correlations
randoms1 (CatalogSource) – the randoms catalog corresponding to data1
randoms2 (CatalogSource, None) – the second randoms catalog; can be None for auto-correlations
R1R2 (SimulationBoxPairCount, SurveyDataPairCount, optional) – if provided, random pairs R1R2 are not recalculated
**kwargs – the parameters passed to the pair_counter class to count pairs
- Returns
D1D2, D1R2, D2R1, R1R2, CF – the various terms of the LS estimator + the correlation function result
- Return type
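The Landy-Szalay combination of the four pair counts can be sketched as follows. The helper below is illustrative only (the real estimator operates on nbodykit pair-count result objects); it assumes raw counts and the auto-correlation normalizations ND(ND-1), ND·NR, and NR(NR-1):

```python
import numpy as np

def landy_szalay(D1D2, D1R2, D2R1, R1R2, ND, NR):
    # Normalize each raw pair count by its total number of pairs
    # (auto-correlation normalizations: ND*(ND-1) for data-data,
    # ND*NR for data-random, NR*(NR-1) for random-random) ...
    dd = D1D2 / (ND * (ND - 1.0))
    dr = D1R2 / (ND * NR)
    rd = D2R1 / (ND * NR)
    rr = R1R2 / (NR * (NR - 1.0))
    # ... then combine into the Landy-Szalay estimator
    return (dd - dr - rd + rr) / rr
```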
- nbodykit.algorithms.paircount_tpcf.estimators.NaturalEstimator(D1D2)[source]¶
Internal function for computing the correlation function using analytic randoms and the so-called "natural" correlation function estimator, \(DD/RR - 1\).
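Once the analytic RR counts are in hand, the natural estimator reduces to a simple ratio. This sketch (illustrative names, not the nbodykit API) shows the arithmetic:

```python
import numpy as np

def natural_estimator(DD, RR):
    # The "natural" estimator: the ratio of measured data-data counts
    # to the analytic random-random expectation, minus one.
    return np.asarray(DD, dtype=float) / np.asarray(RR, dtype=float) - 1.0
```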
- class nbodykit.algorithms.paircount_tpcf.estimators.WedgeBinnedStatistic(dims, edges, data, fields_to_sum=[], coords=None, **kwargs)[source]¶
A BinnedStatistic of wedges that can be converted to multipoles.
Methods
average(dim, **kwargs) – Compute the average of each variable over the specified dimension.
copy([cls]) – Returns a copy of the BinnedStatistic, optionally changing the type to cls.
from_json(filename[, key, dims, edges]) – Initialize a BinnedStatistic from a JSON file.
from_plaintext(dims, filename, **kwargs) – Initialize a BinnedStatistic from a plaintext file.
reindex(dim, spacing[, weights, force, ...]) – Reindex the dimension dim by averaging over multiple coordinate bins, optionally weighting by weights.
rename_variable(old_name, new_name) – Rename a variable in data from old_name to new_name.
sel([method]) – Return a new BinnedStatistic indexed by coordinate values along the specified dimension(s).
squeeze([dim]) – Squeeze the BinnedStatistic along the specified dimension, which removes that dimension from the BinnedStatistic.
take(*masks, **indices) – Take a subset of a BinnedStatistic from a given list of indices.
to_json(filename) – Write a BinnedStatistic to a JSON file.
to_poles(poles) – Invert the measured wedges \(\xi(r,\mu)\) into correlation multipoles, \(\xi_\ell(r)\).
from_state
- classmethod __construct_direct__(data, mask, **kwargs)¶
Shortcut around __init__ for internal use to construct and return a new class instance. The returned object should be identical to that returned by __init__.
Notes
Useful for returning new instances with sliced data/mask
The keyword arguments required to create a full, unbroken instance are dims, coords, edges, and attrs
- Parameters
data –
- __copy_attrs__()¶
Return a copy of all necessary attributes associated with the BinnedStatistic. This dictionary + data and mask are all that’s required to reconstruct a new class
- __finalize__(data, mask, indices)¶
Finalize and return a new instance from a slice of the current object (returns a copy)
- __getitem__(key)¶
Index- or string-based indexing
Notes
If a single string is passed, the key is interpreted as a variable or coordinate, and the corresponding array is returned
If a list of strings is passed, then a new BinnedStatistic holding only the variable names in key is returned
Integer-based indexing or slices similar to numpy indexing will slice data, returning a new BinnedStatistic holding the newly sliced data and coordinate grid
Scalar indexes (i.e., integers) used to index a certain dimension will “squeeze” that dimension, removing it from the coordinate grid
- __setitem__(key, data)¶
Add a new variable with the name key to the class using data
- __slice_edges__(indices)¶
Internal function to slice the edges attribute with the specified indices, which specify the included coordinate bins
- average(dim, **kwargs)¶
Compute the average of each variable over the specified dimension.
- Parameters
dim (str) – The name of the dimension to average over
**kwargs – Additional keywords to pass to BinnedStatistic.reindex(). See the documentation for BinnedStatistic.reindex() for valid keywords.
- Returns
averaged – A new BinnedStatistic, with data averaged along one dimension, which reduces the number of dimensions by one
- Return type
- copy(cls=None)¶
Returns a copy of the BinnedStatistic, optionally changing the type to cls. cls must be a subclass of BinnedStatistic.
- classmethod from_json(filename, key='data', dims=None, edges=None, **kwargs)¶
Initialize a BinnedStatistic from a JSON file.
The JSON file should contain a dictionary, where the data to load is stored as the key entry, with an edges entry specifying the bin edges, and optionally, an attrs entry giving a dict of meta-data.

Note

This uses nbodykit.utils.JSONDecoder to load the JSON file.

- Parameters
- Returns
dset – the BinnedStatistic holding the data from file
- Return type
- classmethod from_plaintext(dims, filename, **kwargs)¶
Initialize a BinnedStatistic from a plaintext file
Note
Deprecated in nbodykit 0.2.x: storage of BinnedStatistic objects as plaintext ASCII files is no longer supported; see BinnedStatistic.from_json()
- Parameters
dims (list) – list of names specifying the dimensions, i.e., ['k'] or ['k', 'mu']
filename (str) – the name of the file to load
- Returns
dset – the BinnedStatistic holding the data from file
- Return type
- reindex(dim, spacing, weights=None, force=True, return_spacing=False, fields_to_sum=[])¶
Reindex the dimension dim by averaging over multiple coordinate bins, optionally weighting by weights.

Returns a new BinnedStatistic holding the re-binned data.
Notes
We can only re-bin to an integral factor of the current dimension size, in order to avoid inaccuracies when re-binning to overlapping bins
Variables specified in fields_to_sum will be summed when re-indexing, instead of averaging
- Parameters
dim (str) – The name of the dimension to average over
spacing (float) – The desired spacing for the re-binned data. If force = True, the spacing used will be the closest value to this value, such that the new bins are N times larger, where N is an integer
weights (array_like or str, optional (None)) – An array to weight the data by before re-binning, or if a string is provided, the name of a data column to use as weights
force (bool, optional) – If True, force the spacing to be a value such that the new bins are N times larger, where N is an integer; otherwise, raise an exception. Default is True
return_spacing (bool, optional) – If True, return the new spacing as the second return value. Default is False.
fields_to_sum (list) – the name of fields that will be summed when reindexing, instead of averaging
- Returns
rebinned (BinnedStatistic) – A new BinnedStatistic instance, which holds the rebinned coordinate grid and data variables
spacing (float, optional) – If return_spacing is True, the new coordinate spacing will be returned
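The integral-factor restriction above can be illustrated with a standalone sketch of the re-binning arithmetic. The helper below is hypothetical and operates on a plain 1D array rather than a BinnedStatistic:

```python
import numpy as np

def rebin_average(values, factor, weights=None):
    # Re-bin a 1D array by an integer factor, averaging groups of
    # `factor` consecutive bins; the factor must divide the current
    # size exactly, mirroring the integral-factor restriction above.
    values = np.asarray(values, dtype=float)
    if len(values) % factor != 0:
        raise ValueError("can only re-bin by an integral factor of the size")
    v = values.reshape(-1, factor)
    if weights is None:
        return v.mean(axis=1)
    # optionally weight each bin before averaging
    w = np.asarray(weights, dtype=float).reshape(-1, factor)
    return (v * w).sum(axis=1) / w.sum(axis=1)
```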
- rename_variable(old_name, new_name)¶
Rename a variable in data from old_name to new_name.

Note that this procedure is performed in-place (it does not return a new BinnedStatistic)
- sel(method=None, **indexers)¶
Return a new BinnedStatistic indexed by coordinate values along the specified dimension(s).
Notes
Scalar values used to index a specific dimension will result in that dimension being squeezed. To keep a dimension of unit length, use a list to index (see examples below).
- Parameters
method ({None, 'nearest'}) – The method to use for inexact matches; if set to None, require an exact coordinate match, otherwise match the nearest coordinate
**indexers – the pairs of dimension name and coordinate value used to index the BinnedStatistic
- Returns
sliced – a new BinnedStatistic holding the sliced data and coordinate grid
- Return type
Examples
>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=0.4)
<BinnedStatistic: dims: (mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=[0.4])
<BinnedStatistic: dims: (k: 1, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=slice(0.1, 0.4), mu=0.5)
<BinnedStatistic: dims: (k: 30), variables: ('mu', 'k', 'power')>
- property shape¶
The shape of the coordinate grid
- squeeze(dim=None)¶
Squeeze the BinnedStatistic along the specified dimension, which removes that dimension from the BinnedStatistic.
The behavior is similar to that of numpy.squeeze().
- Parameters
dim (str, optional) – The name of the dimension to squeeze. If no dimension is provided, then the one dimension with unit length will be squeezed
- Returns
squeezed – a new BinnedStatistic instance, squeezed along one dimension
- Return type
- Raises
ValueError – If the specified dimension does not have length one, or no dimension is specified and multiple dimensions have length one
Examples
>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 1), variables: ('mu', 'k', 'power')>
>>> pkmu.squeeze()  # squeeze the mu dimension
<BinnedStatistic: dims: (k: 200), variables: ('mu', 'k', 'power')>
- take(*masks, **indices)¶
Take a subset of a BinnedStatistic from a given list of indices. This is more powerful but more verbose than sel. Also, the result is never squeezed, even if only a single item along the direction is used.
- Parameters
masks (array_like (boolean)) – a list of masks that are of the same shape as the data.
indices (dict (string : array_like)) – mapping from axes (by name, dim) to items to select (list/array_like). Each item is a valid selector for numpy’s fancy indexing.
- Return type
new BinnedStatistic, where only items selected by all axes are kept.
Examples
>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>

# similar to pkmu.sel(k > 0.4), select the bin centers
>>> pkmu.take(k=pkmu.coords['k'] > 0.4)
<BinnedStatistic: dims: (mu: 5), variables: ('mu', 'k', 'power')>

# also similar to pkmu.sel(k > 0.4), select the bin averages
>>> pkmu.take(pkmu['k'] > 0.4)
<BinnedStatistic: dims: (k: 30), variables: ('mu', 'k', 'power')>

# impossible with sel.
>>> pkmu.take(pkmu['modes'] > 0)
- to_json(filename)¶
Write a BinnedStatistic to a JSON file.
Note

This uses nbodykit.utils.JSONEncoder to write the JSON file.

- Parameters
filename (str) – the name of the file to write
- to_poles(poles)[source]¶
Invert the measured wedges \(\xi(r,\mu)\) into correlation multipoles, \(\xi_\ell(r)\).

To select a mu range, use poles = self.sel(mu=slice(*mu_range), method='nearest').to_poles(poles)
- Parameters
poles (array_like) – the list of multipoles to compute
- Returns
xi_ell – a data set holding the \(\xi_\ell(r)\) multipoles
- Return type
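The wedge-to-multipole inversion can be sketched as a weighted sum over mu bins using Legendre polynomials. This hypothetical helper works on a plain array rather than a WedgeBinnedStatistic, and assumes even multipoles with mu restricted to [0, 1]:

```python
import numpy as np

def wedges_to_poles(xi_rmu, mu_edges, ells):
    # xi_ell(r) = (2*ell + 1) * sum_j xi(r, mu_j) * P_ell(mu_j) * dmu_j,
    # a midpoint-rule version of the Legendre projection over mu in [0, 1]
    # (valid for even ell, assuming symmetry in mu).
    mu_edges = np.asarray(mu_edges)
    mu_mid = 0.5 * (mu_edges[1:] + mu_edges[:-1])
    dmu = np.diff(mu_edges)
    poles = []
    for ell in ells:
        # coefficient vector selecting the single Legendre polynomial P_ell
        coeffs = np.zeros(ell + 1)
        coeffs[ell] = 1.0
        P_ell = np.polynomial.legendre.legval(mu_mid, coeffs)
        poles.append((2 * ell + 1) * np.sum(xi_rmu * (P_ell * dmu), axis=-1))
    return np.array(poles)
```

For a constant \(\xi(r,\mu)\), only the monopole survives, which makes a quick sanity check easy.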
- property variables¶
Alias to return the names of the variables stored in data