nbodykit.binned_statistic module

class nbodykit.binned_statistic.BinnedStatistic(dims, edges, data, fields_to_sum=[], **kwargs)

Bases: object

Lightweight class to hold statistics binned at fixed coordinates.

For example, this class could hold a grid of (r, mu) or (k, mu) bins for a correlation function or power spectrum measurement.

It is modeled after the syntax of xarray.Dataset and is designed to hold correlation function or power spectrum results in 1D or 2D.

Parameters:

dims (list, (Ndim,)) – A list of strings specifying names for the coordinate dimensions. The dimension names stored in dims have the suffix ‘cen’ added, to indicate that the coordinate grid is defined at the bin centers
edges (list, (Ndim,)) – A list specifying the bin edges for each dimension
data (array_like) – a structured array holding the data variables, where the named fields are interpreted as the variable names. The variable names are stored in variables
fields_to_sum (list, optional) – the names of fields that will be summed when reindexing, instead of averaged
**kwargs – Any additional keywords are saved as metadata in the attrs dictionary attribute
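
As a rough illustration of the inputs described above, the sketch below builds the bin edges and a structured array with plain numpy. The field names `k` and `power`, the bin count, the placeholder values, and the `volume` keyword are all assumptions made for illustration. The constructor call is left as a comment so that the snippet runs without nbodykit installed.

```python
import numpy

# Hypothetical 1D binning: 10 linear bins in k between 0 and 1
edges = numpy.linspace(0.0, 1.0, 11)
centers = 0.5 * (edges[1:] + edges[:-1])

# Structured array: each named field becomes a data variable
data = numpy.empty(len(centers), dtype=[('k', 'f8'), ('power', 'f8')])
data['k'] = centers
data['power'] = 1.0 / centers  # placeholder values, not a real measurement

# One entry in `dims` and one in `edges` per dimension; any extra
# keyword (here the made-up `volume`) would be stored in `attrs`:
# pk = BinnedStatistic(['k'], [edges], data, volume=1.0)
print(data.dtype.names, centers.shape)
```

Note that `edges` has one more entry than there are bins along that dimension.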

Examples

The following example shows how to read a power spectrum measurement from a JSON file, as output by nbodykit, assuming the JSON file holds a dictionary with a ‘power’ entry holding the relevant data:

>>> filename = 'test_data.json'
>>> pk = BinnedStatistic.from_json(filename, key='power', dims=['k'])

In older versions of nbodykit, results were written using plaintext ASCII files. Although now deprecated, files of this type can be read using:

>>> filename = 'test_data.dat'
>>> dset = BinnedStatistic.from_plaintext(['k'], filename)

Data variables can be accessed in a dict-like fashion:

>>> power = pkmu['power'] # returns power data variable

Array-like indexing of a BinnedStatistic returns a new BinnedStatistic holding the sliced data:

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu[:,0] # select first mu column
<BinnedStatistic: dims: (k: 200), variables: ('mu', 'k', 'power')>

Additional data variables can be added to the BinnedStatistic via:

>>> modes = numpy.ones((200, 5))
>>> pkmu['modes'] = modes

Coordinate-based indexing is possible through sel():

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=slice(0.1, 0.4), mu=0.5)
<BinnedStatistic: dims: (k: 30), variables: ('mu', 'k', 'power')>

squeeze() will explicitly squeeze the specified dimension (of length one) such that the resulting instance has one less dimension:

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 1), variables: ('mu', 'k', 'power')>
>>> pkmu.squeeze(dim='mu') # can also just call pkmu.squeeze()
<BinnedStatistic: dims: (k: 200), variables: ('mu', 'k', 'power')>

average() returns a new BinnedStatistic holding the data averaged over one dimension.

reindex() will re-bin the coordinate arrays along the specified dimension.

Attributes

`shape`	The shape of the coordinate grid
`variables`	Alias to return the names of the variables stored in data

Methods

`average`(dim, **kwargs)	Compute the average of each variable over the specified dimension.
`copy`()	Returns a copy of the BinnedStatistic
`from_json`(filename[, key, dims, edges])	Initialize a BinnedStatistic from a JSON file.
`from_plaintext`(dims, filename, **kwargs)	Initialize a BinnedStatistic from a plaintext file
`reindex`(dim, spacing[, weights, force, …])	Reindex the dimension `dim` by averaging over multiple coordinate bins, optionally weighting by `weights`.
`rename_variable`(old_name, new_name)	Rename a variable in `data` from `old_name` to `new_name`.
`sel`([method])	Return a new BinnedStatistic indexed by coordinate values along the specified dimension(s).
`squeeze`([dim])	Squeeze the BinnedStatistic along the specified dimension, which removes that dimension from the BinnedStatistic.
`to_json`(filename)	Write a BinnedStatistic to a JSON file.

classmethod __construct_direct__(data, mask, **kwargs)

Shortcut around __init__ for internal use to construct and return a new class instance. The returned object should be identical to that returned by __init__.

Notes

Useful for returning new instances with sliced data/mask
The keyword arguments required to create a full, unbroken instance are dims, coords, edges, and attrs

Parameters:	data –

__copy_attrs__(): Return a copy of all necessary attributes associated with the BinnedStatistic. This dictionary, together with data and mask, is all that is required to reconstruct a new instance

__finalize__(data, mask, indices): Finalize and return a new instance from a slice of the current object (returns a copy)

__getitem__(key)

Index- or string-based indexing

Notes

If a single string is passed, the key is interpreted as a variable or coordinate, and the corresponding array is returned
If a list of strings is passed, then a new BinnedStatistic holding only the variable names in key is returned
Integer-based indexing or slices similar to numpy indexing will slice data, returning a new BinnedStatistic holding the newly sliced data and coordinate grid
Scalar indexes (i.e., integers) used to index a certain dimension will “squeeze” that dimension, removing it from the coordinate grid
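
The squeezing rule in the notes above mirrors what numpy does for plain arrays. The snippet below is only an illustration of those numpy semantics on a stand-in array of shape (200, 5); it does not call BinnedStatistic itself, and the array name `grid` is made up.

```python
import numpy

# Stand-in for the values on a (k: 200, mu: 5) coordinate grid
grid = numpy.zeros((200, 5))

squeezed = grid[:, 0]     # scalar index: the second axis is dropped
kept = grid[:, [0]]       # list index: the axis is kept with length one
sliced = grid[10:40, :]   # slice: both axes kept, the first one shortened

print(squeezed.shape, kept.shape, sliced.shape)
```

The first case corresponds to `pkmu[:,0]` in the class-level examples, where the result has only the `k` dimension left.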

__setitem__(key, data): Add a new variable with the name key to the class using data

__slice_edges__(indices): Internal function to slice the edges attribute with the specified indices, which specify the included coordinate bins

average(dim, **kwargs)

Compute the average of each variable over the specified dimension.

Parameters:

dim (str) – The name of the dimension to average over
**kwargs – Additional keywords to pass to `BinnedStatistic.reindex()`. See the documentation for `BinnedStatistic.reindex()` for valid keywords.

Returns:	averaged – A new BinnedStatistic, with data averaged along one dimension, which reduces the number of dimensions by one
Return type:	BinnedStatistic

copy(): Returns a copy of the BinnedStatistic

classmethod from_json(filename, key='data', dims=None, edges=None, **kwargs)

Initialize a BinnedStatistic from a JSON file.

The JSON file should contain a dictionary, where the data to load is stored as the key entry, with an edges entry specifying bin edges and, optionally, an attrs entry giving a dict of meta-data

Note

This uses nbodykit.utils.JSONDecoder to load the JSON file

Parameters:

filename (str) – the name of the file to load
key (str, optional) – the name of the key in the JSON file holding the data to load
dims (list, optional) – list of names specifying the dimensions, e.g., `['k']` or `['k', 'mu']`; must be supplied if not given in the JSON file
Returns:	dset – the BinnedStatistic holding the data from file
Return type:	BinnedStatistic
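
To make the described dictionary layout concrete, here is a standard-library sketch of a file with a data entry, an `edges` entry, and an `attrs` entry. The entry contents and the plain-list encoding are assumptions: the way nbodykit.utils.JSONEncoder actually serializes arrays is not reproduced here, so a file written this way is not expected to load with from_json() as is.

```python
import json
import os
import tempfile

# Hypothetical contents: a 'power' entry holding the data, plus the
# 'edges' and optional 'attrs' entries described above
payload = {
    'power': {'k': [0.05, 0.15], 'power': [100.0, 50.0]},
    'edges': [[0.0, 0.1, 0.2]],
    'attrs': {'volume': 1.0},
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test_data.json')
    with open(path, 'w') as ff:
        json.dump(payload, ff)
    with open(path) as ff:
        loaded = json.load(ff)

# The data key, the bin edges, and the metadata sit side by side
print(sorted(loaded))
```

In this layout the `key` argument would be `'power'`, which is the name used in the class-level example.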

classmethod from_plaintext(dims, filename, **kwargs)

Initialize a BinnedStatistic from a plaintext file.

Note

Deprecated in nbodykit 0.2.x: storage of BinnedStatistic objects as plaintext ASCII files is no longer supported; see BinnedStatistic.from_json()

Parameters:

dims (list) – list of names specifying the dimensions, e.g., `['k']` or `['k', 'mu']`
filename (str) – the name of the file to load
Returns:	dset – the BinnedStatistic holding the data from file
Return type:	BinnedStatistic

reindex(dim, spacing, weights=None, force=True, return_spacing=False, fields_to_sum=[])

Reindex the dimension dim by averaging over multiple coordinate bins, optionally weighting by weights.

Returns a new BinnedStatistic holding the re-binned data.

Notes

We can only re-bin to an integral factor of the current dimension size in order to avoid inaccuracies when re-binning to overlapping bins
Variables specified in fields_to_sum will be summed when re-indexing, instead of averaging

Parameters:

dim (str) – The name of the dimension to average over
spacing (float) – The desired spacing for the re-binned data. If force = True, the spacing used will be the closest value to this value such that the new bins are N times larger, where N is an integer
weights (array_like or str, optional (None)) – An array to weight the data by before re-binning, or if a string is provided, the name of a data column to use as weights
force (bool, optional) – If True, force the spacing to be a value such that the new bins are N times larger, where N is an integer; otherwise, raise an exception. Default is True
return_spacing (bool, optional) – If True, return the new spacing as the second return value. Default is False.
fields_to_sum (list) – the names of fields that will be summed when reindexing, instead of averaged

Returns:

rebinned (BinnedStatistic) – A new BinnedStatistic instance, which holds the rebinned coordinate grid and data variables
spacing (float, optional) – If return_spacing is True, the new coordinate spacing will be returned
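
The re-binning idea can be sketched with plain numpy: pick the integer factor N closest to the requested spacing, then combine every N adjacent bins with a weighted average (or a plain sum for summed fields). The helper `rebin_1d`, the sample arrays, and the rounding rule used to choose N are assumptions made for illustration, not the library's implementation.

```python
import numpy

def rebin_1d(values, weights, factor, sum_field=False):
    """Combine every `factor` adjacent bins (hypothetical helper)."""
    v = values.reshape(-1, factor)
    if sum_field:
        return v.sum(axis=1)                    # summed, e.g. mode counts
    w = weights.reshape(-1, factor)
    return (v * w).sum(axis=1) / w.sum(axis=1)  # weighted average

edges = numpy.linspace(0.0, 0.6, 7)             # 6 bins of width 0.1
power = numpy.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])
modes = numpy.array([1.0, 3.0, 2.0, 2.0, 4.0, 0.0])

# Requested spacing 0.19 with current spacing 0.1: closest integer N is 2
current = edges[1] - edges[0]
factor = max(1, int(round(0.19 / current)))
new_edges = edges[::factor]

new_power = rebin_1d(power, modes, factor)                 # weighted by modes
new_modes = rebin_1d(modes, None, factor, sum_field=True)  # summed instead
print(factor, new_power, new_modes)
```

Here `modes` plays the role of both the weights and a field listed in `fields_to_sum`: averaging a count of modes would lose information, while summing keeps the total.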

rename_variable(old_name, new_name)

Rename a variable in data from old_name to new_name.

Note that this procedure is performed in-place (does not return a new BinnedStatistic)

Parameters:

old_name (str) – the name of the old variable to rename
new_name (str) – the desired new variable name
Raises:	`ValueError` – If old_name is not present in `variables`
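
Since the variables live in a structured array, renaming one amounts to renaming a field. The sketch below does this with numpy.lib.recfunctions.rename_fields on a stand-alone array; whether BinnedStatistic uses this particular function internally is not stated here, and the field names and error message are made up.

```python
import numpy
from numpy.lib import recfunctions

# Stand-in for the structured `data` array
data = numpy.zeros(3, dtype=[('k', 'f8'), ('power', 'f8')])

old_name, new_name = 'power', 'pk'
if old_name not in data.dtype.names:
    # Same kind of error as documented for rename_variable()
    raise ValueError("`%s` is not an existing variable name" % old_name)

# rename_fields returns an array whose dtype carries the new field name
data = recfunctions.rename_fields(data, {old_name: new_name})
print(data.dtype.names)
```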

sel(method=None, **indexers)

Return a new BinnedStatistic indexed by coordinate values along the specified dimension(s).

Notes

Scalar values used to index a specific dimension will result in that dimension being squeezed. To keep a dimension of unit length, use a list to index (see examples below).

Parameters:

method ({None, 'nearest'}) – The method to use for inexact matches; if set to None, require an exact coordinate match, otherwise match the nearest coordinate
**indexers – the pairs of dimension name and coordinate value used to index the BinnedStatistic
Returns:	sliced – a new BinnedStatistic holding the sliced data and coordinate grid
Return type:	BinnedStatistic

Examples

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>

>>> pkmu.sel(k=0.4)
<BinnedStatistic: dims: (mu: 5), variables: ('mu', 'k', 'power')>

>>> pkmu.sel(k=[0.4])
<BinnedStatistic: dims: (k: 1, mu: 5), variables: ('mu', 'k', 'power')>

>>> pkmu.sel(k=slice(0.1, 0.4), mu=0.5)
<BinnedStatistic: dims: (k: 30), variables: ('mu', 'k', 'power')>
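
Coordinate-based selection can be thought of as translating coordinate values into integer indices on the bin centers. The numpy sketch below shows a nearest-match lookup and a range mask. The bin centers, the helper `nearest_index`, and the inclusive treatment of the slice bounds are assumptions for illustration only.

```python
import numpy

# Hypothetical bin centers: 5 mu bins and 200 k bins of width 0.01
mu_cen = numpy.array([0.1, 0.3, 0.5, 0.7, 0.9])
k_cen = numpy.linspace(0.005, 1.995, 200)

def nearest_index(centers, value):
    """Index of the bin center closest to `value` (hypothetical helper)."""
    return int(numpy.abs(centers - value).argmin())

# An inexact value such as mu=0.52 maps onto the bin centered at 0.5
i_mu = nearest_index(mu_cen, 0.52)

# A coordinate range maps onto a boolean mask over the bin centers
in_range = (k_cen >= 0.1) & (k_cen <= 0.4)
print(i_mu, int(in_range.sum()))
```

With these made-up centers, the range 0.1 to 0.4 covers 30 bins, the same count as in the `slice(0.1, 0.4)` example above.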

shape: The shape of the coordinate grid

squeeze(dim=None)

Squeeze the BinnedStatistic along the specified dimension, which removes that dimension from the BinnedStatistic.

The behavior is similar to that of numpy.squeeze().

Parameters:	dim (str, optional) – The name of the dimension to squeeze. If no dimension is provided, then the one dimension with unit length will be squeezed
Returns:	squeezed – a new BinnedStatistic instance, squeezed along one dimension
Return type:	BinnedStatistic
Raises:	`ValueError` – If the specified dimension does not have length one, or no dimension is specified and multiple dimensions have length one

Examples

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 1), variables: ('mu', 'k', 'power')>
>>> pkmu.squeeze() # squeeze the mu dimension
<BinnedStatistic: dims: (k: 200), variables: ('mu', 'k', 'power')>
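
For comparison, this is how numpy.squeeze() behaves on a plain array with the same shape as the example above, including the error raised when the requested axis does not have length one. The array `grid` is a stand-in, not a BinnedStatistic.

```python
import numpy

grid = numpy.zeros((200, 1))               # stand-in for (k: 200, mu: 1)
squeezed = numpy.squeeze(grid, axis=1)     # the unit-length axis is removed
print(squeezed.shape)

failed = False
try:
    numpy.squeeze(grid, axis=0)            # axis 0 has length 200, not one
except ValueError:
    failed = True
print(failed)
```

One difference worth keeping in mind: calling numpy.squeeze() without an axis removes every unit-length axis, whereas the documentation above states that squeeze() raises a ValueError when no dimension is given and several have length one.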

to_json(filename)

Write a BinnedStatistic to a JSON file.

Note

This uses nbodykit.utils.JSONEncoder to write the JSON file

Parameters:	filename (str) – the name of the file to write

variables: Alias to return the names of the variables stored in data

nbodykit.binned_statistic.bin_ndarray(ndarray, new_shape, weights=None, operation=<function mean>)

Bins an ndarray in all axes based on the target shape, by summing or averaging.

Parameters:

ndarray (array_like) – the input array to re-bin
new_shape (tuple) – the tuple holding the desired new shape
weights (array_like, optional) – weights to multiply the input array by, before running the re-binning operation

Notes

Dimensions in new_shape must be an integral factor smaller than those of the old shape
Number of output dimensions must match number of input dimensions.
See https://gist.github.com/derricw/95eab740e1b08b78c03f

Examples

>>> m = numpy.arange(0,100,1).reshape((10,10))
>>> n = bin_ndarray(m, new_shape=(5,5), operation=numpy.sum)
>>> print(n)
[[ 22  30  38  46  54]
 [102 110 118 126 134]
 [182 190 198 206 214]
 [262 270 278 286 294]
 [342 350 358 366 374]]
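
One common way to implement this kind of binning is to reshape each axis into (new size, block size) pairs and reduce over the block axes. The function below is a minimal sketch of that reshape trick, assuming no weights; `bin_by_reshape` is a hypothetical name and is not the nbodykit function.

```python
import numpy

def bin_by_reshape(array, new_shape, operation=numpy.mean):
    """Sketch of reshape-based binning (hypothetical helper, no weights)."""
    if array.ndim != len(new_shape):
        raise ValueError("number of dimensions must match")
    shape = []
    for old, new in zip(array.shape, new_shape):
        if old % new != 0:
            raise ValueError("new shape must evenly divide the old shape")
        shape.extend([new, old // new])
    blocks = array.reshape(shape)
    # Odd-numbered axes hold the elements that fall into each new bin
    axes = tuple(range(1, blocks.ndim, 2))
    return operation(blocks, axis=axes)

m = numpy.arange(0, 100, 1).reshape((10, 10))
n = bin_by_reshape(m, (5, 5), operation=numpy.sum)
print(n[0, 0], n[4, 4])
```

For the 10 x 10 input above, each output element combines a 2 x 2 block, so the first element is 0 + 1 + 10 + 11 = 22, in agreement with the printed example.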