nbodykit.binned_statistic

Functions

bin_ndarray(ndarray, new_shape[, weights, …]) Bins an ndarray in all axes based on the target shape, by summing or averaging.

Classes

BinnedStatistic(dims, edges, data[, …]) Lightweight class to hold statistics binned at fixed coordinates.
class nbodykit.binned_statistic.BinnedStatistic(dims, edges, data, fields_to_sum=[], coords=None, **kwargs)[source]

Lightweight class to hold statistics binned at fixed coordinates.

For example, this class could hold a grid of (r, mu) or (k, mu) bins for a correlation function or power spectrum measurement.

It is modeled after the syntax of xarray.Dataset, and is designed to hold correlation function or power spectrum results (in 1D or 2D)

Parameters:
  • dims (list, (Ndim,)) – A list of strings specifying names for the coordinate dimensions. The dimension names stored in dims have the suffix ‘cen’ added, to indicate that the coordinate grid is defined at the bin centers
  • edges (list, (Ndim,)) – A list specifying the bin edges for each dimension
  • data (array_like) – a structured array holding the data variables, where the named fields are interpreted as the variable names. The variable names are stored in variables
  • fields_to_sum (list, optional) – the names of fields that will be summed when reindexing, instead of averaged
  • **kwargs – Any additional keywords are saved as metadata in the attrs dictionary attribute
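As an illustrative sketch (not taken from the nbodykit docs), the data argument can be built as a NumPy structured array whose field names become the variable names; the values below are hypothetical placeholders:

```python
import numpy as np

# 11 edges define 10 bins along a single 'k' dimension.
edges = [np.linspace(0.0, 1.0, 11)]

# A structured array: each named field becomes a data variable.
data = np.zeros(10, dtype=[('k', 'f8'), ('power', 'f8'), ('modes', 'i8')])
data['k'] = 0.5 * (edges[0][1:] + edges[0][:-1])  # bin centers

# The corresponding constructor call would then look like (not run here):
# pk = BinnedStatistic(dims=['k'], edges=edges, data=data,
#                      fields_to_sum=['modes'])
print(data.dtype.names)
```

Fields like 'modes' (a count per bin) are natural candidates for fields_to_sum, since counts should be summed rather than averaged when bins are combined.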

Examples

The following example shows how to read a power spectrum measurement from a JSON file, as output by nbodykit, assuming the JSON file holds a dictionary with a ‘power’ entry holding the relevant data

>>> filename = 'test_data.json'
>>> pk = BinnedStatistic.from_json(['k'], filename, 'power')

In older versions of nbodykit, results were written as plaintext ASCII files. Although now deprecated, files of this type can be read using:

>>> filename = 'test_data.dat'
>>> dset = BinnedStatistic.from_plaintext(['k'], filename)

Data variables can be accessed in a dict-like fashion:

>>> power = pkmu['power'] # returns power data variable

Array-like indexing of a BinnedStatistic returns a new BinnedStatistic holding the sliced data:

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu[:,0] # select first mu column
<BinnedStatistic: dims: (k: 200), variables: ('mu', 'k', 'power')>

Additional data variables can be added to the BinnedStatistic via:

>>> modes = numpy.ones((200, 5))
>>> pkmu['modes'] = modes

Coordinate-based indexing is possible through sel():

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=slice(0.1, 0.4), mu=0.5)
<BinnedStatistic: dims: (k: 30), variables: ('mu', 'k', 'power')>

squeeze() will explicitly squeeze the specified dimension (of length one) such that the resulting instance has one less dimension:

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 1), variables: ('mu', 'k', 'power')>
>>> pkmu.squeeze(dim='mu') # can also just call pkmu.squeeze()
<BinnedStatistic: dims: (k: 200), variables: ('mu', 'k', 'power')>

average() returns a new BinnedStatistic holding the data averaged over one dimension

reindex() will re-bin the coordinate arrays along the specified dimension

Attributes:
shape

The shape of the coordinate grid

variables

Alias to return the names of the variables stored in data

Methods

average(dim, **kwargs) Compute the average of each variable over the specified dimension.
copy([cls]) Returns a copy of the BinnedStatistic, optionally change the type to cls.
from_json(filename[, key, dims, edges]) Initialize a BinnedStatistic from a JSON file.
from_plaintext(dims, filename, **kwargs) Initialize a BinnedStatistic from a plaintext file
reindex(dim, spacing[, weights, force, …]) Reindex the dimension dim by averaging over multiple coordinate bins, optionally weighting by weights.
rename_variable(old_name, new_name) Rename a variable in data from old_name to new_name.
sel([method]) Return a new BinnedStatistic indexed by coordinate values along the specified dimension(s).
squeeze([dim]) Squeeze the BinnedStatistic along the specified dimension, which removes that dimension from the BinnedStatistic.
take(*masks, **indices) Take a subset of a BinnedStatistic from given list of indices.
to_json(filename) Write a BinnedStatistic to a JSON file.
from_state  
classmethod __construct_direct__(data, mask, **kwargs)[source]

Shortcut around __init__ for internal use to construct and return a new class instance. The returned object should be identical to that returned by __init__.

Notes

  • Useful for returning new instances with sliced data/mask
  • The keyword arguments required to create a full, unbroken instance are dims, coords, edges, and attrs
__copy_attrs__()[source]

Return a copy of all necessary attributes associated with the BinnedStatistic. This dictionary, plus data and mask, is all that's required to reconstruct a new class instance

__finalize__(data, mask, indices)[source]

Finalize and return a new instance from a slice of the current object (returns a copy)

__getitem__(key)[source]

Index- or string-based indexing

Notes

  • If a single string is passed, the key is interpreted as a variable or coordinate name, and the corresponding array is returned
  • If a list of strings is passed, then a new BinnedStatistic holding only the variable names in key is returned
  • Integer-based indexing or slices similar to numpy indexing will slice data, returning a new BinnedStatistic holding the newly sliced data and coordinate grid
  • Scalar indexes (i.e., integers) used to index a certain dimension will “squeeze” that dimension, removing it from the coordinate grid
__setitem__(key, data)[source]

Add a new variable with the name key to the class using data

__slice_edges__(indices)[source]

Internal function to slice the edges attribute with the specified indices, which specify the included coordinate bins

average(dim, **kwargs)[source]

Compute the average of each variable over the specified dimension.

Parameters:dim (str) – the name of the dimension to average over
Returns:

averaged – A new BinnedStatistic, with data averaged along one dimension, which reduces the number of dimensions by one

Return type:

BinnedStatistic
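As a conceptual sketch (using plain NumPy, not the nbodykit API), averaging over a dimension is a reduction along one axis of the coordinate grid; here the array shape and values are hypothetical:

```python
import numpy as np

# A hypothetical 'power' variable on a (k: 4, mu: 3) coordinate grid.
power = np.arange(12, dtype=float).reshape(4, 3)

# average(dim='mu') conceptually reduces along the mu axis,
# leaving one value per k bin (a 1D result).
power_1d = power.mean(axis=1)
print(power_1d)
```

Variables listed in fields_to_sum at construction (e.g. a 'modes' count) would be summed along the axis instead of averaged.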

copy(cls=None)[source]

Returns a copy of the BinnedStatistic, optionally changing the type to cls. cls must be a subclass of BinnedStatistic.

classmethod from_json(filename, key='data', dims=None, edges=None, **kwargs)[source]

Initialize a BinnedStatistic from a JSON file.

The JSON file should contain a dictionary, where the data to load is stored as the key entry, with an edges entry specifying the bin edges, and, optionally, an attrs entry giving a dict of metadata

Note

This uses nbodykit.utils.JSONDecoder to load the JSON file

Parameters:
  • filename (str) – the name of the file to load
  • key (str, optional) – the name of the key in the JSON file holding the data to load
  • dims (list, optional) – list of names specifying the dimensions, i.e., ['k'] or ['k', 'mu']; must be supplied if not given in the JSON file
  • edges (list, optional) – list specifying the bin edges for each dimension; must be supplied if not given in the JSON file
Returns:

dset – the BinnedStatistic holding the data from file

Return type:

BinnedStatistic

classmethod from_plaintext(dims, filename, **kwargs)[source]

Initialize a BinnedStatistic from a plaintext file

Note

Deprecated since nbodykit 0.2.x: storage of BinnedStatistic objects as plaintext ASCII files is no longer supported; see BinnedStatistic.from_json()

Parameters:
  • dims (list) – list of names specifying the dimensions, i.e., ['k'] or ['k', 'mu']
  • filename (str) – the name of the file to load
Returns:

dset – the BinnedStatistic holding the data from file

Return type:

BinnedStatistic

reindex(dim, spacing, weights=None, force=True, return_spacing=False, fields_to_sum=[])[source]

Reindex the dimension dim by averaging over multiple coordinate bins, optionally weighting by weights.

Returns a new BinnedStatistic holding the re-binned data.

Notes

  • We can only re-bin to an integral factor of the current dimension size, in order to avoid inaccuracies when re-binning to overlapping bins
  • Variables specified in fields_to_sum will be summed when re-indexing, instead of averaging
Parameters:
  • dim (str) – The name of the dimension to average over
  • spacing (float) – The desired spacing for the re-binned data. If force = True, the spacing used will be the closest value to this value, such that the new bins are N times larger, where N is an integer
  • weights (array_like or str, optional (None)) – An array to weight the data by before re-binning, or, if a string is provided, the name of a data column to use as weights
  • force (bool, optional) – If True, force the spacing to be a value such that the new bins are N times larger, where N is an integer; otherwise, raise an exception. Default is True
  • return_spacing (bool, optional) – If True, return the new spacing as the second return value. Default is False.
  • fields_to_sum (list) – the names of fields that will be summed when reindexing, instead of averaged
Returns:

  • rebinned (BinnedStatistic) – A new BinnedStatistic instance, which holds the rebinned coordinate grid and data variables
  • spacing (float, optional) – If return_spacing is True, the new coordinate spacing will be returned
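The integral-factor restriction can be sketched with plain NumPy (a conceptual illustration, not the nbodykit implementation): reshaping and averaging only works cleanly when the old bin count is divisible by the enlargement factor N, which is why arbitrary new spacings would produce overlapping bins.

```python
import numpy as np

# Values in 6 coordinate bins (hypothetical data).
x = np.arange(6, dtype=float)

# Re-bin to bins N = 2 times larger: group consecutive pairs and average.
# This requires 6 % N == 0; a non-integral factor would split bins.
N = 2
rebinned = x.reshape(-1, N).mean(axis=1)
print(rebinned)
```

With weights, each group would instead be combined as a weighted average, and fields_to_sum variables would use .sum(axis=1) in place of .mean(axis=1).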

rename_variable(old_name, new_name)[source]

Rename a variable in data from old_name to new_name.

Note that this procedure is performed in-place (does not return a new BinnedStatistic)

Parameters:
  • old_name (str) – the name of the variable to rename
  • new_name (str) – the desired new variable name
Raises:

ValueError – If old_name is not present in variables

sel(method=None, **indexers)[source]

Return a new BinnedStatistic indexed by coordinate values along the specified dimension(s).

Notes

Scalar values used to index a specific dimension will result in that dimension being squeezed. To keep a dimension of unit length, use a list to index (see examples below).

Parameters:
  • method ({None, 'nearest'}) – The method to use for inexact matches; if set to None, require an exact coordinate match, otherwise match the nearest coordinate
  • **indexers – the pairs of dimension name and coordinate value used to index the BinnedStatistic
Returns:

sliced – a new BinnedStatistic holding the sliced data and coordinate grid

Return type:

BinnedStatistic

Examples

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=0.4)
<BinnedStatistic: dims: (mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=[0.4])
<BinnedStatistic: dims: (k: 1, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=slice(0.1, 0.4), mu=0.5)
<BinnedStatistic: dims: (k: 30), variables: ('mu', 'k', 'power')>
shape

The shape of the coordinate grid

squeeze(dim=None)[source]

Squeeze the BinnedStatistic along the specified dimension, which removes that dimension from the BinnedStatistic.

The behavior is similar to that of numpy.squeeze().

Parameters:dim (str, optional) – The name of the dimension to squeeze. If no dimension is provided, then the one dimension with unit length will be squeezed
Returns:squeezed – a new BinnedStatistic instance, squeezed along one dimension
Return type:BinnedStatistic
Raises:ValueError – If the specified dimension does not have length one, or no dimension is specified and multiple dimensions have length one

Examples

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 1), variables: ('mu', 'k', 'power')>
>>> pkmu.squeeze() # squeeze the mu dimension
<BinnedStatistic: dims: (k: 200), variables: ('mu', 'k', 'power')>
take(*masks, **indices)[source]

Take a subset of a BinnedStatistic from a given list of indices. This is more powerful, but more verbose, than sel(). Note that the result is never squeezed, even if only a single item along a dimension is selected.

Parameters:
  • masks (array_like (boolean)) – a list of masks that are of the same shape as the data.
  • indices (dict (string : array_like)) – mapping from axes (by name, dim) to items to select (list/array_like). Each item is a valid selector for numpy’s fancy indexing.
Returns:

Return type:

a new BinnedStatistic, in which only the items selected along all axes are kept

Examples

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>

# similar to pkmu.sel(k > 0.4), select the bin centers
>>> pkmu.take(k=pkmu.coords['k'] > 0.4)
<BinnedStatistic: dims: (mu: 5), variables: ('mu', 'k', 'power')>

# also similar to pkmu.sel(k > 0.4), select the bin averages
>>> pkmu.take(pkmu['k'] > 0.4)
<BinnedStatistic: dims: (k: 30), variables: ('mu', 'k', 'power')>

# impossible with sel.
>>> pkmu.take(pkmu['modes'] > 0)

to_json(filename)[source]

Write a BinnedStatistic to a JSON file.

Note

This uses nbodykit.utils.JSONEncoder to write the JSON file

Parameters:filename (str) – the name of the file to write
variables

Alias to return the names of the variables stored in data

nbodykit.binned_statistic.bin_ndarray(ndarray, new_shape, weights=None, operation=<function mean>)[source]

Bins an ndarray in all axes based on the target shape, by summing or averaging.

Parameters:
  • ndarray (array_like) – the input array to re-bin
  • new_shape (tuple) – the tuple holding the desired new shape
  • weights (array_like, optional) – weights to multiply the input array by, before running the re-binning operation
  • operation (callable, optional) – the function used to combine the values in each bin, e.g., numpy.sum or numpy.mean; default is numpy.mean


Examples

>>> m = numpy.arange(0,100,1).reshape((10,10))
>>> n = bin_ndarray(m, new_shape=(5,5), operation=numpy.sum)
>>> print(n)
[[ 22  30  38  46  54]
 [102 110 118 126 134]
 [182 190 198 206 214]
 [262 270 278 286 294]
 [342 350 358 366 374]]