nbodykit.binned_statistic

Functions

bin_ndarray(ndarray, new_shape[, weights, …]) Bins an ndarray in all axes based on the target shape, by summing or averaging.

Classes

BinnedStatistic(dims, edges, data[, …]) Lightweight class to hold statistics binned at fixed coordinates.
class nbodykit.binned_statistic.BinnedStatistic(dims, edges, data, fields_to_sum=[], coords=None, **kwargs)[source]

Lightweight class to hold statistics binned at fixed coordinates.

For example, this class could hold a grid of (r, mu) or (k, mu) bins for a correlation function or power spectrum measurement.

It is modeled after the syntax of xarray.Dataset, and is designed to hold correlation function or power spectrum results (in 1D or 2D)

Parameters:
  • dims (list, (Ndim,)) – A list of strings specifying names for the coordinate dimensions. The dimension names stored in dims have the suffix ‘cen’ added, to indicate that the coordinate grid is defined at the bin centers
  • edges (list, (Ndim,)) – A list specifying the bin edges for each dimension
  • data (array_like) – a structured array holding the data variables, where the named fields are interpreted as the variable names. The variable names are stored in variables
  • fields_to_sum (list, optional) – the names of fields that will be summed when reindexing, instead of averaged
  • **kwargs – Any additional keywords are saved as metadata in the attrs dictionary attribute
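As an illustrative sketch (not taken from the nbodykit docs), the data argument can be built as a NumPy structured array whose field names become the variable names; the values below are hypothetical placeholders:

```python
import numpy as np

# 11 edges define 10 bins along a single 'k' dimension.
edges = [np.linspace(0.0, 1.0, 11)]

# A structured array: each named field becomes a data variable.
data = np.zeros(10, dtype=[('k', 'f8'), ('power', 'f8'), ('modes', 'i8')])
data['k'] = 0.5 * (edges[0][1:] + edges[0][:-1])  # bin centers

# The corresponding constructor call would then look like (not run here):
# pk = BinnedStatistic(dims=['k'], edges=edges, data=data,
#                      fields_to_sum=['modes'])
print(data.dtype.names)
```

Fields like 'modes' (a count per bin) are natural candidates for fields_to_sum, since counts should be summed rather than averaged when bins are combined.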

Examples

The following example shows how to read a power spectrum measurement from a JSON file, as output by nbodykit, assuming the JSON file holds a dictionary with a ‘power’ entry holding the relevant data

>>> filename = 'test_data.json'
>>> pk = BinnedStatistic.from_json(['k'], filename, 'power')

In older versions of nbodykit, results were written as plaintext ASCII files. Although now deprecated, files of this type can be read using:

>>> filename = 'test_data.dat'
>>> dset = BinnedStatistic.from_plaintext(['k'], filename)

Data variables can be accessed in a dict-like fashion:

>>> power = pkmu['power'] # returns power data variable

Array-like indexing of a BinnedStatistic returns a new BinnedStatistic holding the sliced data:

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu[:,0] # select first mu column
<BinnedStatistic: dims: (k: 200), variables: ('mu', 'k', 'power')>

Additional data variables can be added to the BinnedStatistic via:

>>> modes = numpy.ones((200, 5))
>>> pkmu['modes'] = modes

Coordinate-based indexing is possible through sel():

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=slice(0.1, 0.4), mu=0.5)
<BinnedStatistic: dims: (k: 30), variables: ('mu', 'k', 'power')>

squeeze() will explicitly squeeze the specified dimension (of length one) such that the resulting instance has one less dimension:

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 1), variables: ('mu', 'k', 'power')>
>>> pkmu.squeeze(dim='mu') # can also just call pkmu.squeeze()
<BinnedStatistic: dims: (k: 200), variables: ('mu', 'k', 'power')>

average() returns a new BinnedStatistic holding the data averaged over one dimension

reindex() will re-bin the coordinate arrays along the specified dimension

Attributes:
shape

The shape of the coordinate grid

variables

Alias to return the names of the variables stored in data

Methods

average(dim, **kwargs) Compute the average of each variable over the specified dimension.
copy([cls]) Returns a copy of the BinnedStatistic, optionally change the type to cls.
from_json(filename[, key, dims, edges]) Initialize a BinnedStatistic from a JSON file.
from_plaintext(dims, filename, **kwargs) Initialize a BinnedStatistic from a plaintext file
reindex(dim, spacing[, weights, force, …]) Reindex the dimension dim by averaging over multiple coordinate bins, optionally weighting by weights.
rename_variable(old_name, new_name) Rename a variable in data from old_name to new_name.
sel([method]) Return a new BinnedStatistic indexed by coordinate values along the specified dimension(s).
squeeze([dim]) Squeeze the BinnedStatistic along the specified dimension, which removes that dimension from the BinnedStatistic.
take(*masks, **indices) Take a subset of a BinnedStatistic from given list of indices.
to_json(filename) Write a BinnedStatistic to a JSON file.
from_state  
classmethod __construct_direct__(data, mask, **kwargs)[source]

Shortcut around __init__ for internal use to construct and return a new class instance. The returned object should be identical to that returned by __init__.

Notes

  • Useful for returning new instances with sliced data/mask
  • The keyword arguments required to create a full, unbroken instance are dims, coords, edges, and attrs
__copy_attrs__()[source]

Return a copy of all necessary attributes associated with the BinnedStatistic. This dictionary, plus data and mask, is all that's required to reconstruct a new class instance

__finalize__(data, mask, indices)[source]

Finalize and return a new instance from a slice of the current object (returns a copy)

__getitem__(key)[source]

Index- or string-based indexing

Notes

  • If a single string is passed, the key is interpreted as a variable or coordinate name, and the corresponding array is returned
  • If a list of strings is passed, then a new BinnedStatistic holding only the variable names in key is returned
  • Integer-based indexing or slices similar to numpy indexing will slice data, returning a new BinnedStatistic holding the newly sliced data and coordinate grid
  • Scalar indexes (i.e., integers) used to index a certain dimension will “squeeze” that dimension, removing it from the coordinate grid
__setitem__(key, data)[source]

Add a new variable with the name key to the class using data

__slice_edges__(indices)[source]

Internal function to slice the edges attribute with the specified indices, which specify the included coordinate bins

average(dim, **kwargs)[source]

Compute the average of each variable over the specified dimension.

Parameters:dim (str) – the name of the dimension to average over
Returns:

averaged – A new BinnedStatistic, with data averaged along one dimension, which reduces the number of dimensions by one

Return type:

BinnedStatistic
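As a conceptual sketch (using plain NumPy, not the nbodykit API), averaging over a dimension is a reduction along one axis of the coordinate grid; here the array shape and values are hypothetical:

```python
import numpy as np

# A hypothetical 'power' variable on a (k: 4, mu: 3) coordinate grid.
power = np.arange(12, dtype=float).reshape(4, 3)

# average(dim='mu') conceptually reduces along the mu axis,
# leaving one value per k bin (a 1D result).
power_1d = power.mean(axis=1)
print(power_1d)
```

Variables listed in fields_to_sum at construction (e.g. a 'modes' count) would be summed along the axis instead of averaged.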

copy(cls=None)[source]

Returns a copy of the BinnedStatistic, optionally changing the type to cls. cls must be a subclass of BinnedStatistic.

classmethod from_json(filename, key='data', dims=None, edges=None, **kwargs)[source]

Initialize a BinnedStatistic from a JSON file.

The JSON file should contain a dictionary, where the data to load is stored as the key entry, with an edges entry specifying the bin edges, and, optionally, an attrs entry giving a dict of metadata

Note

This uses nbodykit.utils.JSONDecoder to load the JSON file

Parameters:
  • filename (str) – the name of the file to load
  • key (str, optional) – the name of the key in the JSON file holding the data to load
  • dims (list, optional) – list of names specifying the dimensions, i.e., ['k'] or ['k', 'mu']; must be supplied if not given in the JSON file
  • edges (list, optional) – list specifying the bin edges for each dimension; must be supplied if not given in the JSON file
Returns:

dset – the BinnedStatistic holding the data from file

Return type:

BinnedStatistic

classmethod from_plaintext(dims, filename, **kwargs)[source]

Initialize a BinnedStatistic from a plaintext file

Note

Deprecated since nbodykit 0.2.x: storage of BinnedStatistic objects as plaintext ASCII files is no longer supported; see BinnedStatistic.from_json()

Parameters:
  • dims (list) – list of names specifying the dimensions, i.e., ['k'] or ['k', 'mu']
  • filename (str) – the name of the file to load
Returns:

dset – the BinnedStatistic holding the data from file

Return type:

BinnedStatistic

reindex(dim, spacing, weights=None, force=True, return_spacing=False, fields_to_sum=[])[source]

Reindex the dimension dim by averaging over multiple coordinate bins, optionally weighting by weights.

Returns a new BinnedStatistic holding the re-binned data.

Notes

  • We can only re-bin to an integral factor of the current dimension size, in order to avoid inaccuracies when re-binning to overlapping bins
  • Variables specified in fields_to_sum will be summed when re-indexing, instead of averaging
Parameters:
  • dim (str) – The name of the dimension to average over
  • spacing (float) – The desired spacing for the re-binned data. If force = True, the spacing used will be the closest value to this value, such that the new bins are N times larger, where N is an integer
  • weights (array_like or str, optional (None)) – An array to weight the data by before re-binning, or, if a string is provided, the name of a data column to use as weights
  • force (bool, optional) – If True, force the spacing to be a value such that the new bins are N times larger, where N is an integer; otherwise, raise an exception. Default is True
  • return_spacing (bool, optional) – If True, return the new spacing as the second return value. Default is False.
  • fields_to_sum (list) – the names of fields that will be summed when reindexing, instead of averaged
Returns:

  • rebinned (BinnedStatistic) – A new BinnedStatistic instance, which holds the rebinned coordinate grid and data variables
  • spacing (float, optional) – If return_spacing is True, the new coordinate spacing will be returned
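The integral-factor restriction can be sketched with plain NumPy (a conceptual illustration, not the nbodykit implementation): reshaping and averaging only works cleanly when the old bin count is divisible by the enlargement factor N, which is why arbitrary new spacings would produce overlapping bins.

```python
import numpy as np

# Values in 6 coordinate bins (hypothetical data).
x = np.arange(6, dtype=float)

# Re-bin to bins N = 2 times larger: group consecutive pairs and average.
# This requires 6 % N == 0; a non-integral factor would split bins.
N = 2
rebinned = x.reshape(-1, N).mean(axis=1)
print(rebinned)
```

With weights, each group would instead be combined as a weighted average, and fields_to_sum variables would use .sum(axis=1) in place of .mean(axis=1).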

rename_variable(old_name, new_name)[source]

Rename a variable in data from old_name to new_name.

Note that this procedure is performed in-place (does not return a new BinnedStatistic)

Parameters:
  • old_name (str) – the name of the variable to rename
  • new_name (str) – the desired new variable name
Raises:

ValueError – If old_name is not present in variables

sel(method=None, **indexers)[source]

Return a new BinnedStatistic indexed by coordinate values along the specified dimension(s).

Notes

Scalar values used to index a specific dimension will result in that dimension being squeezed. To keep a dimension of unit length, use a list to index (see examples below).

Parameters:
  • method ({None, 'nearest'}) – The method to use for inexact matches; if set to None, require an exact coordinate match, otherwise match the nearest coordinate
  • **indexers – the pairs of dimension name and coordinate value used to index the BinnedStatistic
Returns:

sliced – a new BinnedStatistic holding the sliced data and coordinate grid

Return type:

BinnedStatistic

Examples

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=0.4)
<BinnedStatistic: dims: (mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=[0.4])
<BinnedStatistic: dims: (k: 1, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=slice(0.1, 0.4), mu=0.5)
<BinnedStatistic: dims: (k: 30), variables: ('mu', 'k', 'power')>
shape

The shape of the coordinate grid

squeeze(dim=None)[source]

Squeeze the BinnedStatistic along the specified dimension, which removes that dimension from the BinnedStatistic.

The behavior is similar to that of numpy.squeeze().

Parameters:dim (str, optional) – The name of the dimension to squeeze. If no dimension is provided, then the one dimension with unit length will be squeezed
Returns:squeezed – a new BinnedStatistic instance, squeezed along one dimension
Return type:BinnedStatistic
Raises:ValueError – If the specified dimension does not have length one, or no dimension is specified and multiple dimensions have length one

Examples

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 1), variables: ('mu', 'k', 'power')>
>>> pkmu.squeeze() # squeeze the mu dimension
<BinnedStatistic: dims: (k: 200), variables: ('mu', 'k', 'power')>
take(*masks, **indices)[source]

Take a subset of a BinnedStatistic from a given list of indices. This is more powerful, but more verbose, than sel(). Note that the result is never squeezed, even if only a single item along a dimension is selected.

Parameters:
  • masks (array_like (boolean)) – a list of masks that are of the same shape as the data.
  • indices (dict (string : array_like)) – mapping from axes (by name, dim) to items to select (list/array_like). Each item is a valid selector for numpy’s fancy indexing.
Returns:

Return type:

a new BinnedStatistic, in which only the items selected along all axes are kept

Examples

>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>

# similar to pkmu.sel(k > 0.4), select the bin centers
>>> pkmu.take(k=pkmu.coords['k'] > 0.4)
<BinnedStatistic: dims: (mu: 5), variables: ('mu', 'k', 'power')>

# also similar to pkmu.sel(k > 0.4), select the bin averages
>>> pkmu.take(pkmu['k'] > 0.4)
<BinnedStatistic: dims: (k: 30), variables: ('mu', 'k', 'power')>

# impossible with sel.
>>> pkmu.take(pkmu['modes'] > 0)

to_json(filename)[source]

Write a BinnedStatistic to a JSON file.

Note

This uses nbodykit.utils.JSONEncoder to write the JSON file

Parameters:filename (str) – the name of the file to write
variables

Alias to return the names of the variables stored in data

nbodykit.binned_statistic.bin_ndarray(ndarray, new_shape, weights=None, operation=<function mean>)[source]

Bins an ndarray in all axes based on the target shape, by summing or averaging.

Parameters:
  • ndarray (array_like) – the input array to re-bin
  • new_shape (tuple) – the tuple holding the desired new shape
  • weights (array_like, optional) – weights to multiply the input array by, before running the re-binning operation
  • operation (callable, optional) – the function used to combine the values in each bin, e.g., numpy.sum or numpy.mean; default is numpy.mean


Examples

>>> m = numpy.arange(0,100,1).reshape((10,10))
>>> n = bin_ndarray(m, new_shape=(5,5), operation=numpy.sum)
>>> print(n)
[[ 22  30  38  46  54]
 [102 110 118 126 134]
 [182 190 198 206 214]
 [262 270 278 286 294]
 [342 350 358 366 374]]