nbodykit.binned_statistic¶
Functions
bin_ndarray – Bins an ndarray in all axes based on the target shape, by summing or averaging.
Classes
BinnedStatistic – Lightweight class to hold statistics binned at fixed coordinates.
- class nbodykit.binned_statistic.BinnedStatistic(dims, edges, data, fields_to_sum=[], coords=None, **kwargs)[source]¶
Lightweight class to hold statistics binned at fixed coordinates.
For example, this class could hold a grid of (r, mu) or (k, mu) bins for a correlation function or power spectrum measurement.
It is modeled after the syntax of xarray.Dataset, and is designed to hold correlation function or power spectrum results (in 1D or 2D).
- Parameters
dims (list, (Ndim,)) – A list of strings specifying names for the coordinate dimensions. The dimension names stored in dims have the suffix 'cen' added, to indicate that the coordinate grid is defined at the bin centers
edges (list, (Ndim,)) – A list specifying the bin edges for each dimension
data (array_like) – a structured array holding the data variables, where the named fields are interpreted as the variable names. The variable names are stored in variables
fields_to_sum (list, optional) – the name of fields that will be summed when reindexing, instead of averaging
**kwargs – Any additional keywords are saved as metadata in the attrs dictionary attribute
Examples
The following example shows how to read a power spectrum measurement from a JSON file, as output by nbodykit, assuming the JSON file holds a dictionary with a ‘power’ entry holding the relevant data
>>> filename = 'test_data.json'
>>> pk = BinnedStatistic.from_json(['k'], filename, 'power')
In older versions of nbodykit, results were written as plaintext ASCII files. Although now deprecated, this type of file can be read using:
>>> filename = 'test_data.dat'
>>> dset = BinnedStatistic.from_plaintext(['k'], filename)
Data variables can be accessed in a dict-like fashion:
>>> power = pkmu['power'] # returns power data variable
Array-like indexing of a BinnedStatistic returns a new BinnedStatistic holding the sliced data:
>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu[:,0] # select first mu column
<BinnedStatistic: dims: (k: 200), variables: ('mu', 'k', 'power')>
Additional data variables can be added to the BinnedStatistic via:
>>> modes = numpy.ones((200, 5))
>>> pkmu['modes'] = modes
Coordinate-based indexing is possible through sel():
>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=slice(0.1, 0.4), mu=0.5)
<BinnedStatistic: dims: (k: 30), variables: ('mu', 'k', 'power')>
squeeze() will explicitly squeeze the specified dimension (of length one) such that the resulting instance has one less dimension:
>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 1), variables: ('mu', 'k', 'power')>
>>> pkmu.squeeze(dim='mu') # can also just call pkmu.squeeze()
<BinnedStatistic: dims: (k: 200), variables: ('mu', 'k', 'power')>
average() returns a new BinnedStatistic holding the data averaged over one dimension.
reindex() will re-bin the coordinate arrays along the specified dimension.
- Attributes
Methods
average(dim, **kwargs) – Compute the average of each variable over the specified dimension.
copy([cls]) – Returns a copy of the BinnedStatistic, optionally changing the type to cls.
from_json(filename[, key, dims, edges]) – Initialize a BinnedStatistic from a JSON file.
from_plaintext(dims, filename, **kwargs) – Initialize a BinnedStatistic from a plaintext file.
reindex(dim, spacing[, weights, force, ...]) – Reindex the dimension dim by averaging over multiple coordinate bins, optionally weighting by weights.
rename_variable(old_name, new_name) – Rename a variable in data from old_name to new_name.
sel([method]) – Return a new BinnedStatistic indexed by coordinate values along the specified dimension(s).
squeeze([dim]) – Squeeze the BinnedStatistic along the specified dimension, which removes that dimension from the BinnedStatistic.
take(*masks, **indices) – Take a subset of a BinnedStatistic from a given list of indices.
to_json(filename) – Write a BinnedStatistic to a JSON file.
from_state
- classmethod __construct_direct__(data, mask, **kwargs)[source]¶
Shortcut around __init__ for internal use to construct and return a new class instance. The returned object should be identical to that returned by __init__.
Notes
Useful for returning new instances with sliced data/mask
The keyword arguments required to create a full, unbroken instance are dims, coords, edges, and attrs
- Parameters
data –
- __copy_attrs__()[source]¶
Return a copy of all necessary attributes associated with the BinnedStatistic. This dictionary + data and mask are all that’s required to reconstruct a new class
- __finalize__(data, mask, indices)[source]¶
Finalize and return a new instance from a slice of the current object (returns a copy)
- __getitem__(key)[source]¶
Index- or string- based indexing
Notes
If a single string is passed, the key is interpreted as a variable or coordinate, and the corresponding array is returned
If a list of strings is passed, then a new BinnedStatistic holding only the variable names in key is returned
Integer-based indexing or slices similar to numpy indexing will slice data, returning a new BinnedStatistic holding the newly sliced data and coordinate grid
Scalar indexes (i.e., integers) used to index a certain dimension will “squeeze” that dimension, removing it from the coordinate grid
- __slice_edges__(indices)[source]¶
Internal function to slice the edges attribute with the specified indices, which specify the included coordinate bins
- average(dim, **kwargs)[source]¶
Compute the average of each variable over the specified dimension.
- Parameters
dim (str) – The name of the dimension to average over
**kwargs – Additional keywords to pass to BinnedStatistic.reindex(). See the documentation for BinnedStatistic.reindex() for valid keywords.
- Returns
averaged – A new BinnedStatistic, with data averaged along one dimension, which reduces the number of dimensions by one
- Return type
BinnedStatistic
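Conceptually, averaging over a dimension is a weighted mean along one axis of the coordinate grid, with any variables listed in fields_to_sum summed instead. A minimal pure-NumPy sketch of that idea (not nbodykit's implementation; the power and modes arrays here are hypothetical):

```python
import numpy

# hypothetical 2D measurement: power on a (k: 4, mu: 3) grid
power = numpy.arange(12, dtype=float).reshape(4, 3)
modes = numpy.full((4, 3), 2.0)  # number of modes per bin, used as weights

# weighted average over the mu axis (axis=1)
power_1d = (power * modes).sum(axis=1) / modes.sum(axis=1)
# a field in fields_to_sum (like 'modes') is summed rather than averaged
modes_1d = modes.sum(axis=1)
print(power_1d)  # -> [ 1.  4.  7. 10.]
```

With equal weights, as here, the result reduces to a plain mean over the mu axis.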
- copy(cls=None)[source]¶
Returns a copy of the BinnedStatistic, optionally changing the type to cls. cls must be a subclass of BinnedStatistic.
- classmethod from_json(filename, key='data', dims=None, edges=None, **kwargs)[source]¶
Initialize a BinnedStatistic from a JSON file.
The JSON file should contain a dictionary, where the data to load is stored as the key entry, with an edges entry specifying the bin edges, and optionally, an attrs entry giving a dict of meta-data.
Note
This uses nbodykit.utils.JSONDecoder to load the JSON file
- Parameters
- Returns
dset – the BinnedStatistic holding the data from file
- Return type
BinnedStatistic
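The on-disk layout described above can be sketched with the standard json module (an illustrative sketch only: the 'power' entry name and the attrs contents are hypothetical, and real nbodykit files are written with its own JSONEncoder so that numpy arrays round-trip):

```python
import json
import os
import tempfile

# sketch of the dictionary layout: the data stored under the `key` entry
# ('power' here), plus an 'edges' entry and an optional 'attrs' entry
doc = {
    "power": [[1.0, 2.0], [3.0, 4.0]],
    "edges": [[0.0, 0.1, 0.2]],
    "attrs": {"note": "hypothetical metadata"},
}
path = os.path.join(tempfile.gettempdir(), "test_data.json")
with open(path, "w") as ff:
    json.dump(doc, ff)

with open(path) as ff:
    loaded = json.load(ff)
print(loaded["edges"])  # -> [[0.0, 0.1, 0.2]]
```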
- classmethod from_plaintext(dims, filename, **kwargs)[source]¶
Initialize a BinnedStatistic from a plaintext file
Note
Deprecated in nbodykit 0.2.x. Storage of BinnedStatistic objects as plaintext ASCII files is no longer supported; see BinnedStatistic.from_json()
- Parameters
dims (list) – list of names specifying the dimensions, i.e., ['k'] or ['k', 'mu']
filename (str) – the name of the file to load
- Returns
dset – the BinnedStatistic holding the data from file
- Return type
BinnedStatistic
- reindex(dim, spacing, weights=None, force=True, return_spacing=False, fields_to_sum=[])[source]¶
Reindex the dimension dim by averaging over multiple coordinate bins, optionally weighting by weights. Returns a new BinnedStatistic holding the re-binned data.
Notes
We can only re-bin to an integral factor of the current dimension size, in order to avoid inaccuracies when re-binning to overlapping bins
Variables specified in fields_to_sum will be summed when re-indexing, instead of averaging
- Parameters
dim (str) – The name of the dimension to average over
spacing (float) – The desired spacing for the re-binned data. If force = True, the spacing used will be the closest value to this value, such that the new bins are N times larger, where N is an integer
weights (array_like or str, optional (None)) – An array to weight the data by before re-binning, or if a string is provided, the name of a data column to use as weights
force (bool, optional) – If True, force the spacing to be a value such that the new bins are N times larger, where N is an integer; otherwise, raise an exception. Default is True
return_spacing (bool, optional) – If True, return the new spacing as the second return value. Default is False.
fields_to_sum (list) – the name of fields that will be summed when reindexing, instead of averaging
- Returns
rebinned (BinnedStatistic) – A new BinnedStatistic instance, which holds the rebinned coordinate grid and data variables
spacing (float, optional) – If return_spacing is True, the new coordinate spacing will be returned
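The core re-binning step can be sketched in pure NumPy (an illustrative sketch under the integral-factor constraint above, not nbodykit's implementation): consecutive groups of N old bins are combined with a weighted average.

```python
import numpy

def rebin_1d(values, weights, N):
    # illustrative only: the length must divide evenly by N,
    # mirroring the integral-factor constraint described above
    assert len(values) % N == 0
    v = values.reshape(-1, N)
    w = weights.reshape(-1, N)
    # weighted average within each group of N old bins; a field in
    # fields_to_sum would instead be summed over each group
    return (v * w).sum(axis=1) / w.sum(axis=1)

k = numpy.linspace(0.05, 0.95, 10)   # old bin centers, spacing 0.1
modes = numpy.ones(10)               # equal weights for the sketch
k_new = rebin_1d(k, modes, 2)        # new spacing 0.2
print(k_new)
```

With equal weights, each new coordinate is just the midpoint of the N old bin centers it absorbs.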
- rename_variable(old_name, new_name)[source]¶
Rename a variable in data from old_name to new_name. Note that this procedure is performed in-place (does not return a new BinnedStatistic).
- sel(method=None, **indexers)[source]¶
Return a new BinnedStatistic indexed by coordinate values along the specified dimension(s).
Notes
Scalar values used to index a specific dimension will result in that dimension being squeezed. To keep a dimension of unit length, use a list to index (see examples below).
- Parameters
method ({None, 'nearest'}) – The method to use for inexact matches; if set to None, require an exact coordinate match, otherwise match the nearest coordinate
**indexers – the pairs of dimension name and coordinate value used to index the BinnedStatistic
- Returns
sliced – a new BinnedStatistic holding the sliced data and coordinate grid
- Return type
BinnedStatistic
Examples
>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=0.4)
<BinnedStatistic: dims: (mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=[0.4])
<BinnedStatistic: dims: (k: 1, mu: 5), variables: ('mu', 'k', 'power')>
>>> pkmu.sel(k=slice(0.1, 0.4), mu=0.5)
<BinnedStatistic: dims: (k: 30), variables: ('mu', 'k', 'power')>
- property shape¶
The shape of the coordinate grid
- squeeze(dim=None)[source]¶
Squeeze the BinnedStatistic along the specified dimension, which removes that dimension from the BinnedStatistic.
The behavior is similar to that of numpy.squeeze().
- Parameters
dim (str, optional) – The name of the dimension to squeeze. If no dimension is provided, then the one dimension with unit length will be squeezed
- Returns
squeezed – a new BinnedStatistic instance, squeezed along one dimension
- Return type
BinnedStatistic
- Raises
ValueError – If the specified dimension does not have length one, or no dimension is specified and multiple dimensions have length one
Examples
>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 1), variables: ('mu', 'k', 'power')>
>>> pkmu.squeeze() # squeeze the mu dimension
<BinnedStatistic: dims: (k: 200), variables: ('mu', 'k', 'power')>
- take(*masks, **indices)[source]¶
Take a subset of a BinnedStatistic from a given list of indices. This is more powerful, but more verbose, than sel(). Also, the result is never squeezed, even if only a single item along a dimension is selected.
- Parameters
masks (array_like (boolean)) – a list of masks that are of the same shape as the data.
indices (dict (string : array_like)) – mapping from axes (by name, dim) to items to select (list/array_like). Each item is a valid selector for numpy’s fancy indexing.
- Return type
new BinnedStatistic, where only items selected by all axes are kept.
Examples
>>> pkmu
<BinnedStatistic: dims: (k: 200, mu: 5), variables: ('mu', 'k', 'power')>
>>> # similar to pkmu.sel(k > 0.4), selecting on the bin centers
>>> pkmu.take(k=pkmu.coords['k'] > 0.4)
<BinnedStatistic: dims: (mu: 5), variables: ('mu', 'k', 'power')>
>>> # also similar to pkmu.sel(k > 0.4), selecting on the bin averages
>>> pkmu.take(pkmu['k'] > 0.4)
<BinnedStatistic: dims: (k: 30), variables: ('mu', 'k', 'power')>
>>> # impossible with sel:
>>> pkmu.take(pkmu['modes'] > 0)
- to_json(filename)[source]¶
Write a BinnedStatistic to a JSON file.
Note
This uses nbodykit.utils.JSONEncoder to write the JSON file
- Parameters
filename (str) – the name of the file to write
- property variables¶
Alias to return the names of the variables stored in data
- nbodykit.binned_statistic.bin_ndarray(ndarray, new_shape, weights=None, operation=<function mean>)[source]¶
Bins an ndarray in all axes based on the target shape, by summing or averaging.
- Parameters
ndarray (array_like) – the input array to re-bin
new_shape (tuple) – the tuple holding the desired new shape
weights (array_like, optional) – weights to multiply the input array by, before running the re-binning operation
operation (callable, optional) – the function used to combine the values in each bin; default is numpy.mean
Notes
Dimensions in new_shape must be an integral factor smaller than the old shape
Number of output dimensions must match number of input dimensions.
Examples
>>> m = numpy.arange(0,100,1).reshape((10,10))
>>> n = bin_ndarray(m, new_shape=(5,5), operation=numpy.sum)
>>> print(n)
[[ 22  30  38  46  54]
 [102 110 118 126 134]
 [182 190 198 206 214]
 [262 270 278 286 294]
 [342 350 358 366 374]]
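The reshape trick behind this kind of binning can be sketched in pure NumPy for the 2D case (a minimal sketch, not the library's actual implementation):

```python
import numpy

def bin_2d(a, new_shape, operation=numpy.mean):
    # each output dimension must be an integral factor smaller than the input
    assert a.shape[0] % new_shape[0] == 0
    assert a.shape[1] % new_shape[1] == 0
    f0 = a.shape[0] // new_shape[0]
    f1 = a.shape[1] // new_shape[1]
    # expose each bin as its own axis pair, then reduce those axes
    reshaped = a.reshape(new_shape[0], f0, new_shape[1], f1)
    return operation(operation(reshaped, axis=3), axis=1)

m = numpy.arange(100).reshape(10, 10)
n = bin_2d(m, (5, 5), operation=numpy.sum)
print(n[0])  # -> [22 30 38 46 54]
```

Reshaping to (new0, f0, new1, f1) makes each output bin a contiguous f0-by-f1 block, so the reduction never mixes values from different bins.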