nbodykit.source.catalog.species

Functions

check_species_metadata(name, attrs, species)

Check to see if there is a single value for name in the meta-data of all the species

split_column(col, species)

Split the column name of the form 'species/name'

Classes

MultipleSpeciesCatalog(names, *species, **kwargs)

A CatalogSource interface for handling multiple species of particles.

class nbodykit.source.catalog.species.MultipleSpeciesCatalog(names, *species, **kwargs)[source]

A CatalogSource interface for handling multiple species of particles.

This CatalogSource stores copies of the original CatalogSource objects for each species, providing access to the columns via the format species/column, where “species” is one of the species names provided.

Parameters
  • names (list of str) – list of strings specifying the names of the various species; data columns are prefixed with “species/” where “species” is in names

  • *species (two or more CatalogSource objects) – catalogs to be combined into a single catalog, which give the data for different species of particles; as many catalogs as names must be provided

Examples

Initialization:

>>> data = UniformCatalog(nbar=3e-5, BoxSize=512., seed=42)
>>> randoms = UniformCatalog(nbar=3e-5, BoxSize=512., seed=84)
>>> cat = MultipleSpeciesCatalog(['data', 'randoms'], data, randoms)

Accessing the Catalogs for individual species:

>>> data = cat["data"] # a copy of the original "data" object

Accessing individual columns:

>>> data_pos = cat["data/Position"]

Setting new columns:

>>> cat["data"]["new_column"] = 1.0
>>> assert "data/new_column" in cat

Attributes
attrs

A dictionary storing relevant meta-data about the CatalogSource.

columns

Columns for individual species can be accessed using a species/ prefix and the column name, e.g., data/Position.

hardcolumns

Hardcolumns of the form species/name

species

List of species names

Methods

compute(*args, **kwargs)

Our version of dask.compute() that computes multiple delayed dask collections at once.

copy()

Return a shallow copy of the object, where each column is a reference to the corresponding column in self.

get_hardcolumn(col)

Construct and return a hard-coded column.

make_column(array)

Utility function to convert an array-like object to a dask.array.Array.

read(columns)

Return the requested columns as dask arrays.

save(output[, columns, dataset, datasets, ...])

Save the CatalogSource to a bigfile.BigFile.

to_mesh([Nmesh, BoxSize, dtype, interlaced, ...])

Convert the catalog to a mesh, which knows how to "paint" the combined density field, summed over all particle species.

to_subvolumes([domain, position, columns])

Domain Decompose a catalog, sending items to the ranks according to the supplied domain object.

view([type])

Return a "view" of the CatalogSource object, with the returned type set by type.

create_instance

__delitem__(col)[source]

Delete a column of the form species/column
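
For example, a minimal sketch continuing from the Initialization example above, where "data/new_column" was added:

>>> del cat["data/new_column"]  # remove the column from the "data" species
>>> assert "data/new_column" not in cat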

__finalize__(other)

Finalize the creation of a CatalogSource object by copying over any additional attributes from a second CatalogSource.

The idea here is to only copy over attributes that are similar to meta-data, so we do not copy some of the core attributes of the CatalogSource object.

Parameters

other – the second object to copy over attributes from; it needs to be a subclass of CatalogSourceBase for attributes to be copied

Returns

return self, with the added attributes

Return type

CatalogSource

__getitem__(key)[source]

This provides access to the underlying data in two ways:

  • The CatalogSource object for a species can be accessed if key is a species name.

  • Individual columns for a species can be accessed using the format: species/column.

__setitem__(col, value)[source]

Add columns to any of the species catalogs.

Note

New column names should be prefixed by ‘species/’ where ‘species’ is a name in the species attribute.
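
For example, a minimal sketch using the cat object from the Initialization example; the column name "data/Mass" is hypothetical:

>>> cat["data/Mass"] = 1.0  # a scalar is broadcast to every row of the "data" species
>>> assert "data/Mass" in cat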

property attrs

A dictionary storing relevant meta-data about the CatalogSource.

property columns

Columns for individual species can be accessed using a species/ prefix and the column name, e.g., data/Position.

compute(*args, **kwargs)

Our version of dask.compute() that computes multiple delayed dask collections at once.

This should be called on the return value of read() to convert any dask arrays to numpy arrays.

Note

If the base attribute is set, compute() will be called using base instead of self.

Parameters

args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.
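
For example, a minimal sketch using the cat object from the Initialization example:

>>> pos = cat["data/Position"]  # a dask array
>>> pos = cat.compute(pos)  # evaluated to a numpy array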

copy()

Return a shallow copy of the object, where each column is a reference to the corresponding column in self.

Note

No copy of data is made.

Note

This is different from view in that the attributes dictionary of the copy is no longer related to self.

Returns

a new CatalogSource that holds all of the data columns of self

Return type

CatalogSource
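
For example, a minimal sketch using the cat object from the Initialization example:

>>> cat2 = cat.copy()  # shallow copy; no column data is duplicated
>>> pos = cat2["data/Position"]  # references the same data as cat["data/Position"]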

get_hardcolumn(col)

Construct and return a hard-coded column.

These are usually produced by calling member functions marked by the @column decorator.

Subclasses may override this method and the hardcolumns attribute to bypass the decorator logic.

Note

If the base attribute is set, get_hardcolumn() will be called using base instead of self.

property hardcolumns

Hardcolumns of the form species/name

static make_column(array)

Utility function to convert an array-like object to a dask.array.Array.

Note

The dask array chunk size is controlled via the dask_chunk_size global option. See set_options.

Parameters

array (array_like) – an array-like object; can be a dask array, numpy array, ColumnAccessor, or other non-scalar array-like object

Returns

a dask array initialized from array

Return type

dask.array.Array
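
For example, a minimal sketch, assuming MultipleSpeciesCatalog has been imported as in the class example:

>>> import numpy
>>> col = MultipleSpeciesCatalog.make_column(numpy.ones(10))  # numpy array converted to a dask.array.Array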

read(columns)

Return the requested columns as dask arrays.

Parameters

columns (list of str) – the names of the requested columns

Returns

the list of column data, in the form of dask arrays

Return type

list of dask.array.Array
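
For example, a minimal sketch using the cat object from the Initialization example:

>>> pos, = cat.read(["data/Position"])  # a list containing one dask array
>>> pos = cat.compute(pos)  # evaluate to a numpy array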

save(output, columns=None, dataset=None, datasets=None, header='Header', compute=True)

Save the CatalogSource to a bigfile.BigFile.

Only the selected columns are saved, and attrs are saved in header. The attrs of individual columns are stored in their respective datasets.

Parameters
  • output (str) – the name of the file to write to

  • columns (list of str) – the names of the columns to save in the file, or None to use all columns

  • dataset (str, optional) – dataset to store the columns under.

  • datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column (deprecated)

  • header (str, optional, or None) – the name of the data set holding the header information, where attrs is stored; if header is None, do not save the header.

  • compute (boolean, default True) – if True, wait until the store operations finish; if False, return a dictionary mapping column names to future objects for the store. Use dask.compute() to wait for the store operations on the result.
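
For example, a sketch in which the file name "out.bigfile" is hypothetical:

>>> cat.save("out.bigfile", columns=["data/Position", "randoms/Position"])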

property species

List of species names

to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, resampler='cic', weight='Weight', value='Value', selection='Selection', position='Position', window=None)[source]

Convert the catalog to a mesh, which knows how to “paint” the combined density field, summed over all particle species.

Parameters
  • Nmesh (int, 3-vector, optional) – the number of cells per box side; can be inferred from attrs if the value is the same for all species

  • BoxSize (float, 3-vector, optional) – the size of the box; can be inferred from attrs if the value is the same for all species

  • dtype (str, dtype, optional) – the data type of the mesh when painting

  • interlaced (bool, optional) – whether to use interlacing to reduce aliasing when painting the particles on the mesh

  • compensated (bool, optional) – whether to apply a Fourier-space transfer function to account for the effects of the gridding + aliasing

  • resampler (str, optional) – the string name of the resampler to use when interpolating the particles onto the mesh

  • weight (str, optional) – the name of the column specifying the weight for each particle

  • selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take

  • value (str, optional) – the name of the column specifying the field value for each particle

  • position (str, optional) – the name of the column that specifies the position data of the objects in the catalog

  • window (str, optional) – the string name of the window to use when interpolating (deprecated, use resampler)
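
For example, a minimal sketch using the cat object from the Initialization example; BoxSize is inferred from attrs since both species share the same value, so only Nmesh must be given:

>>> mesh = cat.to_mesh(Nmesh=64)  # a mesh summing over all species
>>> rho = mesh.paint()  # the combined density field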

to_subvolumes(domain=None, position='Position', columns=None)

Domain-decompose a catalog, sending items to the ranks according to the supplied domain object, using the position column as the position of the objects.

This will read in the full position array and all of the requested columns.

Parameters
  • domain (pmesh.domain.GridND object, or None) – the domain over which to distribute the catalog; if None, the catalog is divided evenly in space. The easiest way to obtain a domain object is to use pm.domain, where pm is a pmesh.pm.ParticleMesh object.

  • position (string_like) – column to use to compute the position.

  • columns (list of string_like) – columns to include in the new catalog; if not supplied, all columns will be exchanged.

Returns

A decomposed catalog source, where each rank contains only the objects belonging to it, as claimed by the domain object.

self.attrs are carried over as a shallow copy to the returned object.

Return type

CatalogSource

view(type=None)

Return a “view” of the CatalogSource object, with the returned type set by type.

This initializes a new empty class of type type and attaches attributes to it via the __finalize__() mechanism.

Parameters

type (Python type) – the desired class type of the returned object.

nbodykit.source.catalog.species.check_species_metadata(name, attrs, species)[source]

Check to see if there is a single value for name in the meta-data of all the species
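
For example, a sketch using the cat object from the MultipleSpeciesCatalog example above; this call is expected to succeed when both species were created with the same BoxSize, and to raise an exception otherwise:

>>> from nbodykit.source.catalog.species import check_species_metadata
>>> check_species_metadata("BoxSize", cat.attrs, cat.species)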

nbodykit.source.catalog.species.split_column(col, species)[source]

Split the column name of the form ‘species/name’
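
For example, a minimal sketch; the result is unpacked into the (species, name) pair rather than printed:

>>> from nbodykit.source.catalog.species import split_column
>>> species, name = split_column("data/Position", ["data", "randoms"])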