Here, we detail some of the most common operations when dealing with data in the form of a CatalogSource. The native format for data columns in a CatalogSource object is the dask array. Be sure to read the previous section for an introduction to dask arrays before proceeding.
The dask array format allows users to easily manipulate columns in their input data and feed any transformed data into one of the nbodykit algorithms. This provides a fast and easy way to transform the data while hiding the implementation details needed to compute these transformations internally. In this section, we’ll provide examples of some of these data transformations to get users acclimated to dask arrays quickly.
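As a rough illustration of what manipulating a dask array looks like, here is a minimal sketch that uses plain dask and NumPy rather than nbodykit itself. The `position` array is a randomly generated stand-in for a catalog column, and the names `shifted`, `radius`, and `radius_values` are purely illustrative assumptions, not part of any nbodykit API.

```python
import numpy as np
import dask.array as da

# Stand-in for a catalog column (an assumption for illustration):
# 96 objects with 3D positions drawn uniformly in a unit box.
rng = np.random.RandomState(42)
position = da.from_array(rng.uniform(size=(96, 3)), chunks=(96, 3))

# Transformations are lazy: these lines only build a task graph.
shifted = (position + 0.5) % 1.0              # periodic shift by half the box
radius = (position ** 2).sum(axis=-1) ** 0.5  # distance of each object from the origin

# compute() evaluates the task graph and returns a plain numpy array.
radius_values = radius.compute()
print(type(radius), type(radius_values))
```

The transformed arrays stay lazy until `compute()` is called, which is what lets a transformed column be handed on to a later step without evaluating it first.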
To help illustrate these operations, we’ll initialize the nbodykit “lab” and load a catalog of uniformly distributed objects.
In [1]: from nbodykit.lab import *
In [2]: cat = UniformCatalog(nbar=100, BoxSize=1.0, seed=42)
Specific columns can be accessed by indexing the catalog object using the column name, and a dask.array.Array object is returned (see What is a dask array? for more details on dask arrays).
In [3]: position = cat['Position']
In [4]: velocity = cat['Velocity']
In [5]: print(position)
dask.array<array, shape=(96, 3), dtype=float64, chunksize=(96, 3)> first : [ 0.45470105 0.83263203 0.06905134] last: [ 0.62474599 0.15388738 0.84302209]
In [6]: print(velocity)