Dealing with Discrete Data¶

The main interface for dealing with data in the form of catalogs of discrete objects is provided by subclasses of the nbodykit.base.catalog.CatalogSource object. In this section, we provide an overview of this class and note important things to know.

What is a CatalogSource?
Use Cases
Requirements
- A well-defined size
- The Position column
Default Columns
Storing Meta-data
API

What is a `CatalogSource`?¶

Most often the user starts with a catalog of discrete objects, with a set of fields describing each object, such as the position coordinates, velocity, mass, etc. Given this input data, the user wishes to use nbodykit to perform a task, i.e., computing the power spectrum or grouping together objects with a friends-of-friends algorithm. To achieve these goals, nbodykit provides the nbodykit.base.catalog.CatalogSource base class.

The CatalogSource object behaves much like a numpy structured array, where the fields of the array are referred to as “columns”. These columns store the information about the objects in the catalog; common columns are “Position”, “Velocity”, “Mass”, etc. A list of the column names that are valid for a given catalog can be accessed via the CatalogSource.columns attribute.

Use Cases¶

The CatalogSource is an abstract base class – it cannot be directly initialized. Instead, nbodykit includes several specialized catalog subclasses of CatalogSource in the nbodykit.source.catalog module. In general, these subclasses fall into two categories:

Reading data from disk (see Reading Catalogs from Disk)
Generating mock data at run time (see Generating Catalogs of Mock Data)

Requirements¶

A well-defined size¶

The only requirement to initialize a CatalogSource is that the object has a well-defined size. Information about the length of a CatalogSource is stored in two attributes:

CatalogSource.size : the local size of the catalog, equal to the number of objects in the catalog on the local rank
CatalogSource.csize : the collective, global size of the catalog, equal to the sum of size across all MPI ranks

So, the user can think of a CatalogSource object as storing information for a total of csize objects, which is divided amongst the available MPI ranks such that each process only stores information about size objects.

The `Position` column¶

All CatalogSource objects must include the Position column, which should be a (N,3) array giving the Cartesian position of each of the N objects in the catalog.

Often, the user will have the Cartesian coordinates stored as separate columns or have the object coordinates in terms of right ascension, declination, and redshift. See Common Data Operations for more details about how to construct the Position column for these cases.

Default Columns¶

All CatalogSource objects include several default columns. These columns are used broadly throughout nbodykit and can be summarized as follows:

Name	Description	Default Value
`Weight`	The weight to use for each particle when interpolating a `CatalogSource` on to a mesh. The mesh field is a weighted average of `Value`, with the weights given by `Weight`.	1.0
`Value`	When interpolating a `CatalogSource` on to a mesh, the value of this array is used as the field value that each particle contributes to a given mesh cell. The mesh field is a weighted average of `Value`, with the weights given by `Weight`. For example, the `Value` column could represent `Velocity`, in which case the field painted to the mesh will be momentum (mass-weighted velocity).	1.0
`Selection`	A boolean column that selects a subset slice of the `CatalogSource`. When converting a `CatalogSource` to a mesh object, only the objects where the `Selection` column is `True` will be painted to the mesh.	`True`

Storing Meta-data¶

For all CatalogSource objects, the input parameters and additional meta-data are stored in the attrs dictionary attribute.

API¶

For more information about specific catalog objects, please see the API section.