Dealing with Discrete Data
The main interface for dealing with data in the form of catalogs of discrete objects is provided by subclasses of the nbodykit.base.catalog.CatalogSource object. In this section, we provide an overview of this class and highlight the key details users should know.
What is a CatalogSource?
Most often, the user starts with a catalog of discrete objects, with a set of fields describing each object, such as the position coordinates, velocity, mass, etc. Given this input data, the user wishes to use nbodykit to perform a task, e.g., computing the power spectrum or grouping together objects with a friends-of-friends algorithm. To support these tasks, nbodykit provides the nbodykit.base.catalog.CatalogSource base class.
The CatalogSource object behaves much like a numpy structured array, where the fields of the array are referred to as “columns”. These columns store the information about the objects in the catalog; common columns are “Position”, “Velocity”, “Mass”, etc. A list of the column names that are valid for a given catalog can be accessed via the CatalogSource.columns attribute.
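The structured-array analogy can be made concrete with plain NumPy. In this minimal sketch, the field names of a structured array play the role of CatalogSource.columns; the data are random placeholders, and the field names simply mirror the common columns mentioned above.

```python
import numpy as np

# One record per object; each field plays the role that a "column"
# plays in a CatalogSource.
N = 5
dtype = [("Position", ("f8", 3)), ("Velocity", ("f8", 3)), ("Mass", "f8")]
data = np.zeros(N, dtype=dtype)

# Fill the fields with random placeholder values.
rng = np.random.default_rng(42)
data["Position"] = rng.uniform(0.0, 1000.0, size=(N, 3))
data["Velocity"] = rng.normal(0.0, 100.0, size=(N, 3))
data["Mass"] = rng.uniform(1e12, 1e14, size=N)

# The field names are the analogue of CatalogSource.columns.
columns = list(data.dtype.names)
print(columns)                 # ['Position', 'Velocity', 'Mass']
print(data["Position"].shape)  # (5, 3)
```

nbodykit's ArrayCatalog can be initialized from an in-memory structured array like this one; its columns attribute should then include these field names alongside nbodykit's default columns.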
Use Cases
The CatalogSource is an abstract base class; it cannot be directly initialized. Instead, nbodykit includes several specialized subclasses of CatalogSource in the nbodykit.source.catalog module. In general, these subclasses fall into two categories:
Reading data from disk (see Reading Catalogs from Disk)
Generating mock data at run time (see Generating Catalogs of Mock Data)
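The idea of a non-instantiable base class with concrete data-providing subclasses follows the standard Python abstract-base-class pattern. The sketch below uses Python's abc module with hypothetical classes SketchCatalogBase and SketchMockCatalog to show that pattern; it is not nbodykit's actual implementation.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchCatalogBase(ABC):
    """Hypothetical stand-in for an abstract catalog base class."""

    def __init__(self):
        # Subclasses decide where the columns come from (disk, mock, ...).
        self._columns = self._load_columns()

    @abstractmethod
    def _load_columns(self):
        """Return a dict mapping column name -> array."""

    @property
    def columns(self):
        return sorted(self._columns)


class SketchMockCatalog(SketchCatalogBase):
    """Hypothetical subclass that generates mock data at run time."""

    def __init__(self, N, seed=None):
        self.N = N
        self.seed = seed
        super().__init__()

    def _load_columns(self):
        rng = np.random.default_rng(self.seed)
        return {"Position": rng.uniform(0.0, 1.0, size=(self.N, 3))}


# Instantiating the abstract base class raises TypeError.
try:
    SketchCatalogBase()
    base_is_abstract = False
except TypeError:
    base_is_abstract = True
print("base class is abstract:", base_is_abstract)  # True

cat = SketchMockCatalog(N=10, seed=1)
print(cat.columns)  # ['Position']
```

A subclass that reads data from disk would follow the same shape, loading its columns from a file instead of a random number generator.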
Requirements
A well-defined size
The only requirement to initialize a CatalogSource is that the object has a well-defined size. Information about the length of a CatalogSource is stored in two attributes:
CatalogSource.size: the local size of the catalog, equal to the number of objects in the catalog on the local rank
CatalogSource.csize: the collective, global size of the catalog, equal to the sum of size across all MPI ranks
The user can thus think of a CatalogSource object as storing information for a total of csize objects, divided amongst the available MPI ranks so that each process stores information about only size objects.
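The relationship between size and csize can be mimicked on a single process with NumPy. This is only an illustration of the bookkeeping, assuming an even split via np.array_split; nbodykit decides the actual distribution over MPI ranks itself, and the variable names below are hypothetical.

```python
import numpy as np

# Hypothetical setup: a catalog of csize objects split over nranks processes.
csize = 1000
nranks = 3
global_ids = np.arange(csize)

# Each chunk stands in for the objects held by one rank.
local_chunks = np.array_split(global_ids, nranks)
sizes = [len(chunk) for chunk in local_chunks]  # the per-rank "size"

print(sizes)  # [334, 333, 333]
# csize is the sum of size over all ranks.
assert sum(sizes) == csize
```

Under MPI, a comparable consistency check on a real catalog would be cat.comm.allreduce(cat.size) == cat.csize, assuming cat.comm is the catalog's mpi4py communicator.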
The Position column
All CatalogSource objects must include the Position column, which should be an (N, 3) array giving the Cartesian position of each of the N objects in the catalog. Often, the user will have the Cartesian coordinates stored as separate columns or have the object coordinates in terms of right ascension, declination, and redshift. See Common Data Operations for more details about how to construct the Position column for these cases.
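For the first case, separate x, y, z columns, the required (N, 3) layout is just a column-wise stack. A minimal NumPy sketch, with random placeholder coordinates, is shown here; within nbodykit itself, helpers such as transform.StackColumns (and transform.SkyToCartesian for the sky-coordinate case) are intended for this, assuming your installed version provides them.

```python
import numpy as np

# Hypothetical input: Cartesian coordinates stored as three separate 1D columns.
rng = np.random.default_rng(0)
N = 4
x = rng.uniform(0.0, 100.0, size=N)
y = rng.uniform(0.0, 100.0, size=N)
z = rng.uniform(0.0, 100.0, size=N)

# Stack them into the (N, 3) layout expected for the Position column.
position = np.column_stack([x, y, z])

print(position.shape)  # (4, 3)
```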
Default Columns
All CatalogSource objects include several default columns. These columns are used broadly throughout nbodykit and can be summarized as follows:
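Name | Description | Default Value
--- | --- | ---
Weight | The weight to use for each particle when interpolating a CatalogSource on to a mesh | 1.0
Value | When interpolating a CatalogSource on to a mesh, the value that each particle contributes to a given mesh cell | 1.0
Selection | A boolean column that selects a subset slice of the CatalogSource | True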
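The defaults in the table amount to arrays of ones for Weight and Value and an all-True boolean mask for Selection. This NumPy sketch mimics them for a small catalog and applies a hypothetical cut through the selection mask; it illustrates the defaults only and does not reproduce nbodykit's mesh interpolation.

```python
import numpy as np

# Mimic the three default columns for a catalog of N objects.
N = 6
weight = np.ones(N)                 # default Weight: 1.0 per object
value = np.ones(N)                  # default Value: 1.0 per object
selection = np.ones(N, dtype=bool)  # default Selection: True per object

# Hypothetical cut: keep only the first half of the objects.
selection[N // 2:] = False

# Sum of Weight * Value over the selected objects.
total = np.sum(weight[selection] * value[selection])
print(selection.sum(), total)  # 3 3.0
```

In nbodykit, a catalog can be sliced with a boolean array of length size, for example cat[cat['Selection']], to obtain the selected subset; treat the exact slicing behavior as something to confirm in the API section.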
Storing Meta-data
For all CatalogSource objects, the input parameters and additional meta-data are stored in the attrs dictionary attribute.
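The attrs pattern is simply a plain dictionary living on the catalog object. The class SketchCatalog below is hypothetical and only shows the idea: keyword parameters passed at construction are recorded in attrs, and extra meta-data can be added afterwards.

```python
import numpy as np


class SketchCatalog:
    """Hypothetical minimal catalog that keeps meta-data in an attrs dict."""

    def __init__(self, data, **params):
        self._data = dict(data)
        # Input parameters are recorded as meta-data.
        self.attrs = dict(params)

    @property
    def size(self):
        return len(self._data["Position"])


cat = SketchCatalog({"Position": np.zeros((8, 3))}, BoxSize=1000.0, seed=42)
cat.attrs["comment"] = "mock catalog"  # extra meta-data can be added later
print(cat.attrs)  # {'BoxSize': 1000.0, 'seed': 42, 'comment': 'mock catalog'}
```

With a real nbodykit catalog, parameters such as BoxSize passed at construction typically appear as entries of cat.attrs in the same way.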
API
For more information about specific catalog objects, please see the API section.