nbodykit.source.catalog.file

Functions

FileCatalogFactory(name, filetype[, examples]) Factory method to create a CatalogSource that uses a subclass of nbodykit.io.base.FileType to read data from disk.

Classes

BigFileCatalog(*args, **kwargs) A CatalogSource that uses BigFile to read data from disk.
BinaryCatalog(*args, **kwargs) A CatalogSource that uses BinaryFile to read data from disk.
CSVCatalog(*args, **kwargs) A CatalogSource that uses CSVFile to read data from disk.
FITSCatalog(*args, **kwargs) A CatalogSource that uses FITSFile to read data from disk.
FileCatalogBase(filetype[, args, kwargs, …]) Base class to create a source of particles from a single file, or multiple files, on disk.
Gadget1Catalog(*args, **kwargs) A CatalogSource that uses Gadget1File to read data from disk.
HDFCatalog(*args, **kwargs) A CatalogSource that uses HDFFile to read data from disk.
TPMBinaryCatalog(*args, **kwargs) A CatalogSource that uses TPMBinaryFile to read data from disk.
nbodykit.source.catalog.file.FileCatalogFactory(name, filetype, examples=None)

Factory method to create a CatalogSource that uses a subclass of nbodykit.io.base.FileType to read data from disk.

Parameters:
  • name (str) – the name of the catalog class to create
  • filetype (subclass of nbodykit.io.base.FileType) – the subclass of the FileType that reads a specific type of data
  • examples (str, optional) – if given, a documentation cross-reference link where examples can be found
Returns:

the CatalogSource object that reads data using filetype

Return type:

subclass of FileCatalogBase
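
Examples

A sketch of using the factory with a toy FileType subclass; MyFile, its single column 'x', and the placeholder path are illustrative and not part of nbodykit:

    import numpy
    from nbodykit.io.base import FileType
    from nbodykit.source.catalog.file import FileCatalogFactory

    class MyFile(FileType):
        """A toy reader: each "file" holds 100 rows of a single column 'x'."""
        def __init__(self, path):
            self.path = path
            self.dtype = numpy.dtype([('x', 'f8')])  # column layout
            self.size = 100                          # rows in this file

        def read(self, columns, start, stop, step=1):
            # return the requested columns of rows [start:stop:step]
            values = numpy.arange(start, stop, step, dtype='f8')
            data = numpy.empty(len(values), dtype=self.dtype)
            data['x'] = values
            return data[columns]

    MyCatalog = FileCatalogFactory('MyCatalog', MyFile)

    # the path argument is globbed, so a placeholder file must exist on disk
    open('dummy.dat', 'w').close()
    cat = MyCatalog('dummy.dat')
    print(cat['x'])  # a dask array backed by MyFile.read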

class nbodykit.source.catalog.file.FileCatalogBase(filetype, args=(), kwargs={}, comm=None, use_cache=False)

Base class to create a source of particles from a single file, or multiple files, on disk.

Files of a specific type should be subclasses of this class.

Parameters:
  • filetype (subclass of FileType) – the file-like class used to load the data from file; should be a subclass of nbodykit.io.base.FileType
  • args (tuple, optional) – the arguments to pass to the filetype class when constructing each file object
  • kwargs (dict, optional) – the keyword arguments to pass to the filetype class when constructing each file object
  • comm (MPI Communicator, optional) – the MPI communicator instance; the default (None) uses the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
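
Examples

Although the concrete catalog classes below are normally generated with FileCatalogFactory(), constructing the base class directly illustrates how filetype, args, and kwargs fit together. A minimal sketch, assuming a plain-text file data.csv with three columns exists on disk (the file and column names are illustrative):

    from nbodykit.io.csv import CSVFile
    from nbodykit.source.catalog.file import FileCatalogBase

    # args and kwargs are forwarded to CSVFile for each file matching the path
    cat = FileCatalogBase(filetype=CSVFile,
                          args=('data.csv',),
                          kwargs={'names': ['x', 'y', 'z']})
    print(cat.hardcolumns)  # -> ['x', 'y', 'z']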

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
get_hardcolumn(col)

Return a column from the underlying file source.

Columns are returned as dask arrays.

hardcolumns

The union of the columns in the file and any transformed columns.
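
For example, with a hypothetical plain-text file data.csv (see CSVCatalog below), the hard columns can be listed and fetched lazily:

    from nbodykit.source.catalog.file import CSVCatalog

    cat = CSVCatalog('data.csv', names=['x', 'y', 'z'])
    print(cat.hardcolumns)       # the columns provided by the file itself
    x = cat.get_hardcolumn('x')  # a dask array; no data is read yet
    print(x.compute())           # evaluating triggers the actual disk read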

class nbodykit.source.catalog.file.CSVCatalog(*args, **kwargs)

A CatalogSource that uses CSVFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the name of the file to load
  • names (list of str) – the names of the columns of the csv file; this should give names of all the columns in the file – pass usecols to select a subset of columns
  • blocksize (int, optional) – the file will be partitioned into blocks of bytes roughly of this size
  • dtype (dict, str, optional) – if specified as a string, assume all columns have this dtype; otherwise, each column can have a dtype entry in the dict; if not specified, the data types will be inferred from the file
  • usecols (list, optional) – a pandas.read_csv keyword; a subset of names to store, ignoring all other columns
  • delim_whitespace (bool, optional) – a pandas.read_csv keyword; if the CSV file is space-separated, set this to True
  • **config – additional keyword arguments that will be passed to pandas.read_csv(); see the documentation of that function for a full list of possible options
  • comm (MPI Communicator, optional) – the MPI communicator instance; the default (None) uses the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Examples

Please see the documentation for examples.
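
A minimal sketch, assuming a whitespace-separated text file data.csv with five columns exists on disk (the file and column names are illustrative):

    from nbodykit.source.catalog.file import CSVCatalog
    from nbodykit import transform

    # every column in the file must be named; usecols selects a subset
    names = ['x', 'y', 'z', 'vx', 'vy']
    cat = CSVCatalog('data.csv', names, delim_whitespace=True,
                     usecols=['x', 'y', 'z'])

    # stack the scalar columns into a (N, 3) Position column
    cat['Position'] = transform.StackColumns(cat['x'], cat['y'], cat['z'])
    print(cat.csize)  # the total number of rows across all MPI ranks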

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
class nbodykit.source.catalog.file.BinaryCatalog(*args, **kwargs)

A CatalogSource that uses BinaryFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the name of the binary file to load
  • dtype (numpy.dtype or list of tuples) – the dtypes of the columns to load; this should be either a numpy.dtype or be able to be converted to one via a numpy.dtype() call
  • offsets (dict, optional) – a dictionary specifying the byte offsets of each column in the binary file; if not supplied, the offsets are inferred from the dtype size of each column, assuming a fixed header size and contiguous storage
  • header_size (int, optional) – the size of the header in bytes
  • size (int, optional) – the number of objects in the binary file; if not provided, the value is inferred from the dtype and the total size of the file in bytes
  • comm (MPI Communicator, optional) – the MPI communicator instance; the default (None) uses the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Examples

Please see the documentation for examples.
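
A minimal sketch, assuming a headerless binary file particles.bin that stores all Position values followed by all Velocity values as float32 (the file name is illustrative):

    from nbodykit.source.catalog.file import BinaryCatalog

    dtype = [('Position', ('f4', 3)), ('Velocity', ('f4', 3))]
    cat = BinaryCatalog('particles.bin', dtype, header_size=0)
    print(cat.hardcolumns)  # -> ['Position', 'Velocity']
    print(cat['Position'])  # a dask array of shape (N, 3)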

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
class nbodykit.source.catalog.file.BigFileCatalog(*args, **kwargs)

A CatalogSource that uses BigFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the name of the directory holding the bigfile data
  • exclude (list of str, optional) – the data sets to exclude from loading within bigfile; default is the header
  • header (str, optional) – the path to the header; the default is to use a column named ‘Header’. The path is relative to the file, not the dataset.
  • dataset (str) – load a specific dataset from the bigfile; default is to start from the root.
  • comm (MPI Communicator, optional) – the MPI communicator instance; the default (None) uses the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Examples

Please see the documentation for examples.
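
A minimal sketch, assuming a bigfile directory fastpm_1.0000 that contains a dataset '1/' and a header block 'Header', as written by e.g. FastPM (the names are illustrative):

    from nbodykit.source.catalog.file import BigFileCatalog

    cat = BigFileCatalog('fastpm_1.0000', dataset='1/', header='Header')
    print(cat.attrs)        # meta-data read from the Header block
    print(cat['Position'])  # a dask array backed by the bigfile column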

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
class nbodykit.source.catalog.file.HDFCatalog(*args, **kwargs)

A CatalogSource that uses HDFFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the file path to load
  • root (str, optional) – the start path in the HDF file, loading all data below this path
  • exclude (list of str, optional) – list of path names to exclude; these can be absolute paths, or paths relative to root
  • comm (MPI Communicator, optional) – the MPI communicator instance; the default (None) uses the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Examples

Please see the documentation for examples.
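
A minimal sketch, assuming an HDF5 file data.hdf5 whose datasets live under a group named Particles (the file and group names are illustrative):

    from nbodykit.source.catalog.file import HDFCatalog

    # load everything below root; dataset names become column names
    cat = HDFCatalog('data.hdf5', root='Particles')
    print(cat.hardcolumns)              # e.g. ['Position', 'Velocity']
    pos = cat.compute(cat['Position'])  # evaluate the dask array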

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
class nbodykit.source.catalog.file.TPMBinaryCatalog(*args, **kwargs)

A CatalogSource that uses TPMBinaryFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the path to the binary file to load
  • precision ({'f4', 'f8'}, optional) – the string dtype specifying the precision
  • comm (MPI Communicator, optional) – the MPI communicator instance; the default (None) uses the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs
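
Examples

A minimal sketch, assuming a TPM snapshot tpm.bin written in single precision (the file name is illustrative):

    from nbodykit.source.catalog.file import TPMBinaryCatalog

    cat = TPMBinaryCatalog('tpm.bin', precision='f4')
    # the TPM layout defines Position, Velocity, and ID columns
    print(cat.hardcolumns)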

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
class nbodykit.source.catalog.file.Gadget1Catalog(*args, **kwargs)

A CatalogSource that uses Gadget1File to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the path to the binary file to load
  • columndefs (list) – a list of triplets (columnname, element_dtype, particle_types)
  • ptype (int) – the type of particle of interest
  • hdtype (list, dtype) – dtype of the header; must define Massarr and Npart
  • comm (MPI Communicator, optional) – the MPI communicator instance; the default (None) uses the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs
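
Examples

A minimal sketch, assuming a legacy Gadget-1 snapshot file snapshot_000 and selecting the dark matter particles via ptype=1 (the file name is illustrative):

    from nbodykit.source.catalog.file import Gadget1Catalog

    cat = Gadget1Catalog('snapshot_000', ptype=1)
    print(cat.attrs)        # header fields such as Massarr and Npart
    print(cat.hardcolumns)  # the columns defined by the default columndefs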

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.
class nbodykit.source.catalog.file.FITSCatalog(*args, **kwargs)

A CatalogSource that uses FITSFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the file path to load
  • ext (number or string, optional) – the extension to read, either the numerical extension counted from zero or a string extension name; if not given, data is read from the first HDU that has data
  • comm (MPI Communicator, optional) – the MPI communicator instance; the default (None) uses the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Examples

Please see the documentation for examples.
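
A minimal sketch, assuming a FITS file data.fits whose first extension holds a binary table with a Position column (the file and column names are illustrative):

    from nbodykit.source.catalog.file import FITSCatalog

    cat = FITSCatalog('data.fits', ext=1)  # read the table in extension 1
    print(cat.hardcolumns)                 # the columns of the FITS table

    # interpolate onto a mesh; to_mesh() requires a Position column, and the
    # BoxSize is given here explicitly
    mesh = cat.to_mesh(Nmesh=128, BoxSize=1000.0)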

Attributes

Index The attribute giving the global index rank of each particle in the list.
attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The number of objects in the CatalogSource on the local rank.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a shallow copy of the object, where each column is a reference of the corresponding column in self.
get_hardcolumn(col) Return a column from the underlying file source.
gslice(start, stop[, end, redistribute]) Execute a global slice of a CatalogSource.
make_column(array) Utility function to convert an array-like object to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
sort(keys[, reverse, usecols]) Return a CatalogSource, sorted globally across all MPI ranks in ascending order by the input keys.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
view([type]) Return a “view” of the CatalogSource object, with the returned type set by type.