nbodykit.source.catalog.file module

nbodykit.source.catalog.file.FileCatalogFactory(name, filetype, examples=None)[source]

Factory method to create a CatalogSource that uses a subclass of nbodykit.io.base.FileType to read data from disk.

Parameters:
  • name (str) – the name of the catalog class to create
  • filetype (subclass of nbodykit.io.base.FileType) – the subclass of the FileType that reads a specific type of data
  • examples (str, optional) – if given, a documentation cross-reference link where examples can be found
Returns:

the CatalogSource object that reads data using filetype

Return type:

subclass of FileCatalogBase
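
For example, the concrete catalog classes in this module are created with this factory. A minimal sketch, where the file name and column names are hypothetical:

>>> from nbodykit.io.csv import CSVFile
>>> from nbodykit.source.catalog.file import FileCatalogFactory
>>> MyCSVCatalog = FileCatalogFactory('MyCSVCatalog', CSVFile)
>>> cat = MyCSVCatalog('data.csv', names=['x', 'y', 'z'])  # hypothetical file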

class nbodykit.source.catalog.file.FileCatalogBase(filetype, args=(), kwargs={}, comm=None, use_cache=False)[source]

Bases: nbodykit.base.catalog.CatalogSource

Base class to create a source of particles from a single file, or multiple files, on disk.

Catalogs reading files of a specific type should be subclasses of this class.

Parameters:
  • filetype (subclass of FileType) – the file-like class used to load the data from file; should be a subclass of nbodykit.io.base.FileType
  • args (tuple, optional) – the arguments to pass to the filetype class when constructing each file object
  • kwargs (dict, optional) – the keyword arguments to pass to the filetype class when constructing each file object
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False

Attributes

attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The local size of the catalog.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a copy of the CatalogSource object
get_hardcolumn(col) Return a column from the underlying file source.
make_column(array) Utility function to convert a numpy array to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
update_csize() Set the collective size, csize.
Selection()

A boolean column that selects a subset slice of the CatalogSource.

By default, this column is set to True for all particles.

Value()

When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

Weight()

The column giving the weight to use for each particle on the mesh.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

__delitem__(col)

Delete a column; cannot delete a “hard-coded” column

__getitem__(sel)

The following types of indexing are supported:

  1. strings specifying a column in the CatalogSource; returns a dask array holding the column data
  2. boolean arrays specifying a slice of the CatalogSource; returns a CatalogCopy holding only the relevant slice
  3. slice objects specifying which particles to select
  4. lists of strings specifying column names; returns a CatalogCopy holding only the selected columns
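
A short sketch of these indexing modes, assuming a catalog cat with hypothetical columns 'Position' and 'Mass':

>>> pos = cat['Position']              # 1. column name: a dask array
>>> massive = cat[cat['Mass'] > 1e13]  # 2. boolean array: a CatalogCopy slice
>>> first = cat[:10]                   # 3. slice: the first ten particles
>>> sub = cat[['Position', 'Mass']]    # 4. list of names: a CatalogCopy with those columns
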
__len__()

The local size of the CatalogSource on a given rank.

__setitem__(col, value)

Add columns to the CatalogSource, overriding any existing columns with the name col.

attrs

A dictionary storing relevant meta-data about the CatalogSource.

columns

All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.

compute(*args, **kwargs)

Our version of dask.compute() that computes multiple delayed dask collections at once.

This should be called on the return value of read() to convert any dask arrays to numpy arrays.

If use_cache is True, this internally caches data, using dask’s built-in cache features.

Parameters:args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.

Notes

The default dask optimizer induces too many (unnecessary) IO calls, so we turn this feature off by default. Eventually we will likely want our own optimizer.
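
For example, a sketch of lazily reading two hypothetical columns and then computing them:

>>> pos, vel = cat.read(['Position', 'Velocity'])  # dask arrays (lazy)
>>> pos, vel = cat.compute(pos, vel)               # numpy arrays (computed)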

copy()

Return a copy of the CatalogSource object

Returns:the new CatalogSource object holding the copied data columns
Return type:CatalogCopy
csize

The total, collective size of the CatalogSource, i.e., summed across all ranks.

It is the sum of size across all available ranks.

get_hardcolumn(col)[source]

Return a column from the underlying file source.

Columns are returned as dask arrays.

hardcolumns

The union of the columns in the file and any transformed columns.

logger = <logging.Logger object>
make_column(array)

Utility function to convert a numpy array to a dask.array.Array.

read(columns)

Return the requested columns as dask arrays.

Parameters:columns (list of str) – the names of the requested columns
Returns:the list of column data, in the form of dask arrays
Return type:list of dask.array.Array
save(output, columns, datasets=None, header='Header')

Save the CatalogSource to a bigfile.BigFile.

Only the selected columns are saved, and attrs is saved in header. The attrs of each column are stored in the corresponding dataset.

Parameters:
  • output (str) – the name of the file to write to
  • columns (list of str) – the names of the columns to save in the file
  • datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column
  • header (str, optional) – the name of the data set holding the header information, where attrs is stored
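
A minimal sketch of saving two hypothetical columns under custom dataset names:

>>> cat.save('output.bigfile', ['Position', 'Velocity'],
...          datasets=['1/Position', '1/Velocity'], header='Header')
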
size

The local size of the catalog.

to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')

Convert the CatalogSource to a MeshSource, using the specified parameters.

Parameters:
  • Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
  • BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
  • dtype (string, optional) – the data type of the mesh array
  • interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
  • compensated (bool, optional) – whether to correct for the window introduced by the grid interpolation scheme
  • window (str, optional) – the string specifying which window interpolation scheme to use; see pmesh.window.methods
  • weight (str, optional) – the name of the column specifying the weight for each particle
  • value (str, optional) – the name of the column specifying the field value for each particle
  • selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
  • position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
Returns:

mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh

Return type:

CatalogMesh
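
For example, a sketch of converting a catalog to a mesh with 256 cells per side; the BoxSize value here is hypothetical:

>>> mesh = cat.to_mesh(Nmesh=256, BoxSize=1000., window='cic',
...                    compensated=True, interlaced=True)
>>> rho = mesh.paint(mode='real')  # paint the density field onto the mesh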

update_csize()

Set the collective size, csize.

This function should be called in __init__() of a subclass, after size has been set to a valid value (i.e., not NotImplemented).

use_cache

If set to True, use the built-in caching features of dask to cache data in memory.

class nbodykit.source.catalog.file.CSVCatalog(*args, **kwargs)

Bases: nbodykit.source.catalog.file.FileCatalogBase

A CatalogSource that uses CSVFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the name of the file to load
  • names (list of str) – the names of the columns of the csv file; this should give names of all the columns in the file – pass usecols to select a subset of columns
  • blocksize (int, optional) – the file will be partitioned into blocks roughly of this size, in bytes
  • dtype (dict, str, optional) – if specified as a string, assume all columns have this dtype; otherwise, each column can have a dtype entry in the dict; if not specified, the data types will be inferred from the file
  • usecols (list, optional) – a pandas.read_csv keyword; a subset of names to store, ignoring all other columns
  • delim_whitespace (bool, optional) – a pandas.read_csv keyword; if the CSV file is space-separated, set this to True
  • **config – additional keyword arguments that will be passed to pandas.read_csv(); see the documentation of that function for a full list of possible options
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Examples

Please see the documentation for examples.
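
As an illustrative sketch, reading a whitespace-separated file with five hypothetical columns, keeping only the first three:

>>> from nbodykit.source.catalog.file import CSVCatalog
>>> names = ['x', 'y', 'z', 'vx', 'vy']
>>> cat = CSVCatalog('data.csv', names, delim_whitespace=True,
...                  usecols=['x', 'y', 'z'])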

Attributes

attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The local size of the catalog.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a copy of the CatalogSource object
get_hardcolumn(col) Return a column from the underlying file source.
make_column(array) Utility function to convert a numpy array to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
update_csize() Set the collective size, csize.
Selection()

A boolean column that selects a subset slice of the CatalogSource.

By default, this column is set to True for all particles.

Value()

When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

Weight()

The column giving the weight to use for each particle on the mesh.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

__delitem__(col)

Delete a column; cannot delete a “hard-coded” column

__getitem__(sel)

The following types of indexing are supported:

  1. strings specifying a column in the CatalogSource; returns a dask array holding the column data
  2. boolean arrays specifying a slice of the CatalogSource; returns a CatalogCopy holding only the relevant slice
  3. slice objects specifying which particles to select
  4. lists of strings specifying column names; returns a CatalogCopy holding only the selected columns
__len__()

The local size of the CatalogSource on a given rank.

__setitem__(col, value)

Add columns to the CatalogSource, overriding any existing columns with the name col.

attrs

A dictionary storing relevant meta-data about the CatalogSource.

columns

All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.

compute(*args, **kwargs)

Our version of dask.compute() that computes multiple delayed dask collections at once.

This should be called on the return value of read() to convert any dask arrays to numpy arrays.

If use_cache is True, this internally caches data, using dask’s built-in cache features.

Parameters:args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.

Notes

The default dask optimizer induces too many (unnecessary) IO calls, so we turn this feature off by default. Eventually we will likely want our own optimizer.

copy()

Return a copy of the CatalogSource object

Returns:the new CatalogSource object holding the copied data columns
Return type:CatalogCopy
csize

The total, collective size of the CatalogSource, i.e., summed across all ranks.

It is the sum of size across all available ranks.

get_hardcolumn(col)

Return a column from the underlying file source.

Columns are returned as dask arrays.

hardcolumns

The union of the columns in the file and any transformed columns.

logger = <logging.Logger object>
make_column(array)

Utility function to convert a numpy array to a dask.array.Array.

read(columns)

Return the requested columns as dask arrays.

Parameters:columns (list of str) – the names of the requested columns
Returns:the list of column data, in the form of dask arrays
Return type:list of dask.array.Array
save(output, columns, datasets=None, header='Header')

Save the CatalogSource to a bigfile.BigFile.

Only the selected columns are saved, and attrs is saved in header. The attrs of each column are stored in the corresponding dataset.

Parameters:
  • output (str) – the name of the file to write to
  • columns (list of str) – the names of the columns to save in the file
  • datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column
  • header (str, optional) – the name of the data set holding the header information, where attrs is stored
size

The local size of the catalog.

to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')

Convert the CatalogSource to a MeshSource, using the specified parameters.

Parameters:
  • Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
  • BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
  • dtype (string, optional) – the data type of the mesh array
  • interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
  • compensated (bool, optional) – whether to correct for the window introduced by the grid interpolation scheme
  • window (str, optional) – the string specifying which window interpolation scheme to use; see pmesh.window.methods
  • weight (str, optional) – the name of the column specifying the weight for each particle
  • value (str, optional) – the name of the column specifying the field value for each particle
  • selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
  • position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
Returns:

mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh

Return type:

CatalogMesh

update_csize()

Set the collective size, csize.

This function should be called in __init__() of a subclass, after size has been set to a valid value (i.e., not NotImplemented).

use_cache

If set to True, use the built-in caching features of dask to cache data in memory.

class nbodykit.source.catalog.file.BinaryCatalog(*args, **kwargs)

Bases: nbodykit.source.catalog.file.FileCatalogBase

A CatalogSource that uses BinaryFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the name of the binary file to load
  • dtype (numpy.dtype or list of tuples) – the dtypes of the columns to load; this should be either a numpy.dtype or be able to be converted to one via a numpy.dtype() call
  • offsets (dict, optional) – a dictionary specifying the byte offsets of each column in the binary file; if not supplied, the offsets are inferred from the dtype size of each column, assuming a fixed header size and contiguous storage
  • header_size (int, optional) – the size of the header in bytes
  • size (int, optional) – the number of objects in the binary file; if not provided, the value is inferred from the dtype and the total size of the file in bytes
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Examples

Please see the documentation for examples.
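
A minimal sketch, assuming a hypothetical file that stores 1024 particles as contiguous Position and Velocity blocks with no header:

>>> from nbodykit.source.catalog.file import BinaryCatalog
>>> dtype = [('Position', ('f4', 3)), ('Velocity', ('f4', 3))]
>>> cat = BinaryCatalog('binary.dat', dtype, size=1024, header_size=0)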

Attributes

attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The local size of the catalog.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a copy of the CatalogSource object
get_hardcolumn(col) Return a column from the underlying file source.
make_column(array) Utility function to convert a numpy array to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
update_csize() Set the collective size, csize.
Selection()

A boolean column that selects a subset slice of the CatalogSource.

By default, this column is set to True for all particles.

Value()

When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

Weight()

The column giving the weight to use for each particle on the mesh.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

__delitem__(col)

Delete a column; cannot delete a “hard-coded” column

__getitem__(sel)

The following types of indexing are supported:

  1. strings specifying a column in the CatalogSource; returns a dask array holding the column data
  2. boolean arrays specifying a slice of the CatalogSource; returns a CatalogCopy holding only the relevant slice
  3. slice objects specifying which particles to select
  4. lists of strings specifying column names; returns a CatalogCopy holding only the selected columns
__len__()

The local size of the CatalogSource on a given rank.

__setitem__(col, value)

Add columns to the CatalogSource, overriding any existing columns with the name col.

attrs

A dictionary storing relevant meta-data about the CatalogSource.

columns

All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.

compute(*args, **kwargs)

Our version of dask.compute() that computes multiple delayed dask collections at once.

This should be called on the return value of read() to convert any dask arrays to numpy arrays.

If use_cache is True, this internally caches data, using dask’s built-in cache features.

Parameters:args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.

Notes

The default dask optimizer induces too many (unnecessary) IO calls, so we turn this feature off by default. Eventually we will likely want our own optimizer.

copy()

Return a copy of the CatalogSource object

Returns:the new CatalogSource object holding the copied data columns
Return type:CatalogCopy
csize

The total, collective size of the CatalogSource, i.e., summed across all ranks.

It is the sum of size across all available ranks.

get_hardcolumn(col)

Return a column from the underlying file source.

Columns are returned as dask arrays.

hardcolumns

The union of the columns in the file and any transformed columns.

logger = <logging.Logger object>
make_column(array)

Utility function to convert a numpy array to a dask.array.Array.

read(columns)

Return the requested columns as dask arrays.

Parameters:columns (list of str) – the names of the requested columns
Returns:the list of column data, in the form of dask arrays
Return type:list of dask.array.Array
save(output, columns, datasets=None, header='Header')

Save the CatalogSource to a bigfile.BigFile.

Only the selected columns are saved, and attrs is saved in header. The attrs of each column are stored in the corresponding dataset.

Parameters:
  • output (str) – the name of the file to write to
  • columns (list of str) – the names of the columns to save in the file
  • datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column
  • header (str, optional) – the name of the data set holding the header information, where attrs is stored
size

The local size of the catalog.

to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')

Convert the CatalogSource to a MeshSource, using the specified parameters.

Parameters:
  • Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
  • BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
  • dtype (string, optional) – the data type of the mesh array
  • interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
  • compensated (bool, optional) – whether to correct for the window introduced by the grid interpolation scheme
  • window (str, optional) – the string specifying which window interpolation scheme to use; see pmesh.window.methods
  • weight (str, optional) – the name of the column specifying the weight for each particle
  • value (str, optional) – the name of the column specifying the field value for each particle
  • selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
  • position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
Returns:

mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh

Return type:

CatalogMesh

update_csize()

Set the collective size, csize.

This function should be called in __init__() of a subclass, after size has been set to a valid value (i.e., not NotImplemented).

use_cache

If set to True, use the built-in caching features of dask to cache data in memory.

class nbodykit.source.catalog.file.BigFileCatalog(*args, **kwargs)

Bases: nbodykit.source.catalog.file.FileCatalogBase

A CatalogSource that uses BigFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the name of the directory holding the bigfile data
  • exclude (list of str, optional) – the data sets to exclude from loading within bigfile; default is the header
  • header (str, optional) – the path to the header
  • dataset (str) – load a specific dataset from the bigfile
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Examples

Please see the documentation for examples.
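
A minimal sketch, assuming a hypothetical bigfile directory whose particle data live under the dataset '1/':

>>> from nbodykit.source.catalog.file import BigFileCatalog
>>> cat = BigFileCatalog('fastpm_output', dataset='1/', header='Header')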

Attributes

attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The local size of the catalog.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a copy of the CatalogSource object
get_hardcolumn(col) Return a column from the underlying file source.
make_column(array) Utility function to convert a numpy array to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
update_csize() Set the collective size, csize.
Selection()

A boolean column that selects a subset slice of the CatalogSource.

By default, this column is set to True for all particles.

Value()

When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

Weight()

The column giving the weight to use for each particle on the mesh.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

__delitem__(col)

Delete a column; cannot delete a “hard-coded” column

__getitem__(sel)

The following types of indexing are supported:

  1. strings specifying a column in the CatalogSource; returns a dask array holding the column data
  2. boolean arrays specifying a slice of the CatalogSource; returns a CatalogCopy holding only the relevant slice
  3. slice objects specifying which particles to select
  4. lists of strings specifying column names; returns a CatalogCopy holding only the selected columns
__len__()

The local size of the CatalogSource on a given rank.

__setitem__(col, value)

Add columns to the CatalogSource, overriding any existing columns with the name col.

attrs

A dictionary storing relevant meta-data about the CatalogSource.

columns

All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.

compute(*args, **kwargs)

Our version of dask.compute() that computes multiple delayed dask collections at once.

This should be called on the return value of read() to convert any dask arrays to numpy arrays.

If use_cache is True, this internally caches data, using dask’s built-in cache features.

Parameters:args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.

Notes

The default dask optimizer induces too many (unnecessary) IO calls, so we turn this feature off by default. Eventually we will likely want our own optimizer.

copy()

Return a copy of the CatalogSource object

Returns:the new CatalogSource object holding the copied data columns
Return type:CatalogCopy
csize

The total, collective size of the CatalogSource, i.e., summed across all ranks.

It is the sum of size across all available ranks.

get_hardcolumn(col)

Return a column from the underlying file source.

Columns are returned as dask arrays.

hardcolumns

The union of the columns in the file and any transformed columns.

logger = <logging.Logger object>
make_column(array)

Utility function to convert a numpy array to a dask.array.Array.

read(columns)

Return the requested columns as dask arrays.

Parameters:columns (list of str) – the names of the requested columns
Returns:the list of column data, in the form of dask arrays
Return type:list of dask.array.Array
save(output, columns, datasets=None, header='Header')

Save the CatalogSource to a bigfile.BigFile.

Only the selected columns are saved, and attrs is saved in header. The attrs of each column are stored in the corresponding dataset.

Parameters:
  • output (str) – the name of the file to write to
  • columns (list of str) – the names of the columns to save in the file
  • datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column
  • header (str, optional) – the name of the data set holding the header information, where attrs is stored
size

The local size of the catalog.

to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')

Convert the CatalogSource to a MeshSource, using the specified parameters.

Parameters:
  • Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
  • BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
  • dtype (string, optional) – the data type of the mesh array
  • interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
  • compensated (bool, optional) – whether to correct for the window introduced by the grid interpolation scheme
  • window (str, optional) – the string specifying which window interpolation scheme to use; see pmesh.window.methods
  • weight (str, optional) – the name of the column specifying the weight for each particle
  • value (str, optional) – the name of the column specifying the field value for each particle
  • selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
  • position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
Returns:

mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh

Return type:

CatalogMesh

update_csize()

Set the collective size, csize.

This function should be called in __init__() of a subclass, after size has been set to a valid value (i.e., not NotImplemented).

use_cache

If set to True, use the built-in caching features of dask to cache data in memory.

class nbodykit.source.catalog.file.HDFCatalog(*args, **kwargs)

Bases: nbodykit.source.catalog.file.FileCatalogBase

A CatalogSource that uses HDFFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the file path to load
  • root (str, optional) – the start path in the HDF file, loading all data below this path
  • exclude (list of str, optional) – list of path names to exclude; these can be absolute paths, or paths relative to root
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Examples

Please see the documentation for examples.
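
A minimal sketch, assuming a hypothetical HDF5 file whose data sets live below the group 'particles':

>>> from nbodykit.source.catalog.file import HDFCatalog
>>> cat = HDFCatalog('data.hdf5', root='particles')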

Attributes

attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The local size of the catalog.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a copy of the CatalogSource object
get_hardcolumn(col) Return a column from the underlying file source.
make_column(array) Utility function to convert a numpy array to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
update_csize() Set the collective size, csize.
Selection()

A boolean column that selects a subset slice of the CatalogSource.

By default, this column is set to True for all particles.

Value()

When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

Weight()

The column giving the weight to use for each particle on the mesh.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

__delitem__(col)

Delete a column; cannot delete a “hard-coded” column

__getitem__(sel)

The following types of indexing are supported:

  1. strings specifying a column in the CatalogSource; returns a dask array holding the column data
  2. boolean arrays specifying a slice of the CatalogSource; returns a CatalogCopy holding only the relevant slice
  3. slice objects specifying which particles to select
  4. lists of strings specifying column names; returns a CatalogCopy holding only the selected columns
__len__()

The local size of the CatalogSource on a given rank.

__setitem__(col, value)

Add columns to the CatalogSource, overriding any existing columns with the name col.

attrs

A dictionary storing relevant meta-data about the CatalogSource.

columns

All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.

compute(*args, **kwargs)

Our version of dask.compute() that computes multiple delayed dask collections at once.

This should be called on the return value of read() to convert any dask arrays to numpy arrays.

If use_cache is True, this internally caches data, using dask’s built-in cache features.

Parameters:args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.

Notes

The default dask optimizer induces too many (unnecessary) IO calls, so we turn this feature off by default. Eventually we will likely want our own optimizer.

copy()

Return a copy of the CatalogSource object

Returns:the new CatalogSource object holding the copied data columns
Return type:CatalogCopy
csize

The total, collective size of the CatalogSource, i.e., summed across all ranks.

It is the sum of size across all available ranks.

get_hardcolumn(col)

Return a column from the underlying file source.

Columns are returned as dask arrays.

hardcolumns

The union of the columns in the file and any transformed columns.

logger = <logging.Logger object>
make_column(array)

Utility function to convert a numpy array to a dask.array.Array.

read(columns)

Return the requested columns as dask arrays.

Parameters:columns (list of str) – the names of the requested columns
Returns:the list of column data, in the form of dask arrays
Return type:list of dask.array.Array
save(output, columns, datasets=None, header='Header')

Save the CatalogSource to a bigfile.BigFile.

Only the selected columns are saved, and attrs is saved in header. The attrs of each column are stored in the corresponding dataset.

Parameters:
  • output (str) – the name of the file to write to
  • columns (list of str) – the names of the columns to save in the file
  • datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column
  • header (str, optional) – the name of the data set holding the header information, where attrs is stored
size

The local size of the catalog.

to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')

Convert the CatalogSource to a MeshSource, using the specified parameters.

Parameters:
  • Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
  • BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
  • dtype (string, optional) – the data type of the mesh array
  • interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
  • compensated (bool, optional) – whether to correct for the window introduced by the grid interpolation scheme
  • window (str, optional) – the string specifying which window interpolation scheme to use; see pmesh.window.methods
  • weight (str, optional) – the name of the column specifying the weight for each particle
  • value (str, optional) – the name of the column specifying the field value for each particle
  • selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
  • position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
Returns:

mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh

Return type:

CatalogMesh

update_csize()

Set the collective size, csize.

This function should be called in __init__() of a subclass, after size has been set to a valid value (i.e., not NotImplemented).

use_cache

If set to True, use the built-in caching features of dask to cache data in memory.

class nbodykit.source.catalog.file.TPMBinaryCatalog(*args, **kwargs)

Bases: nbodykit.source.catalog.file.FileCatalogBase

A CatalogSource that uses TPMBinaryFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the path to the binary file to load
  • precision ({'f4', 'f8'}, optional) – the string dtype specifying the precision
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs
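
A minimal sketch, assuming a hypothetical single-precision TPM snapshot:

>>> from nbodykit.source.catalog.file import TPMBinaryCatalog
>>> cat = TPMBinaryCatalog('tpm.bin', precision='f4')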

Attributes

attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The local size of the catalog.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a copy of the CatalogSource object
get_hardcolumn(col) Return a column from the underlying file source.
make_column(array) Utility function to convert a numpy array to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
update_csize() Set the collective size, csize.
Selection()

A boolean column that selects a subset slice of the CatalogSource.

By default, this column is set to True for all particles.

Value()

When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

Weight()

The column giving the weight to use for each particle on the mesh.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

__delitem__(col)

Delete a column; cannot delete a “hard-coded” column

__getitem__(sel)

The following types of indexing are supported:

  1. strings specifying a column in the CatalogSource; returns a dask array holding the column data
  2. boolean arrays specifying a slice of the CatalogSource; returns a CatalogCopy holding only the relevant slice
  3. slice objects specifying which particles to select
  4. lists of strings specifying column names; returns a CatalogCopy holding only the selected columns
__len__()

The local size of the CatalogSource on a given rank.

__setitem__(col, value)

Add columns to the CatalogSource, overriding any existing columns with the name col.

attrs

A dictionary storing relevant meta-data about the CatalogSource.

columns

All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.

compute(*args, **kwargs)

Our version of dask.compute() that computes multiple delayed dask collections at once.

This should be called on the return value of read() to convert any dask arrays to numpy arrays.

If use_cache is True, this internally caches data, using dask’s built-in cache features.

Parameters:args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.

Notes

The default dask optimizer induces too many (unnecessary) IO calls, so we turn this feature off by default. Eventually we will likely want our own optimizer.

copy()

Return a copy of the CatalogSource object

Returns:the new CatalogSource object holding the copied data columns
Return type:CatalogCopy
csize

The total, collective size of the CatalogSource, i.e., summed across all ranks.

It is the sum of size across all available ranks.

get_hardcolumn(col)

Return a column from the underlying file source.

Columns are returned as dask arrays.

hardcolumns

The union of the columns in the file and any transformed columns.

logger = <logging.Logger object>
make_column(array)

Utility function to convert a numpy array to a dask.array.Array.

read(columns)

Return the requested columns as dask arrays.

Parameters:columns (list of str) – the names of the requested columns
Returns:the list of column data, in the form of dask arrays
Return type:list of dask.array.Array
save(output, columns, datasets=None, header='Header')

Save the CatalogSource to a bigfile.BigFile.

Only the selected columns are saved, and attrs is saved in header. The attrs of each column are stored in the corresponding dataset.

Parameters:
  • output (str) – the name of the file to write to
  • columns (list of str) – the names of the columns to save in the file
  • datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column
  • header (str, optional) – the name of the data set holding the header information, where attrs is stored
size

The local size of the catalog.

to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')

Convert the CatalogSource to a MeshSource, using the specified parameters.

Parameters:
  • Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
  • BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
  • dtype (string, optional) – the data type of the mesh array
  • interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
  • compensated (bool, optional) – whether to correct for the window introduced by the grid interpolation scheme
  • window (str, optional) – the string specifying which window interpolation scheme to use; see pmesh.window.methods
  • weight (str, optional) – the name of the column specifying the weight for each particle
  • value (str, optional) – the name of the column specifying the field value for each particle
  • selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
  • position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
Returns:

mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh

Return type:

CatalogMesh

update_csize()

Set the collective size, csize.

This function should be called in __init__() of a subclass, after size has been set to a valid value (i.e., not NotImplemented).

use_cache

If set to True, use the built-in caching features of dask to cache data in memory.

class nbodykit.source.catalog.file.FITSCatalog(*args, **kwargs)

Bases: nbodykit.source.catalog.file.FileCatalogBase

A CatalogSource that uses FITSFile to read data from disk.

Multiple files can be read at once by supplying a list of file names or a glob asterisk pattern as the path argument. See Reading Multiple Data Files at Once for examples.

Parameters:
  • path (str) – the file path to load
  • ext (number or string, optional) – The extension. Either the numerical extension from zero or a string extension name. If not sent, data is read from the first HDU that has data.
  • comm (MPI Communicator, optional) – the MPI communicator instance; default (None) sets to the current communicator
  • use_cache (bool, optional) – whether to cache data read from disk; default is False
  • attrs (dict, optional) – dictionary of meta-data to store in attrs

Examples

Please see the documentation for examples.
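
A minimal sketch, assuming a hypothetical FITS file whose catalog data live in extension 1:

>>> from nbodykit.source.catalog.file import FITSCatalog
>>> cat = FITSCatalog('data.fits', ext=1)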

Attributes

attrs A dictionary storing relevant meta-data about the CatalogSource.
columns All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.
csize The total, collective size of the CatalogSource, i.e., summed across all ranks.
hardcolumns The union of the columns in the file and any transformed columns.
size The local size of the catalog.
use_cache If set to True, use the built-in caching features of dask to cache data in memory.

Methods

Selection() A boolean column that selects a subset slice of the CatalogSource.
Value() When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
Weight() The column giving the weight to use for each particle on the mesh.
compute(*args, **kwargs) Our version of dask.compute() that computes multiple delayed dask collections at once.
copy() Return a copy of the CatalogSource object
get_hardcolumn(col) Return a column from the underlying file source.
make_column(array) Utility function to convert a numpy array to a dask.array.Array.
read(columns) Return the requested columns as dask arrays.
save(output, columns[, datasets, header]) Save the CatalogSource to a bigfile.BigFile.
to_mesh([Nmesh, BoxSize, dtype, interlaced, …]) Convert the CatalogSource to a MeshSource, using the specified parameters.
update_csize() Set the collective size, csize.
Selection()

A boolean column that selects a subset slice of the CatalogSource.

By default, this column is set to True for all particles.

Value()

When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

Weight()

The column giving the weight to use for each particle on the mesh.

The mesh field is a weighted average of Value, with the weights given by Weight.

By default, this array is set to unity for all particles.

__delitem__(col)

Delete a column; cannot delete a “hard-coded” column

__getitem__(sel)

The following types of indexing are supported:

  1. strings specifying a column in the CatalogSource; returns a dask array holding the column data
  2. boolean arrays specifying a slice of the CatalogSource; returns a CatalogCopy holding only the relevant slice
  3. slice objects specifying which particles to select
  4. lists of strings specifying column names; returns a CatalogCopy holding only the selected columns
__len__()

The local size of the CatalogSource on a given rank.

__setitem__(col, value)

Add columns to the CatalogSource, overriding any existing columns with the name col.

attrs

A dictionary storing relevant meta-data about the CatalogSource.

columns

All columns in the CatalogSource, including those hard-coded into the class’s definition and any override columns provided by the user.

compute(*args, **kwargs)

Our version of dask.compute() that computes multiple delayed dask collections at once.

This should be called on the return value of read() to convert any dask arrays to numpy arrays.

If use_cache is True, this internally caches data, using dask’s built-in cache features.

Parameters:args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged.

Notes

The default dask optimizer induces too many (unnecessary) IO calls, so we turn this feature off by default. Eventually we will likely want our own optimizer.

copy()

Return a copy of the CatalogSource object

Returns:the new CatalogSource object holding the copied data columns
Return type:CatalogCopy
csize

The total, collective size of the CatalogSource, i.e., summed across all ranks.

It is the sum of size across all available ranks.

get_hardcolumn(col)

Return a column from the underlying file source.

Columns are returned as dask arrays.

hardcolumns

The union of the columns in the file and any transformed columns.

logger = <logging.Logger object>
make_column(array)

Utility function to convert a numpy array to a dask.array.Array.

read(columns)

Return the requested columns as dask arrays.

Parameters:columns (list of str) – the names of the requested columns
Returns:the list of column data, in the form of dask arrays
Return type:list of dask.array.Array
save(output, columns, datasets=None, header='Header')

Save the CatalogSource to a bigfile.BigFile.

Only the selected columns are saved, and attrs is saved in header. The attrs of each column are stored in the corresponding dataset.

Parameters:
  • output (str) – the name of the file to write to
  • columns (list of str) – the names of the columns to save in the file
  • datasets (list of str, optional) – names for the data set where each column is stored; defaults to the name of the column
  • header (str, optional) – the name of the data set holding the header information, where attrs is stored
size

The local size of the catalog.

to_mesh(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')

Convert the CatalogSource to a MeshSource, using the specified parameters.

Parameters:
  • Nmesh (int, optional) – the number of cells per side on the mesh; must be provided if not stored in attrs
  • BoxSize (scalar, 3-vector, optional) – the size of the box; must be provided if not stored in attrs
  • dtype (string, optional) – the data type of the mesh array
  • interlaced (bool, optional) – use the interlacing technique of Sefusatti et al. 2015 to reduce the effects of aliasing on Fourier space quantities computed from the mesh
  • compensated (bool, optional) – whether to correct for the window introduced by the grid interpolation scheme
  • window (str, optional) – the string specifying which window interpolation scheme to use; see pmesh.window.methods
  • weight (str, optional) – the name of the column specifying the weight for each particle
  • value (str, optional) – the name of the column specifying the field value for each particle
  • selection (str, optional) – the name of the column that specifies which (if any) slice of the CatalogSource to take
  • position (str, optional) – the name of the column that specifies the position data of the objects in the catalog
Returns:

mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh

Return type:

CatalogMesh

update_csize()

Set the collective size, csize.

This function should be called in __init__() of a subclass, after size has been set to a valid value (i.e., not NotImplemented).

use_cache

If set to True, use the built-in caching features of dask to cache data in memory.