nbodykit.source.catalog.file.
FileCatalogFactory
(name, filetype, examples=None)[source]¶Factory method to create a CatalogSource
that uses a subclass of nbodykit.io.base.FileType
to read
data from disk.
Parameters: |
|
---|---|
Returns: | the |
Return type: | subclass of |
nbodykit.source.catalog.file.
FileCatalogBase
(filetype, args=(), kwargs={}, comm=None, use_cache=False)[source]¶Bases: nbodykit.base.catalog.CatalogSource
Base class to create a source of particles from a single file, or multiple files, on disk.
Files of a specific type should be subclasses of this class.
Parameters: |
|
---|
Attributes
attrs |
A dictionary storing relevant meta-data about the CatalogSource. |
columns |
All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user. |
csize |
The total, collective size of the CatalogSource, i.e., summed across all ranks. |
hardcolumns |
The union of the columns in the file and any transformed columns. |
size |
The local size of the catalog. |
use_cache |
If set to True , use the built-in caching features of dask to cache data in memory. |
Methods
Selection () |
A boolean column that selects a subset slice of the CatalogSource. |
Value () |
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell. |
Weight () |
The column giving the weight to use for each particle on the mesh. |
compute (*args, **kwargs) |
Our version of dask.compute() that computes multiple delayed dask collections at once. |
copy () |
Return a copy of the CatalogSource object |
get_hardcolumn (col) |
Return a column from the underlying file source. |
make_column (array) |
Utility function to convert a numpy array to a dask.array.Array . |
read (columns) |
Return the requested columns as dask arrays. |
save (output, columns[, datasets, header]) |
Save the CatalogSource to a bigfile.BigFile . |
to_mesh ([Nmesh, BoxSize, dtype, interlaced, …]) |
Convert the CatalogSource to a MeshSource, using the specified parameters. |
update_csize () |
Set the collective size, csize . |
Selection
()¶A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True
for all particles.
Value
()¶When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
Weight
()¶The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
__delitem__
(col)¶Delete a column; cannot delete a “hard-coded” column
__getitem__
(sel)¶The following types of indexing are supported:
__len__
()¶The local size of the CatalogSource on a given rank.
__setitem__
(col, value)¶Add columns to the CatalogSource, overriding any existing columns
with the name col
.
attrs
¶A dictionary storing relevant meta-data about the CatalogSource.
columns
¶All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user.
compute
(*args, **kwargs)¶Our version of dask.compute()
that computes
multiple delayed dask collections at once.
This should be called on the return value of read()
to converts any dask arrays to numpy arrays.
If use_cache
is True
, this internally caches data, using
dask’s built-in cache features.
Parameters: | args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged. |
---|
Notes
The dask default optimizer induces too many (unnecesarry) IO calls – we turn this off feature off by default. Eventually we want our own optimizer probably.
copy
()¶Return a copy of the CatalogSource object
Returns: | the new CatalogSource object holding the copied data columns |
---|---|
Return type: | CatalogCopy |
csize
¶The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size
across all available ranks.
get_hardcolumn
(col)[source]¶Return a column from the underlying file source.
Columns are returned as dask arrays.
hardcolumns
¶The union of the columns in the file and any transformed columns.
logger
= <logging.Logger object>¶make_column
(array)¶Utility function to convert a numpy array to a dask.array.Array
.
read
(columns)¶Return the requested columns as dask arrays.
Parameters: | columns (list of str) – the names of the requested columns |
---|---|
Returns: | the list of column data, in the form of dask arrays |
Return type: | list of dask.array.Array |
save
(output, columns, datasets=None, header='Header')¶Save the CatalogSource to a bigfile.BigFile
.
Only the selected columns are saved and attrs
are saved in
header
. The attrs of columns are stored in the datasets.
Parameters: |
|
---|
size
¶The local size of the catalog.
to_mesh
(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')¶Convert the CatalogSource to a MeshSource, using the specified parameters.
Parameters: |
|
---|---|
Returns: | mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh |
Return type: |
update_csize
()¶Set the collective size, csize
.
This function should be called in __init__()
of a subclass,
after size
has been set to a valid value (not NotImplemented
)
use_cache
¶If set to True
, use the built-in caching features of dask
to cache data in memory.
nbodykit.source.catalog.file.
CSVCatalog
(*args, **kwargs)¶Bases: nbodykit.source.catalog.file.FileCatalogBase
A CatalogSource that uses CSVFile
to read data from disk.
Multiple files can be read at once by supplying a list of file
names or a glob asterisk pattern as the path
argument. See
Reading Multiple Data Files at Once for examples.
Parameters: |
|
---|
Examples
Please see the documentation for examples.
Attributes
attrs |
A dictionary storing relevant meta-data about the CatalogSource. |
columns |
All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user. |
csize |
The total, collective size of the CatalogSource, i.e., summed across all ranks. |
hardcolumns |
The union of the columns in the file and any transformed columns. |
size |
The local size of the catalog. |
use_cache |
If set to True , use the built-in caching features of dask to cache data in memory. |
Methods
Selection () |
A boolean column that selects a subset slice of the CatalogSource. |
Value () |
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell. |
Weight () |
The column giving the weight to use for each particle on the mesh. |
compute (*args, **kwargs) |
Our version of dask.compute() that computes multiple delayed dask collections at once. |
copy () |
Return a copy of the CatalogSource object |
get_hardcolumn (col) |
Return a column from the underlying file source. |
make_column (array) |
Utility function to convert a numpy array to a dask.array.Array . |
read (columns) |
Return the requested columns as dask arrays. |
save (output, columns[, datasets, header]) |
Save the CatalogSource to a bigfile.BigFile . |
to_mesh ([Nmesh, BoxSize, dtype, interlaced, …]) |
Convert the CatalogSource to a MeshSource, using the specified parameters. |
update_csize () |
Set the collective size, csize . |
Selection
()¶A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True
for all particles.
Value
()¶When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
Weight
()¶The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
__delitem__
(col)¶Delete a column; cannot delete a “hard-coded” column
__getitem__
(sel)¶The following types of indexing are supported:
__len__
()¶The local size of the CatalogSource on a given rank.
__setitem__
(col, value)¶Add columns to the CatalogSource, overriding any existing columns
with the name col
.
attrs
¶A dictionary storing relevant meta-data about the CatalogSource.
columns
¶All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user.
compute
(*args, **kwargs)¶Our version of dask.compute()
that computes
multiple delayed dask collections at once.
This should be called on the return value of read()
to converts any dask arrays to numpy arrays.
If use_cache
is True
, this internally caches data, using
dask’s built-in cache features.
Parameters: | args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged. |
---|
Notes
The dask default optimizer induces too many (unnecesarry) IO calls – we turn this off feature off by default. Eventually we want our own optimizer probably.
copy
()¶Return a copy of the CatalogSource object
Returns: | the new CatalogSource object holding the copied data columns |
---|---|
Return type: | CatalogCopy |
csize
¶The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size
across all available ranks.
get_hardcolumn
(col)¶Return a column from the underlying file source.
Columns are returned as dask arrays.
hardcolumns
¶The union of the columns in the file and any transformed columns.
logger
= <logging.Logger object>¶make_column
(array)¶Utility function to convert a numpy array to a dask.array.Array
.
read
(columns)¶Return the requested columns as dask arrays.
Parameters: | columns (list of str) – the names of the requested columns |
---|---|
Returns: | the list of column data, in the form of dask arrays |
Return type: | list of dask.array.Array |
save
(output, columns, datasets=None, header='Header')¶Save the CatalogSource to a bigfile.BigFile
.
Only the selected columns are saved and attrs
are saved in
header
. The attrs of columns are stored in the datasets.
Parameters: |
|
---|
size
¶The local size of the catalog.
to_mesh
(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')¶Convert the CatalogSource to a MeshSource, using the specified parameters.
Parameters: |
|
---|---|
Returns: | mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh |
Return type: |
update_csize
()¶Set the collective size, csize
.
This function should be called in __init__()
of a subclass,
after size
has been set to a valid value (not NotImplemented
)
use_cache
¶If set to True
, use the built-in caching features of dask
to cache data in memory.
nbodykit.source.catalog.file.
BinaryCatalog
(*args, **kwargs)¶Bases: nbodykit.source.catalog.file.FileCatalogBase
A CatalogSource that uses BinaryFile
to read data from disk.
Multiple files can be read at once by supplying a list of file
names or a glob asterisk pattern as the path
argument. See
Reading Multiple Data Files at Once for examples.
Parameters: |
|
---|
Examples
Please see the documentation for examples.
Attributes
attrs |
A dictionary storing relevant meta-data about the CatalogSource. |
columns |
All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user. |
csize |
The total, collective size of the CatalogSource, i.e., summed across all ranks. |
hardcolumns |
The union of the columns in the file and any transformed columns. |
size |
The local size of the catalog. |
use_cache |
If set to True , use the built-in caching features of dask to cache data in memory. |
Methods
Selection () |
A boolean column that selects a subset slice of the CatalogSource. |
Value () |
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell. |
Weight () |
The column giving the weight to use for each particle on the mesh. |
compute (*args, **kwargs) |
Our version of dask.compute() that computes multiple delayed dask collections at once. |
copy () |
Return a copy of the CatalogSource object |
get_hardcolumn (col) |
Return a column from the underlying file source. |
make_column (array) |
Utility function to convert a numpy array to a dask.array.Array . |
read (columns) |
Return the requested columns as dask arrays. |
save (output, columns[, datasets, header]) |
Save the CatalogSource to a bigfile.BigFile . |
to_mesh ([Nmesh, BoxSize, dtype, interlaced, …]) |
Convert the CatalogSource to a MeshSource, using the specified parameters. |
update_csize () |
Set the collective size, csize . |
Selection
()¶A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True
for all particles.
Value
()¶When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
Weight
()¶The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
__delitem__
(col)¶Delete a column; cannot delete a “hard-coded” column
__getitem__
(sel)¶The following types of indexing are supported:
__len__
()¶The local size of the CatalogSource on a given rank.
__setitem__
(col, value)¶Add columns to the CatalogSource, overriding any existing columns
with the name col
.
attrs
¶A dictionary storing relevant meta-data about the CatalogSource.
columns
¶All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user.
compute
(*args, **kwargs)¶Our version of dask.compute()
that computes
multiple delayed dask collections at once.
This should be called on the return value of read()
to converts any dask arrays to numpy arrays.
If use_cache
is True
, this internally caches data, using
dask’s built-in cache features.
Parameters: | args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged. |
---|
Notes
The dask default optimizer induces too many (unnecesarry) IO calls – we turn this off feature off by default. Eventually we want our own optimizer probably.
copy
()¶Return a copy of the CatalogSource object
Returns: | the new CatalogSource object holding the copied data columns |
---|---|
Return type: | CatalogCopy |
csize
¶The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size
across all available ranks.
get_hardcolumn
(col)¶Return a column from the underlying file source.
Columns are returned as dask arrays.
hardcolumns
¶The union of the columns in the file and any transformed columns.
logger
= <logging.Logger object>¶make_column
(array)¶Utility function to convert a numpy array to a dask.array.Array
.
read
(columns)¶Return the requested columns as dask arrays.
Parameters: | columns (list of str) – the names of the requested columns |
---|---|
Returns: | the list of column data, in the form of dask arrays |
Return type: | list of dask.array.Array |
save
(output, columns, datasets=None, header='Header')¶Save the CatalogSource to a bigfile.BigFile
.
Only the selected columns are saved and attrs
are saved in
header
. The attrs of columns are stored in the datasets.
Parameters: |
|
---|
size
¶The local size of the catalog.
to_mesh
(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')¶Convert the CatalogSource to a MeshSource, using the specified parameters.
Parameters: |
|
---|---|
Returns: | mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh |
Return type: |
update_csize
()¶Set the collective size, csize
.
This function should be called in __init__()
of a subclass,
after size
has been set to a valid value (not NotImplemented
)
use_cache
¶If set to True
, use the built-in caching features of dask
to cache data in memory.
nbodykit.source.catalog.file.
BigFileCatalog
(*args, **kwargs)¶Bases: nbodykit.source.catalog.file.FileCatalogBase
A CatalogSource that uses BigFile
to read data from disk.
Multiple files can be read at once by supplying a list of file
names or a glob asterisk pattern as the path
argument. See
Reading Multiple Data Files at Once for examples.
Parameters: |
|
---|
Examples
Please see the documentation for examples.
Attributes
attrs |
A dictionary storing relevant meta-data about the CatalogSource. |
columns |
All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user. |
csize |
The total, collective size of the CatalogSource, i.e., summed across all ranks. |
hardcolumns |
The union of the columns in the file and any transformed columns. |
size |
The local size of the catalog. |
use_cache |
If set to True , use the built-in caching features of dask to cache data in memory. |
Methods
Selection () |
A boolean column that selects a subset slice of the CatalogSource. |
Value () |
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell. |
Weight () |
The column giving the weight to use for each particle on the mesh. |
compute (*args, **kwargs) |
Our version of dask.compute() that computes multiple delayed dask collections at once. |
copy () |
Return a copy of the CatalogSource object |
get_hardcolumn (col) |
Return a column from the underlying file source. |
make_column (array) |
Utility function to convert a numpy array to a dask.array.Array . |
read (columns) |
Return the requested columns as dask arrays. |
save (output, columns[, datasets, header]) |
Save the CatalogSource to a bigfile.BigFile . |
to_mesh ([Nmesh, BoxSize, dtype, interlaced, …]) |
Convert the CatalogSource to a MeshSource, using the specified parameters. |
update_csize () |
Set the collective size, csize . |
Selection
()¶A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True
for all particles.
Value
()¶When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
Weight
()¶The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
__delitem__
(col)¶Delete a column; cannot delete a “hard-coded” column
__getitem__
(sel)¶The following types of indexing are supported:
__len__
()¶The local size of the CatalogSource on a given rank.
__setitem__
(col, value)¶Add columns to the CatalogSource, overriding any existing columns
with the name col
.
attrs
¶A dictionary storing relevant meta-data about the CatalogSource.
columns
¶All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user.
compute
(*args, **kwargs)¶Our version of dask.compute()
that computes
multiple delayed dask collections at once.
This should be called on the return value of read()
to converts any dask arrays to numpy arrays.
If use_cache
is True
, this internally caches data, using
dask’s built-in cache features.
Parameters: | args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged. |
---|
Notes
The dask default optimizer induces too many (unnecesarry) IO calls – we turn this off feature off by default. Eventually we want our own optimizer probably.
copy
()¶Return a copy of the CatalogSource object
Returns: | the new CatalogSource object holding the copied data columns |
---|---|
Return type: | CatalogCopy |
csize
¶The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size
across all available ranks.
get_hardcolumn
(col)¶Return a column from the underlying file source.
Columns are returned as dask arrays.
hardcolumns
¶The union of the columns in the file and any transformed columns.
logger
= <logging.Logger object>¶make_column
(array)¶Utility function to convert a numpy array to a dask.array.Array
.
read
(columns)¶Return the requested columns as dask arrays.
Parameters: | columns (list of str) – the names of the requested columns |
---|---|
Returns: | the list of column data, in the form of dask arrays |
Return type: | list of dask.array.Array |
save
(output, columns, datasets=None, header='Header')¶Save the CatalogSource to a bigfile.BigFile
.
Only the selected columns are saved and attrs
are saved in
header
. The attrs of columns are stored in the datasets.
Parameters: |
|
---|
size
¶The local size of the catalog.
to_mesh
(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')¶Convert the CatalogSource to a MeshSource, using the specified parameters.
Parameters: |
|
---|---|
Returns: | mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh |
Return type: |
update_csize
()¶Set the collective size, csize
.
This function should be called in __init__()
of a subclass,
after size
has been set to a valid value (not NotImplemented
)
use_cache
¶If set to True
, use the built-in caching features of dask
to cache data in memory.
nbodykit.source.catalog.file.
HDFCatalog
(*args, **kwargs)¶Bases: nbodykit.source.catalog.file.FileCatalogBase
A CatalogSource that uses HDFFile
to read data from disk.
Multiple files can be read at once by supplying a list of file
names or a glob asterisk pattern as the path
argument. See
Reading Multiple Data Files at Once for examples.
Parameters: |
|
---|
Examples
Please see the documentation for examples.
Attributes
attrs |
A dictionary storing relevant meta-data about the CatalogSource. |
columns |
All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user. |
csize |
The total, collective size of the CatalogSource, i.e., summed across all ranks. |
hardcolumns |
The union of the columns in the file and any transformed columns. |
size |
The local size of the catalog. |
use_cache |
If set to True , use the built-in caching features of dask to cache data in memory. |
Methods
Selection () |
A boolean column that selects a subset slice of the CatalogSource. |
Value () |
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell. |
Weight () |
The column giving the weight to use for each particle on the mesh. |
compute (*args, **kwargs) |
Our version of dask.compute() that computes multiple delayed dask collections at once. |
copy () |
Return a copy of the CatalogSource object |
get_hardcolumn (col) |
Return a column from the underlying file source. |
make_column (array) |
Utility function to convert a numpy array to a dask.array.Array . |
read (columns) |
Return the requested columns as dask arrays. |
save (output, columns[, datasets, header]) |
Save the CatalogSource to a bigfile.BigFile . |
to_mesh ([Nmesh, BoxSize, dtype, interlaced, …]) |
Convert the CatalogSource to a MeshSource, using the specified parameters. |
update_csize () |
Set the collective size, csize . |
Selection
()¶A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True
for all particles.
Value
()¶When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
Weight
()¶The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
__delitem__
(col)¶Delete a column; cannot delete a “hard-coded” column
__getitem__
(sel)¶The following types of indexing are supported:
__len__
()¶The local size of the CatalogSource on a given rank.
__setitem__
(col, value)¶Add columns to the CatalogSource, overriding any existing columns
with the name col
.
attrs
¶A dictionary storing relevant meta-data about the CatalogSource.
columns
¶All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user.
compute
(*args, **kwargs)¶Our version of dask.compute()
that computes
multiple delayed dask collections at once.
This should be called on the return value of read()
to converts any dask arrays to numpy arrays.
If use_cache
is True
, this internally caches data, using
dask’s built-in cache features.
Parameters: | args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged. |
---|
Notes
The dask default optimizer induces too many (unnecesarry) IO calls – we turn this off feature off by default. Eventually we want our own optimizer probably.
copy
()¶Return a copy of the CatalogSource object
Returns: | the new CatalogSource object holding the copied data columns |
---|---|
Return type: | CatalogCopy |
csize
¶The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size
across all available ranks.
get_hardcolumn
(col)¶Return a column from the underlying file source.
Columns are returned as dask arrays.
hardcolumns
¶The union of the columns in the file and any transformed columns.
logger
= <logging.Logger object>¶make_column
(array)¶Utility function to convert a numpy array to a dask.array.Array
.
read
(columns)¶Return the requested columns as dask arrays.
Parameters: | columns (list of str) – the names of the requested columns |
---|---|
Returns: | the list of column data, in the form of dask arrays |
Return type: | list of dask.array.Array |
save
(output, columns, datasets=None, header='Header')¶Save the CatalogSource to a bigfile.BigFile
.
Only the selected columns are saved and attrs
are saved in
header
. The attrs of columns are stored in the datasets.
Parameters: |
|
---|
size
¶The local size of the catalog.
to_mesh
(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')¶Convert the CatalogSource to a MeshSource, using the specified parameters.
Parameters: |
|
---|---|
Returns: | mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh |
Return type: |
update_csize
()¶Set the collective size, csize
.
This function should be called in __init__()
of a subclass,
after size
has been set to a valid value (not NotImplemented
)
use_cache
¶If set to True
, use the built-in caching features of dask
to cache data in memory.
nbodykit.source.catalog.file.
TPMBinaryCatalog
(*args, **kwargs)¶Bases: nbodykit.source.catalog.file.FileCatalogBase
A CatalogSource that uses TPMBinaryFile
to read data from disk.
Multiple files can be read at once by supplying a list of file
names or a glob asterisk pattern as the path
argument. See
Reading Multiple Data Files at Once for examples.
Parameters: |
|
---|
Attributes
attrs |
A dictionary storing relevant meta-data about the CatalogSource. |
columns |
All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user. |
csize |
The total, collective size of the CatalogSource, i.e., summed across all ranks. |
hardcolumns |
The union of the columns in the file and any transformed columns. |
size |
The local size of the catalog. |
use_cache |
If set to True , use the built-in caching features of dask to cache data in memory. |
Methods
Selection () |
A boolean column that selects a subset slice of the CatalogSource. |
Value () |
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell. |
Weight () |
The column giving the weight to use for each particle on the mesh. |
compute (*args, **kwargs) |
Our version of dask.compute() that computes multiple delayed dask collections at once. |
copy () |
Return a copy of the CatalogSource object |
get_hardcolumn (col) |
Return a column from the underlying file source. |
make_column (array) |
Utility function to convert a numpy array to a dask.array.Array . |
read (columns) |
Return the requested columns as dask arrays. |
save (output, columns[, datasets, header]) |
Save the CatalogSource to a bigfile.BigFile . |
to_mesh ([Nmesh, BoxSize, dtype, interlaced, …]) |
Convert the CatalogSource to a MeshSource, using the specified parameters. |
update_csize () |
Set the collective size, csize . |
Selection
()¶A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True
for all particles.
Value
()¶When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
Weight
()¶The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
__delitem__
(col)¶Delete a column; cannot delete a “hard-coded” column
__getitem__
(sel)¶The following types of indexing are supported:
__len__
()¶The local size of the CatalogSource on a given rank.
__setitem__
(col, value)¶Add columns to the CatalogSource, overriding any existing columns
with the name col
.
attrs
¶A dictionary storing relevant meta-data about the CatalogSource.
columns
¶All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user.
compute
(*args, **kwargs)¶Our version of dask.compute()
that computes
multiple delayed dask collections at once.
This should be called on the return value of read()
to converts any dask arrays to numpy arrays.
If use_cache
is True
, this internally caches data, using
dask’s built-in cache features.
Parameters: | args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged. |
---|
Notes
The dask default optimizer induces too many (unnecesarry) IO calls – we turn this off feature off by default. Eventually we want our own optimizer probably.
copy
()¶Return a copy of the CatalogSource object
Returns: | the new CatalogSource object holding the copied data columns |
---|---|
Return type: | CatalogCopy |
csize
¶The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size
across all available ranks.
get_hardcolumn
(col)¶Return a column from the underlying file source.
Columns are returned as dask arrays.
hardcolumns
¶The union of the columns in the file and any transformed columns.
logger
= <logging.Logger object>¶make_column
(array)¶Utility function to convert a numpy array to a dask.array.Array
.
read
(columns)¶Return the requested columns as dask arrays.
Parameters: | columns (list of str) – the names of the requested columns |
---|---|
Returns: | the list of column data, in the form of dask arrays |
Return type: | list of dask.array.Array |
save
(output, columns, datasets=None, header='Header')¶Save the CatalogSource to a bigfile.BigFile
.
Only the selected columns are saved and attrs
are saved in
header
. The attrs of columns are stored in the datasets.
Parameters: |
|
---|
size
¶The local size of the catalog.
to_mesh
(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')¶Convert the CatalogSource to a MeshSource, using the specified parameters.
Parameters: |
|
---|---|
Returns: | mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh |
Return type: |
update_csize
()¶Set the collective size, csize
.
This function should be called in __init__()
of a subclass,
after size
has been set to a valid value (not NotImplemented
)
use_cache
¶If set to True
, use the built-in caching features of dask
to cache data in memory.
nbodykit.source.catalog.file.
FITSCatalog
(*args, **kwargs)¶Bases: nbodykit.source.catalog.file.FileCatalogBase
A CatalogSource that uses FITSFile
to read data from disk.
Multiple files can be read at once by supplying a list of file
names or a glob asterisk pattern as the path
argument. See
Reading Multiple Data Files at Once for examples.
Parameters: |
|
---|
Examples
Please see the documentation for examples.
Attributes
attrs |
A dictionary storing relevant meta-data about the CatalogSource. |
columns |
All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user. |
csize |
The total, collective size of the CatalogSource, i.e., summed across all ranks. |
hardcolumns |
The union of the columns in the file and any transformed columns. |
size |
The local size of the catalog. |
use_cache |
If set to True , use the built-in caching features of dask to cache data in memory. |
Methods
Selection () |
A boolean column that selects a subset slice of the CatalogSource. |
Value () |
When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell. |
Weight () |
The column giving the weight to use for each particle on the mesh. |
compute (*args, **kwargs) |
Our version of dask.compute() that computes multiple delayed dask collections at once. |
copy () |
Return a copy of the CatalogSource object |
get_hardcolumn (col) |
Return a column from the underlying file source. |
make_column (array) |
Utility function to convert a numpy array to a dask.array.Array . |
read (columns) |
Return the requested columns as dask arrays. |
save (output, columns[, datasets, header]) |
Save the CatalogSource to a bigfile.BigFile . |
to_mesh ([Nmesh, BoxSize, dtype, interlaced, …]) |
Convert the CatalogSource to a MeshSource, using the specified parameters. |
update_csize () |
Set the collective size, csize . |
Selection
()¶A boolean column that selects a subset slice of the CatalogSource.
By default, this column is set to True
for all particles.
Value
()¶When interpolating a CatalogSource on to a mesh, the value of this array is used as the Value that each particle contributes to a given mesh cell.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
Weight
()¶The column giving the weight to use for each particle on the mesh.
The mesh field is a weighted average of Value
, with the weights
given by Weight
.
By default, this array is set to unity for all particles.
__delitem__
(col)¶Delete a column; cannot delete a “hard-coded” column
__getitem__
(sel)¶The following types of indexing are supported:
__len__
()¶The local size of the CatalogSource on a given rank.
__setitem__
(col, value)¶Add columns to the CatalogSource, overriding any existing columns
with the name col
.
attrs
¶A dictionary storing relevant meta-data about the CatalogSource.
columns
¶All columns in the CatalogSource, including those hard-coded into the class’s defintion and override columns provided by the user.
compute
(*args, **kwargs)¶Our version of dask.compute()
that computes
multiple delayed dask collections at once.
This should be called on the return value of read()
to converts any dask arrays to numpy arrays.
If use_cache
is True
, this internally caches data, using
dask’s built-in cache features.
Parameters: | args (object) – Any number of objects. If the object is a dask collection, it’s computed and the result is returned. Otherwise it’s passed through unchanged. |
---|
Notes
The dask default optimizer induces too many (unnecesarry) IO calls – we turn this off feature off by default. Eventually we want our own optimizer probably.
copy
()¶Return a copy of the CatalogSource object
Returns: | the new CatalogSource object holding the copied data columns |
---|---|
Return type: | CatalogCopy |
csize
¶The total, collective size of the CatalogSource, i.e., summed across all ranks.
It is the sum of size
across all available ranks.
get_hardcolumn
(col)¶Return a column from the underlying file source.
Columns are returned as dask arrays.
hardcolumns
¶The union of the columns in the file and any transformed columns.
logger
= <logging.Logger object>¶make_column
(array)¶Utility function to convert a numpy array to a dask.array.Array
.
read
(columns)¶Return the requested columns as dask arrays.
Parameters: | columns (list of str) – the names of the requested columns |
---|---|
Returns: | the list of column data, in the form of dask arrays |
Return type: | list of dask.array.Array |
save
(output, columns, datasets=None, header='Header')¶Save the CatalogSource to a bigfile.BigFile
.
Only the selected columns are saved and attrs
are saved in
header
. The attrs of columns are stored in the datasets.
Parameters: |
|
---|
size
¶The local size of the catalog.
to_mesh
(Nmesh=None, BoxSize=None, dtype='f4', interlaced=False, compensated=False, window='cic', weight='Weight', value='Value', selection='Selection', position='Position')¶Convert the CatalogSource to a MeshSource, using the specified parameters.
Parameters: |
|
---|---|
Returns: | mesh – a mesh object that provides an interface for gridding particle data onto a specified mesh |
Return type: |
update_csize
()¶Set the collective size, csize
.
This function should be called in __init__()
of a subclass,
after size
has been set to a valid value (not NotImplemented
)
use_cache
¶If set to True
, use the built-in caching features of dask
to cache data in memory.