nbodykit.io.hdf.ColumnInfo(size, dtype, dset)
Bases: tuple
Attributes
| Attribute | Description |
|---|---|
| dset | Alias for field number 2 |
| dtype | Alias for field number 1 |
| size | Alias for field number 0 |
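Methods
| Method | Description |
|---|---|
| count(value) | Return the number of occurrences of value. |
| index(value[, start[, stop]]) | Return the first index of value. Raises ValueError if the value is not present. |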
__getnewargs__()
Return self as a plain tuple. Used by copy and pickle.
__new__(_cls, size, dtype, dset)
Create a new instance of ColumnInfo(size, dtype, dset).
__repr__()
Return a nicely formatted representation string.
count(value) → integer
Return the number of occurrences of value.
dset
Alias for field number 2
dtype
Alias for field number 1
index(value[, start[, stop]]) → integer
Return the first index of value. Raises ValueError if the value is not present.
size
Alias for field number 0
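The "Alias for field number N" entries and the tuple base class suggest that ColumnInfo is a collections.namedtuple. The sketch below recreates an equivalent namedtuple (it does not import nbodykit) to show how the named attributes, the inherited tuple methods, and __getnewargs__ behave. The field values, including the dataset path "/ra", are hypothetical.

```python
import copy
from collections import namedtuple

# Stand-in for nbodykit.io.hdf.ColumnInfo (assumption: a plain namedtuple
# with the same three fields in the same order).
ColumnInfo = namedtuple("ColumnInfo", ["size", "dtype", "dset"])

# Hypothetical values: 1000 rows of float32 data in an HDF5 dataset "/ra".
info = ColumnInfo(size=1000, dtype="f4", dset="/ra")

# Each named attribute is an alias for a positional field.
print(info.size, info[0])   # 1000 1000
print(info.dset, info[2])   # /ra /ra

# Methods inherited from tuple.
print(info.count(1000))     # 1
print(info.index("f4"))     # 1
try:
    info.index("f8")
except ValueError:
    print("'f8' is not present")

# copy (like pickle) rebuilds the instance through __getnewargs__.
clone = copy.deepcopy(info)
print(repr(clone))          # ColumnInfo(size=1000, dtype='f4', dset='/ra')
```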
nbodykit.io.hdf.HDFFile(path, root='/', exclude=[])
Bases: nbodykit.io.base.FileType
A file object that reads columns of data from an HDF5 file using h5py.
See http://docs.h5py.org for documentation on h5py.
| Parameters: | path, root='/', exclude=[] |
|---|---|
Attributes
| Attribute | Description |
|---|---|
| columns | A list of the names of the columns in the file. |
| dtype | A numpy.dtype object holding the data types of each column in the file. |
| ncol | The number of data columns in the file. |
| shape | The shape of the file, which defaults to (size,). |
| size | The size of the file, i.e., the number of rows. |
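To make the relationship between these attributes concrete, the sketch below assembles them from a few equal-length NumPy arrays standing in for HDF5 datasets. It is only an illustration under that assumption: it does not open a file and is not how HDFFile computes them internally. The column names ra, dec, and z are taken from the asarray() example; the data are random.

```python
import numpy as np

# Hypothetical stand-ins for three HDF5 datasets of equal length.
rng = np.random.default_rng(42)
datasets = {
    "ra": rng.uniform(0, 360, size=1000).astype("f4"),
    "dec": rng.uniform(-90, 90, size=1000).astype("f4"),
    "z": rng.uniform(0, 1, size=1000).astype("f4"),
}

# dtype: one named field per column, carrying that column's data type.
dtype = np.dtype([(name, arr.dtype) for name, arr in datasets.items()])

columns = list(dtype.names)   # ['ra', 'dec', 'z']
ncol = len(columns)           # 3
size = len(datasets["ra"])    # number of rows: 1000
shape = (size,)               # (1000,)

print(dtype)
print(columns, ncol, size, shape)
```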
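Methods
| Method | Description |
|---|---|
| asarray() | Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array. |
| get_dask(column[, blocksize]) | Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called. |
| keys() | Aliased function to return columns. |
| read(columns, start, stop[, step]) | Read the specified column(s) over the given range. |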
__getitem__(s)
This function provides numpy-like array indexing of the file object.
It supports:
keys()
Note
If a single column is requested, a plain numpy array holding the data is returned, rather than a structured array with only a single field.
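Since the indexing is described as numpy-like, a NumPy structured array is a reasonable mental model for it (an assumption: HDFFile reads from disk rather than holding an in-memory array). The sketch shows the behaviour the note describes: a slice selects rows, a single column name yields a plain array, and a list of names yields a structured array in the requested order.

```python
import numpy as np

# In-memory structured array standing in for a file with columns ra, dec, z.
data = np.zeros(5, dtype=[("ra", "f4"), ("dec", "f4"), ("z", "f4")])
data["ra"] = np.arange(5) * 10.0
data["dec"] = np.arange(5) - 2.0
data["z"] = np.linspace(0.1, 0.5, 5)

rows = data[:3]                # slice: first three rows, every field
dec = data["dec"]              # one name: plain float32 array, no fields
subset = data[["dec", "ra"]]   # list of names: structured array, requested order

print(rows.dtype.names)             # ('ra', 'dec', 'z')
print(dec.dtype, dec.dtype.names)   # float32 None
print(subset.dtype.names)           # ('dec', 'ra')
```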
asarray()
Return a view of the file, where the fields of the structured array are stacked in columns of a single numpy array.
Examples
Start with a file object with three named columns, ra, dec, and z:
>>> ff.dtype
dtype([('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')])
>>> ff.shape
(1000,)
>>> ff.columns
['ra', 'dec', 'z']
>>> ff[:3]
array([(235.63442993164062, 59.39099884033203, 0.6225500106811523),
(140.36181640625, -1.162310004234314, 0.5026500225067139),
(129.96627807617188, 45.970130920410156, 0.4990200102329254)],
dtype=(numpy.record, [('ra', '<f4'), ('dec', '<f4'), ('z', '<f4')]))
Select a subset of columns, switch their order, and convert the output to a single numpy array:
>>> x = ff[['dec', 'ra']].asarray()
>>> x.dtype
dtype('float32')
>>> x.shape
(1000, 2)
>>> x.columns
['dec', 'ra']
>>> x[:3]
array([[ 59.39099884, 235.63442993],
[ -1.16231 , 140.36181641],
[ 45.97013092, 129.96627808]], dtype=float32)
Now select only the first column (dec):
>>> dec = x[:,0]
>>> dec[:3]
array([ 59.39099884, -1.16231 , 45.97013092], dtype=float32)
| Returns: | a file object that will return a numpy array with the columns representing the fields |
|---|---|
| Return type: | FileType |
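The stacking that asarray() describes can be sketched with plain NumPy: take the requested fields in the requested order and place them side by side as the columns of one 2-D array. This reproduces the column order and dtype seen in the example above, but it copies data eagerly, whereas asarray() returns a view of the file. The helper name stack_fields is hypothetical.

```python
import numpy as np

def stack_fields(arr, names):
    """Hypothetical helper: stack the named fields of a structured array
    as the columns of a 2-D array (the fields should share one dtype)."""
    return np.column_stack([arr[name] for name in names])

# Two rows with the same three float32 columns as the example file.
records = np.array(
    [(235.63443, 59.391, 0.62255), (140.36182, -1.16231, 0.50265)],
    dtype=[("ra", "f4"), ("dec", "f4"), ("z", "f4")],
)

x = stack_fields(records, ["dec", "ra"])
print(x.dtype, x.shape)   # float32 (2, 2)
dec = x[:, 0]             # the first column now holds dec
```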
columns
A list of the names of the columns in the file.
This defaults to the named fields in the file’s dtype
attribute, but may differ from this if a view of the file has been
returned with asarray().
dtype
A numpy.dtype object holding the data types of each column in the file.
get_dask(column, blocksize=100000)
Return the specified column as a dask array, which delays the explicit reading of the data until dask.compute() is called.
The dask array is chunked into blocks of size blocksize.
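| Parameters: | column: the column to return; blocksize: the number of rows in each chunk of the dask array (default 100000) |
|---|---|
| Returns: | the dask array holding the column, which encodes the functions needed to read the data but delays evaluating them until the user requests it (e.g. with dask.compute()) |
| Return type: | dask.array.Array |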
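The blocksize parameter splits the column into chunks of at most blocksize rows. The sketch below, using only the standard library, computes those block boundaries and wraps each block read in a deferred call that runs only when requested, mimicking the delayed evaluation described above. It is neither dask nor the nbodykit implementation, and read_rows is a hypothetical stand-in for reading rows from the file.

```python
from functools import partial

def block_bounds(size, blocksize=100000):
    """Return (start, stop) row ranges covering 0..size in chunks of blocksize."""
    return [(start, min(start + blocksize, size)) for start in range(0, size, blocksize)]

def read_rows(column, start, stop):
    # Hypothetical reader: returns the row indices in place of real data.
    return list(range(start, stop))

# Build deferred tasks; nothing is "read" yet.
bounds = block_bounds(250000)
tasks = [partial(read_rows, "ra", start, stop) for start, stop in bounds]
print(bounds)  # [(0, 100000), (100000, 200000), (200000, 250000)]

# "Compute": evaluate every task and concatenate the blocks in order.
values = [v for task in tasks for v in task()]
print(len(values))  # 250000
```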
logger = <logging.Logger object>
ncol
The number of data columns in the file.
read(columns, start, stop, step=1)
Read the specified column(s) over the given range.
‘start’ and ‘stop’ should be between 0 and size,
which is the total size of the file.
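| Parameters: | columns: the name(s) of the column(s) to read; start: the start of the row range; stop: the end of the row range; step: the step between rows (default 1) |
|---|---|
| Returns: | structured array holding the requested columns over the specified range of rows |
| Return type: | numpy.ndarray |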
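A minimal sketch of the documented contract of read(), assuming the file's contents are already available as a NumPy structured array: check that start and stop lie between 0 and size, then return the requested column(s) for rows start:stop:step as a structured array. The function read_range is hypothetical, and raising ValueError on out-of-range bounds is this sketch's choice, not documented HDFFile behaviour.

```python
import numpy as np

def read_range(data, columns, start, stop, step=1):
    """Hypothetical helper: return rows start:stop:step of the requested columns."""
    size = len(data)
    if not (0 <= start <= size and 0 <= stop <= size):
        raise ValueError(f"start and stop must lie between 0 and {size}")
    if isinstance(columns, str):
        columns = [columns]
    # Copy the selected fields into a compact structured array.
    out = np.empty(len(range(start, stop, step)),
                   dtype=[(c, data.dtype[c]) for c in columns])
    for c in columns:
        out[c] = data[c][start:stop:step]
    return out

data = np.zeros(10, dtype=[("ra", "f4"), ("dec", "f4"), ("z", "f4")])
data["z"] = np.arange(10) / 10
chunk = read_range(data, ["z", "ra"], 2, 8, step=2)
print(chunk.dtype.names, len(chunk))  # ('z', 'ra') 3
```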
shape
The shape of the file, which defaults to (size,).
Multiple dimensions can be introduced into the shape if
a view of the file has been returned with asarray().
size
The size of the file, i.e., the number of rows.