nbodykit.io.binary

Functions

getsize(filename, header_size, rowsize) The default method to determine the size of the binary file

Classes

BinaryFile(path, dtype[, offsets, …]) A file object to handle the reading of columns of data from a binary file.
class nbodykit.io.binary.BinaryFile(path, dtype, offsets=None, header_size=0, size=None)

A file object to handle the reading of columns of data from a binary file.

Warning

This class assumes the data is stored in a column-major format, i.e., all values of one column are stored contiguously before the next column begins.
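
To illustrate the layout this warning refers to, the following sketch uses only NumPy (not nbodykit) to contrast row-major and column-major bytes for the same data. The column names x and ID are made up for illustration.

```python
import numpy as np

# A small structured array with two hypothetical columns.
data = np.zeros(3, dtype=[("x", "f4"), ("ID", "i4")])
data["x"] = [1.0, 2.0, 3.0]
data["ID"] = [7, 8, 9]

# Row-major: the fields of each row are interleaved (x0, ID0, x1, ID1, ...).
row_major = data.tobytes()

# Column-major: all of x first, then all of ID (x0, x1, x2, ID0, ID1, ID2).
col_major = b"".join(data[name].tobytes() for name in data.dtype.names)

# In the column-major buffer, the first 12 bytes hold the whole x column.
x_back = np.frombuffer(col_major[:12], dtype="f4")
id_back = np.frombuffer(col_major[12:], dtype="i4")
```

Both buffers have the same length; only the ordering of the bytes differs.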

Parameters:
  • path (str) – the name of the binary file to load
  • dtype (numpy.dtype or list of tuples) – the dtypes of the columns to load; this should be either a numpy.dtype or be able to be converted to one via a numpy.dtype() call
  • offsets (dict, optional) – a dictionary specifying the byte offset of each column in the binary file; if not supplied, the offsets are inferred from the dtype size of each column, assuming a fixed header size and contiguous storage
  • header_size (int, optional) – the size of the header in bytes
  • size (int, optional) – the number of objects in the binary file; if not provided, the value is inferred from the dtype and the total size of the file in bytes
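
As a rough sketch of how default offsets could follow from the description above (fixed header, contiguous column blocks), the helper below accumulates the byte size of each column. The function infer_offsets and the column names are hypothetical and are not part of nbodykit; the actual implementation may differ.

```python
import numpy as np

def infer_offsets(dtype, size, header_size=0):
    """Hypothetical helper: byte offset of each column block.

    Assumes each column occupies size * itemsize contiguous bytes,
    stored one after another following the header.
    """
    dtype = np.dtype(dtype)
    offsets = {}
    pos = header_size
    for name in dtype.names:
        offsets[name] = pos
        pos += dtype[name].itemsize * size
    return offsets

# Illustrative column layout: two 3-vectors and an integer ID.
dtype = [("Position", "f4", (3,)), ("Velocity", "f4", (3,)), ("ID", "u8")]
offsets = infer_offsets(dtype, size=10, header_size=16)
```

A dictionary of this form is the kind of value the offsets parameter expects when the columns are not stored back to back.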

Attributes

columns A list of the names of the columns in the file.
dtype A numpy.dtype object holding the data types of each column in the file.
ncol The number of data columns in the file.
shape The shape of the file, which defaults to (size, )
size The size of the file, i.e., number of rows

Methods

asarray() Return a view of the file, where the fields of the
get_dask(column[, blocksize]) Return the specified column as a dask array, which
keys() Aliased function to return columns
read(columns, start, stop[, step]) Read the specified column(s) over the given range
read(columns, start, stop, step=1)

Read the specified column(s) over the given range

‘start’ and ‘stop’ should be between 0 and size, where size is the total number of rows (particles) in the binary file

Parameters:
  • columns (str, list of str) – the name of the column(s) to return
  • start (int) – the row integer to start reading at
  • stop (int) – the row integer to stop reading at
  • step (int, optional) – the step size to use when reading; default is 1
Returns:

structured array holding the requested columns over the specified range of rows

Return type:

numpy.ndarray
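
The following is a minimal NumPy-only sketch of what reading a row range from a column-major file can look like. It does not call nbodykit; the helper read_range, its signature, and the column names are assumptions made for illustration.

```python
import os
import tempfile
import numpy as np

def read_range(path, dtype, size, columns, start, stop, step=1, header_size=0):
    """Hypothetical helper: read rows [start, stop) of the named columns."""
    dtype = np.dtype(dtype)
    if isinstance(columns, str):
        columns = [columns]
    out_dtype = np.dtype([(name, dtype[name]) for name in columns])
    out = np.empty(len(range(start, stop, step)), dtype=out_dtype)

    # Byte offset of each column block, assuming contiguous storage.
    offsets, pos = {}, header_size
    for name in dtype.names:
        offsets[name] = pos
        pos += dtype[name].itemsize * size

    with open(path, "rb") as ff:
        for name in columns:
            col = dtype[name]
            # Skip to the first requested row of this column.
            ff.seek(offsets[name] + start * col.itemsize)
            raw = ff.read((stop - start) * col.itemsize)
            block = np.frombuffer(raw, dtype=col.base)
            block = block.reshape((stop - start,) + col.shape)
            out[name] = block[::step]
    return out

# Write a tiny column-major file: 8-byte header, then x, then ID.
x = np.arange(5, dtype="f8")
ids = np.arange(10, 15, dtype="i4")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as ff:
        ff.write(b"\0" * 8)
        ff.write(x.tobytes())
        ff.write(ids.tobytes())
    data = read_range(path, [("x", "f8"), ("ID", "i4")], 5,
                      ["ID"], start=1, stop=5, step=2, header_size=8)
```

Here data is a structured array with a single ID field holding rows 1 and 3 of the file.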

nbodykit.io.binary.getsize(filename, header_size, rowsize)

The default method to determine the size of the binary file

The “size” is defined as the number of rows, where each row has a size of rowsize in bytes.

Notes

  • This assumes the input file is not compressed
  • This function does not depend on the layout of the binary file, i.e., on whether or not the data is actually stored in rows
Raises:

ValueError – if the function determines a fractional number of rows

Parameters:
  • filename (str) – the name of the binary file
  • header_size (int) – the size of the header in bytes, which will be skipped when determining the number of rows
  • rowsize (int) – the size of the data in each row in bytes
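
The behavior described for getsize can be sketched with the standard library alone. The function count_rows below is a hypothetical stand-in written from the description above, not nbodykit's own code: it subtracts the header from the total file size, divides by rowsize, and raises ValueError if the division leaves a remainder.

```python
import os
import tempfile

def count_rows(filename, header_size, rowsize):
    """Hypothetical stand-in mirroring the documented behavior."""
    nbytes = os.path.getsize(filename) - header_size
    nrows, remainder = divmod(nbytes, rowsize)
    if remainder != 0:
        raise ValueError("file size implies a fractional number of rows")
    return nrows

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    # 16-byte header followed by 10 rows of 28 bytes each.
    with open(path, "wb") as ff:
        ff.write(b"\0" * (16 + 10 * 28))

    nrows = count_rows(path, header_size=16, rowsize=28)

    # A rowsize that does not divide the data evenly triggers the error.
    try:
        count_rows(path, header_size=16, rowsize=27)
        raised = False
    except ValueError:
        raised = True
```

Because only the total byte count matters, the same result holds whether the 28 bytes of each row are interleaved or split across column blocks.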