nbodykit provides support for initializing CatalogSource objects by reading tabular data stored on disk in a variety of formats, including plaintext, binary, HDF5, Bigfile, and FITS.
In this section, we provide short examples illustrating how to read data stored in each of these formats. If your data format is not currently supported, please see Reading a Custom Data Format.
Reading data stored as columns in plaintext files is supported via the CSVCatalog class. This class partitions the CSV file into chunks, and data is only read from the relevant chunks of the file, using the pandas.read_csv() function. The class accepts any configuration keywords that this function accepts. The partitioning step provides a significant speed-up when reading from the end of the file, since the entirety of the data does not need to be read first.
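To make the idea of reading only part of a file concrete, here is a minimal sketch that pulls the last ten rows of a plaintext file with pandas.read_csv(). It is only an illustration and not how CSVCatalog partitions files internally: the helper read_rows and the file name chunk-example.txt are hypothetical, and the row-based skiprows and nrows keywords stand in for the chunking described above.

```python
import numpy as np
import pandas as pd

# write a small whitespace-separated file with 100 rows and 5 columns
data = np.arange(500, dtype=float).reshape(100, 5)
np.savetxt('chunk-example.txt', data, fmt='%.7e')

names = ['a', 'b', 'c', 'd', 'e']

def read_rows(path, start, stop):
    """Read only rows [start, stop) of a headerless, whitespace-separated file.

    Hypothetical helper for illustration; it is not part of nbodykit.
    """
    return pd.read_csv(path, sep=r'\s+', header=None, names=names,
                       skiprows=start, nrows=stop - start)

# parse only the last 10 rows instead of the full table
tail = read_rows('chunk-example.txt', 90, 100)
print(tail.shape)  # (10, 5)
```

Only the requested rows are parsed into a DataFrame, although pandas still has to scan past the skipped lines in this simplified version.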
Caveats
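- By default, the class reads whitespace-separated columns; this can be changed by setting delim_whitespace=False and changing the delimiter keyword.
- A pandas index column is not supported; all columns should represent data columns to read.
- The file should not contain a header line; column names should be passed to CSVCatalog via the names argument.
As an example, below we generate 5 columns for 100 fake objects and write to a plaintext file: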
In [1]: import numpy
In [2]: from nbodykit.source.catalog import CSVCatalog
# generate some fake ASCII data
In [3]: data = numpy.random.random(size=(100,5))
# save to a plaintext file
In [4]: numpy.savetxt('csv-example.txt', data, fmt='%.7e')
# name each of the 5 input columns
In [5]: names = ['a', 'b', 'c', 'd', 'e']
# read the data
In [6]: f = CSVCatalog('csv-example.txt', names)
In [7]: print(f)
CSVCatalog(size=100, file='csv-example.txt')
In [8]: print("columns = ", f.columns) # default Weight,Selection also present
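Since column names are supplied through the names argument rather than read from the file, a file that begins with a header line may need to be converted first. The sketch below assumes a comma-separated input with a single header line and rewrites it as a headerless, space-separated file. The helper split_header and both file names are hypothetical and not part of nbodykit.

```python
import csv

def split_header(src, dst, delimiter=','):
    """Copy `src` to `dst` without its first line and return the column
    names found on that line.

    Hypothetical helper for illustration; it is not part of nbodykit.
    """
    with open(src, newline='') as fin, open(dst, 'w', newline='') as fout:
        reader = csv.reader(fin, delimiter=delimiter)
        # the first line holds the column names
        names = [name.strip() for name in next(reader)]
        # write the remaining rows separated by single spaces
        writer = csv.writer(fout, delimiter=' ', lineterminator='\n')
        for row in reader:
            writer.writerow([field.strip() for field in row])
    return names

# build a small comma-separated file with a header line
with open('with-header.csv', 'w') as f:
    f.write('x,y,z\n1.0,2.0,3.0\n4.0,5.0,6.0\n')

names = split_header('with-header.csv', 'no-header.txt')
print(names)  # ['x', 'y', 'z']

# the converted file could then be loaded as in the session above, e.g.
# cat = CSVCatalog('no-header.txt', names)
```

Converting the file once keeps the CSVCatalog call identical to the one in the example, at the cost of a second copy of the data on disk.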