Functions
make_partitions (filename, blocksize, config) |
Partition a CSV file into blocks, using the preferred blocksize |
verify_data (path, names[, nrows]) |
Verify the data by reading the first few lines of the specified |
Classes
CSVFile (path, names[, blocksize, dtype, …]) |
A file object to handle the reading of columns of data from a CSV file. |
CSVPartition (filename, offset, blocksize, …) |
A simple class to convert byte strings of data from a CSV file |
nbodykit.io.csv.CSVFile(path, names, blocksize=33554432, dtype={}, usecols=None, delim_whitespace=True, **config)
A file object to handle the reading of columns of data from a CSV file.
Internally, this class partitions the CSV file into chunks, and data is only read from the relevant chunks of the file, using pandas.read_csv().
This setup provides a significant speed-up when reading from the end of the file, since the entirety of the data does not need to be read first.
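The idea of parsing only the relevant bytes can be illustrated with a minimal sketch. It is not the class's actual implementation: the file contents, the `offset`/`nbytes` bookkeeping, and the column names are all made up for illustration, and `sep=r"\s+"` is used as the whitespace-delimiter spelling accepted by pandas.read_csv().

```python
import io
import os
import tempfile

import pandas as pd

# Write a small whitespace-delimited file with 6 rows (illustrative data only).
rows = [b"%d %d\n" % (i, 10 * i) for i in range(6)]
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as f:
    f.writelines(rows)
    path = f.name

# Hypothetical bookkeeping: byte offset of row 4 and byte length of rows 4-5.
# A chunked reader would record such offsets when it partitions the file.
offset = sum(len(r) for r in rows[:4])
nbytes = sum(len(r) for r in rows[4:])

with open(path, "rb") as f:
    f.seek(offset)           # jump past the first four rows without parsing them
    chunk = f.read(nbytes)   # read only the bytes of the rows we want

# Parse just that chunk; the rest of the file is never handed to pandas.
tail = pd.read_csv(io.BytesIO(chunk), sep=r"\s+", header=None, names=["a", "b"])
os.remove(path)
```

Only the last two rows reach the parser here, which is why reading near the end of a large file does not require parsing everything before it.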
The class supports any of the configuration keywords that can be passed to pandas.read_csv().
Warning
This assumes that lines are separated by the newline character and that all columns in the file represent data columns (no “index” column when using pandas).
Attributes
columns |
A list of the names of the columns in the file. |
dtype |
A numpy.dtype object holding the data types of each column in the file. |
ncol |
The number of data columns in the file. |
shape |
The shape of the file, which defaults to (size, ) |
size |
The size of the file, i.e., number of rows |
Methods
asarray () |
Return a view of the file, where the fields of the |
get_dask (column[, blocksize]) |
Return the specified column as a dask array, which |
keys () |
Aliased function to return columns |
read (columns, start, stop[, step]) |
Read the specified column(s) over the given range |
read(columns, start, stop, step=1)
Read the specified column(s) over the given range.
‘start’ and ‘stop’ should be between 0 and size, which is the total size of the file (in particles).
Returns: structured array holding the requested columns over the specified range of rows
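Because the file is split into chunks, a row range has to be translated into per-chunk ranges before anything is read. The sketch below shows one way to do that translation, assuming the number of rows in each chunk is already known. The helper name `locate_rows` and its return format are hypothetical and not part of nbodykit.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_rows(rows_per_partition, start, stop):
    """Return (partition index, local start, local stop) tuples covering rows [start, stop)."""
    # Cumulative row boundaries: partition i holds rows edges[i] .. edges[i + 1] - 1.
    edges = [0] + list(accumulate(rows_per_partition))
    if not 0 <= start <= stop <= edges[-1]:
        raise IndexError("start and stop must lie between 0 and size")

    pieces = []
    i = bisect_right(edges, start) - 1   # partition that contains row `start`
    while start < stop:
        hi = min(stop, edges[i + 1])     # last row needed from this partition
        pieces.append((i, start - edges[i], hi - edges[i]))
        start = hi
        i += 1
    return pieces


# Three partitions holding 4, 4 and 2 rows; request rows 3 through 8.
pieces = locate_rows([4, 4, 2], start=3, stop=9)
```

Each tuple names one chunk and the slice of its rows that falls inside the request, so chunks outside the range are never touched.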
nbodykit.io.csv.CSVPartition(filename, offset, blocksize, delimiter, **config)
A simple class to convert byte strings of data from a CSV file to a pandas DataFrame on demand.
The DataFrame is cached as value, so only a single call to pandas.read_csv() is used.
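The parse-once-then-cache behaviour described above can be sketched with `functools.cached_property`. The class `LazyBlock` and its `parse_count` attribute are hypothetical stand-ins written for this illustration. They are not the CSVPartition implementation, which may cache its DataFrame differently.

```python
import io
from functools import cached_property

import pandas as pd


class LazyBlock:
    """Hypothetical stand-in: holds raw bytes and parses them on first access."""

    def __init__(self, raw, names):
        self.raw = raw
        self.names = names
        self.parse_count = 0   # present only to show that parsing happens once

    @cached_property
    def value(self):
        # Runs on the first access only; the result is then stored on the instance.
        self.parse_count += 1
        return pd.read_csv(io.BytesIO(self.raw), sep=r"\s+",
                           header=None, names=self.names)


block = LazyBlock(b"1 2\n3 4\n", names=["x", "y"])
first = block.value    # triggers pandas.read_csv
second = block.value   # served from the cache, no second parse
```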
Attributes
value |
Return the parsed byte string as a DataFrame |
__init__(filename, offset, blocksize, delimiter, **config)
value
Return the parsed byte string as a DataFrame
nbodykit.io.csv.make_partitions(filename, blocksize, config, delimiter='\n')
Partition a CSV file into blocks, using the preferred blocksize in bytes, returning the partitions and the number of rows in each partition.
This divides the input file into partitions with size roughly equal to blocksize, reads the bytes, and counts the number of delimiters to compute the size of each block.
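The partitioning strategy (read roughly blocksize bytes, then count line delimiters) can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the helper `count_rows_per_block` is hypothetical, the delimiter is fixed to the newline byte, and each block is extended to the end of its last line so that no row is split across blocks.

```python
import os
import tempfile


def count_rows_per_block(filename, blocksize):
    """Return a list of (offset, nbytes, nrows) tuples, one per block."""
    blocks = []
    with open(filename, "rb") as f:
        while True:
            offset = f.tell()
            data = f.read(blocksize)
            if not data:
                break
            # Extend the block to the end of the current line so no row is split.
            if not data.endswith(b"\n"):
                data += f.readline()
            nrows = data.count(b"\n")
            # A final line without a trailing newline still counts as a row.
            if not data.endswith(b"\n"):
                nrows += 1
            blocks.append((offset, len(data), nrows))
    return blocks


# Demo file: 10 rows of 4 bytes each ("0 0\n" ... "9 9\n"), 40 bytes in total.
with tempfile.NamedTemporaryFile("wb", delete=False) as f:
    f.write(b"".join(b"%d %d\n" % (i, i) for i in range(10)))
    demo_path = f.name

blocks = count_rows_per_block(demo_path, blocksize=10)
os.remove(demo_path)
total_rows = sum(nrows for _, _, nrows in blocks)
```

The block sizes end up only roughly equal to the requested blocksize, since each one is rounded up to the next line boundary.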
nbodykit.io.csv.verify_data(path, names, nrows=10, **config)
Verify the data by reading the first few lines of the specified CSV file to determine the data type.
Returns: dtype – dictionary holding the dtype for each name in names
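A rough sketch of this kind of check, using only pandas.read_csv() with its `names` and `nrows` keywords, is given below. The function `infer_dtypes`, the sample data, and the column names are hypothetical. The real verify_data may perform additional validation that is not shown here.

```python
import io

import pandas as pd


def infer_dtypes(source, names, nrows=10, **config):
    """Parse the first `nrows` rows and return {column name: numpy dtype}."""
    # Only the leading rows are parsed, so this stays cheap for large files.
    df = pd.read_csv(source, names=names, nrows=nrows, **config)
    return {name: df[name].dtype for name in names}


# Illustrative whitespace-delimited data: an integer column and a float column.
sample = io.StringIO("1 2.5\n2 3.5\n3 4.5\n")
dtypes = infer_dtypes(sample, ["id", "mass"], nrows=2, sep=r"\s+")
```

The resulting dictionary maps each name to the dtype pandas inferred from the sampled rows, so columns whose later rows differ in type would not be detected by this sketch.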