deepr.io.ParquetDataset
- class deepr.io.ParquetDataset(path_or_paths, filesystem=None, metadata=None, schema=None, split_row_groups=False, validate_schema=True, filters=None, metadata_nthreads=1, memory_map=False)[source]
Context-aware ParquetDataset with support for chunked writing.
Makes it easier to read / write a ParquetDataset. For example:
>>> import pandas as pd
>>> from deepr.io import ParquetDataset
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> with ParquetDataset("viewfs://root/foo.parquet.snappy").open() as ds:
...     ds.write_pandas(df, chunk_size=100)
The use of context managers automatically opens / closes the dataset as well as the connection to the FileSystem.
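Reading back works the same way. A minimal sketch of reading the example above, assuming read_pandas mirrors pyarrow and returns a pyarrow.Table that still needs to_pandas():
>>> with ParquetDataset("viewfs://root/foo.parquet.snappy").open() as ds:
...     table = ds.read_pandas(columns=["col1"])  # column selection is optional
...     df = table.to_pandas()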
Parameters
- path_or_paths
Path to parquet dataset (directory or file), or list of files.
- filesystem
FileSystem; if None, it will be inferred automatically later.
- Type:
FileSystem, Optional
- __init__(path_or_paths, filesystem=None, metadata=None, schema=None, split_row_groups=False, validate_schema=True, filters=None, metadata_nthreads=1, memory_map=False)[source]
Methods
- __init__(path_or_paths[, filesystem, ...])
- open(): Open HDFS Filesystem if dataset on HDFS
- read([columns, use_threads, use_pandas_metadata])
- read_pandas([columns, use_threads])
- write(table[, compression])
- write_pandas(df[, compression, num_chunks, ...]): Write DataFrame as Parquet Dataset
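For write(), which takes a pyarrow Table rather than a DataFrame, a hedged sketch (the "snappy" compression value and the output path are illustrative assumptions, not part of the signature above):
>>> import pandas as pd
>>> import pyarrow as pa
>>> from deepr.io import ParquetDataset
>>> table = pa.Table.from_pandas(pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}))
>>> with ParquetDataset("viewfs://root/bar.parquet.snappy").open() as ds:
...     ds.write(table, compression="snappy")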
Attributes
- is_hdfs
- is_local
- pq_dataset
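The attributes can be used to check where the dataset lives before opening it; a minimal sketch, assuming is_hdfs and is_local are booleans derived from the path scheme:
>>> ds = ParquetDataset("viewfs://root/foo.parquet.snappy")
>>> if ds.is_hdfs:
...     print("dataset is on HDFS, open() will connect to the FileSystem")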