deepr.io.ParquetDataset
- class deepr.io.ParquetDataset(path_or_paths, filesystem=None, metadata=None, schema=None, split_row_groups=False, validate_schema=True, filters=None, metadata_nthreads=1, memory_map=False)[source]
Context aware ParquetDataset with support for chunk writing.
Makes it easier to read / write ParquetDataset.

For example:

>>> import pandas as pd
>>> from deepr.io import ParquetDataset
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> with ParquetDataset("viewfs://root/foo.parquet.snappy").open() as ds:
...     ds.write_pandas(df, chunk_size=100)
The use of context managers automatically opens / closes the dataset as well as the connection to the FileSystem.
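A minimal reading counterpart to the writing example above. It assumes read() mirrors pyarrow's ParquetDataset.read and returns a pyarrow Table; paths and column names are illustrative:

import pandas as pd
from deepr.io import ParquetDataset

# Reading back the dataset written above; open() manages the FileSystem connection
with ParquetDataset("viewfs://root/foo.parquet.snappy").open() as ds:
    table = ds.read(columns=["col1", "col2"])  # pyarrow Table (assumed return type)
    df_back = table.to_pandas()                # convert to a pandas DataFrame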
- path_or_paths
Path to parquet dataset (directory or file), or list of files.
- filesystem
FileSystem; if None, it will be inferred automatically later.
- Type:
FileSystem, Optional
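A short sketch of constructing datasets with the filesystem left to be inferred from the path; the paths below are hypothetical:

from deepr.io import ParquetDataset

# filesystem=None (the default): it is inferred later from the path scheme
local_ds = ParquetDataset("data/foo.parquet.snappy")
hdfs_ds = ParquetDataset("viewfs://root/foo.parquet.snappy")

# path_or_paths also accepts a list of files
multi_ds = ParquetDataset(["data/part-0.parquet", "data/part-1.parquet"])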
- __init__(path_or_paths, filesystem=None, metadata=None, schema=None, split_row_groups=False, validate_schema=True, filters=None, metadata_nthreads=1, memory_map=False)[source]
Methods
- __init__(path_or_paths[, filesystem, ...])
- open()
  Open HDFS Filesystem if dataset on HDFS
- read([columns, use_threads, use_pandas_metadata])
- read_pandas([columns, use_threads])
- write(table[, compression])
- write_pandas(df[, compression, num_chunks, ...])
  Write DataFrame as Parquet Dataset
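A sketch of chunked writing with write_pandas, using the num_chunks and chunk_size parameters shown above. Paths and data are illustrative; chunk_size is assumed to bound the number of rows per chunk, as in the example at the top of this page:

import pandas as pd
from deepr.io import ParquetDataset

df = pd.DataFrame(data={"col1": range(1000), "col2": range(1000)})

with ParquetDataset("viewfs://root/bar.parquet.snappy").open() as ds:
    # Write the DataFrame as several part files, bounding the rows per chunk
    ds.write_pandas(df, chunk_size=100)

with ParquetDataset("viewfs://root/baz.parquet.snappy").open() as ds:
    # Alternatively, request a fixed number of chunks with explicit compression
    ds.write_pandas(df, num_chunks=4, compression="snappy")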
Attributes
- is_hdfs
  rtype: bool
- is_local
  rtype: bool
- pq_dataset
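A small sketch of the attributes above. It assumes is_hdfs / is_local are derived from the path scheme and that pq_dataset exposes the underlying pyarrow ParquetDataset once the dataset is opened; the path is hypothetical:

from deepr.io import ParquetDataset

ds = ParquetDataset("viewfs://root/foo.parquet.snappy")
print(ds.is_hdfs)   # assumed True for viewfs:// or hdfs:// paths
print(ds.is_local)  # assumed False in that case

with ds.open() as opened:
    print(opened.pq_dataset)  # underlying pyarrow ParquetDataset (assumption)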