deepr.io package
Submodules
deepr.io.hdfs module
HDFS Utilities
- class deepr.io.hdfs.HDFSFile(filesystem, path, mode='rb', encoding='utf-8')
Bases: object
FileSystemFile with support for “r” and “w” modes, readlines, and iteration.
Makes it easier to read or write a file on any filesystem. For example, if you use HDFS you can do
>>> from deepr.io import HDFSFileSystem
>>> from deepr.io.hdfs import HDFSFile
>>> with HDFSFileSystem() as fs:
...     with HDFSFile(fs, "viewfs://root/user/foo.txt", "w") as file:
...         file.write("Hello world!")
Using context managers means that both the connection to the filesystem and the file buffer are opened and closed automatically.
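The nested context-manager pattern described above can be sketched with the standard library alone. The `FakeFileSystem` and `FileOnFileSystem` classes below are hypothetical stand-ins working on local files, not deepr's implementation; they only illustrate how entering and leaving the `with` blocks opens and closes the connection and the file buffer.

```python
import os
import tempfile


class FakeFileSystem:
    """Hypothetical stand-in for a filesystem client (not part of deepr)."""

    def __init__(self):
        self.connected = False

    def __enter__(self):
        self.connected = True  # a real client would open its connection here
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.connected = False  # ... and close it here

    def open(self, path, mode="rb", encoding=None):
        # Delegates to the local filesystem for the purpose of this sketch.
        return open(path, mode, encoding=encoding)


class FileOnFileSystem:
    """Hypothetical file wrapper that opens its buffer on __enter__."""

    def __init__(self, filesystem, path, mode="rb", encoding="utf-8"):
        self.filesystem = filesystem
        self.path = path
        self.mode = mode
        # Binary modes do not accept an encoding.
        self.encoding = None if "b" in mode else encoding
        self._file = None

    def __enter__(self):
        self._file = self.filesystem.open(self.path, self.mode, self.encoding)
        return self._file

    def __exit__(self, exc_type, exc_value, traceback):
        self._file.close()


def write_and_read(text):
    """Write `text` to a temporary file through the wrappers, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo.txt")
        with FakeFileSystem() as fs:
            with FileOnFileSystem(fs, path, "w") as file:
                file.write(text)
            with FileOnFileSystem(fs, path, "r") as file:
                return file.read()
```

Opening the buffer in `__enter__` rather than in `__init__` means that constructing the wrapper has no side effects; resources are only held inside the `with` block.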
- filesystem
FileSystem instance
- Type:
FileSystem
deepr.io.json module
Json IO
deepr.io.parquet module
Utilities for parquet
- class deepr.io.parquet.ParquetDataset(path_or_paths, filesystem=None, metadata=None, schema=None, split_row_groups=False, validate_schema=True, filters=None, metadata_nthreads=1, memory_map=False)
Bases: object
Context aware ParquetDataset with support for chunk writing.
Makes it easier to read / write a ParquetDataset. For example
>>> import pandas as pd
>>> from deepr.io import ParquetDataset
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> with ParquetDataset("viewfs://root/foo.parquet.snappy").open() as ds:
...     ds.write_pandas(df, chunk_size=100)
The use of context managers automatically opens / closes the dataset as well as the connection to the FileSystem.
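To make the idea of chunk writing concrete, here is a minimal sketch of how rows might be split into batches before being written. It assumes that `chunk_size` is the maximum number of rows per written chunk, which the text above does not spell out; `iter_chunks` is a hypothetical helper, not part of deepr.

```python
def iter_chunks(rows, chunk_size):
    """Yield successive slices of `rows` holding at most `chunk_size` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


# 250 rows split with chunk_size=100 (assumed semantics: max rows per chunk).
rows = list(range(250))
sizes = [len(chunk) for chunk in iter_chunks(rows, chunk_size=100)]
print(sizes)
```

Each yielded slice could then be handed to a writer one at a time, so that only one chunk needs to be serialized at once.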
- path_or_paths
Path to parquet dataset (directory or file), or list of files.
- filesystem
FileSystem. If None, it is inferred automatically later.
- Type:
FileSystem, Optional
- property pq_dataset
deepr.io.path module
Path Utilities
- class deepr.io.path.Path(*args)
Bases: object
Equivalent of pathlib.Path for local and HDFS FileSystem
Automatically opens and closes an HDFS connection if the path is an HDFS path.
Allows you to work with local / HDFS files in an agnostic manner.
Example
path = Path("viewfs://foo", "bar") / "baz"
path.parent.mkdir()
with path.open("r") as file:
    for line in file:
        print(line)
for path in path.glob("*"):
    print(path.is_file())
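For comparison, the same sequence of operations on a purely local path can be written with the standard library's `pathlib.Path`, which the class above is described as mirroring. This is only a local analogue run in a temporary directory; the `demo` function and the file contents are made up for illustration.

```python
import pathlib
import tempfile


def demo(root):
    """Create root/bar/baz, write two lines, read them back, and list root/bar."""
    path = pathlib.Path(root, "bar") / "baz"
    path.parent.mkdir()
    with path.open("w") as file:
        file.write("first\nsecond\n")
    with path.open("r") as file:
        lines = [line.rstrip("\n") for line in file]
    # glob("*") on the parent directory yields its direct children.
    kinds = [child.is_file() for child in path.parent.glob("*")]
    return lines, kinds


with tempfile.TemporaryDirectory() as tmp:
    lines, kinds = demo(tmp)
```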
- copy_dir(dest, recursive=False, filesystem=None)
Copy the files of the current path to dest, including directories if recursive is True.
- exists(filesystem=None)
Return True if the path points to an existing file or dir.
- Return type:
bool
- is_dir(filesystem=None)
Return True if the path points to a regular directory.
- Return type:
bool
- open(mode='r', encoding='utf-8', filesystem=None)
Open file on both HDFS and Local File Systems.
Example
Use a context manager like so
path = Path("viewfs://root/user/path/to/file.txt")
with path.open("w") as file:
    file.write("Hello world!")
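Opening a file on either kind of filesystem requires deciding which one a path belongs to. One possible way to make that decision is a prefix check on the path string, sketched below. The prefix list and the `looks_like_hdfs` helper are assumptions made for illustration; the documentation above does not say how the library performs this check.

```python
# Assumed URI schemes for HDFS paths (hypothetical, for illustration only).
HDFS_PREFIXES = ("hdfs://", "viewfs://")


def looks_like_hdfs(path):
    """Return True if the string form of `path` starts with an HDFS scheme."""
    return str(path).startswith(HDFS_PREFIXES)


print(looks_like_hdfs("viewfs://root/user/path/to/file.txt"))
print(looks_like_hdfs("/tmp/file.txt"))
```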
- property parent
Path to the parent of the current path
- property suffix
File extension of the file if any.
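For string paths that carry a URI scheme, the parent and the file extension can be approximated with the standard library's `posixpath` module, as in the sketch below. The helpers `parent_of` and `suffix_of` are hypothetical and are not the library's implementation of these properties.

```python
import posixpath


def parent_of(path):
    """Return everything before the last "/" of a string path."""
    return posixpath.dirname(path)


def suffix_of(path):
    """Return the file extension including the dot, or "" if there is none."""
    return posixpath.splitext(path)[1]


example = "viewfs://root/user/foo.txt"
print(parent_of(example))
print(suffix_of(example))
```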