deepr.prepros.Prepro

class deepr.prepros.Prepro[source]

Base class for composable preprocessing functions.

Prepro are the basic building blocks of a preprocessing pipeline. A Prepro defines a function on a tf.data.Dataset.

The basic usage of a Prepro is to apply it on a Dataset. For example: >>> from deepr import readers >>> from deepr.prepros import Map >>> def gen(): … for i in range(3): … yield {“a”: i} >>> raw_dataset = tf.data.Dataset.from_generator(gen, {“a”: tf.int32}, {“a”: tf.TensorShape([])}) >>> list(readers.from_dataset(raw_dataset)) [{‘a’: 0}, {‘a’: 1}, {‘a’: 2}] >>> prepro_fn = Map(lambda x: {‘a’: x[‘a’] + 1}) >>> dataset = prepro_fn(raw_dataset) >>> list(readers.from_dataset(dataset)) [{‘a’: 1}, {‘a’: 2}, {‘a’: 3}]

Because some preprocessing pipelines behave differently depending on the mode (TRAIN, EVAL, PREDICT), an optional argument can be provided: >>> def map_func(element, mode=None): … if mode == tf.estimator.ModeKeys.PREDICT: … return {‘a’: 0} … else: … return element >>> prepro_fn = Map(map_func) >>> list(readers.from_dataset(raw_dataset)) [{‘a’: 0}, {‘a’: 1}, {‘a’: 2}] >>> dataset = prepro_fn(raw_dataset, mode=tf.estimator.ModeKeys.TRAIN) >>> list(readers.from_dataset(dataset)) [{‘a’: 0}, {‘a’: 1}, {‘a’: 2}] >>> dataset = prepro_fn(raw_dataset, mode=tf.estimator.ModeKeys.PREDICT) >>> list(readers.from_dataset(dataset)) [{‘a’: 0}, {‘a’: 1}, {‘a’: 2}]

TODO: Actually mode in map_func is not taken into account

Map, Filter, Shuffle and Repeat have a special attribute modes that you can use to specify the modes on which the preprocessing should be applied. For example: >>> def map_func(element, mode=None): … return {‘a’: 0} >>> prepro_fn = Map(map_func, modes=[tf.estimator.ModeKeys.PREDICT]) >>> dataset = prepro_fn(raw_dataset, tf.estimator.ModeKeys.TRAIN) >>> list(readers.from_dataset(dataset)) [{‘a’: 0}, {‘a’: 1}, {‘a’: 2}] >>> dataset = prepro_fn(dataset, tf.estimator.ModeKeys.PREDICT) >>> list(readers.from_dataset(dataset)) [{‘a’: 0}, {‘a’: 0}, {‘a’: 0}]

Authors of new Prepro subclasses typically override the apply method of the base Prepro class:

def apply(self, dataset: tf.data.Dataset, mode: str = None) -> tf.data.Dataset:
    return dataset

The easiest way to define custom preprocessors is to use the prepro decorator (see documentation).

__init__()

Methods

__init__()

apply(dataset[, mode])

Pre-process a dataset