deepr.prepros.Map

class deepr.prepros.Map(map_func, on_dict=True, update=True, modes=None, num_parallel_calls=None)

Map a function on each element of a tf.data.Dataset.

A Map instance applies map_func to every element of a dataset. By default, elements are expected to be dictionaries; set on_dict=False if your dataset yields other structures (for example tensors or tuples).

If elements are dictionaries, the update argument controls whether the output of map_func is merged into each input dictionary (update=True) or replaces it (update=False).
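The interaction of on_dict and update can be summarised per element with a plain-Python sketch. It is not deepr's implementation: apply_map is a hypothetical helper, and the assumption that keys returned by map_func overwrite existing keys on collision is not confirmed by this page.

```python
from typing import Any, Callable, Dict


def apply_map(element: Any, map_func: Callable[[Any], Any],
              on_dict: bool = True, update: bool = True) -> Any:
    """Hypothetical per-element sketch of the on_dict/update semantics."""
    output = map_func(element)
    if on_dict and update:
        # Merge the output into the input dictionary.
        # Assumption: keys produced by map_func win on collision.
        return {**element, **output}
    # update=False, or non-dictionary elements: return map_func's output as-is.
    return output


def add_one(x: Dict[str, int]) -> Dict[str, int]:
    return {"b": x["a"] + 1}


print(apply_map({"a": 0}, add_one))                  # {'a': 0, 'b': 1}
print(apply_map({"a": 0}, add_one, update=False))    # {'b': 1}
print(apply_map(3, lambda x: x * 2, on_dict=False))  # 6
```

With on_dict=False the update flag has no effect in this sketch, matching the statement that update only concerns dictionary elements.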

NOTE: If map_func is a Layer, Map calls its forward (or forward_as_dict) method directly, bypassing Layer.__call__ to avoid its inspection overhead.

WARNING: If map_func is a Layer, Map.apply() does not forward the mode to it, so the layer always receives the default mode=None. This is intentional: it keeps the signature of map_func consistent with tf.data.Dataset.map.

To use a Layer with a specific mode, bind the mode with functools.partial:

>>> import tensorflow as tf
>>> from functools import partial
>>> from deepr import readers
>>> from deepr.layers import Sum
>>> from deepr.prepros import Map
>>> layer = Sum()
>>> prepro_fn = Map(partial(layer.forward_as_dict, mode=tf.estimator.ModeKeys.TRAIN))
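
Why binding the mode matters can be seen without TensorFlow. In this sketch, ToyLayer is a hypothetical stand-in for a Layer whose forward_as_dict accepts a mode keyword, and the string "train" stands in for tf.estimator.ModeKeys.TRAIN (whose value is "train").

```python
from functools import partial
from typing import Dict, Optional


class ToyLayer:
    """Hypothetical stand-in for a Layer; it only reports the mode it receives."""

    def forward_as_dict(self, tensors: Dict[str, int],
                        mode: Optional[str] = None) -> Dict[str, Optional[str]]:
        return {"seen_mode": mode}


layer = ToyLayer()

# A dataset-style map calls the function with the element only,
# so the `mode` keyword keeps its default value.
unbound = layer.forward_as_dict
print(unbound({"a": 0}))  # {'seen_mode': None}

# Binding the mode with functools.partial fixes it for every call.
bound = partial(layer.forward_as_dict, mode="train")
print(bound({"a": 0}))  # {'seen_mode': 'train'}
```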

For example, with update=True (the default behavior), the output of map_func is merged into each element:

>>> def gen():
...     yield {"a": 0}
>>> dataset = tf.data.Dataset.from_generator(gen, {"a": tf.int32}, {"a": tf.TensorShape([])})
>>> list(readers.from_dataset(dataset))
[{'a': 0}]
>>> def map_func(x):
...     return {"b": x["a"] + 1}
>>> prepro_fn = Map(map_func, update=True)
>>> list(readers.from_dataset(prepro_fn(dataset)))
[{'a': 0, 'b': 1}]

With update=False, each element is replaced by the output of map_func:

>>> prepro_fn = Map(map_func, update=False)
>>> list(readers.from_dataset(prepro_fn(dataset)))
[{'b': 1}]

Because some preprocessing steps should only run in certain modes (TRAIN, EVAL, PREDICT), you can restrict the transformation with the modes argument: map_func is applied only when the mode passed at call time is one of modes. For example:

>>> prepro_fn = Map(map_func, modes=[tf.estimator.ModeKeys.TRAIN])
>>> list(readers.from_dataset(prepro_fn(dataset, tf.estimator.ModeKeys.TRAIN)))
[{'a': 0, 'b': 1}]
>>> list(readers.from_dataset(prepro_fn(dataset, tf.estimator.ModeKeys.PREDICT)))
[{'a': 0}]

If no mode is given at call time, the transformation is always applied:

>>> list(readers.from_dataset(prepro_fn(dataset)))
[{'a': 0, 'b': 1}]
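
The activation rule described above can be written as a small predicate. is_active is a hypothetical helper reconstructed from the behavior in these examples, not deepr's code; the strings "train", "eval" and "infer" are the values of tf.estimator.ModeKeys.TRAIN, EVAL and PREDICT.

```python
from typing import Iterable, Optional


def is_active(modes: Optional[Iterable[str]], mode: Optional[str]) -> bool:
    """Hypothetical sketch of the rule deciding whether map_func is applied."""
    if modes is None:
        # modes=None: the map is active in every mode.
        return True
    if mode is None:
        # No mode given at call time: the transformation is applied.
        return True
    return mode in modes


train_only = ["train"]
print(is_active(train_only, "train"))  # True
print(is_active(train_only, "infer"))  # False
print(is_active(train_only, None))     # True
print(is_active(None, "eval"))         # True
```
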

map_func

Function applied to each element.

Type:

Callable[[Any], Any]

modes

Modes in which the map is active; the map is skipped for any mode not in modes. Default is None (active in all modes).

Type:

Iterable[str], Optional

num_parallel_calls

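Number of elements to process in parallel, as in the num_parallel_calls argument of tf.data.Dataset.map. Default is None.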

Type:

int

on_dict

If True (default), assumes the dataset yields dictionaries.

Type:

bool

update

If True (default), merge map_func(element) into element instead of replacing it. Only relevant when elements are dictionaries.

Type:

bool

__init__(map_func, on_dict=True, update=True, modes=None, num_parallel_calls=None)

Methods

__init__(map_func[, on_dict, update, modes, ...])

apply(dataset[, mode])

Pre-process a dataset

Attributes

tf_map_func

Return final map function.