deepr.prepros.Map
- class deepr.prepros.Map(map_func, on_dict=True, update=True, modes=None, num_parallel_calls=None)
Map a function on each element of a tf.data.Dataset.
A Map instance applies a map_func to all elements of a dataset. By default, elements are expected to be dictionaries. You can set on_dict=False if your dataset does not yield dictionaries.
If elements are dictionaries, you can use the additional argument update to choose whether to update the dictionaries instead of overriding them.
NOTE: If map_func is a Layer, the Map directly uses forward or forward_as_dict to avoid inspection overhead from the Layer.__call__ method.
WARNING: If map_func is a Layer, the mode will not be forwarded by the Map.apply() call, and the default None will always be used. This is intended to keep the signature of the generic map_func in line with the tf.data.Dataset.map method.
If you wish to use a Layer with a given mode, you can do
>>> from functools import partial
>>> import tensorflow as tf
>>> from deepr import readers
>>> from deepr.layers import Sum
>>> from deepr.prepros import Map
>>> layer = Sum()
>>> prepro_fn = Map(partial(layer.forward_as_dict, mode=tf.estimator.ModeKeys.TRAIN))
For example, by setting update=True (DEFAULT behavior)
>>> def gen():
...     yield {"a": 0}
>>> dataset = tf.data.Dataset.from_generator(gen, {"a": tf.int32}, {"a": tf.TensorShape([])})
>>> list(readers.from_dataset(dataset))
[{'a': 0}]
>>> def map_func(x):
...     return {"b": x["a"] + 1}
>>> prepro_fn = Map(map_func, update=True)
>>> list(readers.from_dataset(prepro_fn(dataset)))
[{'a': 0, 'b': 1}]
On the other hand, update=False yields the output of the map_func
>>> prepro_fn = Map(map_func, update=False)
>>> list(readers.from_dataset(prepro_fn(dataset)))
[{'b': 1}]
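The difference between the two settings can be pictured on plain Python dictionaries, without TensorFlow. The sketch below is illustrative only: apply_map is a hypothetical helper, not part of deepr, and it assumes that with update=True the keys returned by map_func take precedence over existing keys of the same name, which the documentation above does not state.

```python
def apply_map(element, map_func, update=True):
    """Hypothetical helper mimicking the documented `update` flag on one dict."""
    result = map_func(element)
    if update:
        # Keep the original keys and add the new ones.
        # Assumption: on a key clash, the map_func output wins.
        return {**element, **result}
    # update=False: only the map_func output is kept
    return result


def map_func(x):
    return {"b": x["a"] + 1}


merged = apply_map({"a": 0}, map_func, update=True)
replaced = apply_map({"a": 0}, map_func, update=False)
print(merged)    # {'a': 0, 'b': 1}
print(replaced)  # {'b': 1}
```

The two printed values match the outputs of the update=True and update=False examples above.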
Because some preprocessing pipelines behave differently depending on the mode (TRAIN, EVAL, PREDICT), an optional argument can be provided. By setting modes, you select the modes on which the map transformation should apply. For example:
>>> prepro_fn = Map(map_func, modes=[tf.estimator.ModeKeys.TRAIN])
>>> list(readers.from_dataset(prepro_fn(dataset, tf.estimator.ModeKeys.TRAIN)))
[{'a': 0, 'b': 1}]
>>> list(readers.from_dataset(prepro_fn(dataset, tf.estimator.ModeKeys.PREDICT)))
[{'a': 0}]
If the mode is not given at runtime, the preprocessing is applied.
>>> list(readers.from_dataset(prepro_fn(dataset)))
[{'a': 0, 'b': 1}]
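The mode selection described above can be summarized as a small predicate. This is a sketch of the documented behavior, not deepr's implementation: is_active is a hypothetical name, and the strings "train" and "infer" stand in for tf.estimator.ModeKeys.TRAIN and tf.estimator.ModeKeys.PREDICT.

```python
from typing import Iterable, Optional


def is_active(mode: Optional[str] = None, modes: Optional[Iterable[str]] = None) -> bool:
    """Hypothetical predicate: should the map be applied for this mode?"""
    # modes=None means all modes are considered active
    if modes is None:
        return True
    # No mode given at runtime: the preprocessing is applied
    if mode is None:
        return True
    # Otherwise apply only when the runtime mode is listed
    return mode in modes


train_only = ["train"]
results = {
    "train": is_active("train", train_only),
    "infer": is_active("infer", train_only),
    "unspecified": is_active(None, train_only),
}
```

When the predicate is false, the dataset would be returned unchanged, as in the PREDICT example above.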
- map_func
Function to map to each element
- Type:
Callable[[Any], Any]
- modes
Active modes for the map (will skip modes not in modes). Default is None (all modes are considered active modes).
- Type:
Iterable[str], Optional
Methods
__init__(map_func[, on_dict, update, modes, ...])
apply(dataset[, mode])
Pre-process a dataset
Attributes
tf_map_func
Return final map function.