DeepR: Build and Train Deep Learning Pipelines for Production
DeepR is a library for Deep Learning on top of Tensorflow 1.x that focuses on production capabilities. It makes it easy to define pipelines (via the Job abstraction), preprocess data (via the Prepro abstraction), design models (via the Layer abstraction) and train them either locally or on a Yarn cluster. It also integrates with MLFlow and Graphite, providing production-ready logging capabilities.
It can be seen as a collection of generic tools and abstractions to be extended for more specific use cases. See the Use DeepR section for more information.
Submitting jobs and defining flexible pipelines is made possible by a config system based on simple dictionaries and import strings. It is similar in many ways to the Thinc config system or gin config.
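To give an intuition of what "dictionaries and import strings" can mean in practice, here is a minimal sketch of such a config system in plain python. It is not DeepR's actual implementation: the helper names import_object and from_config, as well as the use of a "type" key to hold the import string, are assumptions made for illustration.

```python
import importlib


def import_object(import_string):
    """Resolve a dotted import string such as 'datetime.timedelta'."""
    module_name, _, attribute = import_string.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def from_config(config):
    """Recursively turn nested dictionaries into python objects.

    A dictionary with a "type" key (hypothetical convention) is replaced by
    the result of calling the imported object with the remaining keys as
    keyword arguments. Other values are returned unchanged.
    """
    if isinstance(config, dict):
        params = {key: from_config(value) for key, value in config.items() if key != "type"}
        if "type" in config:
            return import_object(config["type"])(**params)
        return params
    if isinstance(config, list):
        return [from_config(item) for item in config]
    return config


# A nested config: the "step" entry is instantiated from its import string
config = {
    "name": "demo",
    "step": {"type": "datetime.timedelta", "minutes": 90},
}
built = from_config(config)
print(built["name"], built["step"])
```

Because the config is only made of dictionaries, strings and numbers, it can be stored as JSON and submitted to another machine, which then resolves the import strings locally.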
Why a Deep Learning Library based on TF1.x
Tensorflow 1.x provides great production-oriented capabilities, centered around the tf.Estimator API. It makes it possible to deploy models using a protobuf with no python code, and to optimize computational graphs with XLA compilation.
DeepR comes with a Layer interface (most similar to Google Trax and very close to most modern frameworks) that makes it easy to define models using a functional programming approach, but most of its capabilities are orthogonal to it. Most of the building blocks expect generic python types (for example, a Layer is merely a function).
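The idea that a layer is merely a function can be sketched without any Tensorflow code. The example below is not DeepR's Layer API: plain lists of floats stand in for tensors, and the names scale_and_shift, relu and sequential are hypothetical helpers chosen for illustration.

```python
from functools import reduce


def scale_and_shift(weight, bias):
    """Return a layer (a plain function) computing weight * x + bias elementwise."""
    def layer(values):
        return [weight * value + bias for value in values]
    return layer


def relu(values):
    """A layer without parameters: clip negative values to zero."""
    return [max(0.0, value) for value in values]


def sequential(*layers):
    """Compose layers left to right into a single function."""
    def model(values):
        return reduce(lambda outputs, layer: layer(outputs), layers, values)
    return model


# A model is itself a function, so it can be reused as a layer in a larger model
model = sequential(scale_and_shift(2.0, -1.0), relu)
output = model([0.0, 1.0, 2.0])
print(output)  # [0.0, 1.0, 3.0]
```

Since every building block only expects and returns generic python objects, blocks written this way can be tested and combined independently of the rest of a pipeline.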
You can use DeepR as a simple python library, reusing only a subset of the concepts (the config system is generic for example), or build your own extension as a standalone python package that depends on it.
Have a look at the examples submodule of deepr, which illustrates what packages built on top of deepr would look like. It defines custom jobs, layers, preprocessors, macros as well as configs. Once your custom components are packaged in a library, it is easy to run configs with
deepr run config.json macros.json
You can try using DeepR on the MovieLens dataset, consisting of movie ratings aggregated by users. The submodule movielens implements an AverageModel, a Transformer Model and a BPR loss as well as jobs to build and evaluate on this dataset.
You can jump to the notebook on Colab or use the command line.
pip install deepr[cpu] faiss_cpu
cd deepr/examples/movielens/configs
wget http://files.grouplens.org/datasets/movielens/ml-20m.zip
unzip ml-20m.zip
deepr run config.json macros.json
Make sure you use python>=3.6 and an up-to-date version of pip and setuptools
python --version
pip install -U pip setuptools
It is recommended to install deepr in a new virtual environment. For example
python -m venv deepr
source deepr/bin/activate
pip install -U pip setuptools
pip install deepr[cpu]
If installing using pip and your own requirements.txt file, be aware that Tensorflow is listed in extras_require in the setup.py, which means that pip install deepr WON’T INSTALL Tensorflow. This is because the Tensorflow requirement is different depending on the platform (GPU or CPU-only).
You can specify which extras to use with the [cpu] or [gpu] argument, like in the following examples
pip install deepr[cpu]
pip install deepr[gpu]
pip install -e ".[cpu]"
pip install -e ".[gpu]"
Or alternatively, pre-install Tensorflow separately like so
pip install tensorflow==1.15.2
pip install deepr
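Since a plain pip install deepr does not pull Tensorflow, it can be useful to check the environment before running anything. The snippet below is a small sketch using only the python standard library; the function name missing_requirements is hypothetical and not part of deepr.

```python
import importlib.util
import sys


def missing_requirements(modules=("tensorflow",), min_python=(3, 6)):
    """Return a list of human-readable problems found in the current environment."""
    problems = []
    if sys.version_info < min_python:
        problems.append("python>=%d.%d is required" % min_python)
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            problems.append("module '%s' is not installed" % name)
    return problems


for problem in missing_requirements():
    print(problem)
```

If the check reports that tensorflow is missing, installing one of the extras above or pre-installing Tensorflow should resolve it.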
First, clone the deepr repo on your local machine with
git clone https://github.com/criteo/deepr.git
cd deepr
To install from source in editable mode, run
Or to install on a GPU enabled machine
To install development tools and test requirements, run
To run unit tests in your current environment, run
To run integration tests in your current environment, run
To run lint + unit and integration tests in a fresh virtual environment, run
To auto-format the code using
Command Line Tools
To get a list of available commands, run