DeepR: Build and Train Deep Learning Pipelines for Production
DeepR is a library for Deep Learning on top of Tensorflow 1.x that focuses on production capabilities. It makes it easy to define pipelines (via the Job abstraction), preprocess data (via the Prepro abstraction), design models (via the Layer abstraction) and train them either locally or on a Yarn cluster. It also integrates nicely with MLFlow and Graphite, allowing for production-ready logging capabilities.
It can be seen as a collection of generic tools and abstractions to be extended for more specific use cases. See the Use DeepR section for more information.
Submitting jobs and defining flexible pipelines is made possible by a config system based on simple dictionaries and import strings. It is similar in many ways to the Thinc config system or gin-config.
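To make the idea of "dictionaries and import strings" concrete, here is a minimal standalone sketch. It does not use DeepR's own API: the function name build_from_config and the convention of a "type" key holding the import string are assumptions made for illustration only.

```python
import importlib


def build_from_config(config):
    """Recursively instantiate objects described by plain dictionaries.

    Assumed convention (hypothetical, for illustration): a dict with a
    "type" key describes an object. "type" is an import string such as
    "package.module.Name" and the remaining keys are keyword arguments.
    Any other value is returned unchanged.
    """
    if isinstance(config, dict):
        built = {key: build_from_config(value) for key, value in config.items()}
        if "type" in built:
            module_name, _, attr_name = built.pop("type").rpartition(".")
            factory = getattr(importlib.import_module(module_name), attr_name)
            return factory(**built)
        return built
    if isinstance(config, list):
        return [build_from_config(item) for item in config]
    return config


# A config is plain data, so it can be stored as JSON and submitted as-is.
config = {"type": "datetime.timedelta", "days": 1, "hours": 12}
delta = build_from_config(config)
print(delta.total_seconds())
```

Because the config is plain data, swapping one component for another only requires changing a string, not the code that runs the pipeline.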
To get started with deepr, read the blog post, then go to the quickstart on Colab.
Why a Deep Learning Library based on TF1.x
Tensorflow 1.x provides great production-oriented capabilities, centered around the tf.estimator API. It makes it possible to deploy models using a protobuf with no Python code, and to optimize computational graphs with XLA compilation.
Although DeepR comes with a Layer interface (most similar to Google Trax and very close to most modern frameworks) that makes it easy to define models using a functional programming approach, most of its capabilities are orthogonal to it. Most of the building blocks expect generic Python types (for example, a Layer is merely a function fn(tensors, mode)).
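The "a layer is merely a function fn(tensors, mode)" idea can be sketched in plain Python, without Tensorflow. Everything below is hypothetical and only illustrates the functional style: the mode constants, the helper names and the use of lists in place of tensors are assumptions, not DeepR's Layer classes.

```python
# Hypothetical mode constants; a real project would more likely reuse the
# mode values supplied by its training framework.
TRAIN = "train"
PREDICT = "predict"


def scale(factor):
    """Return a layer that multiplies every input value by `factor`."""
    def layer(tensors, mode):
        return [factor * value for value in tensors]
    return layer


def mask_odd_positions(tensors, mode):
    """Zero out values at odd positions in training mode, pass through otherwise."""
    if mode == TRAIN:
        return [value if index % 2 == 0 else 0.0 for index, value in enumerate(tensors)]
    return list(tensors)


def sequential(*layers):
    """Compose layers left to right into one layer with the same signature."""
    def layer(tensors, mode):
        for fn in layers:
            tensors = fn(tensors, mode)
        return tensors
    return layer


model = sequential(scale(2.0), mask_odd_positions)
print(model([1.0, 2.0, 3.0], PREDICT))  # [2.0, 4.0, 6.0]
print(model([1.0, 2.0, 3.0], TRAIN))    # [2.0, 0.0, 6.0]
```

Since a composed layer has the same signature as its parts, any code that accepts a function fn(tensors, mode) works with single layers and whole models alike.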
Use DeepR
You can use DeepR as a simple Python library, reusing only a subset of the concepts (the config system is generic, for example), or build your own extension as a standalone Python package that depends on deepr.
Have a look at the examples submodule of deepr, which illustrates what a package built on top of deepr would look like. It defines custom jobs, layers, preprocessors and macros, as well as configs. Once your custom components are packaged in a library, it is easy to run configs with
deepr run config.json macros.json
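The command above takes a config file and a macros file. As a rough illustration of why the two are kept separate, the sketch below fills placeholders in a config with values from a macros dictionary. The "$macro:param" placeholder syntax and the fill_macros helper are assumptions made for this example; refer to the DeepR documentation for the actual conventions.

```python
import json


def fill_macros(config, macros):
    """Replace "$macro:param" strings in `config` with values from `macros`.

    The placeholder syntax is an assumption for illustration purposes.
    """
    if isinstance(config, dict):
        return {key: fill_macros(value, macros) for key, value in config.items()}
    if isinstance(config, list):
        return [fill_macros(item, macros) for item in config]
    if isinstance(config, str) and config.startswith("$"):
        macro_name, _, param_name = config[1:].partition(":")
        return macros[macro_name][param_name]
    return config


# Stand-ins for the contents of config.json and macros.json.
config = json.loads('{"learning_rate": "$params:learning_rate", "steps": [100, "$params:max_steps"]}')
macros = json.loads('{"params": {"learning_rate": 0.01, "max_steps": 1000}}')

resolved = fill_macros(config, macros)
print(json.dumps(resolved, sort_keys=True))
```

Keeping tunable values in a separate file means the same pipeline definition can be rerun with different parameters without editing the config itself.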
MovieLens Example
You can try using DeepR on the MovieLens dataset, consisting of movie ratings aggregated by users. The submodule movielens implements an AverageModel, a Transformer Model and a BPR loss, as well as jobs to build and evaluate models on this dataset.
You can jump to the notebook on Colab or use the command line.
pip install deepr[cpu] faiss_cpu
cd deepr/examples/movielens/configs
wget http://files.grouplens.org/datasets/movielens/ml-20m.zip
unzip ml-20m.zip
deepr run config.json macros.json
Installation
Prerequisites
Make sure you use python>=3.6 and an up-to-date version of pip and setuptools
python --version
pip install -U pip setuptools
It is recommended to install deepr in a new virtual environment. For example
python -m venv deepr
source deepr/bin/activate
pip install -U pip setuptools
pip install deepr[cpu]
Using Pip
If installing using pip and your own requirements.txt file, be aware that Tensorflow is listed in extras_require in the setup.py, which means that pip install deepr WON’T INSTALL Tensorflow. This is because the Tensorflow requirement is different depending on the platform (GPU or CPU-only).
You can specify which extras to use using the [cpu] or [gpu] argument like in the following examples
pip install deepr[cpu]
pip install deepr[gpu]
pip install -e ".[cpu]"
pip install -e ".[gpu]"
Or alternatively, pre-install Tensorflow separately like so
pip install tensorflow==1.15.2
pip install deepr
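Since a plain pip install deepr does not pull in Tensorflow, it can be useful to check for it before running anything. The snippet below is a small sketch using only the Python standard library; the is_installed helper is a name chosen for this example, not part of deepr.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the import system can locate `package_name`."""
    return importlib.util.find_spec(package_name) is not None


if not is_installed("tensorflow"):
    print(
        "Tensorflow not found: install deepr[cpu] or deepr[gpu], "
        "or install Tensorflow separately."
    )
```

The check only locates the package and does not import it, so it stays fast even when Tensorflow is present.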
From Source
First, clone the deepr repo on your local machine with
git clone https://github.com/criteo/deepr.git
cd deepr
To install from source in editable mode, run
make install-cpu
Or to install on a GPU enabled machine
make install-gpu
To install development tools and test requirements, run
make install-dev
Test
To run unit tests in your current environment, run
make test
To run integration tests in your current environment, run
make integration
To run lint + unit and integration tests in a fresh virtual environment, run
make venv-lint-test-integration
Lint
To run mypy, pylint and black --check:
make lint
To auto-format the code using black
make black
Command Line Tools
To get a list of available commands, run
deepr --help
Contributing
See CONTRIBUTING
Change log
See CHANGELOG
Main contributors
Main contributors and maintainers for deepr are Guillaume Genthial, Romain Beaumont, Denis Kuzin and Amine Benhalloum.