{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install deepr[cpu]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Config\n", "\n", "It is possible (and hopefully intuitive) to translate code into a simple config syntax based off native python types (dictionaries, tuples, lists, etc.).\n", "\n", "The config system builds upon ideas from [Thinc](https://thinc.ai/docs/usage-config) and [gin-config](https://github.com/google/gin-config/blob/master/docs/index.md) as follows\n", "\n", "- Support arbitrary tree of objects (like `Thinc`), but not arbirary dependency injection like `gin-config`\n", "- Re-use macro syntax \"$macro:name\" from `Thinc`\n", "- Re-use special keyword \"@reference\" for references from `gin-config`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Design Requirements\n", "\n", "Features\n", "\n", "- Trees of class instances of any type.\n", "- Static macros (simple parameter values).\n", "- Dynamic macros (parameter values created at run time, eg. the MLFlow run id).\n", "- Positional and keyword arguments support.\n", "- References to the config and macros themselves for future use (upload to MLFlow for example).\n", "- Config evaluation mode: allow attributes to be configs, delegating objects instantiation to the parent.\n", "- Easy serialization to `json`.\n", "- No \"python magic\": avoid having registries and hidden configuration in package level variables that would not be passed along when sending jobs to remote machines (configs should be self-contained).\n", "- No special `Config` class (ideally dictionaries should be enough).\n", "- Unicity: there should ideally be only one way to write a config for a given object.\n", "\n", "Don't support\n", "\n", "- Classes : objects in config are necessarily instances of classes or literals, i.e. 
`dict`, `tuple`, `float`, `int`, etc.\n", "- Object references and scoping: don't allow references to other objects defined in the config, as it might lead to confusion (singleton issue). It would also break unicity, as there would be multiple ways to define the \"same\" config.\n", "- Singletons: the singleton problem can be (and should be) solved in the code rather than in the config." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import logging\n", "import sys\n", "logging.basicConfig(level=logging.INFO, stream=sys.stdout)\n", "logging.getLogger(\"tensorflow\").setLevel(logging.CRITICAL)\n", "\n", "import deepr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Arbitrary tree of objects\n", "\n", "A `Config` is any arbitrary nested dictionary that corresponds to a tree of Python objects.\n", "\n", "An instance of a class can be described as a dictionary with the following special key:\n", "\n", "- `type`: the full import string of the class to instantiate.\n", "\n", "All other keys will be treated as keyword arguments at instantiation time.\n", "\n", "For example, given the following class and configuration:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.1" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class Model:\n", "    def __init__(self, learning_rate):\n", "        self.learning_rate = learning_rate\n", "\n", "config = {\n", "    \"type\": \"__main__.Model\",\n", "    \"learning_rate\": 0.1\n", "}\n", "\n", "parsed = deepr.parse_config(config)\n", "model = deepr.from_config(parsed)\n", "model.learning_rate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Macros\n", "\n", "It is possible to define macro variables and reference them in the config.\n", "\n", "For example, the cell below defines a single macro, `params`, with one parameter, `learning_rate`.\n", "\n", 
"\n", "A config can use the learning rate using the `$macro:param` convention:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'type': 'Model', 'learning_rate': 0.1}\n" ] } ], "source": [ "macros = {\n", " \"params\": {\n", " \"learning_rate\": 0.1\n", " }\n", "}\n", "\n", "config = {\n", " \"type\": \"Model\",\n", " \"learning_rate\": \"$params:learning_rate\"\n", "}\n", "\n", "print(deepr.parse_config(config, macros))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dynamic Macros\n", "\n", "For more advanced uses, it is possible to define *dynamic* macros as configs of instances of classes inheriting dict.\n", "\n", "For example" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'type': '__main__.Job', 'year': '2020'}\n", "2020\n" ] } ], "source": [ "from datetime import datetime\n", "\n", "\n", "class DateMacro(dict):\n", " def __init__(self):\n", " super().__init__(year=datetime.now().strftime('%Y'))\n", "\n", " \n", "class Job:\n", " def __init__(self, year):\n", " self.year = year\n", "\n", "macros = {\n", " \"date\": {\n", " \"type\": \"__main__.DateMacro\"\n", " }\n", "}\n", "config = {\n", " \"type\": \"__main__.Job\",\n", " \"year\": \"$date:year\"\n", "}\n", "parsed = deepr.parse_config(config, macros)\n", "print(parsed)\n", "job = deepr.from_config(parsed)\n", "print(job.year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Positional Arguments\n", "\n", "Use the special key `*` for positional arguments.\n", "\n", "For example if you have the following class and configuration" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1, 2, 3)\n" ] } ], "source": [ "class Model:\n", " def __init__(self, *layers):\n", " self.layers = layers\n", "\n", "config = {\n", " 
\"type\": \"__main__.Model\",\n", " \"*\": [1, 2, 3]\n", "}\n", "parsed = deepr.parse_config(config)\n", "model = deepr.from_config(parsed)\n", "print(model.layers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Argument as config\n", "\n", "In some cases, the argument to be provided to an instance should be left as a config. To specify that a block should not be instantiated but kept as a dict config, use the special key:\n", "\n", "- `eval` (Optional): can take 3 values: \n", " - `\"call\"`: call the class or function referenced by \"type\" with the provided arguments.\n", " - `\"partial\"`: return a callable equivalent to the callable referenced by \"type\" with pre-filled arguments.\n", " - `None`: the dictionary will be kept as is and no instance will be created.\n", "\n", "\n", "For example if you have the following class and configuration" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'type': '__main__.Model', '*': [1, 2, 3]}\n" ] } ], "source": [ "class Job:\n", " def __init__(self, model_config):\n", " self.model_config = model_config\n", " \n", " def run(self):\n", " # Job is responsible for instantiating the model from the config\n", " model = from_config(self.model_config)\n", "\n", "config = {\n", " \"type\": \"__main__.Job\",\n", " \"model_config\": {\n", " \"type\": \"__main__.Model\",\n", " \"eval\": None,\n", " \"*\": [1, 2, 3]\n", " }\n", "}\n", "\n", "parsed = deepr.parse_config(config)\n", "job = deepr.from_config(parsed)\n", "print(job.model_config)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### References\n", "\n", "Though references to other objects defined in the tree is not possible, it is possible to use 3 special references\n", "\n", "- `@self`: refers to the unparsed config\n", "- `@macros`: refers to the unparsed macros\n", "- `@macros_eval`: refers to the evaluated macros\n", "\n", "For example" ] }, { 
"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:deepr.config.base:- MACRO PARAM NOT USED: macro = 'date', param = 'year'\n", "WARNING:deepr.config.base:- MACRO PARAM NOT USED: macro = 'params', param = 'learning_rate'\n", "{'config': '@self', 'macros': '@macros', 'macros_eval': '@macros_eval', 'eval': None}\n", "{'date': {'type': '__main__.DateMacro'}, 'params': {'learning_rate': 0.1}, 'eval': None}\n", "{'date': {'year': '2020'}, 'params': {'learning_rate': 0.1}, 'eval': None}\n" ] } ], "source": [ "macros = {\n", " \"date\": {\n", " \"type\": \"__main__.DateMacro\"\n", " },\n", " \"params\": {\n", " \"learning_rate\": 0.1\n", " }\n", "}\n", "\n", "config = {\n", " \"config\": \"@self\",\n", " \"macros\": \"@macros\",\n", " \"macros_eval\": \"@macros_eval\"\n", "}\n", "\n", "parsed = deepr.parse_config(config, macros)\n", "print(parsed[\"config\"])\n", "print(parsed[\"macros\"])\n", "print(parsed[\"macros_eval\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The intended use of these references is mainly for logging and should be used sparingly." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 4 }