Wednesday 15 November 2017

Dropbox releases PyAnnotate -- auto-generate type annotations for mypy

For statically checking Python code, mypy is great, but it only works after you have added type annotations to your codebase. When you have a large codebase, this can be painful. At Dropbox we’ve annotated over 1.2 million lines of code (about 20% of our total Python codebase), so we know how much work this can be. It’s worth it though: the payoff is fantastic.

To easy the pain of adding type annotations to existing code, we’ve developed a tool, PyAnnotate, that observes what types are actually used at runtime, and inserts annotations into your source code based on those observations. We’ve now open-sourced the tool.

Here’s how it works. To start, you run your code with a special profiling hook enabled. This observes all call arguments and return values and records the observed types in memory. At the end of a run the data is dumped to a file in JSON format. A separate command-line utility can then read this JSON file and use it to add inline annotations to your source code.

Once the automatic annotations have been added you should run mypy on the updated files to verify that the generated annotations make sense, and most likely you will have to tweak some annotations by hand. But once you get the hang of it, PyAnnotate can save you a lot of work when adding annotations to a large codebase, and that’s nothing to sneeze at.

PyAnnotate is intended to ease the work of adding type annotations to existing (may we say legacy”) code. For new code we recommend adding annotations as you write the code type annotations should reflect the intention of the author, and there’s no better time to capture those intentions than when initially writing the code.

Example

Here’s a little program that defines a gcd() function and calls it a few times:

# gcd.py
def main():
    print(gcd(15, 10))
    print(gcd(45, 12))

def gcd(a, b):
    while b:
        a, b = b, a%b
    return a

We also need a driver script:

# driver.py
from gcd import main
from pyannotate_runtime import collect_types

if __name__ == '__main__':
    collect_types.init_types_collection()
    with collect_types.collect():
        main()
    collect_types.dump_stats('type_info.json')

Now let’s install PyAnnotate and run the driver script. The standard output is just what the main()  function prints:

$ pip install pyannotate
$ python driver.py
5
3

At this point, if you’re curious, you can look in type_info.json to see what types were recorded:

[
    {
        "path": "gcd.py",
        "line": 1,
        "func_name": "main",
        "type_comments": [
            "() -> None"
        ],
        "samples": 1
    },
    {
        "path": "gcd.py",
        "line": 5,
        "func_name": "gcd",
        "type_comments": [
            "(int, int) -> int"
        ],
        "samples": 2
    }
]

Let’s go ahead and annotate the gcd.py file (the -w flag means go ahead, update the file”):

$ pyannotate -w gcd.py
Refactored gcd.py
--- gcd.py        (original)
+++ gcd.py        (refactored)
@@ -1,8 +1,10 @@
 def main():
+    # type: () -> None
     print(gcd(15, 10))
     print(gcd(45, 12))
 def gcd(a, b):
+    # type: (int, int) -> int
     while b:
         a, b = b, a%b
     return a
Files that were modified:
gcd.py
$ mypy gcd.py

Since the original file was so small, the unified diff printed by PyAnnotate happens to show the entire file. Note that the final command shows that mypy is happy with the results! If you’d rather not see the diff output, use the -q flag.

Where to get PyAnnotate

The full source code for PyAnnotate is on GitHub: https://github.com/dropbox/pyannotate
If you’d rather just install and use it, a source distribution and a universal wheel” file exist on PyPI: https://pypi.python.org/pypi/pyannotate

Caveats

PyAnnotate doesn’t actually inspect every call the profiling hooks have a lot of overhead and a test scenario for a large application would run too slowly. For details on the downsampling” it performs see the source code. For the same reasons it also doesn’t inspect every item of a list or dictionary in fact it only inspects the first four. And if a function is called with many different argument types, it will only preserve the first eight that differ significantly.

The generated annotations are only as good as the test scenario you’re using. If you have a function that is written to take either an integer or a string, but your test scenario only calls it with integers, the annotations will say arg: int. If your function may return None or a dictionary, but in your test scenario it always returns None, the annotation will say -> None.

Annotations for functions with *args or **kwds in their signature will likely require manual cleanup.

PyAnnotate currently doesn’t generate Abstract Base Classes (ABCs) such as Iterable or Mapping. If you’re a fan of these you will have to do some manual tweaking to the output.

Since NewType and TypedDict are invisible at runtime, PyAnnotate won’t generate those.

Because of a limitation of the profiling hook, the runtime collection code cannot tell the difference between raising an exception and returning None.

Code in __main__ is currently ignored, because the module name doesn’t correspond to the filename.

The current version of PyAnnotate has been designed to work with either Python 2 or Python 3, but we’ve only used it for Python 2 code ourselves (since that’s what we have) and the tool currently generates type comments that are compatible with Python 2. We’re happy to accept a pull request adding the capability to generate Python 3 style annotations though.

Surely there are some bugs left in our code. We’re happy to accept bug reports in our tracker and bug fixes as pull requests. Note that all code contributions require filling out the Dropbox Contributor License Agreement (CLA).

We consider the current release a prototype, and we don’t promise that future versions will be backwards compatible. In fact, we hope to find the time to make significant changes. We just didn’t feel we would do anyone a favor by sitting on the code longer than we already have.