Wednesday, 17 March 2021

Summer of Code with Mypy

TH3CHARLie, Xuanda Yang
th3charlie at gmail dot com
GitHub Profile
GSoC Project Page
GSoC Submission Page

Introduction

In Fall 2019, I was writing a course project in Python and got tired of finding type errors at the last minute, so I did some searching and tried mypy for the first time. Back then, I would never have dreamed of doing a three-month coding project with the organization, yet that became reality last summer. In this post, I'd like to share this very personal story: how I started contributing to mypy and eventually became a Google Summer of Code (GSoC) student for the project.

Pre-application

After finishing my course project with mypy, I took a look at its GitHub page out of curiosity and paid extra attention to the issue tracker. I searched for issues with the "good first issue" label and was surprised to find that most of them had comments from the maintainers explaining how to tackle them. Reading through these comments, I found some issues that fit my level and started working on them.

Before submitting a pull request, I read through the repository and every piece of material that seemed useful for writing a PR, including the README file, the documentation and the wiki. I learned how to install the development version of mypy, how to test my PR locally, and some high-level ideas about the codebase. This saved me a lot of trouble in my later contributions to the project. My first PR to the project was Support Negative Int Literal, and it was reviewed by Ivan. He patiently guided me through refining the PR step by step and finally merged it into the main branch. The satisfaction of seeing my code enter the codebase of a tool used by people around the globe was huge, and it encouraged me to attempt other good first issues. By the end of 2019, I had submitted around 20 PRs to the repository. Most of them were very simple tasks, like producing meaningful error messages or fixing a corner case of a specific type-checking rule.

Looking back now, writing all these simple PRs helped me gradually familiarize myself with the codebase and the development workflow, which became a valuable asset when applying to the GSoC project.

Applying to GSoC

Google Summer of Code (GSoC) is an annual program in which students work with open source communities through the summer and receive a stipend from Google. Before Google announced last year's selected organizations and project ideas on Feb 20th, one of mypy's maintainers posted an issue named Project ideas, listing several time-consuming project ideas. At that time, I was eager to try something more challenging than fixing small bugs, so I spent some time reading through them and tried to figure out which ones fit my level. A few days later, the official organization list for GSoC 2020 came out: mypy was on it, and the previously mentioned issue served as its list of potential project ideas. What a great match, I said to myself. So I made up my mind and started drafting my application.

A GSoC student application usually consists of two parts. The first, mandatory part is the project proposal (or proposals). Each student can choose up to three project ideas from three different organizations. Finding suitable ideas among hundreds is usually hard, since even understanding the descriptions takes effort and sometimes domain knowledge. Thanks to my previous contributions to mypy, I didn't spend much time on this and quickly narrowed my search to compiler-related topics. Generalize mypyc IR to Make non-C Backends Possible from mypy and another topic from LLVM were my choices. However, the latter was claimed by someone else through the mailing list before I even had the chance to take a closer look at it. So mypy became the only organization left for me to apply to.

To write a good proposal, I started discussing directly with the maintainers (my future mentors) in the issue tracker. I'd say this is one of the best ways to prepare a proposal. By doing this, you express your passion and demonstrate your background and skills directly to the mentors, and the mentors can provide details of the project that help you get a better picture. Don't be afraid to start a discussion: open-source communities often have very nice people, and mypy is definitely one of them. Jukka Lehtosalo and Michael (Sully) Sullivan provided tons of good advice that helped me formulate a plan for the project and finish my proposal. I also put together a prototype PR addressing one core change in my proposal before the deadline. Although it never got merged, since we later took a different approach, a working prototype is always welcome and I'd say it increases your chance of getting accepted.

The second, optional part consists of any organization-specific requirements. Last year, mypy's posted requirements for applicants covered course background, programming skills and submitted PRs, along with an email giving the details.

After I submitted my proposal on GSoC's official site and sent an email to the mypy team, I did a short video interview with them. I talked about what I would do to make the project successful and shared a bit more about my background beyond what was in the email. The interview went quite well, and several days later I received an email from the mypy team telling me I had been selected by the organization. Then all I had to do was wait for the official announcement from Google. Days later, I officially became the first GSoC student of the mypy project, with my coding summer about to begin.

Coding Through the Summer

After getting accepted to GSoC, what's left is to code through the summer. The project I worked on was Generalize mypyc IR to Make non-C Backends Possible. In short, it redesigned and refactored parts of the old mypyc IR to remove embedded C backend-specific information. You can find the detailed technical report on the submission page. Here, I'd like to talk about other things that played as important a role as coding.

Plan the schedule and monitor progress. Three months pass in the blink of an eye, so it's important to set realistic goals with specific deadlines to keep the project on track. We had a rough schedule in the proposal and established monthly goals in later discussions. With the schedule and goals in mind, I was able to make continuous progress. I even gained an extra week. Usually, before the coding period starts in June, there is a one-month community bonding period in which students familiarize themselves with the project's workflow and connect with their mentors. Thanks to my previous contributions to mypy, I felt it would be fine to skip this period and start coding one week earlier than scheduled. This extra week gave my mentors and me a lot of flexibility to handle unexpected delays.

Communication is key. Jukka, one of my mentors, told me that it is always hard to come up with the perfect design up front, since requirements keep evolving. To cope with this, we communicated a lot. I had daily syncs with my mentors through Gitter chat, weekly meetings with Jukka, and monthly meetings with both Jukka and Sully. In these meetings, I shared recent progress, they helped me remove blockers, and we discussed component designs. I am one hundred percent sure that this project would never have been this successful without such frequent and effective communication.

After the Summer

I completed the project and submitted the required materials in early September, with around 60 PRs merged into the repository. For me, the GSoC project ended well: I received the certificate from Google as well as the stipend. More importantly, I demonstrated that I had the passion and skills to contribute to the organization as more than just a community contributor, and I was honored to become one of mypy's committers. That means a lot to me, and I am looking forward to contributing more to the community.

That's my story with mypy through Google Summer of Code 2020. I hope this post provides useful information to potential future applicants, including this year's, since mypy has officially been selected as one of this year's organizations. If you have any questions about this post, feel free to send me an email; I will be more than happy to discuss.

Acknowledgements

I'd like to express my sincere gratitude to every person and organization that made my GSoC 2020 adventure possible. First, I'd like to thank Google and mypy for giving me this opportunity.

In particular, I'd like to thank my mentors Jukka Lehtosalo and Michael Sullivan. We had daily syncs on Gitter and weekly/monthly video meetings via Zoom. In every discussion, they helped me clarify my thoughts and find the best approach to meet our goals. They responded quickly to my PRs with high-quality review comments and suggestions. They mentored me with patience and passion, and I felt connected even though we were several time zones and continents apart. Their guidance went beyond the scope of the project and helped me build good software engineering skills along the way.

I would also like to thank my parents for supporting me in working on this project, and Yaozhu Sun from the University of Hong Kong, who sparked my passion for the field of compilers and programming languages two years ago. Finally, I'd like to thank Kanemura Miku from Hinatazaka 46 for all the mental support during this special summer.

Disclaimer

All content of this post represents only the author's personal opinions. The post does not, in any way, represent the views or opinions of Google or the mypy community.

Friday, 19 February 2021

Mypy 0.812 Released

We’ve just uploaded mypy 0.812 to the Python Package Index (PyPI). Mypy is a static type checker for Python. This release includes a fix to a regression in source file finding logic in mypy 0.800, and a new command-line option --exclude to exclude paths from the build. You can install it as follows:

    python3 -m pip install -U mypy

You can read the full documentation for this release on Read the Docs.

Improved Source File Finding

Mypy 0.800 changed how mypy finds modules if you run mypy as mypy directory/ or mypy -p package. Mypy started looking for source files in directories without a __init__.py file. This is often the expected behavior, and it avoids excluding some files that should be type checked.

However, this caused issues for some users, such as when using mypy . to type check all files under the current directory. Mypy could now try to type check files inside nested virtual environments and node_modules directories, which is usually not desirable. This could result in mypy needlessly complaining about duplicate module names, in particular.

Now mypy will skip directories named site-packages or node_modules, and any directory beginning with a dot (such as .git) when recursively looking for files to check.

This doesn’t affect how mypy resolves imports — it only affects when mypy is given a directory or a package to type check. You can override the exclusions by explicitly passing the files on the command line.
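The skip rule above is simple enough to sketch in a few lines of Python. This is not mypy's implementation: find_sources and should_skip_dir are hypothetical helpers that apply the rule as described (skip site-packages, node_modules and any directory whose name starts with a dot) while still recursing into directories that lack __init__.py, as mypy 0.800 started doing.

```python
import os
from typing import List

# Directory names skipped during recursive discovery, per the notes above.
SKIPPED_NAMES = {"site-packages", "node_modules"}


def should_skip_dir(name: str) -> bool:
    """Hypothetical helper: True for directories the 0.812 rule skips."""
    return name in SKIPPED_NAMES or name.startswith(".")


def find_sources(root: str) -> List[str]:
    """Hypothetical helper: collect .py/.pyi files under root.

    Directories without __init__.py are still entered; only the skipped
    names are pruned. The root itself is never checked, so "." works.
    """
    found: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if not should_skip_dir(d))
        for name in sorted(filenames):
            if name.endswith((".py", ".pyi")):
                found.append(os.path.join(dirpath, name))
    return found
```

Pruning dirnames in place keeps the walk from ever visiting a virtual environment or node_modules tree, which is why an explicit file path on the command line is the natural way to opt back in.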

This was contributed by Shantanu (PR 9992).

Excluding Paths

Mypy now supports the --exclude regex command line option to exclude paths matching a regular expression when searching for files to type check. For example, mypy --exclude '/setup\.py$' skips all setup.py files. This lets you exclude additional paths that mypy started finding after mypy 0.800 changed module finding behavior, as discussed above.

You can also specify this in the config file (exclude=regex). The option expects forward slashes as directory separators on all platforms, including Windows, for consistency.

This was also contributed by Shantanu (PR 9992). See the documentation for more details.
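To see which paths a pattern like '/setup\.py$' catches, here is a small Python sketch. It assumes the regex is applied with a search (not a full match) against each candidate path after converting the platform's separator to /; is_excluded is a hypothetical helper, not part of mypy's API, so check the documentation for the exact matching rules.

```python
import os
import re


def is_excluded(path: str, pattern: str) -> bool:
    """Hypothetical helper: test a path against an --exclude-style regex.

    The native separator is normalized to "/" first, so a single pattern
    written with forward slashes also works on Windows.
    """
    normalized = path.replace(os.sep, "/")
    return re.search(pattern, normalized) is not None


PATTERN = r"/setup\.py$"

for candidate in ["pkg/setup.py", "./setup.py", "pkg/setup_utils.py", "setup.py"]:
    print(candidate, is_excluded(candidate, PATTERN))
```

In this sketch a bare setup.py with no directory component does not match, because the pattern requires a / before the file name, while paths built by joining a directory and a file name, such as ./setup.py, do.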

Typeshed Updates

There are no typeshed updates in this release.

Acknowledgments

Thanks to Shantanu who contributed to this release.

We’d also like to thank our employer, Dropbox, for funding the mypy core team.

Friday, 22 January 2021

Mypy 0.800 Released

We’ve just uploaded mypy 0.800 to the Python Package Index (PyPI). Mypy is a static type checker for Python. This release includes new features, bug fixes and library stub (typeshed) updates. You can install it as follows:

    python3 -m pip install -U mypy

You can read the full documentation for this release on Read the Docs.

Python 3.9 Support

Mypy 0.800 officially supports the recently released Python 3.9. We now provide compiled binary wheels for Python 3.9, improving type checking speed significantly.

Typing Usability Improvements (PEP 585 and PEP 604)

The necessity to repeatedly import various types and special forms from typing has been a long-standing nuisance for users of static type checking and Python.

Two new Python features improve this situation and are now supported by mypy:

  • PEP 585 lets you use list[int] instead of List[int] (no need to import List and other generic collections from typing).
  • PEP 604 lets you write X | Y instead of Union[X, Y], and X | None instead of Optional[X] (no need to import Union or Optional from typing).

Note: Using list[int] requires Python 3.9 and X | Y requires Python 3.10 (alpha) in order to work at runtime. To use them on older versions of Python, use from __future__ import annotations. This allows them to be used in type annotations, but the older variants (or string literal escaping) may be required in non-annotation contexts, such as in type aliases. See the docs for more details.

Here is an example that uses the new features:

    from __future__ import annotations
    
    def fields(s: str | None) -> list[str]:
        if not s:
            return []
        else:
            return s.split(',')

These were implemented by Allan Daemon in PR 9564 and by Marc Mueller in PR 9647.
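The note about non-annotation contexts can be illustrated with a short sketch that runs on Python 3.7 and later. The names first_field, MaybeFields and maybe_fields are made up for the example. The new syntax is used only inside annotations, which the __future__ import keeps from being evaluated, while the type alias, being an ordinary runtime expression, uses the older typing forms.

```python
from __future__ import annotations

from typing import List, Optional


# Annotations may use the new syntax: with the __future__ import they are
# stored as strings and not evaluated at runtime.
def first_field(s: str | None) -> str | None:
    if not s:
        return None
    return s.split(',')[0]


# A type alias is evaluated at runtime, so on Python 3.8 and earlier it
# needs the older forms instead of list[str] | None.
MaybeFields = Optional[List[str]]


def maybe_fields(s: str) -> MaybeFields:
    return s.split(',') if s else None
```

On Python 3.10 and later, the alias could be written as list[str] | None directly.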

Improvements to Finding Modules

This release adds several improvements to how mypy finds Python source files to type check.

You can now pass paths to files within namespace packages on the command line, and mypy can better infer their module names. As before, use --namespace-packages to enable namespace packages.

When you use --explicit-package-bases together with --namespace-packages, mypy assumes that only the current directory and directories explicitly specified in MYPYPATH (or mypy_path in the config file) are valid package roots. This can help with situations where the module name of a file is ambiguous. For example, it may not be clear whether src/pkg/mod.py should be treated as src.pkg.mod or pkg.mod, and you can use this option to disambiguate between the two (more information in the docs).

The above improvements were implemented in PR 9742 by Shantanu.
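The ambiguity described above comes down to which directory counts as a package root. As a rough sketch (module_name is a hypothetical helper, not how mypy computes names internally), the same file maps to different module names depending on the root it is taken relative to:

```python
from pathlib import PurePosixPath


def module_name(path: str, root: str) -> str:
    """Hypothetical helper: dotted module name of `path` relative to `root`.

    Raises ValueError if `path` is not located under `root`.
    """
    rel = PurePosixPath(path).relative_to(root)
    parts = list(rel.with_suffix("").parts)
    if parts and parts[-1] == "__init__":
        parts.pop()  # a package's __init__.py names the package itself
    return ".".join(parts)


print(module_name("src/pkg/mod.py", "."))    # root is the current directory
print(module_name("src/pkg/mod.py", "src"))  # root is src, e.g. listed in MYPYPATH
```

Both answers are plausible, which is why restricting the valid package roots with --explicit-package-bases (together with MYPYPATH or mypy_path) helps; see the docs for how mypy chooses among the configured roots.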

Other related improvements (also implemented by Shantanu):

  • When you run mypy as mypy <directory>, also look recursively for source files inside directories without a __init__.py (PR 9614)
  • Support namespace packages with -p (PR 9683)
  • Log encountered source files when running mypy with -v (PR 9672)
  • Document the new module finding behavior (PR 9923)

Other Notable Improvements and Bug Fixes

  • Only treat import X as X as a re-export in stubs (Shantanu, PR 9515)
  • Fix package imports with aliases in stubgen (Chad Dombrova, PR 9534)
  • Require first argument of namedtuple() to match the variable name (Momoko Hattori, PR 9577)
  • Add error code for name mismatches in named tuples and TypedDicts to make it easy to disable these error messages (Jukka Lehtosalo, PR 9811)
  • Document local_partial_types config option (Momoko Hattori, PR 9551)
  • Improve ambiguous kwargs checking (Erik Soma, PR 9573)
  • Disable unreachable warnings in boolean operators for type variables with value restrictions (Vincent Barbaresi, PR 9572)
  • Allow assignment to an empty tuple (Tobin Yehle, PR 5617)
  • Use absolute path when checking source duplication error (Yuki Igarashi, PR 9059)
  • Add get_function_signature_hook() to the plugin API (Nikita Sobolev, PR 9102)
  • Speed up type checking of dictionary, set, and list expressions (Hugues, PR 9477)
  • Allow non-types as arguments in Annotated (Patrick Arminio, PR 9625)
  • Add support for next generation attrs API (David Euresti, PR 9396)
  • Fix case folding of missing keys error message for TypedDicts (Marti Raudsepp, PR 9757)
  • Fix generic inheritance of __init__() methods in dataclasses and attrs classes (Nate McMaster, PR 9383, PR 9380)
  • Add more information to error message on too few arguments (Abhinay Pandey, PR 9796)
  • Document PEP 585, 563, 604, and related functionality (Shantanu, PR 9763)
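Here is a minimal sketch of the naming rule behind PR 9577 and PR 9811. The first string argument to namedtuple() or the functional TypedDict() form should match the name of the variable it is assigned to. The commented-out line shows a mismatch that, per the notes above, mypy 0.800 reports even though it runs fine. Point, Movie and Coord are just example names.

```python
from collections import namedtuple
from typing import TypedDict  # Python 3.8+; older versions can use typing_extensions

# The first argument matches the variable name, so this is accepted.
Point = namedtuple("Point", ["x", "y"])

# The same rule applies to the functional TypedDict syntax.
Movie = TypedDict("Movie", {"name": str, "year": int})

# This runs fine, but mypy 0.800 reports the name mismatch. The error has
# its own error code, so it can be disabled if needed:
# Coord = namedtuple("Pt", ["x", "y"])

p = Point(1, 2)
m: Movie = {"name": "Blade Runner", "year": 1982}
```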

Mypyc Improvements

We use mypyc to compile mypy into fast C extension modules. This release includes many mypyc improvements.

Xuanda Yang finished the migration to use a new, lower-level compiler intermediate representation in his Google Summer of Code project.

New supported Python features:

  • Support the walrus operator (:=) (Michael J. Sullivan, PR 9624)

Performance improvements:

  • Add primitives for list sort and list reverse (Jukka Lehtosalo, PR 9897)
  • Recognize six.moves.xrange again as an alias of range (Jukka Lehtosalo, PR 9896)
  • Speed up some integer primitives (Jukka Lehtosalo, PR 9801)
  • Speed up if x for int values (Jukka Lehtosalo, PR 9854)
  • Implement dict clear primitive (Vasileios Sakkas, PR 9724)
  • Implement list insert primitive (Vasileios Sakkas, PR 9741)
  • Implement float abs primitive (Xuanda Yang, PR 9695)
  • Implement str-to-float primitive (Xuanda Yang, PR 9685)
  • Specialize some calls to frozenset (Michael J. Sullivan, PR 9623)
  • Speed up multiple assignment from tuple (Xuanda Yang, PR 9575)
  • Speed up multiple assignment from sequence (Jukka Lehtosalo, PR 9800)
  • Optimize startswith and endswith (Tomer Chachamu, PR 9557)
  • Add primitives for bitwise ops (Jukka Lehtosalo, PR 9529)
  • Speed up in operations for list/tuple (Johan Dahlin, PR 9004)
  • Add primitives for list, str and tuple slicing (Jukka Lehtosalo, PR 9283)
  • Speed up tuple equality checks (Xuanda Yang, PR 9343)

Bug fixes:

  • Always add implicit None return type to __init__ method (Thomas Johnson, PR 9866)
  • Fix deallocation of deeply nested data structures (Michael J. Sullivan, PR 9839)
  • Fix using package imported inside a function (Jukka Lehtosalo, PR 9782)
  • Fix type of for loop index register in for over range (Jukka Lehtosalo, PR 9634)
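To make the lists above more concrete, here is a small, fully annotated function of the kind mypyc compiles. It touches several of the operations mentioned: list slicing, list sort, multiple assignment from a tuple, if x on an int value, the walrus operator and startswith. It is only an illustration. The function name is hypothetical, we have not benchmarked it, and whether each operation takes a specialized fast path depends on the mypyc version. It runs unchanged as ordinary Python 3.8+ and could be compiled with the mypyc command, for example mypyc describe.py.

```python
from typing import List, Tuple


def describe(values: List[int], label: str) -> Tuple[int, int, str]:
    """Return (smallest, largest, tag) for a non-empty list of ints.

    Hypothetical example function; the names are illustrative only.
    """
    ordered = values[:]            # list slicing makes a copy
    ordered.sort()                 # list sort
    pair = (ordered[0], ordered[-1])
    low, high = pair               # multiple assignment from a tuple
    tag = label
    if (span := high - low):       # walrus operator; truthiness of an int
        tag = label + "~" + str(span)
    if tag.startswith("v"):        # str startswith
        tag = tag[1:]              # str slicing
    return low, high, tag
```

For example, describe([3, 1, 2], "v") returns (1, 3, "~2"), and describe([5, 5], "x") returns (5, 5, "x") because the span is zero.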

Typeshed Updates

Many improvements were made to typeshed — too many to list. Browse the typeshed commit log here.

Acknowledgments

First of all, we’d like to thank our employer, Dropbox, for funding the mypy core team.

Thanks to all mypy contributors who contributed to this release:

  • Abdullah Selek
  • Abhinay Pandey
  • Adam
  • aghast
  • Akuli
  • Alexander
  • Allan Daemon
  • Aristotelis Mikropoulos
  • Ashley Whetter
  • Brian Mboya
  • Bryan Forbes
  • cdce8p
  • Chad Dombrova
  • David Euresti
  • Denis Laxalde
  • Eisuke Kawashima
  • Erik Soma
  • Ethan Pronovost
  • Florian Bruhin
  • Frank Dana
  • Greg Compestine
  • Guido van Rossum
  • Hugues
  • Jake Bailey
  • Jakub Stasiak
  • Jelle Zijlstra
  • Jeremy Metz
  • Johan Dahlin
  • Jon Shea
  • Jonathan Wong
  • Jürgen Gmach
  • Kamil Turek
  • Krzysztof Przybyła
  • Lawrence Chan
  • Marti Raudsepp
  • Matan Gover
  • Matt Gilson
  • Michael J. Sullivan
  • Momoko Hattori
  • Nate McMaster
  • Nikita Sobolev
  • Nils K
  • Nipunn Koorapati
  • Oleg Höfling
  • Patrick Arminio
  • Rajiv Singh
  • rhkleijn
  • Roland van Laar
  • Shantanu
  • Tobin Yehle
  • Tom Scogland
  • Tomer Chachamu
  • Tripp Horbinski
  • Ville Skyttä
  • Vincent Barbaresi
  • vsakkas
  • Wes Turner
  • willtryagain
  • Xiaodong DENG
  • Xuanda Yang
  • Yash Chhabria
  • Yuki Igarashi

Additional thanks to all contributors to typeshed:

  • Abraham Francis
  • Adam Dangoor
  • Adam Kliś
  • Adam Lichtl
  • Akuli
  • alexander-held
  • an onion
  • Anders Kaseorg
  • Andrew Mitchell
  • Árni Már Jónsson
  • Ash Berlin-Taylor
  • Ashwin Vishnu
  • Avery
  • cdce8p
  • Cebtenzzre
  • Changsheng
  • Christine
  • coiax
  • cptpcrd
  • crusaderky
  • Daniel O'Neel
  • David Caro
  • Dominic Davis-Foster
  • Eric Traut
  • Ethan Pronovost
  • Frank Maximilian
  • Gal Ben David
  • github-actions[bot]
  • Guido van Rossum
  • henribru
  • Hong Xu
  • Hugues
  • Hynek Schlawack
  • jack1142
  • Jake Bailey
  • Jason Fried
  • Jelle Zijlstra
  • Jia Chen
  • Jon Dufresne
  • Jonathan Schoonhoven
  • Jonathan Slenders
  • Joseph Haaga
  • Julien Danjou
  • Jun Jia
  • Jérome Perrin
  • karl ding
  • Katelyn Gigante
  • Kaushal Rohit
  • Kevin Wojniak
  • ky-gog
  • Kyle Fuller
  • kylec1
  • Lam Son Ho
  • Lourens Veen
  • Mahmoud Abduljawad
  • Mariam Maarouf
  • Marti Raudsepp
  • melassa
  • Mickaël Schoentgen
  • Mikhail Sveshnikov
  • Mikołaj Kuranowski
  • Moriyoshi Koizumi
  • Nate McMaster
  • Neel Somani
  • nicolas-harraudeau-sonarsource
  • Nikolaus Waxweiler
  • Nils K
  • Nipunn Koorapati
  • Oleg Höfling
  • Omar Sandoval
  • Paul
  • Pete Scopes
  • Peter Law
  • Philipp Hahn
  • Phillip Huang
  • proost
  • PythonCoderAS
  • Rajiv Bakulesh Shah
  • Ran Benita
  • Raphael Geronimi
  • Rebecca Chen
  • Sam Bull
  • Sebastian Rittau
  • Sergei Lebedev
  • Shantanu
  • Stefano Chiodino
  • Steve Dignam
  • Sténio Jacinto
  • Timur Kushukov
  • Tom Most
  • turettn
  • Unrud
  • Utsav
  • Vasily Zakharov
  • Vincent Barbaresi
  • Vincent Meurisse
  • 愚氓
  • Yuri Khan