Wednesday, 17 March 2021

Summer of Code with Mypy

TH3CHARLie, Xuanda Yang
th3charlie at gmail dot com
GitHub Profile
GSoC Project Page
GSoC Submission Page

Introduction

In Fall 2019, I was writing a course project in Python and was tired of discovering type errors at the last minute, so I did some searching and tried mypy for the first time. At that time, I would never have dreamed of spending three months coding for the project, yet that became reality last summer. In this post, I'd like to share the story of this very personal experience: how I started contributing to mypy and eventually became a Google Summer of Code (GSoC) student for the project.

Pre-application

After finishing my course project with mypy, out of curiosity, I took a look at its GitHub page and paid extra attention to the issue tracker. I searched for issues with the "good first issue" label and was surprised to find that most of them came with comments from the maintainers explaining how to tackle them. Reading through these comments, I found some issues that fit my level and started trying my hand at them.

Before submitting a pull request, I read through the repository and checked every resource that seemed useful for writing a PR, including the README file, the documentation and the wiki. I learned how to install the development version of mypy, how to test my PR locally, and some high-level ideas about the codebase. This saved me a lot of trouble in my later contributions to the project. My first PR to the project was "Support Negative Int Literal", and it was reviewed by Ivan. He patiently guided me to refine the PR step by step and finally merged it into the main branch. The satisfaction of seeing my code enter the codebase of a tool used by people around the globe was huge, and it encouraged me to attempt more good first issues. By the end of 2019, I had submitted around 20 PRs to the repository. Most of them were very simple tasks, like producing more meaningful error messages or fixing a corner case of a specific type-checking rule.

Looking back now, writing all these simple PRs helped me gradually become familiar with the codebase and the development workflow, which became a valuable asset when I applied to GSoC.

Applying to GSoC

Google Summer of Code (GSoC) is an annual program in which students work with open source communities over the summer and receive a stipend from Google. Before Google announced last year's selected organizations and project ideas on Feb 20th, one of mypy's maintainers posted an issue named "Project ideas" listing several larger, time-consuming projects. At that time, I was eager to try something more challenging than fixing small bugs, so I spent some time reading through them and tried to figure out which ones fit my level. A few days later, the official organization list for GSoC 2020 was published, mypy was on it, and the issue I had been reading contained its potential project ideas. What a great match, I said to myself. So I made up my mind and started drafting my application.

A GSoC student application usually consists of two parts. The first, mandatory part is one or more project proposals. Each student can choose up to three project ideas from three different organizations. Finding suitable ideas among hundreds of them is usually hard, since understanding the descriptions already takes effort and even some domain knowledge. Thanks to my previous contributions to mypy, I didn't spend much time on this and quickly narrowed my search to compiler-related topics. "Generalize mypyc IR to Make non-C Backends Possible" from mypy and another topic from LLVM were my choices. However, the LLVM topic was claimed by someone else through the mailing list before I even had the chance to take a closer look at it. So mypy became the only organization left for me to apply to.

To write a good proposal, I started discussing the project directly with the maintainers (my future mentors) in the issue tracker. I'd say this is one of the best ways to prepare a proposal. By doing this, you are expressing your passion and demonstrating your background and skills directly to the mentors, and the mentors can provide details about the project that give you a clearer picture. Don't be afraid to start a discussion: open-source communities are often full of very nice people, and mypy is definitely one of them. Jukka Lehtosalo and Michael (Sully) Sullivan provided tons of good advice that helped me formulate a plan for the project and finish my proposal. I also put together a prototype PR addressing one core change in my proposal before the deadline. Although it was never merged, since we later took a different approach, a working prototype is always welcome and I'd say it increases your chances of being accepted.

The second, optional part consists of organization-specific requirements. Last year, mypy asked applicants to send an email giving details of their course background, programming skills and submitted PRs.

After I submitted my proposal on GSoC's official site and sent an email to the mypy team, I had a short video interview with them. I talked about what I would do to make the project successful and shared a bit more about my background beyond what was in the email. The interview went quite well, and several days later I received an email from the mypy team telling me I had been selected by the organization. Then all I had to do was wait for the official announcement from Google. A few days later, I officially became the first GSoC student of the mypy project, ready to start my coding summer.

Coding Through the Summer

After getting accepted to GSoC, what was left was to code through the summer. The project I worked on was "Generalize mypyc IR to Make non-C Backends Possible". In short, it was a project to redesign and refactor parts of the old mypyc IR to remove the C backend-specific information embedded in it. You can find the detailed technical report on the submission page. Here, I'd like to talk about other things that play as important a role as coding.

Plan the schedule and monitor progress. Three months pass in the blink of an eye, so it's important to set realistic milestones to make sure the project goes smoothly. We had a rough schedule in the proposal and established monthly goals in later discussions. With the schedule and goals in mind, I was able to make continuous progress, and I even gained an extra week. Before the coding period starts in June, there is usually a one-month community bonding period in which students familiarize themselves with the project's workflow and connect with their mentors. Thanks to my previous contributions to mypy, I felt comfortable skipping this period and started coding one week earlier than scheduled. This extra week gave my mentors and me a lot of flexibility to handle unexpected delays.

Communication is key. Jukka, one of my mentors, told me that it is always hard to come up with the perfect design up front, because requirements keep evolving. To cope with this, we communicated a lot. I had daily syncs with my mentors through Gitter chat, weekly meetings with Jukka, and monthly meetings with both Jukka and Sully. In these meetings, I shared my recent progress, they helped me remove any blockers, and we discussed component designs. I am one hundred percent sure that the project would not have been this successful without such frequent and effective communication.

After the Summer

I completed the project and submitted the required materials in early September, with around 60 PRs merged into the repository. For me, the GSoC project ended very well: I received the certificate from Google as well as the stipend. More importantly, I had shown that I had the passion and skills to contribute to the organization as more than just a community contributor, and I was honored to become one of mypy's committers. That means a lot to me, and I am looking forward to contributing more to the community.

That's my story with mypy through Google Summer of Code 2020. I hope this post provides useful information to future applicants, including this year's, since mypy has officially been selected as one of this year's organizations. If you have any questions about this post, feel free to send me an email. I will be more than happy to discuss them.

Acknowledgements

I'd like to express my sincere gratitude to every person and organization that made my GSoC 2020 adventure possible. First, I'd like to thank Google and mypy for providing me with this opportunity.

In particular, I'd like to thank my mentors Jukka Lehtosalo and Michael Sullivan. We had daily syncs on Gitter and weekly/monthly video meetings via Zoom. In every discussion, they helped me clarify my thoughts and find the best approach to meet our goals. They responded quickly to my PRs with high-quality review comments and suggestions. They mentored me with patience and passion, and I felt connected to them even though we were several time zones and continents apart. Their guidance went beyond the scope of the project and helped me develop good software engineering skills along the way.

I would also like to thank my parents for supporting me in working on this project, and Yaozhu Sun from the University of Hong Kong, who sparked my passion for compilers and programming languages two years ago. Finally, I'd like to thank Kanemura Miku from Hinatazaka 46 for all the mental support during this special summer.

Disclaimer

All content in this post represents only the author's personal opinions. The post does not in any way represent the views or opinions of Google or the mypy community.