Monday 15 April 2013

PyCon Update: Python-Compatible Syntax for Mypy?

It's already weeks since PyCon! Phew, I've been busy recently. Anyway, I had an eventful trip to PyCon in Santa Clara, California. PyCon is the biggest Python conference with about 2500 delegates from all around the world (though most seemed to come from the US).

I had chats with Guido, Armin Rigo (PyPy) and many others. After the conference, I stayed around for a few days in the San Francisco Bay Area and gave a talk at Google Mountain View, and also visited Dropbox in San Francisco.

One of my main goals for the trip was trying to gauge whether mypy is going in the right direction in the eyes of the Python community. There was a lot of interest in the project, but some important issues were raised that I need to discuss in more detail.

Ability to do compile-time checking of programs even without a new VM was interesting to many. This would benefit projects and organizations with large existing Python code bases. However, these organizations also manage risks carefully. Currently mypy can be used on top of CPython, but the sources must always be translated to Python before execution. Adding the mypy tool chain to the core build process is something most seem to be reluctant to do. Obviously this is the case now as mypy is still experimental, but I got the impression that even if mypy would be considered stable and mature, relying on a third-party tool to be able to run their code would be a pretty daring and unlikely move. Also, mypy has the problem of being not-quite compatible with many Python tools such as IDEs. This is a chicken-an-egg problem: tool support probably would fix itself if mypy would be widely used, but it's difficult to get wide use without tool support. Library support is similar. However, there may be a way around this dilemma -- just stay with me for a few more paragraphs.

Many organizations using Python are still stuck with 2.x, and find the transition to Python 3 difficult. Even upgrades from 2.x to 2.x+1 have caused a lot of trouble, and the switch to Python 3 is much trickier, in large part due to changes in string representations (str/unicode in Python 2.x versus bytes/str in Python 3.x). Mypy currently only supports Python 3.x syntax, which limits its usefulness to many.

Some also saw the challenge of developing a production-quality mypy VM to be too large for our team. I think this is to a large part down to how previous projects have succeeded (or not), including PyPy: even after many years, and with several talented developers, still their adoption has been pretty slow in the Python community. Unladen Swallow is another example that showed that speeding up Python is not easy. Of course, mypy has goals different from PyPy and other previous projects, and our approach of targeting ahead-of-time compilation slashes development efforts by a large factor. But I agree that I won't be able to it alone, and getting funding for continued development is hard.

Based on suggestions from Guido and the above observations, I've worked now for some time on a pretty big proposal that would help address all of the above issues in some form or another. This is still in a planning stage, and no concrete plans are yet finalized. However, here are the main points:

  1. For mypy to really take off, we need users. In order to realistically get users, there needs to be a low-risk way of adopting mypy incrementally in current projects implemented in Python.
  2. There is a good amount of interest in optional typing in the Python community, but the approach should be non-invasive to current development processes, tool chains, etc.
  3. The pragmatic way to resolve the two above issues is to make mypy syntax 100% compatible with Python, both Python 2.x and 3.x. There would be no need for a Python translation phase, and a normal Python interpreter could be used to run mypy programs directly. Also all Python tools would pretty much Just Work. Note that as this would be a syntactic change, it would have no significant impact on planned efficiency of the new VM compared to the current syntax and plans, though this would likely result in semantic changes as well (see below for more about these). Also, mypy already supports translation to Python. This would just remove the need for the translation step.
  4. We should first focus most resources on the optional typing part instead of the the new VM and compiler in order to make mypy usable as a static type checker for CPython (and PyPy/Jython).
  5. Now mypy would be much easier to adopt in organizations that would like to use optional typing to get better maintainability and productivity. I think that the above changes could speed up the adoption of mypy a lot. Also, the type checker part of mypy is a fairly straightforward project form an engineering point of view and there is no need for a large team of developers.
  6. If mypy gets significant adoption, there would also be demand for the new VM and the compiler, and it would be easier (but still not exactly easy!) to get contributors, maybe even development funding, etc.

The above plan would imply redesigning the type annotation syntax of mypy. I've given it a lot of thought, and perhaps surprisingly, it seems that there would not be need for many compromises. Generally readability would be similar to the current syntax, and sometimes it would be even better. I'm not going to cover this in detail now, but the main difference would be the introduction of Python 3 style annotation syntax (obviously for Python 3.x only; Python 2.x needs a different approach):

  NOW:
    str greeting(str name):
        return 'hello, ' + name
  NEW PROPOSAL:
    def greeting(name:str) -> str:
        return 'hello, ' + name

Mypy uses nominal subtyping, even though structural subtyping would help model 'duck typing' in Python. Many people have expressed their interest in structural subtyping, and I discussed this at PyCon as well. Earlier, I thought that this couldn't be implemented efficiently on platforms that I would eventually like to be able to support, including Dalvik (Android). However, now I think I've figured out how to have efficient structural subtyping on basically any VM than could realistically run mypy, so the main objection is thrown out. Also, with the proposed Python-compatible syntax, structural subtyping could be a win for various reasons. In summary, it now seems likely that mypy will get support for structural subtyping in addition to nominal subtyping. I've started to prepare an enhancement proposal.

There are other, less major changes that Python compatibility would require. Mypy should support multiple inheritance without the current limitations, similar to Python. Again, I previously ruled this out due to efficiency concerns, but I think I was wrong and there is really no technical reason why multiple inheritance needs to be restricted to interfaces like it is now. Also, mypy needs to support metaclasses; this one trickier but I'm optimistic about it as well.

Let me know if you have any opinions on the proposed changes. Write comments below or send me en email.