The Mypy Blog: 2013

Friday 23 August 2013

Mypy Development Update #3

This is a short update about what's been happening in the mypy project.

I have (mostly) been on vacation recently, and now it's time to get back to mypy development. My PhD dissertation submission deadline also looms near, so I will have to work on that as well. Actually a lot of things are happening in my life — I will be relocating after finishing with my PhD, and this will involve (the typical) things such as finding a nice neighborhood, new schools/nurseries and shipping our pets and our belongings. I have already been glacial in responding to emails recently, and things may not get better until later this year.

I've been working on several new mypy features during the summer, though some of them are still incomplete. I want to get a "good-enough" prototype mypy implementation for my dissertation, and I really don't have time to polish things at this stage, which is a bit sad but can't really be helped. After submitting I can get back to doing things in a more orderly manner.

I'm really enthusiastic about some of the new features. Here is a sketch of some of them:

Multiple (implementation) inheritance now mostly works. This is a side effect of the switch to Python-compatible syntax, as mypy lost the distinction between classes and interfaces.
Abstract methods and abstract base classes (ABCs) are supported, using the abstractmethod decorator and the ABCMeta metaclass, as in Python.
Static methods are supported, using the staticmethod decorator.
There is support for read-only properties. Support for more complex properties is not there yet.
Type inference of code that uses isinstance checks is getting better. The nice result is that casts are often no longer needed in cases where they used to be required.
Conditional expressions of form x if y else z are supported.
For loops over tuple literals are supported.
The del statement is improved. You can now delete names and attributes. These don't affect type checking yet, but it makes it easier to adapt existing code that uses del statements to static typing.
I'm working on support for a feature which allows a kind of string polymorphism. It lets a single function implementation support both str and bytes arguments with effective type checking. Previously either some code duplication or using dynamically typed code was needed in cases like this. This seems like a very useful feature especially for library code. I'll get back to this later when the implementation is more complete.
There is now some very limited Python 2.x support. It should be straightforward but moderately laborious to finish this, but other tasks are more urgent.
As always, several bugs have been fixed and standard library stubs have been improved.
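
A minimal sketch of a few of the features listed above, written as plain Python 3 with annotations. The class and function names (Shape, Named, Square, describe, total_area) are made up for illustration and do not come from the mypy sources; the comments describe what a type checker could do in principle, not mypy's exact behavior at the time.

```python
from abc import ABCMeta, abstractmethod


class Shape(metaclass=ABCMeta):
    """Abstract base class: it cannot be instantiated directly."""

    @abstractmethod
    def area(self) -> float:
        ...

    @staticmethod
    def unit_name() -> str:
        # Static method: called without an instance.
        return 'cm2'


class Named:
    """A second concrete base, to show multiple implementation inheritance."""

    def __init__(self, name: str) -> None:
        self._name = name

    @property
    def name(self) -> str:
        # Read-only property: no setter is defined.
        return self._name


class Square(Named, Shape):
    def __init__(self, side: float) -> None:
        super().__init__('square')
        self.side = side

    def area(self) -> float:
        return self.side * self.side


def describe(x: object) -> str:
    if isinstance(x, Square):
        # After the isinstance check, x can be treated as a Square
        # in this branch, so no cast is needed to call its methods.
        return x.name + ' ' + str(x.area())
    # Conditional expression of the form x if y else z.
    return 'unknown' if x is None else 'other'


def total_area() -> float:
    total = 0.0
    for side in 1.0, 2.0:  # for loop over a tuple literal
        total += Square(side).area()
    return total


print(describe(Square(2.0)))
print(total_area())
```

Trying to instantiate Shape directly, or to assign to the name property, fails at runtime in ordinary Python as well; a static checker can report the same mistakes before the program runs.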

As I had previously planned, I'm not currently working on the compiler. I am focusing on getting the type system and type checker reasonably feature-complete and stable first.

Tuesday 2 July 2013

Mypy Switches to Python-Compatible Syntax

Mypy now has a Python-compatible syntax! The implementation is already self-hosting, but it should still be considered very experimental. This is a huge change which touched almost every source code file in the repository, and there are still many unfixed bugs.

Here is a small example of the new syntax:

  import typing

  def fib(n: int) -> None:
      a, b = 0, 1
      while a < n:
          print(a)
          a, b = b, a+b

I have mostly updated the web site and documentation to reflect the new syntax. The Mypy Tutorial (formerly Mypy Overview) is largely rewritten:

http://www.mypy-lang.org/tutorial.html

The update also adds more content to the tutorial that isn't directly related to the syntax switch.

Since the mypy implementation is now a valid Python 3 program, there is no need for a translation step before running it. Thus the old mypy-py repository is also no longer needed to run mypy. Instead, mypy now uses the standard Python setup.py mechanism for installation. Have a look at the updated README for details:

https://github.com/JukkaL/mypy/blob/master/README.md

I have updated the example programs on the web site:

http://www.mypy-lang.org/examples.html

I also updated the mypy roadmap:

http://www.mypy-lang.org/roadmap.html

If you are thirsty for more code to look at, point your browser at the git repository:

From now on, the old syntax is considered legacy and will not be supported by the implementation. There will be a tool for migrating mypy code written using the old syntax to the new syntax. The tool is already functional, but it still needs a bit of polish before it's generally useful. I will announce on this blog when it's available. I'll also write a longer blog post discussing the new syntax in more detail at a later date.

Any feedback is appreciated! The syntax is still a work in progress, and improvements are possible.

Many thanks to Guido van Rossum, Sebastian Riikonen and Ron Murawski for ideas and comments.

Friday 28 June 2013

Python-Compatible Syntax for Mypy Is Almost Ready

This is just a quick update about what has been happening in the mypy project recently.

I have been working on the new Python-compatible syntax along with writing my dissertation during the last few months. I've also done a lot of design and evaluation work related to potential new exciting mypy features. I won't go into details now, but these include much more powerful type inference, union types and better parametric polymorphism for string types.

The new syntax is currently mostly functional. I'm very happy with the initial results, and this now seems like the obvious way forward for mypy. The most important test was when I translated the entire mypy implementation to use the new syntax this Tuesday. I was able to do it in just a few hours using an automatic translator tool that I had written earlier, though there was quite a bit of manual clean-up work afterwards.

The code is available in the pythonsyntax branch for anybody who wants to play with it:

https://github.com/JukkaL/mypy/tree/pythonsyntax

Note that installing mypy now uses the standard setup.py approach instead of the old funky approach of using a git repository to distribute a runnable implementation. Have a look at the updated README for more information.

There is not much documentation, though. The wiki contains a short introduction to the new syntax:

http://www.mypy-lang.org/wiki/PythonCompatibleSyntax

The details of the syntax may still change, and I have many ideas for making the new syntax even better. I'll write about these later.

Monday 15 April 2013

PyCon Update: Python-Compatible Syntax for Mypy?

It's already weeks since PyCon! Phew, I've been busy recently. Anyway, I had an eventful trip to PyCon in Santa Clara, California. PyCon is the biggest Python conference with about 2500 delegates from all around the world (though most seemed to come from the US).

I had chats with Guido, Armin Rigo (PyPy) and many others. After the conference, I stayed around for a few days in the San Francisco Bay Area and gave a talk at Google Mountain View, and also visited Dropbox in San Francisco.

One of my main goals for the trip was trying to gauge whether mypy is going in the right direction in the eyes of the Python community. There was a lot of interest in the project, but some important issues were raised that I need to discuss in more detail.

The ability to do compile-time checking of programs even without a new VM was interesting to many. This would benefit projects and organizations with large existing Python code bases. However, these organizations also manage risks carefully. Currently mypy can be used on top of CPython, but the sources must always be translated to Python before execution. Adding the mypy tool chain to the core build process is something most seem to be reluctant to do. Obviously this is the case now as mypy is still experimental, but I got the impression that even if mypy were considered stable and mature, relying on a third-party tool to be able to run their code would be a pretty daring and unlikely move. Also, mypy has the problem of being not-quite compatible with many Python tools such as IDEs. This is a chicken-and-egg problem: tool support probably would fix itself if mypy were widely used, but it's difficult to get wide use without tool support. Library support is similar. However, there may be a way around this dilemma -- just stay with me for a few more paragraphs.

Many organizations using Python are still stuck with 2.x, and find the transition to Python 3 difficult. Even upgrades from 2.x to 2.x+1 have caused a lot of trouble, and the switch to Python 3 is much trickier, in large part due to changes in string representations (str/unicode in Python 2.x versus bytes/str in Python 3.x). Mypy currently only supports Python 3.x syntax, which limits its usefulness to many.

Some also saw the challenge of developing a production-quality mypy VM to be too large for our team. I think this is in large part down to how previous projects have succeeded (or not), including PyPy: even after many years, and with several talented developers, its adoption has still been pretty slow in the Python community. Unladen Swallow is another example that showed that speeding up Python is not easy. Of course, mypy has goals different from PyPy and other previous projects, and our approach of targeting ahead-of-time compilation slashes development efforts by a large factor. But I agree that I won't be able to do it alone, and getting funding for continued development is hard.

Based on suggestions from Guido and the above observations, I've worked now for some time on a pretty big proposal that would help address all of the above issues in some form or another. This is still in a planning stage, and no concrete plans are yet finalized. However, here are the main points:

For mypy to really take off, we need users. In order to realistically get users, there needs to be a low-risk way of adopting mypy incrementally in current projects implemented in Python.
There is a good amount of interest in optional typing in the Python community, but the approach should be non-invasive to current development processes, tool chains, etc.
The pragmatic way to resolve the two above issues is to make mypy syntax 100% compatible with Python, both Python 2.x and 3.x. There would be no need for a Python translation phase, and a normal Python interpreter could be used to run mypy programs directly. Also all Python tools would pretty much Just Work. Note that as this would be a syntactic change, it would have no significant impact on planned efficiency of the new VM compared to the current syntax and plans, though this would likely result in semantic changes as well (see below for more about these). Also, mypy already supports translation to Python. This would just remove the need for the translation step.
We should first focus most resources on the optional typing part instead of the new VM and compiler in order to make mypy usable as a static type checker for CPython (and PyPy/Jython).
Now mypy would be much easier to adopt in organizations that would like to use optional typing to get better maintainability and productivity. I think that the above changes could speed up the adoption of mypy a lot. Also, the type checker part of mypy is a fairly straightforward project from an engineering point of view and there is no need for a large team of developers.
If mypy gets significant adoption, there would also be demand for the new VM and the compiler, and it would be easier (but still not exactly easy!) to get contributors, maybe even development funding, etc.

The above plan would imply redesigning the type annotation syntax of mypy. I've given it a lot of thought, and perhaps surprisingly, it seems that there would not be a need for many compromises. Generally readability would be similar to the current syntax, and sometimes it would be even better. I'm not going to cover this in detail now, but the main difference would be the introduction of Python 3 style annotation syntax (obviously for Python 3.x only; Python 2.x needs a different approach):

  NOW:
    str greeting(str name):
        return 'hello, ' + name
  NEW PROPOSAL:
    def greeting(name:str) -> str:
        return 'hello, ' + name
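
To illustrate why the proposed form needs no translation step, here is the same function as an ordinary Python 3 program. This is only a sketch of standard interpreter behavior: Python stores the annotations on the function object and does not enforce them, which leaves the checking to a separate tool.

```python
def greeting(name: str) -> str:
    return 'hello, ' + name


# A normal Python 3 interpreter runs this directly.
print(greeting('world'))  # prints: hello, world

# The annotations are kept as metadata that tools can inspect.
print(greeting.__annotations__)
```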

Mypy uses nominal subtyping, even though structural subtyping would help model 'duck typing' in Python. Many people have expressed their interest in structural subtyping, and I discussed this at PyCon as well. Earlier, I thought that this couldn't be implemented efficiently on platforms that I would eventually like to be able to support, including Dalvik (Android). However, now I think I've figured out how to have efficient structural subtyping on basically any VM that could realistically run mypy, so the main objection is thrown out. Also, with the proposed Python-compatible syntax, structural subtyping could be a win for various reasons. In summary, it now seems likely that mypy will get support for structural subtyping in addition to nominal subtyping. I've started to prepare an enhancement proposal.
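
The difference between the two kinds of subtyping can be sketched with a runtime analogy in plain Python. This is not how mypy implements either one, and it says nothing about the enhancement proposal mentioned above; the names Quacker, Mallard, Robot and the two helper functions are hypothetical.

```python
class Quacker:
    """Nominal interface: only declared subclasses count as Quackers."""

    def quack(self) -> str:
        raise NotImplementedError


class Mallard(Quacker):
    def quack(self) -> str:
        return 'Quack'


class Robot:
    """Has the right method, but does not inherit from Quacker."""

    def quack(self) -> str:
        return 'Beep'


def is_nominal_quacker(obj: object) -> bool:
    # Nominal: the declared class hierarchy decides.
    return isinstance(obj, Quacker)


def is_structural_quacker(obj: object) -> bool:
    # Structural ('duck typing'): having a callable quack is enough.
    return callable(getattr(obj, 'quack', None))


for candidate in Mallard(), Robot():
    print(type(candidate).__name__,
          is_nominal_quacker(candidate),
          is_structural_quacker(candidate))
```

Robot passes the structural test but fails the nominal one, which is the case that a purely nominal type system rejects even though the code would work at runtime.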

There are other, less major changes that Python compatibility would require. Mypy should support multiple inheritance without the current limitations, similar to Python. Again, I previously ruled this out due to efficiency concerns, but I think I was wrong and there is really no technical reason why multiple inheritance needs to be restricted to interfaces like it is now. Also, mypy needs to support metaclasses; this one is trickier but I'm optimistic about it as well.

Let me know if you have any opinions on the proposed changes. Write comments below or send me an email.

Wednesday 13 March 2013

Mypy Development Update #2

This post is about the latest developments in the mypy project. A lot has happened since the last update in December, even though my family got hit by the flu pretty bad this winter; I lost perhaps two weeks of development time.

Latest Changes

There are many new type checker and Python back end features. Here are some of the more important:
- Added package support (modules with names of form foo.bar). The mypy implementation now uses packages. As a side effect, the mypy driver is now named driver.py instead of mypy.py!
- A module can be run as a script using the '-m' driver command line argument.
- Arbitrary statements and references to class attributes are supported in the class body.
- Added support for nested functions and classes.
- Import statements can be used anywhere, not just at the top level of a file.
- Added support for function decorators. I will write about using statically typed function decorators in another post.
- Added support for 'with' statements. Also updated various library classes to support the with statement.
- Special attributes such as __name__, __doc__ and __dict__ are supported.
- Implemented chained assignments such as x = y = z.
- Type checking of boolean operator expressions with non-boolean values works as expected (for example, s = s or 'x').
- Set literals {a, b, ...} work.
- Various minor conveniences now work, including raise without an argument and multiple types per except block.
I have adapted several Python standard library modules to static typing. Currently there's around 3000 non-empty, non-comment lines of adapted code + around 7000 lines of related unit tests. I will write a separate blog post about my experiences, but generally the process was fairly smooth with the notable exception of the dozens of mypy bugs and missing features that I encountered during the work. However, this has improved the compiler front end tremendously, and I'm going to continue with more Python modules. An interesting result is that the type checker helped find several bugs in CPython 3.2 standard libs. This was somewhat unexpected. When starting the mypy project, I primarily wanted to improve programmer productivity and runtime efficiency. I wasn't really expecting to find bugs in debugged and tested code, so this is a very welcome result. Here's a link to the adapted code. Note that some unit tests still fail -- there's more fixing to do.
There are several new library stubs, including socket and time (thanks to Ron Murawski) and shlex. There are also dozens of fixes and additions to existing stubs, and several new partial stubs.
A lot of work since the last update has targeted the C back end. The C back end development has progressed well, though it will still take quite a lot of effort before it's usable for real programs. The biggest implementation changes are not user-visible and still only affect the first stages of the back end. However, this ground work will make future progress much faster.
We've started to use the GitHub issue tracker more actively. There are now 165 issues in total; 85 issues have been closed. Many changes and fixes still have no associated issue, though. As the project grows in size and complexity, the issue tracker will become more central to development.
Ashley Hewson started working on an automatic library stub generator. It could make it easy to enable mypy programs to access many Python modules.
I've written more about the compiler internals in the wiki.
Several potential new features have been added to the wiki, including immutability and fixed-width integer types.
Several bugs have been fixed. Credits to me and Ashley.
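
For readers unfamiliar with some of the constructs in the feature list above, here is a short sketch that exercises several of them. It is written as ordinary Python 3 with annotations, not in the mypy syntax in use when this update was posted, and all names (logged, add, Resource, parse, default_name, reraise_demo) are made up for illustration.

```python
import functools
from typing import Callable, List

calls: List[str] = []


def logged(func: Callable[..., int]) -> Callable[..., int]:
    """Function decorator that records the name of each called function."""

    @functools.wraps(func)
    def wrapper(*args: int) -> int:  # nested function
        calls.append(func.__name__)
        return func(*args)

    return wrapper


@logged
def add(x: int, y: int) -> int:
    return x + y


class Resource:
    """Minimal class supporting the with statement."""

    def __init__(self) -> None:
        self.open = False

    def __enter__(self) -> 'Resource':
        self.open = True
        return self

    def __exit__(self, *exc: object) -> None:
        self.open = False


def parse(text: str) -> int:
    try:
        return int(text)
    except (ValueError, TypeError):  # multiple types in one except block
        return -1


def default_name(s: str) -> str:
    s = s or 'x'  # boolean operator with non-boolean operands
    return s


def reraise_demo() -> str:
    try:
        try:
            raise KeyError('k')
        except KeyError:
            raise  # raise without an argument re-raises the active exception
    except KeyError:
        return 'caught'


a = b = add(1, 2)   # chained assignment
unique = {a, b, 4}  # set literal

with Resource() as r:
    print(r.open, add.__name__, unique)
```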

Next Steps

Mypy development focus will shift more and more to the C back end (but porting more Python standard library code is important as well). There is still a lot of work to do before we can run interesting programs. For example, the current implementation has no garbage collector. The efficiency of the garbage collector is very important for mypy, as typical programs construct millions of objects. Instead of developing the garbage collector from scratch, I'm going to port the gc from the Alore VM. It's pretty speedy and has been working for me pretty well, and it supports multithreading. However, it does not support parallel collection yet, which is a minor downside.

The next minor C back end milestone is being able to run the well-known Pystone benchmark. Another small but important milestone is to be able to run unit tests using the native back end. This will speed up development.

The major long-term goal is to support a baseline compiler for a good subset of mypy and some standard library functionality, and to support self-compilation. This will speed up translation and compile times significantly. I will concentrate on adding language features and making the compiler stable before turning to more complex optimizations. The baseline compiler will still give a good speedup over CPython due to static typing, optimised semantics and native code compilation, in addition to more powerful runtime type checking.

As before, there will also be incremental improvements to the compiler front end (parser and type checker) and the Python back end. The highest-priority features include properties, static and class methods and named tuples.

Thursday 31 January 2013

Mypy Native Code Back End: C vs LLVM

Mypy will initially use C as an intermediate step to compile to native code. I also seriously considered LLVM, and several people have recommended LLVM.

It was a tough decision; LLVM would probably be a good match as well. Here's my reasoning for choosing C:

C works everywhere. It's very stable and supported on older and more exotic systems as well.
A huge number of developers know C. If the back end uses C, more people will be able to help and debug problems. By contrast, LLVM is mostly used by specialists such as programming language implementors and researchers.
I know C very well. LLVM might have some problems or limitations that I'm not aware of yet, and these might bite me. LLVM is also fairly complex and takes time to learn.
We would probably have to implement mypy bindings to the LLVM API. This would have to be in C++, since the LLVM C API does not seem to be very well supported. The API is large, so this would probably take some effort (maybe a few weeks, maybe longer). We would also have to maintain these bindings. There are Python LLVM bindings, but they haven't been updated recently and I have no idea of how complete and usable they are -- another unknown.
C is probably "efficient enough", at least initially. LLVM has some low-level features that could be useful, but I doubt the difference is large in practice.
LLVM is slightly lower level than C. This probably translates to more development work.
My VM / runtime support code is in C, so it's probably easier to integrate it with a C back end and debug it than when using LLVM.

LLVM would also have benefits:

LLVM is designed for use in VMs; C is slightly awkward for this purpose (but it works and has been used in many projects).
It would be fairly easy to support JIT/dynamic compilation with LLVM, but with C it would be a pain (e.g. running gcc in a subprocess).
An LLVM based compiler would probably have faster compiles, since we wouldn't need the intermediate C parsing step. Besides, the code generator may be faster. But on the other hand, we can always use clang if the difference in code generation speed is large.

Implementing C generation (+ support code) is going to be a pretty small part of the entire mypy project, so rewriting it later to use LLVM is not a big deal. In the long term, LLVM is probably a better bet since we will want to support runtime code generation at some point.

In summary, I'm pretty sure that we can save several weeks of development time by using C initially, and most importantly, it reduces risks and uncertainty. Tackling two big uncertainties at the same time (a new programming language and an unfamiliar back end technology) would be asking for trouble. However, developing an alternative LLVM back end would be a useful project for anybody interested in LLVM and mypy internals.
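
To make the idea of using C as an intermediate step concrete, here is a toy sketch of a code generator that turns a tiny expression tree into C source text. It is not taken from the mypy back end, and it assumes a made-up tree format (an int literal, a variable name, or an (operator, left, right) tuple) with every value typed as a C long. The functions emit_expr and emit_function are hypothetical names.

```python
from typing import List, Union

# Assumed toy format: int literal, variable name, or (op, left, right).
Expr = Union[int, str, tuple]


def emit_expr(e: Expr) -> str:
    """Return C source text for an expression tree."""
    if isinstance(e, int):
        return str(e)
    if isinstance(e, str):
        return e
    op, left, right = e
    return '(' + emit_expr(left) + ' ' + op + ' ' + emit_expr(right) + ')'


def emit_function(name: str, params: List[str], body: Expr) -> str:
    """Return a C function definition that returns the given expression."""
    args = ', '.join('long ' + p for p in params)
    return 'long {}({}) {{\n    return {};\n}}\n'.format(
        name, args, emit_expr(body))


# Generate C text for f(x, y) = x + y * 2.
c_source = emit_function('f', ['x', 'y'], ('+', 'x', ('*', 'y', 2)))
print(c_source)
```

The generated text would then be handed to an ordinary C compiler such as gcc or clang, which is the extra parsing step that the LLVM comparison above refers to.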

Wednesday 23 January 2013

Mypy at PyCon 2013 in March

I will be attending PyCon 2013 in Santa Clara, CA this March. Just received the confirmation of travel funding a few days ago. If you are coming to PyCon this year, I'd be happy to arrange some time for a chat. I can also help anybody interested in becoming a contributor or in using mypy in their own projects.