The Mypy Blog: Why Mypy Can Be More Efficient Than Python

Several projects have tried to fix the inefficiency of CPython, with varying degrees of success. The most notable of these is PyPy, which is a JIT compiler for Python (among other things). Mypy is different from the previous projects in several ways, and I believe that these differences are the keys to understanding why I chose the mypy approach.

Some projects base their work on the CPython VM; examples include Cython and Nuitka. These have two major problems:

The GIL (Global Interpreter Lock) is deeply ingrained in the CPython implementation. There seems to be no way to get rid of it without either actually degrading performance further or rearchitecting a lot of the VM and potentially breaking backward compatibility.
The CPython VM was designed to be easy to develop, not to have high performance. Again, it is practically impossible to improve performance significantly without major rewrites and breaking compatibility, as this was not a goal in the original architecture. Mind you, Python is a very useful language and widely used, so I'm not implying that the original goals were somehow wrong; only that some of the goals are different from what I and many other people want.

Mypy will not have these problems since it uses a fresh VM implementation. These issues are not exceedingly difficult to solve if they are taken into consideration early enough in development. Still, a new VM alone does not make it easy to get good performance or to get rid of the GIL effectively.

PyPy takes a different approach of implementing Python. PyPy has reimplemented all the VM infrastructure and it has a new interpreter and a JIT compiler. However, PyPy aims at almost perfect compatibility with CPython: it holds the Python semantics as almost sacred (different Python implementations actually differ in semantics in important ways; a good example is reference counting versus garbage collection). Even though I have a great deal of respect for the PyPy project, their approach has still major problems that the project has not, in my opinion, solved in a satisfactory way, and that seem to be fundamentally difficult to solve with their chosen constraints:

The GIL is actually deeply ingrained in Python semantics, not only in the implementation. To be fully compatible with CPython, a parallel Python implementation has to effectively introduce locking at a very fine-grained granularity (Discussion of GIL in the Python documentation). By effectively I mean that this locking semantics needs at least be emulated. PyPy is in the process of adopting Software Transactional Memory (STM) for this (blog post). However, Armin Rigo wrote that the performance with STM is at least 2x slower than sequential code, which I find unacceptable. Besides, to me it's still not obvious whether STM is a good match for many programs: STM is still an active research area, and it has not seen a huge mainstream adoption yet.
The dynamism of Python, even without the problem of the GIL ("almost everything is a dictionary"), makes it difficult to compile Python to efficient native code, and this affects especially heavily object-oriented programs. PyPy tackles it by using a trace compiler, but it often results in very slow program "warm-up" (sometimes several minutes!) and high memory usage. In my admittedly small experiments with PyPy, a program that runs for a couple of seconds ran almost fully interpreted (according to the PyPy profiler). As the PyPy interpreter is about twice as slow as in CPython, the program ran about 2x as slowly with PyPy than with CPython!

Mypy deals with these problems in three ways. These are key to why I believe that mypy can achieve its goals. These are also key to why mypy will never be 100% compatible with Python. In my opinion this is a fair tradeoff, and in practice I believe that it is almost a necessary tradeoff in order to make the performance of Python competitive with other languages such as Java or C#, which don't have the above issues:

Mypy will make more things immutable by default than Python. In particular, classes are immutable unless otherwise specified (analogous to extension or C classes in CPython). This reduces the need for fine-grained locking a lot, and it also makes it possible to optimize many operations such as method lookups in ways that are difficult to achieve with Python semantics.
Mypy does not guarantee implicit locking at boundaries corresponding to CPython bytecodes. Due to the additional immutability mentioned above, often this actually does not make any practical difference for programs. For example, dictionaries have no implicit locking. However, relying on such low-level implementation details for implicit synchronization in Python programs is begging for trouble. The implicit locking is defined in terms of the CPython bytecode format, and not many programmers know or even want to know into what kind of bytecode their code compiles.
Mypy will support static typing with reasonable runtime soundness guarantees (the soundness properties will be sometimes stronger than Java, sometimes similar). These allow a mypy implementation to get rid of the final stumbling block of running Python programs efficiently: dynamic typing. Due to the points 1 and 2 above, also dynamically typed mypy can be faster than Python, but for close to optimal performance programs should use static typing at least in the hot spots.

So the net result is that mypy can support efficient ahead-of-time compilation with fast program startup, and GIL-free efficient threading, with overhead comparable to that of statically typed languages with safe runtime semantics. Mypy can still support a lof of the dynamism of Python; however it is opt-in. Unlike Python, you only pay for the dynamism that you use. You will be able to declare a method mutable and do monkey patching, but most programs hardly ever do this and they don't have to pay a hefty performance penalty for merely having the option of doing this.

Friday, 7 December 2012

Why Mypy Can Be More Efficient Than Python