Monday, 15 April 2013

PyCon Update: Python-Compatible Syntax for Mypy?

It's already weeks since PyCon! Phew, I've been busy recently. Anyway, I had an eventful trip to PyCon in Santa Clara, California. PyCon is the biggest Python conference with about 2500 delegates from all around the world (though most seemed to come from the US).

I had chats with Guido, Armin Rigo (PyPy) and many others. After the conference, I stayed around for a few days in the San Francisco Bay Area and gave a talk at Google Mountain View, and also visited Dropbox in San Francisco.

One of my main goals for the trip was trying to gauge whether mypy is going in the right direction in the eyes of the Python community. There was a lot of interest in the project, but some important issues were raised that I need to discuss in more detail.

The ability to do compile-time checking of programs even without a new VM was interesting to many. This would benefit projects and organizations with large existing Python code bases. However, these organizations also manage risks carefully. Currently mypy can be used on top of CPython, but the sources must always be translated to Python before execution. Adding the mypy tool chain to the core build process is something most seem to be reluctant to do. Obviously this is the case now as mypy is still experimental, but I got the impression that even if mypy were considered stable and mature, relying on a third-party tool to be able to run their code would be a pretty daring and unlikely move. Also, mypy has the problem of being not quite compatible with many Python tools such as IDEs. This is a chicken-and-egg problem: tool support would probably fix itself if mypy were widely used, but it's difficult to get wide use without tool support. Library support is similar. However, there may be a way around this dilemma -- just stay with me for a few more paragraphs.

Many organizations using Python are still stuck with 2.x, and find the transition to Python 3 difficult. Even upgrades from 2.x to 2.x+1 have caused a lot of trouble, and the switch to Python 3 is much trickier, in large part due to changes in string representations (str/unicode in Python 2.x versus bytes/str in Python 3.x). Mypy currently only supports Python 3.x syntax, which limits its usefulness to many.
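To make the string change concrete, here is a small Python 3 sketch. The variable and helper names are made up for illustration. In Python 3, text (str) and binary data (bytes) are separate types that are never mixed implicitly, which is why code that relied on the looser Python 2.x behavior tends to break during porting.

```python
# Python 3: text (str) and binary data (bytes) are distinct types.
text = 'héllo'
data = text.encode('utf-8')       # str -> bytes must be explicit


def try_concat(a, b):
    """Return a + b, or the name of the exception that was raised."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__


mixed = try_concat(text, data)    # mixing str and bytes fails in Python 3
roundtrip = data.decode('utf-8')  # bytes -> str must be explicit as well
```

In Python 2.x, mixing str and unicode would often succeed through an implicit ASCII conversion, so problems of this kind only surface after the switch.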

Some also saw the challenge of developing a production-quality mypy VM as too large for our team. I think this is in large part down to how previous projects have fared, including PyPy: even after many years, and with several talented developers, its adoption has still been pretty slow in the Python community. Unladen Swallow is another example that showed that speeding up Python is not easy. Of course, mypy has goals different from PyPy and other previous projects, and our approach of targeting ahead-of-time compilation reduces the development effort by a large factor. But I agree that I won't be able to do it alone, and getting funding for continued development is hard.

Based on suggestions from Guido and the above observations, I've been working for some time on a pretty big proposal that would help address all of the above issues in some form or another. This is still at the planning stage, and nothing has been finalized yet. However, here are the main points:

  1. For mypy to really take off, we need users. In order to realistically get users, there needs to be a low-risk way of adopting mypy incrementally in current projects implemented in Python.
  2. There is a good amount of interest in optional typing in the Python community, but the approach should be non-invasive to current development processes, tool chains, etc.
  3. The pragmatic way to resolve the two above issues is to make mypy syntax 100% compatible with Python, both Python 2.x and 3.x. There would be no need for a Python translation phase, and a normal Python interpreter could be used to run mypy programs directly. Also all Python tools would pretty much Just Work. Note that as this would be a syntactic change, it would have no significant impact on planned efficiency of the new VM compared to the current syntax and plans, though this would likely result in semantic changes as well (see below for more about these). Also, mypy already supports translation to Python. This would just remove the need for the translation step.
  4. We should first focus most resources on the optional typing part instead of the new VM and compiler in order to make mypy usable as a static type checker for CPython (and PyPy/Jython).
  5. Now mypy would be much easier to adopt in organizations that would like to use optional typing to get better maintainability and productivity. I think that the above changes could speed up the adoption of mypy a lot. Also, the type checker part of mypy is a fairly straightforward project from an engineering point of view and there is no need for a large team of developers.
  6. If mypy gets significant adoption, there would also be demand for the new VM and the compiler, and it would be easier (but still not exactly easy!) to get contributors, maybe even development funding, etc.

The above plan would imply redesigning the type annotation syntax of mypy. I've given it a lot of thought, and perhaps surprisingly, it seems that there would not be a need for many compromises. Generally readability would be similar to the current syntax, and sometimes it would be even better. I'm not going to cover this in detail now, but the main difference would be the introduction of Python 3 style annotation syntax (obviously for Python 3.x only; Python 2.x needs a different approach):

  NOW:
    str greeting(str name):
        return 'hello, ' + name
  NEW PROPOSAL:
    def greeting(name:str) -> str:
        return 'hello, ' + name
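
As a quick check of the "no translation step" idea, the proposed form already runs unmodified on a standard Python 3 interpreter. The sketch below only shows what plain Python does with the annotations: it stores them on the function and does not enforce them. The identity function is a made-up example, and a static checker would be the part that reports the mismatch.

```python
def greeting(name: str) -> str:
    return 'hello, ' + name


def identity(x: int) -> int:
    # Hypothetical example: the annotation says int, but nothing is
    # checked at runtime by the interpreter itself.
    return x


print(greeting('world'))          # hello, world
print(greeting.__annotations__)   # the annotations are kept as metadata
print(identity('not an int'))     # runs fine; no runtime type check
```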

Mypy uses nominal subtyping, even though structural subtyping would help model 'duck typing' in Python. Many people have expressed their interest in structural subtyping, and I discussed this at PyCon as well. Earlier, I thought that this couldn't be implemented efficiently on platforms that I would eventually like to be able to support, including Dalvik (Android). However, now I think I've figured out how to have efficient structural subtyping on basically any VM that could realistically run mypy, so the main objection no longer applies. Also, with the proposed Python-compatible syntax, structural subtyping could be a win for various reasons. In summary, it now seems likely that mypy will get support for structural subtyping in addition to nominal subtyping. I've started to prepare an enhancement proposal.
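
The difference between the two kinds of subtyping can be illustrated with a rough runtime analogy in plain Python. This is only a sketch: the class and function names are hypothetical, and the hasattr-style check stands in for what a static structural check would decide at compile time. It is not how mypy implements or would implement it.

```python
class Duck:
    def quack(self) -> str:
        return 'Quack'


class Robot:
    # Unrelated to Duck by inheritance, but offers the same method.
    def quack(self) -> str:
        return 'Beep'


def is_nominal_duck(obj) -> bool:
    # Nominal: the object must be declared as a (sub)class of Duck.
    return isinstance(obj, Duck)


def is_structural_duck(obj) -> bool:
    # Structural: it is enough that the required method is present.
    return callable(getattr(obj, 'quack', None))
```

Under the nominal rule a Robot is rejected wherever a Duck is expected, while under the structural rule it is accepted, which matches how duck-typed Python code is usually written.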

There are other, less major changes that Python compatibility would require. Mypy should support multiple inheritance without the current limitations, similar to Python. Again, I previously ruled this out due to efficiency concerns, but I think I was wrong and there is really no technical reason why multiple inheritance needs to be restricted to interfaces like it is now. Also, mypy needs to support metaclasses; this one is trickier but I'm optimistic about it as well.
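
For reference, this is the kind of ordinary Python code that full compatibility implies mypy would have to accept: a class inheriting from two concrete (non-interface) classes, and a class with a custom metaclass. All names are made up for illustration, and the snippet is plain Python 3 rather than anything mypy-specific.

```python
class Readable:
    def read(self) -> str:
        return 'data'


class Writable:
    def __init__(self) -> None:
        self.written = []

    def write(self, s: str) -> None:
        self.written.append(s)


class ReadWrite(Readable, Writable):
    # Both bases carry implementation, not just method signatures.
    pass


class Registry(type):
    """Metaclass that records the name of every class it creates."""
    classes = []

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        mcs.classes.append(name)
        return cls


class Plugin(metaclass=Registry):
    pass
```

Python resolves attribute lookups on ReadWrite through the method resolution order (ReadWrite, Readable, Writable, object), so a type checker has to follow the same order to find the right method and attribute types.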

Let me know if you have any opinions on the proposed changes. Write comments below or send me an email.

Wednesday, 13 March 2013

Mypy Development Update #2

This post is about the latest developments in the mypy project. A lot has happened since the last update in December, even though my family got hit by the flu pretty bad this winter; I lost perhaps two weeks of development time.

Latest Changes

  • There are many new type checker and Python back end features. Here are some of the more important:
    • Added package support (modules with names of form foo.bar). The mypy implementation now uses packages. As a side effect, the mypy driver is now named driver.py instead of mypy.py!
    • A module can be run as a script using the '-m' driver command line argument.
    • Arbitrary statements and references to class attributes are supported in the class body.
    • Added support for nested functions and classes.
    • Import statements can be used anywhere, not just at the top level of a file.
    • Added support for function decorators. I will write about using statically typed function decorators in another post.
    • Added support for 'with' statements. Also updated various library classes to support the with statement.
    • Special attributes such as __name__, __doc__ and __dict__ are supported.
    • Implemented chained assignments such as x = y = z.
    • Type checking of boolean operator expressions with non-boolean values works as expected (for example, s = s or 'x').
    • Set literals {a, b, ...} work.
    • Various minor conveniences now work, including raise without an argument and multiple exception types in a single except block.
  • I have adapted several Python standard library modules to static typing. Currently there's around 3000 non-empty, non-comment lines of adapted code + around 7000 lines of related unit tests. I will write a separate blog post about my experiences, but generally the process was fairly smooth with the notable exception of the dozens of mypy bugs and missing features that I encountered during the work. However, this has improved the compiler front end tremendously, and I'm going to continue with more Python modules. An interesting result is that the type checker helped find several bugs in CPython 3.2 standard libs. This was somewhat unexpected. When starting the mypy project, I primarily wanted to improve programmer productivity and runtime efficiency. I wasn't really expecting to find bugs in debugged and tested code, so this is a very welcome result. Here's a link to the adapted code. Note that some unit tests still fail -- there's more fixing to do.
  • There are several new library stubs, including socket and time (thanks to Ron Murawski) and shlex. There are also dozens of fixes and additions to existing stubs, and several new partial stubs.
  • A lot of work since the last update has targeted the C back end. The C back end development has progressed well, though it will still take quite a lot of effort before it's usable for real programs. The biggest implementation changes are not user-visible and still only affect the first stages of the back end. However, this ground work will make future progress much faster.
  • We've started to use the GitHub issue tracker more actively. There are now 165 issues in total; 85 issues have been closed. Many changes and fixes still have no associated issue, though. As the project grows in size and complexity, the issue tracker will become more central to development.
  • Ashley Hewson started working on an automatic library stub generator. It could make it easy to enable mypy programs to access many Python modules.
  • I've written more about the compiler internals in the wiki.
  • Several potential new features have been added to the wiki, including immutability and fixed-width integer types.
  • Several bugs have been fixed. Credits to me and Ashley.
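
Several of the Python features in the list above can be seen together in one short program. The sketch below is written in plain Python 3 without type annotations, since the mypy syntax of the time is not shown in this post; the function names are made up for illustration.

```python
import functools


def logged(func):
    # Function decorator with a nested function.
    @functools.wraps(func)
    def wrapper(*args):
        wrapper.calls += 1
        return func(*args)
    wrapper.calls = 0
    return wrapper


@logged
def default_name(s):
    # Boolean operator with non-boolean operands.
    s = s or 'x'
    return s


def parse(text):
    import io                      # import below the top level of the file
    with io.StringIO(text) as f:   # 'with' statement
        try:
            return int(f.read())
        except (ValueError, TypeError):  # several types in one except
            raise                        # raise without an argument


a = b = {1, 2, 2}                  # chained assignment and a set literal
```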

Next Steps

Mypy development focus will shift more and more to the C back end (but porting more Python standard library code is important as well). There is still a lot of work to do before we can run interesting programs. For example, the current implementation has no garbage collector. The efficiency of the garbage collector is very important for mypy, as typical programs construct millions of objects. Instead of developing the garbage collector from scratch, I'm going to port the gc from the Alore VM. It's pretty speedy and has been working for me pretty well, and it supports multithreading. However, it does not support parallel collection yet, which is a minor downside.

The next minor C back end milestone is being able to run the well-known Pystone benchmark. Another small but important milestone is to be able to run unit tests using the native back end. This will speed up development.

The major long-term goal is to support a baseline compiler for a good subset of mypy and some standard library functionality, and to support self-compilation. This will speed up translation and compile times significantly. I will concentrate on adding language features and making the compiler stable before turning to more complex optimizations. The baseline compiler will still give a good speedup over CPython due to static typing, optimised semantics and native code compilation, in addition to more powerful runtime type checking.

As before, there will also be incremental improvements to the compiler front end (parser and type checker) and the Python back end. The highest-priority features include properties, static and class methods and named tuples.

Thursday, 31 January 2013

Mypy Native Code Back End: C vs LLVM

Mypy will initially use C as an intermediate step to compile to native code. I also seriously considered LLVM, and several people have recommended LLVM.

It was a tough decision; LLVM would probably be a good match as well. Here's my reasoning for choosing C:

  • C works everywhere. It's very stable and supported on older and more exotic systems as well.
  • A huge number of developers know C. If the back end uses C, more people will be able to help and debug problems. By contrast, LLVM is mostly used by specialists such as programming language implementors and researchers.
  • I know C very well. LLVM might have some problems or limitations that I'm not aware of yet, and these might bite me. LLVM is also fairly complex and takes time to learn.
  • We would probably have to implement mypy bindings to the LLVM API. This would have to be in C++, since the LLVM C API does not seem to be very well supported. The API is large, so this would probably take some effort (maybe a few weeks, maybe longer). We would also have to maintain these bindings. There are Python LLVM bindings, but they haven't been updated recently and I have no idea of how complete and usable they are -- another unknown.
  • C is probably "efficient enough", at least initially. LLVM has some low-level features that could be useful, but I doubt the difference is large in practice.
  • LLVM is slightly lower level than C. This probably translates to more development work.
  • My VM / runtime support code is in C, so it's probably easier to integrate it with a C back end and debug it than when using LLVM.

LLVM would also have benefits:

  • LLVM is designed for use in VMs; C is slightly awkward for this purpose (but it works and has been used in many projects).
  • It would be fairly easy to support JIT/dynamic compilation with LLVM, but with C it would be a pain (e.g. running gcc in a subprocess).
  • An LLVM based compiler would probably have faster compiles, since we wouldn't need the intermediate C parsing step. Besides, the code generator may be faster. But on the other hand, we can always use clang if the difference in code generation speed is large.
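
To give an idea of what "running gcc in a subprocess" means for dynamic compilation, here is a rough Python sketch: write generated C to a temporary file, invoke the C compiler, and load the resulting shared library. This is only an illustration under several assumptions (a Unix-like system, a compiler named gcc on the PATH that accepts the usual -shared and -fPIC flags, hypothetical helper names). It is not how the mypy back end works.

```python
import ctypes
import os
import shutil
import subprocess
import tempfile


def compile_command(compiler, c_path, so_path):
    # Command line for building a shared library from a single C file.
    return [compiler, '-shared', '-fPIC', '-O2', '-o', so_path, c_path]


def compile_and_load(c_source, compiler='gcc'):
    """Compile C source at runtime; return None if no compiler is found."""
    if shutil.which(compiler) is None:
        return None
    workdir = tempfile.mkdtemp()
    c_path = os.path.join(workdir, 'unit.c')
    so_path = os.path.join(workdir, 'unit.so')
    with open(c_path, 'w') as f:
        f.write(c_source)
    # Each compilation pays for process start-up and C parsing.
    subprocess.check_call(compile_command(compiler, c_path, so_path))
    return ctypes.CDLL(so_path)


lib = compile_and_load('int add(int a, int b) { return a + b; }')
if lib is not None:
    print(lib.add(2, 3))
```

Every call goes through temporary files and a separate compiler process, which is the overhead and awkwardness referred to above. An in-process code generator avoids both.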

Implementing C generation (+ support code) is going to be a pretty small part of the entire mypy project, so rewriting it later to use LLVM is not a big deal. In the long term, LLVM is probably a better bet since we will want to support runtime code generation at some point.

In summary, I'm pretty sure that we can save several weeks of development time by using C initially, and most importantly, it reduces risks and uncertainty. Tackling two big uncertainties at the same time (a new programming language and an unfamiliar back end technology) would be asking for trouble. However, developing an alternative LLVM back end would be a useful project for anybody interested in LLVM and mypy internals.

Wednesday, 23 January 2013

Mypy at PyCon 2013 in March

I will be attending PyCon 2013 in Santa Clara, CA this March. Just received the confirmation of travel funding a few days ago. If you are coming to PyCon this year, I'd be happy to arrange some time for a chat. I can also help anybody interested in becoming a contributor or in using mypy in their own projects.

Thursday, 20 December 2012

Mypy Development Update #1

The mypy project has been progressing smoothly during the last two or so weeks after the source release.

Latest changes:

  • I added a lot of content to the mypy language overview. It now covers more language features, explains common issues encountered when using static typing and describes the translation process to Python in some detail.
  • The wiki now contains instructions for adding support for additional Python modules by creating library stubs.
  • There have been several other updates to the wiki. It's starting to be a useful tool for developers and users.
  • Several bugs in the mypy implementation have been fixed, and the type checker now supports type inference for lambdas. Also type inference of generic functions such as map has improved. Code like this now works as expected:
        print(list(map(str, [1, 2, 3])))
        
  • Work on the C back end has begun. I started porting some 2000 lines of code from my earlier Alore-to-Java compiler prototype to mypy. It's still going to take a few more days to port the code. If everything goes as planned, we should be able to compile some simple mypy code to reasonably efficient native code in 3 or 4 weeks, perhaps.
  • Even though the development focus is now on the C back end, I will also continue improving the type checker and the mypy-to-Python translator. One of the important milestones will be being able to port some Python standard library modules to static typing without too much effort.

I'm going to continue posting periodic updates like this that highlight the latest developments in the project.

Friday, 14 December 2012

Friday, 7 December 2012

Source Code Released

Mypy source code is available on GitHub:

https://github.com/JukkaL/mypy

Clone the repo and give it a try! Currently the mypy implementation lets you mix static types and dynamic types and translate mypy programs to readable Python. Type annotations and casts are treated as comments when translating to Python. As such there is no performance boost yet. The current prototype supports a useful but somewhat limited subset of Python features (library support is still limited).

There is also an issue tracker for reporting bugs.