If you’re like me, you have probably stumbled upon PyPy by now, and if you’re like me you didn’t understand exactly what it is. If you’re not like me, PyPy is a Python implementation written in Python, but, uhh, what does that mean?
The PyPy site isn’t all that helpful. It says:
The PyPy project aims at producing a flexible and fast Python implementation. The guiding idea is to translate a Python-level description of the Python language itself to lower level languages. Rumors have it that the secret goal is being faster-than-C which is nonsense, isn’t it?
This is only slightly less confusing than the Eclipse description (not really relevant here, but I mention it anyway), which reads:
Eclipse is an open source community whose projects are focused on providing a vendor-neutral open development platform and application frameworks for building software.
(Eclipse is an IDE)
Well, PyPy turns out to be exactly what it says (surprise!). I was a bit confused at first because I thought that it was a set of optimisations designed to produce faster bytecode (a bit like Psyco), but it isn’t.
Here’s my translation of it: PyPy is a Python interpreter (and it seems it’s also a compiler?). “What good is a Python interpreter written in Python?”, I hear you ask. “Isn’t the current interpreter written in C? Isn’t C faster than Python?” The answer to the first question is “wait”, and to the other two “yes, but wait”.
Indeed, if you run PyPy on the current Python interpreter (CPython), you will have an interpreter interpreting an interpreter interpreting your program, and it’ll take longer to do than to say (about 2000 times slower than normal CPython). What PyPy does right now (as of version 0.8.0, I think) is translate itself to a C program and then compile it, producing a much faster Python interpreter than the interpreted one (it’s still not faster than CPython, it’s around 10x slower, but they haven’t even begun optimizing).
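To make the "interpreter interpreting an interpreter" idea a bit more concrete, here is a toy sketch of my own. It is not PyPy code and not how PyPy is structured; the `run` function and the `PUSH`/`ADD`/`MUL` opcodes are made up for illustration. It's a tiny stack-machine interpreter written in Python, so when CPython runs it, CPython is interpreting an interpreter that is in turn interpreting the little program.

```python
def run(program):
    """Interpret a tiny stack-machine program given as (opcode, argument) pairs.

    Hypothetical toy language, invented for this example only.
    """
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    # The result of the program is whatever is left on top of the stack.
    return stack.pop()


# The toy program computes (2 + 3) * 4.
PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(PROGRAM)
print(result)
```

Every single toy instruction costs CPython a loop iteration, a few comparisons and some list operations, which is the same kind of overhead (on a much bigger scale) that makes an untranslated PyPy so slow.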
Since all that talk of interpreters interpreting interpreters was quite confusing, I will ease up on the recursiveness in this paragraph. Imagine that you’ve just written a new operating system during your coffee break which will blow all the other operating systems away and make you king (or queen, in which case, call me) of the world, but you don’t have a C compiler and you don’t want to spend your next coffee break writing one. However, there is this great C/C++/whatever suite (let’s call it ECC) that some guy on the internet wrote, and it’s all written in C. “Well, what good is that”, you think, “if I had a C compiler I wouldn’t need this one, duh”. But wait, you could probably write a simple, lame C compiler in two minutes, and then you could compile ECC on your system. You do that, you compile ECC with your lame C compiler, but the code your compiler produces is very inefficient (but nevertheless correct). You now have a (very inefficient) build of ECC on your system, and ECC is, lo and behold, a great C compiler. You compile ECC again using your sucky ECC build, and this time the resulting code is great. You now have a great compiler for your system!
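If it helps, the two-stage trick can be written down as a small toy model. This is only a sketch under my own assumptions: nothing is really compiled, and `Binary`, `compile_with` and the "slow"/"fast" labels are hypothetical names I made up. The one rule it encodes is that the quality of a binary depends on the compiler that built it, while the quality of the code a compiler emits depends on that compiler's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    name: str          # which program this binary is
    code_quality: str  # how good the machine code inside this binary is
    emits: str         # quality of the code this binary generates when used as a compiler


def compile_with(compiler, source_name, source_emits):
    """Model compiling a compiler's source with an existing compiler binary.

    The new binary's own code is only as good as what `compiler` emits,
    but what the new binary emits is decided by the source being compiled.
    """
    return Binary(name=source_name, code_quality=compiler.emits, emits=source_emits)


# The two-minute compiler: written by hand, produces correct but slow code.
lame = Binary(name="lamecc", code_quality="hand-written", emits="slow")

# Stage 1: ECC built with the lame compiler. It runs slowly but emits great code.
stage1 = compile_with(lame, "ECC", source_emits="fast")

# Stage 2: ECC built with the stage 1 build. It runs fast and emits great code.
stage2 = compile_with(stage1, "ECC", source_emits="fast")

print(stage1)
print(stage2)
```

The second stage is the whole point: the slow stage 1 build is only ever used once, to produce the fast one.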
Have I lost you yet? Apparently not, you’re very persistent. Anyway, this is one of the benefits of PyPy. If you have a Python interpreter (even a sucky one), you could compile PyPy and use it there. What’s more, though, is that PyPy is designed to be able to produce custom interpreters. Want a Python interpreter without garbage collection? Just disable it and build PyPy, and you have one for every system Python has ever run on. Also, since PyPy is written in Python, it’s much easier to experiment with it than interpreters in other languages.
I am really amazed by the speed of development of PyPy. They went from 0.7.0 to 0.8.0 in three months (they do it in sprints, where they basically all get together and code all day long), so development is pretty fast. I used to think that a Python-to-C compiler would take at least a few millennia, but fortunately it is not so. Now I’m just sitting around waiting to see what these guys will release next.
This is the end of this post, and hopefully at least 10% of you have made it this far. I really suck at explaining things.