I have been back from PyCon 2010 in Atlanta for a while, and I am still absorbing the huge volume of knowledge and information I gathered during the conference. It was an amazing experience to see first-hand what the Python community is doing. There was something for everyone at PyCon 2010, from beginners to advanced Python users. The tone of the conference was very friendly; it was a totally different experience from corporate-sponsored technology conferences.
During the two tutorial days prior to the conference days, I attended four tutorials:
Faster Python Programs through Optimization – The tutorial presented guidelines and strategies for optimizing Python programs. It demonstrated techniques for measuring speed (test.pystone), profiling CPU usage (cProfile), and profiling memory usage (the Guppy-PE framework), none of which I had known before. The tutorial also detailed the essential performance differences among Python’s built-in data types, which was helpful for me too.
Pinax Long Tutorial – Pinax is an open-source platform built on the Django Web Framework. In the tutorial, the Pinax core developers covered Pinax installation, creating projects, leveraging reusable Django applications, modifying templates, Pinax-specific settings, media handling, and deployment. The most impressive parts of the tutorial for me were how Pinax takes advantage of virtualenv and pip, both provided by Ian Bicking, to streamline the installation process, and how Pinax leverages reusable Django applications. The tutorial also introduced lots of open-source reusable Django applications that are worth looking at in the coming days: to name a few, django-frontendadmin, django-flatblocks, django-ajax-validation, django-openid, and django-pagination.
Django in Depth – Django was one of my favourite topics during the conference. “In this tutorial, we’ll take a detailed look under the hood, covering everything from the guts of the ORM to the innards of the template system to how the admin interface really works”. James Bennett led us on a deep dive into the internals of the Django Web Framework and showed us the bits and pieces of Django’s ORM, forms and validation, template system, request processing, views, and admin interface, going far beyond what the Django documentation covers.
Django Deployment Workshop – Another Django tutorial, presented by Jacob Kaplan-Moss, covered the creation of a full Django deployment environment running on a cluster of (virtual) machines. Jacob walked us through a live demo of how to set up a production-ready deployment environment in the cloud (Rackspace, Amazon EC2) by removing the single points of failure one by one.
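To make the CPU profiling technique from the optimization tutorial a little more concrete, here is a minimal sketch of how cProfile can be used from the standard library. The two functions and the profile helper are made up for illustration and are not taken from the tutorial; they simply contrast membership tests on a list with membership tests on a set, which is the kind of data type difference the tutorial talked about.

```python
import cProfile
import io
import pstats


def build_with_list(n):
    """Collect unique values using a list (each 'in' test scans the list)."""
    seen = []
    for i in range(n):
        if i not in seen:
            seen.append(i)
    return len(seen)


def build_with_set(n):
    """Collect unique values using a set (each 'in' test is a hash lookup)."""
    seen = set()
    for i in range(n):
        if i not in seen:
            seen.add(i)
    return len(seen)


def profile(func, *args):
    """Hypothetical helper: run func under cProfile, return (result, report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five most expensive entries by cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    for func in (build_with_list, build_with_set):
        result, report = profile(func, 2000)
        print(func.__name__, "returned", result)
        print(report)
```

Running the script prints one profile report per function, so the cost of the two approaches can be compared side by side.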
During the following three conference days, I attended lots of talks. Here are the highlights of the topics that attracted me the most:
NoSQL databases were a hot topic during the conference; MongoDB, Cassandra, and Neo4j attracted lots of attention. Mark Ramm and Rick Copeland from SourceForge.net presented a comparison between relational and NoSQL databases, a practical guide to deciding which to use in a project, and the story of how they quickly migrated one of their high-traffic websites from PHP to Python using TurboGears, MongoDB, and Jinja templates. It is a live example of how a NoSQL database can be used in real-world projects.
Testing in Python was another topic I followed closely at the conference. Ned Batchelder, of coverage.py fame, gave a talk on tests and testability; there was not much new for me, as we emphasize TDD in our daily software development here at Point2. Michael J Foord, creator of the well-known Python Mock library, gave a talk titled New *and* Improved: Coming changes to unittest, the standard library test framework, which covered lots of great stuff coming to unittest; the most attractive bits are test discovery and more convenient assertion methods. By the way, you do not need to wait until you upgrade to Python 2.7 or 3.2 to take advantage of the new unittest features, as they have been backported to Python 2.4+. It is great to see testing picking up speed in the Python community.
Using Django in Non-Standard Ways, given by Eric Florenzano, was an interesting talk. Eric covered how to use Django with alternatives to what Django offers, and how to use the bits Django offers in other contexts. He gave examples of using the Jinja2 template engine with Django, not using django.contrib.auth in a Django application, not using the ORM in Django, and using Django’s ORM stand-alone. It is amazing to see how you can take advantage of the Django framework even when not using it in the standard way.
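Coming back to the unittest changes mentioned above, here is a small sketch of what the more convenient assertion methods look like in practice. The parse_tags function is a hypothetical helper invented for this example; assertIn and the context-manager form of assertRaises are among the additions that arrived with the new unittest.

```python
import unittest


def parse_tags(text):
    """Hypothetical helper: split a comma-separated string into tags."""
    if text is None:
        raise ValueError("text must not be None")
    return [tag.strip() for tag in text.split(",") if tag.strip()]


class ParseTagsTest(unittest.TestCase):
    def test_contains_tag(self):
        # assertIn gives a clearer failure message than assertTrue(x in y).
        self.assertIn("django", parse_tags("python, django"))

    def test_empty_string_gives_no_tags(self):
        self.assertEqual(parse_tags(""), [])

    def test_none_is_rejected(self):
        # assertRaises can be used as a context manager.
        with self.assertRaises(ValueError):
            parse_tags(None)


if __name__ == "__main__":
    unittest.main()
```

With test discovery, a file like this (named so that it matches the default test*.py pattern) can be picked up by running python -m unittest discover from the project directory, without listing test modules by hand.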
Overall, my experience of PyCon 2010 Atlanta was amazing. The conference was great fun and informative. It was great to be there.
This year’s PyCon had quite a few presentations on how Python is being used to scale applications to massive volumes. There are many companies using Python for their high-traffic web sites. A shortlist:
Here, I’m going to look at a big Python stack that came up a number of times.
Jinja2 is a template engine that is similar in many respects to Django templates. Jinja2 does not set out to be the fastest template engine; instead it strives to be easy to use, to provide easily configurable syntax, to be easy to debug, and to provide sandboxing so that third-party templates can be run in a safe environment.
Although it doesn’t set out to be the fastest, in some benchmarks it is more than 10 times faster than Django’s template engine. It’s this balance of features and speed that makes it so well suited to high-traffic sites.
Bottle is a very fast and simple WSGI web framework. It is pure Python, and fits into a single file. Bottle uses an @route function decorator to identify which URL invokes a given function, and allows you to return either a generator, for streaming straight back to the caller, or a dictionary to be used by a template defined with an @view decorator. That is pretty much all that Bottle does. It has its own simple template engine and a single-threaded server, which are both fine for development. It also has support for a multitude of template engines and multi-threaded servers.
At three of the presentations on scaling apps, the presenters said they use the Paste httpserver to serve their applications. In two other presentations, the applications were running on twisted.web. Paste is written by the talented Ian Bicking, and is more than just a multi-threaded HTTP server. In the presentations I went to, though, they were just using the httpserver component.
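To show the idea behind decorator-based routing, here is a toy sketch written against plain WSGI using only the standard library. This is not Bottle’s code or API; the ROUTES table, the route decorator and the application function are all invented for illustration, and real Bottle does a great deal more (URL parameters, HTTP methods, templates).

```python
# Toy routing table: maps a URL path to the function that handles it.
ROUTES = {}


def route(path):
    """Register the decorated function as the handler for path."""
    def decorator(func):
        ROUTES[path] = func
        return func
    return decorator


@route("/hello")
def hello():
    return "Hello from a toy router"


@route("/count")
def count():
    # A generator handler: each chunk can be streamed to the caller.
    for i in range(3):
        yield "%d\n" % i


def application(environ, start_response):
    """Minimal WSGI callable that dispatches on PATH_INFO."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    body = handler()
    if isinstance(body, str):
        body = [body]  # treat a plain string as a single chunk
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return (chunk.encode("utf-8") for chunk in body)


if __name__ == "__main__":
    # Single-threaded development server from the standard library.
    from wsgiref.simple_server import make_server
    make_server("localhost", 8080, application).serve_forever()
```

Because the application is an ordinary WSGI callable, it could in principle be served by any WSGI server, which is the same property that lets a framework like Bottle swap its development server for a multi-threaded one.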
RabbitMQ is a highly reliable messaging system based on the emerging AMQP standard. It came up many times in different talks during the conference, so much so that it would appear to be the primary choice for the Python community at the moment. The interesting point for me was that it’s written in Erlang, so the attraction to it isn’t driven by its Python street cred. RabbitMQ was used extensively to offload any processing that can be done asynchronously. In the larger-volume applications, a common pattern was to try to find anything that could possibly be done asynchronously, to allow a response to be returned as quickly as possible. There was an excellent presentation by Jinal Jhaveri on Scaling Python webapps from zero to 50 million users. This was the story of a game developed for Facebook that went from 0 users to 1 million in its first week. Messaging was the key architectural component that enabled them to maintain a good user experience.
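The offloading pattern described above can be sketched without a broker at all. In the example below, an in-process queue.Queue and a worker thread stand in for RabbitMQ and a separate consumer process; that substitution, along with the handle_request and worker names and the fake "email" job, is an assumption made purely so the sketch runs on its own. A real deployment would publish to the broker through an AMQP client library instead.

```python
import queue
import threading


def worker(jobs, processed):
    """Consume jobs until a None sentinel arrives (stand-in for a consumer)."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        # Pretend this is the slow part, e.g. sending an email.
        processed.append("emailed %s" % job["user"])
        jobs.task_done()


def handle_request(jobs, user):
    """Queue the slow work and answer immediately."""
    jobs.put({"user": user})
    return "202 Accepted"


def run_demo(users):
    jobs = queue.Queue()  # stand-in for a broker queue
    processed = []
    thread = threading.Thread(target=worker, args=(jobs, processed))
    thread.start()
    responses = [handle_request(jobs, user) for user in users]
    jobs.put(None)  # tell the worker to stop
    jobs.join()     # wait until every queued job has been handled
    thread.join()
    return responses, processed


if __name__ == "__main__":
    responses, processed = run_demo(["ann", "bob"])
    print(responses)
    print(processed)
```

The point of the pattern is that handle_request returns as soon as the message is queued, so the time the user waits no longer depends on how long the background work takes.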
The Twisted framework has been around for a long time, and is already widely used. I went to a presentation on cooperative multi-tasking, which looked at ways of substantially increasing the performance of your code. Networking and filesystems are well-known sources of blocking, but your own code is a common source of blocking too. Twisted offers support for non-blocking sockets for networking, and constructs for use in your code to prevent blocking.
The primary target in the presentation was for loops. Calling functions on elements in a for loop running in a single process is a source of blocking, and Twisted has tools to improve this. You supply an iterator, and Twisted then schedules this to execute along with any other tasks, in multiple processes. In Python, starting multiple threads is not a good solution, because of the way Python uses the Global Interpreter Lock, but multiple processes are a good solution, and Twisted makes multi-processing easy.
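The idea of handing a loop to a scheduler as an iterator can be illustrated with plain generators. The sketch below is not Twisted’s API: process_items and run_cooperatively are hypothetical names, and this toy scheduler simply takes turns between tasks inside a single process rather than using a reactor or extra processes. It only shows why turning a for loop into an iterator lets other work run between the steps.

```python
from collections import deque


def process_items(name, items, log):
    """A for loop rewritten as a generator: one item per step."""
    for item in items:
        log.append((name, item))  # the "work" done for this item
        yield                     # give other tasks a turn


def run_cooperatively(tasks):
    """Round-robin scheduler: advance each task one step at a time."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, do not reschedule it
        ready.append(task)


if __name__ == "__main__":
    log = []
    run_cooperatively([
        process_items("a", [1, 2], log),
        process_items("b", [10], log),
    ])
    print(log)
```

Here the short task "b" gets its turn after the first item of "a", instead of waiting for the whole of "a" to finish, which is the behaviour a blocking for loop cannot offer.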
There were a lot of talks about NoSQL databases, or document stores, during the conference. There were presentations covering MongoDB, Redis, Cassandra and Neo4j (Neo4j is particularly targeted at persisting graphs). Of them all, the one that stood out for me was MongoDB, both for its maturity and for the relatively easy mental transition from a relational database model. The lead developer of the Python driver for MongoDB, Mike Dirolf, presented in an Open Spaces session, which was excellent. MongoDB is written in C++, and stores JSON-like documents in collections. It supports a JSON query language that has a SQL-like feel, and is blisteringly fast. There are production deployments of MongoDB holding more than 600 million documents. SourceForge are moving more and more of their data onto MongoDB, including page caching.
It was great to see so many presentations on using Python for big, complex, popular applications, and the wealth of tools out there to make it possible. It’s clear that Python is well and truly production ready. In my next blog post, I’ll write about using some of these components to replace an existing large Java REST service with something small, simple and Pythonic that is functionally equivalent.
By: Chris Tarttelin
March seems to be a busy month for conferences. We’re sending people to both SD West and PyCon, so if you’re going to be there, let us know, either via email or in the comments. We’d love to have a chance to meet up with people who read the blog. More details on our attendance at each of these conferences are coming soon.