Archive for March, 2010

Auto-Provisioning OSGi Features: The Basics

March 29, 2010

Recently, during a software project, we decided to deliver an application via OSGi inside a ServiceMix container. In order to do that, we chose to provision a “feature”, allowing the customer to install the entire application through a simple URL.

Doing so raised some interesting problems. How do you deploy a versioned “feature” file to a Nexus repository? And how do you automate the versioning of the bundles contained within it?

Our solution to that problem is the topic for this blog series.

Part 1/3 : The Basics

Turning your “features.xml” into a feature template

(Follow the link for the full blog post)

Categories: Point2 - Technical

PyCon 2010 Atlanta Event Report

March 16, 2010

I have been back from PyCon 2010 Atlanta for a while, and I am still absorbing the huge volume of knowledge and information I gathered during the conference. It was an amazing experience to see at first hand what the Python community is doing. There was something for everyone at PyCon 2010, from beginners to advanced Python users. The tone of the conference was very friendly; it was a totally different experience from corporate-sponsored technology conferences.

During the two tutorial days prior to the conference days, I attended four tutorials:
Faster Python Programs through Optimization – The tutorial presented guidelines and strategies for optimizing Python programs. It demonstrated techniques for measuring speed (test.pystone), profiling CPU usage (cProfile), and profiling memory usage (the Guppy_PE framework), none of which I had known before. The tutorial also detailed the essential performance differences among Python’s built-in data types, which was helpful too.
Pinax Long Tutorial – Pinax is an open-source platform built on the Django web framework. In the tutorial, the Pinax core developers presented Pinax installation, creating projects, leveraging reusable Django applications, modifying templates, Pinax-specific settings, media handling, and deployment. The most impressive parts of the tutorial for me were how Pinax takes advantage of virtualenv and pip, both written by Ian Bicking, to streamline the installation process, and how Pinax leverages reusable Django applications. The tutorial also exposed lots of open-source reusable Django applications that are worth looking at in the coming days; to name a few: django-frontendadmin, django-flatblocks, django-ajax-validation, django-openid, and django-pagination.
Django in Depth – Django was one of my favourite topics during the conference. “In this tutorial, we’ll take a detailed look under the hood, covering everything from the guts of the ORM to the innards of the template system to how the admin interface really works”. James Bennett led us on a deep dive into the internals of the Django web framework and showed us the bits and pieces of Django’s ORM, forms and validation, template system, request processing, views, and admin interface, going far beyond what the Django documentation covers.
Django Deployment Workshop – Another tutorial on Django, presented by Jacob Kaplan-Moss, covered the creation of a full Django deployment environment running on a cluster of (virtual) machines. Jacob walked us through a live demo of how to set up a production-ready deployment environment in the cloud (Rackspace, Amazon EC2) by removing the single points of failure one by one.
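As a small illustration of the CPU profiling mentioned for the optimization tutorial, here is a minimal sketch using only the standard library’s cProfile and pstats modules. The function `slow_sum` and the helper `profile_call` are made up for this example and are not taken from the tutorial.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive: builds a full list first, then sums it."""
    return sum([i * i for i in range(n)])


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and print the five most expensive entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(slow_sum, 100000)
print(report)
```

The report lists call counts and per-function times, which is usually enough to decide where optimization effort should go first.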

During the following three conference days I attended lots of talks. Here are the highlights of the topics that interested me most:
NoSQL databases were a hot topic during the conference; MongoDB, Cassandra, and Neo4j attracted lots of attention. Mark Ramm and Rick Copeland presented a comparison between relational and NoSQL databases, a practical guide to deciding what to use in a project, and the story of how they quickly migrated one of their high-traffic websites from PHP to Python using TurboGears, MongoDB, and Jinja templates. It is a live example of how a NoSQL database can be used in real-world projects.

Testing in Python was another topic I followed closely at the conference. Ned Batchelder gave a talk on tests and testability; there was not much new in it for me, as we emphasize TDD in our daily software development here at Point2. Michael J Foord, creator of the famous Python Mock library, gave a talk titled “New *and* Improved: Coming changes to unittest, the standard library test framework”, which covered lots of great stuff coming to Python’s unittest; the most attractive bits are test discovery and more convenient assertion methods. By the way, you do not need to wait until you upgrade to Python 2.7 or 3.2 to take advantage of the new unittest features: they have been backported to Python 2.4+. It is great to see testing picking up speed in the Python community.
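To give a flavour of the more convenient assertion methods, here is a minimal sketch. The helper `parse_port` is a hypothetical function invented for this example; the assertion methods themselves (`assertIn`, `assertGreater`, `assertIsNone`, and `assertRaises` used as a context manager) are part of unittest from Python 2.7 onwards.

```python
import unittest


def parse_port(text):
    """Hypothetical helper: parse a TCP port number from a string."""
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return port


class ParsePortTest(unittest.TestCase):
    def test_valid_port(self):
        self.assertEqual(parse_port("8080"), 8080)
        self.assertIn(parse_port("443"), [80, 443])
        self.assertGreater(parse_port("8080"), 1024)

    def test_out_of_range(self):
        # assertRaises can be used as a context manager.
        with self.assertRaises(ValueError):
            parse_port("70000")

    def test_missing_key_is_none(self):
        self.assertIsNone({}.get("port"))


# Load and run the test case programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParsePortTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Test discovery is available from the command line as `python -m unittest discover`, which finds test modules under the current directory without a hand-written suite.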

“Using Django in Non-Standard Ways”, given by Eric Florenzano, was an interesting talk. Eric covered how to use Django with alternatives to what Django offers, and how to use the bits Django offers in other contexts. He gave examples of using the Jinja2 template engine with Django, not using django.contrib.auth in a Django application, not using the ORM in Django, and using Django’s ORM stand-alone. It is amazing to see how you can take advantage of the Django framework even when not using it in the standard way.

Overall, my experience of PyCon 2010 Atlanta was amazing. The conference was great fun and very informative, and it was great to be there.

An easier way to test logging code

March 5, 2010

This article describes a simple technique a colleague and I stumbled upon to make unit testing logging easier in Java programs.

Why is it usually hard to unit test logging?

Static state is more or less the enemy of testability. Unfortunately, almost every Java logging example ever published (regardless of logging framework) stores the logging provider as static state:

public class MyClass {
    private static Logger log = Logger.getLogger(MyClass.class);
    public void foo() {
        log.info("foo!");
    }
}

In order to verify that calling the method “foo” actually emits a log message, we have to somehow intercept the call to Logger.getLogger(), and:

  1. Register some sort of hook so that we can know when something was logged
  2. Prevent the regular log output from happening. This is especially important when exceptions are logged, as having a lot of test stack traces being emitted as part of a test run is annoying, confusing, and slow.

The technique:

Do away with the static call, and use dependency injection instead.

public class MyClass {
    public MyClass(Logger log) {
        this.log = log;
    }

    private final Logger log;

    public void foo() {
        log.info("foo!");
    }
}

Now any unit test that needs to verify logging behaviour can just inject a mock object. Several commonly used Java logging frameworks (log4j and java.util.logging, for example) expose classes rather than interfaces, but modern mocking frameworks can mock classes without too much difficulty.

Now, make the static call within your dependency injection framework. Here’s a Spring example:

<bean id="log4jLogger" class="org.apache.log4j.Logger" factory-method="getLogger">
    <constructor-arg index="0" value="com.point2.MyClass"/>
</bean>

I find that this approach actually adds flexibility to your logging options. For example, if you have a number of classes that should all contribute messages to a single log file, inject the same logging reference into each of them.

<bean id="julLogger" class="java.util.logging.Logger" factory-method="getLogger">
    <constructor-arg index="0" value="emailLogFile"/>
</bean>

<bean id="foo">
    <constructor-arg index="0" ref="julLogger"/>
</bean>

<bean id="bar">
    <constructor-arg index="0" ref="julLogger"/>
</bean>

In this case, it now makes sense to name this logger for the log file that it outputs to, and we are free to do so without having to make sure that the change is caught in a number of classes. In addition, if we ever have to refactor either of these classes (for example, to move it into a different package), we can do so without fear of changing how our logging output works.

This is a specific example of the general principle that replacing static state with inverted dependencies can make your code more flexible and easier to test.

By Sean Reilly

Categories: Point2 - Technical

Homer Simpson’s Guide to Software Development

March 2, 2010

For some time during my early development career, I would get great satisfaction from coding an intricate and sophisticated solution to a problem. Complicated problems yield complicated solutions, don’t they? There was that sense of a job well done, looking at a nested, recursive, highly optimised algorithm that would fill three pages but get the job done super fast.

Then I’d go home with the knowledge that I’d really earned my wage that day. I’d kick back, relax and watch my all-time favourite show, The Simpsons. The perfect end to a perfect day’s coding.

Over time, I’ve begun to change my outlook in many ways. One of the most significant ways as far as my software development career is concerned though, is that I have become the Homer Simpson of software development.

This is all a little cryptic, so let me explain. Over the multitude of Simpsons episodes, Homer has spoken several pearls of wisdom. The one I hold dearest is “If something is hard to do, it’s not worth doing at all”. This is my mantra when coding, as my coworkers will grumpily attest. When writing code, if I start seeing that it is getting complicated, or I try explaining it and find it hard to verbalise, then the code is wrong. I’ve come to see complexity as a sign that I’ve gone wrong.

Another Homer gem: ‘You tried your best and you failed miserably. The lesson is, never try’. I see this as another statement that if you have to work really hard to code the task you are working on, and you are stuck elbow deep in code with no idea how to get past the current hurdle, then give up! You are almost always going the wrong way.

Software that is hard to write is also hard to understand, maintain, extend etc. I have come to love that eureka moment when I spot the obvious answer to the seemingly hard problem. The simpler the solution, the happier it makes me. This means that I could expect any member of the development team to be able to look at the solution I’ve implemented, quickly understand what it is doing and why, and then be able to add features to it.

So, Homer Simpson, thanks for being a great mentor.

By: Chris Tarttelin

Scaling applications with Python

March 1, 2010

This year’s PyCon had quite a few presentations on how Python is being used to scale applications to massive volumes. Many companies are using Python for their high-traffic web sites.


Here, I’m going to look at a big Python stack that came up a number of times.


Jinja2 is a template engine that is similar in many respects to Django templates. Jinja2 does not set out to be the fastest template engine; instead it strives to be easy to use, to provide easily configurable syntax, to be easy to debug, and to provide sandboxing so that third-party templates can run in a safe environment.

Although it doesn’t set out to be the fastest, in some benchmarks it is more than 10 times faster than Django’s template engine. It’s this balance of features and speed that makes it so well suited to high-traffic sites.


Bottle is a very fast and simple WSGI web framework. It is pure Python, and fits into a single file. Bottle uses an @route function decorator to identify which URL invokes a given function, and allows you to return either a generator, for streaming straight back to the caller, or a dictionary to be used by a template defined with an @view decorator. That is pretty much all that Bottle does. It has its own simple template engine and a single-threaded server, which are both fine for development. It also supports a multitude of template engines and multi-threaded servers.
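The routing idea can be sketched with nothing but the standard library. This is not Bottle’s implementation: the names `ROUTES`, `route`, `app` and `request` are invented for this illustration, and the sketch only shows how a decorator can register handlers that a WSGI application then dispatches to.

```python
# Illustrative only: a stdlib sketch of decorator-based routing over WSGI.
ROUTES = {}  # maps URL path -> handler function


def route(path):
    """Register the decorated function as the handler for `path`."""
    def decorator(func):
        ROUTES[path] = func
        return func
    return decorator


@route("/hello")
def hello():
    return "Hello, PyCon!"


@route("/stream")
def stream():
    # A generator handler: chunks are passed on as they are produced.
    for word in ("one ", "two ", "three"):
        yield word


def app(environ, start_response):
    """Minimal WSGI application that dispatches on PATH_INFO."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    body = handler()
    if isinstance(body, str):
        body = [body]  # treat a plain string as a single chunk
    return (chunk.encode("utf-8") for chunk in body)


def request(path):
    """Call the app directly, without a server, and collect the response."""
    statuses = []
    chunks = app({"PATH_INFO": path}, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(chunks)


print(request("/hello"))
```

Because `app` is an ordinary WSGI callable, it could be handed to any WSGI server; calling it directly through `request` just keeps the example self-contained.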


At three of the presentations on scaling apps, the presenters said they used Paste’s httpserver to serve their applications. In two other presentations, the applications were running on twisted.web. Paste is written by the talented Ian Bicking, and is more than just a multi-threaded HTTP server. In the presentations I went to, though, they were just using the httpserver component.


RabbitMQ is a highly reliable messaging system based on the emerging AMQP standard. It came up many times in different talks during the conference, so much so that it would appear to be the primary choice for the Python community at the moment. The interesting point for me was that it’s written in Erlang, so the attraction to it isn’t driven by its Python street cred. RabbitMQ was used extensively to offload any processing that can be done asynchronously. In the larger-volume applications, a common pattern was to try to find anything that could possibly be done asynchronously, to allow a response to be returned as quickly as possible. There was an excellent presentation by Jinal Jhaveri, “Scaling Python webapps from zero to 50 million users”. This was the story of a game developed for Facebook that went from 0 users to 1 million in its first week. Messaging was the key architectural component that enabled them to maintain a good user experience.
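The “respond first, do the slow work later” pattern can be sketched in a single process. This is an assumption-laden stand-in: a `queue.Queue` plays the role of the message broker and a thread plays the consumer, whereas the applications described in the talks published to RabbitMQ. The names `handle_signup`, `worker` and `sent_emails` are hypothetical.

```python
import queue
import threading

# In-process stand-in for a message broker (a real deployment would
# publish to RabbitMQ and consume from a separate worker process).
tasks = queue.Queue()
sent_emails = []


def worker():
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = tasks.get()
        if job is None:
            tasks.task_done()
            break
        # The slow, non-urgent work happens here, off the request path.
        sent_emails.append("welcome mail to %s" % job["user"])
        tasks.task_done()


def handle_signup(user):
    """Request handler: enqueue the slow part and respond immediately."""
    tasks.put({"user": user})
    return "202 Accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()
responses = [handle_signup(name) for name in ("homer", "marge")]
tasks.join()     # demo only: wait until the worker has caught up
tasks.put(None)  # tell the worker to stop
thread.join()
```

The handler returns as soon as the job is queued, so response time no longer depends on how long the deferred work takes.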


The Twisted framework has been around for a long time, and is already widely used. I went to a presentation on cooperative multi-tasking, which looked at ways of substantially increasing the performance of your code. Besides network and filesystem access, your own code is a common source of blocking. Twisted offers non-blocking sockets for networking, and constructs for use in your own code to prevent blocking.

The primary target in the presentation was for loops. Calling functions on each element in a for loop is a source of blocking, because nothing else runs until the loop finishes, and Twisted has tools to improve this. You supply an iterator, and Twisted schedules it to run a step at a time alongside any other tasks. For CPU-bound work, starting multiple threads in Python is not a good solution because of the Global Interpreter Lock, but multiple processes are a good solution, and Twisted makes multi-processing easy.
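The idea of handing a scheduler an iterator instead of running a loop to completion can be shown with plain generators. This is a stdlib-only sketch of cooperative scheduling, not Twisted’s API; `process_items` and `run_cooperatively` are names made up for the example.

```python
from collections import deque


def process_items(name, items, log):
    """A long loop written as a generator: it yields after each item."""
    for item in items:
        log.append("%s:%s" % (name, item))  # one unit of work
        yield  # hand control back to the scheduler


def run_cooperatively(tasks):
    """Round-robin scheduler: advance each task one step at a time."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; do not reschedule it
        ready.append(task)


log = []
run_cooperatively([
    process_items("a", [1, 2, 3], log),
    process_items("b", [1, 2], log),
])
print(log)
```

The two loops make progress in interleaved steps, so neither one blocks the other for its whole duration.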


There were a lot of talks about NoSQL databases and document stores during the conference. There were presentations covering MongoDB, Redis, Cassandra and Neo4J (Neo4J is particularly targeted at persisting graphs). Of them all, the one that stood out for me was MongoDB, due both to its maturity and to the relatively easy mental transition from a relational database model. The lead developer of the Python driver for MongoDB, Mike Dirolf, presented in an Open Spaces session, which was excellent. MongoDB is written in C++, and stores JSON-like documents in collections. It supports a JSON query language that has a SQL-like feel, and is blisteringly fast. There are production usages of MongoDB holding more than 600 million documents. SourceForge are moving more and more of their data onto MongoDB, including page caching.

It was great to see so many presentations on using Python for big, complex, popular applications, and the wealth of tools out there to make it possible. It’s clear that Python is well and truly Production Ready. In my next blog post, I’ll write about using some of these components to replace an existing large Java REST service with something small, simple and pythonic that is functionally equivalent.

By: Chris Tarttelin