Posts Tagged ‘python’

Python command line = Linux Bash command line

August 15, 2010 2 comments

Some of our technical staff are doing a fair amount of Python coding and also are preparing for the Linux Certification (LPIC-1) exam.
One of the first topics we have been studying is the Linux bash command line, command history and command editing.

You may be surprised to learn that modern Python interpreters use the GNU readline library. In plain English: you can practice the skills you’ve gained during your Linux study sessions whenever you work with your Python interpreter, i.e.:

* Line editing
* History substitution
* Auto-completion

Cool.

This requires you to:

1) Add a line to your bash startup script ~/.bashrc (or your ~/.bash_profile):
export PYTHONSTARTUP=/home/user/.pystartup
(changing /home/user to your home directory, e.g. /home/mtarruella)

2) Create the file .pystartup (see below) in your home directory

Python will execute the contents of a file identified by the PYTHONSTARTUP
environment variable when you start an interactive interpreter.

How do you use it?

i) use the Emacs key bindings you learnt for bash
e.g. CTRL-A moves the cursor to the beginning of the line.

However, if you are more inclined to the Vim key bindings
you will need to configure readline placing the command:
set editing-mode vi
in the file ~/.inputrc

ii) Retrieve commands from your history by using CTRL-P (Previous command) and CTRL-N (Next command), or the useful ‘reversy’: CTRL-R (search command backwards).

iii) Start using command completion as if you were at the Linux bash command line, e.g. start typing imp and press <Tab>; you should then see:
>>> import
The whole word “import” should be auto-completed.

More detailed information on the topic: Python interactive

Happy Linux! and now Happy Python too!

Note that this works on other *nix systems and on Macs too.

By: Marcos Tarruella

An example .pystartup file:


# Add auto-completion and a stored history file of commands to your Python
# interactive interpreter. Requires Python 2.0+, readline. Autocomplete is
# bound to the Esc key by default (you can change it - see readline docs).
#
# Store the file in ~/.pystartup, and set an environment variable to point
# to it:  "export PYTHONSTARTUP=/home/user/.pystartup" in bash.
#
# Note that PYTHONSTARTUP does *not* expand "~", so you have to put in the
# full path to your home directory.

import atexit
import os
import readline
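import rlcompleter

# Also bind <Tab> to completion, so the <Tab> example in this post works.
# This is GNU readline syntax; libedit-based builds need a different binding.
readline.parse_and_bind("tab: complete")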

historyPath = os.path.expanduser("~/.pyhistory")

def save_history(historyPath=historyPath):
    import readline
    readline.write_history_file(historyPath)

if os.path.exists(historyPath):
    readline.read_history_file(historyPath)

atexit.register(save_history)
del os, atexit, readline, rlcompleter, save_history, historyPath

Django’s assertRedirects little gotcha

April 23, 2010 1 comment

Something we’ve been trying to pay more attention to with our newest greenfield development projects is the running time of our unit test suites.  One of the projects was running ~200 unit tests in 2 seconds.  As development continued and the number of test cases grew, it started taking 10 seconds, then over 30 seconds. Something wasn’t right.

The first challenge was to determine which tests were running slowly.  A little googling found this useful patch to the Python code base.  Since we are using Python 2.5 and virtual environments, we decided to simply monkey patch it in.  This makes verbosity level 2 print the run time for each test.  We then went one step further and made the following code change to _TextTestResult.addSuccess:

    def addSuccess(self, test):
        TestResult.addSuccess(self, test)
        if self.runTime > 0.1:
            self.stream.writeln("\nWarning: %s runs slow [%.3fs]" % (self.getDescription(test), self.runTime))
        if self.showAll:
            self.stream.writeln("[%.3fs] ok" % (self.runTime))
        elif self.dots:
            self.stream.write('.')
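
On newer Python versions a similar warning can be had without patching the standard library. The following is a minimal sketch, assuming Python 3 and the resultclass hook of unittest.TextTestRunner; TimingTextTestResult, SLOW_THRESHOLD, ExampleTests and run_example are illustrative names, not part of our project or of the patch mentioned above.

```python
import io
import time
import unittest

SLOW_THRESHOLD = 0.1  # seconds; same cut-off as the patched addSuccess


class TimingTextTestResult(unittest.TextTestResult):
    """Text result that records how long each test took (illustrative)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timings = {}  # test id -> elapsed seconds

    def startTest(self, test):
        self._started_at = time.perf_counter()
        super().startTest(test)

    def stopTest(self, test):
        elapsed = time.perf_counter() - self._started_at
        self.timings[test.id()] = elapsed
        if elapsed > SLOW_THRESHOLD:
            self.stream.writeln(
                "\nWarning: %s runs slow [%.3fs]"
                % (self.getDescription(test), elapsed)
            )
        super().stopTest(test)


class ExampleTests(unittest.TestCase):
    def test_fast(self):
        self.assertEqual(1 + 1, 2)

    def test_slow(self):
        time.sleep(0.15)  # stands in for an unmocked external service


def run_example():
    """Run the example tests and return the result plus the captured report."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTests)
    stream = io.StringIO()
    runner = unittest.TextTestRunner(
        stream=stream, verbosity=2, resultclass=TimingTextTestResult
    )
    return runner.run(suite), stream.getvalue()
```

Unlike the patch, this sketch times every outcome (failures and errors included), since the clock is read in stopTest rather than in addSuccess.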

With it now easy to tell which tests were slow, we set out to make them all fast again. As expected, in the majority of cases an external service was not being mocked correctly. Most of these were easily solved. But there were a few tests where we couldn’t find what hadn’t been mocked. Adding a few timing statements within these tests revealed the culprit: the Django framework’s assertRedirects method.

    def assertRedirects(self, response, expected_url, status_code=302,
                        target_status_code=200, host=None):
        """Asserts that a response redirected to a specific URL, and that the
        redirect URL can be loaded.

        Note that assertRedirects won't work for external links since it uses
        TestClient to do a request.
        """
        if hasattr(response, 'redirect_chain'):
            # The request was a followed redirect
            self.failUnless(len(response.redirect_chain) > 0,
                ("Response didn't redirect as expected: Response code was %d"
                " (expected %d)" % (response.status_code, status_code)))

            self.assertEqual(response.redirect_chain[0][1], status_code,
                ("Initial response didn't redirect as expected: Response code was %d"
                 " (expected %d)" % (response.redirect_chain[0][1], status_code)))

            url, status_code = response.redirect_chain[-1]

            self.assertEqual(response.status_code, target_status_code,
                ("Response didn't redirect as expected: Final Response code was %d"
                " (expected %d)" % (response.status_code, target_status_code)))

        else:
            # Not a followed redirect
            self.assertEqual(response.status_code, status_code,
                ("Response didn't redirect as expected: Response code was %d"
                 " (expected %d)" % (response.status_code, status_code)))

            url = response['Location']
            scheme, netloc, path, query, fragment = urlsplit(url)

            redirect_response = response.client.get(path, QueryDict(query))

            # Get the redirection page, using the same client that was used
            # to obtain the original response.
            self.assertEqual(redirect_response.status_code, target_status_code,
                ("Couldn't retrieve redirection page '%s': response code was %d"
                 " (expected %d)") %
                     (path, redirect_response.status_code, target_status_code))

        e_scheme, e_netloc, e_path, e_query, e_fragment = urlsplit(expected_url)
        if not (e_scheme or e_netloc):
            expected_url = urlunsplit(('http', host or 'testserver', e_path,
                e_query, e_fragment))

        self.assertEqual(url, expected_url,
            "Response redirected to '%s', expected '%s'" % (url, expected_url))

You’ll notice that if your get request uses the follow=False option, you’ll end up at line 34 of this code snippet (the response.client.get call), which will kindly check that the page you are redirecting to returns a 200. That is great, unless you don’t have the correct mocks set up for that page too. Mocking out the content for a page you aren’t actually testing also didn’t seem quite right. We didn’t care about the other page loading; it had its own test cases. We just wanted to make sure the page under test was redirecting to where we expected. The simple solution: write our own assertRedirects method.

def assertRedirectsNoFollow(self, response, expected_url):
    self.assertEqual(response._headers['location'], ('Location', settings.TESTSERVER + expected_url))
    self.assertEqual(response.status_code, 302)

Back to a 2 second unit test run time and all is right with the world again.
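
For readers who want to try the idea outside our code base, here is a hedged variant of the same check. It is only a sketch: RedirectAssertionsMixin and FakeResponse are hypothetical names, FakeResponse merely imitates the two things the assertion touches (status_code and the Location header), and comparing only path and query is an assumption made so the check does not depend on a TESTSERVER setting.

```python
import unittest
from urllib.parse import urlsplit


class RedirectAssertionsMixin:
    """Mix into a TestCase to check a redirect without loading its target."""

    def assertRedirectsNoFollow(self, response, expected_url, status_code=302):
        self.assertEqual(response.status_code, status_code)
        actual = urlsplit(response["Location"])
        expected = urlsplit(expected_url)
        # Compare only path and query, so the check passes whether the
        # Location header is absolute or relative.
        self.assertEqual(
            (actual.path, actual.query), (expected.path, expected.query)
        )


class FakeResponse(dict):
    """Hypothetical stand-in for a test-client response object."""

    def __init__(self, status_code, location):
        super().__init__(Location=location)
        self.status_code = status_code


class RedirectExample(RedirectAssertionsMixin, unittest.TestCase):
    def runTest(self):
        response = FakeResponse(302, "http://testserver/login/?next=/home/")
        self.assertRedirectsNoFollow(response, "/login/?next=/home/")
```

The target page is never requested, so nothing behind it needs to be mocked.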

Dustin Bartlett

PyCon 2010 Atlanta Event Report

March 16, 2010 Comments off

I have been back from PyCon 2010 Atlanta for a while, and I am still absorbing the huge volume of knowledge and information I gathered during the conference. It was an amazing experience to see first-hand what the Python community is doing. There was something for everyone at PyCon 2010, from beginners to advanced Python users. The tone of the conference was very friendly; it was a totally different experience from corporate-sponsored technology conferences.

During the two tutorial days prior to the conference days, I attended four tutorials:
Faster Python Programs through Optimization – The tutorial presented guidelines and strategies for optimizing Python programs. It demonstrated techniques for measuring speed (test.pystone), profiling CPU usage (cProfile), and profiling memory usage (the Guppy_PE framework), none of which I had known before. The tutorial also detailed the essential performance differences among Python’s built-in data types, which was helpful for me too.
Pinax Long Tutorial – Pinax is an open-source platform built on the Django web framework. In the tutorial, the Pinax core developers covered Pinax installation, creating projects, leveraging reusable Django applications, modifying templates, Pinax-specific settings, media handling, and deployment. The most impressive parts for me were how Pinax takes advantage of virtualenv and pip, both provided by Ian Bicking, to streamline the installation process, and how Pinax leverages reusable Django applications. The tutorial also introduced lots of open-source reusable Django applications that are worth looking at in the coming days, to name a few: django-frontendadmin, django-flatblocks, django-ajax-validation, django-openid, and django-pagination.
Django in Depth – Django was one of my favourite topics during the conference. “In this tutorial, we’ll take a detailed look under the hood, covering everything from the guts of the ORM to the innards of the template system to how the admin interface really works”. James Bennett led us on a deep dive into the internals of the Django web framework and showed us the bits and pieces of Django’s ORM, forms and validation, template system, request processing, views, and admin interface, going far beyond what the Django documentation covers.
Django Deployment Workshop – Another Django tutorial, presented by Jacob Kaplan-Moss, covered the creation of a full Django deployment environment running on a cluster of (virtual) machines. Jacob walked us through a live demo of how to set up a production-ready deployment environment in the cloud (Rackspace, Amazon EC2), removing the single points of failure one by one.

During the following three conference days I attended lots of talks. Here are the highlights of the topics that attracted me the most:
NoSQL databases were a hot topic during the conference; MongoDB, Cassandra, and Neo4j attracted lots of attention. Mark Ramm and Rick Copeland from SourceForge.net presented a comparison between relational and NoSQL databases, a practical guide to deciding what to use in a project, and how they quickly migrated one of their high-traffic websites from PHP to Python using TurboGears, MongoDB, and Jinja templates. It was a live example of how NoSQL databases can be used in real-world projects.

Testing in Python was another topic I followed closely at the conference. Ned Batchelder, of coverage.py fame, gave a talk on tests and testability; there was not much new for me, as we emphasize TDD in our daily software development here at Point2. Michael J Foord, creator of the famous Python Mock library, gave a talk titled New *and* Improved: Coming changes to unittest, the standard library test framework, which covered lots of great stuff coming to Python’s unittest; the most attractive bits are test discovery and more convenient assertion methods. By the way, you do not need to wait until you upgrade to Python 2.7 or 3.2 to take advantage of the new unittest features: they have been backported to Python 2.4+. It is great to see testing picking up speed in the Python community.

Using Django in Non-Standard Ways, given by Eric Florenzano, was an interesting talk. Eric covered how to use Django with alternatives to what Django offers and how to use the bits Django offers in other contexts. He gave examples of using the Jinja2 template engine with Django, not using django.contrib.auth in a Django application, not using the ORM in Django, and using Django’s ORM stand-alone. It is amazing to see how you can take advantage of the Django framework, even in non-standard ways.

Overall, my experience of PyCon 2010 Atlanta was amazing.  The conference was great fun and informative.  It was great to be there – PyCon 2010 Atlanta.

Homer Simpson’s Guide to Software Development

March 2, 2010 3 comments

For some time during my early development career, I would get great satisfaction from coding an intricate and sophisticated solution to a problem. Complicated problems yield complicated solutions, don’t they? That sense of a job well done, looking at a nested recursive, highly optimised algorithm that would fill three pages but get the job done super fast.

Then I’d go home and relax, with the knowledge that I’d really earned my wage that day. I’d kick back and watch my all-time favourite show, The Simpsons. The perfect end to a perfect day’s coding.

Over time, I’ve begun to change my outlook in many ways. One of the most significant ways as far as my software development career is concerned though, is that I have become the Homer Simpson of software development.

This is all a little cryptic, so let me explain. Over the multitude of Simpsons episodes, Homer has spoken several pearls of wisdom. The one I hold dearest is “If something is hard to do, it’s not worth doing at all”. This is my mantra when coding, as my coworkers will grumpily attest. When writing code, if I start seeing that it is getting complicated, or I try explaining it and find it hard to verbalise, then the code is wrong. I’ve come to see complexity as a sign that I’ve gone wrong.

Another Homer gem, ‘You tried your best and you failed miserably. The lesson is, never try’. I see this as another statement that if you have to work really hard to code the task you are working on, and you are stuck elbow deep in code with no idea how to get past the current hurdle, then give up! You are almost always going the wrong way.

Software that is hard to write is also hard to understand, maintain, extend etc. I have come to love that eureka moment when I spot the obvious answer to the seemingly hard problem. The simpler the solution, the happier it makes me. This means that I could expect any member of the development team to be able to look at the solution I’ve implemented, quickly understand what it is doing and why, and then be able to add features to it.

So, Homer Simpson, thanks for being a great mentor.

By: Chris Tarttelin

Scaling applications with Python

March 1, 2010 1 comment

This year’s PyCon had quite a few presentations on how Python is being used to scale applications to massive volumes. There are many companies using Python for their high-traffic web sites. A short list:

Digg
YouTube
SourceForge

Here, I’m going to look at a big Python stack that came up a number of times.

Jinja2

Jinja2 is a template engine which is similar in many respects to Django templates. Jinja2 does not set out to be the fastest template engine; instead it strives to be easy to use, provide easily configurable syntax, be easy to debug, and provide sandboxing so that third-party templates can be run in a safe environment.

Although it doesn’t set out to be the fastest, in some benchmarks it is more than 10 times faster than Django’s template engine. It’s this balance of features and speed that made it so suited to the high traffic sites.

Bottle

Bottle is a very fast and simple WSGI web framework. It is pure Python, and fits into a single file. Bottle uses an @route function decorator to identify which URL invokes a given function, and allows you to return either a generator for streaming straight back to the caller, or a dictionary to be used by a template defined with an @view decorator. That is pretty much all that Bottle does. It has its own simple template engine and a single-threaded server, which are both fine for development. It also has support for a multitude of template engines and multi-threaded servers.
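
To make the decorator idea concrete, here is a minimal pure-Python sketch of how a route decorator can map URLs to functions. This is not Bottle’s implementation or API; ROUTES, route and dispatch are illustrative names, and a real framework also handles HTTP methods, URL parameters and WSGI plumbing.

```python
ROUTES = {}  # URL path -> handler function


def route(path):
    """Register the decorated function as the handler for `path`."""
    def decorator(func):
        ROUTES[path] = func
        return func  # the function itself is left unchanged
    return decorator


@route("/hello")
def hello():
    # A dictionary like this is what a template would be rendered with.
    return {"greeting": "Hello", "name": "World"}


def dispatch(path):
    """Look up the handler registered for `path` and call it."""
    handler = ROUTES.get(path)
    if handler is None:
        return "404 Not Found"
    return handler()
```

Calling dispatch("/hello") runs the registered function, while an unregistered path falls through to the 404 string.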

Paste

At 3 of the presentations on scaling apps, the presenters said they use paste httpserver to serve their applications. In 2 other presentations, the applications were running on twisted.web. Paste is written by the talented Ian Bicking, and is more than just a multi threaded http server. In the presentations I went to though, they were just using the httpserver component.

RabbitMQ

This is a highly reliable messaging system based on the emerging AMQP standard. RabbitMQ came up many times in different talks during the conference, so much so that it would appear to be the primary choice for the Python community at the moment. The interesting point for me was that it’s written in Erlang, so the attraction to it isn’t driven by its Python street cred. RabbitMQ was used extensively to offload any processing that can be done asynchronously. In the larger-volume applications, a common pattern was to try to find anything that could possibly be done asynchronously, to allow a response to be returned as quickly as possible. There was an excellent presentation by Jinal Jhaveri on Scaling Python webapps from zero to 50 million users. This was the story of a game developed for Facebook that went from 0 users to 1 million in its first week. Messaging was the key architectural component that enabled them to maintain a good user experience.
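
The offloading pattern can be sketched with nothing but the standard library. In this hedged example an in-process queue.Queue and a worker thread stand in for the message broker and its consumer; RabbitMQ itself is not used, and handle_request, worker, jobs and processed are hypothetical names.

```python
import queue
import threading

jobs = queue.Queue()   # stands in for the message broker
processed = []         # what the background worker has done so far


def worker():
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        processed.append("sent email to %s" % job["user"])
        jobs.task_done()


def handle_request(user):
    """Queue the slow work and return to the caller straight away."""
    jobs.put({"user": user})
    return "200 OK"


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

responses = [handle_request(user) for user in ("alice", "bob")]

jobs.join()            # wait until the queued work has been handled
jobs.put(None)         # tell the worker to stop
worker_thread.join()
```

The request handler only pays for a put on the queue; the slow work happens elsewhere, which is the property the presenters were after.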

Twisted

The Twisted framework has been around for a long time, and is already widely used. I went to a presentation on cooperative multi-tasking, which looked at ways of substantially increasing the performance of your code. Besides networking and filesystem access, your own code is a common source of blocking. Twisted offers support for non-blocking sockets for networking, and constructs for use in your code to prevent blocking.

The primary target in the presentation was for loops. Calling functions on elements in a for loop running in a single process is a source of blocking, and Twisted has tools to improve this. You supply an iterator, and Twisted then schedules it to execute along with any other tasks, in multiple processes. In Python, starting multiple threads is not a good solution because of the way Python uses the Global Interpreter Lock, but multiple processes are a good solution, and Twisted makes multiprocessing easy.
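
The “supply an iterator” idea can be illustrated without Twisted at all. The sketch below is a toy round-robin scheduler, assuming plain Python generators; it runs in a single process, is not Twisted’s API, and says nothing about how Twisted itself distributes work. process_items and run_cooperatively are illustrative names.

```python
from collections import deque


def process_items(name, items, log):
    """Handle one item per step, then hand control back to the scheduler."""
    for item in items:
        log.append("%s:%s" % (name, item))
        yield


def run_cooperatively(tasks):
    """Advance each task one step at a time until all are exhausted."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue          # task finished; do not reschedule it
        ready.append(task)    # task has more work; put it at the back


log = []
run_cooperatively([
    process_items("a", [1, 2, 3], log),
    process_items("b", [10, 20], log),
])
# log is now ["a:1", "b:10", "a:2", "b:20", "a:3"]
```

Because each loop body yields after one element, the long loop over "a" no longer blocks the shorter loop over "b".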

MongoDB

There were a lot of talks about NoSQL databases, or document stores, during the conference. There were presentations covering MongoDB, Redis, Cassandra and Neo4J (Neo4J is particularly targeted at persisting graphs). Of them all, the one that stood out for me was MongoDB, both for its maturity and for the relatively easy mental transition from a relational database model. The lead developer of the Python driver for MongoDB, Mike Dirolf, presented in an Open Spaces session, which was excellent. MongoDB is written in C++, and stores JSON documents in collections. It supports a JSON query language that has a SQL-like feel, and is blisteringly fast. There are production deployments of MongoDB holding more than 600 million documents. SourceForge are moving more and more of their data onto MongoDB, including page caching.

It was great to see so many presentations on using Python for big, complex, popular applications, and the wealth of tools out there to make it possible. It’s clear to see that Python is well and truly Production Ready. In my next blog, I’ll write about using some of these components to replace an existing large Java REST service with something small, simple and pythonic that is functionally equivalent.

By: Chris Tarttelin

How to Mask Command Line Output from the Python run() Method

June 10, 2009 Comments off

Recently, we were tasked with creating automated deployment for our Python Django project. For our purposes, this involved creating Python modules for programmatically logging onto servers and carrying out all the necessary deployment tasks. We decided to use Fabric to make this work. We encountered difficulty, however, when using the run() method to execute commands in remote environments.

The problem is, with every command that Fabric executes on the remote machine, the command itself is always echoed to standard-out when the run() method executes, displaying (in plain text!) any password arguments you may be passing to the command being executed.

We worked around this by “monkey-patching” stdout itself with a wrapper class that checks specifically for the password provided, and replaces any output to standard out matching the password with something else (like “****”).

The wrapper class looks something like this:

class StdoutWrapper:


  def __init__(self, stdout, mask):
    self.stdout = stdout
    self.mask = mask


  def __getattr__(self, attr_name):
    return getattr(self.stdout, attr_name)


  def write(self, text):
    if text.find(self.mask) > -1:
      self.stdout.write(text.replace(self.mask, '****'))
    else:
      self.stdout.write(text)

Then, when you want to execute the command whose output you wish to mask, you simply use the StdoutWrapper class like so:

import sys
from getpass import getpass


def some_fabric_method():
  config.username = prompt('Enter Username:')
  config.passwd = getpass('Enter Password:')
  config.app_path = '/path/to/your/app'
  config.svn_url = 'http://some.svn.checkout/'
  # Wrap stdout so the password is masked in anything Fabric echoes
  out = StdoutWrapper(sys.stdout, config.passwd)
  sys.stdout = out
  try:
    run("svn revert -R $(app_path)")
    run("svn switch --username $(username) --password $(passwd) $(svn_url) $(app_path)")
  finally:
    # Always restore the real stdout, even if a command fails
    sys.stdout = sys.__stdout__

Obviously, this is a less-than-ideal solution; it accomplishes what we want, but having to resort to temporarily monkey-patching standard out itself seems like overkill. Has anyone else out there found a more preferable solution for suppressing plain-text password output while using Fabric for Python?
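
In the meantime, the patch-and-restore steps can at least be packaged so they are harder to get wrong. The following is a hedged, Fabric-free sketch for Python 3: MaskingStream, masked_stdout and demo are hypothetical names, and print stands in for whatever echoes the command.

```python
import contextlib
import io
import sys


class MaskingStream:
    """Wrap a stream and replace a secret in everything written to it."""

    def __init__(self, stream, secret):
        self.stream = stream
        self.secret = secret

    def __getattr__(self, name):
        # Delegate everything except write() to the wrapped stream.
        return getattr(self.stream, name)

    def write(self, text):
        return self.stream.write(text.replace(self.secret, "****"))


@contextlib.contextmanager
def masked_stdout(secret):
    """Temporarily mask `secret` in sys.stdout, restoring it afterwards."""
    original = sys.stdout
    sys.stdout = MaskingStream(original, secret)
    try:
        yield
    finally:
        sys.stdout = original  # restore whatever was there before


def demo():
    """Capture what would have been printed, to show the masking."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        with masked_stdout("s3cret"):
            print("svn switch --username bob --password s3cret")
    return captured.getvalue()
```

Restoring the stream that was saved on entry, rather than sys.__stdout__, keeps this safe when stdout has already been redirected. One limitation remains: a secret that is split across two write calls would not be masked.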

By: Brett McClelland

High Ceremony doesn’t have to mean High Cycle Time

April 27, 2009 Comments off

Oftentimes languages such as Java are called “high ceremony” languages compared to languages like Ruby or Python. This refers to the fact that there’s generally a bit more plumbing involved in firing up a Java application – particularly a web application – than there is with the scripting languages.

Of course, Java is compiled (to byte-code at least), so it’s not quite a 1 to 1 comparison with a more interpreted language such as Ruby, but still, even in a “high ceremony” language it’s important not to get too high a “cycle time” for developers, IMO.

By “cycle time” I mean the time between making a change and seeing it working – either in a test, or, ideally, in a running application. Most modern IDEs make the cycle time for tests pretty darn low (and great tools like Infinitest can take all the manual work out of it, no less), but to see a running application and be able to exercise your changes deployed in a container is a bit more of a grind.

That’s where a tool like Jetty can come in handy. Jetty is a lightweight web app container that can be easily added to your development cycle in place of a heavier-weight solution to allow you a faster cycle time, and, often, greater productivity and interactivity.

Especially in combination with its Maven integration, Jetty can get your app deployed far faster than other solutions. For most webapps, it’s just a matter of saying:

mvn jetty:run

And you’ve got a container up and running with your app in it within a few seconds.

Jetty can even do a certain amount of “hot update”: modify a JSP (or even some code – although there are limits) and the running webapp is updated, and you’re able to test, edit… cycle away without the painful wait for a deployment any more often than necessary.

You can pass required system properties to your app via Maven’s -D mechanism, and they’ll be available to your app:

mvn -Dsome.property=someValue jetty:run

And even control the port your application binds to on the fly (or via the handy jetty.xml file if you want to set it more permanently).

Jetty and Maven also make scripting easy. For example, if you need to run a test utility against your running webapp to ping a series of REST calls, you can:

mvn clean package # Build the webapp
mvn jetty:run & # start jetty, spawning it in the background
java -jar mytestutility.jar # Run my test jar, which pings the URLs for all my rest services, maybe does performance checks, etc
mvn jetty:stop # Stop the jetty instance we fired up in the background

Lightweight containers such as Jetty are just one way to help crank down the “cycle time” for developers, of course. Some other possibilities I’ll leave for a later entry.

By: Mike Nash