Chaining a sequence of generators

I often gravitate towards solutions using a series of chained generators, in the style of David Beazley's 'Generator Tricks for Systems Programmers.'

This results in the outer level of my code calling one generator after another, terminating in something that consumes the rows, pulling data one row at a time through each of the generators:

inputRows = read()
parsedRows = parse(inputRows)
processedRows = process(parsedRows)
outputRows = format_(processedRows)
output(outputRows)

where each called function except the last is actually a generator, e.g.:

def parse(rows):
    for row in rows:
        yield int(row)

This is great. But my itch is that the top level code above is a bit wordy, given that what it does is so simple. The reader has to check each temporary variable quite carefully to be sure it's doing the right thing.

Fowler's 'Refactoring' describes circumstances when it's good to remove intermediate variables, which results in:

output( format_( process( parse( read() ) ) ) )

This is certainly less wordy, and expresses what's happening very directly, but it annoys some of my colleagues that the called functions are listed in reverse order from what one might intuitively expect.

I've had this idea in my head to create a decorator for generators which allows one to chain them in an intuitive order, possibly using some unconventional notation such as:

read() | parse | process | format_ | output

where 'parse', et al, are now decorated with '@chainable' or somesuch, which returns an instance of a class that stores the wrapped generator, and overrides __or__ to do its magic. Maybe 'read' doesn't need to be invoked manually there at the start of the chain. I haven't really thought this through.
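For what it's worth, here is a minimal sketch of how such a decorator might look. It is only one possible reading of the idea, not something I've used in anger. The name 'chainable' comes from the paragraph above; the 'double' stage is a made-up example. Note that this sketch overrides __ror__ rather than __or__, which is my own assumption: it lets the left-hand side be any plain iterable, so only the stages need decorating.

```python
class chainable(object):
    """Wrap a generator function so it can sit on the right of '|'."""

    def __init__(self, func):
        self.func = func

    def __call__(self, rows):
        # Still usable the ordinary way: parse(rows)
        return self.func(rows)

    def __ror__(self, rows):
        # 'rows | stage' ends up here, because plain iterables such as
        # lists and generators don't know how to '|' with a chainable.
        return self.func(rows)


@chainable
def parse(rows):
    for row in rows:
        yield int(row)


@chainable
def double(rows):  # hypothetical stage, for illustration only
    for row in rows:
        yield row * 2


result = list(["1", "2", "3"] | parse | double)
print(result)  # [2, 4, 6]
```

Each '|' evaluates left to right, so the output of one stage becomes the input of the next, and nothing is pulled through until the final consumer iterates.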

Luckily, before embarking on that, I realised today I've been over-complicating the whole thing. There's no need for decorators, nor for the cute '|' syntax. I just need a plain old function:

def link(source, *transforms):
    args = source
    for transform in transforms:
        args = transform(args)
    return args

Update: This code has been improved thanks to suggestions in the comments from Daniel Pope (eliminate the 'first' variable) and Xtian (take an iterable rather than a callable for the source.)

This assumes the first item passed to link is an iterable, and each subsequent item is a generator function (or other callable) that takes the result of the item before.

If the final item in the sequence passed to 'link' is a generator, then this returns a generator which is the composite of all the ones passed in:

for item in link(read(), parse, process, format_):
    print(item)

Or if the final item passed to 'link' is a regular function, which consumes the preceding generators, then calling 'link' will invoke the generators, i.e. the following is the same as the above 'for' loop:

link(read(), parse, process, format_, output)

There are some rough edges, such as determining what to do if different generators require other args. Presumably 'functools.partial' could help here. But in general, 'link' only needs to be written once, and I'm liking it.
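As a rough sketch of the 'partial' idea: bind the extra arguments up front, so that what gets passed to 'link' is still a callable taking just the rows. The 'scale' generator and its 'factor' argument are hypothetical, invented for this example; 'link' and 'parse' are repeated from above so the snippet runs on its own.

```python
from functools import partial


def link(source, *transforms):
    args = source
    for transform in transforms:
        args = transform(args)
    return args


def parse(rows):
    for row in rows:
        yield int(row)


def scale(rows, factor):  # hypothetical generator needing an extra arg
    for row in rows:
        yield row * factor


# partial(scale, factor=10) behaves like a one-argument generator function
scaled = list(link(["1", "2", "3"], parse, partial(scale, factor=10)))
print(scaled)  # [10, 20, 30]
```

This keeps 'link' itself unchanged, at the cost of a little noise at the call site.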

Revolution: An Instruction Manual

When you have a government that operates without limits, who proclaims the right to arrest, torture and kill anyone, anywhere, with no warrants, no trial, no due process...

When you have a government that militarizes the police and grants the armed forces the power to operate with impunity within your own borders and beyond. A government that views you, the people, as the enemy, and treats you as such...

When you have a government that lies to you, takes you into wars of aggression, toppling country after country, killing hundreds of thousands of innocent civilians, and sending your sons and daughters and your fathers and your brothers home in flag draped coffins, or disfigured and broken in mind and body, their lives destroyed in wars that serve only to line the pockets of an unelected cartel of bankers and corporations...

When you have all of this right in front of you, so blatant, so clear, you shouldn't have to be convinced that both parties in this political puppet show are owned and operated by the same interests. You shouldn't have to be convinced that these overrated corporate popularity contests that some call elections are distractions that will achieve nothing. With this understood, you shouldn't have to be convinced that the system you are living under must be brought to a halt.

pip install : Lightspeed and Bulletproof

I saw a post about speeding up the Python packaging command "pip install", by specifying more responsive mirrors for querying and downloading packages. For my situation, a better tactic is this.

Step one: Download all your project's dependencies into a local 'packages' dir, but don't install them yet:

mkdir packages
pip install --download=packages -r requirements.txt

Step two: Install from the 'packages' dir:

pip install --no-index --find-links=packages -r requirements.txt

(The above syntax works on pip 1.3, released yesterday. Docs for older versions of pip claim to support this, but in practice, for pip 1.2, I've had to use "--find-links=file://$PWD/packages")

Step two works even if PyPI is unreachable. It works even if some of your dependencies are self-hosted by the authors, and that website is unreachable. It works even if the version you have pinned of one of your dependencies has been deleted by the author (some packages do this routinely after security updates.) It works even if you have no network connection at all. In short, it makes creation of your virtualenv bulletproof.

As a nice side effect, it runs really fast, because it isn't downloading the packages across the internet, nor is it attempting to scan a remote index to check for matching or newer versions of each package. This is much quicker than just using a pip download cache, especially for large projects with many dependencies which only change occasionally.

At Rangespan, we check the 'packages' directory into source control, so that once you've checked out a project's repo, you have everything you need to deploy locally and run, even if you have no network. You might choose to treat 'packages' as ephemeral.

As @jezdez, a pip maintainer, pointed out to me recently, this usage pattern has now been explicitly called out in the documentation, which was substantially reorganised and improved with the recent 1.3 release.

Hexagonal Django

The last few weeks I've been thinking about the architectural pattern known as Clean, Onion, Hexagonal, or Ports'n'Adaptors. I'm curious if many people are applying it in the Django world.

The premise is for your core application entity classes and business rules to be plain old objects, with no dependencies. In particular, they are not dependent on the interfaces between your application and external systems, such as your persistence mechanism, or your web framework. Instead, external interface components depend upon your core business objects. This essentially moves the database from the 'bottom' layer of the old traditional 'three layer architecture', to form a part of the topmost layer - a sibling with the 'UI.'

For inbound messages (e.g. handling a web request) this is straightforward: Django calls your view code, which calls your business layer. The only discipline needed is to keep your business layer separate from your Django code, so it is stand-alone and unit-testable. For outbound messages, such as then rendering the web page in response, it's slightly more complicated: your business logic must pass the result (typically a pure data structure) back to your web-aware code, but without your business logic depending on the web-aware component. This requires an inversion of control.

That way, all your business logic can easily be tested in unit tests, with no mocking required. You still need some end-to-end tests to verify integration, but you shouldn't need to involve your UI or database in testing every detail of your business logic.

Also, you can easily switch out your external system interfaces, such as persistence, to use another RDBMS, another ORM, a NoSQL store, or an in-memory version for testing. Since the core of your application doesn't have any dependency on these components, it is oblivious to the change. The business logic, because it doesn't depend on Django, is no longer riddled with Django's convenient ORM database access.
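To make that concrete, here is a small sketch of the shape I have in mind. Every name in it ('place_order', 'InMemoryOrderStore', the order fields) is invented for illustration, and it deliberately imports nothing from Django. The core function receives its persistence object as an argument, which is the inversion of control mentioned above.

```python
class InMemoryOrderStore(object):
    """Hypothetical persistence adapter, handy for unit tests.

    A class backed by the Django ORM, exposing the same two methods,
    could be passed in instead without the core code noticing.
    """

    def __init__(self):
        self._totals = {}

    def save(self, order_id, total):
        self._totals[order_id] = total

    def get(self, order_id):
        return self._totals[order_id]


def place_order(store, order_id, prices):
    """Core business rule: plain Python, no web or database imports."""
    if not prices:
        raise ValueError("an order needs at least one item")
    total = sum(prices)
    store.save(order_id, total)
    # Hand back a pure data structure; the caller decides how to show it.
    return {"order_id": order_id, "total": total}


store = InMemoryOrderStore()
receipt = place_order(store, "A1", [5, 7])
print(receipt["total"])  # 12
```

A Django view would then be little more than a call to 'place_order' with a database-backed store, followed by rendering the returned dict.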

Same thing goes for switching out your web framework, or calling the same logic from web UI or web API calls. And again, for switching out your UI: add a command line application, or a console UI. The core application logic is unaffected, and your new interface components contain only the code that is specific to that interface's concerns.

Another side effect is that your web framework, if you're using one, becomes a peripheral detail which depends upon your core application, rather than the other way round. Your Django project would become a subdirectory of your project, rather than dominating your project directory structure. Since the business logic formerly contained within it is now elsewhere (in your core business objects) the Django project is now very thin. Views, for example, are delegations to single business-layer functions. The Django project now contains just the web-oriented aspects of your project, as it should.

These ideas all seem like relatively straightforward software engineering, and I feel a bit foolish for not having been aware of them all these years. I console myself that I'm not alone.

Uncle Bob's Ruby Midwest keynote "Architecture - The Lost Years" attributes one source of this idea to Ivar Jacobson's 1994 book Object Oriented Software Engineering : A Use Case Driven Approach (2nd-hand hardbacks cheap on Amazon.)

I see a few people applying these ideas to Rails, but are many people out there doing this in Django? I plan to refactor a small vertical slice of our monster Django app into this style, to try and prove the idea for myself.

Encrypted zip files on OSX

My passwords and other miscellany are in a plain text file within an encrypted zip. Since starting to use OSX I've been looking for a way to access my passwords such that:

  • I get prompted for the decryption password.
  • The file gets unzipped, but not in the same directory, because that's synced to Dropbox, so would send my plaintext passwords to them every time I accessed them. Maybe to /tmp?
  • The plaintext file within the zip is opened in $EDITOR.
  • Wait for me to close $EDITOR, then remove my plaintext passwords from the filesystem.
  • Before deleting the passwords, check if I've updated them. If so, put the new updated version back into the original zip file.
  • Don't forget to keep the updated zip file encrypted, using the same password as before, without prompting for it again.

I failed to find an existing app which would do all this (although I had no trouble on Linux or even on Windows.) Hence, resorting to good old Bash:

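#!/bin/bash

ZIPDIR="$HOME/docs/org"

# Prompt for the password without echoing it to the terminal
read -r -s -p "Password:" key
echo

cd "$ZIPDIR" || exit 1
# Extract the plaintext into $TMPDIR, outside the Dropbox-synced directory
unzip -P "$key" passwords.zip passwords.txt -d "$TMPDIR" || exit 1

cd "$TMPDIR" || exit 1
# Marker file, used to detect whether the editor modified passwords.txt
touch passwords.datestamp
$EDITOR passwords.txt
if [[ passwords.txt -nt passwords.datestamp ]] ; then
    # Put the updated file back into the original zip, same password
    zip -P "$key" -r "$ZIPDIR/passwords.zip" passwords.txt
fi

rm passwords.txt
rm passwords.datestamp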

I don't expect this to be watertight, but it seems good enough for today. I'm happy to hear suggestions.

Compiling MacVim with Python 2.7

I love the brilliant Vim plugin pyflakes-vim, which highlights errors & warnings, and since I got a MacBook for work, I've been using MacVim a lot.

This combination has a problem: MacVim uses the OSX system default Python 2.6, so pyflakes is unable to handle Python 2.7 syntax, such as set literals. These are marked as syntax errors, which prevents the rest of the file from being parsed.

The solution is to compile your own MacVim, using Python 2.7 instead of the system Python. The following commands got MacVim compiled for me:

#!/bin/bash
git clone git://github.com/b4winckler/macvim.git
cd macvim/src
export LDFLAGS=-L/usr/lib
./configure \
    --with-features=huge \
    --enable-rubyinterp \
    --enable-perlinterp \
    --enable-cscope \
    --enable-pythoninterp \
    --with-python-config-dir=/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/config
make
open MacVim/build/Release
echo Drag MacVim.app to your Applications directory

Without the LDFLAGS setting, I was missing some symbols at link time. The --with-python-config-dir entry came from typing 'which python' to find where my Python 2.7 install lives, and modifying that result to find its 'config' directory (whatever that is) near to the binary.

As indicated, install by dragging the resulting macvim/src/MacVim/build/Release/MacVim.app into your Applications directory.

Open up MacVim, and check out the built-in Python version:

:python import sys; print sys.version
2.7.1 (r271:86882M, Nov 30 2010, 10:35:34)

And files with set literals are now correctly parsed for errors.

Update: This only works if Python 2.7 is your default 'python' executable. Otherwise, or if you get "ImportError: No module named site", see Richard's comments below.

Python 2.7 regular expression cheatsheet

Couldn't find one of these, so I whipped one up.

Bit of restructured text:

https://github.com/tartley/python-regex-cheatsheet/blob/master/cheatsheet.rst

Install some Python packages:

https://github.com/tartley/python-regex-cheatsheet/blob/master/requirements.txt

Invoke rst2pdf:

https://github.com/tartley/python-regex-cheatsheet/blob/master/Makefile

Get a nice PDF out:

Python 2.7 regular expression cheatsheet (click this link or the image for the most up-to-date PDF from github.)

Django testing 201 : Acceptance Tests vs Unit Tests

I'm finding that our Django project's tests fall into an uncomfortable middle-ground, halfway between end-to-end acceptance tests and proper unit tests. As such they don't exhibit the best qualities of either. I'd like to fix this.

We're testing our Django application in what I believe is the canonical way, as described by the excellent documentation. We have a half-dozen Django applications, with a mixture of unittest.TestCase and django.test.TestCase subclasses in each application's tests.py module. They generally use fixtures or the Django ORM to set up data for the test, then invoke the function-under-test, and then make assertions about return values or side-effects, often using the ORM again to assert about the new state of the database.

Not an Acceptance Test

Such a test doesn't provide the primary benefit of an acceptance test, namely proof that the application actually works, because it isn't quite end-to-end enough. Instead of calling methods-under-test, we should be using the Django testing client to make HTTP requests to our web services, and maybe incorporating Selenium tests to drive our web UI. This change is a lot of work, but at least the path forward seems clear.

However, an additional problem is that acceptance tests ought to be associated with features that are visible to an end user. A single user story might involve calls to several views, potentially spread across different Django apps. Because of this, I don't think it's appropriate for an acceptance test to live within a single Django app's directory.

Not a Unit Test

On the other hand, our existing tests are also not proper unit tests. They hit the (test) database and the filesystem, and they currently don't do any mocking out of expensive or complicated function calls. As a result, they are slow to run, and will only get slower as we ramp up our feature set and our test coverage. This is a cardinal sin for unit tests, and it discourages developers from running the tests frequently enough. In addition, tests like this often require extensive setup of test data, and are therefore hard to write, so it's very labour-intensive to provide adequate test coverage.

My Solution

1) I've created a top-level acceptancetests directory. Most of our current tests will be moved into this directory, because they are closer to acceptance tests than unit tests, and will gradually be modified to be more end-to-end.

These acceptance tests need to be run by the Django testrunner, since they rely on lots of things that it does, such as creating the test database and rolling back after each test method. However, the Django testrunner won't find these tests unless I make 'acceptancetests' a new Django application, and import all acceptance test classes into its tests.py. I'm considering doing this, but for the moment I have another solution, which I'll describe in a moment.

We also need to be able to create unit tests for all of our code, regardless of whether that code is within a Django model, or somewhere else in a Django app, or in another top-level directory that isn't a Django app. Such unit tests should live in a 'tests' package right next to the code they test. I'm puzzled as to why Django's testrunner doesn't look for unit tests throughout the project and just run them all, along with the Django-specific tests.

2) My solution to this is to augment the Django test runner, by inheriting from it. My test runner, instead of just looking for tests in each app's models.py and tests.py, looks for subclasses of unittest.TestCase in every module throughout the whole project. Setting Django's settings.TEST_RUNNER causes this custom test runner to be used by 'manage.py test'. Thanks to the Django contributors for this flexibility!
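The discovery part of that can be sketched with nothing but the standard library. This is an illustration of the idea rather than the code of my actual runner, and the function names 'find_test_cases' and 'build_suite' are made up for the example. The wiring into a subclass of Django's runner is left out.

```python
import inspect
import unittest


def find_test_cases(module):
    """Return the unittest.TestCase subclasses defined in 'module'."""
    return [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, unittest.TestCase)
        # Skip classes that were merely imported into this module
        and obj.__module__ == module.__name__
    ]


def build_suite(modules):
    """Collect every test from every TestCase found in 'modules'."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for module in modules:
        for case in find_test_cases(module):
            suite.addTests(loader.loadTestsFromTestCase(case))
    return suite
```

A real runner also has to walk the project to import every module first, which is where most of the fiddly details live.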

So the new test runner finds and runs all the tests which the default Django runner runs, and it also finds our unit tests from all over the project, and it also includes our new top-level 'acceptancetests' directory. This is great!

One surprise is that the number of tests which get run has actually decreased. On closer inspection, this is because the standard Django test runner includes all the tests for every Django app, and this includes not just my project's apps, but also the built-in and middleware Django apps. We are no longer running these tests. Is this important? I'm not sure: After all, we are not modifying the code in django.contrib, so I don't expect these tests to start failing. On the other hand, maybe those tests help to demonstrate that our Django settings are not broken?

An appeal for sanity

My solutions seem to work, but I'm suspicious that I'm swimming against the current, because I haven't found much discussion about these issues, so maybe I'm just well off the beaten path. Have many other people already written a similar extension to Django's test runner? If so, where are they all? If not, why not? How else is everyone running their Django project tests in locations other than models.py or tests.py? Or do they not have tests outside these locations? If not, why not? I'd love to hear about it if I'm doing it wrong, or if there's an easier approach.

Update: My fabulous employer has given permission to release the test runner as open source:

https://github.com/rangespan/django-alltestsrunner

Update2: I like this post's numeric ID (check the URL)

£ key in Windows on a US laptop keyboard, done right.

The usual solution to typing non-US characters on a US keyboard in Windows is to hold left-alt, then type on the numeric keypad:

£   Left-alt + 0163

€   Left-alt + 0128

This is a pain on my (otherwise fabulous) Thinkpad laptop, because the numeric keypad is accessed by holding the blue 'Fn' key while you tap ScrLk, to toggle numeric keypad mode, and then doing the same again afterwards to turn it off.

One inadequate alternative (on Windows XP, YMMV) is to go into Control Panel; Regional and Language Options; Languages; Details; Settings. Add a new keyboard configuration, "United States-International", which should be grouped under your existing language ("English (United Kingdom)" for me.) OK all the dialogs, restart your applications.

Now you can simply type:

£   Right-alt + Shift + 4

€   Right-alt + 5

The downside of this solution is that the "United States-International" keyboard setting adds a bunch of other features, including 'dead-keys', whereby quotes and other punctuation are used to add accents to letters, which is overly intrusive if, like me, you hardly ever use accents.

The ultimate solution, then, is to define your own personal keyboard layout. Download the Microsoft Keyboard Layout Creator from here: http://msdn.microsoft.com/en-us/goglobal/bb964665.

My end result is an MSI with which I can install a new keyboard layout, which is exactly like 'US', but with the addition of £ on the key right-alt + 3:

windows-US-keyboard-layout-with-pound-on-right-alt-3

The source .klc file is in there, so you could add your own tweaks on top of that.

Python port of Modern 3D Graphics using OpenGL tutorial

To my knowledge, there are three online OpenGL tutorials which stand head and shoulders above the rest:

I recently started porting the first of these from C to Python.

https://bitbucket.org/tartley/gltutpy

This is primarily for my own benefit, and I've only done the first two chapters thus far, but others may find some use in seeing the translation. If anyone else is interested enough to want to contribute, drop me a line and I'll happily grant commit rights.