Pylons and Django comparison

I’ve been working with Django for two years now on various projects. Recently I decided to give Pylons a try; here are some of my thoughts about the two.

First of all, the two are fundamentally different: Django is a bundle of tools that work nicely together, while Pylons is glue around your tools of choice. A good example here is the template engine. When creating a new Pylons project, it asks you which engine you want to use, making zero assumptions. Django comes with its own template engine to start with, and a bunch of stuff in django.contrib depends on Django template tags. That means that to retain basic Django functionality (admin, comments, 3rd party reusable apps) on a different template engine you have to monkey-patch Django :). As a side note, until 1.2 development was heavily slowed down by the incompatibility between Jinja2 and Django.

Lesson here: use what Django gives you or you’re in for lots of trouble.

There is more than one side to the “Django is a bundle” argument, though; let’s have a look at something as common as authentication. My goal was to secure the editing of blog posts on this blog. With Pylons the process went like this:

  • google it
  • look up the various auth libs out there
  • decide on an auth lib
  • look up a basic example that auths from a file
  • look up a more complex example that auths from the DB

The comparable process with Django would be:

  • google it
  • look up basic example

This approach has the downside of locking you into one SQL row per user, losing the flexibility the Pylons approach offers. On the other hand, this is exactly what I wanted in all but one of the projects I’ve done so far. As an added bonus, Django also gives me a functional web interface for administering users out of the box.
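
To make that concrete, here is roughly what the Django side looks like (a minimal sketch; the view name and body are made up for illustration):

from django.contrib.auth.decorators import login_required

@login_required                # anonymous visitors get redirected to the login page
def edit_post(request, post_id):
    # ... the actual editing logic goes here ...
    pass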

Lesson here: in most cases Django’s assumptions offer more productivity than Pylons’ pluggable infrastructure.

There are more people using Django out there. Some of the implications are more:

  • 3rd party recipes and reusable code
  • community support (IRC)
  • jobs / employees

Lesson here: size matters

What I cannot leave out of this blog post is the Pylons debug screen. Just like Django’s, it gives you the ability to see the context and variables of every line in the traceback. It also allows you to execute Python code from the web browser at ANY point in your traceback. Just to be clear, here are the steps it saved me:

  • find the line in the traceback
  • insert set_trace()
  • re-run the request
  • debug
  • possibly find the previous calling point, insert a breakpoint there and re-run the request

All that done straight from the browser, awesome!
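
For comparison, the manual pdb workflow it replaces looks roughly like this (a sketch; the function is made up):

import pdb

def buggy_view(post_id):    # hypothetical controller
    result = post_id * 2    # the suspicious computation
    pdb.set_trace()         # execution stops here and drops you into pdb
    return result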

Lesson here: the Pylons default debugger rocks.

AppEngine ups and downs

So I recently started using AppEngine; it should be noted I’m VERY new to it. I come from a Django background, so this is mostly a comparison between the two.

Let’s start with the classic argument: price! A lot of people compare EC2 and GAE (Google AppEngine) prices, but they are not really comparable. EC2 provides you with easily spawnable machines, as many as you need, leaving the scalability to you. That means your programmers have to worry about sharding the database, your sysadmins about adding more web nodes… On the GAE side you just write code, effectively outsourcing both the hardware and the sysadmin work to Google. The advantage EC2 has is that you can use any technology under the sun; with GAE you are forced to use their platform. Does GAE pay off in pure money terms? No idea :), but if you get big you can iterate faster because you don’t have to worry about scaling (which can be a major competitive advantage). To be honest, most of the sites out there don’t need anything more than what can be easily scaled.

Schema-less development is definitely one of the technical things I like about GAE. South is cool, but not having to care about the schema at all is just a major pain relief.
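
Just to illustrate what schema-less means in practice, here is a sketch using the datastore’s Expando models (the model and properties are made up):

from google.appengine.ext import db

class BlogPost(db.Expando):         # properties are not declared up front
    pass

post = BlogPost(title='Hello', body='...')
post.tags = ['gae', 'python']       # a brand new property: no migration, no ALTER TABLE
post.put()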

$ appcfg.py -e  update .
Scanning files on local disk.
Scanned 500 files.
Scanned 1000 files.
Initiating update.
Password for : 
Cloning 154 static files.
Cloned 100 files.
Cloning 224 application files.
Cloned 100 files.
Cloned 200 files.
Uploading 2 files.
Deploying new version.
Checking if new version is ready to serve.
Will check again in 1 seconds.
Checking if new version is ready to serve.
Will check again in 2 seconds.
Checking if new version is ready to serve.
Closing update: new version is ready to start serving.
Uploading index definitions.
Uploading cron entries.

The next thing I like very much is the deployment: you type a command and poof, it works. It has its downsides as well, though: unlike, say, svn, it is not capable of remembering your credentials. Another problem is that it deploys only as the app specified in the config file, making it harder to do a test deploy first (this is just a minor annoyance).


As you can see above, when it’s good, it’s really, really good; likewise, when it’s bad, it’s really, really bad.

In [1]: from myapp.models import User
In [2]: User.all().count()
....
BadArgumentError: _app must not be empty.

Hello? I just want to do a simple ORM query! As it turns out, I’m not the only one to have this problem; I copy/pasted stuff I found on the internets and execute it at shell startup ('redduck666' is the name of my app):

# set up the local service stubs so API calls work outside the dev server
from google.appengine.api import apiproxy_stub_map, urlfetch_stub
from google.appengine.api import datastore_file_stub, mail_stub, user_service_stub
import os

# the stubs need to know which application they belong to
os.environ['APPLICATION_ID'] = 'redduck666'

apiproxy_stub_map.apiproxy = apiproxy_stub_map.APIProxyStubMap()
apiproxy_stub_map.apiproxy.RegisterStub('urlfetch',
    urlfetch_stub.URLFetchServiceStub())
apiproxy_stub_map.apiproxy.RegisterStub('user',
    user_service_stub.UserServiceStub())
# a file-backed datastore; '/dev/null' means nothing is persisted between runs
apiproxy_stub_map.apiproxy.RegisterStub('datastore_v3',
    datastore_file_stub.DatastoreFileStub('redduck666', '/dev/null', '/dev/null'))
apiproxy_stub_map.apiproxy.RegisterStub('mail', mail_stub.MailServiceStub())

As far as users go, as long as your idea of “user” == “user with a Google account”, you are going to be the happiest person around. GAE handles authentication against Google accounts for you and makes sure the site is usable even in the development environment. What happens if you want to add Twitter connect or Facebook connect? Well, SOL (Shit Outta Luck): they haven’t bothered to abstract the User model, so you are left dealing with low-level stuff (like cookies). Compare that to Django, where you have to authorize the user once and it does everything else for you (albeit keeping the user in an SQL table).
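
The happy path really is just a couple of calls (a sketch using the SDK’s users API; the greeting itself is made up):

from google.appengine.api import users

def greeting():
    user = users.get_current_user()    # None unless signed in with a Google account
    if user:
        return 'Hello, %s' % user.nickname()
    # anything non-Google (Twitter, Facebook...) you get to build yourself
    return '<a href="%s">Sign in</a>' % users.create_login_url('/')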

Another argument against GAE is the lack of reusable stuff out there; Django has a much more vibrant community creating all kinds of reusable code. As a matter of fact, I had to port the Django Facebook connect code to work under GAE.


Automated scalability

My daemon takes the desired response time in milliseconds as input and scales the infrastructure accordingly. When it sees increased traffic it adds more nodes, which are then shut down when they are no longer necessary. The infrastructure is based on Rackcloud, Cassandra, uWSGI and nginx.

Does it work? The end result was:

Requests per second:    823.20 [#/sec] (mean)
...

Percentage of the requests served within a certain time (ms)
  50%    186
  66%    471
  75%    516
  80%    525
  90%    976
  95%   1108
  98%   1394
  99%   1767
 100%   1767 (longest request)

SHAMELESS PLUG: I’m a freelance developer & sysadmin; contact me if you’re interested in my services.

For benchmarking I’ve used ApacheBench (ab). The blog had 10,000 posts in it, the biggest part of each being 1,500 bytes of lorem ipsum text. The benchmark was performed from within Rackcloud and consisted of displaying the index page of the blog, i.e. the last 10 blog posts. The blog engine runs on the Pylons web framework and uses the Tragedy data mapper.

In front of the entire setup there is nginx, load-balancing the requests to the uWSGI nodes. A useful feature here is that it will re-proxy the request to another node in case one goes down.
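
The relevant part of the nginx config looks roughly like this (a sketch with made-up addresses, not my exact setup):

upstream app {
    server 10.0.0.2:8000;
    server 10.0.0.3:8000;
}

server {
    listen 80;
    location / {
        include uwsgi_params;
        uwsgi_pass app;
        # on error or timeout, retry the request on the next node
        uwsgi_next_upstream error timeout;
    }
}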

At the very core of this setup is asd.py (automatic scalability daemon), which scales the infrastructure based on current needs. The current needs are defined as the average responsiveness of your site. It works on top of nginx, dynamically modifying the “upstream” block when it takes a node up or down. My choice was to deploy with uWSGI; I could easily replace that with FastCGI or even a reverse proxy to some other web server.
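
The core loop is conceptually tiny. Here is a minimal sketch of the idea (the names, thresholds and spawn/destroy calls are all made up; the real daemon talks to the cloud API):

import subprocess
import time

TARGET_MS = 200                      # desired average response time
UPSTREAM_CONF = '/etc/nginx/conf.d/upstream.conf'
nodes = ['10.0.0.2:8000']            # currently active uwsgi nodes

def average_response_ms():
    # in reality an average over recent requests; here a single timed hit
    start = time.time()
    subprocess.call(['curl', '-s', '-o', '/dev/null', 'http://localhost/'])
    return (time.time() - start) * 1000

def spawn_node():
    raise NotImplementedError('boot an instance via the cloud API, return ip:port')

def destroy_node(node):
    raise NotImplementedError('tear the instance down via the cloud API')

def write_upstream(nodes):
    with open(UPSTREAM_CONF, 'w') as f:
        f.write('upstream app {\n')
        for node in nodes:
            f.write('    server %s;\n' % node)
        f.write('}\n')
    subprocess.call(['nginx', '-s', 'reload'])   # pick up the new node list

while True:
    ms = average_response_ms()
    if ms > TARGET_MS:                           # too slow: add capacity
        nodes.append(spawn_node())
    elif ms < TARGET_MS / 2 and len(nodes) > 1:  # lots of headroom: shrink
        destroy_node(nodes.pop())
    write_upstream(nodes)
    time.sleep(30)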

The individual spawned nodes have uWSGI and Cassandra on them; uWSGI at this point just serves the requests handed to it by nginx. The heavy lifting is done by Cassandra, which has to make sure your data is (eventually) consistent. Setting up the cluster was a breeze. My favorite feature is some kind of query caching: if a node has to go over the network to get a result, it may write it locally as well. This is most obvious when performing a first benchmark against a freshly spawned cluster; the results will be much worse than in the second one.

A problem I had is that a Cassandra cluster requires the node’s IP in the config file to function. Since I’m spawning the instances on the fly, I don’t know their IPs in advance, so I made an ugly hack to fix the matter. The /etc/rc.local now has:

ip=`ifconfig | sed -n '/eth1/ { N; /dr:/{;s/.*dr://;s/ .*//;p;} }'`
sed -i "s@<ListenAddress>.*@<ListenAddress>$ip</ListenAddress>@" /etc/cassandra/storage-conf.xml

A thing worth noting is the responsiveness to changes: spawning new nodes in the cloud is a process that takes minutes. Given how fast one gains web traffic, this shouldn’t be a problem in the real world.

Cloud APIs put the power of hardware control at anyone’s fingertips. The images are both powerful and nice to work with, opening up a whole new world of possibilities. For example, a release could change from “deploy new code to a node” to “deploy a new node with the new code”, while guaranteeing language-independent, full reversibility.

Git Joys


I wrote about things that annoy me in git; now it’s time to provide a counterbalance by writing about the things I enjoy in git. The same disclaimer as last time applies: this is not an objective review, it’s just a comparison of things I like in git over svn.

The first thing I started appreciating is not having .svn in each and every directory. Imagine you have only one file/dir in a directory: (assuming bash’s default tab completion) you hit tab twice and nothing is auto-completed, because there are two entries (.svn and your file) it can choose from. With git it just works. Another advantage is that when grep-ing for stuff you don’t get results from the .svn dirs. As it turns out, there is a way around this.

The next thing I like is that diff/commit operations are global per repository. I have quite often missed some changes because I wasn’t in the repository root. The same reasoning applies to being able to commit from anywhere in the repository. For example:

redduck666@vm04:~/dev/rd666$ echo >> doc/README.deps
redduck666@vm04:~/dev/rd666$ svn diff
Index: doc/README.deps
===================================================================
--- doc/README.deps     (revision 1263)
+++ doc/README.deps     (working copy)
@@ -9,3 +9,4 @@
  - README.deps.python2.5
  - README.deps.python2.6
  - README.deps.python3.0
+
redduck666@vm04:~/dev/rd666$ cd media
redduck666@vm04:~/dev/rd666/media$ svn diff

As opposed to

redduck666@b00:~/dev/prevoz$ echo >> parser/googletransit/urls.py 
redduck666@b00:~/dev/prevoz$ git diff
diff --git a/parser/googletransit/urls.py b/parser/googletransit/urls.py
index 3932a95..26f131e 100644
--- a/parser/googletransit/urls.py
+++ b/parser/googletransit/urls.py
@@ -16,3 +16,4 @@ if settings.DEBUG:
     urlpatterns += patterns("",
         (r'^smedia/(?P<path>.*)$', 'django.views.static.serve', {'document_root': settings.MEDIA_ROOT}),
         )
+ 
redduck666@b00:~/dev/prevoz$ cd trunk/
redduck666@b00:~/dev/prevoz/trunk$ git diff
diff --git a/parser/googletransit/urls.py b/parser/googletransit/urls.py
index 3932a95..26f131e 100644
--- a/parser/googletransit/urls.py
+++ b/parser/googletransit/urls.py
@@ -16,3 +16,4 @@ if settings.DEBUG:
     urlpatterns += patterns("",
         (r'^smedia/(?P<path>.*)$', 'django.views.static.serve', {'document_root': settings.MEDIA_ROOT}),
         )
+

One thing I’ve grown to love in git is its feature of warning me about stuff that is present but not tracked. It has happened to me many times that I had to make another commit because I forgot to svn add stuff. For example:

redduck666@b00:~/dev/prevoz$ touch a
redduck666@b00:~/dev/prevoz$ git commit -a
# On branch routing
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       a

Ever wanted to commit from a train? Or revert to an older revision? Since git stores the entire revision history locally, you can do that.

Now let’s have a look at working with branches. How long does it take to diff two branches with svn? With git the operation is near instantaneous, since there is no network involved (assuming you are diffing a local branch). And here is a controversial claim to conclude with: svn can’t do branches :). The only thing it can do is dumb diff tracking via properties. Suppose some developers work on a major new feature, which is given its own branch. When it’s ready, the trunk maintainer merges the branch into trunk; great, now all the changes are attributed to him/her. Yay!
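
For comparison, diffing two branches in git is a single local command (using the routing branch from the commit example above):

redduck666@b00:~/dev/prevoz$ git diff master..routing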