Automated scalability

My daemon takes the desired response time in milliseconds as input and scales the infrastructure accordingly: when it sees increased traffic it adds more nodes, which are then shut down when they are no longer necessary. The infrastructure is based on Rackcloud, Cassandra, uwsgi and nginx.
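To make the idea concrete, here is a minimal sketch of the kind of control loop such a daemon runs; the helper functions (measure_avg_response_ms, spawn_node, destroy_node) are hypothetical placeholders, not the actual asd.py code:

import time

TARGET_MS = 500         # the desired response time the daemon takes as input
CHECK_INTERVAL = 60     # seconds between measurements
MIN_NODES, MAX_NODES = 1, 10

def control_loop(measure_avg_response_ms, spawn_node, destroy_node, nodes):
    # naive scaling loop: add a node when the site is too slow,
    # drop one when it is comfortably fast
    while True:
        avg = measure_avg_response_ms()
        if avg > TARGET_MS and len(nodes) < MAX_NODES:
            nodes.append(spawn_node())
        elif avg < TARGET_MS * 0.5 and len(nodes) > MIN_NODES:
            destroy_node(nodes.pop())
        time.sleep(CHECK_INTERVAL)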

Does it work? The end result was:

Requests per second:    823.20 [#/sec] (mean)
...

Percentage of the requests served within a certain time (ms):
  50%    186
  66%    471
  75%    516
  80%    525
  90%    976
  95%   1108
  98%   1394
  99%   1767
 100%   1767 (longest request)

SHAMELESS PLUG: I'm a freelance developer & sys admin; get in touch if you're interested in my services.

For benchmarking I've used ApacheBench (ab). The test blog had 10000 posts in it, each consisting mostly of 1500 bytes of lorem ipsum text. The benchmark was run from within Rackcloud and consisted of displaying the index page of the blog, i.e. the last 10 blog posts. The blog engine runs on the Pylons web framework and uses the Tragedy data mapper.

In front of the entire setup there is nginx, load balancing the requests to the uwsgi nodes. A useful feature here is that it will re-proxy a request to another node in case one goes down.
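For illustration, the relevant part of the nginx config could look something like this (the upstream name and addresses are made up); the failover comes from nginx retrying on the next upstream server when one errors out:

upstream uwsgi_nodes {
    server 10.0.0.11:9001;
    server 10.0.0.12:9001;
}

server {
    listen 80;
    location / {
        include uwsgi_params;
        uwsgi_pass uwsgi_nodes;
        # on error or timeout, re-proxy the request to the next node
        uwsgi_next_upstream error timeout;
    }
}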

At the very core of this setup is asd.py (automatic scalability daemon), which scales the infrastructure based on current needs, defined as the average responsiveness of your site. It works on top of nginx: when it takes a node up or down, it dynamically modifies the "upstream" block in the nginx config. My choice was to deploy with uwsgi; I could easily replace that with FCGI or even a reverse proxy to some other web server.
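A minimal sketch of that upstream rewriting, assuming the daemon knows the current node IPs (the file path and helper are illustrative, not asd.py's actual code):

import subprocess

UPSTREAM_CONF = "/etc/nginx/conf.d/upstream.conf"

def write_upstream(node_ips):
    # regenerate the upstream block from the current node list
    lines = ["upstream uwsgi_nodes {"]
    lines += ["    server %s:9001;" % ip for ip in node_ips]
    lines.append("}")
    with open(UPSTREAM_CONF, "w") as f:
        f.write("\n".join(lines) + "\n")
    # tell the running nginx to pick up the new configuration
    subprocess.call(["nginx", "-s", "reload"])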

The individual spawned nodes have uwsgi and Cassandra on them; uwsgi at this point just serves the requests handed to it by nginx. The heavy lifting is done by Cassandra, which has to make sure your data is (eventually) consistent. Setting up the cluster was a breeze. My favorite feature is a kind of read caching: if a node has to go over the network to fetch a result, it may write it locally as well. This is most obvious when performing a first benchmark against a freshly spawned cluster; the results will be much worse than in the second run.
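The blog itself goes through the Tragedy data mapper, but purely to illustrate Cassandra's tunable consistency, here is roughly what a write and a read look like with the pycassa client (keyspace, column family and addresses are made up):

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily
from pycassa.cassandra.ttypes import ConsistencyLevel

pool = ConnectionPool("Blog", server_list=["10.0.0.11:9160"])
posts = ColumnFamily(pool, "BlogEntries")

# write to a single replica and let Cassandra propagate it eventually
posts.insert("post-1", {"title": "Hello", "body": "lorem ipsum"},
             write_consistency_level=ConsistencyLevel.ONE)

# read with QUORUM: a majority of replicas must agree on the answer
row = posts.get("post-1", read_consistency_level=ConsistencyLevel.QUORUM)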

A problem that I had is that a Cassandra node needs its own IP in the config file to function as part of the cluster. Since I'm spawning the instances on the fly, I don't know their IPs in advance, so I made an ugly hack to fix the matter. The /etc/rc.local now has:

# extract eth1's IP address from the ifconfig output
ip=`ifconfig | sed -n '/eth1/ { N; /dr:/{;s/.*dr://;s/ .*//;p;} }'`
# write it into Cassandra's ListenAddress before the daemon starts
sed -i "s@<ListenAddress>.*@<ListenAddress>$ip</ListenAddress>@" /etc/cassandra/storage-conf.xml

A thing worth noting is the responsiveness to changes: spawning new nodes in the cloud is a process that takes minutes. Given how fast one gains web traffic, this shouldn't be a problem in the real world.

Cloud APIs put the power of hardware control at our fingertips. The images are both powerful and nice to work with, opening up a whole new world of possibilities. For example, a release could change from "deploy new code to a node" to "deploy a new node with the new code", while guaranteeing language-independent, full reversibility.
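A sketch of what such a release could look like with the python-cloudservers bindings (the credentials, image and names are made up; this is not the post's actual code):

import cloudservers

cs = cloudservers.CloudServers("username", "api_key")

def release(image_with_new_code, flavor=1):
    # immutable release: boot a fresh node from an image that already
    # contains the new code; rollback means pointing nginx back at the
    # old node and deleting this one
    server = cs.servers.create(name="app-node-v2",
                               image=image_with_new_code,
                               flavor=flavor)
    # hand this IP to something like write_upstream() above
    return server.addresses["private"][0]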
