Automated scalability


My daemon takes the desired response time in milliseconds as input and scales the infrastructure accordingly. When it sees increased traffic, it adds more nodes, which are shut down again once they are no longer needed. The infrastructure is based on Rackcloud, Cassandra, uwsgi and nginx.
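The scaling decision described above can be sketched as a threshold rule with a dead band. This is a minimal sketch, not the actual asd.py code: `ScalingPolicy`, `decide()`, the 50% low-water mark and the node limits are hypothetical. The only input taken from the text is the desired response time in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical knobs for a response-time-driven scaler."""
    target_ms: float         # desired average response time, in milliseconds
    low_water: float = 0.5   # scale down only below this fraction of the target
    min_nodes: int = 1
    max_nodes: int = 20


def decide(policy: ScalingPolicy, avg_response_ms: float, nodes: int) -> int:
    """Return +1 to add a node, -1 to remove one, 0 to leave things alone."""
    if avg_response_ms > policy.target_ms and nodes < policy.max_nodes:
        return 1   # too slow: spawn another node
    if avg_response_ms < policy.target_ms * policy.low_water and nodes > policy.min_nodes:
        return -1  # comfortably fast: shut one node down
    return 0       # inside the dead band, or already at a limit: hold


# Example: target of 500 ms, currently 3 nodes
policy = ScalingPolicy(target_ms=500)
print(decide(policy, avg_response_ms=800, nodes=3))  # 1
print(decide(policy, avg_response_ms=100, nodes=3))  # -1
print(decide(policy, avg_response_ms=400, nodes=3))  # 0
```

The gap between `target_ms` and the low-water mark is a design choice to keep the daemon from flapping, i.e. adding a node and immediately removing it again while the response time hovers around the target.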

Does it work? Here is the end result (the percentiles are response times in milliseconds):

Requests per second:    823.20 [#/sec] (mean)
...

  50%    186
  66%    471
  75%    516
  80%    525
  90%    976
  95%   1108
  98%   1394
  99%   1767
 100%   1767 (longest request)

SHAMELESS PLUG: I'm a freelance developer & sys admin, drop me a line if interested in my services.

For benchmarking I used my blog. It had 10,000 blog posts in it, consisting mostly of 1500 bytes of lorem ipsum text each. The benchmark was run from within Rackcloud and consisted of displaying the blog's index page, i.e. the last 10 blog posts. The blog engine runs on the Pylons web framework and uses the Tragedy data mapper.

In front of the entire setup sits nginx, which load-balances the requests across the uwsgi nodes. A useful feature here is that it re-proxies a request to another node in case one node goes down.

At the very core of this setup is asd.py (automatic scalability daemon), which scales the infrastructure based on current needs (code here). The current needs are defined as the average responsiveness of your site. It works on top of nginx: whenever it brings a node up or takes one down, it dynamically modifies the nginx "upstream". I chose to deploy with uwsgi, but I could easily replace that with FastCGI or even reverse-proxy to some other web server.
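Modifying the nginx "upstream" boils down to regenerating a small config snippet from the current list of nodes and asking nginx to reload its configuration (for example with `nginx -s reload`). A minimal sketch of the generating half; `render_upstream`, the upstream name and uwsgi port 3031 are assumptions, not taken from asd.py. An nginx location would then point at the upstream with `uwsgi_pass uwsgi_nodes;`.

```python
from typing import Iterable


def render_upstream(name: str, node_ips: Iterable[str], port: int = 3031) -> str:
    """Render an nginx upstream block listing one uwsgi server per node."""
    ips = list(node_ips)
    if not ips:
        # nginx refuses to load an upstream block with no servers in it
        raise ValueError("at least one node is required")
    lines = [f"upstream {name} {{"]
    for ip in ips:
        lines.append(f"    server {ip}:{port};")
    lines.append("}")
    return "\n".join(lines) + "\n"


print(render_upstream("uwsgi_nodes", ["10.0.0.5", "10.0.0.6"]), end="")
# upstream uwsgi_nodes {
#     server 10.0.0.5:3031;
#     server 10.0.0.6:3031;
# }
```

Refusing an empty node list is deliberate: writing an upstream with no servers would make the subsequent reload fail and leave the site without a working configuration.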

The individual spawned nodes run uwsgi and Cassandra; uwsgi at this point just serves the requests handed to it by nginx. The heavy lifting is done by Cassandra, which has to make sure your data is (eventually) consistent. Setting up the cluster was a breeze. My favorite feature is some kind of query caching: if it has to go over the network to fetch a result, it may write it locally as well. This is most obvious when benchmarking a freshly spawned cluster: the results of the first run will be much worse than those of the second one.

One problem I had is that for a Cassandra cluster to function, each node needs its IP address in the config file. Since I'm spawning the instances on the fly, I don't know their IPs in advance, so I made an ugly hack to fix the matter. /etc/rc.local now contains:

ip=`ifconfig | sed -n '/eth1/ { N; /dr:/{;s/.*dr://;s/ .*//;p;} }'`
sed -i "s@$ip@" /etc/cassandra/storage-conf.xml
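The sed half of the hack can also be done in Python. A sketch, assuming the element that needs the node's address is `<ListenAddress>` in Cassandra 0.6's storage-conf.xml (an assumption; the sed expression above does not name the element). The IP itself would still come from the ifconfig pipeline. `set_xml_element` and `patch_config` are hypothetical helpers.

```python
import re


def set_xml_element(config_text: str, element: str, value: str) -> str:
    """Replace the text of every <element>...</element> in config_text."""
    tag = re.escape(element)
    pattern = re.compile(r"(<%s>)[^<]*(</%s>)" % (tag, tag))
    # A function replacement avoids the IP's leading digits being parsed
    # as part of a \1 group reference in a template string.
    new_text, count = pattern.subn(lambda m: m.group(1) + value + m.group(2), config_text)
    if count == 0:
        raise ValueError("<%s> not found in config" % element)
    return new_text


def patch_config(path: str, element: str, value: str) -> None:
    """Rewrite the config file in place."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(set_xml_element(text, element, value))


# Assumed usage on a freshly booted node:
# patch_config("/etc/cassandra/storage-conf.xml", "ListenAddress", "10.177.4.2")
```

Failing loudly when the element is missing is intentional: silently leaving the default address in place would produce a node that boots fine but never joins the cluster.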

One thing worth noting is the responsiveness to changes: the daemon spawns new nodes in the cloud, a process that takes minutes. Given how gradually web traffic usually builds up, this shouldn't be a problem in the real world.

Cloud APIs put the power of hardware control at anyone's fingertips. Images are both powerful and nice to work with, opening a whole new world of possibilities. For example, a release could change from "deploy new code to the node" to "deploy a new node with the new code", guaranteeing full, language-independent reversibility.

Tags: cassandra, scalability

Android vs iPhone


A few weeks ago, after my iPhone grew a couple more dead pixels, an offer for a fancy new EVO 4G landed in my inbox. So began my journey into the Android world.

First, let's look at the sizes of various hardware features. The EVO 4G has a bigger screen, louder speakers, more pixels in the camera and an extra camera as a cherry on top. Bigger is better. Period.

A huge advantage for me was multitasking: no more choosing between listening to Pandora and twittering! But it has been known for months that the iPhone is getting this too, so no heavy points here. Heavy points do go to customization, though: on the first page of my phone I have full controls for Pandora, Google search (which searches the phone for apps as well) and a handful of apps. In other words, mainly thanks to widgets, Android can have "workspaces" on your phone rather than the iPhone's "app launcher containers".

On the dark side, I had a lot more WTF moments. The first time I launched the "Spark" app, my phone wouldn't stop vibrating, not even after exiting the app; I ended up restarting the phone. Pandora and Flash tend to crash every couple of days. The apps seem to lag behind considerably as well: the games I tried engaged me less than the iPhone ones. One app's splash screen said something about using the buttons, and I couldn't figure out how to use it; another told me it was not supported on my handset.

So overall, which one is better? It all depends on your needs, bla bla bla bla... Let me ask you this: which one lets you turn the picture below into your temporary office (HotSpot app!)?

Tags: iPhone, Android, Smartphone

Open sourcing this blog


I finally open-sourced the code that runs this blog (code at GitHub). It uses Tragedy and Pylons. To make my life easier, I integrated the Disqus and Zemanta services to provide the commenting system and an administration helper, respectively. You will need a Cassandra node to run it, and the code is alpha quality.

Pylons and Django comparison


I've been working with Django for two years now on various projects. Recently I decided to give Pylons a try; here are some of my thoughts about the two.

First of all, the two are fundamentally different: Django is a bundle of tools that work nicely together, while Pylons is glue around your tools of choice. A good example here is the template engine. When creating a new Pylons project, it asks you which engine you want to use, making zero assumptions. Django comes with its own template engine to start with, and a bunch of stuff in django.contrib depends on the Django template tags. That means that to retain basic Django functionality (admin, comments, 3rd party reusable apps) on a different template engine, you have to monkey-patch Django :). As a side note, until Django 1.2, development was heavily slowed down by the incompatibility between Jinja2 and Django (more info).

Lesson here: use what Django gives you, or you're in for lots of trouble

There is more than one side to the "Django is a bundle" argument, though. Let's look at something as common as authentication. My goal was to secure the editing of posts on this blog. On Pylons the process went like this:

  • google it
  • look up the various auth libs out there
  • decide on auth lib
  • look up a basic example that authenticates from a file
  • look up a more complex example that authenticates from a DB
The comparable process on Django was:
  • google it
  • look up a basic example
This approach has the downside of locking you into one SQL row per user, losing the flexibility the Pylons approach offers. On the other hand, this is exactly what I wanted on all but one of the projects I've done so far. As an added bonus, Django also gives me a functional web interface for administering users out of the box.

Lesson here: in most cases Django's assumptions offer more productivity than Pylons' pluggable infrastructure

There are more people using Django out there. Some of the implications are more:

  • 3rd party recipes and reusable code
  • community support (irc)
  • jobs / employees

Lesson here: size matters

What I cannot leave out of this blog post is the Pylons debug screen. Just like Django's, it lets you see the context and variables of every line in the traceback. It also lets you type Python code in the web browser at ANY point in your traceback. Just to be clear, here are the steps it saved me:

  • find the line from traceback
  • insert set_trace()
  • re-run the request
  • debug
  • possibly find the previous calling point, insert a breakpoint there and re-run the request
All of that done straight from the browser. Awesome!
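Outside Pylons, the standard library can produce a rough, non-interactive version of what the debug screen shows: the local variables of every frame in a traceback. A minimal sketch; `dump_traceback_locals`, `divide` and `demo` are hypothetical names. For interactive poking at any frame, `pdb.post_mortem(exc.__traceback__)` is the closest stdlib equivalent, although it runs in a terminal rather than in the browser.

```python
import traceback


def dump_traceback_locals(exc: BaseException) -> str:
    """Format every frame of exc's traceback together with its local variables."""
    out = []
    for frame, lineno in traceback.walk_tb(exc.__traceback__):
        code = frame.f_code
        out.append(f"{code.co_filename}:{lineno} in {code.co_name}")
        for name, value in frame.f_locals.items():
            out.append(f"    {name} = {value!r}")
    return "\n".join(out)


def divide(a, b):
    ratio = a / b  # raises ZeroDivisionError before ratio is bound
    return ratio


def demo() -> str:
    try:
        divide(1, 0)
    except ZeroDivisionError as exc:
        return dump_traceback_locals(exc)
    return ""


print(demo())
```

Unlike the Pylons screen, this only shows a snapshot; nothing can be evaluated in the frames afterwards.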

Lesson here: the Pylons default debugger rocks


Detecting command failures in bash

Suppose you want to detect whether all the commands in your script executed successfully:
#!/usr/bin/env bash

# make sure the script exits as soon
# as the first error is encountered
set -e
success=0

# call our function on (nice) exit of the script (0 = EXIT)
trap on_exit 0

on_exit(){
    if (( success )); then
        echo 'all commands executed successfully'
    else
        echo 'at least one command failed'
    fi
}

# your commands go here
true
#false

# if execution reaches this point,
# set -e did NOT exit the script
success=1

Some notes:
  • Uncomment "false" to see the failure branch in action
  • set -e depends on the commands behaving nicely and returning a non-zero exit code on error. Just about all standard tools do this.
  • on_exit will NOT be called if your script was killed with SIGKILL
  • The dummy echo commands are useless :); replace them with something like mailing or logging
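For comparison, here is the same pattern in Python: `check=True` stands in for `set -e`, and a `finally` block stands in for the EXIT trap. A minimal sketch; `run_all` is a hypothetical helper. Just like the trap, the `finally` block will not run if the process is killed with SIGKILL.

```python
import subprocess
import sys


def run_all(commands) -> bool:
    """Run commands in order, stopping at the first failure (like set -e).

    The finally block plays the role of the EXIT trap: it reports the outcome
    whether the loop finished or bailed out early.
    """
    success = False
    try:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit status
            subprocess.run(cmd, check=True)
        success = True  # only reached if no command failed
    except subprocess.CalledProcessError as err:
        print(f"command {err.cmd!r} exited with {err.returncode}", file=sys.stderr)
    finally:
        if success:
            print("all commands executed successfully")
        else:
            print("at least one command failed")
    return success


ok = [sys.executable, "-c", "pass"]
fail = [sys.executable, "-c", "import sys; sys.exit(3)"]
run_all([ok, ok])    # returns True
run_all([ok, fail])  # returns False
```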

Debugging Python (multi)processing

My goal was to get a pdb shell in the worker processes I spawned with Process() from python-processing. The "classic" approach to spawning the pdb shell fails miserably:
(Pdb) > /home/redduck666/dev/abj/bin/feeds.py(639)__init__()
-> self.timeout = timeout
Process Process-3:2:
Traceback (most recent call last):
  File "/var/lib/python-support/python2.5/processing/process.py", line 227, in _bootstrap
    self.run()
  File "/var/lib/python-support/python2.5/processing/process.py", line 85, in run
    self._target(*self._args, **self._kwargs)
  File "./feeds.py", line 639, in __init__
    self.timeout = timeout
  File "./feeds.py", line 639, in __init__
    self.timeout = timeout
  File "/usr/lib/python2.5/bdb.py", line 48, in trace_dispatch
    return self.dispatch_line(frame)
  File "/usr/lib/python2.5/bdb.py", line 66, in dispatch_line
    self.user_line(frame)
  File "/usr/lib/python2.5/pdb.py", line 144, in user_line
    self.interaction(frame, None)
  File "/usr/lib/python2.5/pdb.py", line 187, in interaction
    self.cmdloop()
  File "/usr/lib/python2.5/cmd.py", line 130, in cmdloop
    line = raw_input(self.prompt)
ValueError: I/O operation on closed file

The problem here is that processing closes the standard file descriptors of the processes it spawns, so a straightforward approach like that will not work. For the same reason, using sys.__std(out|in|err)__ will not work either. The solution for me was to tell Python explicitly to use my current stdin/stdout by opening them again. The 'r+' flag is needed because pdb needs to read from stdin.
import pdb
pdb.Pdb(stdin=open('/dev/stdin', 'r+'), stdout=open('/dev/stdout', 'r+')).set_trace()

I use this on Linux; AFAIK it should work across the Unix world (and is probably horribly broken on Windows).
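The same trick can be wrapped in a small helper, in case you need it in more than one worker. A sketch for today's multiprocessing (the successor of python-processing), which, as far as I know, replaces the child's sys.stdin with a handle on os.devnull, so a plain pdb.set_trace() there sees end-of-file straight away. `make_child_pdb` and `worker` are hypothetical names, and like the one-liner it relies on /dev/stdin and /dev/stdout, i.e. on a Unix-like system.

```python
import pdb


def make_child_pdb(stdin_path: str = "/dev/stdin", stdout_path: str = "/dev/stdout") -> pdb.Pdb:
    """Build a Pdb that talks to the terminal through freshly opened handles.

    Inside a worker process sys.stdin may be closed (or pointed at
    os.devnull), so it is not used at all.
    """
    stdin = open(stdin_path, "r+")
    stdout = open(stdout_path, "r+")
    # Passing stdout makes Pdb set use_rawinput = 0, so commands are read with
    # stdin.readline() rather than input(), which would go through sys.stdin.
    return pdb.Pdb(stdin=stdin, stdout=stdout)


def worker(n):
    make_child_pdb().set_trace()  # the prompt appears in the parent's terminal
    return n * 2
```

Start `worker` with `multiprocessing.Process(target=worker, args=(21,))` as usual. If several workers hit the breakpoint at once they will compete for the same terminal, so this works best with a single worker.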

My JFK adventure

So I set off on my trip from little Slovenia back to San Francisco. The journey had two stops: Ljubljana to Prague, Prague to New York, and then on to the final destination, San Francisco. Things started to get complicated when we were told that we couldn't land at JFK because President Obama was leaving the airport. We started circling near New York, but after two circles we were running out of fuel and had to refuel, so we took a ~20 minute trip to Bradley airport. We stayed there for two hours, either waiting to be refueled or listening to apologies about how the refueling should have taken less time. After that we headed for JFK; by the time we reached the airport, my connection to San Francisco was supposedly (according to an airport employee I asked) already in the air :/.

Problems weren't solved simply by getting to the airport; I had to get past customs and immigration. They had two problems with me: one, I didn't know when I was leaving the country, and two, I was coming back after only about 9 days abroad :). The first officer just smirked and took me to the suspicious office, or whatever it's called. It is worth noting that most people in there were either Latinos, Muslims or Indians. I waited there a considerable amount of time (probably something like 2 hours), and when my turn finally came I was done in about 5 minutes. I explained to the guy that I am an IT freelancer and that I like the weather and the IT events in California, and poof, he made his decision; the rest of the 5 minutes was either typing up bureaucracy or advising me not to come again on the visa waiver before, say, next summer.

OK, so I'm legally in the US, but I missed my flight. It is the first flight I have ever missed :). I spent quite some time searching for Czech Airlines to fix the mess they made, but I couldn't find them.
I gave up and looked for United, the airline that was supposed to take me across the US to San Francisco. They were very quick and professional and issued me a new ticket for a flight tomorrow morning at 6am. Where does this leave me? I have food, coffee and internet, but no shower :/. I stink and I have nowhere to go until 6am (the next ~8 hours). I guess you can't have it all?

AppEngine debugging tip

As I explained in my last blog post, GAE does some very weird stuff with stdout, making it very difficult to print information from arbitrary points in the code. For example, if I invoked Pdb(stdout=sys.__stdout__), I couldn't print to my screen from a POST request. I have finally found a fix for that annoyance: you hack sys.stdout:
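def set_trace():
    import pdb, sys
    # undo GAE's hijacking so that pdb and print talk to the terminal again
    sys.stdout = sys.__stdout__
    sys.stdin = sys.__stdin__
    debugger = pdb.Pdb()
    # start debugging in the caller's frame, not inside this helper
    debugger.set_trace(sys._getframe().f_back)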
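For plain print-style debugging (no pdb), a gentler variation is to point the streams at the real terminal only temporarily and hand them back to the framework afterwards. A minimal sketch; `real_stdio` is a hypothetical helper, and the io.StringIO below merely simulates a framework that captures stdout for the HTTP response. It does not suit pdb, because tracing carries on after the with-block has already restored the framework's streams.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def real_stdio():
    """Temporarily point sys.stdin/sys.stdout at the process's original
    streams, then restore whatever the framework had installed."""
    saved_stdin, saved_stdout = sys.stdin, sys.stdout
    sys.stdin, sys.stdout = sys.__stdin__, sys.__stdout__
    try:
        yield
    finally:
        sys.stdin, sys.stdout = saved_stdin, saved_stdout


# Demo: simulate a framework that captures stdout for the HTTP response.
framework_stdout = sys.stdout
captured = io.StringIO()
sys.stdout = captured
with real_stdio():
    print("debug: this goes to the terminal")
print("<p>this goes into the response</p>")
sys.stdout = framework_stdout  # put the original stream back
```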

GAE -- too much magic!

This is a rant post about GAE (Google AppEngine): it is too magical. My first problem with it is that pdb simply doesn't work. Why not? Because GAE hijacks your stdout and prints it on the web page. This by itself should ring alarm bells; seriously, a CGI wrapper? *checks calendar* What do you mean it's 2009? Are you sure you didn't mean 1999? Apparently I'm not the only one with that problem, and people have come up with a solution: explicitly pass the file descriptors to pdb. It usually boils down to something like:
def set_trace():
    import pdb, sys
    debugger = pdb.Pdb(stdin=sys.__stdin__,
        stdout=sys.__stdout__)
    debugger.set_trace(sys._getframe().f_back)

Great, now I have a debugger that works. Until it doesn't. For example, trying to debug any POST requests has proven fruitless for me:
> /home/redduck666/dev/abj/st/trunk/views/auth.py(91)post()
-> login = self.request.get('login')
(Pdb) print self.request
(Pdb) print 'wtf??'

This is using the set_trace defined above. As you can see, it doesn't print anything to stdout; instead, the output lands on the web page once 'c' is hit in pdb. Why? Who knows?

Then there is the datastore initialization problem I've already rambled about: out of the box, you can't write to the GAE datastore from a Python script. You'd think an engineering-driven web company would get this right, e.g. by doing the initialization in the model classes rather than in some magical part of dev_appserver.py. But fear not, it gets better! For example, my frontend developer complains that sometimes she can log in with her username and sometimes she can't! The best part: I have a script that runs before dev_appserver.py to create some dummy data, and it wasn't working. Then I cleaned my datastore dir and, surprise surprise, the exact same script executed in the exact same way now works. If that ain't magic, I don't know what is. What happens if something goes wrong with appcfg.py update? Who knows?

Checking if new version is ready to serve.
Closing update: new version is ready to start serving.
2009-08-16 05:23:29,819 ERROR appcfg.py:1272 An unexpected error occurred. Aborting.
Traceback (most recent call last):
  File "/root/abj/GAE/1.2.3/google/appengine/tools/appcfg.py", line 1265, in DoUpload
    self.Commit()
  File "/root/abj/GAE/1.2.3/google/appengine/tools/appcfg.py", line 1141, in Commit
    self.StartServing()
  File "/root/abj/GAE/1.2.3/google/appengine/tools/appcfg.py", line 1194, in StartServing
    app_id=self.app_id, version=self.version)
  File "/root/abj/GAE/1.2.3/google/appengine/tools/appengine_rpc.py", line 344, in Send
    f = self.opener.open(req)
  File "/usr/lib/python2.5/urllib2.py", line 387, in open
    response = meth(req, response)
  File "/usr/lib/python2.5/urllib2.py", line 498, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.5/urllib2.py", line 425, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.5/urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.5/urllib2.py", line 506, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 500: Internal Server Error
Rolling back the update.
Error 500: --- begin server output ---

Server Error (500)
A server error has occurred.
--- end server output ---

Lessons from San Francisco

Let's start with a little bit of history: at the beginning of June (two and a half months ago), I went from Slovenia to San Francisco looking for a proper job. The journey has led me through lots of disappointments and some problems, and taught me lots of lessons. This is an attempt to document some of them. A lot of the lessons were spoon-fed to me by @gandalfar, who happens to know his stuff when it comes to building a social network around yourself.

An opportunity won't come after you; you have to go and get it. Go to IT events, hang out with people, give talks if possible, make people think you are really, really good. It will be easier to convince them if you really are :), but in first impressions the ability to sell yourself (appear highly skilled) is much more important than the skills themselves. Later on, if you want to get any real work done, you of course have to know what you're doing.

A thing that greatly helps in building a social network is business cards; don't leave home without them.

(Almost) no one cares about you over email. It is MUCH easier to convince people of anything (like hiring me) in person or over the phone than over email. Which kinda brings us back to the paragraph above: go out and meet people.

Physical activity is important. I started cycling regularly, and I find it much easier to concentrate afterwards; it gets rid of the annoying feeling of having too much energy. While we're on healthy living, I also started eating more vegetables. I find that kind of food tires me less after a meal.

It is damn hard to stay and work in the USA if you don't have a diploma :/. The options you have:
  • 12 years of experience (or a diploma) + a sponsor (a company that will employ you), and you can go for an H1B
  • A company (or an institution like a university) providing you a "training spot", thus willing to be your sponsor, and you can get 12-18 months on a J1
  • At least $50,000 to invest, plus proof that you will hire people etc., and you can go for an L1
As a side note, you can literally buy a green card: it is called EB5, and you get it if you invest more than $500,000, for example in the state of Vermont.

AppEngine ups and downs

So I recently started using AppEngine; it should be noted that I'm VERY new to it. I come from a Django background, so this is mostly a comparison between the two.

Let's start with the classic argument: price! A lot of people compare EC2 and GAE (Google AppEngine) prices, but they are not really comparable. EC2 provides you with easily spawnable machines, as many as you need, leaving the scalability to you. That means your programmers have to worry about sharding the database, your sysadmins about adding more web nodes... On the GAE side you just write code, effectively outsourcing both the hardware and the sysadmin work to Google. The advantage EC2 has is that you can use any technology under the sun, whereas with GAE you are forced to use their platform. Does GAE pay off in pure money terms? No idea :), but if you get big, you can iterate faster because you don't have to worry about scaling (which can be a major competitive advantage). To be honest, most sites out there don't need anything more than what can be easily scaled.

Schema-less development is definitely one of the technical things I like about GAE. South is cool, but not having to care about the schema at all is a major pain relief.
$ appcfg.py -e redduck666@gmail.com update .
Scanning files on local disk.
Scanned 500 files.
Scanned 1000 files.
Initiating update.
Password for redduck666@gmail.com:
Cloning 154 static files.
Cloned 100 files.
Cloning 224 application files.
Cloned 100 files.
Cloned 200 files.
Uploading 2 files.
Deploying new version.
Checking if new version is ready to serve.
Will check again in 1 seconds.
Checking if new version is ready to serve.
Will check again in 2 seconds.
Checking if new version is ready to serve.
Closing update: new version is ready to start serving.
Uploading index definitions.
Uploading cron entries.

The next thing I like very much is deployment: you type a command and poof, it works. It has its downsides as well, though: unlike, say, svn, it can't remember your credentials. Another problem is that it only deploys as the app specified in the config file, which makes it harder to do a test deploy first (this is just a minor annoyance). As you can see above, when it's good, it's really, really good; likewise, when it's bad, it's really, really bad.
In [1]: from myapp.models import User
In [2]: User.all().count()
....
BadArgumentError: _app must not be empty.
Hello? I just want to do a simple ORM query! As it turns out, I'm not the only one with this problem. I copy/pasted some stuff I found on the internets and execute it at shell startup ('redduck666' is the name of my app):
# register stub implementations of the App Engine APIs so that model code
# can be used from a plain Python shell, outside dev_appserver.py
from google.appengine.api import apiproxy_stub_map, urlfetch_stub
from google.appengine.api import datastore_file_stub, mail_stub, user_service_stub
import os

os.environ['APPLICATION_ID'] = 'redduck666'

apiproxy_stub_map.apiproxy = apiproxy_stub_map.APIProxyStubMap()
apiproxy_stub_map.apiproxy.RegisterStub('urlfetch',
    urlfetch_stub.URLFetchServiceStub())
apiproxy_stub_map.apiproxy.RegisterStub('user',
    user_service_stub.UserServiceStub())
apiproxy_stub_map.apiproxy.RegisterStub('datastore_v3',
    datastore_file_stub.DatastoreFileStub('redduck666', '/dev/null', '/dev/null'))
apiproxy_stub_map.apiproxy.RegisterStub('mail', mail_stub.MailServiceStub())
As far as users go, as long as your idea of "user" == "user with a Google account", you are gonna be the happiest person around. GAE handles authentication against Google accounts for you and makes sure the site is usable even in the development environment. What happens if you want to add Twitter or Facebook Connect? Well, SOL (Shit Outta Luck): they haven't bothered to abstract the User model, so you are left dealing with low-level stuff (like cookies). Compare that to Django, where you authenticate the user once and it does everything else for you (albeit keeping each user in an SQL table). Another point against GAE is the lack of reusable stuff out there; Django has a much more vibrant community creating all kinds of reusable apps. As a matter of fact, I had to port the Django Facebook Connect stuff to work under GAE.
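To give a flavour of the low-level cookie work: once Twitter or Facebook has told you who the user is, remembering that across requests is up to you, typically with a signed cookie so the value can't be forged. A minimal stdlib sketch; the function names, the `user_id|signature` format and the secret are hypothetical, and a real setup would also want an expiry timestamp and the Secure/HttpOnly cookie flags.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-long-random-secret"  # hypothetical; keep it server-side


def _signature(user_id: str, secret: bytes) -> str:
    return hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def make_cookie_value(user_id: str, secret: bytes = SECRET_KEY) -> str:
    """Return 'user_id|signature', to be sent in a Set-Cookie header."""
    return f"{user_id}|{_signature(user_id, secret)}"


def read_cookie_value(value: str, secret: bytes = SECRET_KEY):
    """Return the user id if the signature checks out, else None."""
    user_id, sep, sig = value.rpartition("|")
    if not sep:
        return None
    # compare_digest avoids leaking how many characters matched via timing
    if hmac.compare_digest(sig, _signature(user_id, secret)):
        return user_id
    return None


cookie = make_cookie_value("facebook:12345")
print(read_cookie_value(cookie))                                    # facebook:12345
print(read_cookie_value("facebook:99999|" + cookie.split("|")[1]))  # None
```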