Archive for the ‘Hacking’ Category

Detecting command failures in bash

Sunday, January 31st, 2010

Suppose you want to detect whether all the commands in your script executed successfully:

#!/usr/bin/env bash
 
 
# make sure the script exits as soon
# as the first error is encountered
set -e
success=0
 
# call our function on (nice) exit of the script
trap on_exit 0
 
on_exit(){
    if (( success )); then
        echo 'all commands executed successfully'
    else
        echo 'at least one command failed'
    fi
}
 
# your commands go here
true
#false
 
# if execution reaches this point,
# set -e did NOT exit the script
success=1

Some notes:

  • Uncomment “false” to see the failure branch in action
  • set -e depends on the commands behaving nicely and returning a non-zero exit code on error. Just about all standard tools do this.
  • on_exit will NOT be called if your script is killed with SIGKILL, since that signal cannot be trapped
  • The dummy echo commands are useless :) , you should replace them with something like mailing/logging
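To watch both branches without editing the script by hand, here is a rough Python sketch that feeds the same script to bash twice, once with “true” and once with “false” in the “your commands go here” slot. The names SCRIPT_TEMPLATE and run_with are made up for this illustration, and it assumes bash is on your PATH.

```python
import subprocess

# The script from the post, with the command under test left as a placeholder.
# SCRIPT_TEMPLATE and run_with are hypothetical names used only in this sketch.
SCRIPT_TEMPLATE = """
set -e
success=0
trap on_exit 0

on_exit(){
    if (( success )); then
        echo 'all commands executed successfully'
    else
        echo 'at least one command failed'
    fi
}

%s
success=1
"""


def run_with(command):
    """Run the script with `command` in the "your commands go here" slot.

    Returns a tuple: (exit status of bash, message printed by on_exit).
    """
    result = subprocess.run(
        ["bash", "-c", SCRIPT_TEMPLATE % command],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    for command in ("true", "false"):
        print(command, "->", run_with(command))
```

Note that the trap only prints a message; the exit status of the script is still that of the command that failed.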

Debugging python (multi)processing

Thursday, January 7th, 2010

My goal is to get a pdb shell in the worker processes I spawned with Process() from python-processing. The “classic” approach to spawning the pdb shell fails miserably:

(Pdb) > /home/redduck666/dev/abj/bin/feeds.py(639)__init__()
-> self.timeout = timeout
Process Process-3:2:
Traceback (most recent call last):
  File "/var/lib/python-support/python2.5/processing/process.py", line 227, in _bootstrap
    self.run()
  File "/var/lib/python-support/python2.5/processing/process.py", line 85, in run
    self._target(*self._args, **self._kwargs)
  File "./feeds.py", line 639, in __init__
    self.timeout = timeout
  File "./feeds.py", line 639, in __init__
    self.timeout = timeout
  File "/usr/lib/python2.5/bdb.py", line 48, in trace_dispatch
    return self.dispatch_line(frame)
  File "/usr/lib/python2.5/bdb.py", line 66, in dispatch_line
    self.user_line(frame)
  File "/usr/lib/python2.5/pdb.py", line 144, in user_line
    self.interaction(frame, None)
  File "/usr/lib/python2.5/pdb.py", line 187, in interaction
    self.cmdloop()
  File "/usr/lib/python2.5/cmd.py", line 130, in cmdloop
    line = raw_input(self.prompt)
ValueError: I/O operation on closed file

The problem here is that processing closes the file descriptors for the processes it spawns, so a straightforward approach like that will not work. For the same reason, using sys.__std(out|in|err)__ will not work either.

The solution for me was to explicitly tell Python to use my current stdin/stdout. The ‘r+’ flag is needed as pdb needs to read from stdin.

pdb.Pdb(stdin=open('/dev/stdin', 'r+'), stdout=open('/dev/stdout', 'r+')).set_trace()

I use this on Linux; AFAIK it should work across the Unix world (and is probably horribly broken on Windows).
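The mechanism itself (pdb talks to whatever stdin/stdout objects you hand it) can be seen without a terminal or a worker process. The sketch below is written for Python 3 and is only an illustration: it swaps the /dev/stdin and /dev/stdout files for in-memory StringIO objects, and the function name compute and the scripted commands are made up for the example.

```python
import io
import pdb


def compute(debug_in, debug_out):
    """Stop in pdb, talking to the given streams instead of sys.stdin/sys.stdout."""
    answer = 6 * 7
    # nosigint/readrc keep the session self-contained:
    # no SIGINT handler is installed and no .pdbrc file is read.
    pdb.Pdb(stdin=debug_in, stdout=debug_out, nosigint=True, readrc=False).set_trace()
    return answer


# Scripted debugger session: print a variable, then continue.
commands = io.StringIO("p answer\nc\n")
transcript = io.StringIO()

result = compute(commands, transcript)

# Everything pdb wrote (location, prompts, the printed value) ended up here.
print(transcript.getvalue())
```

In a real worker process you would pass the opened /dev/stdin and /dev/stdout files, as in the one-liner above, instead of the StringIO stand-ins.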

My JFK adventure

Thursday, September 10th, 2009

So I set off on my trip from little Slovenia back to San Francisco. The itinerary had two stops: from Ljubljana to Prague, then from Prague to New York, with San Francisco as the final destination.

Things started to get complicated when we were told that we couldn’t land at JFK because President Obama was leaving the airport. We started circling near New York; however, after two circles we were running out of gas and had to refuel, so we took a ~20 minute trip to Bradley airport.

We stayed there for two hours, either waiting to be fueled or listening to apologies about how the fueling should have taken less time. After that we set course for JFK; by the time we reached the airport my connection to San Francisco was supposedly (according to an airport employee I asked) already in the air :/. My problems weren’t solved by simply getting to the airport: I still had to get past customs and immigration. They had two problems with me: one, I didn’t know when I would be leaving the country, and two, I was coming back after only about 9 days abroad :) . The first officer just smirked and took me to the suspicious office, or whatever it is called. It is worth noting that most people in there were either Latinos, Muslims or Indians.

I waited there a considerable amount of time (probably something like 2 hours), and when my turn finally came I was done in about 5 minutes. I explained to the guy that I am an IT freelancer and that I like the weather and the IT events in California, and poof, he made his decision; the rest of the 5 minutes was spent either typing in bureaucracy or advising me not to come again on the visa waiver before, say, next summer.

Ok, so, I’m legally in the US, but I missed my flight? It is the first flight I have ever missed :) . I spent quite some time searching for Czech Airlines to fix the mess they made, but I couldn’t find them. I gave up and looked for United, the airline that was supposed to take me across the US to San Francisco. They were very quick and professional and issued a new ticket for a flight tomorrow morning at 6am.

Where does this leave me? I have food, coffee and internet, but no shower :/. I stink and I have nowhere to go until 6am (the next ~8 hours). I guess you can’t have it all?

AppEngine debugging tip

Saturday, August 22nd, 2009

As I explained in my last blog post, GAE does some very weird stuff with stdout, making it very difficult to print information from arbitrary points in the code. For example, even when I invoked Pdb(stdout=sys.__stdout__) I couldn’t print to my screen from a POST request. I have finally found a fix for that annoyance: you hack sys.stdout:

def set_trace():
    import pdb, sys
    sys.stdout = sys.__stdout__
    sys.stdin = sys.__stdin__
    debugger = pdb.Pdb()
    debugger.set_trace(sys._getframe().f_back)
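One thing to keep in mind is that the helper above leaves sys.stdout and sys.stdin swapped for good. If you only want the real streams for a few lines (say, for a quick print), a context manager that puts the framework's streams back afterwards is one way to do it. This is a generic Python sketch, not something GAE provides, and the name real_stdio is made up.

```python
import contextlib
import sys


@contextlib.contextmanager
def real_stdio():
    """Temporarily point sys.stdin/sys.stdout at the interpreter's original streams."""
    saved_stdin, saved_stdout = sys.stdin, sys.stdout
    sys.stdin, sys.stdout = sys.__stdin__, sys.__stdout__
    try:
        yield
    finally:
        # Put back whatever the framework had installed.
        sys.stdin, sys.stdout = saved_stdin, saved_stdout


if __name__ == "__main__":
    with real_stdio():
        print("this goes to the real stdout")
```

Whether this is enough inside a POST handler depends on what else GAE does to the streams, so treat it as a starting point.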

GAE — too much magic!

Monday, August 17th, 2009

This is a rant post about GAE (Google AppEngine): it is too magical.

My first problem with it is that pdb simply doesn’t work. Why not? Because GAE hijacks your stdout and prints it on the web page. This by itself should ring bells. Seriously, a CGI wrapper? *checks calendar* What do you mean it’s 2009? Are you sure you didn’t mean 1999? But apparently I’m not the only one to have that problem, and people have come up with a solution: explicitly pass the original stdin/stdout file objects to pdb. It usually boils down to something like:

def set_trace():
    import pdb, sys
    debugger = pdb.Pdb(stdin=sys.__stdin__,
        stdout=sys.__stdout__)
    debugger.set_trace(sys._getframe().f_back)

Great, now I have a debugger that works. Until it doesn’t. For example, trying to debug any POST request has proved fruitless for me:

> /home/redduck666/dev/abj/st/trunk/views/auth.py(91)post()
-> login = self.request.get('login')
(Pdb) print self.request
(Pdb) print 'wtf??'

This is using the set_trace defined above. As you can see, it doesn’t print anything to stdout; instead the output shows up on the web page once ‘c’ is hit in pdb. Why? Who knows?

Then there is the datastore initialization problem I’ve already been rambling about. Out of the box, you can’t write to the GAE datastore from a Python script. You’d think an engineer-driven web company would get this right, for example by doing the initialization in the model classes rather than in some magical part of dev_appserver.py. But fear not, it gets better! For example, my frontend developer complains that sometimes she can log in with her username and sometimes she cannot! The best part is that I have a script, executed before dev_appserver.py, that creates some dummy data, and it wasn’t working. Then I cleaned my datastore dir and, surprise surprise, the exact same script executed in the exact same way works now. If that ain’t magic, I don’t know what is.

What happens if something goes wrong with the appcfg.py update? Who knows?

Checking if new version is ready to serve.
Closing update: new version is ready to start serving.
2009-08-16 05:23:29,819 ERROR appcfg.py:1272 An unexpected error occurred. Aborting. 
Traceback (most recent call last):
  File "/root/abj/GAE/1.2.3/google/appengine/tools/appcfg.py", line 1265, in DoUpload
    self.Commit()
  File "/root/abj/GAE/1.2.3/google/appengine/tools/appcfg.py", line 1141, in Commit
    self.StartServing()
  File "/root/abj/GAE/1.2.3/google/appengine/tools/appcfg.py", line 1194, in StartServing
    app_id=self.app_id, version=self.version)
  File "/root/abj/GAE/1.2.3/google/appengine/tools/appengine_rpc.py", line 344, in Send
    f = self.opener.open(req)
  File "/usr/lib/python2.5/urllib2.py", line 387, in open
    response = meth(req, response)
  File "/usr/lib/python2.5/urllib2.py", line 498, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.5/urllib2.py", line 425, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.5/urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.5/urllib2.py", line 506, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 500: Internal Server Error
Rolling back the update.
Error 500: --- begin server output ---
 
Server Error (500)
A server error has occurred.
--- end server output ---

AppEngine ups and downs

Wednesday, August 12th, 2009

So I recently started using AppEngine; it should be noted that I’m VERY new to it. I come from a Django background, so this is mostly a comparison between the two.

Let’s start with the classic argument: price! A lot of people compare EC2 and GAE (Google AppEngine) prices, but they are not really comparable. EC2 provides you with easily spawnable machines, as many as you need, leaving the scalability to you. It means your programmers have to worry about sharding the database, your sys admins about adding more web nodes… On the GAE side you just write code, effectively outsourcing both the hardware and the sys admin work to Google. The advantage EC2 has is that you can use any technology under the sun, whereas with GAE you are forced to use their platform. Does using GAE pay off in pure money terms? No idea :) , but if you get big you can iterate faster because you don’t have to worry about scaling (which can be a major competitive advantage). To be honest, most of the sites out there don’t need anything more than what can be easily scaled.

Schema-less development is definitely one of the technical things I like about GAE. South is cool, but not having to care about the schema at all is just a major pain relief.

$ appcfg.py -e redduck666@gmail.com update .
Scanning files on local disk.
Scanned 500 files.
Scanned 1000 files.
Initiating update.
Password for redduck666@gmail.com: 
Cloning 154 static files.
Cloned 100 files.
Cloning 224 application files.
Cloned 100 files.
Cloned 200 files.
Uploading 2 files.
Deploying new version.
Checking if new version is ready to serve.
Will check again in 1 seconds.
Checking if new version is ready to serve.
Will check again in 2 seconds.
Checking if new version is ready to serve.
Closing update: new version is ready to start serving.
Uploading index definitions.
Uploading cron entries.

The next thing I like very much is the deployment: you type a command and poof, it works. It has its downsides as well though: unlike, say, svn, it is not capable of remembering the credentials. The next problem is that it deploys only as the app specified in the config file, making it more difficult to do a test deploy first (this is just a minor annoyance).


As you can see above, when it’s good, it’s really, really good; likewise, when it is bad, it is really, really bad.

In [1]: from myapp.models import User
In [2]: User.all().count()
....
BadArgumentError: _app must not be empty.

Hello? I just want to do a simple ORM query? As it turns out, I’m not the only one to have this problem. I copy/pasted stuff I found on the internets and execute it at shell startup (‘redduck666’ is the name of my app):

from google.appengine.api import apiproxy_stub_map, urlfetch_stub
from google.appengine.api import datastore_file_stub, mail_stub, user_service_stub
import os
os.environ['APPLICATION_ID'] = 'redduck666'
 
apiproxy_stub_map.apiproxy = apiproxy_stub_map.APIProxyStubMap()
apiproxy_stub_map.apiproxy.RegisterStub('urlfetch',
    urlfetch_stub.URLFetchServiceStub()) 
apiproxy_stub_map.apiproxy.RegisterStub('user',
    user_service_stub.UserServiceStub())
apiproxy_stub_map.apiproxy.RegisterStub('datastore_v3',
    datastore_file_stub.DatastoreFileStub('redduck666', '/dev/null', '/dev/null'))
apiproxy_stub_map.apiproxy.RegisterStub('mail', mail_stub.MailServiceStub())

As far as users go, as long as your idea of “user” == “user with a Google account” you are gonna be the happiest person around. GAE handles authentication against Google accounts for you and makes sure the site is usable even in the development environment. What happens if you want to add Twitter Connect or Facebook Connect? Well, SOL (Shit Outta Luck): they haven’t bothered to abstract the User model, so you are left dealing with low-level stuff (like cookies). Compare that to Django, where you only have to authorize the user once and it does everything else for you (albeit keeping the user in an SQL table).

Another argument against GAE is the lack of reusable stuff out there; Django has a much more vibrant community creating all kinds of reusable stuff. As a matter of fact, I had to port the Django Facebook Connect stuff to work under GAE.

Git joys

Sunday, August 9th, 2009

I wrote about things that annoy me in git; now it’s time to provide a counterbalance by writing about things I enjoy in git. The same disclaimer as last time applies: this is not an objective review, just a comparison of things I like in git over svn.

The first thing I started appreciating is not having .svn in each and every directory. Imagine you have only one file/dir in a dir: (assuming bash default tab completion) you hit tab twice and nothing is auto-completed, because with .svn there are two things it can choose from. With git it just works. Another advantage is that when grep-ing for stuff you don’t have to deal with results from .svn dirs. As it turns out there is a simple way around it.

The next thing I like is that diff/commit operations are repository-wide. With svn I have quite often missed some changes because I wasn’t in the repository root. The same reasoning applies to being able to commit from anywhere in the repository. For example:

redduck666@vm04:~/dev/rd666$ echo >> doc/README.deps
redduck666@vm04:~/dev/rd666$ svn diff
Index: doc/README.deps
===================================================================
--- doc/README.deps     (revision 1263)
+++ doc/README.deps     (working copy)
@@ -9,3 +9,4 @@
  - README.deps.python2.5
  - README.deps.python2.6
  - README.deps.python3.0
+
redduck666@vm04:~/dev/rd666$ cd media
redduck666@vm04:~/dev/rd666/media$ svn diff

As opposed to

redduck666@b00:~/dev/prevoz$ echo >> parser/googletransit/urls.py 
redduck666@b00:~/dev/prevoz$ git diff
diff --git a/parser/googletransit/urls.py b/parser/googletransit/urls.py
index 3932a95..26f131e 100644
--- a/parser/googletransit/urls.py
+++ b/parser/googletransit/urls.py
@@ -16,3 +16,4 @@ if settings.DEBUG:
     urlpatterns += patterns("",
         (r'^smedia/(?P<path>.*)$', 'django.views.static.serve', {'document_root': settings.MEDIA_ROOT}),
         )
+ 
redduck666@b00:~/dev/prevoz$ cd trunk/
redduck666@b00:~/dev/prevoz/trunk$ git diff
diff --git a/parser/googletransit/urls.py b/parser/googletransit/urls.py
index 3932a95..26f131e 100644
--- a/parser/googletransit/urls.py
+++ b/parser/googletransit/urls.py
@@ -16,3 +16,4 @@ if settings.DEBUG:
     urlpatterns += patterns("",
         (r'^smedia/(?P<path>.*)$', 'django.views.static.serve', {'document_root': settings.MEDIA_ROOT}),
         )
+

One thing I’ve grown to love in git is that it warns me about stuff which is present but not tracked. It has happened to me many times that I had to make another commit because I forgot to svn add stuff. For example:

redduck666@b00:~/dev/prevoz$ touch a
redduck666@b00:~/dev/prevoz$ git commit -a
# On branch routing
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       a

Ever wanted to commit from a train? Or revert to an older revision? Since git stores the entire revision history locally, you can do that.

Now let’s have a look at working with branches: how long does it take to diff two branches with svn? With git the operation is near instantaneous since there is no network involved (assuming you are diffing a local branch). And here is a controversial claim to conclude with: svn can’t do branches :) . The only thing it can do is dumb property diff tracking. Suppose some developers work on a major new feature, which is given its own branch. When it’s ready, the trunk maintainer merges the branch into trunk. Great, now all the changes are attributed to him/her. Yay!
