Posts Tagged ‘Shell’

Parallelization in bash

Friday, July 31st, 2009

Ever had a multi-core system and some workload glued together in the Unix shell that could use parallelization? I did, so I wrote a poor man’s parallelization technique in bash.

parallel(){
    # number of processes you want
    local num=$1 i
    shift
    # loop through commands
    for i; do
        # make sure at most $num processes are running
        until (( $(jobs -r | wc -l) < num )); do
            sleep 1
        done
        # execute the current command in background
        # (unquoted on purpose: the string is split into command and arguments)
        $i &
    done
    # wait for all the remaining processes
    wait
}

The following snippet demonstrates how to run 3 commands, up to 2 in parallel.

$ time parallel 2 "sleep 1" "sleep 2" "sleep 3" 2> /dev/null 
 
real	0m4.080s
user	0m0.024s
sys	0m0.036s

What happens is the following:
- executes sleep 1
- executes sleep 2
- waits 1 second (for the first one to finish)
- executes sleep 3
- once it runs out of commands just waits for everything to finish

The parts we actually wait on are the 1 second for sleep 1 and the 3 seconds for sleep 3, which add up to the 4 seconds we see.
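The one-second polling can be avoided on newer shells. The following is only a sketch of a variant, not the function above: it assumes bash 4.3 or newer (for `wait -n`), and the name parallel_wait_n is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical variant of parallel(): same interface, but it blocks in
# `wait -n` (bash 4.3 or newer) instead of polling with sleep.
parallel_wait_n() {
    local max=$1 running=0 cmd
    shift
    for cmd; do
        # at the limit: block until any one background job exits
        if (( running >= max )); then
            wait -n
            running=$((running - 1))
        fi
        # unquoted on purpose: the string is split into command and arguments
        $cmd &
        running=$((running + 1))
    done
    # collect whatever is still running
    wait
}

start=$SECONDS
parallel_wait_n 2 "sleep 1" "sleep 2" "sleep 3"
elapsed=$(( SECONDS - start ))
echo "elapsed: ${elapsed}s"
```

The elapsed time should land at roughly the same 4 seconds as before, since the schedule is identical; only the waiting mechanism differs.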

bash builtins

Wednesday, July 29th, 2009

This post is fully dedicated to stuff you can do from within bash, as opposed to invoking external programs to do the job.

$ help :
:: :
    No effect; the command does nothing.
    A zero exit code is returned.

There are two interesting bits here. The first one is the shell builtin “help”. Bash apparently doesn’t believe in individual man pages and documents its stuff in either the huge man bash, “help” pages or not at all (hey, completion!). The second interesting part is the :, which as you can see does nothing. Why does it exist? Well, to start with, it was defined in a standard. To be honest it can be useful, for example to create cool looking infinite loops:

while :; do ....; done
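Another everyday use of : is as a place to hang an expansion that has a side effect. A minimal sketch, where the variable names greeting and count are made up for illustration:

```shell
#!/usr/bin/env bash
unset -v greeting
# ":" expands its arguments and then does nothing, so the only effect
# here is that greeting gets a default value if it had none
: "${greeting:=hello}"
echo "$greeting"

# a bounded take on the "while :" loop
count=0
while :; do
    count=$((count + 1))
    (( count >= 3 )) && break
done
echo "$count"
```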

Suppose you would like a quick and generic way to get a paginated display of every occurrence of some string under your current directory. A naive way would be:

$ alias cmd='grep -r "$1" . | less'
$ cmd string
string: No such file or directory

The problem here is that an alias is nothing more than a dumb text expansion: the only thing it does is expand cmd to grep -r “$1” . | less. So cmd string is exactly the same as:

$ grep -r "$1" . | less string
string: No such file or directory

The reason we get the error is that less treats any argument as a file to open, and since I don’t have a “string” file it fails. So what is the solution if you want to insert an argument in the middle of a pipeline? An ugly way would be to create a script, a cool way would be to create a function. Functions in bash can take arguments and are full-blown compound commands. The final solution would be something à la:

cmd(){ grep -r "$1" . | less; }
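To see the function approach working without an interactive pager, here is a small sketch that pipes into wc -l instead of less. The name count_matches and the temporary files are assumptions made up for the demo:

```shell
#!/usr/bin/env bash
# hypothetical helper: the argument lands in the middle of the pipeline
count_matches() {
    grep -r -- "$1" "${2:-.}" | wc -l
}

demo_dir=$(mktemp -d)
printf 'needle\nhay\n' > "$demo_dir/a.txt"
printf 'hay\nneedle\nneedle\n' > "$demo_dir/b.txt"
matches=$(count_matches needle "$demo_dir")
echo "$matches"
rm -rf "$demo_dir"
```

The second parameter defaults to the current directory, which mirrors the hard-coded "." of the original cmd function.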

Ever wanted to do regex straight out of bash? Since bash 3 you can, with the =~ operator. The result is stored in the BASH_REMATCH array: the first element is the part of the string the whole regex matched, the others hold the parenthesized groups.

$ [[ caab =~ [^a]*(a+).* ]] && printf "\
entire string matched: ${BASH_REMATCH[0]}
first back reference: ${BASH_REMATCH[1]}\n"
entire string matched: caab
first back reference: aa
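A slightly more practical sketch of the same mechanism, pulling the date out of a log file name. The file name and the variable names are made up for illustration; keeping the pattern in a variable is a common way to avoid quoting surprises on the right-hand side of =~.

```shell
#!/usr/bin/env bash
stamp='log.2009.07.31'
# the pattern lives in a variable and is expanded unquoted after =~
re='^log\.([0-9]{4})\.([0-9]{2})\.([0-9]{2})$'
if [[ $stamp =~ $re ]]; then
    year=${BASH_REMATCH[1]}
    month=${BASH_REMATCH[2]}
    day=${BASH_REMATCH[3]}
    echo "$year-$month-$day"
fi
```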

sed and grep tips

Wednesday, July 15th, 2009

It had been quite a while since I did any sed. I rediscovered that joy when someone on the #sed IRC channel asked how to turn “/abc/def/filename” into “<a href="file://A:/abc/def/filename">A:\abc\def\filename</a>”.

echo '/abc/def/filename' | sed \
 -e 's@.*@<a href="file://A:&">A:&</a>@' -e :b\
 -e 's@\(>[^/]*[^<]\)/@\1\\@' -e tb

The first command is really basic, it turns ‘/abc/def/filename’ into ‘<a href="file://A:/abc/def/filename">A:/abc/def/filename</a>’. The problem here is that the second part remains with slashes instead of backslashes. Conditional branching to the rescue! ‘:b’ means “define a label named ‘b’ here”. The substitution means “replace ‘>’ followed by any sequence of non-‘/’ characters, one non-‘<’ character and a ‘/’ with the captured part (everything but the last ‘/’) and a ‘\’” (the double backslash is needed to escape it for sed). Basically we are replacing one ‘/’ with ‘\’ per pass, and the [^<] keeps the ‘/’ of ‘</a>’ safe. The “tb” part means “jump to label ‘b’ if the substitution has been performed successfully”. :)

But this is terribly inefficient! It performs one initial substitution plus one for each slash and it performs non trivial matching :| .

echo '/abc/def/filename' | sed -e h\
 -e 's@/@\\@g' -e G -e \
's@\([^\n]*\)\n\(.*\)@<a href="file://A:\2">A:\1</a>@'

This solution kicks off by copying whatever is in the pattern space (the line currently being processed) to the hold space. The hold space is as close as you’ll get to variables in sed. After that we prepare the backslashed version (replacing all the slashes with backslashes), and then ‘G’ appends a newline plus the hold space (the slashed version) to the pattern space. The last substitution splits the pattern space at that newline (remember, the first part is the backslashed version, the second the slashed one) and simply drops both into the template.
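To make the hold space dance easier to follow, this sketch stops right after ‘G’ and prints the pattern space as it looks at that point. It only uses the h, s and G commands from the solution above; the variable name out is made up.

```shell
#!/usr/bin/env bash
# h: copy pattern space to hold space
# s: turn every slash into a backslash
# G: append a newline and the hold space to the pattern space
out=$(printf '%s\n' '/abc/def/filename' | sed -e h -e 's@/@\\@g' -e G)
printf '%s\n' "$out"
```

The output is two lines: the backslashed version first, then the untouched original, which is exactly what the final substitution picks apart.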

While we are at sed let’s look at my submission at the hello world challenge:

set 68 65 6C 6C 6F 20 77 6F 72 6C 64;
while [[ $1 ]]; do
dd if=/dev/urandom bs=1c count=1 2> /dev/null | \
sed -n "/\\x$1/{ p; Q1; }" || shift; done; echo

In a nutshell this looks for the “hello world” characters in your /dev/urandom. The first thing you’ll probably notice are the numbers: they are the hex codes of the ASCII values of “hello world”. The code kicks off with the set line, which sets the positional parameters to those numbers. With dd we get a single byte from /dev/urandom and discard the stats dd gives us. If the byte is the character we are after, we print it and exit with ‘1’. Since a non-zero status counts as failure, the shift after || runs, discarding the current first positional parameter (so the next iteration looks for the next character). And it keeps looping for as long as there is a positional parameter left :) . Note that this code is unportable: to the best of my knowledge only GNU sed has ‘Q’.

As a side note, unless you are working with/for an ancient Bourne shell you should use [[ instead of [.
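One reason for that advice, sketched with a made-up file name: inside [[ ]] the left-hand side is not word-split, the right-hand side of == is a glob pattern, and && can be used directly.

```shell
#!/usr/bin/env bash
name='my file.txt'
# no quotes needed around $name here, and *.txt is matched as a pattern
# against the string rather than expanded against the file system
if [[ -n $name && $name == *.txt ]]; then
    kind=text
else
    kind=other
fi
echo "$kind"
```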

dpkg -L subversion | grep -E "(${PATH//:/.|}.)"

Ever wanted to find out which binaries a certain Debian package provides? The above satisfies that curiosity. We get the input from dpkg -L, which lists the contents of a package. The most interesting bit happens inside the (): it uses the shell's parameter expansion to replace each : (the standard PATH delimiter) with '.|'. The '.' is only there to make sure something follows the path, so that the directories themselves don't get matched.
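To see what the expansion actually produces, here is a sketch with a fixed stand-in for PATH and a few made-up lines in place of the dpkg -L output:

```shell
#!/usr/bin/env bash
# sample_path stands in for the real $PATH so the result is predictable
sample_path='/usr/bin:/bin:/usr/sbin'
pattern="(${sample_path//:/.|}.)"
echo "$pattern"

# three made-up lines standing in for the output of dpkg -L
matched=$(printf '%s\n' /usr/bin /usr/bin/svn /usr/share/doc/subversion \
    | grep -E "$pattern")
echo "$matched"
```

The bare /usr/bin line is dropped because nothing follows the directory name, which is the job of the trailing '.'.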

Ever had a huge config file with a shitload of examples, when you only wanted to find out what is currently being used?

alias nco="grep -Ev '^[[:space:]]*(#|$)'"

This little alias excludes all the empty lines and the comments (assuming comments start with '#'). Use it like:

nco filename
something | nco
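A quick way to check what the pattern lets through. Aliases are not expanded in scripts by default, so this sketch wraps the same grep in a function; the sample config lines are made up.

```shell
#!/usr/bin/env bash
# function version of the alias, so it also works inside a script
nco() { grep -Ev '^[[:space:]]*(#|$)' "$@"; }

active=$(printf '%s\n' \
    '# a comment' \
    '' \
    'port = 80' \
    '   # indented comment' \
    'host = example.org' | nco)
printf '%s\n' "$active"
```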

More bash :-)

Wednesday, July 8th, 2009

grep -Ir --exclude='.svn' dbsettings !(dbsettings)

This is the line I used a couple of days ago when searching through pydra’s code. -I and -r are what I use on a daily basis: the first one tells grep to ignore binary files, the second one to search directories recursively. What I don’t use often is the --exclude parameter, which here is meant to make grep skip everything under .svn. Since svn stores its metadata in the .svn dirs that is very useful. (Current GNU grep matches --exclude against file names and offers --exclude-dir for skipping whole directories.) “dbsettings” is simply the string I want to search for.

Now to the interesting part: “!(dbsettings)”. This is called extended globbing and allows you to use regex-like patterns inside the shell. The available operators are ‘?’, ‘*’, ‘+’, ‘@’ and ‘!’, each followed by a parenthesized list of patterns (‘@’ matches exactly one of the listed patterns, ‘!’ anything except them). Suppose we want to get the files start_master.sh and start_node.sh:

shopt -s extglob; ls start_@(master|node).sh
bash: syntax error near unexpected token `('

First of all a word about shopt -s extglob: extended globbing is off by default and with this you turn it on. That apart, what the hell just happened? You see, bash reads and parses a whole input line before executing any of it, so the @(...) pattern is parsed while extglob is still off, before bash ever gets to executing the shopt thingy. Put shopt on a line of its own and it works:

$ shopt -s extglob
$ ls start_@(master|node).sh
start_master.sh  start_node.sh

This per se is not really useful. The cool thing about extglob patterns (like any glob) is that they are space-proof: looping through the results is OK (as opposed to `ls | grep`). Here you can read up on simple as well as extended globbing.
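The space-proof claim can be sketched like this. The file names (with a space on purpose) and the variable names are made up, and shopt sits on its own line so the pattern is parsed with extglob already on:

```shell
#!/usr/bin/env bash
shopt -s extglob nullglob
work=$(mktemp -d)
touch "$work/start master.sh" "$work/start node.sh" "$work/start other.sh"

found=()
# each match arrives as one word, spaces and all
for f in "$work"/start\ @(master|node).sh; do
    found+=("${f##*/}")
done
printf '%s\n' "${found[@]}"
rm -rf "$work"
```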

You should use “#!/usr/bin/env bash” as the shebang line in your script, as it increases portability (env lives in /usr/bin on practically every system, although POSIX doesn’t actually mandate the path, while the BSDs tend to have bash in /usr/local/bin). What really happens here is that the kernel invokes that binary, passing it the file you are executing as an argument. As a side note, most kernels don’t perform argument splitting (one exception I can think of is FreeBSD). Because of that “#!/usr/bin/env bash -u” will not work on most kernels: the kernel tells env to execute “bash -u” as a single name, which it can’t find, so it errors out :-) .
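If you want -u anyway, one way around the shebang limitation is to switch the option on with set as the first command of the script. A minimal sketch; the variable names are made up and the exact exit status is left unstated on purpose:

```shell
#!/usr/bin/env bash
# options go here instead of on the shebang line
set -u

status=0
# referencing an unset variable now aborts the (sub)shell that does it
( echo "$surely_unset_variable" ) 2>/dev/null || status=$?
echo "subshell exit status: $status"

# switched off again so the demo has no lasting effect
set +u
```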

Here is a tip for script writers and vim users:

function Modchange()
    if getline(1) =~ "^#!.*/bin"
        silent !chmod a+x <afile>
    endif
endfunction
au BufWritePost * call Modchange()

Having that in your .vimrc will automatically make the file executable on save if it looks like a script. NOTE: I probably stole this from the internets many moons ago.

thinkering with bash

Thursday, July 2nd, 2009

Here is another post from the land of bash/shell scripting. This time, instead of focusing on the usability of the tips, the focus is on thinkering.

sudo echo 1500 > /proc/sys/vm/dirty_writeback_centisecs
bash: /proc/sys/vm/dirty_writeback_centisecs: Permission denied

I bet you wanted to do something like that? A naive user would think this will write 1500 into that file. Well, that’s not the case :-) . What really happens here is that bash first redirects stdout to that file (meaning that the redirection is performed as the current user) and only then executes the command (which is executed as root). An ugly workaround for this kind of problem is to create a script and then execute that with sudo; a quicker workaround is to pass the shell a command string as an argument:

 sudo bash -c 'echo 1500 > /proc/sys/vm/dirty_writeback_centisecs'
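That the redirection really is set up before the command gets a chance to run can be seen without root. In this sketch the command name no_such_command_for_demo and the temporary paths are made up; the command does not exist, yet the target file gets created anyway.

```shell
#!/usr/bin/env bash
work=$(mktemp -d)
target=$work/out

# the redirection is performed first, only then does bash find out
# that there is nothing to execute
no_such_command_for_demo > "$target" 2>/dev/null

if [[ -e $target ]]; then created=yes; else created=no; fi
echo "target created: $created"
rm -rf "$work"

# With sudo, a commonly used alternative is to let tee do the writing
# as root (left as a comment because it needs root):
#   echo 1500 | sudo tee /proc/sys/vm/dirty_writeback_centisecs > /dev/null
```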

Here are two links really worth reading (which I think belong in the category of “best practices”):

Everyone (hopefully) knows not to parse `ls`, but what are the cases when the alternative is just too painful? So far I have found only one: when you want to do something (for which the chronological order matters) with rotated logs. Web servers usually give you a bunch of files like access.log.15.gz access.log.7.gz access.log.12.gz.

for i in access.log*.gz ; do gunzip "$i"; done
for i in `ls -v access.log* | tac`; do cat "$i"; done

The first step in processing should be obvious: we simply decompress the gzipped files. As a side note, gunzip is picky :-) , it insists on a suffix it knows, such as .gz (the loop is not strictly required, though, since gunzip accepts several files at once). The second line is very unportable; it will work only with GNU coreutils (due to ls’s -v). What it does is print the web server’s logs in the order the web server wrote them, suitable for any kind of post-processing. A real-world use case would be that you have logs stored this way but haven’t run webalizer (or whatever) on them.

Let’s have a close look at find(1)’s execution of non-trivial commands over the matching files. You can of course take the easy way out and create a script, but real men use bash -c!

find . -exec bash -c 'for i; do echo "b00$i"; done' -- {} +

The interesting bits here are:

  • We are using ‘+’ to terminate the expression, which tells find “accumulate as many files as possible (on linux: getconf ARG_MAX) and then pass them all to one execution”
  • ‘--’ is here for bash: the first argument after the -c string becomes $0, so a placeholder has to sit there, otherwise the first file name would end up in $0 and the loop would skip it
  • the -c argument should be quoted using single quotes ('); with double quotes (") things like $i would be expanded by the current shell before they even get to the invoked shell. In this example the output would be just a bunch of “b00”s
  • It appears that we are not looping through anything, but bash implicitly loops through the positional parameters if nothing is passed to the for keyword
  • {} should be quoted to work with files with spaces, right? Wrong! :-) The steps that happen here are:
    • bash sees the {}, it doesn’t mean anything to it so it passes it as is to find
    • find sees the {} and expands it to the file (list) and passes it to bash -c, it doesn’t know or care what the special characters for the shell are. So even if the file name has special characters find passes it as is to bash -c
    • the invoked bash gets the arguments and works with them
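The points above can be tried out on a throwaway directory. This is only a sketch: the directory, the file names and the word placeholder (which fills the $0 slot) are made up for the demo.

```shell
#!/usr/bin/env bash
work=$(mktemp -d)
touch "$work/plain" "$work/with space"

# {} is left unquoted; find hands each name over as a single argument
listing=$(find "$work" -type f -exec bash -c '
    for f; do
        printf "<%s>\n" "${f##*/}"
    done' placeholder {} + | sort)
printf '%s\n' "$listing"
rm -rf "$work"
```

The name with the space comes out inside one pair of angle brackets, so it reached the inner bash as one argument.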

i will not abuse bash

Monday, June 29th, 2009

A slightly modified post I wrote a long time ago (when I was trying to add a feature to the code below)



i will not abuse bash

i will not abuse bash

i will not abuse bash

i will not abuse bash

i will not abuse bash

i will not abuse bash

i will not abuse bash

i will not abuse bash

i will not abuse bash

Why not? Find out by yourself trying to add a feature to one of my scripts ;)

shopt -s extglob
shopt -s nullglob
for i in */; do 
    [[ -f $i/photos.dat ]] && { 
        IFS=$'\a' read title desc < <( tr -d '\r'  < $i/album.dat | sed ':b;$!N; $!bb; $s/\n/\\\\\\\\n/g' | ssed -nR 's/.*?s:5:"title".*?"(.*?)".*?s:11:"description".*?"(.*?)";.*/\1'$'\a'\\2/p)
        title=${title//\'/\\\\\\\'}
        albums=`sed -rn 's/s:11:"isAlbumName";s:8:"([^"]*)"/\n\t\1\n/gp' "$i"/photos.dat | sed '/\t/!d'`
        printf "try:\n    gallery = Gallery.objects.get(title='$title')\nexcept Gallery.DoesNotExist:\n    gallery = Gallery.objects.create(title='$title', title_slug=slugify('$title'), description='$desc')\n"
        for album in $albums; do
            IFS=$'\a' read subtitle subdesc < <( tr -d '\r'  < $album/album.dat | sed ':b;$!N; $!bb; $s/\n/\\\\n/g' | ssed -nR 's/.*?s:5:"title".*?"(.*?)".*?s:11:"description".*?"(.*?)";.*/\1'$'\a'\\2/p)
            subdesc=`echo -n "$subdesc" | sed ':b; $!N; $!bb; $s/\\n/\\\\\\\\n/g'`
            subtitle=${subtitle//\'/\\\\\\\'}
            printf "try:\n    child = Gallery.objects.get(title='$subtitle')\n    child.parent = gallery\n    child.save()\nexcept Gallery.DoesNotExist:\n    child = Gallery.objects.create(title='$subtitle', title_slug=slugify('$subtitle'), parent=gallery, description='%s')\n" "$subdesc"
        done
        for j in $i/!(*.thumb|*.sized|*.highlight).@(jpg|jpeg|png|gif); do
            filename=$j
            dirless=${j##*/}
            title=${dirless%.*}
            c=`tr -d '\n' < "$i"/photos.dat | ssed -nR 's/.*?"'"$title"'".*?caption";s:[0-9]+:"(.*?)".*/\1\n/p'`
            caption="${c//\'/\\\'}"
            if [[ $caption != $title && $caption ]]; then cap=", caption='$caption'"; fi
            printf "p=Photo.objects.create(image='$filename', title=title_wraper('$title', '${i%/}'), title_slug=photo_slug_wraper('$title', '$i')$cap)\np.save()\ngallery.photos.add(p)\n"
        done
    }; 
done

This was my attempt to write a migration script from gallery v1 to photologue. The attempt failed, and I learned not to abuse bash for any serious code generation. I hope.

The history of this script went something like:

  • hrm, I’m supposed to parse serialized php, I bet I can do it with sed, how hard can it be?
  • a “perfect” solution (probably a certain line in that code now) was born
  • hrm, I have a new problem, another “perfect” solution was made
  • possibly repeat the above step few more times
  • hrm, this doesn’t work, let’s add some ugly solutions/gluing, just for test
  • possibly repeat the above step few more times
  • it WORKS!!, sleep time

Don’t get me wrong, shell is a great tool, just remember to keep it in your pants cowboy ;) . It allows you to do simple tasks with a speed (and character count :) ) python can’t possibly rival, but if you tackle problems that are too complex with bash you can easily slip into the land of the unmaintainable.

As for python, I pretty much agree with Tomaz: python is boring, you miss (almost) all the fun of looking for a “perfect” one-liner. The trade-off is that the code is actually maintainable and you don’t have to look up the syntax all the time.

Fast forward to recent times and you see that I use shell just to generate a simple python array (well, probably soon to be two arrays), but that is harmless and can’t possibly turn into an unmaintainable monster, right?

bash tips’n’tricks

Saturday, June 27th, 2009

Let’s begin with a link: this is a really REALLY REALLY good resource for learning bash (especially the BashFAQ).

Imagine this: you wrote a script for something, and now that it is ready for production you put it in the crontab and let it run from there, BUT under certain conditions that script fails. Assuming the error isn’t trivial, the way I tackle the problem is to add something à la this to the top of the script:

exec 2> log.`date +%Y.%m.%d.%H:%M`
set -x

The first statement redirects stderr to a file, the second enables “xtrace”, which prints commands as they are executed. The undesired side effect is that ALL stderr is redirected to that particular file. The alternative would be to do proper logging, but if your script is long enough that you need proper logging, chances are pretty good that you shouldn’t be doing it in bash in the first place ;) .
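To get a feel for what ends up in such a log, here is a sketch that confines the two statements to a subshell so that only the traced part is affected. The log location comes from mktemp and the variable names are made up.

```shell
#!/usr/bin/env bash
logfile=$(mktemp)
(
    # same two statements as above, but scoped to this subshell
    exec 2> "$logfile"
    set -x
    answer=$((6 * 7))
    echo "answer is $answer" > /dev/null
)
trace=$(cat "$logfile")
printf '%s\n' "$trace"
rm -f "$logfile"
```

Each traced command shows up on its own line, prefixed by the value of PS4.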

Ever found yourself wanting to do some simple processing of some command and wanting to use the result?

echo some string | while read; do foo='boo'; done; echo $foo

This echoes nothing, as $foo is empty. What happens in the background is that the pipe makes the while loop run in a subshell, so the variable $foo is assigned in the subshell. When the loop terminates the subshell is gone and execution continues in the normal shell, which doesn’t know anything about $foo.

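A solution for this is called “process substitution”. To the best of my knowledge it behaves just like the pipe, only here it is the echo that ends up in the subshell while the loop runs in the current shell. As a side note, ksh runs the last command of a pipeline in the current shell, so it doesn’t have this problem.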

while read; do foo='boo'; done < <(echo some string); echo $foo
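Both variants side by side, as a sketch that assumes bash with its default options (in particular lastpipe switched off); the second variable, bar, is made up so the two results can be compared:

```shell
#!/usr/bin/env bash
unset -v foo bar

# pipe: the loop runs in a subshell, the assignment is lost
echo some string | while read -r; do foo='boo'; done

# process substitution: the loop runs in the current shell
while read -r; do bar='boo'; done < <(echo some string)

echo "foo='${foo-}' bar='${bar-}'"
```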

Let’s talk about dotfiles, specifically about those initializing the shell. In bash that would usually be .bash_profile and .bashrc, and they are used to customize your bash on every startup. The difference is that .bash_profile is executed when a login shell is started (for example when you ssh to a box or do “su -”), while .bashrc is executed when an interactive non-login shell starts (usually when you start mrxvt/xterm/konsole/…). Why the f**** do we have a distinction between the two modes? No good reason really, it’s because we have been doing it this way for 20+ years, so why break the tradition? :/

[[ $PS1 ]] && exec bash -l

Having the above in .bashrc is the way around this annoyance: it allows you to keep all your configuration in .bash_profile. The only thing it does is exec a login shell (thus triggering .bash_profile execution). The condition makes sure that the login shell is executed only when you are in interactive mode (bash by definition/man page sets PS1 only for interactive shells). A non-interactive shell is spawned, for example, when you use scp. Just make sure .bash_profile doesn’t source .bashrc in turn, or the two will keep exec’ing each other.
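If you are ever unsure which kind of shell you ended up in, bash can tell you. A small sketch; the variable names kind and mode are made up:

```shell
#!/usr/bin/env bash
# login_shell is a read-only shopt option that bash sets for login shells
if shopt -q login_shell; then kind=login; else kind=non-login; fi

# $- contains an "i" when the shell is interactive
case $- in
    *i*) mode=interactive ;;
    *)   mode=non-interactive ;;
esac
echo "this is a $kind, $mode shell"
```

Pasted into a terminal it reports on that terminal’s shell; run as a script it will normally report a non-login, non-interactive shell.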
