Posts Tagged ‘Bash’

Detecting command failures in bash

Sunday, January 31st, 2010

Suppose you want to detect whether all the commands in your script executed successfully:

#!/usr/bin/env bash
 
 
# make sure the script exits as soon
# as the first error is encountered
set -e
success=0
 
# call our function on (nice) exit of the script
trap on_exit 0
 
on_exit(){
    if (( success )); then
        echo 'all commands executed successfully'
    else
        echo 'at least one command failed'
    fi
}
 
# your commands go here
true
#false
 
# if execution gets this far,
# set -e did NOT exit the script
success=1

Some notes:

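  • Uncomment “false” to see the failure branch in action
  • set -e relies on commands behaving nicely and returning a non-zero exit code on error. Just about all standard tools do this. Also note that set -e does not fire for a failing command that is part of an if/while condition, any command in an && or || list except the last, or any command in a pipeline except the last.
  • on_exit will NOT be called if your script is killed with SIGKILL, since that signal cannot be trapped
  • The dummy echo commands are just placeholders :) , you should replace them with something like mailing/logging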
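A variation on the same idea, as a sketch rather than a replacement for the script above: inside an EXIT trap, $? still holds the status the shell is exiting with, so the trap can decide on its own whether something failed and no success flag is needed. The names checked_body and status are made up, and the script body is kept in a string only so it can be run twice in child shells here.

```shell
#!/usr/bin/env bash
# Hypothetical demo: the script body lives in a string so we can run it
# once with a passing command list and once with a failing one.
checked_body='
set -e
on_exit() {
    local status=$?   # status the shell is exiting with
    if (( status == 0 )); then
        echo "all commands executed successfully"
    else
        echo "at least one command failed (exit status $status)"
    fi
}
trap on_exit EXIT
eval "$1"   # stand-in for "your commands go here"
'

ok_report=$(bash -c "$checked_body" _ 'true')
# "|| true" only keeps a set -e caller alive; the report is still captured
fail_report=$(bash -c "$checked_body" _ 'true; false; echo "not reached"' || true)
printf '%s\n' "$ok_report" "$fail_report"
```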

Parallelization in bash

Friday, July 31st, 2009

Ever had a multi-core system and some kind of workload glued together in the Unix shell that could use parallelization? I did, so I wrote a poor man’s parallelization technique in bash.

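parallel(){
    # number of processes you want
    local num=$1 i
    shift
    # loop through the commands
    for i; do
        # make sure at most $num processes are running
        # (-r: count only running jobs, not finished ones)
        until (( $(jobs -rp | wc -l) < num )); do
            sleep 1
        done
        # execute the current command in the background; $i is left
        # unquoted on purpose so "sleep 1" splits into command + argument
        $i &
    done
    # wait for all the remaining processes
    wait
}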

The following snippet demonstrates how to run 3 commands, up to 2 in parallel.

$ time parallel 2 "sleep 1" "sleep 2" "sleep 3" 2> /dev/null 
 
real	0m4.080s
user	0m0.024s
sys	0m0.036s

What happens is the following:
- executes sleep 1
- executes sleep 2
- waits 1 second (for the first one to finish)
- executes sleep 3
- once it runs out of commands just waits for everything to finish

The waits that add up here are sleep 1 (sleep 3 can’t start until it finishes) and sleep 3 itself, giving the 4 seconds we see; sleep 2 finishes in the meantime.
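If your bash is 4.3 or newer, wait -n (which waits for any single background job to finish) lets you drop the once-a-second polling. A sketch under that assumption; parallel_n and its running counter are my own naming, not part of the function above:

```shell
#!/usr/bin/env bash
# Sketch assuming bash >= 4.3 for "wait -n". Like parallel() above, each
# argument after the first is one command string.
parallel_n() {
    local num=$1 running=0 cmd
    shift
    for cmd; do
        if (( running >= num )); then
            wait -n                   # block until any one background job exits
            running=$((running - 1))
        fi
        $cmd &                        # unquoted on purpose: "sleep 1" -> sleep + 1
        running=$((running + 1))
    done
    wait                              # wait for whatever is still running
}

parallel_n 2 "echo one" "echo two" "echo three"
```

Called as parallel_n 2 "sleep 1" "sleep 2" "sleep 3", it should also take about 4 seconds, since sleep 3 still has to wait for sleep 1, but the next command starts as soon as a slot frees up instead of on the next poll.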

bash builtins

Wednesday, July 29th, 2009

This post is fully dedicated to stuff you can do from within bash, as opposed to invoking external programs to do the job.

$ help :
:: :
    No effect; the command does nothing.
    A zero exit code is returned.

There are two interesting bits here. The first one is the shell builtin “help”: bash apparently doesn’t believe in individual man pages and documents its stuff either in the huge bash man page, in “help” pages, or not at all (hey, completion!). The second interesting part is : itself, which, as you can see, does nothing. Why does it exist? Well, to start with, it is defined in the POSIX standard. To be honest, it can be useful, for example to create cool-looking infinite loops:

while :; do ....; done
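Another place where : earns its keep is when only the side effects of its arguments matter, since the arguments are still expanded (and redirections still performed) before : ignores them. A small sketch; greeting and log_file are made-up names:

```shell
#!/usr/bin/env bash
# ":" ignores its arguments, but they are still expanded first.
unset greeting
: "${greeting:=hello}"      # assigns a default only if greeting is unset or empty
echo "greeting is $greeting"

log_file=$(mktemp)
echo "old contents" > "$log_file"
: > "$log_file"             # redirecting a do-nothing command truncates the file
echo "bytes left: $(wc -c < "$log_file")"
```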

Suppose you would like a quick and generic way to get a paginated display of all the places a string occurs in the files under your current directory. A naive way would be:

$ alias cmd='grep -r "$1" . | less'
$ cmd string
string: No such file or directory

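The problem here is that an alias is nothing more than a dumb text expansion: the only thing it does is expand cmd to grep -r "$1" . | less. The "$1" in there is not the alias’s argument but the shell’s own first positional parameter (usually empty), and the argument is simply appended at the end. So cmd string is exactly the same as: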

$ grep -r "$1" . | less string
string: No such file or directory

The reason we get the error is that less treats any argument as a file to open, and since I don’t have a file called “string”, it fails. So what is the solution if you want to insert an argument in the middle of a pipeline? An ugly way would be to create a script; a cool way would be to create a function. Functions in bash can take arguments, and their bodies are full-blown compound commands. The final solution would be something like:

cmd(){ grep -r "$1" . | less; }
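The same idea can be stretched a little. This is a sketch, not the post’s function: search is a hypothetical name, and the optional directory argument, the -- (so a pattern starting with - is not taken for a grep option) and the use of $PAGER with less as the fallback are my own additions:

```shell
#!/usr/bin/env bash
# search PATTERN [DIR]: grep recursively in DIR (default ".") and page the
# result with $PAGER, falling back to less when PAGER is unset or empty.
search() {
    grep -r -- "$1" "${2:-.}" | "${PAGER:-less}"
}
```

For example, PAGER=cat search needle some/dir prints the matches without paging.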

Ever wanted to do regexes straight out of bash? Since bash 3, with the =~ operator, you can. The result is stored in the BASH_REMATCH array: the first element is the part of the string matched by the entire regex, and the others are the captured groups (back references).

$ [[ caab =~ [^a]*(a+).* ]] && printf 'entire string matched: %s\nfirst back reference: %s\n' \
    "${BASH_REMATCH[0]}" "${BASH_REMATCH[1]}"
entire string matched: caab
first back reference: aa
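One gotcha worth knowing: since bash 3.2, any quoted part of the right-hand side of =~ is matched as a literal string, so the usual advice is to keep the regex in a variable and leave it unquoted. A sketch; range_re and the input are made up:

```shell
#!/usr/bin/env bash
range_re='^([0-9]+)-([0-9]+)$'    # made-up pattern: "start-end"
input='10-20'

if [[ $input =~ $range_re ]]; then        # unquoted: treated as a regex
    range_start=${BASH_REMATCH[1]}
    range_end=${BASH_REMATCH[2]}
    echo "start=$range_start end=$range_end"
fi

if [[ $input =~ "$range_re" ]]; then      # quoted: searched for as a literal string
    echo "literal match"
else
    echo "no literal match"
fi
```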

bash WTFs

Sunday, July 19th, 2009

Here is my collection of weird bash features: stuff that doesn’t really behave the way I expected it to. In short, stuff that made me go “WTF”.

$ [[ 8 > 12 ]] && echo true
true

Wait, what? 8 is more than 12? What happens here is that inside [[ ]] the > operator compares strings lexicographically, and since the character 8 sorts after 1, it returns true.

$ (( 8 > 12 )) && echo true

The double parentheses make bash do arithmetic evaluation, inside which > is a numeric comparison operator. Another feature of arithmetic mode is that you don’t have to use the $ to reference variables.
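To see the three flavours side by side, here is a small sketch (a and b are arbitrary names); -gt is the numeric comparison inside [[ ]], while (( )) needs no $ at all:

```shell
#!/usr/bin/env bash
a=8 b=12
# string comparison: "8" sorts after "1"
if [[ $a > $b ]];   then string_says=greater; else string_says=not-greater; fi
# numeric comparison with the test-style operator
if [[ $a -gt $b ]]; then test_says=greater;   else test_says=not-greater;   fi
# arithmetic evaluation, variables referenced without $
if (( a > b ));     then arith_says=greater;  else arith_says=not-greater;  fi
echo "[[ > ]]: $string_says, [[ -gt ]]: $test_says, (( > )): $arith_says"
```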

$ var='\'
$ var=`echo "$var" | sed 's/\\/\\\\/g'`
sed: -e expression #1, char 8: unterminated `s' command

Above is the naive way of trying to replace the backslashes in a variable with sed. What really happens here is that bash interprets the backslashes inside the backquotes once before they get to sed (\\ becomes \), so what sed gets to see is 's/\/\\/g'. As far as sed is concerned, the escaped slash is not a delimiter, and since the substitution expects three delimiters, it throws the error.

$ var='\'
$ echo `echo "$var" | sed 's/\\\\/\\\\\\\\/g'`
\\
$ echo $(echo "$var" | sed 's/\\/\\\\/g')
\\

As you can see, the possible solutions are either to escape the backslashes one more time or simply to use the newer $(command) form of command substitution. Why does it interpret the backslashes? Why does $(command) behave differently? Since ksh behaves the same way, I’d assume this is a burden of history we have to live with, and $(command) is the fix. This is a documented misfeature; the bash manual says:

When the old-style backquote form of substitution is used, backslash retains its literal meaning except when followed by ‘$’, ‘`’, or ‘\’. .. When using the $(command) form, all characters between the parentheses make up the command; none are treated specially.

While we’re at backslashes, let’s try to replace a backslash with awk.

$ echo '\' | awk '{ gsub("\\\\", "replaced"); print; }'
replaced

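Wait, what? One would expect the ‘\\\\’ pattern to match two backslashes, yet it replaced the single one. It turns out this is another documented misfeature: the string is parsed twice, and both times the backslashes are interpreted :| . At the lexical level "\\\\" becomes \\, and at runtime that regex matches one literal backslash. From the gawk manual: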

When using sub, gsub, or gensub, and trying to get literal backslashes and ampersands into the replacement text, you need to remember that there are several levels of escape processing going on.

First, there is the lexical level, which is when awk reads your program and builds an internal copy of it that can be executed. Then there is the runtime level, which is when awk actually scans the replacement string to determine what to generate.

Pipe handling also deserves a place in the WTF category. My thoughts and examples are here, while the official docs explain it like this:

Each command in a pipeline is executed in its own subshell
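The best-known consequence of that sentence is a while read loop at the end of a pipeline: it runs in a subshell, so whatever it does to variables is gone once the pipeline ends. A sketch assuming bash’s default settings (the lastpipe option, available since bash 4.2, changes this in scripts); process substitution is one common workaround:

```shell
#!/usr/bin/env bash
count=0
printf '%s\n' a b c | while read -r line; do
    count=$((count + 1))          # modifies the subshell's copy only
done
pipe_count=$count                 # still 0

# Workaround: feed the loop via process substitution, so the loop itself
# runs in the current shell.
count=0
while read -r line; do
    count=$((count + 1))
done < <(printf '%s\n' a b c)
procsub_count=$count

echo "pipeline: $pipe_count, process substitution: $procsub_count"
```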

$ time sleep 0.1 &> file
real	0m0.106s
user	0m0.000s
sys	0m0.008s

This is the final WTF :) . One would expect the output of time to go to “file”; well, wrong. The redirection applies to the timed command (sleep), not to time’s own report, which still goes to the shell’s stderr. This is possible because time is a shell keyword (a reserved word), and as such it can do stuff no other kind of command in the shell ecosystem (builtins, external commands, aliases, functions) can do. The positive side of this behavior is that you can hand time a whole pipeline and it will time the entire pipeline, as opposed to just its first command.

$ { time sleep 0.1; } &> file
$ cat file 
 
real	0m0.108s
user	0m0.004s
sys	0m0.000s

I couldn’t find this documented anywhere in the official docs; it is, however, covered in the BashFAQ.
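Building on the braces trick, a sketch that captures time’s report in a variable. TIMEFORMAT is bash’s variable for the report format, and %R limits it to the elapsed real time in seconds (three decimal places by default); the name elapsed is made up:

```shell
#!/usr/bin/env bash
TIMEFORMAT=%R
# the braces make time's report part of the group, so 2>&1 redirects it
# into the command substitution along with anything sleep might print
elapsed=$( { time sleep 0.1; } 2>&1 )
echo "sleep took $elapsed seconds"
```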

This concludes my list of bash WTFs; if you can think of any more, please leave a comment :)

sed vs awk

Saturday, July 18th, 2009

This post attempts to answer the sed vs awk question once and for all! Just joking ;) , the answer of course depends on the type of task you are doing. Let’s have a look at the differences and at the use cases where each tool is appropriate. The tools are fundamentally different: awk is a full-blown programming language, while sed is a stream editor. For the record, my personal preference is sed. :)

The typical use case for awk is column manipulation. Say you wanted to print some fields from a CSV line:

$ seq 19 | tr '\n' , | awk -F, '{ print $12, $11, $9 }'
12 11 9
$ seq 19 | tr '\n' , | sed \
's/\([^,]*,\)\{8\}\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),.*/\5 \4 \2/'
12 11 9

Awk’s -F determines what awk considers a delimiter, and $N (where N is a number) is that field of the current line. That being said, it’s quite easy to understand the awk line, while the sed one is rather cumbersome. And here is a related hint: -F takes a regex as the delimiter, so to have awk split the line on either space or comma you would use -F'[ ,]'.

$ seq 10 | awk '{ i+=$1 } END { print i }'
55

The next use case for awk is math; sed can’t do arithmetic in a sane way. Adding up the first column is a simple task in awk with no (sane) sed alternative. Another common use case is comparing a certain column to some value (say, you want the lines whose second column is more than 10).
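The “second column more than 10” case as a sketch with made-up input; a pattern without an action prints the matching lines, and because the fields look numeric, awk compares them as numbers:

```shell
#!/usr/bin/env bash
big=$(printf '%s\n' 'apples 5' 'pears 12' 'plums 30' | awk '$2 > 10')
echo "$big"
```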

$ var=a
$ echo ab | sed "s/$var/b00/g"
b00b
$ echo ab | awk -v var=$var '{ gsub(var, "b00"); print; }'
b00b

Next up is philosophical stuff: with sed you can’t really separate the input from the logic. You are essentially generating sed commands, while with awk you have fixed logic and only tell it which value to act on. Consider what happens if $var contains characters that are special to sed: sed chokes, awk doesn’t care.

$ var='\'
$ echo ab | sed "s/$var/b00/"
sed: -e expression #1, char 8: unterminated `s' command
$ echo ab | awk -v var=$var '{ gsub(var, "b00"); print; }'
ab

This particular problem is easy to work around: escape the backslash and the slash (because it is the sed delimiter) and you’re good to go:

var=$(echo "$var" | sed 's/[\/\\]/\\&/g')
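One caveat to “awk doesn’t care”: a value passed with -v goes through awk’s escape processing, just like a string literal in the program, so backslashes in it can still be changed. Passing the value through the environment and reading ENVIRON avoids that. A sketch (v and the via_* names are arbitrary):

```shell
#!/usr/bin/env bash
var='a\tb'    # four characters: a, backslash, t, b
# -v: "\t" is turned into a tab, so awk sees three characters
via_v=$(awk -v v="$var" 'BEGIN { print length(v) }')
# environment: the value arrives untouched, four characters
via_env=$(v="$var" awk 'BEGIN { print length(ENVIRON["v"]) }')
echo "-v: $via_v characters, ENVIRON: $via_env characters"
```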

But what happens if your input contains a regex metacharacter (such as ‘^’ or ‘$’)? This problem is really irrelevant to the sed vs awk debate, as both suffer from it. The solution, of course, is the same as above: escape it. Here it is; it assumes you are not enabling ERE (if you are, a couple more metacharacters have to be added). Before we look at the example, let me say that this kind of thing is usually wrong :) , you are putting a hack in there. A sane alternative is usually fgrep (grep -F), which matches fixed strings.

var=$(echo "$var" | sed 's/[]\/\\*.^$[]/\\&/g')
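Wrapped in a helper, the escaping looks like this. A sketch: sed_escape is a hypothetical name, and the bracket expression lists the backslash explicitly so it gets escaped too:

```shell
#!/usr/bin/env bash
# Backslash-escape BRE metacharacters and "/" (our s/// delimiter) so the
# value is matched literally when used as a sed pattern.
sed_escape() {
    printf '%s\n' "$1" | sed 's/[]\/\\*.^$[]/\\&/g'
}

pattern=$(sed_escape 'a.b/c')
echo "escaped pattern: $pattern"
# only the literal "a.b/c" is replaced, not "axb/c"
result=$(echo 'axb/c a.b/c' | sed "s/$pattern/X/")
echo "$result"
```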

Awk, being a full-blown language, offers you considerable control over the flow of the program. In sed you have ‘b’ for an unconditional jump, ‘t’ for a conditional jump and the unportable ‘T’ for a negated conditional jump. With awk you can branch (you have an if statement) on pretty much arbitrary conditions.

Doing anything with variables is another big no-no in sed. If you want to filter the rest of the text based on something in the text itself, there is no sane way to do that. Even simple things like getting the lines where the 1st column is contained in the fourth are painful/unmaintainable in sed. Here we use a handy sed feature: you can use a back reference inside the regex itself, not just in the replacement.

$ echo '1,2,3,11,4' | awk -F, '$4 ~ $1'
1,2,3,11,4
$ echo '1,2,3,11,4' | sed -n \
'/^\([^,]*\),\([^,]*,\)\{2\}[^,]*\1/p'
1,2,3,11,4

An advantage sed has over awk is that its basic regular expressions support back references, which awk’s extended regular expressions don’t (matching back references needs a backtracking, NFA-style engine). There are hacks that get back referencing to work in awk; one is gawk’s gensub(), but that places you in the portability ghetto. There was an awk library named awke which provided this functionality as well, but it seems dead now.

So, in conclusion: awk is great for math inside text and for column manipulation, and I use sed for most of the other stuff. And if you have really complex stuff to do, chances are that in the long term it is more maintainable to do it in, say, Python.

sed and grep tips

Wednesday, July 15th, 2009

It had been quite a while since I did any sed; I rediscovered that joy when someone in the #sed IRC channel asked how to turn “/abc/def/filename” into “<a href="file://A:/abc/def/filename">A:\abc\def\filename</a>”.

echo '/abc/def/filename' | sed \
 -e 's@.*@<a href="file://A:&">A:&</a>@' -e :b\
 -e 's@\(>[^/]*[^<]\)/@\1\\@' -e tb

The first command is really basic: it turns ‘/abc/def/filename’ into ‘<a href="file://A:/abc/def/filename">A:/abc/def/filename</a>’. The problem here is that the second part still has slashes instead of backslashes. Conditional branching to the rescue! ‘:b’ means “define a label named ‘b’ here”. The substitution means “replace ‘>’ followed by any sequence of non-‘/’ characters, a non-‘<’ character and a ‘/’ with the captured part (everything but the last ‘/’) and a ‘\’” (the double backslash is needed to escape it for sed). Basically we are replacing one ‘/’ with ‘\’ per pass; the non-‘<’ character keeps it from touching the ‘/’ in ‘</a>’. The “tb” part means “jump to label ‘b’ if the substitution has been performed successfully”. :)

But this is terribly inefficient! It performs one initial substitution plus one for each slash, and it does non-trivial matching :| .

echo '/abc/def/filename' | sed -e h\
 -e 's@/@\\@g' -e G -e \
's@\([^\n]*\)\n\(.*\)@<a href="file://A:\2">A:\1</a>@'

This solution kicks off by copying whatever is in the current pattern space (the line currently being processed) to the hold space. The hold space is as close as you’ll get to variables in sed. After that we prepare the backslashed version (replacing all the slashes with backslashes), and then with ‘G’ we append a newline and the hold space (the slashed version) to the pattern space. We split the pattern space at the newline (the first part has the backslashed version, the second the slashed one) and simply put both in context.
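For a single path held in a shell variable, bash’s own parameter expansion can do the slash-to-backslash conversion without sed at all. A sketch, not the approach discussed in #sed; to_link is a hypothetical name and the A: prefix is hard-coded as in the question:

```shell
#!/usr/bin/env bash
to_link() {
    local unix_path=$1
    local win_path=${unix_path//\//\\}   # replace every "/" with "\"
    printf '<a href="file://A:%s">A:%s</a>\n' "$unix_path" "$win_path"
}

to_link '/abc/def/filename'
```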

While we’re at sed, let’s look at my submission to the hello world challenge:

set 68 65 6C 6C 6F 20 77 6F 72 6C 64;
while [[ $1 ]]; do
dd if=/dev/urandom bs=1c count=1 2> /dev/null | \
sed -n "/\\x$1/{ p; Q1; }" || shift; done; echo

In a nutshell, this looks for the “hello world” characters in your /dev/urandom. The first thing you’ll probably notice are the numbers: they are the hex codes of the “hello world” ASCII characters. The code kicks off with the set line, which sets the positional parameters to those numbers. With dd we get a single byte from /dev/urandom, discarding the statistics dd prints. If sed finds the character, it prints it and exits with status 1 (‘Q1’). Since a non-zero status counts as failure, the shift after || is run, discarding the current first positional parameter (so the next iteration looks for the next character). It keeps looping as long as there are positional parameters left :) . Note that this code is unportable: to the best of my knowledge only GNU sed has ‘Q’.

As a side note, unless your script has to run under a plain Bourne/POSIX sh (such as dash), you should use [[ instead of [.
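The most visible difference is word splitting: [ is an ordinary builtin, so an unquoted variable is split into several words before [ ever sees it, while [[ is shell syntax and leaves the variable whole. A sketch (single and double are made-up names holding the exit statuses):

```shell
#!/usr/bin/env bash
var='hello world'
# [ receives four words (hello, world, =, hello world) and fails
[ $var = 'hello world' ] 2>/dev/null && single=0 || single=$?
# [[ does not split $var, so the comparison works
[[ $var = 'hello world' ]] && double=0 || double=$?
echo "[ exit status: $single, [[ exit status: $double"
```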

dpkg -L subversion | grep -E "(${PATH//:/.|}.)"

Ever wanted to find out which binaries a certain Debian package provides? The above satisfies that curiosity. We get the input from dpkg -L, which lists the contents of a package. The most interesting bit happens inside the (): it uses the shell’s parameter expansion to replace each : (the standard PATH delimiter) with '.|', turning the PATH into an alternation of directories. The '.' is only there to make sure something follows the directory name, so that the directory entries themselves (dpkg -L lists /usr/bin and friends too) don't get matched.
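To make the expansion concrete, a worked sketch; search_path is a made-up stand-in for $PATH and the listing is a fake excerpt in the style of dpkg -L output:

```shell
#!/usr/bin/env bash
search_path=/usr/local/bin:/usr/bin:/bin
re="(${search_path//:/.|}.)"        # every ":" becomes ".|", plus a final "."
echo "$re"

# only entries with something after one of the directories survive
matched=$(printf '%s\n' /usr /usr/bin /usr/bin/svn /usr/share/doc/subversion |
    grep -E "$re")
echo "$matched"
```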

Ever had a huge config file with a shitload of examples, when you only wanted to find out what is currently being used?

alias nco="grep -Ev '^[[:space:]]*(#|$)'"

This little alias excludes all the empty (or whitespace-only) lines and the comments (assuming comments start with '#', possibly indented). Use it like:

nco filename
something | nco
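Bash only expands aliases in interactive shells by default, so if you want the same filter inside a script, a function is the safer form. A sketch keeping the name nco:

```shell
#!/usr/bin/env bash
# Function version of the alias: drops blank/whitespace-only lines and lines
# whose first non-blank character is "#". Reads the given files or stdin.
nco() {
    grep -Ev '^[[:space:]]*(#|$)' "$@"
}

sample=$'# a comment\n\nkey=value\n    # indented comment\nother=1'
printf '%s\n' "$sample" | nco
```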

More bash :-)

Wednesday, July 8th, 2009

grep -Ir --exclude-dir=.svn dbsettings !(dbsettings)

This is the line I used a couple of days ago when searching through pydra’s code. -I and -r are what I use on a daily basis: the first one tells grep to ignore binary files, the second one to search directories recursively. What I don’t use often is the --exclude-dir parameter; it tells grep to skip directories named .svn while recursing. Since svn stores its metadata in the .svn dirs, that is very useful. “dbsettings” is simply the string I want to search for.

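Now to the interesting part, “!(dbsettings)”. This is called extended globbing, and it allows you to use regex-like patterns inside the shell. The available operators are ‘?(…)’ (zero or one occurrence), ‘*(…)’ (zero or more), ‘+(…)’ (one or more), ‘@(…)’ (exactly one of the alternatives, which is what you use for plain alternation) and ‘!(…)’ (anything except the given patterns), so !(dbsettings) expands to everything in the current directory except dbsettings itself. Suppose we want to get the files start_master.sh and start_node.sh: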

shopt -s extglob; ls start_@(master|node).sh
bash: syntax error near unexpected token `('

First of all, a word about shopt -s extglob: extended globbing is off by default, and this is how you turn it on. That aside, what the hell just happened? You see, bash reads and parses a whole line before executing any of it, and since extglob is not yet enabled when the line is parsed, @( is a syntax error. The shopt only takes effect once it has actually run, so it has to be on an earlier line:

$ shopt -s extglob
$ ls start_@(master|node).sh
start_master.sh  start_node.sh

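This per se is not really useful; the cool feature of extended globs (like all globs) is that they are space-proof: each matching file name becomes a single word, even if it contains spaces. So looping over a glob is OK (as opposed to looping over `ls | grep`). Here you can read up on simple as well as extended globbing.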
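A sketch of looping over an extended glob, with made-up file names in a temporary directory (one of them contains a space). Note that shopt -s extglob sits on its own line, before the line that uses !( ) is parsed:

```shell
#!/usr/bin/env bash
shopt -s extglob               # on its own line, before any !( ) is parsed
demo_dir=$(mktemp -d)
touch "$demo_dir/start_master.sh" "$demo_dir/start_node.sh" \
      "$demo_dir/start_web server.sh" "$demo_dir/stop_node.sh"

matches=()
for f in "$demo_dir"/start_!(node).sh; do     # every start_*.sh except start_node.sh
    matches+=("${f##*/}")                      # keep just the file name
done
printf '%s\n' "${matches[@]}"                  # each name intact, space included
rm -r "$demo_dir"
```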

You should use “#!/usr/bin/env bash” as the shebang line in your script, since it increases portability: env is practically always found at /usr/bin/env (POSIX specifies the env utility, though not its path), while the BSDs tend to have bash in /usr/local/bin. What really happens here is that the kernel invokes that binary, passing it the file you are executing as an argument. As a side note, most kernels don’t split the arguments on the shebang line (one exception I can think of is FreeBSD). Due to that, “#!/usr/bin/env bash -u” will not work on most kernels: the kernel passes “bash -u” to env as a single argument, env tries to execute a program called “bash -u”, can’t find it and errors out :-) .
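The usual way around it is to turn the option on inside the script instead: set -u is the in-script equivalent of bash -u. Newer env implementations (GNU coreutils 8.30 and later, FreeBSD) also accept -S to split the shebang string, but that is not something to count on everywhere. A sketch with a deliberately misspelled variable ($nmae) in a throwaway child script:

```shell
#!/usr/bin/env bash
# The child script enables set -u itself, so referencing the misspelled,
# never-assigned $nmae aborts it with a non-zero status.
if bash -c 'set -u; name=world; echo "hello $nmae"' 2>/dev/null; then
    verdict="the unset variable went unnoticed"
else
    verdict="set -u caught the unset variable"
fi
echo "$verdict"
```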

Here is a tip for script writers who use vim:

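" function! and augroup + autocmd! keep a re-sourced .vimrc from erroring or stacking autocommands
function! Modchange()
    if getline(1) =~ "^#!.*/bin"
        silent !chmod a+x <afile>
    endif
endfunction
augroup Modchange
    autocmd!
    autocmd BufWritePost * call Modchange()
augroup END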

Having that in your .vimrc will automatically make a file executable on save if it looks like a script (its first line is a shebang containing /bin). NOTE: I probably stole this from the internets many moons ago.
