sed vs awk

This post attempts to answer that question once and for all! Just joking ;) The answer, of course, depends on the task at hand. Let's look at the differences and at the use cases where each tool is appropriate. The tools are fundamentally different: awk is a full-blown programming language, while sed is a stream editor with a much smaller command set. For the record, my personal preference is sed. :)
The typical use case for awk is column manipulation. Say you wanted to print some fields from a CSV file.
$ seq 19 | tr '\n' , | awk -F, '{ print $12, $11, $9 }'
12 11 9
$ seq 19 | tr '\n' , | sed \
's/\([^,]*,\)\{8\}\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),.*/\5 \4 \2/'
12 11 9
Awk's -F sets what awk considers a field delimiter, and $N (where N is a number) is the Nth field of the current line. The awk line is easy to understand, while the sed one is rather cumbersome. A related hint: -F takes a regex as the delimiter, so to have awk split the line on either a space or a comma you would use -F'[ ,]'.
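To make the regex delimiter hint concrete, here is a minimal sketch. The sample line and the variable names are made up for illustration.

```shell
# Split on either a space or a comma by giving -F a bracket expression.
line='alpha beta,gamma'   # hypothetical sample input
fields=$(printf '%s\n' "$line" | awk -F'[ ,]' '{ print $3, $1 }')
echo "$fields"            # prints: gamma alpha
```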
$ seq 10 | awk '{ i+=$1 } END { print i }'
55
The next use case for awk is math; sed can't do arithmetic in a sane way. Adding up the first column, as above, is a simple task in awk with no (sane) sed alternative. Another common use case is comparing a column to some value (say, you want the lines whose second column is greater than 10).
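The column comparison could look something like this. The four input lines are invented for the example.

```shell
# Keep only the lines whose second column is numerically greater than 10.
# A bare pattern with no action prints every matching line.
big=$(printf '%s\n' 'a 5' 'b 12' 'c 10' 'd 42' | awk '$2 > 10')
echo "$big"
# prints:
# b 12
# d 42
```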
$ var=a
$ echo ab | sed "s/$var/b00/g"
b00b
$ echo ab | awk -v var="$var" '{ gsub(var, "b00"); print }'
b00b
Next up is the philosophical stuff: with sed you can't really separate the input from the logic. You are essentially generating sed commands, whereas with awk the logic is fixed and you only tell it which variable to act on. Consider what happens if $var contains characters that are special to sed: sed chokes, awk doesn't care.
$ var='\'
$ echo ab | sed "s/$var/b00/"
sed: -e expression #1, char 8: unterminated `s' command
$ echo ab | awk -v var="$var" '{ gsub(var, "b00"); print }'
ab
This particular problem is easy to work around: escape the backslash and the slash (because it is the sed delimiter) and you're good to go:
var=$(printf '%s\n' "$var" | sed 's/[\/\\]/\\&/g')
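Here is a small sketch of that workaround in use. The sample value and the names escaped and result are hypothetical, and printf is used instead of echo only because some shells' echo interprets backslashes.

```shell
var='a/b\c'   # hypothetical value containing both a slash and a backslash
# Prefix every slash and backslash with a backslash.
escaped=$(printf '%s\n' "$var" | sed 's/[\/\\]/\\&/g')
echo "$escaped"   # prints: a\/b\\c
# The escaped value is now safe to splice into the s command.
result=$(printf '%s\n' 'x a/b\c y' | sed "s/$escaped/b00/")
echo "$result"    # prints: x b00 y
```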
But what happens if your input contains a regex metacharacter (such as '^' or '$')? This problem is really irrelevant to the sed vs awk debate, as both tools suffer from it. The solution, of course, is the same as above: escape it. Before we look at the example, let me explain that this kind of thing is usually wrong :) because you are putting a hack in there; a sane alternative is usually fgrep (grep -F). Here it is, assuming you are not enabling ERE (if you are, a couple more metacharacters have to be added).
var=$(printf '%s\n' "$var" | sed 's/[]\/*.^$[]/\\&/g')
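For comparison, the fixed-string route needs no escaping at all. This is just a sketch with invented input; grep -F is the same thing as fgrep.

```shell
needle='^a.b$'   # hypothetical input full of regex metacharacters
# -F makes grep treat the pattern as a literal string, so 'aab' is not
# matched even though the regex ^a.b$ would match it.
matches=$(printf '%s\n' 'x ^a.b$ y' 'aab' | grep -F -- "$needle")
echo "$matches"  # prints: x ^a.b$ y
```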
Awk, being a full-blown language, offers you considerable control over the flow of the program. In sed you have 'b' for an unconditional jump, 't' for a conditional jump and the unportable 'T' for a negated conditional jump. With awk you can branch (you have an if statement) on pretty much arbitrary conditions.
Doing anything with variables is another big no-no in sed. If you want to filter the rest of the text based on something in the text itself, there is no sane way to do that. Even simple things like getting the lines where the first column is contained in the fourth are painful and unmaintainable in sed. Here we use a handy sed feature: a back reference can be used on the left side of the regex, in the pattern itself.
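A rough side-by-side of the two styles of flow control, with made-up input. In the sed half, a 't' without a label jumps to the end of the script when the preceding substitution succeeded.

```shell
# awk: ordinary if / else if / else on arbitrary conditions.
labels=$(seq 1 4 | awk '{
    if ($1 % 2 == 0)  print $1, "even"
    else if ($1 == 1) print $1, "one"
    else              print $1, "odd"
}')
echo "$labels"
# prints:
# 1 one
# 2 even
# 3 odd
# 4 even

# sed: the only condition available is "did a substitution succeed?".
branched=$(printf '%s\n' foo bar | sed -e 's/foo/FOO/' -e 't' -e 's/$/ (unchanged)/')
echo "$branched"
# prints:
# FOO
# bar (unchanged)
```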
$ echo '1,2,3,11,4' | awk -F, '$4 ~ $1'
1,2,3,11,4
$ echo '1,2,3,11,4' | sed -n \
'/^\([^,]*\),\([^,]*,\)\{2\}[^,]*\1/p'
1,2,3,11,4
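One caveat with the awk version: $4 ~ $1 treats the first column as a regex, so the metacharacter problem from earlier applies here too. If a plain substring test is what you want, a sketch with awk's index() function might look like this (the second input line is invented to show the difference):

```shell
# index(s, t) returns the position of t in s, or 0 if it is not there,
# so as a bare pattern it selects lines where column 4 contains column 1.
out=$(printf '%s\n' '1,2,3,11,4' '.,2,3,11,4' | awk -F, 'index($4, $1)')
echo "$out"   # prints: 1,2,3,11,4
```

With $4 ~ $1 the second line would be printed as well, because '.' as a regex matches any character.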
An advantage sed has over awk is that it uses an NFA regex engine, making back references possible. There are hacks that get back referencing to work in awk, one being gawk's gensub(), but that places you in the portability ghetto. There was also an awk library named awke which provided this functionality, but it seems dead now.
So, in conclusion, awk is great for math inside text and for column manipulation; I use sed for most of the other stuff. And if you have really complex stuff to do, chances are that in the long term it is more maintainable to do it in, say, Python.
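As a quick illustration of what back references buy you in sed, here is a sketch that collapses a doubled word. The input sentence is made up.

```shell
# \1 in the pattern must match the same text that the first group captured.
deduped=$(echo 'the the quick fox' | sed 's/\([a-z][a-z]*\) \1/\1/')
echo "$deduped"   # prints: the quick fox
```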