Gentle Introduction to sed

If you’ve been using Linux for some time, you’ve almost certainly come across sed before. Especially if you work with databases, you’ve probably used the quick-and-dirty

sed 's/\t/,/g' <file>

to create an ad-hoc CSV report out of MySQL output.
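As a minimal sketch of that conversion, the snippet below pushes two made-up tab-separated rows through the same expression. The sample rows and variable names are only illustrative, and \t inside the regex is a GNU sed extension, so this assumes GNU sed.

```shell
# Hypothetical MySQL-style output: two tab-separated rows.
tsv_rows=$(printf 'id\tname\n1\talice\n')

# Replace every tab with a comma (\t in the regex is understood by GNU sed).
csv_rows=$(printf '%s\n' "$tsv_rows" | sed 's/\t/,/g')

printf '%s\n' "$csv_rows"
```

Keep in mind this is only safe when the values themselves contain no commas or quotes.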

But sed, which stands for stream editor, can do a lot more than serve as a dirty hack for creating CSV files.

A basic familiarity with regular expressions is assumed here.

Addresses

In the above example, sed operated on every line of the file, which is its default behavior. However, sed can be given addresses that restrict a command to specific lines of the input. E.g.

sed 10q <file>

tells sed to quit at line 10, thereby printing the first 10 lines of the file. Unlike quit (q), some commands also accept a range of lines, e.g. substitute (s):

sed -E '20,25s/^\s{4}//' <file>

tells sed to strip exactly 4 leading whitespace characters from lines 20 through 25. Note that \s is a GNU extension and matches any whitespace character, not only spaces.
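To see these address forms in action without needing a real file, here is a small sketch that uses seq and printf as stand-ins for the input. The variable names are only illustrative.

```shell
# A single address: quit at line 3, so only the first three lines are printed.
first_three=$(seq 1 10 | sed 3q)

# A range address: with -n, only lines 4 through 6 are printed by p.
middle=$(seq 1 10 | sed -n '4,6p')

# A range restricting s: strip four leading spaces on lines 2 and 3 only.
dedented=$(printf '    a\n    b\n    c\n    d\n' | sed -E '2,3s/^ {4}//')

printf '%s\n' "$first_three" "$middle" "$dedented"
```

Lines 1 and 4 of the last input keep their indentation, because they fall outside the 2,3 range.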

Common tricks with addresses I find useful

Getting rid of the column names in SQL output sent to me

E.g. if there’s a file like

account
10000
10001
10002
etc.

sed '1d' <file>

is a quick and simple way to trim the header line off.

Replacing something on the last line of the file

The number of the last line is generally not known until sed reaches it, so sed provides the address $ to refer to the last line of the input.

I sometimes find this useful when I need to build a quick query to insert data from a file, e.g.

10000,12
10001,13
10002,14
etc.

This can be turned into a quick INSERT query with a bit of awk

awk 'BEGIN{FS=","; printf "INSERT INTO some_table (column1,column2) VALUES ";}{print "("$1","$2"),"}' <file>

That results in an output like

INSERT INTO some_table (column1,column2) VALUES (10000,12),
(10001,13),
(10002,14),

which is not quite valid SQL.

However, a simple sed on the last line of this output can be used to fix it:

sed '$s/,$/;/'

Piping the output of that awk command into this sed expression fixes the issue by replacing the trailing comma on the last line with a semicolon:

INSERT INTO some_table (column1,column2) VALUES (10000,12),
(10001,13),
(10002,14);
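Put together, the whole pipeline can be sketched as below. The temporary file stands in for the real data file, and some_table, column1 and column2 are the placeholder names from the example above.

```shell
# Sample data file with two comma-separated columns.
data=$(mktemp)
printf '10000,12\n10001,13\n10002,14\n' > "$data"

# awk builds one "(a,b)," tuple per line; sed then turns the comma
# that ends the last line ($ address) into a semicolon.
query=$(awk 'BEGIN{FS=","; printf "INSERT INTO some_table (column1,column2) VALUES ";}{print "("$1","$2"),"}' "$data" \
  | sed '$s/,$/;/')

printf '%s\n' "$query"
rm -f "$data"
```

Only the last line is touched, because the $ address restricts the substitution to it. The commas that end the other lines stay in place.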

Inverting match results

Sometimes the beginning of files might have helpful vim modelines, such as

# vim: set ft=i3config

that should ideally be preserved when uncommenting things from a file.

The ! symbol can be used to invert the match results.

sed '1 !s/^#//g'

The above command removes the leading # from every line except the first, so the vim modeline survives when you, e.g., uncomment keybindings in an i3 config file.
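Here is a small sketch of the inverted address on a made-up i3 config. The two bindsym lines are hypothetical and only there to have something to uncomment.

```shell
# Hypothetical i3 config: a modeline, one commented-out binding, one active binding.
config=$(printf '%s\n' \
  '# vim: set ft=i3config' \
  '#bindsym $mod+d exec dmenu_run' \
  'bindsym $mod+Return exec xterm')

# 1! applies the substitution to every line except line 1.
uncommented=$(printf '%s\n' "$config" | sed '1!s/^#//')

printf '%s\n' "$uncommented"
```

The modeline keeps its #, the second line loses it, and the third line is unchanged because it never had one.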

Sed script files

Once sed commands get complex, one-liners become hard to understand. Thankfully, sed can read its commands from a script file, much like bash or awk, e.g.

#!/usr/bin/sed -Ef

/foo/{
    s/bar/baz/g
}

The above sed script would replace “bar” with “baz” on lines that contain “foo”.

The form is similar to awk: addresses sit at the top level, and the commands to run on matching lines follow inside a pair of braces {}.

E.g. if we have a file “new”

foo bar
foo baar
fo bar
bar

Running the script above, saved as “sc”, with

sed -f sc < new

Returns

foo baz
foo baar
fo bar
bar
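Script files also leave room for comments and for more than one address block. The sketch below extends the example with a second rule. The file name keep_foo.sed and the extra /foo/!d rule are my own additions for illustration.

```shell
workdir=$(mktemp -d)

# Write a commented script file (the name keep_foo.sed is hypothetical).
cat > "$workdir/keep_foo.sed" <<'EOF'
# On lines containing "foo", replace every "bar" with "baz".
/foo/{
    s/bar/baz/g
}
# Delete every line that does not contain "foo".
/foo/!d
EOF

printf 'foo bar\nfoo baar\nfo bar\nbar\n' > "$workdir/new"

filtered=$(sed -f "$workdir/keep_foo.sed" < "$workdir/new")
printf '%s\n' "$filtered"

rm -rf "$workdir"
```

Only the two lines containing “foo” survive, and only the first of them has a “bar” to replace.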

Commands

But we’ve only been scratching the surface here. Let’s see some commands we can use.

Branches and Labels

Labels and branches are the low-level control flow constructs that higher-level constructs such as if, else and while compile down to. In other words, labels and branches can be used to build loops and conditionals inside a sed script.

Here are a few useful facts about labels and branches:

-> A label must be preceded by a colon :

-> Each label must be uniquely named

-> A label can be placed anywhere within a sed script

-> A label can be targeted by any branch

-> A branch, denoted by b, may be preceded by an address (a line number or a regular expression), which acts as the test condition for that branch

-> You can think of a branch as sed’s version of “goto”: when the branch condition passes, sed continues executing from the target label instead of moving on to the next command
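Before the practical example, here is a minimal sketch of a loop built from one label and one conditional branch. It pads short lines with leading zeros until they are five characters wide. The label name pad and the sample input are arbitrary choices of mine.

```shell
# :pad           defines the label
# /^.\{1,4\}$/   is the loop condition: the line is 1 to 4 characters long
# s/^/0/         prepends one zero
# b pad          jumps back to the label, so the condition is tested again
padded=$(printf '42\n12345\n7\n' | sed \
  -e ':pad' \
  -e '/^.\{1,4\}$/{' \
  -e '  s/^/0/' \
  -e '  b pad' \
  -e '}')

printf '%s\n' "$padded"
```

This is effectively a while loop: as long as the address matches, sed keeps jumping back to the label. Once the line is five characters long, the condition fails and the cycle ends normally.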

Let’s try this in a practical example!

You’ve just written a script that repeatedly connects to some server over ssh and appends the results to a file, but every response is prepended with automated messages like

Connected!
If you are not authorized, please disconnect immediately.
All activity is monitored and logged.

Most likely, you don’t want to keep these messages in the output files, and you need an intelligent way to cut them out.

This is exactly the kind of situation that calls for sed with labels and branches.

E.g. suppose part of the file looks like this:

2003,1000,00
Connected!
If you are not authorized, please disconnect immediately.
All activity is monitored and logged.
2010,1000,00
2011,1000,01

The following expression will cut this message out

sed -n -e '/Connected!/!p; :m' -e '//{' -e '$!{' -e 'n;n;b m' -e '}' -e'}' <file>

Giving

2003,1000,00
2010,1000,00
2011,1000,01

But how does it work?

Let’s unpack this expression

    #sed -n \
    #  -e '/Connected!/!p'      print lines that do NOT match "Connected!"
    #  -e ':m'                  define label "m"
    #  -e '//{'                 outer conditional: an empty regex reuses the last one, so this tests for "Connected!" again
    #  -e '  $!{'               inner conditional: and it's NOT the last line
    #  -e '    n; n; b m'       read the next two lines (nothing is printed because of -n) and go back to label "m"
    #  -e '  }'                 end inner conditional
    #  -e '}'                   end outer conditional
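As a cross-check, the sketch below runs that expression on the sample input and compares it with a shorter alternative. The alternative relies on the addr,+N range, which is a GNU sed extension and not the technique explained above, so treat it as an assumption about your sed.

```shell
log=$(mktemp)
printf '%s\n' \
  '2003,1000,00' \
  'Connected!' \
  'If you are not authorized, please disconnect immediately.' \
  'All activity is monitored and logged.' \
  '2010,1000,00' \
  '2011,1000,01' > "$log"

# The branch-based expression from above.
with_branch=$(sed -n -e '/Connected!/!p; :m' -e '//{' -e '$!{' -e 'n;n;b m' -e '}' -e '}' "$log")

# GNU-only alternative: delete the matching line and the two lines after it.
with_range=$(sed '/Connected!/,+2d' "$log")

printf '%s\n' "$with_branch"
rm -f "$log"
```

Both variables end up holding the same three data lines.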

Hold spaces

In general, sed treats each line as an individual unit, and knows nothing of the next line. The hold space is a way we can get sed to keep state between lines. This can be very useful when we want to match or replace things across multiple lines.

First, a quick look at the two internal buffers sed gives us:

  1. Pattern Space: Pattern space is the internal sed buffer where sed places, and modifies, the line it reads from the input file.

  2. Hold Space: This is an additional buffer where we can tell sed to hold temporary data. The main point of the hold space is that, unlike the pattern space, its contents are not destroyed when sed moves on to the next line.

We have a few different options when it comes to manipulating these internal spaces:

-> n print the pattern space (unless -n is given), then replace it with the next line of input

-> N append a newline and the next line of input to the pattern space

-> x swap pattern space with hold space

On its own, swapping is rarely useful, but it can be a neat way to pick entries out of simple multi-line lists,

e.g.

Linux
192.168.2.21
BSD
192.168.2.23
Linux
192.168.2.24

If you need the IPs of the Linux machines in this file, the following sed command can be used:

sed -n -e 'x;n;x' -e '/Linux/{x;p}' <file>

Giving the output:

192.168.2.21
192.168.2.24

Let’s break down how it works:

    #sed -n
    #    -e 'x;n;x'        x saves the current line (the name of the OS) into hold space, n reads the next line (the ip address)
    #                      into pattern space, and the second x swaps the two buffers again.
    #                      So, we now have the name of the OS in pattern space and the ip address in hold space.
    #    -e '/Linux/{x;p}' If pattern space contains "Linux", we swap the ip address back into pattern space and print it.
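The difference between n and N from the list above is easy to miss, so here is a minimal sketch that uses seq as input. The variable names are only illustrative.

```shell
# n: replace the pattern space with the next line; with -n and p this prints every second line.
even_lines=$(seq 1 6 | sed -n 'n;p')

# N: append the next line (after a newline), so both lines can be edited as one unit.
pairs=$(seq 1 6 | sed 'N;s/\n/ /')

printf '%s\n' "$even_lines" "$pairs"
```

With n the first line of each pair is thrown away, while with N both lines sit in the pattern space, joined by a newline that s can match as \n.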

-> h copy pattern space to hold space

-> g copy hold space to pattern space

-> H append pattern space to hold space

-> G append hold space to pattern space

A practical example of this is fixing up broken queries generated by scripts/applications. It sometimes happens that some script or app spits out something like

insert into some_table(some_id,stuff,thing) values
;

which is clearly not valid SQL and will break whatever further process relies on it.

A simple fix is to look for insert statements that are followed by a ; on the next line and delete the insert statement in such cases. Leaving a lonely ; is no problem for SQL.

The following command can be used to grab this insert and make it vanish as we desire.

sed -n '1h;1!H;${;g;s/insert into some_table(some_id,stuff,thing) values\n;/;/g;p;}'

How does it work? Let’s break it down.

    #sed -n
    #    1h;    on line 1, copy the line into the hold buffer
    #    1!H;   on every line other than line 1, append the line to the existing contents of the hold buffer
    #    ${     on the final/last line of the file
    #    g;     replace the pattern space with the hold buffer (which by now holds the entire file)
    #    s/insert into some_table(some_id,stuff,thing) values\n;/;/g;  # regex replace; \n matches the newline between the two lines
    #    p;     print the modified result
    #    }      close the command

Here it is in action:

insert into some_table(some_id,stuff,thing) values
;

Executing

sed -n '1h;1!H;${;g;s/insert into some_table(some_id,stuff,thing) values\n;/;/g;p;}' <file>

Outputs

;
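The same accumulate-in-hold-space idea works with G and h as well. As a small sketch, the classic one-liner below prints its input in reverse line order, with seq standing in for a real file.

```shell
# 1!G : on every line but the first, append the hold space (the lines seen so far, newest first)
# h   : copy the result back into the hold space
# $p  : on the last line, print the accumulated, reversed text
reversed=$(seq 1 4 | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```

Like the query fix above, this keeps the whole input in memory, so it is best suited to small files.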

Final note

There’s much more that can be done with sed, and it is a worthy addition to any user’s toolbox. But I hope that these examples were enough to convince you that there are many cases where just a bit of sed can save you from having to come up with some complicated awk or perl script.