Sed

Suppose you have a file in which you need to make some changes. You could load up vi and make the changes that way, but what if what you wanted to change was the output of some command before you sent it to a file? You could first send it to a file and then edit that file, or you could use sed, which is a stream editor that is specifically designed to edit data streams.

If you read the previous section or are already familiar with either the search and replace mechanisms in vi or the editor ed, you already have a jump on learning sed. Unlike vi, sed is non-interactive, but can handle more complicated editing instructions. Because it is non-interactive, commands can be saved in text files and used over and over. This makes debugging the more complicated sed constructs much easier. For the most part, sed is line-oriented, which allows it to process files of almost any size. However, this has the disadvantage that sed cannot do editing that is dependent on relative addressing.

Unlike the section on vi, I am not going to go into as many details about sed. However, sed is a useful tool and I use it often. There are three reasons I am not going to cover it in too much detail. First, much of what is true about pattern searches, addressing, etc., in vi is also true in sed. Therefore, I don’t feel the need to repeat it. Second, it is not that important that you become a sed expert to be a good system administrator. In a few cases, scripts on a Linux system will use sed. However, they are not that difficult to understand, provided you have a basic understanding of sed syntax. Third, sed is like any programming language: you can get by with simple things, but to get really good you need to practice, and we just don’t have the space to go beyond the basics.

In this section, I am going to talk about the basics of sed syntax, as well as some of the more common sed commands and constructs. If you want to learn more, I recommend getting sed & awk by Dale Dougherty from O’Reilly and Associates. This will also help you in the section on awk, which is coming up next.

The way sed works is that it reads input one line at a time, and then carries out whatever editing changes you specify. When it has finished making the changes, it writes them to stdout. Like commands such as grep and sort, sed acts like a filter. However, with sed you can create very complicated programs. Because I normally use sed as one end of a pipe, most of the sed commands that I use have the following structure:

first_cmd | sed <options> <edit_description>
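As a minimal sketch of this structure, the following pipeline sends the output of one command straight into sed. Here printf merely stands in for first_cmd, and both the sample text and the substitution are made up for illustration.

```shell
# printf plays the role of first_cmd; any command that writes to stdout works.
result=$(printf 'one fish\ntwo fish\n' | sed 's/fish/cat/')

# Show what came out of the far end of the pipe.
echo "$result"
```

Each line passes through sed once, is edited, and then appears on stdout, just as with any other filter.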

This is useful when the edit descriptions you are using are fairly simple. However, if you want to perform multiple edits on each line, then this way is not really suitable. Instead, you can put all of your changes into one file and start up sed like this

first_cmd | sed -f editscript

or

sed -f editscript <inputfile
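To make this concrete, here is a small sketch of an edit script holding several changes. The contents of the script, the sample input, and the temporary directory are all assumptions chosen for the example.

```shell
workdir=$(mktemp -d)

# One edit per line; sed applies them in order to every input line.
cat > "$workdir/editscript" <<'EOF'
s/fred/john/
s/apples/pears/
/^#/d
EOF

# The comment line is deleted; the other line receives both substitutions.
result=$(printf '# a comment\nfred likes apples\n' | sed -f "$workdir/editscript")
echo "$result"

rm -r "$workdir"
```

Because the edits live in a file, the same script can be reused on other input or corrected without retyping the whole command line.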

As I mentioned before, the addressing and search/replace mechanisms within sed are basically the same as within vi. A sed command has the structure

[address1[,address2]] edit_description [arguments]

As with vi, addresses do not necessarily need to be line numbers, but can be regular expressions that sed needs to search for. If you omit the address, sed will make the changes globally, as applicable. The edit_description tells sed what changes to make. Several arguments can be used, and we’ll get to them as we move along.

As sed reads the file, it copies each line into its pattern space. This pattern space is a special buffer that sed uses to hold a line of text as it processes it. As soon as it has finished reading the line, sed begins to apply the changes to the pattern space based on the edit description.

Keep in mind that even though sed will read every line into the pattern space, it will only make changes to lines that match the addresses specified, and it does not print any warnings when a line is skipped. In general, sed either silently ignores errors or terminates abruptly with an error message as a result of a syntax error, not because there were no matches. If there are no lines that contain the pattern, no lines match, and the edit commands are not carried out.
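A quick way to see this silent behavior is to run a substitution whose pattern never occurs. The sample input below is invented for the illustration.

```shell
# No line contains "zebra", so nothing is changed and nothing is reported.
result=$(printf 'fred\nmike\n' | sed 's/zebra/lion/')
status=$?

echo "$result"
echo "exit status: $status"
```

The input comes out unchanged and sed still exits successfully, so a script cannot rely on the exit status alone to tell whether an edit actually took place.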

Because you can have multiple changes on any given line, sed will carry them each out in turn. When there are no more changes to be made, sed sends the result to its output. The next line is read in and the whole process starts over. As it reads in each line, sed will increment an internal line counter, which keeps track of the total number of lines read, not lines per file. This is an important distinction if you have multiple files that are being read. For example, if you had two 50-line files, from sed’s perspective, line 60 would be the tenth line in the second file.
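The two-file case can be checked with a short sketch. The file names first.txt and second.txt and their contents are made up for the example.

```shell
workdir=$(mktemp -d)

# Build two 50-line files whose lines say where they came from.
i=1
while [ "$i" -le 50 ]; do
    echo "first file, line $i"  >> "$workdir/first.txt"
    echo "second file, line $i" >> "$workdir/second.txt"
    i=$((i + 1))
done

# Address 60 is counted across both files, not per file.
result=$(sed -n '60p' "$workdir/first.txt" "$workdir/second.txt")
echo "$result"

rm -r "$workdir"
```

The line printed is the tenth line of the second file, because the counter was already at 50 when sed started reading it.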

Each sed command can have 0, 1, or 2 addresses. A command with no addresses specified is applied to every line in the input. A command with one address is applied to all lines that match that address. For example:

/mike/s/fred/john/

substitutes “john” for the first instance of “fred,” but only on those lines containing “mike.” A command with two addresses is applied to the first line that matches the first address, then to all subsequent lines until a match for the second address is processed. After that, sed again tries to match the first address on subsequent lines, and the process is repeated. The two addresses are separated by a comma.
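Both forms can be tried on a few lines of sample text. The input and the START and END markers below are assumptions made up for this sketch.

```shell
input='fred knows mike
fred is alone
mike met fred and fred'

# One address: only lines containing "mike" are edited, and only the
# first "fred" on each of those lines is replaced.
one=$(printf '%s\n' "$input" | sed '/mike/s/fred/john/')
echo "$one"

# Two addresses: edit from the line matching START through the line matching END.
two=$(printf 'fred\nSTART\nfred\nEND\nfred\n' | sed '/START/,/END/s/fred/john/')
echo "$two"
```

In the second command, the “fred” lines before START and after END are left alone because they fall outside the range.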

For example

50,100s/fred/john/

substitutes “john” for the first instance of “fred” on each line from line 50 to line 100, inclusive. (Older versions of sed may require that there be no space between the second address and the s command; current implementations such as GNU sed accept one.) If an address is followed by an exclamation mark (!), the command is applied only to lines that do not match the address. For example

50,100!s/fred/john/

substitutes “john” for the first instance of “fred” on every line except lines 50 to 100, inclusive.
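The effect of the exclamation mark is easier to see on a short input, so this sketch uses lines 2 to 3 of a four-line sample instead of lines 50 to 100. The sample text is invented.

```shell
input=$(printf 'fred 1\nfred 2\nfred 3\nfred 4')

# Edit only lines 2 through 3.
inside=$(printf '%s\n' "$input" | sed '2,3s/fred/john/')
echo "$inside"

# Edit every line except lines 2 through 3.
outside=$(printf '%s\n' "$input" | sed '2,3!s/fred/john/')
echo "$outside"
```

The two results are mirror images: every line changed by the first command is left alone by the second, and vice versa.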

Also, sed can be told to do input and output based on what it finds. The action it should perform is identified by an argument at the end of the sed command. For example, if we wanted to print out lines 5-10 of a specific file, the sed command would be

cat file | sed -n 5,10p

The -n is necessary so that every line isn’t output in addition to the lines that match.
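The difference that -n makes can be seen in a small sketch. The numbered sample lines are generated with printf purely for illustration.

```shell
# With -n, only the lines selected by the p command are printed.
result=$(printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 | sed -n '5,10p')
echo "$result"

# Without -n, every line is printed once automatically, and the p command
# prints the selected line a second time.
doubled=$(printf 'line %s\n' 1 2 3 | sed '2p')
echo "$doubled"
```

In the second command, line 2 appears twice in the output, which is exactly the duplication that -n suppresses.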

Remember the script we created in the first section of this chapter, where we wanted just lines 5-10 of every file. Now that we know how to use sed, we can change the script to be a lot more efficient. It would now look like this:

find ./letters/taxes -print | while read FILE
do
    echo "$FILE"
    cat "$FILE" | sed -n 5,10p
done

Rather than sending the file through head and then the output through tail, we send the whole file through sed. It can keep track of which line is line 1, and then print the necessary lines.

In addition, sed allows you to write lines that match. For example, if we wanted all the comments in a shell script to be output to a file, we could use sed like this:

cat filename | sed -n '/^#/w comments'

Note that there must be exactly one space between the w and the name of the file, and that the output file should not be the file sed is reading. If we wanted to read in a file, we could do that as well. Instead of a w to write, we could use an r to read. The contents of the file would be appended after the lines specified in the address. Also keep in mind that writing to or reading from a file is independent of what happens next. For example, if we write every line containing the name “John” to a file, but in a subsequent sed command change “John” to “Chris,” the file would still contain “John.” This is logical because sed works on each line in turn, and the lines are already in the file before the changes are made.
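The following sketch shows both commands at work. The file names matches and greeting, the temporary directory, and the sample text are all made up for the example.

```shell
workdir=$(mktemp -d)
printf 'Hello\n' > "$workdir/greeting"

# The w command saves each matching line as it is at that moment; the later
# substitution only affects what is sent to stdout.
result=$(printf 'John was here\nnobody else\n' |
    sed -e "/John/w $workdir/matches" -e 's/John/Chris/')
saved=$(cat "$workdir/matches")
echo "stdout: $result"
echo "file:   $saved"

# The r command adds the contents of a file after each matching line.
merged=$(printf 'top\nbottom\n' | sed "/top/r $workdir/greeting")
echo "$merged"

rm -r "$workdir"
```

The saved file keeps the name “John” while stdout shows “Chris,” which matches the behavior described above.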

Keep in mind that every time a line is read in, the contents of the pattern space are overwritten. To save certain data across multiple commands, sed provides what is called the “hold space.” Changes are not made to the hold space directly; rather, the contents of either space can be copied into the other for processing. The contents can even be exchanged, if needed. The table below contains a list of the more common sed commands, including the commands used to manipulate the hold and pattern spaces.

Table: sed Commands

Command  Description
a        append text after the current line
b        branch to a label
c        change (replace) the selected lines with new text
d        delete the pattern space
D        delete all the characters from the start of the pattern space up to and including the first newline
g        overwrite the pattern space with the hold space
G        append the hold space to the pattern space, separated by a newline
h        overwrite the hold space with the pattern space
H        append the pattern space to the hold space, separated by a newline
i        insert text before the current line
l        list the contents of the pattern space, showing non-printing characters
n        replace the pattern space with the next line of input
N        append the next input line to the pattern space, separated by a newline
p        print the pattern space
P        print from the start of the pattern space up to and including the first newline
r        read in a file
s        substitute patterns
t        branch only if a substitution has been made to the current pattern space
w        write the pattern space to a file
x        exchange the contents of the pattern space and the hold space

Note: No command takes more than two addresses.
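As a rough sketch of how the hold space and the multi-line commands fit together, the two one-liners below reverse the order of the input lines and join pairs of lines. Both are common idioms rather than anything specific to this chapter, and the sample input is invented.

```shell
# Reverse the order of the lines using the hold space:
#   1!G  on every line but the first, append the hold space to the pattern space
#   h    copy the pattern space (this line plus all earlier ones) to the hold space
#   $p   on the last line, print the accumulated result
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')
echo "$reversed"

# Join each pair of lines: N appends the next input line to the pattern
# space, and s then replaces the embedded newline with a space.
joined=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')
echo "$joined"
```

The second example uses an even number of input lines on purpose, because versions of sed differ in what they do when N finds no next line to read.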