Pipes and Redirection
Perhaps the most commonly used character is “|”, which is referred to as the
pipe symbol, or simply pipe.
This enables you to pass the output of one command
to the input of another. For example, say you would like to do a long
directory listing of the /bin directory. If you type
ls -l /bin and then press Enter,
the names flash by much too fast for you to read. When the display finally
stops, all you see is the last twenty entries or so.
If instead we run the command
ls -l /bin | more
the output of the ls command is said to be
“piped through more”. The process is called “piping”.
In this way, we can scan through the list a screenful at a time.
In our discussion of standard input
and standard output in the section on
basic operating system concepts,
we described standard input
as just a file that usually points to your terminal.
Standard output
is likewise a file that usually points to your terminal.
With a pipe, the standard output
of the ls command is changed to point to the
pipe, and the standard input
of the more command is changed to point to the pipe as well.
The way this works is that when the shell
sees the pipe symbol, it asks the kernel to create a pipe. On current Linux systems this is
a buffer in kernel memory rather than a file on the hard disk (some older UNIX implementations did use disk space for pipes). The pipe has no name or directory
entry. Because both the terminal
and the pipe
are seen as files from the perspective of the operating system,
we are saying that the respective commands should use different files instead of the usual standard input and standard output.
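As a small non-interactive sketch of the same mechanism, the commands below connect printf to sort and ls to wc. The sample words are made up for illustration, and wc -l (which counts lines) stands in for more so that nothing waits for a key press.

```shell
# printf writes three lines to its standard output, which the shell has
# connected to the pipe; sort reads them from its standard input.
sorted=$(printf 'pear\napple\nfig\n' | sort)
echo "$sorted"

# The listing of /bin is never stored in a named file;
# it only passes through the pipe on its way to wc.
entries=$(ls /bin | wc -l)
echo "entries in /bin: $entries"
```

Neither command on either side of the pipe needs to know that it is talking to a pipe rather than to the terminal.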
When you log in and are working from the command line,
standard input
is taken from your terminal
keyboard and both standard output and standard error
are sent to your terminal screen. In other words, the
shell expects to be getting its input from the keyboard and showing the output
(and any error messages) on the terminal screen. This could be a
physical terminal directly connected to the machine or a pseudo-terminal when you connect from a remote machine.
Actually, the three (standard input, standard output,
and standard error)
are references to files that the shell
automatically opens. Remember that in
UNIX, everything is treated as a file. When the shell
starts, the three files it
opens are usually the ones pointing to your terminal.
When we run a command like cat, it gets input from a
file that it displays
to the screen. Although it may appear that the standard input
is coming from that file, the standard input
(referred to as stdin) is still the keyboard. This
is why when the file is large enough and you are using something like
more to display
the file one screen at a time and it stops after each page, you can
continue by pressing either the Spacebar or Enter key. That’s because
standard input is still the keyboard.
As it is running, more is displaying the contents of the file to the screen.
That is, it is going to standard output
(stdout). If you try to do a more on a file that does not exist, the message
file_name: No such file or directory
shows up on your terminal
screen as well. However, although it appears to be
in the same place, the error message
was written to standard error
(stderr). (I’ll show how this differs shortly.)
When you start a command, these three standard files are inherited by the new process. So, this command will take input from the keyboard and send output and error messages
to the terminal screen. When you input a command line containing a
pipe, the system knows to change these for the respective commands.
Writing to and reading from pipes is not the only way the system can change stdin, stdout and stderr. You can also use real files for redirection. This is done quite often using a pair of characters: < and >.
The more common of the two, “>,” redirects the
output of a command into a file. That is, it changes
standard output.
An example of this would be ls /bin > myfile. If we were to run this command, we would
have a file (in the current directory) named myfile that contained the output of
the ls /bin command. This is because stdout
for that command is the file myfile and not the
terminal. The redirection applies only to that one command; for the next command, stdout
is the terminal again.
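A minimal sketch of this behavior follows. It assumes a scratch directory created with mktemp so that no existing file named myfile is overwritten; the variable names are chosen only for illustration.

```shell
# Work in a scratch directory so no existing file is overwritten.
tmpdir=$(mktemp -d)

# stdout of this one ls command is the file, so nothing appears on screen.
ls /bin > "$tmpdir/myfile"

# This command has no redirection, so its stdout is the terminal.
echo "listing saved"

# Compare the saved listing with a fresh one.
saved=$(cat "$tmpdir/myfile")
fresh=$(ls /bin)
rm -rf "$tmpdir"
```

The file holds exactly what ls would otherwise have written to the terminal.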
What this looks like graphically, we see in the figure below.
Now, we want to see the contents of the file. We could simply say
more myfile, but that wouldn’t demonstrate redirection. Instead, we input
more <myfile
This tells the more command to take its standard input
from the file myfile instead of from the keyboard or some other file. (Remember, even when
stdin is the keyboard, it is still seen as a file.)
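The difference between naming a file as an argument and redirecting standard input can be made visible with wc -l, used here instead of more because it does not wait for a key press. This is only a sketch; the scratch directory and the three sample lines are assumptions for the example.

```shell
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/myfile"

# wc opens the file itself, so it knows the name and prints it.
by_name=$(wc -l "$tmpdir/myfile")

# The shell opens the file and hands it to wc as standard input;
# wc never sees a file name, so it prints only the count.
by_stdin=$(wc -l < "$tmpdir/myfile")

echo "as an argument:    $by_name"
echo "as standard input: $by_stdin"
rm -rf "$tmpdir"
```

In the second form the command simply reads its standard input, unaware that it is a file rather than the keyboard.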
What about errors? As I mentioned, stderr
appears to be going to
the same place as stdout.
A quick way of showing that it doesn’t is by using
output redirection
and forcing an error. If we wanted to list two directories and
have the output go to a file, we would run this command:
ls /bin /jimmo > /tmp/junk
We then get a message like this (the exact wording depends on your version of ls):
/jimmo not found
However, if we look in /tmp, there is indeed a file called junk that
contains the output of the ls /bin portion of the command. What happened here
was that we redirected stdout
into the file /tmp/junk. It did this with the
listing of /bin. However, because there was no directory /jimmo (at least not on
my system), we got the error /jimmo not found. In other words, stdout
went into the file, but stderr
still went to the screen.
If we want to get the output and any error messages to go to the same place,
we can do that. Using the same example with ls, the command would be:
ls /bin /jimmo > /tmp/junk 2>&1
The new part of the command is 2>&1, which says that file descriptor
2 (stderr) should go to the same place as file descriptor
1 (stdout). By changing the command slightly like this:
ls /bin /jimmo > /tmp/junk 2>/tmp/errors
we can tell the shell
to send any errors someplace else. In shell
scripts throughout the system you will quite often find that error messages
are sent to /dev/null. This has the effect of ignoring the messages
completely. They are neither displayed on the screen nor saved in a file.
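The three variants can be tried side by side. In this sketch, report is a hypothetical shell function standing in for a command such as ls that writes to both stdout and stderr, and the file names live in a scratch directory rather than in /tmp directly.

```shell
# Hypothetical stand-in for a command that produces both kinds of output.
report() {
    echo "normal output"
    echo "error message" >&2
}

tmpdir=$(mktemp -d)

# stdout and stderr into the same file
report > "$tmpdir/both" 2>&1

# stdout and stderr into separate files
report > "$tmpdir/out" 2> "$tmpdir/errors"

# keep stdout, throw the error messages away
report > "$tmpdir/quiet" 2> /dev/null

both=$(cat "$tmpdir/both")
out=$(cat "$tmpdir/out")
errors=$(cat "$tmpdir/errors")
quiet=$(cat "$tmpdir/quiet")
rm -rf "$tmpdir"

echo "both:   $both"
echo "out:    $out"
echo "errors: $errors"
```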
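Note that this command does not work as you might expect:
ls /bin /jimmo 2>&1 > /tmp/junk
The shell processes redirections from left to right. Here, 2>&1 first points stderr at whatever stdout is at that moment (the terminal), and only then is stdout redirected into /tmp/junk. The error messages therefore still appear on the screen.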
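A small experiment shows the effect of the order. As an assumption for this sketch, emit is a hypothetical function that writes one line to each stream, and a command substitution plays the role of the screen so that the result can be inspected.

```shell
# Hypothetical stand-in for a command that writes to stdout and stderr.
emit() {
    echo "normal output"
    echo "error message" >&2
}

tmpdir=$(mktemp -d)

# Intended order: stdout goes to the file first, then stderr follows it.
emit > "$tmpdir/right" 2>&1

# Reversed order: stderr is pointed at the current stdout (here the
# command substitution, playing the role of the screen), and only
# afterwards is stdout moved to the file.
on_screen=$(emit 2>&1 > "$tmpdir/wrong")

right=$(cat "$tmpdir/right")
wrong=$(cat "$tmpdir/wrong")
rm -rf "$tmpdir"

echo "reached the screen: $on_screen"
echo "file from reversed order: $wrong"
```

With the reversed order, only the normal output lands in the file, while the error message goes wherever stdout pointed before the redirection.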
Redirection can also be combined with pipes like this:
sort < names | head
or
ps | grep sh > ps.save
In the first example, the standard input
of the sort command is redirected
to point to the file names. Its output is then passed to the pipe.
The standard
input of the head command (which prints the first ten lines) comes from the
pipe. This gives the same result as the command
sort names | head
which we see here:
In the second example, the output of the ps command (process status) is piped through grep, and the output of grep is redirected to the file ps.save.
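Both combinations can be sketched with a small sample file. The three names and the scratch directory are assumptions for the example, and printf piped through sort and grep stands in for ps so that the result does not depend on which processes happen to be running.

```shell
tmpdir=$(mktemp -d)
printf 'walter\nanna\nmaria\n' > "$tmpdir/names"

# Input redirection feeding a pipe: the shell opens names for sort.
via_redirect=$(sort < "$tmpdir/names" | head -n 2)

# Here sort opens the file itself; the result is the same.
via_argument=$(sort "$tmpdir/names" | head -n 2)

# A pipe followed by output redirection: only the output of grep,
# the last command in the pipe, lands in the file.
sort < "$tmpdir/names" | grep ma > "$tmpdir/matches"
matches=$(cat "$tmpdir/matches")
rm -rf "$tmpdir"

echo "$via_redirect"
echo "matches: $matches"
```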
If we want to redirect stderr, we can. The syntax is similar:
command 2> file
It’s possible to input multiple commands on the same command line.
This can be accomplished by using a semicolon (;) between commands. I have used this on
occasion to create command lines like this:
man bash | col -b > man.tmp; vi man.tmp; rm man.tmp
This command redirects the output of the man-page
for bash into the file man.tmp. (The pipe
through col -b is necessary because of the way the man-pages
are formatted.) Next, we are brought into the vi editor with the file man.tmp.
After I exit vi, the command continues and removes my temporary file man.tmp.
(After about the third time of doing this, it got pretty monotonous, so I
created a shell script to do this for me. I’ll talk more about shell scripts
later.)
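The same create, use, and remove pattern can be sketched without an interactive editor. In this example the temporary file comes from mktemp, and counting lines with wc -l stands in for the editing step; both are assumptions made to keep the sketch non-interactive.

```shell
tmpfile=$(mktemp)

# Three commands on one line, run strictly one after another:
# write the file, count its lines, remove it.
printf 'a\nb\n' > "$tmpfile"; count=$(wc -l < "$tmpfile"); rm "$tmpfile"

echo "lines counted before removal: $count"
```

Each command starts only after the previous one has finished, which is why the file still exists when it is read and is gone afterwards.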
Sometimes it is useful not only to redirect the output of a command to a file, but also to monitor the output at the same time. One way would be to use two screens. On the first, you redirect the output of the command to a file. On the second, you read the output file as it is being written by using tail -f.
Alternatively, you can use the tee command. As its name implies, the tee command creates a “T” whereby the output is sent to the file specified as well as to standard output. For example:
sort input_file | tee output_file
This command would send the output of sort to the file output_file and to the screen (i.e., standard output).
Note that because the output of tee is sent to standard output, it is possible to send the output of tee into a second pipe. In fact, you can use tee multiple times with multiple files:
sort input_file | tee /tmp/output_file1 | tee /tmp/output_file2 | tee /tmp/output_file3
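A self-contained sketch of such a chain is shown below. The three input lines, the file names copy1 and copy2, and the scratch directory are assumptions for the example; the final wc -l shows that the data is still available on standard output after passing through both tee commands.

```shell
tmpdir=$(mktemp -d)

# Each tee writes a copy to its file and passes the data on; the final
# stage (wc -l) reads what the second tee forwarded.
count=$(printf 'c\na\nb\n' | sort | tee "$tmpdir/copy1" | tee "$tmpdir/copy2" | wc -l)

copy1=$(cat "$tmpdir/copy1")
copy2=$(cat "$tmpdir/copy2")
rm -rf "$tmpdir"

echo "lines that reached the end of the pipe: $count"
echo "$copy1"
```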