Pipes and Redirection

Perhaps the most commonly used character is “|”, which is referred to as the pipe symbol, or simply pipe. This enables you to pass the output of one command to the input of another. For example, say you would like to do a long directory listing of the /bin directory. If you type ls -l /bin and then press Enter, the names flash by much too fast for you to read. When the display finally stops, all you see is the last twenty entries or so.

If instead we ran the command

ls -l /bin | more
the output of the ls command is said to be “piped through more”. The process is called “piping”. In this way, we can scan through the list a screenful at a time.

In our discussion of standard input and standard output in the section on basic operating system concepts, we talked about standard input as being just a file that usually points to your terminal. Standard output is likewise a file that usually points to your terminal. When a pipe is used, the standard output of the ls command is changed to point to the pipe, and the standard input of the more command is changed to point to the pipe as well.

The way this works is that when the shell sees the pipe symbol, it asks the kernel to create a pipe: a buffer that one process writes into and another process reads from. The pipe has no name or directory entry, and on Linux its contents are held in kernel memory rather than taking up space on the hard disk. Because both the terminal and the pipe are seen as files from the perspective of the operating system, we are saying that the respective commands should use different files instead of their usual standard input and standard output.
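To make the idea concrete, here is a minimal sketch of a two-stage pipeline, assuming a POSIX-style shell. The three fruit names are invented sample data, and the variable first_two exists only so the result can be inspected afterward.

```shell
# Generate three unsorted lines, sort them, and keep only the first two.
# No intermediate file is named anywhere: each "|" connects the standard
# output of the command on its left to the standard input of the command
# on its right.
first_two=$(printf 'cherry\napple\nbanana\n' | sort | head -n 2)

echo "$first_two"
# prints:
# apple
# banana
```

Neither sort nor head knows it is talking to a pipe. Each simply reads its standard input and writes its standard output.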

When you log in and are working from the command line, standard input is taken from your terminal keyboard and both standard output and standard error are sent to your terminal screen. In other words, the shell expects to be getting its input from the keyboard and showing the output (and any error messages) on the terminal screen. This could be a physical terminal directly connected to the machine or a pseudo-terminal when you connect from a remote machine.

Actually, the three (standard input, standard output, and standard error) are references to files that the shell automatically opens. Remember that in UNIX, everything is treated as a file. When the shell starts, the three files it opens are usually the ones pointing to your terminal.

When we run a command like cat, it gets input from a file that it displays to the screen. Although it may appear that the standard input is coming from that file, the standard input (referred to as stdin) is still the keyboard. This is why when the file is large enough and you are using something like more to display the file one screen at a time and it stops after each page, you can continue by pressing either the Spacebar or Enter key. That’s because standard input is still the keyboard.

As it is running, more is displaying the contents of the file to the screen. That is, it is going to standard output (stdout). If you try to do a more on a file that does not exist, the message

file_name: No such file or directory

shows up on your terminal screen as well. However, although it appears to be in the same place, the error message was written to standard error (stderr). (I’ll show how this differs shortly.)

When you start a command, these three standard files are inherited by the new process. So, this command will take input from the keyboard and send output and error messages to the terminal screen. When you input a command line containing a pipe, the system knows to change these for the respective commands.
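One way to watch this inheritance happen is to start a second shell with its output pointed somewhere other than the terminal. This is only a sketch: the scratch directory and the file names out and err are my own choices for illustration.

```shell
# Scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# The redirections are applied to the inner sh, which is a new process.
# It inherits those descriptors, and so does everything it runs, so the
# text lands in the files even though neither echo mentions a file.
sh -c 'echo "written by a child"; echo "child error" >&2' \
    > "$workdir/out" 2> "$workdir/err"

cat "$workdir/out"    # prints: written by a child
cat "$workdir/err"    # prints: child error
```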

Writing to and reading from pipes is not the only way the system can change stdin, stdout and stderr. You can also use real files for redirection. This is done quite often using a pair of characters: < and >. The more common of the two, “>,” redirects the output of a command into a file. That is, it changes standard output. An example of this would be ls /bin > myfile. If we were to run this command, we would have a file (in my current directory) named myfile that contained the output of the ls /bin command. This is because stdout is the file myfile and not the terminal. Once the command completes, stdout returns to being the terminal. What this looks like graphically, we see in the figure below.

Now, we want to see the contents of the file. We could simply say

more myfile
but that wouldn’t demonstrate redirection. Instead, we input

more <myfile

This tells the more command to take its standard input from the file myfile instead of from the keyboard or some other file. (Remember, even when stdin is the keyboard, it is still seen as a file.)
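Here is a short sketch that puts “>” and “<” side by side. I use wc -l instead of more so that the example does not wait for a keypress, and I work in a scratch directory; both choices are mine and not part of the example above.

```shell
# Work in a scratch directory so no existing file named myfile is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ">" points standard output at myfile for the duration of this one command.
ls /bin > myfile

# "<" points standard input at myfile. wc is given no file name argument;
# it just counts the lines that arrive on its standard input.
line_count=$(wc -l < myfile)
echo "myfile holds $line_count lines"
```

Because wc never sees the name myfile, it prints only the count. Run as wc -l myfile, it would print the file name as well.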

What about errors? As I mentioned, stderr appears to be going to the same place as stdout. A quick way of showing that it doesn’t is by using output redirection and forcing an error. If we wanted to list two directories and have the output go to a file, we would run this command:

ls /bin /jimmo > /tmp/junk

We then get a message like this (the exact wording depends on your version of ls):

/jimmo not found

However, if we look in /tmp, there is indeed a file called junk that contains the output of the ls /bin portion of the command. What happened here was that we redirected stdout into the file /tmp/junk. It did this with the listing of /bin. However, because there was no directory /jimmo (at least not on my system), we got the error /jimmo not found. In other words, stdout went into the file, but stderr still went to the screen.
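You can check this for yourself with a sketch like the following. The path /no_such_dir_example is a made-up stand-in for /jimmo, and I assume it does not exist on your system either. The variable ls_status is only there to record that ls reported a failure.

```shell
workdir=$(mktemp -d)

# Only stdout is redirected. The complaint about the missing directory
# is written to stderr, which still points at the terminal.
ls_status=0
ls /bin /no_such_dir_example > "$workdir/junk" || ls_status=$?

# The listing of /bin is in the file ...
head -n 3 "$workdir/junk"

# ... but the error text is not.
if grep -q 'no_such_dir_example' "$workdir/junk"; then
    echo "the error message was captured in the file"
else
    echo "the error message is not in the file"
fi
```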

If we want to get the output and any error messages to go to the same place, we can do that. Using the same example with ls, the command would be:

ls /bin /jimmo > /tmp/junk 2>&1

The new part of the command is 2>&1, which says that file descriptor 2 (stderr) should go to the same place as file descriptor 1 (stdout). By changing the command slightly like this:

ls /bin /jimmo > /tmp/junk 2>/tmp/errors

we can tell the shell to send any errors someplace else. You will find quite often in shell scripts throughout the system that the file that error messages are sent to is /dev/null. This has the effect of ignoring the messages completely. They are neither displayed on the screen nor sent to a file.
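The following sketch shows both variations: errors collected in their own file, and errors thrown away. As before, /no_such_dir_example is a hypothetical path that I assume does not exist, and the file names in the scratch directory are my own.

```shell
workdir=$(mktemp -d)

# stdout and stderr each get their own file.
# "|| true" just keeps the failing ls from stopping a script that aborts on errors.
ls /bin /no_such_dir_example > "$workdir/junk" 2> "$workdir/errors" || true

echo "The errors file contains:"
cat "$workdir/errors"

# Sending stderr to /dev/null discards the message. Since the only
# directory named here does not exist, nothing reaches stdout either,
# so the file quiet ends up empty.
ls /no_such_dir_example > "$workdir/quiet" 2> /dev/null || true
```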

Note that this command does not work as you would think:

ls /bin /jimmo 2>&1 > /tmp/junk

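The shell processes redirections from left to right, so this command sends stderr to the same place as stdout before we redirect stdout. At that moment stdout is still the terminal. So, stderr goes to the screen, but stdout goes to the file specified.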
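The ordering rule is easier to believe when you can see both outcomes. In this sketch, emit_both is a hypothetical helper function I made up that writes one line to each stream. The command substitution in the second case stands in for the terminal so that the stray line can be captured and examined.

```shell
# Hypothetical helper: one line to stdout, one line to stderr.
emit_both() {
    echo "to stdout"
    echo "to stderr" >&2
}

workdir=$(mktemp -d)

# Redirect stdout to the file first, then point stderr at wherever
# stdout now goes. Both lines end up in the file.
emit_both > "$workdir/both" 2>&1

# Reversed order: stderr is pointed at the ORIGINAL stdout (here, the
# command substitution), and only afterward is stdout moved to the file.
leaked=$(emit_both 2>&1 > "$workdir/only_out")

cat "$workdir/both"        # contains both lines
cat "$workdir/only_out"    # prints: to stdout
echo "$leaked"             # prints: to stderr
```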

Redirection can also be combined with pipes like this:

sort < names | head

or

ps | grep sh > ps.save

In the first example, the standard input of the sort command is redirected to point to the file names. Its output is then passed to the pipe. The standard input of the head command (which prints the first ten lines) comes from the pipe. The result is the same as that of the command

sort names | head

which we see here:

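In the second example, the output of the ps command (process status) is piped through grep, and the output of grep is redirected to the file ps.save. A redirection applies only to the command it is attached to, which here is the last one in the pipeline.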
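Here is a small sketch that combines input redirection, a pipe, and output redirection in one line. The contents of the names file are invented sample data, and I use head -n 2 rather than the default of ten lines to keep the result short.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data for the sketch.
printf 'walter\nalice\nmallory\nbob\n' > names

# sort reads names on its standard input, head reads from the pipe,
# and only the output of head (the last command) goes into first_two.
sort < names | head -n 2 > first_two

cat first_two
# prints:
# alice
# bob
```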

If we want to redirect stderr, we can. The syntax is similar:

command 2> file

It’s possible to input multiple commands on the same command line. This can be accomplished by using a semicolon (;) between commands. The commands are run one after the other, from left to right. I have used this on occasion to create command lines like this:

man bash | col -b > man.tmp; vi man.tmp; rm man.tmp

This command redirects the output of the man-page for bash into the file man.tmp. (The pipe through col -b is necessary because of the way the man-pages are formatted.) Next, we are brought into the vi editor with the file man.tmp. After I exit vi, the command continues and removes my temporary file man.tmp. (After about the third time of doing this, it got pretty monotonous, so I created a shell script to do this for me. I’ll talk more about shell scripts later.)
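The same create, use, and remove pattern can be sketched without an interactive editor. Here cat stands in for vi, and the temporary file comes from mktemp; both substitutions are mine, made so that the line runs unattended.

```shell
tmpfile=$(mktemp)

# Three commands on one line, run strictly one after another:
# write the file, read it back, then remove it.
echo "scratch data" > "$tmpfile"; contents=$(cat "$tmpfile"); rm "$tmpfile"

echo "$contents"    # prints: scratch data
```

Keep in mind that the semicolon runs the next command whether or not the previous one succeeded. If a later step should run only after a success, the shell's && operator is the usual alternative.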

Sometimes it is useful not only to redirect the output of a command to a file, but also to monitor the output at the same time. One way would be to use two screens. On the first, you redirect the output of the command to a file. On the second, you read the output file as it is being written by using tail -f.

Alternatively, you can use the tee command. As its name implies, the tee command creates a “T” whereby the output is sent to the file specified as well as to standard output. For example:

sort input_file | tee output_file

This command would send the output of the sort to the file output_file and to the screen (i.e., standard output).

Note that because the output of tee is sent to standard output, it is possible to send the output of tee to a second pipe. In fact, you can use tee multiple times with multiple files:

sort input_file | tee /tmp/output_file1 | tee /tmp/output_file2 | tee /tmp/output_file3
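A sketch of the chained form follows, using a scratch directory and invented sample data instead of files in /tmp. The variable shown captures what would otherwise appear on the screen.

```shell
workdir=$(mktemp -d)
printf '3\n1\n2\n' > "$workdir/input_file"

# Each tee writes a copy to its file and passes the same data along
# on its standard output.
shown=$(sort "$workdir/input_file" | tee "$workdir/copy1" | tee "$workdir/copy2")

echo "$shown"
# prints:
# 1
# 2
# 3

# tee also accepts several file names at once, which gives the same
# result as chaining.
sort "$workdir/input_file" | tee "$workdir/copy3" "$workdir/copy4" > /dev/null
```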