Pipes

Pipes

The common Linux shells all allow redirection. For example

$ ls | pr | lpr

pipes the output from the ls command listing the directory’s files into the standard input of the pr command which paginates them. Finally the standard output from the pr command is piped into the standard input of the lpr command which prints the results on the default printer. Pipes then are unidirectional byte streams which connect the standard output from one process into the standard input of another process. Neither process is aware of this redirection and behaves just as it would normally. It is the shell that sets up these temporary pipes between the processes.


Figure: Pipes

In Linux, a pipe is implemented using two file data structures which both point at the same temporary VFS inode which, in turn, points at a physical page within memory. Figure  5.1 shows that each file data structure contains pointers to different file operation routine vectors: one for writing to the pipe, the other for reading from the pipe.

This hides the underlying differences from the generic system calls which read and write to ordinary files. As the writing process writes to the pipe, bytes are copied into the shared data page and when the reading process reads from the pipe, bytes are copied from the shared data page. Linux must synchronize access to the pipe. It must make sure that the reader and the writer of the pipe are in step and to do this it uses locks, wait queues and signals.

When the writer wants to write to the pipe it uses the standard write library functions. These all pass file descriptors that are indices into the process’ set of file data structures, each one representing an open file or, as in this case, an open pipe. The Linux system call uses the write routine pointed at by the file data structure describing this pipe. That write routine uses information held in the VFS inode representing the pipe to manage the write request.

If there is enough room to write all of the bytes into the pipe and, so long as the pipe is not locked by its reader, Linux locks it for the writer and copies the bytes to be written from the process’ address space into the shared data page. If the pipe is locked by the reader or if there is not enough room for the data then the current process is made to sleep on the pipe inode’s wait queue and the scheduler is called so that another process can run. It is interruptible, so it can receive signals and it will be awakened by the reader when there is enough room for the write data or when the pipe is unlocked. When the data has been written, the pipe’s VFS inode is unlocked and any waiting readers sleeping on the inode’s wait queue will themselves be awakened.

Reading data from the pipe is a very similar process to writing to it.

Processes are allowed to do non-blocking reads (depending on the mode in which they opened the file or pipe and if there is no data to be read or if the pipe is locked, an error will be returned,as in this case). This means that the process can continue to run. The alternative is to wait on the pipe inode’s wait queue until the write process has finished. When both processes have finished with the pipe, the pipe inode is discarded along with the shared data page.

Linux also supports named pipes, also known as FIFOs because pipes operate on a First In, First Out principle. The first data written into the pipe is the first data read from the pipe. Unlike pipes, FIFOs are not temporary objects, they are entities in the file system and can be created using the mkfifo command. Processes are free to use a FIFO so long as they have appropriate access rights to it. The way that FIFOs are opened is a little different from pipes. A pipe (its two file data structures, its VFS inode and the shared data page) is created in one go whereas a FIFO already exists and is opened and closed by its users. Linux must handle readers opening the FIFO before writers open it as well as readers reading before any writers have written to it. That aside, FIFOs are handled almost exactly the same way as pipes and they use the same data structures and operations.