Linux Processes

Linux Processes

So that Linux can manage the processes in the system, each process is represented by a task_struct data structure (task and process are terms that Linux uses interchangeably). The task vector is an array of pointers to every task_struct data structure in the system.

This means that the maximum number of processes in the system is limited by the size of the task vector; by default it has 512 entries. As processes are created, a new task_struct is allocated from system memory and added into the task vector. To make it easy to find, the current, running, process is pointed to by the current pointer.

As well as the normal type of process, Linux supports real time processes. These processes have to react very quickly to external events (hence the term “real time”) and they are treated differently from normal user processes by the scheduler. Although the task_struct data structure is quite large and complex, but its fields can be divided into a number of functional areas:

State
As a process executes it changes state according to its circumstances. Linux processes have the following states: 1
Running
The process is either running (it is the current process in the system) or it is ready to run (it is waiting to be assigned to one of the system’s CPUs).
Waiting
The process is waiting for an event or for a resource. Linux differentiates between two types of waiting process; interruptible and uninterruptible. Interruptible waiting processes can be interrupted by signals whereas uninterruptible waiting processes are waiting directly on hardware conditions and cannot be interrupted under any circumstances.
Stopped
The process has been stopped, usually by receiving a signal. A process that is being debugged can be in a stopped state.
Zombie
This is a halted process which, for some reason, still has a task_struct data structure in the task vector. It is what it sounds like, a dead process.

Scheduling Information
The scheduler needs this information in order to fairly decide which process in the system most deserves to run,

Identifiers
Every process in the system has a process identifier. The process identifier is not an index into the task vector, it is simply a number. Each process also has User and group identifiers, these are used to control this processes access to the files and devices in the system,

Inter-Process Communication
Linux supports the classic Unix TM IPC mechanisms of signals, pipes and semaphores and also the System V IPC mechanisms of shared memory, semaphores and message queues. The IPC mechanisms supported by Linux are described in the section on IPC.

Links
In a Linux system no process is independent of any other process. Every process in the system, except the initial process has a parent process. New processes are not created, they are copied, or rather cloned from previous processes. Every task_struct representing a process keeps pointers to its parent process and to its siblings (those processes with the same parent process) as well as to its own child processes. You can see the family relationship between the running processes in a Linux system using the pstree command:
init(1)-+-crond(98)
        |-emacs(387)
        |-gpm(146)
        |-inetd(110)
        |-kerneld(18)
        |-kflushd(2)
        |-klogd(87)
        |-kswapd(3)
        |-login(160)---bash(192)---emacs(225)
        |-lpd(121)
        |-mingetty(161)
        |-mingetty(162)
        |-mingetty(163)
        |-mingetty(164)
        |-login(403)---bash(404)---pstree(594)
        |-sendmail(134)
        |-syslogd(78)
        `-update(166)
Additionally all of the processes in the system are held in a doubly linked list whose root is the init processes task_struct data structure. This list allows the Linux kernel to look at every process in the system. It needs to do this to provide support for commands such as ps or kill.

Times and Timers
The kernel keeps track of a processes creation time as well as the CPU time that it consumes during its lifetime. Each clock tick, the kernel updates the amount of time in jiffies that the current process has spent in system and in user mode. Linux also supports process specific interval timers, processes can use system calls to set up timers to send signals to themselves when the timers expire. These timers can be single-shot or periodic timers.

File system
Processes can open and close files as they wish and the processes task_struct contains pointers to descriptors for each open file as well as pointers to two VFS inodes. Each VFS inode uniquely describes a file or directory within a file system and also provides a uniform interface to the underlying file systems. How file systems are supported under Linux is described in the section on the filesystems. The first is to the root of the process (its home directory) and the second is to its current or pwd directory. pwd is derived from the Unix TM command pwd, print working directory. These two VFS inodes have their count fields incremented to show that one or more processes are referencing them. This is why you cannot delete the directory that a process has as its pwd directory set to, or for that matter one of its sub-directories.

Virtual memory
Most processes have some virtual memory (kernel threads and daemons do not) and the Linux kernel must track how that virtual memory is mapped onto the system’s physical memory.

Processor Specific Context
A process could be thought of as the sum total of the system’s current state. Whenever a process is running it is using the processor’s registers, stacks and so on. This is the processes context and, when a process is suspended, all of that CPU specific context must be saved in the task_struct for the process. When a process is restarted by the scheduler its context is restored from here.