Normally, processes are asleep, waiting on some event. When that event happens, these processes are called into action. Remember, it is the responsibility of the sched process to free memory when a process runs short of it. So, it is not until memory is needed that sched starts up.
How does sched know that memory is needed? When a process makes reference to a place in its virtual memory space that does not yet exist in physical memory, a page fault occurs. Faults belong to a group of system events called exceptions. An exception is simply something that occurs outside of what is normally expected. Faults (exceptions) can occur either before or during the execution of an instruction.
For example, if an instruction that is not yet in memory needs to be read, the exception (page fault) occurs before the instruction starts being executed. On the other hand, if the instruction is supposed to read data from a virtual memory location that isn’t in physical memory, the exception occurs during the execution of the instruction. In cases like these, once the missing memory location is loaded into physical memory, the CPU can restart the instruction.
Traps are exceptions that occur after an instruction has been executed. For example, attempting to divide by zero generates an exception. However, in this case it doesn’t make sense to restart the instruction because every time we try to run that instruction, it still comes up with a Divide-by-Zero exception. That is, all memory references have already been read before the instruction starts to execute.
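The difference between an exception that is worth restarting and one that is not can be shown with a small toy model. This is only a sketch in Python, not kernel code: the names `PageFault`, `physical`, `backing_store`, and `run_with_restart` are made up for illustration, and real paging is done by the CPU and kernel together.

```python
class PageFault(Exception):
    """Raised when a virtual page is not present in physical memory."""
    def __init__(self, page):
        super().__init__(f"page {page} not present")
        self.page = page

# Toy "physical memory": page number -> contents. Page 2 starts out absent.
physical = {0: "code", 1: "stack"}
# Toy backing store (for example, disk) holding the pages not yet loaded.
backing_store = {2: "data"}

def read_page(page):
    """An 'instruction' that reads from a page and faults if it is absent."""
    if page not in physical:
        raise PageFault(page)
    return physical[page]

def divide(a, b):
    """An 'instruction' that fails the same way every time if b is zero."""
    return a // b

def run_with_restart(instruction, *args):
    """Run an instruction; on a page fault, load the page and restart it."""
    faults = 0
    while True:
        try:
            return instruction(*args), faults
        except PageFault as fault:
            physical[fault.page] = backing_store[fault.page]  # "page in"
            faults += 1

value, faults = run_with_restart(read_page, 2)
print(value, faults)  # data 1

try:
    run_with_restart(divide, 1, 0)
except ZeroDivisionError:
    # Restarting would only raise the same exception again, so we give up.
    print("divide error: not restarted")
```

The page fault is repaired and the read succeeds on the second attempt, whereas the divide error is passed straight through because nothing the system could load would change the result.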
It is also possible for processes to generate exceptions intentionally. These programmed exceptions are called software interrupts.
When any one of these exceptions occurs, the system must react to the exception. To react, the system will usually switch to another process to deal with the exception, which means a context switch. In our discussion of process scheduling, I mentioned that at every clock tick the priority of every process is recalculated. To make those calculations, something other than those processes has to run.
In Linux, the system timer (or clock) is programmed to generate a hardware interrupt 100 times a second (as defined by the HZ system parameter). The interrupt is accomplished by sending a signal to a special chip on the motherboard called an interrupt controller. (We go into more detail about these in the section on hardware.) The interrupt controller then sends an interrupt to the CPU. When the CPU receives this signal, it knows that the clock tick has occurred and it jumps to a special part of the kernel that handles the clock interrupt. Scheduling priorities are also recalculated within this same section of code.
Because the system might be doing something more important when the clock generates an interrupt, interrupts can be turned off, or “masked out.” Interrupts that can be masked out are called maskable interrupts. An example of something more important than the clock would be accepting input from the keyboard. This is why clock ticks are lost on systems with a lot of users inputting a lot of data. As a result, the system clock appears to slow down over time.
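To see how lost ticks make the clock run slow, here is a small arithmetic sketch. It assumes the simplest possible model, in which a tick that arrives while the clock interrupt is masked is simply dropped. The class `ToyClock` and the "1 tick in 50" loss rate are invented for the example and do not describe any real system.

```python
HZ = 100  # timer interrupts per second, as described in the text

class ToyClock:
    def __init__(self):
        self.jiffies = 0      # ticks the kernel actually counted
        self.masked = False   # True while the clock interrupt is masked out

    def hardware_tick(self):
        """Called once for every timer interrupt the hardware raises."""
        if self.masked:
            return            # in this toy model a masked tick is lost
        self.jiffies += 1

    def seconds(self):
        """Elapsed time as the kernel sees it."""
        return self.jiffies / HZ

clock = ToyClock()
for tick in range(10 * HZ):          # 10 real seconds of timer interrupts
    clock.masked = (tick % 50 == 0)  # hypothetical: 1 tick in 50 arrives masked
    clock.hardware_tick()

print(clock.jiffies)    # 980
print(clock.seconds())  # 9.8
```

After 10 real seconds the kernel has counted only 980 of the 1000 ticks, so its idea of the time is already 0.2 seconds behind.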
Sometimes events occur on the system that you want to know about no matter what. Imagine what would happen if memory was bad. If the system was in the middle of writing to the hard disk when it encountered the bad memory, the results could be disastrous. If the system recognizes the bad memory, the hardware generates an interrupt to alert the CPU. If the CPU is told to ignore all hardware interrupts, it would ignore this one. Instead, the hardware has the ability to generate an interrupt that cannot be ignored, or “masked out”, called a non-maskable interrupt. Non-maskable interrupts are generically referred to as NMIs.
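The distinction between maskable and non-maskable interrupts can be modeled in a few lines. Again, this is only an illustration: `ToyCPU`, its `interrupts_enabled` flag, and the interrupt names are assumptions made for the sketch, not a description of how any particular CPU is programmed.

```python
class ToyCPU:
    def __init__(self):
        self.interrupts_enabled = True  # cleared to mask out interrupts
        self.delivered = []             # interrupts the CPU actually saw

    def raise_interrupt(self, name, maskable=True):
        """Deliver an interrupt unless it is maskable and currently masked."""
        if maskable and not self.interrupts_enabled:
            return False                # masked out: the CPU does not see it
        self.delivered.append(name)
        return True

cpu = ToyCPU()
cpu.interrupts_enabled = False          # tell the CPU to ignore interrupts

print(cpu.raise_interrupt("keyboard"))                      # False
print(cpu.raise_interrupt("bad memory", maskable=False))    # True
print(cpu.delivered)                                        # ['bad memory']
```

Even with interrupts turned off, the memory error still gets through, which is exactly the property the text describes for NMIs.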
When an interrupt or an exception occurs, it must be dealt with to ensure the integrity of the system. How the system reacts depends on whether it was an exception or interrupt. In addition, what happens when the hard disk generates an interrupt is going to be different than when the clock generates one.
Within the kernel is the Interrupt Descriptor Table (IDT), which is a list of descriptors (pointers) that point to the functions that handle the particular interrupt or exception. These functions are called the interrupt or exception handlers. When an interrupt or exception occurs, it has a particular value, called an identifier or vector. Table 0-2 contains a list of the defined interrupt vectors.
Table 0-2: Interrupt Vectors
| Identifier | Description |
|---|---|
| 7 | Coprocessor not available |
| 11 | Segment not present |
| 13 | General protection fault |
| 17 | Alignment error (80486) |
| 32-255 | External (HW) interrupts |
These numbers are actually indices into the IDT. When an interrupt, exception, or trap occurs, the system knows which number corresponds to that event. It then uses that number as an index into the IDT, which in turn points to the appropriate area of memory for handling the event.
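The idea of using the vector as an index into a table of handlers can be sketched as follows. The real IDT holds hardware-defined descriptors rather than Python functions, so treat this purely as a model; `make_handler` and `dispatch` are names invented for the example, and only the vectors listed in the table above are filled in.

```python
IDT_SIZE = 256  # vectors 0 through 255

def make_handler(description):
    """Build a stand-in handler that just reports what it was called for."""
    def handler(vector):
        return f"vector {vector}: {description}"
    return handler

# Start with a placeholder in every slot, then fill in the known vectors.
idt = [make_handler("not covered in this sketch")] * IDT_SIZE
idt[7] = make_handler("coprocessor not available")
idt[11] = make_handler("segment not present")
idt[13] = make_handler("general protection fault")
idt[17] = make_handler("alignment error")
for vector in range(32, 256):
    idt[vector] = make_handler("external (HW) interrupt")

def dispatch(vector):
    """Use the vector as an index into the table and call what it points to."""
    return idt[vector](vector)

print(dispatch(13))  # vector 13: general protection fault
print(dispatch(40))  # vector 40: external (HW) interrupt
```

No searching is involved: the number that identifies the event is itself the position of the handler in the table.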
It is possible for devices to share interrupts; that is, multiple devices on the system can be (and often are) configured to use the same interrupt. In fact, certain kinds of computers are designed to allow devices to share interrupts (I’ll talk about them in the hardware section). If the interrupt number is an offset into a table of pointers to interrupt routines, how does the kernel know which one to call?
As it turns out, there are two IDTs: one for shared interrupts and one for non-shared interrupts. During a kernel rebuild (
When an exception happens in user mode, the process passes through a trap gate. At this point, the CPU no longer uses the process’ user stack, but rather the system stack within that process’ task structure. (Each task structure has a portion set aside for the system stack.) At this point, that process is operating in system (kernel) mode; that is, at the highest privilege level, 0.
The kernel treats interrupts very similarly to the way it treats exceptions: all the general purpose registers are pushed onto the system stack and a common interrupt handler is called. The current interrupt priority is saved and the new priority is loaded. This prevents interrupts at lower priority levels from interrupting the kernel while it handles this interrupt. Then the real interrupt handler is called.
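The sequence just described (push the registers, save the current priority, load the new one, call the real handler, then undo it all) might be modeled like this. It is a rough sketch under my own assumptions: `ToyKernel`, `register`, and `common_interrupt` are hypothetical names, and the vector and priority numbers are arbitrary.

```python
class ToyKernel:
    def __init__(self):
        self.system_stack = []  # stands in for the per-process system stack
        self.priority = 0       # current interrupt priority level
        self.handlers = {}      # vector -> (priority, real handler)

    def register(self, vector, priority, handler):
        self.handlers[vector] = (priority, handler)

    def common_interrupt(self, vector, registers):
        """Common entry point that wraps the call to the real handler."""
        self.system_stack.append(dict(registers))  # push the registers
        saved_priority = self.priority             # save the old priority
        new_priority, handler = self.handlers[vector]
        self.priority = new_priority               # load the new priority
        try:
            result = handler(self)                 # call the real handler
        finally:
            self.priority = saved_priority         # restore the old priority
            restored = self.system_stack.pop()     # pop the registers
        return result, restored

def disk_handler(kernel):
    # While the real handler runs, the kernel sits at the raised priority.
    return f"disk handled at priority {kernel.priority}"

kernel = ToyKernel()
kernel.register(46, 5, disk_handler)  # hypothetical vector and priority
result, regs = kernel.common_interrupt(46, {"eax": 1, "ebx": 2})
print(result)           # disk handled at priority 5
print(regs)             # {'eax': 1, 'ebx': 2}
print(kernel.priority)  # 0
```

The handler sees the raised priority while it runs, and both the priority and the saved registers are back to their original values once it returns.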
Because an exception is not fatal, the process will return from whence it came. It is possible that a context switch occurs immediately on return from kernel mode. This might be the result of an exception with a lower priority. Because it could not interrupt the process in kernel mode, it had to wait until it returned to user mode. Because the exception has a higher priority than the process when it is in user mode, a context switch occurs immediately after the process returns to user mode.
It is abnormal for another exception to occur while the process is in kernel mode. Even a page fault can be considered a software event. Because the entire kernel is in memory all the time, a page fault should not happen. If a page fault does happen in kernel mode, the kernel panics. Special routines have been built into the kernel to deal with the panic to help the system shut down as gracefully as possible. Should something else happen to cause another exception while the system is trying to panic, a double panic occurs.
This may sound confusing because I just said that a context switch could occur as the result of another exception. What this means is that the exception occurred in user mode, so there must be a jump to kernel mode. This does not mean that the process continues in kernel mode until it is finished. It may (depending on what it is doing) be context-switched out. If another process has run before the first one gets its turn on the CPU again, that process may generate the exception.
Unlike exceptions, another interrupt could possibly occur while the kernel is handling the first one (and therefore is in kernel mode). If the second interrupt has a higher priority than the first, a context switch will occur and the new interrupt will be handled. If the second interrupt has the same or lower priority, then the kernel will “put it on hold.” These are not ignored, but rather saved (queued) to be dealt with later.
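The rule for a second interrupt (handle it at once if its priority is higher, otherwise queue it for later) can be tried out with a toy model. Everything here is an assumption made for illustration: the class `ToyInterruptController`, the `during` parameter used to script interrupts arriving mid-handler, and the priority numbers.

```python
import heapq

class ToyInterruptController:
    def __init__(self):
        self.current_priority = 0  # 0 means no interrupt is being handled
        self.pending = []          # queued interrupts, highest priority first
        self.handled = []          # names in the order handling completed
        self._seq = 0              # tie-breaker: keeps arrival order stable

    def raise_irq(self, name, priority, during=()):
        """Deliver an interrupt. `during` lists (name, priority) pairs that
        arrive while this interrupt's handler is still running."""
        if priority <= self.current_priority:
            # Same or lower priority: put it on hold instead of ignoring it.
            heapq.heappush(self.pending, (-priority, self._seq, name))
            self._seq += 1
            return
        saved = self.current_priority
        self.current_priority = priority
        for nested_name, nested_priority in during:
            self.raise_irq(nested_name, nested_priority)
        self.handled.append(name)      # this handler has now finished
        self.current_priority = saved
        self._drain()

    def _drain(self):
        """Run queued interrupts that outrank whatever is running now."""
        while self.pending and -self.pending[0][0] > self.current_priority:
            neg_priority, _, name = heapq.heappop(self.pending)
            self.raise_irq(name, -neg_priority)

pic = ToyInterruptController()
pic.raise_irq("disk", 5, during=[("clock", 7), ("serial", 3), ("disk2", 5)])
print(pic.handled)  # ['clock', 'disk', 'disk2', 'serial']
```

The higher-priority clock interrupt cuts in and finishes first. The serial and second disk interrupts are held until the first disk handler is done and are then serviced in priority order.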