An Abstract Model of Virtual Memory
Figure: Abstract model of Virtual to Physical address mapping
Before considering the methods that Linux uses to support virtual memory it is useful to consider an abstract model that is not cluttered by too much detail.
As the processor executes a program it reads an instruction from memory and decodes it. In decoding the instruction, the processor may need to fetch or store the contents of a location in memory. The processor then executes the instruction and moves on to the next instruction in the program. In this way the processor is always accessing memory either to fetch instructions or to fetch and store data.
In a virtual memory system all of these addresses are virtual addresses and not physical addresses. These virtual addresses are converted into physical addresses by the processor based on information held in a set of tables maintained by the operating system.
To make this translation easier, virtual and physical memory are divided into conveniently sized chunks called pages. These pages are all the same size. They need not be, but if they were not, the system would be very hard to administer. Linux on Alpha AXP systems uses 8 KiB pages and on Intel x86 systems it uses 4 KiB pages. Each of these pages is given a unique number: the page frame number (PFN).
In this paged model, a virtual address is composed of two parts: an offset and a virtual page frame number. If the page size is 4 KiB, bits 0-11 of the virtual address contain the offset and bits 12 and above are the virtual page frame number. The processor extracts the virtual page frame number and offset from a virtual address every time it encounters one. Then it matches the virtual page frame number to a physical page and uses the offset to specify how far to go into the page. The processor uses page tables to match the virtual page frame number to the physical page.
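As a minimal sketch of this split for 4 KiB pages, the Python below separates an address into its two parts by shifting and masking. The names `PAGE_SHIFT`, `OFFSET_MASK` and `split_virtual_address` are illustrative and not taken from any kernel source.

```python
PAGE_SHIFT = 12                   # 4 KiB pages: 2**12 bytes
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1       # selects the low 12 bits (bits 0-11)


def split_virtual_address(vaddr):
    """Return (virtual page frame number, offset) for a virtual address."""
    vpfn = vaddr >> PAGE_SHIFT    # bits 12 and above
    offset = vaddr & OFFSET_MASK  # bits 0-11
    return vpfn, offset


vpfn, offset = split_virtual_address(0x12345)
print(hex(vpfn), hex(offset))     # prints: 0x12 0x345
```

With 8 KiB pages, as on Alpha AXP, only `PAGE_SHIFT` would change (to 13).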
The figure above shows the virtual address spaces of two processes, process X and process Y, each with their own page tables. These page tables map each process’ virtual pages into physical pages in memory. This shows that process X’s virtual page frame number 0 is mapped into memory in physical page frame number 1 and that process Y’s virtual page frame number 1 is mapped into physical page frame number 4. Each entry in the page table contains the following information:
- Valid flag. This indicates whether this page table entry (PTE) is valid.
- The physical page frame number that this entry describes.
- Access control information. This describes how the page may be used. Can it be written to? Does it contain executable code?
The page table is accessed using the virtual page frame number as an offset. Virtual page frame 5 would be the 6th element of the table (0 is the first element).
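One way to picture such a table is as an array of entries indexed by virtual page frame number. The sketch below assumes a hypothetical eight-entry table and an invented mapping for virtual page frame 5; the class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class PageTableEntry:
    valid: bool = False       # valid flag
    pfn: int = 0              # physical page frame number
    writable: bool = False    # access control: may the page be written?
    executable: bool = False  # access control: does it hold code?


# Hypothetical table with eight entries, all invalid to begin with.
page_table = [PageTableEntry() for _ in range(8)]

# Virtual page frame 5 is the 6th element (index 5); the mapping is made up.
page_table[5] = PageTableEntry(valid=True, pfn=3, writable=True)


def lookup(table, vpfn):
    """Use the virtual page frame number directly as the table index."""
    return table[vpfn]


entry = lookup(page_table, 5)
print(entry.valid, entry.pfn)  # prints: True 3
```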
To translate a virtual address into a physical one, the processor must first work out the virtual address’ page frame number and the offset within that virtual page. By making the page size a power of 2, this can be easily done by masking and shifting. Looking again at the figure and assuming a page size of 0x2000 bytes (which is decimal 8192) and an address of 0x2194 in process Y’s virtual address space, the processor would translate that address into offset 0x194 into virtual page frame number 1.
The processor uses the virtual page frame number as an index into the process’ page table to retrieve its page table entry. If the page table entry at that offset is valid, the processor takes the physical page frame number from this entry. If the entry is invalid, the process has accessed a non-existent area of its virtual memory. In this case, the processor cannot resolve the address and must pass control to the operating system so that it can fix things up.
Just how the processor notifies the operating system that the current process has attempted to access a virtual address for which there is no valid translation is specific to the processor. However the processor delivers it, this is known as a page fault and the operating system is notified of the faulting virtual address and the reason for the page fault.
For a valid page table entry, the processor takes that physical page frame number and multiplies it by the page size to get the address of the base of the page in physical memory. Finally, the processor adds in the offset to the instruction or data that it needs.
Using the above example again, process Y’s virtual page frame number 1 is mapped to physical page frame number 4 which starts at 0x8000 (4 x 0x2000). Adding in the 0x194 byte offset gives us a final physical address of 0x8194.
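The whole translation, including the page fault case, can be sketched in a few lines of Python. This is a simplified model, not how any real processor or kernel implements it: the page table is a plain dictionary, and `PageFault` and `translate` are hypothetical names. The single mapping is the one from the worked example.

```python
PAGE_SHIFT = 13                  # 8 KiB pages, as in the worked example
PAGE_SIZE = 1 << PAGE_SHIFT      # 0x2000


class PageFault(Exception):
    """Raised when there is no valid translation for an address."""

    def __init__(self, vaddr, reason):
        super().__init__(f"page fault at {vaddr:#x}: {reason}")
        self.vaddr = vaddr
        self.reason = reason


def translate(page_table, vaddr):
    vpfn = vaddr >> PAGE_SHIFT           # virtual page frame number
    offset = vaddr & (PAGE_SIZE - 1)     # offset within the page
    entry = page_table.get(vpfn)
    if entry is None or not entry["valid"]:
        # Hand control to the "operating system" with address and reason.
        raise PageFault(vaddr, "no valid translation")
    return entry["pfn"] * PAGE_SIZE + offset


# Process Y: virtual page frame 1 -> physical page frame 4.
process_y = {1: {"valid": True, "pfn": 4}}

print(hex(translate(process_y, 0x2194)))  # prints: 0x8194
```

Calling `translate(process_y, 0x0)` raises `PageFault`, because virtual page frame 0 has no entry in this toy table.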
By mapping virtual to physical addresses this way, the virtual memory can be mapped into the system’s physical pages in any order. In the figure above, process X’s virtual page frame number 0 is mapped to physical page frame number 1, whereas virtual page frame number 7 is mapped to physical page frame number 0 although it is higher in virtual memory than virtual page frame number 0. This demonstrates an interesting byproduct of virtual memory; the pages of virtual memory do not have to be present in physical memory in any particular order.
Shared Virtual Memory
Virtual memory makes it easy for several processes to share memory. All memory accesses are made via page tables and each process has its own separate page table. For two processes to share a physical page of memory, that page's physical page frame number must appear in a page table entry in both of their page tables.
The figure above shows two processes that each share physical page frame number 4. For process X this is virtual page frame number 4 whereas for process Y this is virtual page frame number 6. This illustrates an interesting point about sharing pages: the shared physical page does not have to exist at the same place in virtual memory for any or all of the processes sharing it.
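A small simulation makes the point concrete. The sketch below assumes 8 KiB pages and models physical memory as a byte array; the two dictionaries stand in for the page tables of process X and process Y and hold only the shared mapping. All names are illustrative.

```python
PAGE_SIZE = 0x2000                           # assumed 8 KiB pages
physical_memory = bytearray(8 * PAGE_SIZE)   # eight physical page frames

# Each table maps virtual page frame number -> physical page frame number.
page_table_x = {4: 4}   # process X sees the shared page at virtual page 4
page_table_y = {6: 4}   # process Y sees the same page at virtual page 6


def physical_address(page_table, vaddr):
    vpfn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpfn] * PAGE_SIZE + offset


# Process X writes a byte at offset 0x10 of its virtual page 4 ...
addr_x = physical_address(page_table_x, 4 * PAGE_SIZE + 0x10)
physical_memory[addr_x] = 0xAB

# ... and process Y reads it back through its own virtual page 6.
addr_y = physical_address(page_table_y, 6 * PAGE_SIZE + 0x10)
value_seen_by_y = physical_memory[addr_y]
print(hex(value_seen_by_y))  # prints: 0xab
```

The virtual addresses differ, but both resolve to the same physical address, which is all that sharing requires.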
Physical and Virtual Addressing Modes
It does not make much sense for the operating system itself to run in virtual memory. This would be a nightmare situation where the operating system must maintain page tables for itself. Most multi-purpose processors support the notion of a physical address mode as well as a virtual address mode. Physical addressing mode requires no page tables and the processor does not attempt to perform any address translations in this mode. The Linux kernel is linked to run in physical address space.
The Alpha AXP processor does not have a special physical addressing mode. Instead, it divides up the memory space into several areas and designates two of them as physically mapped addresses. This kernel address space is known as KSEG address space and it encompasses all addresses upwards from 0xfffffc0000000000. In order to execute from code linked in KSEG (by definition, kernel code) or access data there, the code must be executing in kernel mode. The Linux kernel on Alpha is linked to execute from address 0xfffffc0000310000.
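As a rough illustration of a physically mapped region, the sketch below assumes that a KSEG virtual address corresponds to the physical address obtained by subtracting the base of the region. That subtraction is an assumption made for the example, not something stated above, and `kseg_to_physical` is a hypothetical helper.

```python
KSEG_BASE = 0xfffffc0000000000            # start of KSEG, from the text
KERNEL_LINK_ADDRESS = 0xfffffc0000310000  # Alpha kernel link address, from the text


def kseg_to_physical(vaddr):
    """Assumed mapping: physical address = virtual address - KSEG base."""
    if vaddr < KSEG_BASE:
        raise ValueError(f"{vaddr:#x} is not a KSEG address")
    return vaddr - KSEG_BASE


print(hex(kseg_to_physical(KERNEL_LINK_ADDRESS)))  # prints: 0x310000
```

No page table is consulted here, which is the point of a physically mapped region.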
The page table entries also contain access control information. As the processor is already using the page table entry to map a process’ virtual address to a physical one, it can easily use the access control information to check that the process is not accessing memory in a way that it should not.
There are many reasons why you would want to restrict access to areas of memory. Some memory, such as that containing executable code, is naturally read only memory; the operating system should not allow a process to write data over its executable code. By contrast, pages containing data can be written to, but attempts to execute that memory as instructions should fail. Most processors have at least two modes of execution: kernel and user. This adds a level of security to your operating system. Because it is the core of the operating system and can therefore do almost anything, kernel code is only run when the CPU is in kernel mode. You would not want kernel code executed by a user or kernel data structures to be accessible except when the processor is running in kernel mode.
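These rules can be modelled as a check made on every access. The sketch below is a simplification under stated assumptions: each page carries read, write and execute permissions plus a kernel-only flag, and a violation raises a hypothetical `ProtectionFault`. Real processors encode this differently.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Raised when an access violates a page's access control settings."""


@dataclass(frozen=True)
class Permissions:
    read: bool
    write: bool
    execute: bool
    kernel_only: bool = False


def check_access(perms, access, kernel_mode):
    """Allow or refuse one access; `access` is 'read', 'write' or 'execute'."""
    if access not in ("read", "write", "execute"):
        raise ValueError(f"unknown access type: {access}")
    if perms.kernel_only and not kernel_mode:
        raise ProtectionFault(f"user-mode {access} of a kernel page")
    if not getattr(perms, access):
        raise ProtectionFault(f"{access} not permitted on this page")


CODE_PAGE = Permissions(read=True, write=False, execute=True)
DATA_PAGE = Permissions(read=True, write=True, execute=False)
KERNEL_DATA_PAGE = Permissions(read=True, write=True, execute=False,
                               kernel_only=True)

check_access(CODE_PAGE, "execute", kernel_mode=False)      # allowed
check_access(DATA_PAGE, "write", kernel_mode=False)        # allowed
check_access(KERNEL_DATA_PAGE, "write", kernel_mode=True)  # allowed
```

Writing to `CODE_PAGE`, executing `DATA_PAGE`, or touching `KERNEL_DATA_PAGE` from user mode would each raise `ProtectionFault`.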
Figure: Alpha AXP Page Table Entry
The access control information is held in the PTE and is processor specific; the figure above shows the PTE for Alpha AXP. The bit fields have the following meanings:
- V: Valid. If set, this PTE is valid.
- FOE: “Fault on Execute”. Whenever an attempt to execute instructions in this page occurs, the processor reports a page fault and passes control to the operating system.
- FOW: “Fault on Write”. As above, but the page fault occurs on an attempt to write to this page.
- FOR: “Fault on Read”. As above, but the page fault occurs on an attempt to read from this page.
- ASM: Address Space Match. This is used when the operating system wishes to clear only some of the entries from the Translation Buffer.
- KRE: Code running in kernel mode can read this page.
- URE: Code running in user mode can read this page.
- GH: Granularity hint, used when mapping an entire block with a single Translation Buffer entry rather than many.
- KWE: Code running in kernel mode can write to this page.
- UWE: Code running in user mode can write to this page.
- Page frame number: For PTEs with the V bit set, this field contains the physical Page Frame Number for this PTE. For invalid PTEs, if this field is not zero, it contains information about where the page is in the swap file.
The following two bits are defined and used by Linux:
- _PAGE_DIRTY: If set, the page needs to be written out to the swap file.
- _PAGE_ACCESSED: Used by Linux to mark a page as having been accessed.
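To show how such bit fields are typically picked apart, here is a short Python sketch that decodes a 64-bit PTE value. The bit positions are assumptions modelled on the Alpha layout and are not given in the text above; check them against the figure and the processor documentation before relying on them. The constant names are descriptive stand-ins.

```python
# Assumed bit positions (not stated in the text; verify before use).
PTE_VALID = 0x0001           # valid bit
PTE_FAULT_ON_READ = 0x0002
PTE_FAULT_ON_WRITE = 0x0004
PTE_FAULT_ON_EXECUTE = 0x0008
PTE_ADDRESS_SPACE_MATCH = 0x0010
PTE_KERNEL_READ = 0x0100
PTE_USER_READ = 0x0200
PTE_KERNEL_WRITE = 0x1000
PTE_USER_WRITE = 0x2000
PTE_PFN_SHIFT = 32           # page frame number assumed in the upper 32 bits


def decode_pte(pte):
    """Return the fields of a PTE as a dictionary of booleans plus the PFN."""
    return {
        "valid": bool(pte & PTE_VALID),
        "fault_on_read": bool(pte & PTE_FAULT_ON_READ),
        "fault_on_write": bool(pte & PTE_FAULT_ON_WRITE),
        "fault_on_execute": bool(pte & PTE_FAULT_ON_EXECUTE),
        "address_space_match": bool(pte & PTE_ADDRESS_SPACE_MATCH),
        "kernel_read": bool(pte & PTE_KERNEL_READ),
        "user_read": bool(pte & PTE_USER_READ),
        "kernel_write": bool(pte & PTE_KERNEL_WRITE),
        "user_write": bool(pte & PTE_USER_WRITE),
        "pfn": pte >> PTE_PFN_SHIFT,
    }


# A made-up entry: valid, readable from both modes, writable by the kernel
# only, mapped to physical page frame number 4.
example_pte = ((4 << PTE_PFN_SHIFT) | PTE_VALID | PTE_KERNEL_READ
               | PTE_USER_READ | PTE_KERNEL_WRITE)

fields = decode_pte(example_pte)
print(fields["valid"], fields["user_write"], fields["pfn"])  # prints: True False 4
```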