In the section on operating system basics, we talked about how the operating system uses capabilities of the CPU to make it appear as though you have more memory than you really do. This is the concept of virtual memory. Later, I’ll go into detail about how this is accomplished, that is, how the operating system and CPU work together to keep up this illusion. However, to make this section on the kernel complete, I should talk about this a little from a software perspective.
One basic concept in the Linux implementation of virtual memory is the concept of a page. A page is a 4Kb area of memory and is the basic unit of memory with which both the kernel and the CPU deal. Although both can access individual bytes (or even bits), the amount of memory that is managed is usually in pages.
If you are reading a book, you do not need to have all the pages spread out on a table for you to work effectively just the page you are currently using. I remember many times in college when I had the entire table top covered with open books, including my notebook. As I was studying, I would read a little from one book, take notes on what I read, and, if I needed more details on that subject, I would either go to a different page or a completely different book.
Virtual memory in Linux is very much like that. Just as I only need to have open the pages I am working with currently, a process needs to have only those pages in memory with which it is working. Like me, if the process needs a page that is not currently available (not in physical memory), it needs to go get it (usually from the hard disk).
If another student came along and wanted to use that table, there might be enough space for him or her to spread out his or her books as well. If not, I would have to close some of my books (maybe putting bookmarks at the pages I was using). If another student came along or the table was fairly small, I might have to put some of the books away. Linux does that as well. Imagine that the text books represent the unchanging text portion of the program and the notebook represents the changing data which might make things a little clearer.
It is the responsibility of both the kernel and the CPU to ensure that I don’t end up reading someone else’s textbook or writing in someone else’s notebook. That is, both the kernel and the CPU ensure that one process does not have access to the memory locations of another process (a discussion of cell replication would look silly in my calculus notebook). The CPU also helps the kernel by recognizing when the process tries to access a page that is not yet in memory. It is the kernel’s job to figure out which process it was, what page it was, and to load the appropriate page.
It is also the kernel’s responsibility to ensure that no one process hogs all available memory, just like the librarian telling me to make some space on the table. If there is only one process running (not very likely), there may be enough memory to keep the entire process loaded as it runs. More likely is the case in which dozens of processes are in memory and each gets a small part of the total memory. (Note: Depending on how much memory you have, it is still possible that the entire program is in memory.)
Processes generally adhere to the principle of spatial locality. This means that typically processes will access the same portions of their code over and over again. The kernel could establish a working set of pages for each process, the pages that have been accessed with the last n memory references. If n is small, the processes may not have enough pages in memory to do their job. Instead of letting the processes work, the kernel is busy spending all of its time reading in the needed pages. By the time the system has finished reading in the needed pages, it is some other process’s turn. Now, some other process needs more pages, so the kernel needs to read them in. This is called thrashing. Large values of n may lead to cases in which there is not enough memory for all the processes to run.
The solution is to use a portion of hard disk as a kind of temporary storage for data pages that are not currently needed. This area of the hard disk is called the swap space or swap device and is a separate area used solely for the purpose of holding data pages from programs.
The size and location of the swap device is normally set when the system is first installed. Afterward, more swap space can be added if needed. (Swap space is added with the mkswap command and the system is told to use it with the swapon command.)
Eventually, the process that was swapped out will get a turn on the CPU and will need to be swapped back in. Before it can be swapped back in, the system needs to ensure that there is at least enough memory for the task structure and a set of structures called page tables. Page tables are an integral part of the virtual memory scheme and point to the actual pages in memory. I talk more about this when I talk about the CPU in the hardware section.
Often you don’t want to swap in certain pages. For example, it doesn’t make sense to swap in pages for a process that is sleeping on some event. Because that event hasn’t occurred yet, swapping them in means that it will just need to go right back to sleep. Therefore, only processes in the TASK_RUNNING state are eligible to have pages swapped back in. That is, only the processes that are runnable get pages swapped back in.
Keep in mind that accessing the hard disk is hundreds of times slower than accessing memory. Although swapping does allow you to have more programs in memory than the physical RAM will allow, using it slows down the system. If possible, it is a good idea to keep from swapping by adding more RAM.
Until kernel 2.3.24 on the x86 platform, the Linux memory manager limited the size of each swap area to 127.5 MB. You could have created a larger swap space, but only the first 127.5 MB will be used. To solve this limitation, a system could have had up to 16 swap spaces for a total of 2GB in swap space. Now Linux supports up to 64 GB of physical memory and several TB of swap.
As there is much less physical memory than virtual memory the operating
system must be careful that it does not use the physical memory
inefficiently. One way to save physical memory is to only load virtual pages that are
currently being used by the executing program. For example, a database program may be run
to query a database. In this case not all of the database needs to be loaded into memory, just
those data records that are being examined. If the database query is a search query then it
does not make sense to load the code from the database program that deals with adding new records.
This technique of only loading virtual pages into memory as they are
accessed is known as demand paging.
When a process attempts to access a virtual address that is not currently
in memory, the processor cannot find a page table entry for the virtual
page being referenced.
For example, in Figure <A href=”#abstract-mm-model”
3.1 there is
no entry in process X’s page table for virtual page frame number 2 and
so if process X attempts to read from an address within
virtual page frame number 2 the processor cannot translate the address into a physical one.
At this point the processor notifies the operating system that a page fault has occurred.
If the faulting virtual address is invalid this means that the process has attempted to access
a virtual address that it should not have.
Maybe the application has gone wrong in some way, for example writing to random addresses in
memory. In this case the operating system will terminate it, protecting the other processes in
the system from this rogue process.
If the faulting virtual address was valid but the page that it refers to is not currently in
memory, the operating system must bring the appropriate page into memory from the image on disk.
Disk access takes a long time, relatively speaking, and so the process must
wait quite a while until the page has been fetched.
If there are other processes that could run, then the operating
system will select one of them to run.
The fetched page is written into a free physical page frame and an entry
for the virtual page frame number is added to the process’ page table.
The process is then restarted at the machine instruction where the memory fault
This time the virtual memory access is made, the processor can make the
virtual to physical address translation and so the process continues to run.
Linux uses demand paging to load executable images into a process’s virtual memory.
Whenever a command is executed, the file containing it is opened
and its contents are mapped into the process’s virtual memory.
This is done by modifying the data structures describing this
process’ memory map and is known as memory mapping.
However, only the first part of the image is actually brought into
The rest of the image is left on disk.
As the image executes, it generates page faults and Linux uses the
process’s memory map in order to determine which parts of the image to
bring into memory for execution.
If a process needs to bring a virtual page into physical memory
and there are no free physical pages available, the operating
system must make room for this page by discarding another page
from physical memory.
If the page to be discarded from physical memory came from an image or data file and has
not been written to then the page does not need to be saved.
Instead it can be discarded and brought back into memory from
the original image or data file if it is needed again.
However, if the page has been modified, the operating system must
preserve the contents of that page so that it can be accessed at a later
This type of page is known as a dirty page.
When dirty pages are removed from
memory, they are saved in a special sort of file called the swap file.
Since access to the swap file takes a long time relative to the speed of the processor and
physical memory, the operating system must juggle the need to write pages to disk with the
need to retain them in memory.
If the swap algorithm, which is used to decide which pages to discard or swap is not
efficient, then a condition known as thrashing occurs.
In the case of thrashing, pages are constantly being written to and read back
from disk. This causes the operating system to be too busy to perform enough real work.
If, for example, physical page frame number 1 in Figure <A href=”#abstract-mm-model”
being regularly accessed then it is not a good candidate for swapping to hard disk. The set of pages that a process is currently using is called the working set.
An efficient swap scheme would make sure that all processes have their working set in physical memory.
Linux uses a Least Recently Used (LRU) page aging technique to fairly choose pages which might be removed from the system. This scheme involves every page in the system having an age which changes as the page is accessed. The more that a page is accessed, the younger it is; the less that it is accessed, the older and more stale it becomes.
Old pages are good candidates for swapping.