Cache Memory | The Linux Tutorial

Cache Memory

Based on the principle of locality, a program is likely to spend most of its time executing the same small set of instructions. Tests bear this out: most programs spend 80 percent of their time executing 20 percent of their code. Cache memory takes advantage of that.

Cache memory, or sometimes just cache, is a small amount of very high-speed memory. Typically, it uses SRAM, which can be up to ten times more expensive than DRAM. That cost usually makes SRAM prohibitive for anything other than cache.

When the IBM PC first came out, DRAM was fast enough to keep up with even the fastest processor. However, as CPU technology advanced, so did CPU speed. Soon, the CPU began to outrun its memory. The advances in CPU technology could not be exploited unless the system was filled with the more expensive, faster SRAM.

The solution was a compromise. Relying on the locality principle, manufacturers of fast 386 and 486 machines began to include a small amount of SRAM cache memory but still populated main memory with the slower, less expensive DRAM.

To better understand the advantages of this scheme, let's cover the principle of locality in a little more detail. For a computer program, we deal with two types of locality: temporal (time) and spatial (space). Because programs tend to run in loops (repeating the same instructions), the same set of instructions must be read over and over. The longer a set of instructions is in memory without being used, the less likely it is to be used again. This is the principle of temporal locality. What cache memory does is enable us to keep those regularly used instructions “closer” to the CPU, making access to them much faster. This is shown graphically in Figure 0-10.

Figure 0-10: Level 1 and Level 2 Caches

Spatial locality is the relationship between consecutively executed instructions. I just said that a program spends most of its time executing the same set of instructions. Therefore, in all likelihood, the next instruction the program will execute lies in the next memory location. By filling cache with more than just one instruction at a time, the cache can take advantage of the principle of spatial locality.
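The effect of filling cache with more than one instruction at a time can be sketched with a small simulation. This is only an illustration, not a model of any real CPU: the function name `sequential_hit_ratio`, the 4-byte instruction size, and the line sizes are all assumptions made for the example.

```python
def sequential_hit_ratio(num_instructions, line_size, instr_size=4):
    """Fetch instructions at consecutive addresses and return the hit ratio.

    Assumes (hypothetically) a cache large enough that nothing is ever
    evicted, so only the effect of the line fill is visible.
    """
    cached_lines = set()   # line numbers currently held in cache
    hits = 0
    for i in range(num_instructions):
        address = i * instr_size
        line = address // line_size      # which cache line this address falls in
        if line in cached_lines:
            hits += 1                    # an earlier neighbor already pulled it in
        else:
            cached_lines.add(line)       # miss: fill the whole line
    return hits / num_instructions


print(sequential_hit_ratio(1024, line_size=4))    # 0.0  (one instruction per line)
print(sequential_hit_ratio(1024, line_size=16))   # 0.75 (four instructions per line)
```

With one instruction per line, every first-time fetch is a miss. With four instructions per line, one miss brings in the next three instructions as well, so three out of four straight-line fetches are hits even before any loop repeats.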

Is there really such a major advantage to cache memory? Cache performance is evaluated in terms of cache hits. A hit occurs when the CPU requests a memory location that is already in cache (that is, it does not have to go to main memory to get it). Because most programs run in loops (including the OS), the principle of locality results in a hit ratio of 85 to 95 percent. Not bad!
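To get a feel for what a hit ratio of 85 to 95 percent buys, here is a rough back-of-the-envelope calculation. The latencies (10 ns for cache, 70 ns for main memory) are hypothetical round numbers, not measurements, and the model assumes a miss simply costs one full main-memory access.

```python
def effective_access_time(hit_ratio, cache_ns, memory_ns):
    """Average access time, assuming a miss costs one main-memory access."""
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * memory_ns


# Hypothetical latencies: 10 ns for SRAM cache, 70 ns for DRAM.
for ratio in (0.0, 0.85, 0.95):
    average = effective_access_time(ratio, cache_ns=10, memory_ns=70)
    print(f"hit ratio {ratio:.2f}: about {average:.1f} ns per access")
```

Under these assumptions, the average access drops from 70 ns with no cache to roughly 19 ns at an 85 percent hit ratio and roughly 13 ns at 95 percent.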

On most 486 machines, two levels of cache are used: level 1 cache and level 2 cache. Level 1 cache is internal to the CPU. Although nothing (other than cost) prevents it from being any larger, Intel has limited the level 1 cache in the 486 to 8K.

The level 2 cache is the kind that you buy separately from your machine. It is often part of the advertisement you see in the paper and is usually what people are talking about when they say how much cache is in their systems. Level 2 cache is external to the CPU and can be increased at any time, whereas level 1 cache is an integral part of the CPU and the only way to get more is to buy a different CPU. Typical sizes of level 2 cache range from 64K to 256K, usually in increments of 64K.

There is one major problem in dealing with cache memory: the issue of consistency. What happens when main memory is updated and cache is not? What happens when cache is updated and main memory is not? This is where the cache's write policy comes in.

The write policy determines if and when the contents of the cache are written back to memory. A write-through cache simply writes the data through the cache directly into memory. This slows writes, but the data is consistent. Buffered write-through is a slight modification of this, in which data are collected and everything is written at once. Write-back improves cache performance by writing to main memory only when necessary. Write-dirty writes a block of cache to main memory only when that block has been modified.

Cache (or main memory, for that matter) is referred to as “dirty” when it is written to. Unfortunately, the system has no way of telling whether anything has changed, just that it is being written to. Therefore it is possible, but not likely, that a block of cache is written back to memory even though its contents have not actually changed.
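The difference between write-through and write-back, and the role of the dirty flag, can be sketched as follows. This is a deliberately simplified model of a single cache line; the class `CacheLine` and the helper `count_memory_writes` are hypothetical names invented for the illustration and do not correspond to any real hardware interface.

```python
class CacheLine:
    """One cache line with a dirty flag (a simplified, hypothetical model)."""

    def __init__(self, policy):
        self.policy = policy        # "write-through" or "write-back"
        self.data = None
        self.dirty = False
        self.memory_writes = 0      # how many times main memory was updated

    def write(self, value):
        self.data = value
        if self.policy == "write-through":
            self.memory_writes += 1  # every write goes straight to memory
        else:
            self.dirty = True        # only record that the line was written to

    def evict(self):
        # Write-back copies the line to memory only if it was marked dirty.
        if self.policy == "write-back" and self.dirty:
            self.memory_writes += 1
            self.dirty = False
        self.data = None


def count_memory_writes(policy, values):
    """Write each value to one line, evict it, and report memory writes."""
    line = CacheLine(policy)
    for value in values:
        line.write(value)
    line.evict()
    return line.memory_writes


print(count_memory_writes("write-through", [1, 2, 3, 4]))  # 4
print(count_memory_writes("write-back", [1, 2, 3, 4]))     # 1
print(count_memory_writes("write-back", [7, 7]))           # 1
```

The last call shows the limitation described above: the line is flagged dirty as soon as it is written to, so it is written back once even though the second write stored the same value as the first.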

Another aspect of cache is its organization. Without going into detail (that would take most of a chapter itself), I can generalize by saying there are four different types of cache organization.

The first kind is fully associative, which means that every entry in the cache has a slot in the “cache directory” to indicate where it came from in memory. Usually these entries are not individual bytes, but chunks of four bytes or more. Because each slot in the cache has a separate directory slot, any location in RAM can be placed anywhere in the cache. This is the simplest scheme but also the slowest because each cache directory entry must be searched until a match (if any) is found. Therefore, this kind of cache is often limited to just 4K.
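The directory search described above might look like this in outline. It is a sketch only: the function `fully_associative_lookup` and the example addresses are made up, and the directory is modeled as a plain list searched one entry at a time, following the description in the text.

```python
def fully_associative_lookup(directory, address):
    """Search every directory entry for the address.

    Returns (slot, comparisons): the slot holding the address, or None if
    it is not cached, plus the number of directory entries examined.
    """
    comparisons = 0
    for slot, cached_address in enumerate(directory):
        comparisons += 1
        if cached_address == address:
            return slot, comparisons
    return None, comparisons


# Hypothetical directory: the memory address each cache slot came from.
directory = [0x1F40, 0x0008, 0xA000, 0x0334]
print(fully_associative_lookup(directory, 0xA000))  # (2, 3)
print(fully_associative_lookup(directory, 0x9999))  # (None, 4)
```

Any address can sit in any slot, which is the flexibility of this scheme. The price is that a miss is only known after every entry has been checked.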

The second type of cache organization is direct-mapped or one-way set associative cache, which requires that only a single directory entry be searched. This speeds up access time considerably. The location in the cache is related to the location in memory and is usually based on blocks of memory equal to the size of the cache. For example, if the cache could hold 4K 32-bit (4-byte) entries, then the block with which each entry is associated is also 4K x 32 bits. The first 32 bits in each block are read into the first slot of the cache, the second 32 bits in each block are read into the second slot, and so on. The size of each entry, or line, usually ranges from 4 to 16 bytes.

There is a mechanism called a tag, which tells us which block a cache entry came from. Also, because of the very nature of this method, the cache cannot hold data from multiple blocks for the same offset. If, for example, slot 1 was already filled with the data from block 1 and a program wanted to read the data at the same location from block 2, the data in the cache would be overwritten. Therefore, the shortcoming in this scheme is that when data is read at intervals that are the size of these blocks, the cache is constantly overwritten. Keep in mind that this does not occur too often due to the principle of spatial locality.
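The mapping from a memory address to a slot and a tag, and the overwriting problem, can be sketched like this. The sizes come from the 4K-entry example above; everything else (the names `split_address` and `DirectMappedCache`, and the particular addresses) is an assumption made for illustration. Note that the code counts blocks and slots from 0, whereas the text counts from 1.

```python
ENTRY_SIZE = 4                        # bytes per cache entry (32 bits)
NUM_SLOTS = 4096                      # 4K entries, as in the example above
BLOCK_SIZE = ENTRY_SIZE * NUM_SLOTS   # a memory block is the size of the cache


def split_address(address):
    """Return (tag, slot): the block an address lies in and the slot it maps to."""
    tag = address // BLOCK_SIZE
    slot = (address % BLOCK_SIZE) // ENTRY_SIZE
    return tag, slot


class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NUM_SLOTS   # tag currently stored in each slot

    def access(self, address):
        """Return True on a hit; on a miss, overwrite the slot and return False."""
        tag, slot = split_address(address)
        if self.tags[slot] == tag:
            return True
        self.tags[slot] = tag
        return False


cache = DirectMappedCache()
a = 0x0010             # block 0, slot 4
b = a + BLOCK_SIZE     # block 1, same slot 4
print(split_address(a), split_address(b))              # (0, 4) (1, 4)
print([cache.access(addr) for addr in (a, b, a, b)])   # [False, False, False, False]
```

Only one tag has to be compared per access, which is why this scheme is fast. The second line of output shows the drawback: two addresses exactly one block apart keep evicting each other, so alternating between them never produces a hit.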

The third type of cache organization is an extension of the one-way set associative cache, called the two-way set associative cache. Here, there are two entries per slot. Again, data can end up in only a particular slot, but there are two places to go within that slot. Granted, the system is slowed a little because it has to look at the tags for both entries, but this scheme allows data at the same offset from multiple blocks to be in the cache at the same time. This is also extended to four-way set associative cache. In fact, the cache internal to the 486 and Pentium is four-way set associative.
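A two-way version can be sketched in the same spirit. The sizes are hypothetical, the class name `TwoWayCache` is invented, and the choice to evict the least recently used entry is an assumption: the text above does not say how the cache picks which of the two entries to replace. The code uses the common term "set" for what the text calls a slot with two entries.

```python
ENTRY_SIZE = 4                     # bytes per cache entry
NUM_SETS = 2048                    # hypothetical number of two-entry slots
WAYS = 2                           # entries per slot
STRIDE = ENTRY_SIZE * NUM_SETS     # addresses this far apart share a slot


class TwoWayCache:
    def __init__(self):
        # Each set holds up to WAYS tags, least recently used first.
        self.sets = [[] for _ in range(NUM_SETS)]

    def access(self, address):
        """Return True on a hit; on a miss, load the entry and return False."""
        tag = address // STRIDE
        index = (address % STRIDE) // ENTRY_SIZE
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)       # mark as most recently used
            return True
        if len(ways) == WAYS:
            ways.pop(0)            # assumed policy: evict least recently used
        ways.append(tag)
        return False


a = 0x0010
b = a + STRIDE         # same slot as a, different tag
c = a + 2 * STRIDE     # a third address competing for the same slot

cache = TwoWayCache()
print([cache.access(x) for x in (a, b, a, b)])         # [False, False, True, True]

cache = TwoWayCache()
print([cache.access(x) for x in (a, b, c, a, b, c)])   # all False
```

Two competing addresses now coexist, so the alternating pattern that defeated the direct-mapped cache hits after the first pass. Three competing addresses still overflow the two entries, which is the motivation for going to four-way.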

Although this is interesting (at least to me), you may be asking yourself, “Why is this memory stuff important to me as a system administrator?” First, knowing about the differences in RAM (main memory) can aid you in making decisions about upgrades. Also, as I mentioned earlier, it may be necessary to set switches on the motherboard if you change the memory configuration.

Knowledge about cache memory is important for the same reason: you may be the one who adjusts it. On many machines, the write policy can be adjusted through the CMOS. For example, on my machine, I have a choice of write-back, write-through, and write-dirty. Depending on the applications you are running, you may want to change the write policy to improve performance.