{"id":276,"date":"2020-08-18T19:23:47","date_gmt":"2020-08-18T20:23:47","guid":{"rendered":"http:\/\/www.linux-tutorial.info\/?page_id=77"},"modified":"2020-08-22T19:25:59","modified_gmt":"2020-08-22T20:25:59","slug":"this-is-the-page-title-toplevel-111","status":"publish","type":"page","link":"http:\/\/www.linux-tutorial.info\/?page_id=276","title":{"rendered":"Cache Memory"},"content":{"rendered":"\n<title>Cache Memory<\/title>\n<p>\nBased on the principle of spatial locality, a program is likely to spend\nmost of its time executing code clustered around the same set of instructions.\nTests have shown that most programs spend 80 percent of\ntheir time executing 20 percent of their code. Cache memory takes advantage of\nthat.\n<\/p>\n<p>\nCache memory, or sometimes just <glossary>cache<\/glossary>,\n is a small amount of very high-speed memory. Typically, it uses <glossary>SRAM<\/glossary>,\n which can be up to ten times as expensive as <glossary>DRAM<\/glossary>,\n a cost that usually makes it prohibitive for anything other than cache.\n<\/p>\n<p>\nWhen the IBM PC first came out, <glossary>DRAM<\/glossary>\nwas fast enough to keep up with even the fastest processor. However, as\n<glossary>CPU<\/glossary> technology advanced, so did processor speed. Soon, the CPU\nbegan to outrun its memory. The advances in CPU technology could not be used\nunless the system was filled with the more expensive, faster\n<glossary>SRAM<\/glossary>.\n<\/p>\n<p>\nThe solution to this was a compromise. Using the <glossary>locality principle<\/glossary>,\nmanufacturers of fast 386 and 486 machines began to include a small amount of <glossary>cache<\/glossary>\nmemory consisting of <glossary>SRAM<\/glossary>\nbut still populated main memory with the slower, less expensive <glossary>DRAM<\/glossary>.\n<\/p>\n<p>\nTo better understand the advantages of this scheme, let's cover the principle\nof locality in a little more detail. 
For a computer program, we deal with two\ntypes of locality: temporal (time) and spatial (space). Because programs tend to\nrun in loops (repeating the same instructions), the same set of instructions\nis read over and over. The longer a set of instructions sits in memory\nwithout being used, the less likely it is to be used again. This is the\nprinciple of temporal locality. What <glossary>cache<\/glossary> memory does is\nkeep those regularly used instructions &#8220;closer&#8221; to the\n<glossary>CPU<\/glossary>, making access to them much faster. This is shown\ngraphically in Figure 0-10.\n<\/p>\n<p>\n<img decoding=\"async\" src=\"l1cache.png\" width=375 height=113 border=0 usemap=\"#l1cache_map\">\n<map name=\"l1cache_map\">\n<area shape=\"RECT\" coords=\"40,44,91,91\" href=\"popup#L1 Cache#In older CPUs, the L1 cache was not part of the CPU. Access to the L1 cache is now much faster.\">\n<area shape=\"RECT\" coords=\"145,41,211,105\" href=\"popup#L2 Cache#The L2 cache was added on newer CPUs and increased performance using the principle of spatial locality.\">\n<area shape=\"RECT\" coords=\"258,0,370,106\" href=\"popup#RAM#Although accessing RAM is thousands of times faster than accessing the hard disk, speeds are increased by adding a hardware memory cache.\">\n<area shape=\"RECT\" coords=\"98,48,254,69\" href=\"popup#Memory Cache#By writing changed data to a cache, access times are decreased when the data is read again.\">\n<area shape=\"RECT\" coords=\"98,72,256,90\" href=\"popup#Memory Cache#Reading from a cache speeds up data access.\">\n<\/map>\n<\/p>\n<p>\n<icaption>Image &#8211; Level 1 and Level 2 Caches (<b>interactive<\/b>)<\/icaption>\n<\/p>\n<p>\nSpatial locality is the relationship between consecutively executed\ninstructions. I just said that a program spends most of its time executing the\nsame set of instructions. In all likelihood, the next instruction the\nprogram will execute lies in the next memory location. 
By filling\n<glossary>cache<\/glossary> with more than just one instruction at a time, the\nprinciple of spatial locality can be exploited.\n<\/p>\n<p>\nIs there really such a major advantage to <glossary>cache<\/glossary>\nmemory? Cache performance is evaluated in terms of <em>cache hits<\/em>. A hit\noccurs when the <glossary>CPU<\/glossary> requests a memory location that is\nalready in cache (that is, it does not have to go to main memory to get it).\nBecause most programs run in loops (including the OS), the principle of locality\nresults in a <glossary>hit ratio<\/glossary> of 85 to 95 percent. Not bad!\n<\/p>\n<p>\nOn most 486 machines, two levels of <glossary>cache<\/glossary>\nare used: level 1 cache and level 2 cache. Level 1 cache is internal to the\n<glossary>CPU<\/glossary>. Although nothing (other than cost) prevents it from\nbeing any larger, Intel has limited the level 1 cache in the 486 to 8K.\n<\/p>\n<p>\nThe level 2 <glossary>cache<\/glossary> is the kind that you buy separately from\nyour machine. It is often part of the advertisement you see in the paper and is\nusually what people are talking about when they\nsay how much cache is in their systems. Level 2 cache is external to the\n<glossary>CPU<\/glossary> and can be increased at any time, whereas level 1 cache\nis an integral part of the CPU and the only way to get more is to buy a\ndifferent CPU. Typical sizes of level 2 cache range from 64K to 256K, usually in\nincrements of 64K.\n<\/p>\n<p>\nThere is one major problem with <glossary>cache<\/glossary>\nmemory: the issue of consistency. What happens when main memory is updated and\nthe cache is not? What happens when the cache is updated and main memory is not? This\nis where the cache's <em>write policy<\/em> comes in.\n<\/p>\n<p>\nThe <glossary>write policy<\/glossary>\ndetermines if and when the contents of the <glossary>cache<\/glossary>\nare written back to memory. 
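<\/p>
<p>
To make the hit ratio described above concrete, here is a small sketch that simulates a tiny fully associative cache with least-recently-used replacement and counts hits for a loop-like address trace. The cache size, line size, and trace are invented for illustration; real caches are implemented in hardware, not software.
<\/p>

```python
# A toy fully associative cache with LRU replacement, used to count
# hits for an address trace. Sizes and the trace are invented.
from collections import OrderedDict

def hit_ratio(trace, cache_lines=4, line_size=16):
    cache = OrderedDict()              # cached line number, kept in LRU order
    hits = 0
    for addr in trace:
        line = addr // line_size       # which cache line this byte falls in
        if line in cache:
            hits += 1
            cache.move_to_end(line)    # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(trace)

# A program looping over the same 32 bytes of code shows strong
# temporal locality: after two initial misses, every access is a hit.
loop_trace = [0, 4, 8, 12, 16, 20, 24, 28] * 100
print(hit_ratio(loop_trace))           # well above 0.9
```

A trace that never revisits a line (no locality at all) would score a hit ratio of zero, which is why cache helps real programs but not pathological access patterns.
<p>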
The write-through cache simply writes the data through the cache\ndirectly into memory. This slows writes, but the data is consistent. Buffered write-through is a\nslight modification of this, in which data are collected and everything is written at once.\nWrite-back improves cache performance by writing to main memory only when\nnecessary. Write-dirty writes to main memory only when the data has actually been\nmodified.\n<\/p>\n<p>\nCache (or main memory, for that matter) is referred to as &#8220;dirty&#8221; when it is written to.\nUnfortunately, the system has no way of telling whether anything has changed,\njust that it is being written to. Therefore it is possible, though not likely, that\na block of <glossary>cache<\/glossary> is written back to memory even if it is\nnot actually <glossary>dirty<\/glossary>.\n<\/p>\n<p>\nAnother aspect of <glossary>cache<\/glossary>\nis its organization. Without going into detail (that would take most of a\nchapter itself), I can generalize by saying there are four different types of\ncache organization.\n<\/p>\n<p>\nThe first kind is fully associative, which means that every entry in the\n<glossary>cache<\/glossary> has a slot in the &#8220;cache directory&#8221; to indicate where\nit came from in memory. Usually these are not individual bytes, but chunks of\nfour bytes or more. Because each slot in the cache has a separate directory\nslot, any location in <glossary>RAM<\/glossary> can be placed anywhere in the\ncache. This is the simplest scheme but also the slowest because each cache\ndirectory entry must be searched until a match (if any) is found. Therefore,\nthis kind of cache is often limited to just 4K.\n<\/p>\n<p>\nThe second type of <glossary>cache<\/glossary>\norganization is <em>direct-mapped<\/em> or <em>one-way set associative<\/em> cache,\nwhich requires that only a single directory entry be searched. This\nspeeds up access time considerably. 
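<\/p>
<p>
The difference between the write-through and write-back policies described above can be sketched as follows. The classes, counters, and the explicit evict step are invented for illustration; they model the bookkeeping, not how any real hardware is built.
<\/p>

```python
# Sketch of two write policies. Names and counters are illustrative only.

class WriteThroughCache:
    """Every write goes to the cache AND straight through to main memory."""
    def __init__(self):
        self.cache, self.memory = {}, {}
        self.memory_writes = 0
    def write(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value      # write through: memory stays consistent
        self.memory_writes += 1

class WriteBackCache:
    """Writes stay in the cache; memory is updated only on eviction."""
    def __init__(self):
        self.cache, self.memory = {}, {}
        self.dirty = set()             # addresses written ("dirty") since load
        self.memory_writes = 0
    def write(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)           # mark dirty; defer the memory write
    def evict(self, addr):
        if addr in self.dirty:         # write back only if actually dirty
            self.memory[addr] = self.cache[addr]
            self.memory_writes += 1
            self.dirty.discard(addr)
        self.cache.pop(addr, None)

wt, wb = WriteThroughCache(), WriteBackCache()
for v in range(100):                   # rewrite the same location 100 times
    wt.write(0, v)
    wb.write(0, v)
wb.evict(0)
print(wt.memory_writes, wb.memory_writes)  # prints: 100 1
```

Rewriting one location a hundred times costs write-through a hundred memory accesses but write-back only one, at eviction time, which is exactly why write-back improves performance at the price of temporary inconsistency.
<p>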
The location in the cache is related to the\nlocation in memory and is usually based on blocks of memory equal to the size of\nthe cache. For example, if the cache could hold 4K 32-bit (4-byte) entries, then\nthe block with which each entry is associated is also 4K x 32 bits. The first 32\nbits in each block are read into the first slot of the cache, the second 32 bits\nin each block are read into the second slot, and so on. The size of each entry,\nor line, usually ranges from 4 to 16 bytes.\n<\/p>\n<p>\nThere is a mechanism called a tag, which tells us which block an entry came from.\nAlso, because of the very nature of this method, the\n<glossary>cache<\/glossary> cannot hold data from multiple blocks for the same\noffset. If, for example, slot 1 was already filled with the data from block 1\nand a program wanted to read the data at the same location from block 2, the\ndata in the cache would be overwritten. Therefore, the shortcoming of this\nscheme is that when data is read at intervals that are the size of these blocks,\nthe cache is constantly overwritten. Keep in mind that this does not occur too\noften, due to the principle of spatial locality.\n<\/p>\n<p>\nThe third type of <glossary>cache<\/glossary>\norganization is an extension of the one-way set associative cache,\ncalled the <em>two-way set associative<\/em>. Here, there are two entries per\nslot. Again, data can end up in only a particular slot, but there are two places\nto go within that slot. Granted, the system is slowed a little because it has to\nlook at the tags for both slots, but this scheme allows data at the same offset\nfrom multiple blocks to be in the cache at the same time. This is also extended\nto four-way set associative cache. 
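<\/p>
<p>
The conflict described above, and how adding a second way per slot cures it, can be sketched in a few lines. The set count, line size, and address trace are invented for illustration; the index/tag split mirrors the block-and-tag mechanism just described.
<\/p>

```python
# Sketch: direct-mapped vs. two-way set-associative lookup.
# 4 sets of 16-byte lines; all parameters are invented for illustration.
SETS, LINE = 4, 16

def count_hits(trace, ways):
    sets = [[] for _ in range(SETS)]   # each set holds up to `ways` tags
    hits = 0
    for addr in trace:
        block = addr // LINE           # which memory block this address is in
        index, tag = block % SETS, block // SETS
        s = sets[index]
        if tag in s:
            hits += 1
            s.remove(tag)
            s.append(tag)              # move to most-recently-used position
        else:
            if len(s) == ways:
                s.pop(0)               # evict the least recently used entry
            s.append(tag)
    return hits

# Two blocks exactly one cache-size apart land in the same set (same offset).
trace = [0, SETS * LINE] * 50          # alternate between the two blocks
print(count_hits(trace, ways=1))       # prints 0: direct-mapped thrashes
print(count_hits(trace, ways=2))       # prints 98: both blocks fit in the set
```

With one way, each access evicts the other block and every access misses; with two ways, only the first two accesses miss. That is the trade the two-way scheme makes: one extra tag comparison in exchange for far fewer conflict misses.
<p>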
In fact, the internal cache of the 486 and\nPentium is four-way set associative.\n<\/p>\n<p>\nAlthough this is interesting (at least to me), you may be asking yourself,\n&#8220;Why is this memory stuff important to me as a system administrator?&#8221; First,\nknowing about the differences in <glossary>RAM<\/glossary> (main memory) can aid\nyou in making decisions about your upgrade. Also, as I mentioned earlier, it may\nbe necessary to set switches on the motherboard if you change your memory\nconfiguration.\n<\/p>\n<p>\nKnowledge about <glossary>cache<\/glossary>\nmemory is important for the same reason, because you\nmay be the one who will adjust it. On many machines, the <glossary>write policy<\/glossary>\ncan be\nadjusted through the <glossary>CMOS<\/glossary>. For example, on my machine, I\nhave a choice of write-back, write-through, and write-dirty. Depending on the\napplications you are running, you may want to change the <glossary>write policy<\/glossary>\nto improve performance.\n<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Cache Memory Based on the principle of spatial locality, a program is more likely to spend its time executing code around the same set of instructions. 
This is demonstrated by the tests that have shown that most programs spend 80 &hellip; <a href=\"http:\/\/www.linux-tutorial.info\/?page_id=276\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-276","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/pages\/276","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=276"}],"version-history":[{"count":1,"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/pages\/276\/revisions"}],"predecessor-version":[{"id":513,"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/pages\/276\/revisions\/513"}],"wp:attachment":[{"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=276"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}