{"id":278,"date":"2020-08-18T19:23:47","date_gmt":"2020-08-18T20:23:47","guid":{"rendered":"http:\/\/www.linux-tutorial.info\/?page_id=77"},"modified":"2020-08-22T19:26:01","modified_gmt":"2020-08-22T20:26:01","slug":"this-is-the-page-title-toplevel-113","status":"publish","type":"page","link":"http:\/\/www.linux-tutorial.info\/?page_id=278","title":{"rendered":"Intel Processors"},"content":{"rendered":"\n<title>Intel Processors<\/title>\n<p>\nAlthough it is an interesting subject, the ancient history of\nmicroprocessors is not really important to the issues at hand. It might be nice\nto learn how the young PC grew from a small, budding 4-bit system to the\ngigantic, strapping 64-bit Pentium. However, there are many books that have\ncovered this subject and unfortunately, I don&#8217;t have the space. Besides, the\nIntel chips on which Linux runs are only the 80386 (or 100-percent compatible\nclones) and higher processors.\n<\/p>\n<p>\nSo, instead of setting the way-back machine to Charles Babbage and his\nAnalytic Engine, we leap ahead to 1985 and the introduction of the Intel 80386.\nEven compared to its immediate predecessor, the 80286, the 80386 (386 for short)\nwas a powerhouse. Not only could it handle twice the amount of data at once (now\n32 bits), but its speed rapidly increased far beyond that of the 286.\n<\/p>\n<p>\nNew advances were added to increase the 386s power. Internal registers were\nadded and their size was  increased. Built into the 386 was the concept of\n<glossary>virtual memory<\/glossary>,  which was a way to make it appear as\nthough there was much more memory on system than there actually was. This\nsubstantially increased the system efficiency. Another major advance was the\ninclusion of a 16-byte, pre-fetch <glossary>cache<\/glossary>.  With this, the\n<glossary>CPU<\/glossary> could load instructions before it actually processed\nthem, thereby speeding things up even more. 
Then the most obvious\nspeed increase came when the speed of the processor was increased from 8 MHz to 16 MHz.\n<\/p>\n<p>\nAlthough the 386 had major advantages over its predecessors, at first its\ncost seemed prohibitive. To allow users access to the multitasking\ncapability and still make the chip fit within their customers&#8217; budgets, Intel\nmade an interesting compromise: by making a new chip in which the interface to\nthe <glossary>bus<\/glossary> was 16 bits instead of 32 bits, Intel made its\nchip a fair bit cheaper.\n<\/p>\n<p>\nInternally, this new chip, designated the 80386SX, is identical to the\nstandard 386. All the registers are there and it is fully 32 bits wide.\nHowever, data and instructions are accessed 16 bits at a time, therefore\nrequiring two <glossary>bus<\/glossary> accesses to fill the registers. Despite\nthis shortcoming, the 80386SX is still faster than the 286.\n<\/p>\n<p>\nPerhaps the most significant advance of the 386 for Linux as well as other\nPC-based <glossary>UNIX<\/glossary> systems was its <glossary>paging<\/glossary>\nabilities. I talked a little about paging in the section on <glossary>operating system<\/glossary>\nbasics, so you already have a general idea of what paging is\nabout. I will also go into more detail about paging in the section on the\n<glossary>kernel<\/glossary>. However, I will talk about it a little here so you\ncan fully understand the power that the 386 has given us and see how the\n<glossary>CPU<\/glossary> helps the OS.\n<\/p>\n<p>\nThere are UNIX-like products that run on an 80286, such as SCO XENIX. In\nfact, there was even a version of SCO XENIX that ran on the 8086. Because Linux\nwas first released for the 386, I won&#8217;t go into any more detail about the 286 or\nthe differences between the 286 and 386. Instead, I will just describe the\n<glossary>CPU<\/glossary> Linux uses as sort of an abstract entity. 
In addition,\nbecause most of what I will be talking about is valid for the 486 and Pentium as\nwell as the 386, I will simply call it &#8220;the CPU&#8221; instead of 386, 486, or\nPentium.\n<\/p>\n<p>(Note: Linux will also run on non-Intel CPUs, such as those from AMD or\nCyrix. However, the issues I am going to talk about are all common to\nIntel-based or Intel-derived CPUs.)<\/p>\n<p>\nI need to take a side-step here for a minute. On PC buses, multiple things\nare happening at once. The <glossary>CPU<\/glossary> is busily processing while\nmuch of the hardware is being accessed via <glossary>DMA<\/glossary>. Although\nthese multiple tasks are occurring simultaneously on the system, this is not\nwhat is referred to as multitasking.\n<\/p>\n<p>\nWhen I talk about multitasking, I am referring to multiple processes being in\nmemory at the same time. Because the time the computer takes to switch\nbetween these processes, or tasks, is much shorter than the human brain can\nrecognize, it appears as though the processes are running simultaneously. In\nreality, each process gets to use the <glossary>CPU<\/glossary> and other system\nresources for a brief time and then it&#8217;s another process&#8217;s turn.\n<\/p>\n<p>\nAs it runs, a process could use any part of the system memory it needs. The\nproblem with this is that a portion of <glossary>RAM<\/glossary> that one\nprocess wants may already contain code from another process. Rather than\nallowing each process to access any part of memory it wants, protections keep\none program from overwriting another. This protection is built in as part of\nthe <glossary>CPU<\/glossary> and is called, quite logically, &#8220;protected mode.&#8221;\nWithout it, Linux could not function.\n<\/p>\n<p>\nNote, however, that just because the <glossary>CPU<\/glossary>\nis in <glossary>protected mode<\/glossary>\ndoes not necessarily mean that the protections are being utilized. 
It simply\nmeans that the <glossary>operating system<\/glossary> can take advantage of the\nbuilt-in abilities if it wants.\n<\/p>\n<p>\nAlthough this capability is built into the <glossary>CPU<\/glossary>,\nit is not the default mode. Instead, the CPU starts in what I like to call\n&#8220;DOS-compatibility mode.&#8221; However, the correct term is &#8220;real mode.&#8221; Real mode\nis a real danger to an <glossary>operating system<\/glossary> like\n<glossary>UNIX<\/glossary>. In this mode, there are no protections (which\nmakes sense because protections exist only in protected mode). A process running\nin real mode has complete control over the entire system and can do anything it\nwants. Therefore, trying to run a multiuser system on a real-mode system would\nbe a nightmare. All the protections would have to be built into the process\nbecause the operating system wouldn&#8217;t be able to prevent a process from doing\nwhat it wanted.\n<\/p>\n<p>\nA third mode, called &#8220;virtual mode,&#8221; is also built in. In virtual mode, the\n<glossary>CPU<\/glossary> behaves to a limited degree as though it is in real\nmode. However, when a process attempts to directly access registers or\nhardware, the instruction is caught, or trapped, and the <glossary>operating system<\/glossary>\nis allowed to take over.\n<\/p>\n<p>\nLet&#8217;s get back to <glossary>protected mode<\/glossary>\nbecause this is what makes multitasking possible. When in protected mode, the\n<glossary>CPU<\/glossary>\ncan use <glossary>virtual memory<\/glossary>.\nAs I mentioned, this is a way to trick the system into thinking that there is\nmore memory than there really is. There are two ways of doing this. The first\nis called <glossary>swapping<\/glossary>, in which the entire process is loaded\ninto memory. It is allowed to run its course for a certain amount of time. When\nits turn is over, another process is allowed to run. 
What happens when there is\nnot enough room for both processes to be in memory at the same time? The only\nsolution is that the first process is copied out to a special part of the hard\ndisk called the swap space, or <glossary>swap device<\/glossary>. Then, the next\nprocess is loaded into memory and allowed its turn. The second is called\n<glossary>paging<\/glossary> and we will get to it in a minute.\n<\/p>\n<p>\nBecause it takes such a large portion of the system resources to swap\nprocesses in and out of memory, this form of <glossary>virtual memory<\/glossary> can be\nvery inefficient, especially when you have a lot of processes running. So let&#8217;s\ntake this a step further. What happens if there are too many processes and the\nsystem spends all of its time swapping? Not good.\n<\/p>\n<p>\nTo avoid this problem, a mechanism was devised whereby only those parts of\nthe process that are needed are in memory. As it goes about its business, a\nprogram may only need to access a small portion of its code. As I mentioned\nearlier, empirical tests show that a program spends 80 percent of its time\nexecuting 20 percent of its code. So why bother bringing in those parts that\naren&#8217;t being used? Why not wait and see whether they are used?\n<\/p>\n<p>\nTo make things more efficient, only those parts of the program that are\nneeded (or expected to be needed) are brought into memory. Rather than\naccessing memory in random units, memory is divided into 4K chunks, called\npages. Although there is nothing magic about 4K per se, this value is easy to\nwork with. The <glossary>CPU<\/glossary> references data in 32-bit\n(4-byte) chunks, and 1,024 of these chunks make up one 4,096-byte page. Later you will\nsee how this helps things work out.\n<\/p>\n<p>\nAs I mentioned, only that part of the process currently being used needs to\nbe in memory. 
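<\/p>
<p>
The arithmetic behind the 4K page size works out neatly. A quick sketch in Python, just to check the numbers (the variable names are mine):
<\/p>

```python
# A page table entry is 32 bits (4 bytes); exactly 1,024 entries
# fill one 4K page, which is why a page table is itself one page.
ENTRY_SIZE = 4
ENTRIES_PER_PAGE = 1024
PAGE_SIZE = ENTRY_SIZE * ENTRIES_PER_PAGE   # 4,096 bytes = 4K

# One page table maps 1,024 pages, i.e. 4Mb of address space.
TABLE_COVERAGE = ENTRIES_PER_PAGE * PAGE_SIZE

print(PAGE_SIZE)        # 4096
print(TABLE_COVERAGE)   # 4194304 (4Mb)
```

<p>
In other words, a page of 32-bit entries describes exactly 4Mb of memory, which is the figure the page-table discussion below relies on.
<\/p>
<p>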
When the process wants to read something that is not currently\nin <glossary>RAM<\/glossary>, it needs to go out to the hard disk to pull in the\nother parts of the process; that is, it goes out and reads in new pages. This\nprocess is called <glossary>paging<\/glossary>. When the process attempts to read\nfrom a part of the process that is not in <glossary>physical memory<\/glossary>,\na &#8220;page <glossary>fault<\/glossary>&#8221; occurs.\n<\/p>\n<p>\nOne thing you must bear in mind is that a process can jump around a lot.\nFunctions are called, sending the process off somewhere completely different.\nIt is possible, even likely, that the page containing the memory\nlocation to which the process needs to jump is not currently in memory. Because\nit is trying to read a part of the process not in <glossary>physical memory<\/glossary>,\nthis, too, is called a page <glossary>fault<\/glossary>. As\nmemory fills up, pages that haven&#8217;t been used in some time are replaced by new\nones. (I&#8217;ll talk much more about this whole business later.)\n<\/p>\n<p>\nAssume that a process has just made a call to a function somewhere else in\nthe code and the page it needed is brought into memory. Now there are two pages\nof the process from completely different parts of the code. Should the process\ntake another jump or return from the function, it needs to know whether the\ndestination is in memory. The <glossary>operating system<\/glossary> could keep track of\nthis, but it doesn&#8217;t need to; the <glossary>CPU<\/glossary> will keep track for\nit.\n<\/p>\n<p>\nStop here for a minute! This is not entirely true. The OS must first set up\nthe structures that the <glossary>CPU<\/glossary> uses. However, the CPU uses\nthese structures to determine whether a section of a program is in memory.\nAlthough these structures reside in <glossary>RAM<\/glossary> rather than in the CPU, the CPU\nadministers RAM utilization through page tables. 
As their names imply, page\ntables are simply tables of pages. In other words, they are memory locations in\nwhich other memory locations are stored.\n<\/p>\n<p>\nConfused? I was at first, so let&#8217;s look at this concept another way. Each\nrunning process has a certain part of its code currently in memory. The system\nuses these page tables to keep track of what is currently in memory and where it is\nlocated. To limit the amount of work the <glossary>CPU<\/glossary> has to do, each of\nthese page tables is only 4K, or one page, in size. Because each page contains a\nset of 32-bit addresses, a <glossary>page table<\/glossary> can contain only\n1,024 entries.\n<\/p>\n<p>\nAlthough this would imply that a process can only have 4K x 1,024, or 4Mb,\nloaded at a time, there is more to it. Page tables are grouped into page\ndirectories. Like the <glossary>page table<\/glossary>, the entries\nin a <glossary>page directory<\/glossary> point to memory locations.\nHowever, rather than pointing to a part of the process, page directories point\nto page tables. Again, to reduce the CPU&#8217;s work, a page directory is only one\npage. Because each entry in the page directory points to a page, this means that\na process can only have 1,024 page tables.\n<\/p>\n<p>\nIs this enough? Let&#8217;s see. A page is 4K or 4,096 bytes, which is\n2<sup>12<\/sup>. Each <glossary>page table<\/glossary> can refer to 1,024 pages,\nwhich is 2<sup>10<\/sup>. Each <glossary>page directory<\/glossary> can refer to\n1,024 page tables, which is also 2<sup>10<\/sup>. 
Multiplying this out, we\nhave\n<\/p>\n<p>(page size) x (pages in page table) x (page tables in page directory)<\/p>\n<p>\nor\n<\/p>\n<p>(2<sup>12<\/sup>) x (2<sup>10<\/sup>) x (2<sup>10<\/sup>) = 2<sup>32<\/sup>\n<\/p>\n<p>\nBecause the <glossary>CPU<\/glossary>\nis only capable of accessing 2<sup>32<\/sup> bytes, this scheme allows\naccess to every possible memory <glossary>address<\/glossary>\nthat the system can generate.\n<\/p>\n<p>\nAre you still with me?\n<\/p>\n<p>\nInside of the <glossary>CPU<\/glossary>\nis a register called Control Register 0, or CR0 for short. In this register\nis a single bit that turns on this <glossary>paging<\/glossary> mechanism. If\nthis paging mechanism is turned on, any memory reference that the CPU receives\nis interpreted as a combination of page directories, page tables, and offsets,\nrather than an absolute, linear <glossary>address<\/glossary>.\n<\/p>\n<p>\nBuilt into the <glossary>CPU<\/glossary>\nis a special unit that is responsible for making the translation from the virtual\n<glossary>address<\/glossary> of the process to physical pages in memory. This special\nunit is called\n(what else?) the <glossary>paging<\/glossary> unit. To understand more about the\nwork the paging unit saves the <glossary>operating system<\/glossary> or other\nparts of the CPU, let&#8217;s see how the address is translated.\n<\/p>\n<p>Translation of Virtual-to-Physical Address<\/p>\n<p>\nWhen <glossary>paging<\/glossary>\nis turned on, the paging unit receives a 32-bit value that\nrepresents a <glossary>virtual memory<\/glossary>\nlocation within a process. The <glossary>paging<\/glossary>\nunit takes these values and translates them, as shown in Figure 0-11. At the top of the\nfigure, we see that the virtual <glossary>address<\/glossary>\nis handed to the <glossary>paging<\/glossary>\nunit, which converts it to a linear <glossary>address<\/glossary>.\nThis is not the physical address in memory. 
As\nyou see, the 32-bit linear <glossary>address<\/glossary>\nis broken down into three components. The\nfirst 10 bits (22&#8211;31) are an offset into the <glossary>page directory<\/glossary>.\nThe location in memory\nof the <glossary>page directory<\/glossary>\nis determined by the Page Directory Base Register (PDBR).\n<\/p>\n<img decoding=\"async\" src=\"pageunit.png\" width=476 height=446 border=0 usemap=#pageunit_map>\n<map name=\"pageunit_map\">\n<!-- #$-:Image Map file created by GIMP Imagemap Plugin -->\n<!-- #$-:GIMP Imagemap Plugin by Maurits Rijk -->\n<!-- #$-:Please do not edit lines starting with \"#$\" -->\n<!-- #$VERSION:1.3 -->\n<!-- #$AUTHOR:James Mohr -->\n<area shape=\"RECT\" coords=\"95,5,249,147\" href=\"popup#Paging Unit#A virtual address is sent into the paging unit, which translates it into a physical address.\">\n<area shape=\"RECT\" coords=\"5,186,357,232\" href=\"popup#Paging Unit#A linear address is made up of the appropriate page directory, page table entry and the offset.\">\n<area shape=\"RECT\" coords=\"32,294,122,407\" href=\"popup#Paging Unit#The page directory contains pointers to the page tables.\">\n<area shape=\"RECT\" coords=\"168,294,260,407\" href=\"popup#Paging Unit#A page table contains pointers to the actual data pages.\">\n<area shape=\"RECT\" coords=\"346,252,473,407\" href=\"popup#Paging Unit#Once the physical page is known, the offset is used to access the desired data. 
\">\n<area shape=\"RECT\" coords=\"1,416,64,444\" href=\"popup#Paging Unit#The Page Directory Base Register contains the memory location of the page directory for each process.\">\n<area shape=\"RECT\" coords=\"2,237,28,381\" href=\"popup#Paging Unit#The directory portion of the linear address is an offset into the page directory.\">\n<area shape=\"RECT\" coords=\"140,233,163,373\" href=\"popup#Paging Unit#The page portion of the linear address is an offset into the page table.\">\n<area shape=\"RECT\" coords=\"275,231,343,322\" href=\"popup#Paging Unit#The offset portion of the linear address points to the data within the appropriate page.\">\n<area shape=\"RECT\" coords=\"267,335,344,406\" href=\"popup#Paging Unit#The page table entry points to the location of the appropriate page in memory.\">\n<area shape=\"RECT\" coords=\"125,378,166,406\" href=\"popup#Paging Unit#The page directory entry points to the location of the appropriate page table.\">\n<area shape=\"RECT\" coords=\"124,352,139,382\" href=\"popup#Paging Unit#The page directory entry points to the location of the appropriate page table.\">\n<area shape=\"RECT\" coords=\"1,383,28,420\" href=\"popup#Paging Unit#The Page Directory Base Register contains the memory location of the page directory for each process.\">\n<\/map>\n<icaption>Image &#8211; Translation of virtual addresses into physical addresses by the paging unit.  (<b>interactive<\/b>)<\/icaption>\n<p>\nThe <glossary>page directory<\/glossary>\nentry contains 4 bits that point to a specific <glossary>page table<\/glossary>.\nThe entry in the page table, as you see, is determined by bits 1221. Here\nagain, we have 10 bits, which means each entry is 32 bits. These 32-bit entries\npoint to a specific page in <em>physical<\/em> memory. Which byte is referenced\nin <glossary>physical memory<\/glossary> is determined by the offset portion of\nthe linear <glossary>address<\/glossary>,  which are bits 011. 
These 12 bits\nrepresent the 4,096 (4K) bytes in each physical page.\n<\/p>\n<p>\nKeep in mind a couple of things. First, page tables and page directories are\nnot part of the <glossary>CPU<\/glossary>. They can&#8217;t be. If a\n<glossary>page directory<\/glossary> were full, it would contain 1,024 references\nto 4K chunks\nof memory. For the page tables alone, you would need 4Mb! Because this would\ncreate a CPU hundreds of times larger than it is now, page tables and\ndirectories are stored in <glossary>RAM<\/glossary>.\n<\/p>\n<p>\nNext, page tables and page directories are abstract concepts that the\n<glossary>CPU<\/glossary> knows how to utilize. They occupy physical\n<glossary>RAM<\/glossary>, and operating systems such as Linux know how to\nswitch this capability on within the CPU. All the CPU does is the &#8220;translation&#8221;\nwork. When it starts, Linux turns this capability on and sets up all the\nstructures. These structures are then handed off to the CPU, where the\n<glossary>paging<\/glossary> unit does the work.\n<\/p>\n<p>\nAs I said, a process with all of its <glossary>page directory<\/glossary>\nentries full would require 4Mb just for the page tables. This implies that the\nentire process is somewhere in memory. Because each of the <glossary>page table<\/glossary>\nentries points to physical pages in <glossary>RAM<\/glossary>,\nyou would need 4Gb of RAM. Not that I would mind having that much RAM, though\nit is a bit costly and even if you had 16Mb SIMMs, you would need 256 of\nthem.\n<\/p>\n<p>\nLike pages of the process, it&#8217;s possible that a linear <glossary>address<\/glossary>\npassed to the <glossary>paging<\/glossary>\nunit translates to a <glossary>page table<\/glossary>\nor even a <glossary>page directory<\/glossary>\nthat is not in memory. 
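<\/p>
<p>
The three-way split the paging unit performs can be sketched in a few lines of Python (the function name and example address are mine; the 10/10/12 bit layout is the one described above):
<\/p>

```python
# Split a 32-bit linear address the way the 386 paging unit does:
# bits 22-31 index the page directory, bits 12-21 index a page
# table, and bits 0-11 are the byte offset within the 4K page.
def split_linear_address(addr):
    directory = (addr >> 22) & 0x3FF  # top 10 bits: 1 of 1,024 directory entries
    table = (addr >> 12) & 0x3FF      # next 10 bits: 1 of 1,024 table entries
    offset = addr & 0xFFF             # low 12 bits: 1 of 4,096 bytes in the page
    return directory, table, offset

# Every 32-bit address maps to exactly one such triple:
# 1,024 x 1,024 x 4,096 = 2 ** 32.
print(split_linear_address(0x00403008))  # (1, 3, 8)
```

<p>
For example, the highest possible address, 0xFFFFFFFF, splits into the last byte of the last page of the last page table: (1023, 1023, 4095).
<\/p>
<p>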
Because the system is trying to access a page (which\ncontains a page table and not part of the process) that is not in memory, a page\n<glossary>fault<\/glossary> occurs and the system must go get that page.\n<\/p>\n<p>\nBecause page tables and the <glossary>page directory<\/glossary>\nare not really part of the process but are important only to the\n<glossary>operating system<\/glossary>,\na page <glossary>fault<\/glossary> causes these structures to be <i>created<\/i>\nrather than read in from the hard disk or elsewhere. In fact, as the process starts up,\nall is without form and is void: no pages, no page tables, and no page directory.\n<\/p>\n<p>\nThe system accesses a memory location as it starts the process. The system\ntranslates the <glossary>address<\/glossary>, as I described above, and tries to\nread the <glossary>page directory<\/glossary>. It&#8217;s not there. A page\n<glossary>fault<\/glossary> occurs and the page directory must be created. Now\nthat the directory is there, the system finds the entry that points to the\n<glossary>page table<\/glossary>.  Because no page tables exist, the slot is\nempty and another <glossary>page fault<\/glossary> occurs. So, the system needs\nto create a page table. The entry in the page table for the physical page is\nfound to be empty, and so yet another page fault occurs. Finally, the system can\nread in the page that was referenced in the first place.\n<\/p>\n<p>\nThis whole process sounds a bit cumbersome, but bear in mind that this amount\nof page faulting only occurs as the process is starting. Once the table is\ncreated for a given process, it won&#8217;t page <glossary>fault<\/glossary> again on\nthat table. Based on the principle of locality, the page tables will hold enough\nentries for a while, unless, of course, the process bounces around a lot.\n<\/p>\n<p>\nThe potential for bouncing around brings up an interesting aspect of page\ntables.  
Because page tables translate to physical <glossary>RAM<\/glossary> in\nthe same way all the time, virtual addresses in the same area of the process end\nup in the same page tables. Therefore, page tables fill up because the process\nis more likely to execute code in the same part of a process rather than\nelsewhere (this is spatial locality).\n<\/p>\n<p>\nThere is quite a lot there, yes? Well, don&#8217;t get up yet because we&#8217;re not\nfinished. There are a few more issues that I haven&#8217;t addressed.\n<\/p>\n<p>\nFirst, I have often referred to page tables and <i>the<\/i>\n<glossary>page directory<\/glossary>. Each process has a single page directory (it doesn&#8217;t need\nany more). Although the <glossary>CPU<\/glossary> supports multiple page\ndirectories, only one is in use at any given time for the <em>entire<\/em> system. When a\nprocess needs to be switched out, the entries in the page directory for the old\nprocess are overwritten by those for the new process. The location of the page\ndirectory in memory is maintained in Control Register 3 (CR3) in the\nCPU.\n<\/p>\n<p>\nThere is something here that bothered me in the beginning and may still\nbother you. As I have described, each time a memory reference is made, the\n<glossary>CPU<\/glossary> has to look at the\n<glossary>page directory<\/glossary>,\nthen a <glossary>page table<\/glossary>, then calculate the physical\n<glossary>address<\/glossary>. This means that for <i>every<\/i> memory\nreference, the CPU has to make two more references just to find out where the\nnext instruction or data is coming from. I thought that was pretty stupid.\n<\/p>\n<p>\nWell, so did the designers of the <glossary>CPU<\/glossary>.\nThey have included a functional unit called the Translation Lookaside Buffer,\nor <glossary>TLB<\/glossary>. The TLB contains 32 entries and, just as the internal\nand external caches point to sets of instructions, it points to pages. 
If a page\nthat is being searched for is in the TLB, a TLB hit occurs (just like a\n<glossary>cache<\/glossary> hit). As a result of the principle of spatial\nlocality, there is a 98-percent hit rate using the TLB.\n<\/p>\n<p>\nWhen you think about it, this makes a lot of sense. The <glossary>CPU<\/glossary>\ndoes not just execute one instruction for a program then switch to something\nelse; it executes hundreds or even thousands of instructions before\nanother program gets its turn. If each page contains 1,024 instructions and the CPU\nexecutes 1,000 before it&#8217;s another program&#8217;s turn, all 1,000 will most likely be in\nthe same page. Therefore, they are all <glossary>TLB<\/glossary> hits.\n<\/p>\n<p>\nNow, let&#8217;s take a closer look at the <glossary>page table<\/glossary>\nentries themselves. Each is a 32-bit value that points to a 4K page in\n<glossary>RAM<\/glossary>. Because it points to a whole page rather than an\nindividual byte, it does not need all 32 bits to do it. There are\n2<sup>20<\/sup> pages (4,096 bytes = 1 page), so the entry needs only the upper\n20 bits, leaving 12 bits over. These are the low-order 12 bits\nand the <glossary>CPU<\/glossary> uses them for other purposes related to that\npage. A few of them are unused and the <glossary>operating system<\/glossary>\ncan, and does, use them for its own purposes. Intel also reserves a couple, and\nthey should not be used.\n<\/p>\n<p>\nOne bit, the 0th bit, is the present bit. If this bit is set, the\n<glossary>CPU<\/glossary> knows that the page being referenced is in memory. If\nit is not set, the page is not in memory and if the CPU tries to access it, a\npage <glossary>fault<\/glossary> occurs. Also, if this bit is not set, none of\nthe other bits has any meaning. (How can you talk about something that&#8217;s not\nthere?)\n<\/p>\n<p>\nAnother important bit is the accessed bit. 
Should a page be accessed for\neither read or write, the <glossary>CPU<\/glossary> sets this bit. Because the\n<glossary>page table<\/glossary> entry is never filled in until the page is being\naccessed, this seems a bit redundant. If that were all there was to it, you&#8217;d be\nright. However, there&#8217;s more.\n<\/p>\n<p>\nAt regular intervals, the <glossary>operating system<\/glossary>\nclears the accessed bit. If a particular page is never used again, the system is\nfree to reuse that physical page if memory gets short. When that happens, all\nthe OS needs to do is clear the present bit so the page is considered &#8220;invalid.&#8221;\n<\/p>\n<p>\nAnother bit used to keep track of how a page is accessed is the <glossary>dirty<\/glossary>\nbit. If a page has been written to, it is considered dirty. Before the system\ncan make a dirty page available, it must make sure that whatever was in that\npage is written to disk; otherwise, the data would be inconsistent.\n<\/p>\n<p>\nFinally, we get to the point of what all this <glossary>protected mode<\/glossary>\nstuff is all about. The protection in protected mode essentially boils down to\ntwo bits in the <glossary>page table<\/glossary> entry. One bit, the\nuser\/supervisor bit, determines who has access to a particular page. If the\n<glossary>CPU<\/glossary> is running at user level, then it only has access to\nuser-level pages. If the CPU is at supervisor level, it has access to all pages.\n<\/p>\n<p>\nI need to say here that this is the maximum access a process can have. Other\nprotections may prevent a user-level or even supervisor-level process from\ngetting even this far. However, these are implemented at a higher level.\n<\/p>\n<p>\nThe other bit in this pair is the read\/write bit. As the name implies, this\nbit determines whether a page can be written to. This single bit is really\njust an on-off switch. 
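<\/p>
<p>
The flag bits discussed so far occupy fixed positions in the low-order 12 bits of the entry. A small sketch in Python (the decoding function is mine; the bit positions are the documented 386 ones):
<\/p>

```python
# Flag bits in the low-order 12 bits of a 386 page table entry.
PRESENT = 1 << 0   # page is in physical memory
RW = 1 << 1        # page may be written to
USER = 1 << 2      # user-level code may access the page
ACCESSED = 1 << 5  # set by the CPU on any read or write
DIRTY = 1 << 6     # set by the CPU on a write

def decode_pte(entry):
    # If the present bit is clear, none of the other bits mean anything.
    if not entry & PRESENT:
        return {'present': False}
    return {
        'present': True,
        'writable': bool(entry & RW),
        'user': bool(entry & USER),
        'accessed': bool(entry & ACCESSED),
        'dirty': bool(entry & DIRTY),
        # the upper 20 bits are the physical address of the page
        'frame': entry & 0xFFFFF000,
    }
```

<p>
A user page that has just been written to would have PRESENT, RW, USER, ACCESSED and DIRTY all set; with RW clear, not even a supervisor-level process could write to it.
<\/p>
<p>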
If the page is there, you have the right to read it if\nyou can (that is, either you are a supervisor-level process or the page is a\nuser page). However, if the write ability is turned off, you can&#8217;t write to it,\neven as a supervisor.\n<\/p>\n<p>\nIf you have a 386 <glossary>CPU<\/glossary>, all is well. If you have a 486\nand decide to use one of those bits that I told you were reserved by Intel, you\nwill run into trouble. Two of these bits were not defined in the 386 but\nare now defined in the 486: page write-through (PWT) and page\n<glossary>cache<\/glossary> disable (PCD).\n<\/p>\n<p>\nPWT determines the <glossary>write policy<\/glossary>\n(see the section on RAM) for external <glossary>cache<\/glossary>\nregarding this page. If PWT is set, then this page has a write-through policy.\nIf it is clear, a write-back policy is allowed.\n<\/p>\n<p>\nPCD decides whether this page can be cached. If clear, this page cannot be\ncached. If set, then caching is allowed. Note that I said &#8220;allowed.&#8221; Setting\nthis bit does not mean that the page will be cached. Other factors that go\nbeyond what I am trying to get across here are involved.\n<\/p>\n<p>\nWell, I&#8217;ve talked about how the <glossary>CPU<\/glossary>\nhelps the OS keep track of pages in memory. I also talked about how the CR3\nregister helps keep track of which <glossary>page directory<\/glossary> needs\nto be read. I also talked about how pages can be protected by using a few bits\nin the <glossary>page table<\/glossary> entry. However, one more thing is missing\nto complete the picture: keeping track of which process is currently running,\nwhich is done with the Task Register (TR).\n<\/p>\n<p>\nThe TR is not where most of the work is done. The <glossary>CPU<\/glossary>\nsimply uses it as a pointer to where the important information is kept. This\npointer is the Task State Descriptor (TSD). 
Like the other descriptors that\nI&#8217;ve talked about, the TSD points to a particular <glossary>segment<\/glossary>.\nThis segment is the Task State Segment (TSS). The TSD contains, among other\nthings, the privilege level at which this task is operating. Using this\ninformation along with that in the <glossary>page table<\/glossary> entry, you\nget the protection that <glossary>protected mode<\/glossary> allows.\n<\/p>\n<p>\nThe <glossary>TSS<\/glossary>\ncontains essentially a snapshot of the <glossary>CPU<\/glossary>.\nWhen a process&#8217;s turn on the CPU is over, the state of the entire CPU needs to\nbe saved so that the program can continue where it left off. This information\nis stored in the TSS. This functionality is built into the CPU. When the OS\ntells the CPU a task switch is occurring (that is, a new process is getting its\nturn), the CPU knows to save this data <i>automatically<\/i>.\n<\/p>\n<p>\nIf we put all of these components together, we get an\n<glossary>operating system<\/glossary> that works together with the hardware to provide a\nmultitasking, multiuser system. Unfortunately, what I talked about here are\njust the basics. I could spend a whole book just talking about the relationship\nbetween the operating system and the <glossary>CPU<\/glossary> and still not be\ndone.\n<\/p>\n<p>\nOne thing I didn&#8217;t talk about was the difference between the 80386, 80486,\nand Pentium. With each new processor come new instructions. The 80486 added an\ninstruction pipeline to improve performance to the point where the\n<glossary>CPU<\/glossary> could average almost one instruction per cycle. The\nPentium has dual instruction paths (pipelines) to increase the speed even more.\nIt also contains <em>branch prediction logic<\/em>, which is used to &#8220;guess&#8221;\nwhere the next instruction should come from.\n<\/p>\n<p>\nThe Pentium (as well as the later CPUs) has a few new features that\nmake for significantly more performance. 
The first feature is multiple\ninstruction paths, or pipelines, which allow the <glossary>CPU<\/glossary> to work\non multiple instructions at the same time. In some cases, the CPU will have to\nwait to finish one before working on the other, though this is not always\nnecessary.\n<\/p>\n<p>\nThe second improvement is called dynamic execution. Normally, instructions\nare executed one after the other. If the execution order is changed, the whole\nprogram is changed. Well, not exactly. In some instances, upcoming instructions\nare not based on previous instructions, so the processor can &#8220;jump ahead&#8221; and\nstart executing those instructions before others are finished.\n<\/p>\n<p>\nThe next advance is branch prediction. Based on previous activity, the\n<glossary>CPU<\/glossary> can expect certain behavior to continue. For example,\nthe odds are that once the CPU is in a loop, the loop will be repeated. With\nmore than one pipeline executing instructions, multiple possibilities can be\nattempted. This is not always right, but it is right more than 75 percent of the\ntime!\n<\/p>\n<p>\nThe PentiumPro (P6) introduced the concept of data flow analysis. Here,\ninstructions are executed as they are ready, not necessarily in the order in\nwhich they appear in the program. Often, the result is available before it\nnormally would be. 
The PentiumPro (P6) also introduced speculative execution, in\nwhich the <glossary>CPU<\/glossary> takes a guess at or anticipates what is\ncoming.\n<\/p>\n<p>\nThe P6 is also new in that it is actually two separate chips in one package;\nthe second chip is the level 2 <glossary>cache<\/glossary>.\n Both an external\nbus and a &#8220;private&#8221; <glossary>bus<\/glossary>\nconnect the <glossary>CPU<\/glossary>\nto the level 2 <glossary>cache<\/glossary>,\n and both\nof these are 64 bits wide.\n<\/p>\n<p>\nBoth the socket and the <glossary>CPU<\/glossary>\nitself changed with the Pentium II processor.\nInstead of a processor with pins sticking out all over the bottom, the Pentium\nII uses a Single Edge Contact Cartridge (SECC). This reportedly eliminates the\nneed for redesigning the socket with every new <glossary>CPU<\/glossary>\ngeneration. In addition, the\nCPU is encased in plastic, which protects the <glossary>CPU<\/glossary>\nduring handling. The Pentium II can reach speeds of up to 450 MHz.\n<\/p>\n<p>\nIncreasing performance even further, the Pentium II increases the\ninternal, level-one <glossary>cache<\/glossary>\nto 32 KiB, with 16 <glossary>KiB<\/glossary>\nfor data and 16 KiB for\ninstructions. Technically, it may be appropriate to call the level-two cache\ninternal as well, as the 512 KiB L2 <glossary>cache<\/glossary>\nis included within the SECC, making access\nfaster than for a traditional L2 <glossary>cache<\/glossary>.\nThe Dual Independent Bus (DIB)\narchitecture provides for higher <glossary>throughput<\/glossary>\nas there are separate system and\ncache buses.\n<\/p>\n<p>\nThe Pentium II also increases performance internally through changes to the\nprocessor logic. Using Multiple Branch Prediction, the Pentium II predicts the flow\nof instructions through several branches. Because computers usually process\ninstructions in loops (i.e., repeatedly), it is generally easy to guess what the\ncomputer will do next. 
By predicting multiple branches, the processor reduces\n&#8220;wrong guesses.&#8221;\n<\/p>\n<p>\nProcessor &#8220;management&#8221; has become an important part of the Pentium II. A\nBuilt-In Self-Test (BIST) is included, which is used to test things like the\ncache and the <glossary>TLB<\/glossary>.\n It also includes a diode within the case to monitor the\nprocessor&#8217;s temperature.\n<\/p>\n<p>\nThe Pentium II Xeon Processor added a &#8220;system <glossary>bus<\/glossary>\nmanagement interface,&#8221;\nwhich allows the <glossary>CPU<\/glossary>\nto communicate with other system management components\n(hardware and software). The thermal sensor, which was already present in the\nPentium II, as well as the new Processor Information <glossary>ROM<\/glossary>\n(PI ROM) and the\nScratch EEPROM use this <glossary>bus<\/glossary>.\n<\/p>\n<p>\nThe PI <glossary>ROM<\/glossary>\ncontains various pieces of information about the <glossary>CPU<\/glossary>,\n like the\nCPU ID, voltage tolerances and other technical information. The Scratch EEPROM\nis shipped blank from Intel but is intended for system manufacturers to include\nwhatever information they want, such as an inventory of the other components,\nservice information, system defaults and so forth.\n<\/p>\n<p>\nLike the Pentium II, the latest (as of this writing) processor, the Pentium III,\nalso comes in the Single Edge Contact Cartridge. It has increased the number of\ntransistors from the 7.5 million in the Pentium II to over 9.5 million.\nCurrently, the Pentium III comes in 450 MHz and 500 MHz models, with a 550 MHz\nmodel planned.\n<\/p>\n<p>\nThe Pentium III also includes the Internet Streaming SIMD Extensions, which\nconsist of 70 new instructions that enhance imaging in general, as well as 3D\ngraphics, streaming audio and video, and speech recognition.\n<\/p>\n<p>\nIntel also added a serial number to the Pentium III. 
This is extremely\nuseful should the <glossary>CPU<\/glossary>\nitself be stolen, or the computer be stolen after the\nCPU has been installed. In addition, the <glossary>CPU<\/glossary>\ncan be uniquely identified across\nthe <glossary>network<\/glossary>,\n regardless of the network card or other components. This could be\nused in the future to prevent improper access to sensitive data, aid in asset\nmanagement and help in remote management and configuration.\n<\/p>\n<p>\nEven today, many people still think that the PC <glossary>CPU<\/glossary>\nis synonymous with\nIntel; that is, they assume that any <glossary>CPU<\/glossary>\nthey buy for their PC will be\nmanufactured by Intel. This is not the case. Two other manufacturers, Advanced Micro\nDevices (AMD) and Cyrix, provide CPUs with comparable functionality. Like any\nother brand name, Intel CPUs are often more expensive than equivalents from\nanother company with the same performance.\n<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Intel Processors Although it is an interesting subject, the ancient history of microprocessors is not really important to the issues at hand. 
It might be nice to learn how the young PC grew from a small, budding 4-bit system to &hellip; <a href=\"http:\/\/www.linux-tutorial.info\/?page_id=278\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-278","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/pages\/278","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=278"}],"version-history":[{"count":1,"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/pages\/278\/revisions"}],"predecessor-version":[{"id":587,"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/pages\/278\/revisions\/587"}],"wp:attachment":[{"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=278"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}