{"id":285,"date":"2020-08-18T19:23:47","date_gmt":"2020-08-18T20:23:47","guid":{"rendered":"http:\/\/www.linux-tutorial.info\/?page_id=77"},"modified":"2020-08-22T19:26:18","modified_gmt":"2020-08-22T20:26:18","slug":"this-is-the-page-title-toplevel-120","status":"publish","type":"page","link":"http:\/\/www.linux-tutorial.info\/?page_id=285","title":{"rendered":"RAID"},"content":{"rendered":"\n<title>RAID<\/title>\n<p>\nRAID is an acronym for Redundant Array of Inexpensive Disks. Originally, the\nidea was that you would get better performance and reliability from several,\nless expensive drives linked together as you would from a single, more expensive\ndrive. The key change in the entire concept is that hard disk prices have\ndropped so dramatically that <glossary>RAID<\/glossary>\nis no longer concerned with inexpensive\ndrives. So much so, that the I in <glossary>RAID<\/glossary>\nis often interpreted as meaning &#8220;Intelligent&#8221; or &#8220;Independent&#8221;, rather than &#8220;Inexpensive.&#8221;\n<\/p>\n<p>\nIn the original paper that defined RAID, there were five levels. Since that paper\nwas written, the concept has been\nexpanded and revised. In some cases, characteristics of the original levels are\ncombined to form new levels.\n<\/p>\n<p>\nTwo concepts are key to understanding <glossary>RAID<\/glossary>.\nThese are redundancy and <glossary>parity<\/glossary>. The concept of parity is\nno different than that used in serial communication, except for the fact that\nthe <glossary>parity<\/glossary> in a RAID\nsystem can be used to not only detect errors, but correct them. This is because\nmore than just a single bit is used per byte of data. The <glossary>parity<\/glossary>\ninformation is stored on a drive separate from the data. When an error is detected, the\ninformation is used from the good drives, plus the <glossary>parity<\/glossary>\ninformation to correct the error. 
It is also possible to have an entire drive fail\ncompletely and still be able to continue working. Usually the drive can be\nreplaced and the information on it rebuilt even while the system is running.\nRedundancy is the idea that all information is duplicated. If you have a system\nwhere one disk is an exact copy of another, one disk is redundant for the\nother.\n<\/p>\n<p>\nA striped\narray is also referred to as <glossary>RAID<\/glossary>\n0 or RAID Level 0. Here, portions of the data\nare written to and read from multiple disks in parallel. This greatly increases\nthe speed at which data can be accessed. With two disks, for example, each drive\nreads or writes half of the data, which cuts the access time almost in\nhalf. The amount of data that is written to a single disk is referred to as the\nstripe width. For example, if single blocks are written to each disk, then the\nstripe width would be one block.\n<\/p>\n<p>\nThis type of virtual disk provides increased\nperformance since data is being read from multiple disks simultaneously. Since\nthere is no <glossary>parity<\/glossary>\nto update when data is written, this is faster than a system\nusing <glossary>parity<\/glossary>.\nHowever, the drawback is that there is no redundancy. If one disk\nfails, the data is lost. Such a system is better suited to\norganizations where speed is more important than reliability.\n<\/p>\n<p>\nKeep in mind\nthat data is written to all the physical drives each time data is written to the\nlogical disk. Therefore, the pieces must all be the same size. For example, you\ncould not have one piece that was 500 MB and a second piece that was only 400\nMB. (Where would the other 100 MB be written?) Here again, the total amount of\nspace available is the sum of all the pieces.\n<\/p>\n<p>\nDisk <glossary>mirroring<\/glossary>\n(also referred to as <glossary>RAID<\/glossary>\n1) is where data from the first drive is duplicated on the second\ndrive. 
When data is written to the primary drive, it is automatically written to\nthe secondary drive as well. Although this slows things down a bit when data is\nwritten, when data is read it can be read from either disk, thus increasing\nperformance. Mirrored systems are best employed where there is a large database\napplication and availability of the data (transaction speed and reliability) is\nmore important than storage efficiency.\n<\/p>\n<p>\nAnother consideration is the speed of\nthe system. Since it takes longer than normal to write data, mirrored systems\nare better suited to database applications where queries are more common than\nupdates.\n<\/p>\n<p>\nThe term used for <glossary>RAID<\/glossary>\n4 is a block interleaved undistributed parity\narray. Like <glossary>RAID<\/glossary>\n0, RAID 4 is also based on striping, but redundancy is built in\nwith <glossary>parity<\/glossary>\ninformation written to a separate drive. The term &#8220;undistributed&#8221; is\nused since a single drive is used to store the <glossary>parity<\/glossary>\ninformation. If one drive\nfails (or even a portion of the drive), the missing data can be recreated using\nthe remaining data drives and the <glossary>parity<\/glossary>\ndisk. It is possible to continue working even with\none drive inoperable since the <glossary>parity<\/glossary>\ndrive is used on the fly to recreate the\ndata. Even data written while a drive is down remains valid, since the <glossary>parity<\/glossary>\ninformation is updated as well. This is not intended as a means of running your system\nindefinitely with a drive missing, but rather it gives you the chance to stop\nyour system gracefully.\n<\/p>\n<p>\nRAID 5 takes this one step further and\ndistributes the <glossary>parity<\/glossary>\ninformation across all drives. For example, the parity drive\nfor block 1 might be drive 5, while the <glossary>parity<\/glossary>\ndrive for block 2 is drive 4. 
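<\/p>
<p>
One common way to rotate the parity drive is simply to step it back one disk per stripe. A minimal sketch (this is just one of several rotation schemes; real implementations vary):
<\/p>

```python
# One possible parity-rotation scheme for a RAID 5 array (sketch).
def parity_disk(stripe, n_disks):
    # Parity starts on the last disk and moves back one disk per stripe.
    return (n_disks - 1 - stripe) % n_disks

# With five disks, successive stripes place parity on drives 5, 4, 3, 2, 1
# (numbering drives from 1, i.e. index plus one).
assert [parity_disk(s, 5) + 1 for s in range(5)] == [5, 4, 3, 2, 1]
```

<p>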
With\nRAID 4, the single <glossary>parity<\/glossary>\ndrive is accessed on every single data write, which\ndecreases overall performance. Since data and <glossary>parity<\/glossary>\nare interspersed on a RAID\n5 system, no single drive is overburdened. In both cases, the <glossary>parity<\/glossary>\ninformation is generated during the write and, should a drive go out, the missing\ndata can be\nrecreated. Here again, you can recreate the data while the system is running,\nif a hot spare is used. Figure &#8211; RAID 5\n<\/p>\n<p>\nAs I\nmentioned before, some of the characteristics can be combined. For example, it\nis not uncommon to have striped arrays mirrored as well. This provides\nthe speed of a striped array with the redundancy of a mirrored array, without the\nexpense necessary to implement <glossary>RAID<\/glossary>\n5. Such a system would probably be referred\nto as <glossary>RAID<\/glossary>\n10 (RAID 1 plus RAID 0).\n<\/p>\n<p>\nRegardless of how long your drives are\nsupposed to last, they will eventually fail. The question is when. On a server, a\ncrashed hard disk means that many if not all of your employees are unable to work\nuntil the drive is replaced. However, there are a couple of ways of limiting the\neffects of a crash. First, you can keep the system from going down\nunexpectedly. Second, you can protect the data already on the drive.\n<\/p>\n<p>\nThe key\nissue with <glossary>RAID<\/glossary>\nis the mechanism the system uses to present the multiple drives\nas a single one. The two solutions are quite simply hardware and software. With\nhardware RAID, the <glossary>SCSI<\/glossary>\n<glossary>host adapter<\/glossary> does all of the work. Basically, the\noperating system does not even see that there are multiple drives. Therefore,\nyou can use hardware <glossary>RAID<\/glossary>\nwith operating systems that have no RAID support of their own.\n<\/p>\n<p>\nOn the other hand, software <glossary>RAID<\/glossary>\nis less expensive. 
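<\/p>
<p>
Under Linux, software RAID arrays are typically created and monitored with the mdadm tool. A hedged sketch (the device names are examples and would differ on your system):
<\/p>

```shell
# Create a three-disk software RAID 5 array (example device names).
mdadm --create \/dev\/md0 --level=5 --raid-devices=3 \/dev\/sda1 \/dev\/sdb1 \/dev\/sdc1

# Check the state of all arrays, including any rebuild in progress.
cat \/proc\/mdstat
```

<p>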
Linux comes included with\nthe necessary software, so there is no additional cost. However, to me this is no real\nadvantage, as the initial hardware costs are a small fraction of the total cost\nof running the system. Maintenance and support play a much larger role, so these\nought to be considered before the cost of the actual hardware. In its Annual\nDisaster Impact Research, Microsoft reports that on average a downed server\ncosts at least $10,000 per hour. Think about how many <glossary>RAID<\/glossary>\ncontrollers you can buy with that money.\n<\/p>\n<concept id=\"2\" description=\"Linux can combine multiple drives into a single RAID system, even if the drives are of different types.\" \/>\n<p>\nAnother advantage of software RAID is the ability to use different types of drives. Although the\npartitions need to be the same size, the physical hardware can be different, such as <glossary>IDE<\/glossary>\nand <glossary>SATA<\/glossary>.\n<\/p>\n<p>\nIn addition, the total cost of\nownership also includes user productivity. Should a drive fail, performance\ndegrades more with a software solution than with a hardware one.\n<\/p>\n<p>\nLet&#8217;s\ntake an Adaptec AA-133SA <glossary>RAID<\/glossary>\ncontroller as an example. At the time of this writing it is one of the\ntop-end models and provides three Ultra <glossary>SCSI<\/glossary>\nchannels, which means you could\ntheoretically connect 45 devices to this single <glossary>host adapter<\/glossary>.\nSince each of the channels is Ultra <glossary>SCSI<\/glossary>, you have a\nmaximum <glossary>throughput<\/glossary> of 120 Mbit\/s. At the other\nend of the spectrum is the Adaptec AAA-131CA, which is designed more for high-end\nworkstations, as it only supports <glossary>mirroring<\/glossary>\nand striping.\n<\/p>\n<p>\nOne thing to note is that the Adaptec <glossary>RAID<\/glossary>\n<glossary>host<\/glossary> adapters do not just provide the interface that\nmakes multiple drives appear as one. 
Instead, they all include a coprocessor,\nwhich increases the performance of the drives considerably.\n<\/p>\n<p>\nHowever, providing data faster and redundantly is not all of it: Adaptec <glossary>RAID<\/glossary>\ncontrollers also have the ability to detect, and in some cases correct, errors on the\nhard disk. Many <glossary>SCSI<\/glossary>\nsystems can already detect single-bit errors. However,\nusing the <glossary>parity<\/glossary>\ninformation from the drives, the Adaptec <glossary>RAID<\/glossary>\ncontrollers can correct these single-bit errors. In addition,\nthe Adaptec <glossary>RAID<\/glossary>\ncontrollers can also detect 4-bit errors.\n<\/p>\n<p>\nYou also need to keep in mind the fact that\nmaintenance and administration are more costly than the initial hardware. Even\nthough you have a <glossary>RAID<\/glossary>\n5 array, you still need to replace the drive should it\nfail. This brings up two important aspects.\n<\/p>\n<p>\nFirst, how well can your system\ndetect the fact that a drive has failed? Whatever mechanism you choose must be\nable to notify the administrators immediately should a drive fail.\n<\/p>\n<p>\nThe second aspect returns to the fact that maintenance and administration\ncosts are much higher than the cost of the initial hardware. If the hardware\nmakes replacing the drive difficult, you increase your downtime, and therefore\nthe maintenance costs increase. Adaptec has addressed this issue by allowing\nyou to &#8220;hot swap&#8221; your drives. This means you can replace the defective drive\non a running system, without having to shut down the <glossary>operating system<\/glossary>.\n<\/p>\n<p>\nNote that this also requires that the case containing the <glossary>RAID<\/glossary>\ndrives be accessible. If your\ndrives are in the same case as the <glossary>CPU<\/glossary>\n(such as in traditional tower cases), you\noften have difficulty getting to the drives. Removing one while the system is\nrunning is not practical. 
The solution is an external case, which is\nspecifically designed for <glossary>RAID<\/glossary>.\n<\/p>\n<p>\nOften you can configure the <glossary>SCSI<\/glossary>\nID of the drive with dials on the case itself, and sometimes the position in the case\ndetermines the <glossary>SCSI<\/glossary>\nID. Typically, the drives are mounted onto rails, which\nslide into the case. Should one fail, you simply slide it out and replace it\nwith the new drive.\n<\/p>\n<p>\nProtecting your data and being able to replace the drive\nis just a start. The next level up is what is referred to as &#8220;hot spares.&#8221; Here,\nyou have additional drives already installed that are simply waiting for another\nto break down. As soon as a failure is detected, the <glossary>RAID<\/glossary>\ncard replaces the\nfailed drive with a spare, reconfigures the array to reflect the\nnew drive, and reports the failure to the <glossary>administrator<\/glossary>.\nKeep in mind that this must be completely supported in the hardware.\n<\/p>\n<p>\nIf you have an I\/O-bound\napplication, a failed drive decreases performance. Instead of just\ndelivering the data, your <glossary>RAID<\/glossary>\narray must calculate the missing data using the\nparity information, which means it has a slower response time in delivering the\ndata. The degraded performance continues until you replace the drive. With a hot\nspare, the <glossary>RAID<\/glossary>\narray is rebuilding itself as it delivers data. Although\nperformance is obviously degraded, it is to a lesser extent than having to swap\nthe drives manually.\n<\/p>\n<p>\nIf you have a CPU-bound <glossary>application<\/glossary>,\na hardware controller provides\nsubstantial increases in performance over software solutions. With software RAID, if a drive fails,\nthe <glossary>operating system<\/glossary>\nneeds to perform the <glossary>parity<\/glossary>\ncalculations in order to\nreconstruct the data. 
This keeps the <glossary>CPU<\/glossary>\nfrom doing other tasks, and\nperformance is degraded. Because the Adaptec <glossary>RAID<\/glossary>\ncontroller does all of the\nwork of reconstructing the data, the <glossary>CPU<\/glossary>\ndoesn&#8217;t even notice it. In fact, even\nwhile the system is running normally, the <glossary>RAID<\/glossary>\ncontroller is doing the appropriate\ncalculations, so there is no performance loss here either.\n<\/p>\n<p>\nIn addition, the\nAdaptec <glossary>RAID<\/glossary>\ncontrollers can be configured to set the priority of performance\nversus availability. If performance is given the higher priority, it will take\nlonger to restore the data. If availability is given the higher priority,\nperformance suffers. Either is valid, depending on your situation. It is also\npossible to give each the same priority.\n<\/p>\n<p>\nBecause the new drive contains no\ndata, the system must take the time to re-create the data using the <glossary>parity<\/glossary>\ninformation and the data from the other drives. During this time performance will suffer as\nthe system works to restore the data on the failed drive.\n<\/p>\n<p>\nRedundancy like this (and therefore the safety of your data) can be increased\nfurther by having redundant <glossary>RAID<\/glossary>\n5 arrays. For example, you could mirror the entire RAID set. This is often referred to as RAID 51, as it is a\ncombination of <glossary>RAID<\/glossary>\n5 and RAID 1, although RAID 51 was not defined in the\noriginal <glossary>RAID<\/glossary>\npaper. Basically, this is a RAID 5 array which is mirrored. Should a\ndrive fail, not only can the data be recovered from the <glossary>parity<\/glossary>\ninformation, but it can also be copied from its mirror.\nYou might also create a RAID 15 array. This is a RAID 5 array, which is made up of mirror sets.\n<\/p>\n","protected":false},"excerpt":{"rendered":"<p>RAID is an acronym for Redundant Array of Inexpensive Disks. 
Originally, the idea was that you would get better performance and reliability from several, less expensive drives linked together than you would from a single, more expensive drive. The &hellip; <a href=\"http:\/\/www.linux-tutorial.info\/?page_id=285\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-285","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/pages\/285","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=285"}],"version-history":[{"count":1,"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/pages\/285\/revisions"}],"predecessor-version":[{"id":674,"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=\/wp\/v2\/pages\/285\/revisions\/674"}],"wp:attachment":[{"href":"http:\/\/www.linux-tutorial.info\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=285"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}