Chapter 9
Storage and I/O Systems

Storage

Overview

Non-volatile memory can be viewed as part of the memory hierarchy, or even as part of the I/O system. This is because it is invariably connected to the I/O buses and not to the main memory bus, as illustrated in Fig. 4.4. In this case, the main choices to store data are magnetic disks and flash memory.

Magnetic Disks

The purpose of magnetic disks is to serve as non-volatile (persistent) storage. Usually, disks are big, cheap, and slow when compared to flash memory. They reside at the lowest level of the memory hierarchy. This device is based on a rotating disk covered with a magnetic surface, and it uses one read-write head per surface to access information. In fact, disks may have more than one platter, as shown in Figs. 9.1 and 9.2.

Figure 9.1: A sector, cluster, track, and cylinder illustration. Fig. source: http://www.btdersleri.com/ders/Harddiskler

In the past, disks were also used as devices for physical data transportation, e.g., floppy disks (https://en.wikipedia.org/wiki/Floppy_disk).

Figure 9.2: Cylinder-head-sector (CHS) addressing. The arm assembly moves to select a given cylinder. Then, a read-write head is selected (this corresponds to a platter surface). As the spindle rotates, the sector's information is finally accessed by the selected read-write head. Fig. source: https://www.partitionwizard.com/help/what-is-chs.html

Some example track and sector numbers: there are about 5k to 30k tracks per disk surface (each platter has a top and a bottom surface), and 100 to 500 sectors per track. The sector is the smallest unit that can be addressed on a disk, as shown in Fig. 9.3.

Earlier, all tracks had the same number of sectors, so sectors had different physical sizes: the inner sectors (smaller areas) could have the same capacity as the outer sectors (bigger areas) by having a different density arrangement. Currently, disks have tracks with different numbers of sectors to obtain a bigger storage capacity, all platters have the same density, and disks use logical block addressing (LBA) instead of CHS.

Figure 9.3: There are different numbers of sectors in the inner and outer tracks, giving an increased total number of sectors and, finally, a bigger disk capacity, considering platters with the same density.

A cylinder comprises all the concentric tracks under the read-write heads at a given arm position, on all surfaces, i.e., the cylindrical intersection.

The read/write process includes the following steps:
1. seek time – the time to position the arm (as in Fig. 9.2) over the addressed track;
2. rotational latency – the waiting time for the desired sector to rotate under the read-write head; and
3. transfer time – the time taken to transfer a block of bits, i.e., a sector, under the read-write head.

Magnetic Disks Performance

Seek Time

The seek time (the average seek time, AST, as reported by the industry) is generally between 5 and 12 ms. It can be computed as in Eq. (9.1).

AST = (sum of the time for all possible seeks) / (total number of possible seeks)    (9.1)

Due to locality in disk references, the actual average seek time can be only 25% to 33% of the value disclosed by manufacturers.
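To make Eq. (9.1) concrete, here is a minimal sketch that brute-forces the average over all possible track-to-track seeks. The linear seek-time model and its constants are assumptions chosen only for illustration; real drives follow a more complex seek profile.

# A minimal sketch of Eq. (9.1): average seek time (AST) as the mean over all
# possible track-to-track seeks. The seek-time model below is an assumption.

def seek_time_ms(distance_in_tracks: int) -> float:
    """Hypothetical model: a fixed settle time plus a per-track component."""
    settle_ms = 1.0
    per_track_ms = 0.02
    return settle_ms + per_track_ms * distance_in_tracks

def average_seek_time_ms(num_tracks: int) -> float:
    """Eq. (9.1): sum of the time for all possible seeks / total number of seeks."""
    total_ms = 0.0
    count = 0
    for src in range(num_tracks):
        for dst in range(num_tracks):
            if src != dst:  # only count actual arm movements
                total_ms += seek_time_ms(abs(src - dst))
                count += 1
    return total_ms / count

if __name__ == "__main__":
    # 1,000 tracks keeps the brute-force double loop quick;
    # real surfaces have about 5k to 30k tracks.
    print(f"AST ~ {average_seek_time_ms(1_000):.2f} ms")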
Rotational Latency

Disks rotate at about 3,600 to 15,000 RPM, i.e., roughly 16 ms down to 4 ms per revolution. However, this measurement is usually expressed as the average rotational latency (ARL), e.g., from 8 ms down to 2 ms, since on average the desired information is halfway around the disk. This can be computed as in Eqs. (9.2) and (9.3).

ARL = 0.5 × RotationPeriod    (9.2)

RotationPeriod = 60 / x  [seconds]    (9.3)

where x is the rotational speed in RPM. Common values for the disk rotational speed x are 5,400 and 7,200 RPM.

Transfer Time

The transfer time depends on the: transfer size per sector, e.g., 1 KiB, 4 KiB; rotational speed, e.g., 3,600 to 15,000 RPM; and recording density (bits/inch), taking into account the disk diameter, from 1.0 to 3.5 inches, for example. Typical transfer rates are from 3 to 65 MiB/s.

Magnetic Disks Evolution

Magnetic disks evolved over the years. There was an increase in the number of bits per square inch, i.e., disk density; a steep price reduction, from US$ 100,000/GiB (1984) to less than US$ 0.5/GiB (2012); and a considerable increase in rotational speed, from 3,600 RPM (in the '80s) to close to 15,000 RPM (in the 2000s). The latter did not continue to increase due to problems with high rotational speeds.

Eq. (9.4) is used to compute the disk access time (DAT).

DAT = SeekTime + RotationalLatency + TransferTime + ControllerTime + QueuingDelay    (9.4)
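The following minimal sketch puts Eqs. (9.2) to (9.4) together in code. All concrete values in the example call (seek time, sector size, transfer rate, controller overhead) are assumptions picked from the ranges quoted above, not measurements.

# A minimal sketch of Eqs. (9.2)-(9.4). The numbers in the example call
# at the bottom are illustrative assumptions.

def rotation_period_s(rpm: float) -> float:
    """Eq. (9.3): one revolution takes 60 / x seconds for x RPM."""
    return 60.0 / rpm

def average_rotational_latency_s(rpm: float) -> float:
    """Eq. (9.2): on average, the desired sector is half a revolution away."""
    return 0.5 * rotation_period_s(rpm)

def transfer_time_s(sector_bytes: int, transfer_rate_bytes_per_s: float) -> float:
    """Time to move one sector at the sustained transfer rate."""
    return sector_bytes / transfer_rate_bytes_per_s

def disk_access_time_s(seek_s, rotational_latency_s, transfer_s,
                       controller_s=0.0, queuing_s=0.0) -> float:
    """Eq. (9.4): DAT = seek + rotational latency + transfer + controller + queuing."""
    return seek_s + rotational_latency_s + transfer_s + controller_s + queuing_s

if __name__ == "__main__":
    rpm = 7_200                                # common rotational speed
    dat = disk_access_time_s(
        seek_s=0.008,                          # 8 ms average seek (assumed)
        rotational_latency_s=average_rotational_latency_s(rpm),  # ~4.17 ms
        transfer_s=transfer_time_s(4 * 1024, 50 * 1024**2),      # 4 KiB at 50 MiB/s
        controller_s=0.0002,                   # 0.2 ms controller overhead (assumed)
        queuing_s=0.0,                         # no queuing in this example
    )
    print(f"Disk access time ~ {dat * 1e3:.2f} ms")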
RAID Systems

Disks differ from the other levels of the memory hierarchy because they are non-volatile. They are also the lowest level in the hierarchy, i.e., there is no other level in the computer to fetch from if the data is not on the disk (or on another device at the lowest memory hierarchy level, to be more accurate).

Therefore, disks should not fail; however, all hardware fails at some point in time. In this sense, there is the redundant array of independent disks (RAID), originally introduced with "inexpensive" instead of "independent". RAID allows multiple simultaneous accesses since the data is spread over multiple disks. Two basic techniques are used here. The first one is striping, where sequential data is logically allocated on separate disks to increase performance. The second is mirroring, where data is copied to identical disks, i.e., mirrored, to increase information availability.

Considering the main characteristics of a RAID system: latency is not necessarily reduced, availability is enhanced through the addition of redundant disks, and lost information can be rebuilt from redundant data.

Reliability vs. Availability

In RAID, reliability becomes a problem since the system as a whole is less reliable, i.e., more disks bring a greater failure probability. Conversely, availability is increased, i.e., failures do not necessarily lead to unavailability.

Standard Levels Summary

Table 9.1 presents a summary of the RAID levels. Fig. 9.4 illustrates the RAID 3, RAID 4, and RAID 5 systems.

Table 9.1: RAID standard levels summary.

RAID 0 – This level does not offer redundancy, but it is more efficient. It does not recover from failures. RAID 0 has striped/interleaved volumes.
RAID 1 – This level is redundant and is able to recover from one failure. It uses twice as many disks as RAID 0. RAID 1 has mirrored/copied volumes.
RAID 2 – This level applies memory-style error-correcting codes (ECC) to disks. No extensive commercial use.
RAID 3 – This level has bit-interleaved parity. One parity/check disk for multiple data disks; able to recover from one failure. Illustrated in Fig. 9.4.
RAID 4 – This level has block-interleaved parity. One check disk for multiple data disks; able to recover from one failure. Illustrated in Fig. 9.4.
RAID 5 – This level has distributed block-interleaved parity. It is able to recover from one failure. Illustrated in Fig. 9.4.
RAID 6 – This level is a RAID 5 extension with a second parity block. It is able to recover from double failures. Illustrated in Fig. 9.5.

Figure 9.4: RAID examples. In RAID 3 (bit-interleaved parity), there is one disk (Disk 3, in this example) specifically designated to be the check disk for multiple disks (the other 3 disks, as illustrated here). In RAID 4 (block-interleaved parity), there is also one disk dedicated to the parity information (Disk 3). Finally, in RAID 5, there is no specific parity disk, as this is distributed block-interleaved parity, i.e., parity information is distributed among the disks in the system. Fig. source: https://en.wikipedia.org/wiki/Standard_RAID_levels

Example

A simple example. Let's consider two drives with data in a 3-drive RAID 5 array:

Data from D1 = 1001 1001 (drive 1)
Data from D2 = 1000 1100 (drive 2)

The Boolean XOR function is used to compute the parity of D1 and D2: parity P = 0001 0101, which is written to drive 3.

Should any of the 3 drives fail, the contents can be restored using the same XOR function. If drive 1 fails, the D1 contents can be restored by the following procedure:

  D2 = 1000 1100
   P = 0001 0101
  -------------- XOR
  D1 = 1001 1001
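Here is a minimal sketch of the XOR parity computation and single-failure recovery from this example, using the same D1 and D2 bit patterns; the byte-wise representation and helper names are only illustrative.

# A minimal sketch of RAID 5-style XOR parity for the example above.
# D1 and D2 are the bit patterns used in the text; everything else
# (byte representation, function names) is an illustrative assumption.

def xor_parity(*stripes: bytes) -> bytes:
    """Parity block: byte-wise XOR of all given stripes."""
    parity = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, byte in enumerate(stripe):
            parity[i] ^= byte
    return bytes(parity)

def rebuild_lost_stripe(parity: bytes, *surviving: bytes) -> bytes:
    """XOR of the parity with the surviving stripes restores the lost one."""
    return xor_parity(parity, *surviving)

if __name__ == "__main__":
    d1 = bytes([0b1001_1001])           # data on drive 1
    d2 = bytes([0b1000_1100])           # data on drive 2
    p = xor_parity(d1, d2)              # parity written to drive 3
    assert p == bytes([0b0001_0101])    # 0001 0101, as in the text

    recovered_d1 = rebuild_lost_stripe(p, d2)   # drive 1 failed
    assert recovered_d1 == d1
    print(f"parity = {p[0]:08b}, recovered D1 = {recovered_d1[0]:08b}")

The same property generalizes to any number of data disks: the parity is the XOR of all data stripes, and any single lost stripe equals the XOR of the parity with the surviving stripes.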
RAID 6 Details

In RAID 6, there is row-diagonal parity. In this scheme, each diagonal does not cover one of the disks, i.e., one disk is left out of each diagonal. Even if two disks fail, it is still possible to recover a block; and once one block has been recovered, another one can be recovered through the row parity, as in RAID 4. Finally, this scheme needs just p − 1 diagonals to protect p disks. Fig. 9.5 illustrates this concept.

Figure 9.5: RAID 6 (p = 5); p + 1 disks in total; p − 1 disks have data. The row parity disk is just like in RAID 4. Each block of the diagonal parity disk contains the parity of the blocks in the same diagonal.

How RAID 6 works, an example. Let's consider Fig. 9.5 and assume that data disks 1 and 3 fail. Standard RAID recovery using the row parity alone does not help here, because each row is missing two data blocks, one from disk 1 and one from disk 3. The way around this is to start from the diagonal parity: since each diagonal misses one disk, and every diagonal misses a different disk, there are two diagonals that are missing only one block. Diagonal 0 does not involve disk 1, so its block on disk 3 can be recovered from the diagonal parity; likewise, diagonal 2 does not involve disk 3, so its block on disk 1 can be recovered. Once the data in these blocks is recovered, the standard RAID recovery approach can be used to recover two more blocks using the row parity, as in RAID 4. This, in turn, allows two more diagonals to be recovered. The loop continues until all the blocks are finally recovered.

Flash Memory

Flash memory technology is similar to the traditional EEPROM (electrically erasable programmable read-only memory), but with higher memory capacity per chip and low power consumption. The read access time is slower than DRAM, but much faster than disks: a 256-byte transfer from flash would take around 6.5 µs, and about 1,000 times longer on disks, based on 2010 numbers. Regarding the writing process, DRAM can be 10 to 100 times faster. Stores in flash require the deletion of data: first, a memory block is erased, and then the new data is written.

NOR- and NAND-based Flash Memories

The first flash memories, NOR, were a direct competitor of the traditional EEPROM. They were randomly addressable and typically used in the computer's basic input/output system (BIOS). After a while, NAND flash memories emerged. They offered higher storage density, but can only be read in blocks, as this eliminates the wiring required for random accesses. NAND is much cheaper per gigabyte and much more common than NOR flash. In 2010, the price was $2/GiB for flash, $40/GiB for SDRAM, and $0.09/GiB for disks. In 2016, it was $0.3/GiB for flash, $7/GiB for SDRAM, and $0.06/GiB for disks.

There is a wear-out of flash with respect to writes, limited to about 100 thousand to one million write/erase cycles, depending on the manufacturer. The memory's life can be extended through the uniform distribution of writes across blocks, i.e., wear leveling (see the sketch at the end of this section).

Floppy disks are now extinct, and so are hard drives in mobile systems, thanks to solid-state disks (SSD).

Clusters (I/O Servers) Evaluation

Overview

Next, the performance, cost, and dependability of a system designed to provide high I/O performance are evaluated.
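To illustrate the wear-leveling remark from the flash discussion above, here is a minimal sketch assuming a toy flash translation layer with a greedy "write to the least-erased block" policy; real controllers use considerably more elaborate schemes.

# A minimal sketch of the wear-leveling idea: spread writes uniformly over
# physical blocks so that no single block wears out early. The greedy policy
# and the data structures below are illustrative assumptions.

class WearLevelingFTL:
    """Toy flash translation layer mapping logical blocks to physical blocks."""

    def __init__(self, num_physical_blocks: int):
        self.erase_counts = [0] * num_physical_blocks
        self.logical_to_physical = {}

    def write(self, logical_block: int) -> int:
        """Redirect the write to the least-erased physical block (erase, then program)."""
        victim = min(range(len(self.erase_counts)), key=self.erase_counts.__getitem__)
        self.erase_counts[victim] += 1          # each write costs one erase here
        self.logical_to_physical[logical_block] = victim
        return victim

if __name__ == "__main__":
    ftl = WearLevelingFTL(num_physical_blocks=8)
    for i in range(10_000):
        ftl.write(logical_block=i % 4)          # a "hot" set of only 4 logical blocks
    # Erase counts stay balanced even though only 4 logical blocks are ever written.
    print(ftl.erase_counts)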