CSC-25 High Performance Architectures
Lecture Notes – Chapter IX-A: Storage & I/O Systems
Denis Loubach <[email protected]>
Department of Computer Systems, Computer Science Division – IEC
Aeronautics Institute of Technology – ITA
1st semester, 2024

Detailed Contents
- Storage: Overview; Magnetic Disks; Magnetic Disks Performance; Magnetic Disks Evolution; RAID Systems; Flash Memory
- I/O Servers - Clusters: Overview; Data and Assumptions; Performance Evaluation; Cost; Dependability
- References

Outline
- Storage
- I/O Servers - Clusters
- References

Storage

Overview
Non-volatile memory can be viewed as part of the memory hierarchy, or even as part of the I/O system, because it is invariably connected to the I/O buses rather than to the main memory bus.
How to store?
- magnetic disk
- flash memory

Magnetic Disks
Purpose:
- non-volatile storage
- big, cheap, and slow (when compared to flash memory)
- lowest level in the memory hierarchy
Based on a rotating disk covered with a magnetic surface; a read/write head per surface is used to access information. (Illustration from http://www.btdersleri.com/ders/Harddiskler)
In fact, disks may have more than one platter.
Also used in the "remote past" as devices for physical data transport, e.g., floppy disks.

Cylinder-head-sector (CHS) addressing. (Illustration from https://www.partitionwizard.com/help/what-is-chs.html)
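A CHS triple identifies a physical location on the disk; modern disks expose a linear logical block address (LBA) instead, as discussed next. The standard CHS-to-LBA conversion can be sketched in Python (this conversion and the geometry values below are illustrative additions, not figures from the notes):

```python
def chs_to_lba(c: int, h: int, s: int, heads_per_cyl: int, sectors_per_track: int) -> int:
    """Standard CHS -> LBA mapping; sectors are 1-based in CHS addressing."""
    return (c * heads_per_cyl + h) * sectors_per_track + (s - 1)

# Illustrative geometry: 16 heads per cylinder, 63 sectors per track
HPC, SPT = 16, 63
print(chs_to_lba(0, 0, 1, HPC, SPT))   # first sector -> LBA 0
print(chs_to_lba(2, 5, 9, HPC, SPT))   # (2*16 + 5)*63 + 8 = 2339
```

Note that this mapping assumes every track has the same number of sectors, which is precisely the assumption modern zoned disks break.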
Magnetic Disks (cont.)
Example of track and sector numbers:
- 5k ~ 30k tracks per surface, i.e., top and bottom
- 100 ~ 500 sectors per track
- the sector is the smallest unit that can be addressed
Formerly, all tracks had the same number of sectors; sectors therefore had different physical sizes.
Currently, disks have tracks with different numbers of sectors, to obtain disks with bigger storage capacity:
- all platters have the same density, so there are fewer sectors in the inner tracks; this gives an increased total number of sectors and, finally, a bigger disk capacity
- logical block addressing (LBA) is used instead of CHS

Cylinder: all the concentric tracks under the r/w heads at a given arm position, on all surfaces, i.e., a cylindrical intersection.
Read/write process steps:
1. seek time – position the arm over the proper track
2. rotational latency – wait for the desired sector to rotate under the r/w head
3. transfer time – transfer a block of bits, i.e., a sector, under the r/w head

Magnetic Disks Performance
Seek time (average seek time as reported by the industry): between 5 and 12 ms

AST = (sum of the time for all possible seeks) / (total number of possible seeks)    (1)

Due to locality of disk references, the actual seek time can be only 25 to 33% of the figure disclosed by manufacturers.

Rotational latency:
- 3,600 to 15,000 RPM, i.e., 16 ms down to 4 ms per revolution
- average rotational latency (ARL) of 8 ms down to 2 ms, i.e., on average the desired information is halfway around the disk
- common values are 5,400 and 7,200 RPM

ARL = 0.5 × RotationPeriod    (2)
RotationPeriod = 60 / x [RPM]  [seconds]    (3)
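Equations (2) and (3) are easy to check numerically; a minimal Python sketch over the RPM values quoted above:

```python
def rotation_period_s(rpm: float) -> float:
    """Equation (3): seconds per revolution."""
    return 60.0 / rpm

def avg_rotational_latency_ms(rpm: float) -> float:
    """Equation (2): on average, half a revolution."""
    return 0.5 * rotation_period_s(rpm) * 1000.0

for rpm in (3600, 5400, 7200, 15000):
    print(rpm, round(avg_rotational_latency_ms(rpm), 2))
# 3600 -> 8.33 ms, 5400 -> 5.56 ms, 7200 -> 4.17 ms, 15000 -> 2.0 ms
```

This reproduces the 16 ms to 4 ms revolution range and the 8 ms to 2 ms ARL range quoted on the slide.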
Magnetic Disks Performance (cont.)
Transfer time depends on:
- transfer size per sector, e.g., 1 KiB, 4 KiB
- rotation speed, e.g., 3,600 to 15,000 RPM
- recording density (bits/inch)
- disk diameter: 1.0 to 3.5 inches
- typical transfer rate: 3 to 65 MiB/s

Magnetic Disks Evolution
- increase in the number of bits per square inch, i.e., density
- a steep price reduction, from US$ 100,000/GB (1984) to less than US$ 0.5/GB (2012)
- considerable increase in RPM, from 3,600 RPM (1980s) to close to 15,000 RPM (2000s); it did not continue to increase due to problems at high rotation speeds

Disk access time (DAT):

DAT = SeekTime + RotationalLatency + TransferTime + ControllerTime + QueuingDelay    (4)

RAID Systems
Disks differ from the other levels of the memory hierarchy because they are non-volatile. They are also the lowest level: there is no lower level to fetch from if the data is not on the disk. Therefore, disks should not fail; yet all hardware fails.

Redundant array of independent disks, RAID (originally introduced as "inexpensive"):
- multiple simultaneous accesses
- data are spread over multiple disks
- striping: sequential data is logically allocated on separate disks to increase performance
- mirroring: data is copied to identical disks, i.e., mirrored, to increase availability
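Equation (4) can be combined with equations (2) and (3) into a single access-time estimate. A sketch, with illustrative drive parameters (a 7,200 RPM drive with 8.5 ms seek, 50 MiB/s media rate, and 0.1 ms controller overhead; these are assumptions, not figures from this slide):

```python
def disk_access_time_ms(seek_ms, rpm, block_kib, transfer_mib_s,
                        controller_ms=0.0, queue_ms=0.0):
    """Equation (4): DAT = seek + rotational latency + transfer + controller + queuing."""
    rotational_ms = 0.5 * (60.0 / rpm) * 1000.0                 # equation (2)
    transfer_ms = block_kib / (transfer_mib_s * 1024.0) * 1000.0
    return seek_ms + rotational_ms + transfer_ms + controller_ms + queue_ms

# Illustrative 7200 RPM drive, 16 KiB block
print(round(disk_access_time_ms(8.5, 7200, 16, 50, controller_ms=0.1), 1))  # 13.1 ms
```

Note how seek time and rotational latency dominate: the 16 KiB transfer itself contributes only about 0.3 ms.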
RAID Systems (cont.)
Main characteristics:
- latency is not necessarily reduced
- availability is enhanced through the addition of redundant disks
- lost information can be rebuilt from redundant information

Reliability vs availability:
- reliability decreases: more disks means a greater probability of failure
- availability increases: failures do not necessarily lead to unavailability

RAID standard levels summary:
- RAID 0 – not redundant, but more efficient; does not recover from failures; striped/interleaved volumes
- RAID 1 – redundant and able to recover from one failure; uses twice as many disks as RAID 0; mirror/copy volume
- RAID 2 – applies memory-style error-correcting codes (ECC) to disks; no commercial use
- RAID 3 – bit-interleaved parity; one parity/check disk for multiple data disks; able to recover from one failure
- RAID 4 – block-interleaved parity; one check disk for multiple data disks; able to recover from one failure
- RAID 5 – distributed block-interleaved parity; able to recover from one failure
- RAID 6 – RAID 5 extension with another parity block; able to recover from double faults

(Illustrations from https://en.wikipedia.org/wiki/Standard_RAID_levels)
RAID Systems (cont.)
Worked RAID 5 parity example. Consider two data drives in a 3-drive RAID 5 array:
- data on D1 = 1001 1001 (drive 1)
- data on D2 = 1000 1100 (drive 2)
The Boolean XOR function is used to compute the parity of D1 and D2:
P = D1 XOR D2 = 0001 0101, which is written to drive 3.
Should any of the 3 drives fail, its contents can be restored using the same XOR function. If drive 1 fails, D1 is restored by:
D2 XOR P = 1000 1100 XOR 0001 0101 = 1001 1001 = D1

RAID 6 details – row-diagonal parity:
- each diagonal does not cover (i.e., leaves out) one disk
- even if two disks fail, it is possible to recover a block
- with one block recovered, the second one can be recovered through the row
- needs just p − 1 diagonals to protect p disks
Example: RAID 6 with p = 5; p + 1 disks in total; p − 1 disks hold data. The row parity disk is just like in RAID 4. Each block of the diagonal parity disk contains the parity of the blocks in the same diagonal.

Flash Memory
Technology similar to traditional EEPROM (electrically erasable programmable read-only memory), with higher memory capacity per chip. Low power consumption.
Read access time is slower than DRAM, but much faster than disks:
- a 256-byte transfer from flash took around 6.5 µs, and around 1000× more on disks (2010)
- wrt writing, DRAM can be 10 to 100× faster
Writes require "deletion" of data: a memory block is first erased and then the new data is written, i.e., erase-before-write.
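The XOR arithmetic in the worked example above is easy to verify; a minimal Python check (a real array adds stripe layout and block management on top of this):

```python
# RAID 5 parity over the two 8-bit values from the worked example
d1 = 0b1001_1001   # drive 1
d2 = 0b1000_1100   # drive 2

p = d1 ^ d2                       # parity, written to the third drive
print(f"{p:08b}")                 # 00010101

# Simulate losing drive 1: XOR of the survivors rebuilds its contents
recovered_d1 = d2 ^ p
assert recovered_d1 == d1
```

The same property (any one value is the XOR of all the others) is what lets RAID 5 survive a single-drive failure regardless of which drive is lost.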
Flash Memory (cont.)
NOR- and NAND-based flash memories. The first flash memories, NOR, were direct competitors of the traditional EEPROM:
- randomly addressable
- typically used to store the BIOS
Later, NAND flash memories emerged:
- offering higher storage density
- but readable only in blocks, as this eliminates the wiring required for random access
- much cheaper per gigabyte and much more common than NOR flash

Prices:
- 2010: $2/GB for flash; $40/GB for SDRAM; $0.09/GB for disks
- 2016: $0.3/GB for flash; $7/GB for SDRAM; $0.06/GB for disks
Flash wears out with writes, limited to between 100K and 1M write cycles. The life cycle can be extended by distributing writes uniformly across blocks (wear leveling).
Floppy disks became extinct, and so did hard drives in mobile systems, thanks to solid-state disks (SSD).

I/O Servers - Clusters

Overview
Goal: evaluate the performance, cost, and dependability of a system designed to provide high I/O performance.
Reference: Rack VME T-80
- used in the Internet Archive, a project started in 1996 that aims to preserve the historical record of the Internet over time
Typical cluster building blocks:
- servers, e.g., storage nodes
- Ethernet switch
- rack
Data and Assumptions
Rack VME T-80, Capricornian Systems (data from 2006).
Storage node – PetaBox GB2000:
- 4× 500 GB parallel advanced technology attachment (PATA) disk drives
- 512 MB of DDR266 DRAM
- 1× 10/100/1000 Ethernet interface
- 1 GHz C3 processor from VIA, 80x86 instruction set
- dissipates ≈ 80 W in typical configurations
40× GB2000s fit in a standard VME rack, giving a total of 80 TB of raw capacity. Nodes are connected together with a 48-port 10/100/1000 switch dissipating ≈ 3 kW; the limit is usually 10 kW per rack.

Costs and parameters:
- $500 buys the processor (performance of 1,000 MIPS, millions of instructions per second), DRAM, ATA disk controller, power supply, fans, and enclosure
- $375 × 4 buys the 7,200 RPM PATA drives; each holds 500 GB, has an average seek time of 8.5 ms, and transfers at 50 MB/s from the disk; the PATA link speed is 133 MB/s
- $3,000 buys the 48-port 10/100/1000 Ethernet switch and all cables for a rack
- the ATA controller adds 0.1 ms of overhead to perform a disk I/O
- the operating system uses 50,000 CPU instructions for a disk I/O
- the network protocol stacks use 100,000 CPU instructions to transmit a data block between the cluster and the external world
- the average I/O size is 16 KB for accesses to the historical record, and 50 KB when collecting a new snapshot
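From the figures above, the rack's capacity and power draw can be sanity-checked against the stated 10 kW per-rack limit (a quick back-of-the-envelope script, not from the slides):

```python
nodes = 40
node_w = 80          # ~80 W per GB2000 node in typical configurations
switch_w = 3000      # ~3 kW for the 48-port switch

rack_w = nodes * node_w + switch_w
print(rack_w)        # 6200 W, comfortably under the usual 10 kW per-rack limit

# Raw capacity: 40 nodes x 4 drives x 500 GB
print(nodes * 4 * 500)   # 80000 GB = 80 TB
```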
Performance Evaluation
Evaluate the cost per I/O per second (IOPS) of the 80 TB rack, assuming that:
- every disk I/O requires an average seek and an average rotational delay
- the workload is evenly divided among all disks
- all devices can be used at 100% of capacity
- the system is limited only by the weakest link, and it can operate that link at 100% utilization
Calculate for both average I/O sizes, i.e., 16 and 50 KiB.

I/O performance is limited by the weakest link in the chain, so evaluate the maximum performance of each link of the I/O chain:
1. CPU, main memory, and I/O bus of one GB2000
2. ATA controllers, disks
3. network switch

CPU IOPS_MAX = 1,000 MIPS / (50,000 instructions per I/O + 100,000 instructions per message) = 6,667

CPU I/O performance is determined by the CPU speed and by the number of instructions needed to perform a disk I/O and to send it over the network.

MainMemory IOPS_MAX = (266 × 8) / (16 KiB per I/O) ≈ 133,000
MainMemory IOPS_MAX = (266 × 8) / (50 KiB per I/O) ≈ 42,500

The maximum performance of the memory system is determined by the memory bandwidth and the size of the I/O transfers.
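The first two links can be checked with a few lines of Python (note the slide works in decimal KB/MB for these figures):

```python
MIPS = 1_000                       # VIA C3, assumed to deliver 1,000 MIPS
instr_per_io = 50_000 + 100_000    # OS disk I/O + network protocol stack

cpu_iops = MIPS * 1_000_000 / instr_per_io
print(round(cpu_iops))             # 6667

mem_bw = 266_000_000 * 8           # DDR266: 266 MT/s x 8-byte bus, in bytes/s
for io_bytes in (16_000, 50_000):  # decimal 16 KB and 50 KB transfers
    print(mem_bw // io_bytes)      # 133000, and 42560 (~42,500 on the slide)
```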
Performance Evaluation (cont.)

IOBus IOPS_MAX = (133 MiB/s) / (16 KiB per I/O) ≈ 8,300
IOBus IOPS_MAX = (133 MiB/s) / (50 KiB per I/O) ≈ 2,700

PATA link performance is limited by the link bandwidth and the size of the I/O transfers. Considering that each storage node has two buses, the I/O bus limits the maximum performance to ≤ 16,600 IOPS for 16 KiB blocks and ≤ 5,400 IOPS for 50 KiB blocks.

And now, the next link in the I/O chain, i.e., the ATA controllers:

PATA TransferTime = 16 KiB / (133 MiB/s) ≈ 0.1 ms
PATA TransferTime = 50 KiB / (133 MiB/s) ≈ 0.4 ms

ATA IOPS_MAX = 1 / (0.1 ms + 0.1 ms controller overhead) = 5,000
ATA IOPS_MAX = 1 / (0.4 ms + 0.1 ms controller overhead) = 2,000

Disks:

IOTime = 8.5 ms + (0.5 × 60) / (7200 RPM) + 16 KiB / (50 MiB/s) ≈ 13.0 ms
IOTime = 8.5 ms + (0.5 × 60) / (7200 RPM) + 50 KiB / (50 MiB/s) ≈ 13.7 ms

Disk IOPS_MAX = 1 / 13.0 ms ≈ 77
Disk IOPS_MAX = 1 / 13.7 ms ≈ 73

Or 292 ≤ Disk IOPS_MAX ≤ 308 considering the 4 disks.

The final link in the chain is the network connecting the computers to the outside world:

Ethernet IOPS_MAX per 1,000 Mbit = (1,000 Mbit) / (16 KiB × 8) = 7,812
Ethernet IOPS_MAX per 1,000 Mbit = (1,000 Mbit) / (50 KiB × 8) = 2,500

What is the performance bottleneck of the storage node? Clearly, the disks:

Rack IOPS = 40 × 308 = 12,320 (16 KiB)
Rack IOPS = 40 × 292 = 11,680 (50 KiB)

The network switch would be the bottleneck if it could not support:
- 12,320 × 16 KiB × 8 = 1.6 Gbit/s for 16 KiB blocks, and
- 11,680 × 50 KiB × 8 = 4.7 Gbit/s for 50 KiB blocks
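The per-link calculations above can be reproduced in a short script. It follows the slide's arithmetic: decimal KB/MB for the link and media rates, and transfer times rounded to 0.1/0.4 ms before inverting, as the slide does:

```python
def io_bus_iops(io_kb, link_mb_s=133):
    """I/O bus limit: link bandwidth over block size."""
    return link_mb_s * 1e6 / (io_kb * 1e3)

def ata_iops(io_kb, link_mb_s=133, overhead_ms=0.1):
    """ATA controller limit: rounded transfer time plus controller overhead."""
    transfer_ms = round(io_kb * 1e3 / (link_mb_s * 1e6) * 1e3, 1)
    return 1e3 / (transfer_ms + overhead_ms)

def disk_iops(io_kb, seek_ms=8.5, rpm=7200, media_mb_s=50):
    """Disk limit: average seek + average rotational latency + media transfer."""
    io_ms = seek_ms + 0.5 * 60 / rpm * 1e3 + io_kb * 1e3 / (media_mb_s * 1e6) * 1e3
    return 1e3 / io_ms

for io_kb in (16, 50):
    print(io_kb, round(io_bus_iops(io_kb)), round(ata_iops(io_kb)), round(disk_iops(io_kb)))

# Disks are the weakest link: 4 drives per node x 40 nodes bound the rack
print(round(disk_iops(16)) * 4 * 40)   # 12,320 IOPS for 16 KB I/Os
```

Running it confirms the chain: the disks (77 and 73 IOPS per drive) sit far below the bus, controller, and network limits, so they set the rack's throughput.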
Cost

Rack$ = 40 × ($500 + 4 × $375) + $3,000 + $1,500 (rack) = $84,500

Statistics:
- disks represent almost 70% of the total cost (40 × 4 × $375 = $60,000)
- cost per terabyte is almost $1,000
- about a factor of 10 to 15 better than the storage cluster from the prior edition of Hennessy and Patterson's Computer Architecture: A Quantitative Approach, in 2001
- cost per IOPS is about $7

Dependability
- dependability is a measure of the accomplishment of faultless service; quantified by the mean time to failure (MTTF)
- availability is a measure of service accomplishment without interruptions; repair time is quantified by the mean time to repair (MTTR)

Availability = MTTF / (MTTF + MTTR)    (5)

Mean time between failures (MTBF):
MTBF = MTTF + MTTR    (6)

FailureRate = 1 / MTTF    (7)

Resulting mean time to fail of the rack. Assumptions wrt MTTF:
1. 40× CPU/memory/enclosure = 1,000,000 h
2. 40 × 4 PATA disks = 125,000 h
3. 40× PATA controller = 500,000 h
4. 1× Ethernet switch = 500,000 h
5. 40× power supply = 200,000 h
6. 40× fan = 200,000 h
7.
40 × 2 PATA cables = 1,000,000 h (one cable per 2 disks)

FailureRate = 40/(1 × 10^6) + 160/(125 × 10^3) + (40 + 1)/(500 × 10^3) + (40 + 40)/(200 × 10^3) + 80/(1 × 10^6) = 1882/(1 × 10^6) failures per hour

MTTF = 1/FailureRate = (1 × 10^6)/1882 ≈ 531 h (22 days, 3 hours)

Information to the reader
Lecture notes mainly based on the following references:
- Castro, Paulo André. Notas de Aula da disciplina CES-25 Arquiteturas para Alto Desempenho [Lecture notes of the CES-25 Architectures for High Performance course]. ITA, 2018.
- Hennessy, J. L. and D. A. Patterson. Computer Architecture: A Quantitative Approach. 6th ed. Morgan Kaufmann, 2017.
- Patterson, D. and S. Kong. Lecture notes, CS152 Computer Architecture and Engineering, Lecture 19: I/O Systems. Online, 1995.
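The rack failure-rate arithmetic in the dependability section can be verified with a short script (component counts and MTTF figures are those listed in the assumptions):

```python
# (component, count in the rack, per-unit MTTF in hours)
components = [
    ("CPU/memory/enclosure", 40, 1_000_000),
    ("PATA disk",           160,   125_000),
    ("PATA controller",      40,   500_000),
    ("Ethernet switch",       1,   500_000),
    ("power supply",         40,   200_000),
    ("fan",                  40,   200_000),
    ("PATA cable",           80, 1_000_000),
]

# Failure rates of independent components add (equation (7) per component)
failure_rate = sum(n / mttf for _, n, mttf in components)   # failures per hour
print(round(failure_rate * 1e6))        # 1882 failures per million hours

rack_mttf_h = 1 / failure_rate
print(round(rack_mttf_h))               # ~531 hours, about 22 days
```

Note how the 160 disks dominate: they contribute 1,280 of the 1,882 failures per million hours, which is why RAID-style redundancy matters at this scale.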