Full Transcript

Singapore
CSC1104 - COMPUTER ORGANISATION & ARCHITECTURE
LECTURE 1: COMPUTER EVOLUTION AND PERFORMANCE
Associate Professor Cao Qi, [email protected]
School of Computing Science

Information
▪ Module Lead: Dr Cao Qi. Email: [email protected]. Webpage: https://www.gla.ac.uk/schools/computing/staff/qicao/
▪ Technical Officer: Mr. Vincent Ng Chiew Guan. Email: [email protected]

Acknowledgement
Main contents of CSC1104 - Computer Organisation and Architecture are derived from:
▪ Computer Organization and Architecture: Designing for Performance. Author: William Stallings. Publisher: Pearson. Acknowledgement to: Author and Publisher.
▪ Computer Organization and Design: The Hardware/Software Interface. Authors: D. Patterson and J. Hennessy. Publisher: Morgan Kaufmann. Acknowledgement to: Authors and Publisher.

Course Objectives
▪ To present the nature and characteristics of modern-day computer systems, including their structure and function.
▪ To provide a thorough discussion of the fundamentals of computer organization and architecture.
▪ To learn programming on IoT devices.

Course Venues
▪ Lectures: Monday, 11:00 am – 1:00 pm, by Zoom.
▪ Tutorials: Tuesday, 9:00 am – 10:00 am, E2-02-14-Lectorial 10.
▪ Group Labs, E2-04-02-SR232:
  ▪ Group P1: Tuesday, 10:00 – 11:00 am
  ▪ Group P2: Tuesday, 11:00 am – 12:00 pm
  ▪ Group P3: Tuesday, 12:00 – 1:00 pm

Assessments
▪ Quiz 1: worth 30% of your overall marks.
▪ Class test: worth 5% of your overall marks (pop-up test at random timing).
▪ Project assignment: worth 25% of your overall marks.
▪ Weekly group class discussions: worth 5% of your overall marks.
▪ Final exam: worth 35% of your overall marks.

Course Schedule
▪ Week 1: Computer Evolution and Performance
▪ Week 2: Computer Function, Cache Memory
▪ Week 3: Internal Memory and External Memory
▪ Week 4: Input/Output, Operating System
▪ Week 5: Number Systems and Computer Arithmetic
▪ Week 6: Assembly Language
▪ Week 7: Break
▪ Week 8: Instruction Set: Characteristics and Data Types. Quiz 1 (Weeks 1-6), 30%
▪ Week 9: Instruction Set: Types of Operations
▪ Week 10: Transfer of Control and Addressing Modes
▪ Week 11: Instruction Pipelining; CISC and RISC
▪ Week 12: Parallelism; Superscalar; Multicore. Project assignment due, 25%
▪ Week 13: Revisions. Final exam in Week 14, 35%

Books and References
▪ Computer Organization and Architecture: Designing for Performance, William Stallings, Pearson, 10th edition, 2016.
▪ Computer Organization and Design: The Hardware/Software Interface, David A. Patterson and John L. Hennessy, Morgan Kaufmann, 5th edition, 2014.

Lecture Contents
▪ Brief History of Computers:
  ▪ Classes of Computers
  ▪ Four Generations of Computers
▪ Evolution of Processors, CISC and RISC:
  ▪ Complex instruction set computers
  ▪ Reduced instruction set computers
  ▪ CISC Processors and RISC Processors
▪ Computer Architecture and Organization
▪ Processor Performance:
  ▪ Measures of Performance

Classes of Computers and Their Applications
▪ Servers (supercomputers): many processors, large memory, high cost; usually used by big organizations (e.g., Google).
▪ Personal computers (PCs): for single users at low cost (e.g., a student laptop).
▪ Embedded computers: the largest class of computers (e.g., IoT sensors, smart watches); they have their own processors and memory too.
▪ Smart mobile devices: as powerful as PCs (e.g., a smartphone, with multiple CPUs and large storage space).

History - Generations of Computers
▪ Over the years, computers have gained more and more powerful capabilities while their physical dimensions have shrunk.
▪ Integrated circuits and later generations changed: processing power, memory capacity, dimensions, complexity, control units.
▪ System software: loads programs, moves data from/to peripherals, performs common computations (calculation tasks).
▪ Software + hardware = powerful computer system.

First Generation: Vacuum Tubes
▪ Vacuum tubes, which control the flow of electrical current, were used for digital logic elements and memory.
▪ Large physical dimensions.
▪ First computer: COLOSSUS, 1943–44.
▪ If a vacuum tube failed, it took a very long time to identify which one had to be changed.
▪ First general-purpose computer: ENIAC, 1943–46.
[Figures: computer COLOSSUS; computer ENIAC tube change]

Von Neumann Architecture
▪ Von Neumann architecture, by John von Neumann in 1945.
▪ Stored-program computer: instructions and data (the program) are stored in the same memory.
▪ Main components: memory unit, arithmetic logic unit (ALU), control unit, input–output (I/O).
[Figure: John von Neumann with the stored-program IAS computer]

Second Generation: Transistors
▪ Transistor: solid-state silicon device, more reliable than vacuum tubes.
▪ Invented at Bell Labs in 1947.
▪ Fully transistorized computers became available in the late 1950s.
▪ A transistor can be switched on and off, which controls whether it acts as a 1 or a 0.

Third Generation: Integrated Circuits (IC)
▪ Semiconductor technology.
▪ Microelectronics era: invention of the IC in 1958.
[Figure: a 12-inch (300 mm) wafer of Intel Core i7 (courtesy Intel)]

Later Generation: Large-Scale Integration (LSI)
▪ LSI (large): 1,000 – 10,000 components per chip.
▪ VLSI (very large): 10,000 – 1 million components per chip.
▪ ULSI (ultra large): > 1 million components per chip.
▪ Construction of processors (control unit, arithmetic and logic unit).
▪ Construction of memory chips (storage density, cost per bit, access time).

Decimal and Binary Notations for Size Terms
▪ Bit: one or zero.
▪ The $2^X$ vs. $10^Y$ bytes ambiguity was resolved by adding a binary notation.
▪ Binary: 1 Ki = $2^{10}$ (1,024), 1 Mi = $2^{20}$ (1,048,576), 1 Gi = $2^{30}$ (1,073,741,824).
▪ Decimal: 1 K = $10^{3}$ (1,000), 1 M = $10^{6}$ (1,000,000), 1 G = $10^{9}$ (1,000,000,000).

CISC vs. RISC
CISC and RISC differ in their instruction sets.
▪ Complex Instruction Set Computer (CISC) processor architecture:
  ❑ Larger set of more complex instructions, where each instruction can perform multiple operations.
  ❑ Multiple cycles: these complex instructions might take several clock cycles to execute.
  ❑ Fewer instructions: because the instructions are more powerful, the computer might need fewer instructions to complete a task, i.e., fewer lines of assembly.
  ❑ Example processor: Intel x86 architecture.
▪ Reduced Instruction Set Computer (RISC) processor architecture:
  ❑ Completes a task using a small and highly optimized set of instructions.
  ❑ Smaller set of simple instructions, each designed to execute very quickly.
  ❑ One instruction per cycle: each instruction is designed to be completed in a single clock cycle, making processing fast.
  ❑ More instructions: since the instructions are simpler, the computer might need to execute more of them to complete a task.
  ❑ Example processor: ARM architecture.

Example - Difference of CISC and RISC
Multiplying two numbers stored in memory.
▪ The execution unit can only operate on data that has been loaded into registers.
▪ Task: one number is stored at memory location (2:4), the other at (3:2). Calculate their product, then store the product back to (2:4).
▪ CISC: a specific instruction ("MULT") is used. The entire task of multiplying the 2 numbers can be completed with one instruction:
    MULT 2:4, 3:2
▪ RISC: uses simple instructions, each executed within 1 clock cycle. "MULT" is divided into 3 commands: "LOAD", "MUL", "STORE". Four lines of code are needed:
    LOAD A, 2:4
    LOAD B, 3:2
    MUL A, B
    STORE 2:4, A
[Figure: memory laid out as a grid addressed by (row:column), registers A to F, and an execution unit (+, -, ×, /)]

Intel x86 Architecture – CISC Processors
▪ An instruction set architecture (ISA) for microprocessor-based computing.
▪ Programs written for older versions can execute on newer versions.
▪ All changes have involved additions to the instruction set.
▪ Over 500 instructions in the instruction set.
▪ x86 represents the design effort on CISC.

ARM Architecture – RISC Processors
▪ ARM: RISC-based microprocessors by ARM Holdings.
▪ Designed as high-performance, low-power-consumption, small-size, and low-cost processors for embedded systems.
▪ Embedded system: a dedicated function embedded as part of a complete device or system.
▪ Millions of computers are sold each year; billions of embedded computer systems are produced each year.
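
The CISC/RISC multiplication example above can be mimicked with a small Python sketch. This is only a toy model, not how a real processor works: the operand values 6 and 7, the dictionary-based memory, and the function names `run_risc` and `run_cisc` are assumptions made for illustration. It shows the four simple RISC-style steps and the single CISC-style `MULT` producing the same result with different instruction counts.

```python
def run_risc(memory):
    """RISC style: four simple instructions (LOAD, LOAD, MUL, STORE)."""
    mem = dict(memory)                      # work on a copy of memory
    regs = {name: 0 for name in "ABCDEF"}   # registers A to F
    regs["A"] = mem[(2, 4)]                 # LOAD A, 2:4
    regs["B"] = mem[(3, 2)]                 # LOAD B, 3:2
    regs["A"] = regs["A"] * regs["B"]       # MUL A, B
    mem[(2, 4)] = regs["A"]                 # STORE 2:4, A
    return mem, 4                           # final memory, instructions executed


def run_cisc(memory):
    """CISC style: one MULT instruction that loads, multiplies and stores."""
    mem = dict(memory)
    mem[(2, 4)] = mem[(2, 4)] * mem[(3, 2)]  # MULT 2:4, 3:2
    return mem, 1


# Memory is keyed by (row, column); the stored values are illustrative.
initial = {(2, 4): 6, (3, 2): 7}
risc_mem, risc_count = run_risc(initial)
cisc_mem, cisc_count = run_cisc(initial)
print("RISC:", risc_mem[(2, 4)], "in", risc_count, "instructions")
print("CISC:", cisc_mem[(2, 4)], "in", cisc_count, "instruction")
```

Instruction count alone does not decide which style is faster: the single CISC instruction may take several clock cycles, while each RISC instruction is designed to take one.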

Computer Architecture and Computer Organisation
▪ Computer Architecture:
  ❑ Attributes of a system visible to programmers.
  ❑ These attributes have a direct impact on the logical execution of a program.
  ❑ Example architectural attributes: instruction set (RISC or CISC?), number of bits used to represent various data types, I/O mechanisms, memory addressing modes (3 types).
▪ Computer Organisation:
  ❑ Operational units and their interconnections that realize the architectural specifications.
  ❑ Example organizational attributes: hardware details transparent to the programmer, such as control signals, interfaces between computer and peripherals, and the memory technology used.

Distinction between Architecture and Organization
▪ Many manufacturers offer a family of computer models with the same architecture but different organizations.
▪ A particular computer architecture may span many years.
▪ Computer organization changes with technology, price, performance characteristics, computer models, etc.
▪ For microcomputers, changes in technology influence not only organization but also architecture.
▪ Generally, there is less of a requirement for generation-to-generation compatibility for these smaller machines.

Function Operations
4 basic functions performed by computers:
▪ Data Processing
  ❑ Processing a wide variety of forms of data.
▪ Data Storage
  ❑ Short-term storage (memory).
  ❑ Long-term storage (hard disk).
▪ Data Movement (read/write, receive/transmit)
  ❑ Input–output (I/O): when data is received from or delivered to a peripheral device.
  ❑ Communications: when data is moved over longer distances, to/from a remote device.
▪ Control (e.g., data control, copy/paste)
  ❑ Managing the computer's resources and the performance of its functional parts.
[Figure: operating environment (source and destination of data), data movement apparatus, control mechanism, data storage facility, data processing facility]

Examples of Computer Function Operations
[Figure: four diagrams, each showing the data movement, control, data storage, and data processing blocks]

Single Core Processor Structure
Main structural components of a computer:
▪ Central processing unit (CPU): controls operation and performs data processing functions.
▪ Main memory: stores data.
▪ I/O: moves data from/to the external environment.
▪ System interconnection (system bus): communication paths among CPU, main memory, and I/O.
Major structural components of the CPU:
▪ Control unit: controls operation of the CPU.
▪ Arithmetic and logic unit (ALU): performs data processing functions.
▪ Registers: provide internal storage for the CPU.
▪ CPU interconnection (internal bus): communication paths among control unit, ALU, registers, etc.
[Figure: computer (I/O, main memory, system bus, CPU); CPU (ALU, registers, internal bus, control unit); control unit (sequencing logic, control unit registers and decoders, control memory)]

Clock Cycle Time or Clock Rate
▪ The clock speed of a processor is described either by the clock cycle time (period of the clock signal), T, or by the clock rate (clock speed, frequency), f.
▪ Clock rate: $f = 1/T$, and $T = 1/f$.
▪ E.g., a 1-MHz processor receives 1 million clock pulses per second.
▪ The higher the clock rate, the more data can be processed within a fixed time under the same conditions.
▪ Clock rate ↑, time to process each operation ↓.

Processors Performance
▪ The performance formula relates the number of clock cycles and the clock cycle time (period) to CPU time:
    Execution time for a program = No. of CPU clock cycles needed × Clock cycle time
▪ Clock rate (frequency) is the inverse of clock cycle time:
    $\text{Execution time} = \frac{\text{No. of CPU clock cycles needed}}{\text{Clock rate}}$
▪ Hence, ways to improve CPU performance:
  ❑ Reduce the number of clock cycles needed for a program.
  ❑ Reduce the clock cycle time (or increase the clock rate).
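
As a quick numerical check of the two equivalent forms of the execution-time formula above, here is a minimal Python sketch. The function names and the sample values (20 × 10^9 clock cycles on a 2 GHz clock) are assumptions chosen for illustration.

```python
import math


def execution_time(clock_cycles, clock_rate_hz):
    """Execution time in seconds = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


def execution_time_from_period(clock_cycles, cycle_time_s):
    """The same quantity written as clock cycles x clock cycle time."""
    return clock_cycles * cycle_time_s


cycles = 20e9          # illustrative number of clock cycles (assumed)
rate = 2e9             # illustrative clock rate: 2 GHz (assumed)
period = 1 / rate      # T = 1/f

t_rate = execution_time(cycles, rate)
t_period = execution_time_from_period(cycles, period)
print(t_rate, t_period)

# The two ways to improve performance named above:
t_faster_clock = execution_time(cycles, 2 * rate)   # double the clock rate
t_fewer_cycles = execution_time(cycles / 2, rate)   # halve the cycle count
print(t_faster_clock, t_fewer_cycles)
```

Both forms give the same execution time up to floating-point rounding, and either halving the cycle count or doubling the clock rate halves it.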

Example 1.1 – CPU Performance
❖ A program runs in 10 seconds on computer A, which has a 2 GHz clock rate. The designer plans to build a computer B that runs the same program in 6 seconds. A substantial increase in clock rate is possible, but it will affect the rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A. What clock rate is needed for computer B?
Solution:
❑ Number of clock cycles needed for the program on computer A:
  ▪ No. of CPU clock cycles on A = Execution time on A × Clock rate on A = 10 seconds × $2 \times 10^{9}$ cycles per second = $20 \times 10^{9}$ clock cycles.
  ▪ No. of CPU clock cycles on B = 1.2 × No. of CPU clock cycles on A = $1.2 \times 20 \times 10^{9}$ cycles = $24 \times 10^{9}$ cycles.
  ▪ $\text{Clock rate on B} = \frac{\text{No. of CPU clock cycles on B}}{\text{Execution time on B}} = \frac{24 \times 10^{9} \text{ cycles}}{6 \text{ seconds}} = 4 \times 10^{9} \text{ cycles/second} = 4 \text{ GHz}$

Instruction Performance
▪ Execution time also depends on the number of CPU instructions in a program and the clock cycles per instruction.
▪ Clock cycles per instruction (CPI): average number of clock cycles per instruction for a program.
▪ Different instructions may need different numbers of clock cycles; CPI is the average over all instructions executed in a program:
    $CPI = \frac{\sum_{i=1}^{n} (CPI_i \times \text{Instruction Count}_i)}{\text{Total Instruction Count}}$
  where $CPI_i$ and $\text{Instruction Count}_i$ are for each instruction class $i$.
▪ The performance formula relates CPI to CPU time as:
    No. of CPU clock cycles needed = Total No. of instructions × Clock cycles per instruction (CPI).

Example 1.2 - Instruction Performance
❖ Two computers have the same instruction set architecture. Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for a program. Computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program, and by how much? (Hint: 1 ps = $10^{-12}$ s.)
Solution:
❑ Each computer executes the same number of instructions for the program. First calculate the number of CPU clock cycles needed:
  ▪ CPU clock cycles on A = No. of instructions × CPI on A = No. of instructions × 2.0.
  ▪ CPU clock cycles on B = No. of instructions × CPI on B = No. of instructions × 1.2.
❑ Then compute the execution time for each computer:
  ▪ Execution time on A = CPU clock cycles on A × Clock cycle time A = No. of instructions × 2.0 × 250 ps = No. of instructions × 500 ps.
  ▪ Execution time on B = CPU clock cycles on B × Clock cycle time B = No. of instructions × 1.2 × 500 ps = No. of instructions × 600 ps.
❑ Hence, computer A is faster for this program. Its CPU execution time as a fraction of computer B's is:
    $\frac{\text{No. of instructions} \times 500 \text{ ps}}{\text{No. of instructions} \times 600 \text{ ps}} \times 100\% = \frac{5}{6} \times 100\% = 83.33\%$
  Equivalently, computer A is 600/500 = 1.2 times as fast as computer B.

CPU Performance Equations
▪ CPU performance is determined by 3 key factors: instruction count (No. of instructions executed by a program), CPI (clock cycles per instruction), and clock cycle time (or clock rate).
▪ CPU performance equation (important equations!):
    Execution time = Instruction count × CPI × Clock cycle time.
▪ Clock rate is the inverse of clock cycle time:
    $\text{Execution time} = \frac{\text{Instruction count} \times CPI}{\text{Clock rate}}$
▪ Millions of instructions per second (MIPS): a common measure of performance.
    $\text{MIPS rate} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{1}{CPI \times \text{Clock cycle time} \times 10^{6}} = \frac{\text{Clock rate}}{CPI \times 10^{6}}$

Units of Measurement for CPU Performance
▪ A reliable measure of computer performance is time.
    $\text{Execution time} = \frac{\text{seconds}}{\text{program}} = \frac{\text{Instruction count}}{\text{program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{seconds}}{\text{Clock cycle}}$
    Execution time = Instruction count × CPI × Clock cycle time

Example 1.3 - CPU Performance
❖ A program needs the execution of 2 million instructions on a 400 MHz CPU. The program consists of 4 major types of instructions. The instruction mix and CPI for each type, based on a program trace experiment, are given in a table. Calculate the MIPS rate.
[Table not captured in the transcript; the solution uses CPIs of 1, 2, 4, and 8 with an instruction mix of 60%, 18%, 12%, and 10% respectively.]
Solution:
❑ According to the table, the average CPI of all instructions is:
  ▪ CPI = 1 × 60% + 2 × 18% + 4 × 12% + 8 × 10% = 2.24.
❑ Clock rate is 400 MHz. Based on the MIPS rate equation:
  ▪ $\text{MIPS rate} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{1}{CPI \times \text{Clock cycle time} \times 10^{6}} = \frac{\text{Clock rate}}{CPI \times 10^{6}} = \frac{400 \times 10^{6}}{2.24 \times 10^{6}} \approx 179$

Example 1.4 – Compare Code Performance
❖ A program consists of 3 major classes of instructions. Hardware designers supplied the following facts:
  ▪ Instruction Class A: CPI = 1
  ▪ Instruction Class B: CPI = 2
  ▪ Instruction Class C: CPI = 3
❖ For a high-level language program, the compiler designer plans two code sequences requiring the following instruction counts. Which code sequence executes more instructions? Which is faster? What is the CPI for each sequence?
  ▪ Code Sequence X: Class A = 2, Class B = 1, Class C = 2
  ▪ Code Sequence Y: Class A = 4, Class B = 1, Class C = 1
Solution:
❑ Total No. of instructions executed by Code Sequence X: 2 + 1 + 2 = 5.
❑ Total No. of instructions executed by Code Sequence Y: 4 + 1 + 1 = 6. Sequence Y therefore executes more instructions.
❑ CPU clock cycles = $\sum_{i} (\text{Instruction count}_i \times CPI_i)$
❑ Total No. of CPU clock cycles by Code Sequence X: 2 × 1 + 1 × 2 + 2 × 3 = 10.
❑ Total No. of CPU clock cycles by Code Sequence Y: 4 × 1 + 1 × 2 + 1 × 3 = 9. Sequence Y needs fewer clock cycles, so it is faster.
❑ CPI of Sequence X: $CPI_X = \frac{\sum_{i=1}^{n} (CPI_i \times \text{Instruction Count}_i)}{\text{Total Instruction Count}} = \frac{10}{5} = 2$
❑ CPI of Sequence Y: $CPI_Y = \frac{\sum_{i=1}^{n} (CPI_i \times \text{Instruction Count}_i)}{\text{Total Instruction Count}} = \frac{9}{6} = 1.5$

Understanding Program Performance
▪ Algorithm. Affects: instruction count, CPI. How: determines the number of instructions executed. May affect CPI by favoring slower or faster instructions; e.g., if the algorithm uses more divisions, it tends to have a higher CPI.
▪ Programming language. Affects: instruction count, CPI. How: affects instruction count, as statements in the language are translated to machine instructions. May affect CPI due to its features; e.g., a language with heavy data abstraction (e.g., Java) requires indirect calls, which have a higher CPI.
▪ Compiler. Affects: instruction count, CPI. How: compiler efficiency affects both instruction count and CPI, as the compiler translates source language instructions into machine instructions.
▪ Instruction set architecture. Affects: instruction count, clock rate, CPI. How: affects all 3 aspects of CPU performance: instructions needed for a program, clock cycles of each instruction, and overall clock rate of the CPU.

Next Lecture
Lecture 2: Computer Function and Cache Memory

CSC1104 - COMPUTER ORGANISATION & ARCHITECTURE
LECTURE 2: COMPUTER FUNCTION AND CACHE MEMORY
Assoc. Prof. Cao Qi, [email protected]

Lecture Contents
▪ Computer Function:
  ▪ Instruction Fetch and Execute
  ▪ Interrupts
▪ Cache Memory:
  ▪ Characteristics of Memory Systems
  ▪ Memory Hierarchy
  ▪ Cache Memory Principles

Computer Function

Von Neumann Architecture (Recap)
▪ Von Neumann architecture, by John von Neumann.
▪ Data and instructions are stored in the same memory.
▪ Memory contents are accessible by address.
▪ Instructions are executed sequentially.

CPU Components: Top-Level View
▪ Memory address register (MAR): specifies the address in memory for the next read or write.
▪ Memory buffer register (MBR): contains the data read from, or to be written to, memory.
▪ Program counter (PC).
▪ Instruction register (IR).
▪ I/O: transfers data between external I/O devices and the computer.
[Figure: top-level view of the CPU, memory (memory address, memory content), and I/O]

Computer Functions
▪ A computer executes a program, which is a set of instructions stored in memory.
▪ An instruction cycle: the tasks required to process a single instruction:
  ❑ Fetch cycle: the processor reads instructions from memory one at a time.
  ❑ Execute cycle: the current instruction is executed.

Instruction Fetch and Execute
▪ The processor fetches an instruction from memory into the instruction register (IR).
▪ The program counter (PC) register holds the address of the next instruction to be fetched.
▪ The PC is incremented after each instruction fetch, pointing to the next instruction in sequence, unless a special instruction is received.
▪ Accumulator (AC): a data register in the CPU for temporary storage.
[Figure: PC, AC (accumulator), and memory; e.g., addresses 0x0300, 0x0301, 0x0302; e.g., instruction 0x1234]

Instruction Register (IR)
▪ The fetched instruction is loaded into the IR.
▪ The instruction specifies the action to be taken, in one of 4 categories:
  ❑ Processor-memory: data transferred between processor and memory.
  ❑ Processor-I/O: data transferred between processor and I/O peripherals.
  ❑ Data processing: the processor performs certain operations on data.
  ❑ Control: alters the sequence of instruction execution.
▪ E.g., the CPU fetches an instruction from memory address 0x145, which is a control instruction specifying that the next instruction comes from memory address 0x182. The CPU then sets the program counter (PC) to 0x182. On the next fetch cycle, the instruction is fetched from 0x182 rather than 0x146 (i.e., 0x145 + 1 = 0x146).

Operators, operands, operations
▪ Operators specify the operation to perform (e.g., add, subtract, multiply, store).
▪ Operands are the values that the operators act upon.
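
The fetch and execute cycle described above, including the control-instruction example that redirects the PC from 0x145 to 0x182, can be sketched as a tiny simulator. Everything about the encoding here is an assumption for illustration: a 16-bit instruction with a 4-bit opcode and a 12-bit address, the opcode values, the `JUMP` and `HALT` instructions, and the memory contents are hypothetical and not taken from a real machine.

```python
# Hypothetical opcodes for a 16-bit instruction: 4-bit opcode + 12-bit address.
LOAD, STORE, ADD, JUMP, HALT = 0x1, 0x2, 0x5, 0x8, 0x0


def run(memory, start):
    """Run until HALT; return the accumulator and the list of fetched addresses."""
    pc, ac = start, 0
    trace = []
    while True:
        ir = memory[pc]              # fetch cycle: instruction at PC is loaded into IR
        trace.append(pc)
        pc += 1                      # PC now points to the next instruction in sequence
        opcode, address = ir >> 12, ir & 0x0FFF
        if opcode == LOAD:           # processor-memory: memory -> AC
            ac = memory[address]
        elif opcode == STORE:        # processor-memory: AC -> memory
            memory[address] = ac
        elif opcode == ADD:          # data processing: AC = AC + memory operand
            ac = (ac + memory[address]) & 0xFFFF
        elif opcode == JUMP:         # control: override the sequential PC
            pc = address
        elif opcode == HALT:
            return ac, trace
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")


memory = {
    0x145: 0x8182,   # JUMP 0x182 (control instruction)
    0x146: 0x0000,   # HALT, skipped because of the jump
    0x182: 0x1940,   # LOAD AC from 0x940
    0x183: 0x5941,   # ADD contents of 0x941 to AC
    0x184: 0x2941,   # STORE AC to 0x941
    0x185: 0x0000,   # HALT
    0x940: 3,        # data (illustrative)
    0x941: 2,        # data (illustrative)
}
ac, trace = run(memory, 0x145)
print([hex(a) for a in trace], ac, memory[0x941])
```

The trace shows that the instruction at 0x146 is never fetched: after the control instruction at 0x145 executes, the next fetch comes from 0x182.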

Example 2.1 - A Partial Program Execution
▪ Instruction format: 16 bits = 4-bit opcode + 12-bit address.
▪ $2^{4} = 16$ different opcodes; $2^{12} = 4{,}096$ memory addresses.
▪ Both instructions and data are 16 bits long.
▪ Example opcodes:
  ❑ 0x1 = Load AC from memory
  ❑ 0x2 = Store AC to memory
  ❑ 0x5 = Add data from memory into AC
[Figure: fetch cycle and execute cycle for three instructions (opcode, address), with the PC incremented by 1 after each fetch]

Interrupts
▪ Almost all computers provide a mechanism by which other modules (I/O, memory, etc.) can interrupt the normal processing of the CPU.
▪ Interrupts are a way to improve processing efficiency.
▪ Most external devices are much slower than the CPU.

Classes of Interrupts
▪ Program: result of arithmetic overflow, division by zero, an attempt to execute an illegal machine instruction, or a reference outside a user's allowed memory space.
▪ Timer: generated by a timer within the CPU. It allows the operating system to perform certain functions on a regular basis.
▪ I/O: generated by an I/O controller, to signal normal completion of an operation, to request service from the CPU, or to signal a variety of error conditions.
▪ Hardware failure: generated by a failure such as a power failure or memory parity error.

Program Flow without and with Interrupts
[Figure: program flow without and with interrupts; X = interrupt occurs during the course of execution of the user program]

Interrupt Cycle
▪ On receiving an interrupt request, the CPU checks which interrupt has occurred.
▪ For a pending interrupt, the interrupt handler is performed:
  ❑ The CPU suspends execution of the current program and saves its context (program counter; data relevant to the current activity).
  ❑ It sets the program counter (PC) to the starting address of the interrupt handler.
▪ The CPU resumes the original execution after the I/O interrupt is serviced.

Sequential Interrupt Approach
1. Disable interrupts while an interrupt is being processed.
❑ When an interrupt occurs, CPU interrupts are disabled immediately. New interrupt requests will not be responded to.
❑ Interrupts are handled in sequential order.
❑ After the interrupt service routine (ISR) completes, CPU interrupts are enabled. Before resuming the user program, the CPU checks whether any interrupts have occurred that have not yet been responded to.

Nested Interrupt Approach
2. Define priorities for interrupts. Allow a higher-priority interrupt to be serviced first; it can interrupt a lower-priority interrupt service routine (ISR).
❑ Devices are assigned different interrupt priorities.
❑ The higher the number, the higher the interrupt priority.
❑ Interrupts from higher-priority devices are responded to first by the CPU.
[Figure: a higher-priority interrupt ISR (e.g., priority = 2) interrupting a lower-priority interrupt ISR (e.g., priority = 1)]

Example 2.2 – ISR of Multiple Interrupts
❑ System with three I/O devices: a printer (priority = 2), a disk (priority = 4), and a communications line (priority = 5).
[Figure: interrupt service routines for the three devices, labelled priority = 5, priority = 2, and priority = 4]

Memory Hierarchy and Cache Memory

Important Characteristics of Memory
1. Capacity of memory (bytes or words): 1 word = 2 bytes; 1 byte = 8 bits.
2. Performance of memory:
▪ Access time (latency): the time from when an address is presented to the memory until the data is stored or read out.
▪ Memory cycle time: access time + any additional time required before the next access can commence.
▪ Transfer rate: the rate at which data can be transferred into or out of a memory unit.
  ❑ For random-access memory: 1/(memory cycle time).
  ❑ For non-random-access memory: $T_n = T_A + \frac{n}{R}$
    where $T_n$ = average time to read or write $n$ bits, $T_A$ = average access time, $n$ = number of bits to read/write, $R$ = transfer rate in bits per second (bps).

Physical Characteristics
▪ Volatile memory: data decays naturally or is lost when electrical power is off.
▪ Nonvolatile memory: no electrical power is needed to retain data once recorded.
▪ Nonerasable memory: cannot be altered without destroying the storage unit, i.e., read-only memory (ROM).
▪ Erasable memory: can be altered and erased from storage.
▪ Semiconductor memory: either volatile or nonvolatile.
▪ Magnetic-surface memories: nonvolatile.

Memory Hierarchy
▪ Design constraints on a computer's memory: how large? how fast? how expensive?
▪ Trade-off among capacity, access time, and cost:
  ❑ Access time ↓, cost per bit ↑
  ❑ Capacity ↑, cost per bit ↓
  ❑ Capacity ↑, access time ↑
▪ Solution: do not rely on a single memory component, but on a memory hierarchy.

Relative Cost, Size and Speed Characteristics
▪ Smaller, more expensive, faster memories are supplemented by larger, cheaper, slower memories.
▪ Going down the hierarchy from primary memory (volatile, semiconductor) to secondary memory (non-volatile, external): cost per bit ↓, capacity ↑, access time ↑, frequency of access by CPU ↓.

Performance of Accesses: 2 Levels of Memory
▪ Two-level memory: M1 is smaller, faster, and more expensive than M2.
▪ If data is in level M1, the CPU can access it directly. If it is in level M2, the data is first transferred to level M1, then accessed by the CPU.
▪ M2 contains all program instructions and data.
▪ The most recently accessed instructions and data are in M1.
▪ Clusters of data in M1 need to be swapped back to M2 regularly.
▪ Hit ratio H: probability that the data is found in M1.
▪ T1: access time to M1. T2: access time to M2.
▪ Total access time per word = $H \times T_1 + (1 - H) \times (T_1 + T_2)$

Example 2.3 – Accesses 2 Levels of Memory
❖ For a two-level memory system, level M1 contains 1,000 bytes and has an access time of 0.01 μs; level M2 contains 100,000 bytes with an access time of 0.1 μs. The hit ratio is 95%. Calculate the average time for the CPU to access a word.
Solution:
a) T1 = 0.01 μs, T2 = 0.1 μs, H = 0.95.
b) Time = H × T1 + (1 – H) × (T1 + T2) = 0.95 × 0.01 μs + 0.05 × (0.01 μs + 0.1 μs) = 0.0095 μs + 0.0055 μs = 0.015 μs.

Cache Memory Principles
▪ When the CPU attempts to read a word from a memory address, it checks whether the word is in the cache.
  ❑ If so, the word is delivered to the CPU directly.
  ❑ If not, a block of memory containing the word is read into the cache, and the word is then delivered to the CPU.
▪ Locality of reference: when a block of data is fetched into the cache from memory, it is likely that there will be future references to data in the same block.

Cache/Main Memory Structure
▪ Main memory consists of $2^{n}$ addressable words, grouped into M blocks; 1 block = K words.
▪ Cache consists of C blocks, called lines; 1 line = K words + tag (+ control bits).
▪ Size of 1 cache line = size of 1 memory block = K words.
▪ No. of blocks in memory: $M = 2^{n}/K$.
▪ No. of lines in cache: C, far fewer than M.

▪ Access time of a larger cache ↑.
▪ Chip and circuit board area limits cache capacity.
▪ Cache performance is very sensitive to the nature of the workload, so it is impossible to arrive at a single "optimum" cache capacity.

Mapping Function and Cache Access Methods
No. of cache lines
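
The two-level access time formula used in Example 2.3 can be checked numerically with a minimal Python sketch. The function name `average_access_time` and the extra hit ratios in the loop are assumptions for illustration; the values 0.95, 0.01 μs, and 0.1 μs come from the example.

```python
import math


def average_access_time(hit_ratio, t1, t2):
    """Average access time per word: H*T1 + (1 - H)*(T1 + T2)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit ratio must be between 0 and 1")
    return hit_ratio * t1 + (1.0 - hit_ratio) * (t1 + t2)


# Values from Example 2.3, with times in microseconds.
example = average_access_time(0.95, 0.01, 0.1)
print(f"H = 0.95: {example:.4f} us")

# Illustrative sweep (assumed hit ratios) showing the effect of H.
for h in (0.5, 0.9, 0.99):
    print(f"H = {h}: {average_access_time(h, 0.01, 0.1):.4f} us")
```

As the hit ratio approaches 1, the average access time approaches T1, which is why a small, fast M1 can make the whole memory system appear nearly as fast as M1 itself.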
