EE3220 System-on-Chip Design Lecture Notes 2 PDF
City University of Hong Kong
Summary
These lecture notes provide an introduction to modern microprocessors and cover key concepts in computer architecture. They include review questions on topics such as logic design and system-on-chip design.
Full Transcript
EE3220 System-on-Chip Design
Lecture Note 2
Introduction to Modern Microprocessor

Lecture 1 Recap
§ Course Overview
§ Review Background
§ Preview Design Boards and Tools
§ Intro to Computer System
§ Intro to Chip Design
§ Intro to Embedded System
§ Intro to ARM Architecture
This is simply an introduction. We will examine the details of "Embedded System" and "ARM Architecture" in subsequent lectures.

Questions on Acronyms in Lecture 1
§ What do these acronyms stand for? What do they mean?
  § CPU, MCU, SOC, ASIC, IC, ASSP
  § RAM, ROM, ADC, DAC, DMA
  § USB, LCD, UART, RF, IO
  § RTOS, HDL
  § KMAP, SOP, POS, HA, FA
  § IOT, IOE
  § HLS, FPGA, PLD, CPLD
  § SISD, SIMD, MISD, MIMD
  § CMOS, BJT, ECL
  § LSB, MSB

Questions on Logic Design
§ How do you determine whether a two's-complement number is negative?
  § Check the LSB. (T / F)
  § Check the MSB. (T / F)

Questions on Logic Design
§ A 64-bit adder may be implemented with or without a pipeline. Check whether the following statements are correct.
  § A pipelined adder can provide higher bandwidth. (T / F)
  § A pipelined adder can provide lower latency. (T / F)
  § A pipelined adder may consume more power. (T / F)
  § A pipelined adder may run at a higher clock rate. (T / F)
  § A pipelined adder may consume more area. (T / F)

Questions on Logic Design
§ What are the key Verilog / VHDL features absent in the C programming language?
  § Floating Point Representation (T / F)
  § Different Signal Levels (T / F)
  § Analog Representation (T / F)
  § Reactivity (T / F)
  § Concurrency (T / F)
  § Timing Control (T / F)
  § Object Oriented Design (T / F)
  § Clocking (T / F)

Questions on SOC
§ Which of the following are essential components in a SOC design?
  § CPU (T / F)
  § RAM (T / F)
  § ROM (T / F)
  § SSD / HDD (T / F)
  § Internal Bus (T / F)
  § USB Port (T / F)
  § Ethernet Port (T / F)
  § Wi-Fi Port (T / F)

Questions on Processor Architecture
§ Most low-cost IoT chips are based on which architectural type?
  § SISD (T / F)
  § SIMD (T / F)
  § MISD (T / F)
  § MIMD (T / F)

Questions on Processor Architecture
§ x86 chips are based on which architectural type?
  § SISD (T / F)
  § SIMD (T / F)
  § MISD (T / F)
  § MIMD (T / F)

Questions on Processor Architecture
§ GPU chips are based on which architectural type?
  § SISD (T / F)
  § SIMD (T / F)
  § MISD (T / F)
  § MIMD (T / F)

Questions on Processor Architecture
§ Which of the following statements regarding cache are correct?
  § Cache is used to improve Processor Performance. (T / F)
  § Cache is used to reduce Processor Power. (T / F)
  § L1 Cache is larger than L2 Cache. (T / F)
  § L1 Cache is faster than L2 Cache. (T / F)
  § Cache and Register File are both CPU Memory. (T / F)
  § Cache design is like Register File design. (T / F)

Questions on SOC Design
§ What are the key purposes of functional verification?
  § Check for any performance issues. (T / F)
  § Check for any set-up time issues. (T / F)
  § Check for any hold time issues. (T / F)
  § Check for any hardware logic issues. (T / F)
  § Check for any firmware logic issues. (T / F)
  § Check for any reliability issues. (T / F)
  § Check for any cost issues. (T / F)
  § Check for any production issues. (T / F)

Questions on SOC Design
§ What are the key advantages of using an FPGA versus an ASIC in an embedded system?
  § Reduce Unit Cost? (T / F)
  § Reduce Development Cost? (T / F)
  § Reduce Development Time? (T / F)
  § Reduce Form Factor? (T / F)
  § Improve Performance? (T / F)
  § Improve Reliability? (T / F)
  § Reduce Power Consumption? (T / F)

Questions on SOC Design
§ Use the negative binomial yield model with a defect density of 1 defect per 100 mm². Check whether the following statements are correct.
  § Yield can be improved with smaller die size. (T / F)
  § Yield can be improved with lower defect density. (T / F)
  § Yield = 0 if die size > 100 mm². (T / F)
  § Yield = ~0.42 if die size = 100 mm². (T / F)
  § Yield is lower in 3nm process than in 5nm process. (T / F)
  § Yield can affect the total costs. (T / F)

Questions on SOC Design
§ Which of the following statements regarding SOC performance are correct?
  § Bandwidth (MB/s) is a key performance metric. (T / F)
  § Latency (us) is a key performance metric. (T / F)
  § Bandwidth is more important than latency. (T / F)
  § Higher performance usually means higher power. (T / F)
  § Higher performance usually means larger chip size. (T / F)
  § Faster clock usually means higher performance. (T / F)

Lecture 2 Outline
§ Introduction to the First Microprocessor
§ Concept of Computation
§ Microcontroller vs. Microprocessor
§ ARM Chip & Die
§ Stored Program Concept
§ RISC Philosophy
§ Instruction Set Architecture
§ Pipelined Instruction Execution
§ Compilation and Assembling
§ ARM Processor Family

The First Microprocessor
§ In 1971, Intel introduced the first commercially available microprocessor, the 4004.
  § 4-bit data bus, 45 instructions
  § Required many support chips to build a functioning system
  § 2,300 transistors on one IC
Since the 1970s, we have been observing great developments in both computer architecture and integrated circuit fabrication. Microprocessors have become MORE POWERFUL and CHEAPER.

Moore's Law
The number of transistors on a chip will roughly DOUBLE every TWO YEARS.
§ Gordon Moore's prediction from 1965!
§ He had been right for more than 50 years…
§ WHY?
§ How often do you change your phone or computer?
Founders of Intel: Andy Grove, Robert Noyce and Gordon Moore

Moore's Law
The performance of a μP doubles every 18 months.
The price of the same μP is cut by half every 18 months.

Classic: Intel 8086 Processor & Chip
§ Launched in 1978: 10 MHz, 16-bit, 3 µm.
§ 16 general registers
§ You can learn more from the EE computer architecture course.

Design Abstraction Levels
SYSTEM, MODULE, GATE, CIRCUIT, DEVICE

x86 vs. ARM
§ The mobile computing market is continuously growing.

Mobile vs PC Processors

Technology Comparison: Programmability vs Performance

What is a Computer?
§ A Computer is a Machine that can perform simple calculations.
§ A Computer can also process Algorithms, where it...
  § Performs a sequence of calculations;
  § Makes decisions based on the results of calculations; and
  § Repeats the sequence if required.

Solving different TYPES of Math problems
1. 1 + 2 + 3 + 4 + 5 = ?
2. 16 x 8 + 29 = ?
3. A circle has a radius of 5 cm. What is its area?

Concept of Computation
Let's start with the way we handle computations...
[Figure: brain (control), paper/questions (instructions and data), calculator (execution)]

Concept of Computation (Cont')
How about machine computations?
[Figure: Central Processing Unit (CPU) containing a Control unit and an Arithmetic Unit; Main Memory holding the program, supplying instructions and data]
Is there anything missing?

The state in a stored-program digital computer
A processor is a finite-state automaton that executes instructions held in a memory.
It keeps its instructions and data in the same memory system.
[Figure: processor (registers, instructions) connected by address and data lines to a memory holding instructions and data, with addresses from 00..00₁₆ to FF..FF₁₆]

Concept of Computation (Cont')
I/O provides extra data and allows interactions.
[Figure: Input/Output added to the Central Processing Unit (Control, Arithmetic Unit) and Main Memory (program instructions and data)]
Do we have a simple model to describe a computer?

Von Neumann Architecture
[Figure: CPU, Memory and Peripheral I/O connected by Control, Data and Address lines]
Component                        Functions
Central Processing Unit (CPU)    controls the system and performs calculations
Main Memory                      stores both programs and data
Peripheral Input/Output (I/O)    allows data to be input to the system; allows results to be output from the system

Harvard Architecture
§ It is a computer architecture with separate storage and signal pathways for instructions and data.
§ Modern processors are mostly von Neumann machines, with the program code stored in the same main memory as the data.
§ Main memory, Cache, TLB, Storage.
§ Speed and Performance.
§ Advanced topics: Read / Write Hazards.
[Figure: ALU and Control Unit connected to a separate Instruction Memory and Data Memory, plus I/O]
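
The difference between the two memory organisations above can be made concrete with a small sketch. This is an illustrative toy and is not taken from the lecture: the three mnemonics (LOAD, ADD, HALT), the tuple instruction format and the memory layouts are all assumptions chosen only to show one shared memory versus two separate ones.

```python
# Toy model (hypothetical instruction names and layout) contrasting the two
# memory organisations. Both programs compute 3 + 4 in an accumulator.


def step(op, addr, acc, data):
    """Apply one toy instruction; `data` is whichever memory holds operands."""
    if op == "LOAD":
        return data[addr]
    if op == "ADD":
        return acc + data[addr]
    raise ValueError(f"unknown operation: {op}")


def run_von_neumann(memory):
    """Instructions and data are fetched from the SAME memory."""
    pc, acc = 0, 0
    while True:
        op, addr = memory[pc]      # instruction fetch from the shared memory
        pc += 1
        if op == "HALT":
            return acc
        acc = step(op, addr, acc, memory)   # data access to the same memory


def run_harvard(instruction_memory, data_memory):
    """Instructions and data are fetched from SEPARATE memories."""
    pc, acc = 0, 0
    while True:
        op, addr = instruction_memory[pc]   # fetch from instruction memory
        pc += 1
        if op == "HALT":
            return acc
        acc = step(op, addr, acc, data_memory)   # access to data memory only


# Von Neumann: addresses 0-2 hold code, addresses 4-5 hold data.
unified_memory = [("LOAD", 4), ("ADD", 5), ("HALT", 0), None, 3, 4]

# Harvard: each memory has its own address space starting at 0.
instruction_memory = [("LOAD", 0), ("ADD", 1), ("HALT", 0)]
data_memory = [3, 4]

print(run_von_neumann(unified_memory))                # 7
print(run_harvard(instruction_memory, data_memory))   # 7
```

In the unified version a data address and a code address are interchangeable, which is what "stores both programs and data" means in the von Neumann table; in the Harvard version address 0 names two different locations depending on which pathway is used.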
Microcontroller vs Microprocessor
§ Both have a CPU core to execute instructions.
§ A microcontroller has peripherals for concurrent embedded interfacing and control:
  § Analog
  § Non-logic Level Signals
  § Timing
  § Events
  § Clock generators
  § Communications
  § Reliability and safety

Microprocessor vs Microcontroller
[Figure: the microprocessor (CPU) exposes External Control, External Data and External Address lines; the microcontroller encloses the CPU, Memory and Peripheral I/O (Input, Output) with internal Control, Data and Address lines]
A MICROPROCESSOR mainly refers to the CPU with some memories.
A MICROCONTROLLER UNIT (MCU) is a MICROPROCESSOR integrated with both memory and I/O. It is a general-purpose device that is designed to fetch data, perform limited calculations and control the environment.

Stored Program Concept
What is inside my computer program?

Stored Program Concept
§ The CPU executes instructions stored in memory.
  § Known as the "Stored Program" concept
§ Recall that there are two kinds of memory to store programs and data.
  § RAM (Random Access Memory)
    § Can be used for both Program and Data
  § ROM (Read-Only Memory)
    § Can be used for fixed Program & Constant Data
§ Different computers have different ways of arranging memory (Computer Architecture).

Fetch-Decode-Execute Cycle
§ The CPU is a finite state machine (FSM) which runs the programs stored in the memory by the user.
§ It repeatedly performs three operations:
  § Fetch: retrieve an instruction from memory
  § Decode: interpret the instruction
  § Execute: control appropriate hardware to carry out the instruction
[Figure: CPU cycling through Fetch, Decode, Execute]

ARM Die Photo
§ ARM1 - Microarchitectures - Acorn
§ ARM1 was the first ARM microarchitecture designed and realized by Acorn Computers for the BBC Computer Literacy Project.
§ ARM1 was introduced in 1985 and was extended to be used as a coprocessor in Acorn's BBC Micro microcomputers.

ARM1 Block Diagram
§ Scalar, Pipelined Processor
§ 3-stage pipeline
  § Fetch, Decode, Execute
§ Instruction Set Architecture (ISA): ARMv1

What is inside a Program?
§ We have written "computer programs" in C or Java.
§ These were compiled and executed.
§ If we remember that a μP/μC is a digital electronic circuit (hardware, HW), then the program must be stored as strings of '0' and '1' bits in the memory.
§ Assembly programming is considered the lowest level of programming and the closest to the HW.
§ We write instructions that the machine readily understands and executes.

Instructions: Computer's Language
To command computer hardware, you must speak its language.
§ The words of a computer's language are called instructions, and its vocabulary is called an instruction set.
§ Interestingly, computer languages come in different dialects. Once you have learned one, it is easy to pick up the others.
§ The functionalities of a computer are revealed by its instruction set.

Instructions: Operations
Every computer must be able to perform simple arithmetic:
ADD r3, r1, r2 ;r3 = r1 + r2
This instructs the computer to add the two variables r1 and r2 and to put their sum in r3.
The words to the right of the semicolon (;) are comments for the human reader.
Q: Why is an instruction usually simple?
A: Simplicity favours regularity and flexibility and is better for general-purpose applications.

Instructions: Operands
The operands of instructions are restricted. They must come from a limited number of special locations in HW called registers.
ADD r3, r1, r2 ;r3 = r1 + r2
In this example, r1, r2 and r3 are all registers.
In writing instructions, we often need to assign variables to registers. Note that some registers are dedicated to a special purpose, e.g. r15 is the program counter.
Q: Why is the number of registers small and limited?
A: Smaller is faster. A very large number of registers may increase the clock cycle time.
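
To illustrate how an instruction such as ADD r3, r1, r2 ends up "stored as strings of '0' and '1' bits", here is a minimal sketch that packs it into a 32-bit word. It assumes the classic 32-bit ARM data-processing layout (condition in bits 31-28, opcode in bits 24-21, Rn in bits 19-16, Rd in bits 15-12, Rm in bits 3-0) with the "always" condition, no shift and no flag update; the function name `encode_add_register` is made up for this example, and only this one instruction form is covered.

```python
import struct

COND_ALWAYS = 0b1110   # condition field: execute unconditionally
OPCODE_ADD = 0b0100    # data-processing opcode for ADD


def encode_add_register(rd, rn, rm):
    """Encode `ADD rd, rn, rm` (register operand, no shift, flags unchanged)."""
    for reg in (rd, rn, rm):
        if not 0 <= reg <= 15:
            raise ValueError("register numbers must be in the range 0..15")
    return (COND_ALWAYS << 28) | (OPCODE_ADD << 21) | (rn << 16) | (rd << 12) | rm


word = encode_add_register(rd=3, rn=1, rm=2)
little_endian_bytes = struct.pack("<I", word).hex(" ")

print(f"{word:032b}")          # the instruction as a string of bits
print(f"0x{word:08x}")         # 0xe0813002
print(little_endian_bytes)     # 02 30 81 e0
```

Each register number occupies a 4-bit field, which is one concrete reason the register count is limited to 16 here: more registers would need wider fields in every instruction. The byte sequence printed last is the form in which the word sits in a little-endian memory.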
RISC Philosophy
§ RISC, or Reduced Instruction Set Computer:
  § A type of microprocessor architecture that utilizes a small, highly optimized set of instructions, rather than a more complex set of instructions.
§ History:
  § The first RISC projects came from IBM, Stanford, and UC-Berkeley in the late 70s and early 80s. The IBM 801, Stanford MIPS, and Berkeley RISC 1 and 2 were all designed with a similar philosophy, which has become known as RISC.

RISC Philosophy
§ MIPS Technologies, Inc.:
  § Founded in 1984 upon the Stanford research from which the first MIPS chip resulted. Started with the MIPS R2000 processor (1986).
§ MIPS: Microprocessor without Interlocked Pipelined Stages.
  § In a pipelined processor, there must exist a mechanism to enforce dependencies between instructions. A data dependency occurs when an instruction depends on the result of a previous instruction.
  § In early processors, pipeline interlocks were implemented by software.
§ Pipelining is a standard feature in RISC processors.

Pipelined Instruction Execution
Instruction #
1   fetch   decode   reg      ALU      mem      writeback
2           fetch    decode   reg      ALU      mem      writeback
3                    fetch    decode   reg      ALU      mem      writeback
4                             fetch    decode   reg      ALU      mem      writeback
TIME →
Check for data dependencies (data hazards): RAR (Read-After-Read), RAW (Read-After-Write), WAR (Write-After-Read), WAW (Write-After-Write)

RISC vs. CISC
§ Brief Comparison:

RISC vs. CISC
§ Which is better?
  § Intense debate in the 1980s and 1990s.
  § RISC can support faster CLOCK SPEED.
  § RISC can support more COMPILER OPTIMIZATION.
  § CISC can provide better CODE DENSITY.
  § Hardware can also translate CISC-style instructions into RISC-style instructions.
§ Performance depends on:
  § Number of instructions per task.
  § Number of instructions executed per clock.
  § Clock speed.

MIPS ISA
§ MIPS instruction set architecture (ISA):
  § Contains a set of simple arithmetic, logical, memory-access, branch, and jump instructions.
  § Emphasizes simplicity and excludes instructions that might take longer than the most common instructions.
  § There are 32 general-purpose registers in the MIPS architecture.
  § Each register is 32 bits wide. Registers are often referred to as $0, $1, $2, …, and $31.

Instruction Set Architecture (ISA)
Instruction format: opcode (4 bits) | S (12 bits)
Instruction   Opcode   Effect
LDA S         0000     ACC := mem16[S]
STO S         0001     mem16[S] := ACC
ADD S         0010     ACC := ACC + mem16[S]
SUB S         0011     ACC := ACC - mem16[S]
JMP S         0100     PC := S
JGE S         0101     if ACC >= 0 PC := S
JNE S         0110     if ACC != 0 PC := S
STP           0111     stop

Different types of Instruction Format
4-address instruction format: function (f bits) | op 1 addr. (n bits) | op 2 addr. (n bits) | dest addr. (n bits) | next i addr. (n bits)
3-address instruction format: function (f bits) | op 1 addr. (n bits) | op 2 addr. (n bits) | dest addr. (n bits)
2-address instruction format: function (f bits) | op 1 addr. (n bits) | dest addr. (n bits)
1-address instruction format: function (f bits) | op 1 addr. (n bits)
0-address instruction format: function (f bits)

MIPS ISA

Typical Dynamic Instruction Usage
Instruction type        Dynamic usage
Data movement           43%
Control flow            23%
Arithmetic operations   15%
Comparisons             13%
Logical operations      5%
Other                   1%

From Program to Instructions
Your Program:
int a, b, c;
a = 3;
b = 4;
c = a + b;

ARM
Assembly Code       Machine Instructions
mov r1, #3          03 10 a0 e3
mov r2, #4          04 20 a0 e3
add r3, r1, r2      02 30 81 e0

Intel
Assembly Code       Machine Instructions
mov eax, 3          b8 03 00 00 00
mov ebx, 4          bb 04 00 00 00
add eax, ebx        01 d8
mov ecx, eax        89 c1

The CPU can only understand machine instructions!! Compilers do this translation job for you.
Different CPU microarchitecture, different set of instructions.

Example: A Typical Datapath
An instruction triggers the movement of data through each component in a datapath. The question is: how well optimized is the datapath?
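
The eight-instruction accumulator ISA tabulated earlier (4-bit opcode, 12-bit address S) is small enough to simulate directly. The following is a rough sketch under stated assumptions rather than a reference model: it treats ACC as a 16-bit two's-complement value (so "ACC >= 0" means the MSB is clear), starts execution at address 0, and uses a helper `assemble` and a sample program that are invented here. The sample program adds 1 + 2 + 3 + 4 + 5 by counting down from 5.

```python
LDA, STO, ADD, SUB, JMP, JGE, JNE, STP = range(8)   # opcodes 0000 .. 0111
MASK16 = 0xFFFF


def assemble(opcode, s=0):
    """Pack a 4-bit opcode and a 12-bit address S into one 16-bit word."""
    return (opcode << 12) | (s & 0xFFF)


def run(memory, max_steps=10_000):
    """Execute from address 0 until STP; returns the final ACC value."""
    pc, acc = 0, 0
    for _ in range(max_steps):
        ir = memory[pc]                     # fetch into the instruction register
        pc += 1
        opcode, s = ir >> 12, ir & 0xFFF    # decode the two fields
        if opcode == LDA:
            acc = memory[s]
        elif opcode == STO:
            memory[s] = acc
        elif opcode == ADD:
            acc = (acc + memory[s]) & MASK16
        elif opcode == SUB:
            acc = (acc - memory[s]) & MASK16
        elif opcode == JMP:
            pc = s
        elif opcode == JGE:
            if acc < 0x8000:                # MSB clear: ACC >= 0
                pc = s
        elif opcode == JNE:
            if acc != 0:
                pc = s
        elif opcode == STP:
            return acc
        else:
            raise ValueError(f"undefined opcode {opcode:04b}")
    raise RuntimeError("step limit reached without STP")


TOTAL, COUNT, ONE = 8, 9, 10                # data addresses (assumed layout)
memory = [
    assemble(LDA, TOTAL),   # 0: ACC := total
    assemble(ADD, COUNT),   # 1: ACC := ACC + count
    assemble(STO, TOTAL),   # 2: total := ACC
    assemble(LDA, COUNT),   # 3: ACC := count
    assemble(SUB, ONE),     # 4: ACC := ACC - 1
    assemble(STO, COUNT),   # 5: count := ACC
    assemble(JNE, 0),       # 6: repeat while count != 0
    assemble(STP),          # 7: stop
    0, 5, 1,                # 8: total, 9: count, 10: constant one
]
run(memory)
print(memory[TOTAL])        # 15
```

The loop body shows why data movement dominates dynamic instruction usage on such a machine: four of the seven instructions per iteration are loads and stores around a single accumulator.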
[Figure: datapath with PC, IR, ALU and ACC connected to memory through an address bus, a data bus and control lines]

SOC DESIGN
HARDWARE DESIGN
FIRMWARE DESIGN

A 3-Stage CPU
A Simple 3-Stage "ARM" CPU
Three Stage CPU:
1. Fetch - fetch an instr. from mem
2. Decode - interpret the instr. properly
3. Execute - execute the instr.
Instruction Memory:
mov r1, #4
mov r2, #3
add r3, r1, r2
[Figure: stages Fetch, Decode, Execute; registers r1, r2, r3; ALU operations +, -, x, ÷]

A 3-Stage CPU
A Simple 3-Stage "ARM" CPU
Program instructions are loaded into the instruction memory before program execution.
Instruction Memory:
mov r1, #4
mov r2, #3
add r3, r1, r2

A 3-Stage CPU
A Simple 3-Stage "ARM" CPU
The program counter points to the first instruction.
mov r1, #4: Set R1 to 4
Registers: r1 = 4

A 3-Stage CPU
A Simple 3-Stage "ARM" CPU
Similar to the first instruction: write 3 to r2.
mov r2, #3: Set R2 to 3
Registers: r1 = 4, r2 = 3

Processor Micro-Architecture
A Simple 3-Stage "ARM" CPU
Write the result to r3.
Instruction Memory:
mov r1, #4
mov r2, #3
add r3, r1, r2
R3
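
The walkthrough above (fetch, decode, execute on the three-instruction program) can be sketched as a few lines of code. This is a simplified illustration, not the lecture's design: instructions are kept as text rather than machine words, only `mov` and `add` are handled, the stages run one after another instead of overlapping as in a real pipeline, and all function names are invented for the example.

```python
def fetch(instruction_memory, pc):
    """Stage 1: read the instruction the program counter points to."""
    return instruction_memory[pc]


def decode(instruction_text):
    """Stage 2: split the text into a mnemonic and a list of operands."""
    mnemonic, operand_text = instruction_text.split(maxsplit=1)
    operands = [operand.strip() for operand in operand_text.split(",")]
    return mnemonic.lower(), operands


def read_operand(registers, operand):
    """Return an immediate (#n) or the current value of a register."""
    if operand.startswith("#"):
        return int(operand[1:])
    return registers[operand]


def execute(registers, mnemonic, operands):
    """Stage 3: carry out the operation and write the destination register."""
    destination = operands[0]
    if mnemonic == "mov":
        registers[destination] = read_operand(registers, operands[1])
    elif mnemonic == "add":
        registers[destination] = (read_operand(registers, operands[1])
                                  + read_operand(registers, operands[2]))
    else:
        raise ValueError(f"unsupported instruction: {mnemonic}")


def run(instruction_memory):
    registers = {"r1": 0, "r2": 0, "r3": 0}
    pc = 0                                   # points to the first instruction
    while pc < len(instruction_memory):
        instruction_text = fetch(instruction_memory, pc)
        mnemonic, operands = decode(instruction_text)
        execute(registers, mnemonic, operands)
        pc += 1                              # advance to the next instruction
    return registers


program = ["mov r1, #4", "mov r2, #3", "add r3, r1, r2"]
final_registers = run(program)
print(final_registers)   # {'r1': 4, 'r2': 3, 'r3': 7}
```

After the first two instructions the register contents match the values shown on the slides (r1 = 4, r2 = 3); the third instruction reads both and writes their sum to r3.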