Summary

This document contains course materials related to microprocessors, microcontrollers, and Pentium architecture, likely for an undergraduate-level computer engineering course at MIT-WPU. It includes unit information, syllabus content, learning resources, and course objectives.

Full Transcript

Microprocessor, Microcontroller and Applications
Course Code: CSE2PM02A
Credits: 3 TH
Course Structure - SY
8/28/2024
Unit 1 Microprocessor, Microcontroller Applications_ CSE2PM02A 2024-25 Sem 3

Syllabus
Unit 1: Pentium Architecture: Evolution of Microprocessors: 8086 to Pentium, Pentium features, Pentium superscalar architecture (Pipelining, Branch prediction, Instruction and Data caches). The Floating-Point Unit: features, pipeline stages and data types, Pentium programmer model, Register set, System registers, addressing modes, Instruction set.

Learning Resources
Text Books:
1. James Antonakos, “The Pentium Microprocessor”, 2004, Pearson Education, ISBN 81-7808-545
2. Kenneth J. Ayala, “The 8051 Microcontroller: Architecture, Programming, and Applications”, ISBN 0-314-77278-2 (soft)
Reference Books:
1. Mazidi Ali Muhammad, Mazidi Gillispie Janice, and McKinlay Rolin D., “The 8051 Microcontroller and Embedded Systems using Assembly and C”, Pearson, 2nd Edition, 2006
2. Barry B. Brey, “The Intel Microprocessors, 8086/8088, 80186/80188, 80286, 80386, 80486, Pentium, PentiumPro Processor, PentiumII, PentiumIII, Pentium IV, Architecture, Programming & Interfacing”, Eighth Edition, Pearson Prentice Hall, 2009
MOOCs:
https://nptel.ac.in/courses/108/105/108105102/
Web Resources:
1. https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.pdf
2. https://www.cs.cmu.edu/~410/doc/intel-isr.pdf

Course Objectives
By participating in and understanding all facets of this course, a student will be able:
1. To learn the architecture and programming of the Pentium microprocessor.
2. To understand the operating modes and memory management mechanism of the Pentium processor.
3. To gain insight into the protection and multitasking environment of the Pentium processor.
4. To understand the internal architecture of 8051 microcontrollers.
5. To learn and implement the architectural features of the 8051 microcontroller.

Course Outcome
After completion of the course the students will be able to:
1. Describe the Pentium features and system architecture thoroughly and develop 80x86 assembly language programs for various applications.
2. Illustrate the working of the memory management unit in protected mode of the Pentium.
3. Interpret the mechanism of protection and task management in the Pentium.
4. Demonstrate knowledge of the 8051 microcontroller instruction set and addressing modes.
5. Interpret the features of the 8051 microcontroller for various applications.
Unit 1: Pentium Architecture
❑ Evolution of Microprocessors: 8086 to Pentium
❑ Pentium features
❑ Pentium superscalar architecture: Pipelining
❑ Branch prediction
❑ Instruction and Data caches
❑ The Floating-Point Unit: features
❑ Pipeline stages and data types
❑ Pentium programmer model
❑ Register set, System registers, addressing modes, Instruction set
⮚ https://www.javatpoint.com/microprocessor-vs-microcontroller
⮚ https://www.javatpoint.com/8086-microprocessor

Microprocessor vs Microcontroller
Analogy (figure): a microprocessor is like a Swiss knife, a microcontroller is like a plain knife.

Evolution of Microprocessors: 8086 to Pentium
A microprocessor is a semiconductor device consisting of electronic logic circuits manufactured by techniques such as Large Scale Integration (LSI) or Very Large Scale Integration (VLSI). It is capable of performing computing functions and making decisions to change the sequence of program execution.
The transistor was first demonstrated at Bell Labs on 23 December 1947 and announced in 1948.
The integrated circuit (IC) was invented in 1958 by Jack Kilby at Texas Instruments; Robert Noyce at Fairchild Semiconductor independently developed one shortly afterwards.
The first microprocessor was introduced by INTEL (Integrated Electronics).

Evolution of Microprocessors: 8086 to Pentium
4-bit Microprocessors
The first microprocessor was introduced in 1971 by Intel Corp. It was named the Intel 4004 and was a 4-bit processor on a single chip. It could perform simple arithmetic and logic operations such as addition, subtraction, Boolean AND and Boolean OR.
8-bit Microprocessors
The first 8-bit microprocessor, which could perform arithmetic and logic operations on 8-bit words, was introduced in 1972, again by Intel.
This was the Intel 8008, which was later followed by an improved version, the Intel 8080.

Evolution of Microprocessors: 8086 to Pentium
16-bit Microprocessors
The 8-bit processors were followed by 16-bit processors such as the Intel 8086 and 80286.
32-bit Microprocessors
32-bit microprocessors were introduced by several companies, but the most popular one is the Intel 80386.
Pentium Series
Instead of the 80586, Intel came out with a new processor named the Pentium. Its performance is closer to RISC performance. The Pentium was followed by the Pentium Pro CPU. The Pentium Pro allows multiple CPUs in a single system in order to achieve multiprocessing.

Features of 8086
The most prominent features of the 8086 microprocessor are as follows.
It has a 20-bit address bus and can access up to 2^20 memory locations (1 MB).
It supports up to 64K (65,536) I/O ports.
Word size is 16 bits (data bus).
It comes in a 40-pin dual in-line package.
Addresses range from 00000H to FFFFFH.
Memory is byte addressable: every byte has a separate address.
It has an instruction queue capable of storing six instruction bytes from memory, resulting in faster processing.
It was Intel's first 16-bit processor, with a 16-bit ALU, 16-bit registers, a 16-bit internal data bus and a 16-bit external data bus, resulting in faster processing.
It uses two stages of pipelining, i.e. Fetch Stage and Execute Stage, which improves performance.

Extra Information (figures)

Pentium Processor
Typical specifications (only for information)
https://cpu-z.en.uptodown.com/windows/download
https://www.youtube.com/watch?v=W9Pq32f2rsQ
Double Data Rate (DDR)

Components of Pentium Architecture (figure)

Pentium Processor Features
The features of the Pentium processor are:
32-bit registers to hold data.
32-bit address bus and 64-bit data bus with burst capability. (Burstable: having the ability to exceed the normal maximum bandwidth for short periods.)
Use/role of the above feature in Pentium performance:
⮚ The address bus decides the memory addressing capacity of the processor.
⮚ There are 32 address lines, so the memory that can be addressed = 2^32 bytes = 4 GB.
⮚ The data bus governs the data handling/processing capacity, i.e. the maximum size of data that can be transferred in one operation.
The data bus and address bus should be as wide as possible to improve system performance.

Pentium Processor Features
The features of the Pentium processor are:
Superscalar: 2 execution pipelines, U and V
Use/role of the above feature in Pentium performance:
⮚ An architecture that can issue and execute more than one instruction per clock cycle through multiple parallel pipelines is called a superscalar architecture.
⮚ Pipelining implements parallel/overlapped operations in the system and uses all the resources (buses, ALU, decoding unit etc.) to the optimum level.
⮚ System throughput is improved. In the ideal case, throughput is proportional to clock frequency. (Throughput: how much work, e.g. data transferred from one location to another, is completed in a given amount of time.)
Ex. Pipelined vs non-pipelined operation:
https://www.youtube.com/watch?v=AsthZgIS2Lw&list=PLm_MSClsnwm8Dw5nmh8A5E7gO06j1TeIa

Pipeline speed-up
Speed up = [nkt] / [kt + (n-1)t] = nk / [k + n - 1] = k / [1 + (k-1)/n]
Speed up ~ k (approx.) when n >> k
n = No. of operations
k = No. of pipeline stages
t = Time required at each pipeline stage
For example, with k = 5 and n = 1000, speed up = 5000 / 1004, which is about 4.98.

Pentium Processor Features
The features of the Pentium processor are:
Separate code and data caches.
8 KB 2-way set associative code cache + TLB (Translation Lookaside Buffer)
8 KB 2-way, dual access data cache + TLB
Use/role of the above feature in Pentium performance:
⮚ Separate code and data caches balance the load on the buses, so operations are faster.
⮚ Memory management is improved by the insertion of a TLB (Translation Lookaside Buffer) between the processor and the cache.
⮚ Cache organizing techniques: (i) Direct Mapping (ii) Fully Associative (iii) Set Associative

Only for understanding (figures)

Pentium Processor Features
⮚ 2 prefetch buffers
⮚ BTB (Branch Target Buffer) to support branch prediction logic.
Use/role of the above feature in Pentium performance:
Dynamic Branch Prediction:
⮚ Branch prediction logic is implemented using the prefetch buffers and the Branch Target Buffer (BTB).
⮚ Branch prediction logic reduces the branch penalty (the number of cycles required to flush the pipeline and reload it with the target instructions).
⮚ Branch prediction can be static or dynamic in nature.
⮚ Detailed explanation in a later subsection of the unit.

Pentium Features
⮚ Pipelined Floating-Point Unit (FPU)
⮚ Improved instruction execution time
Use/role of the above feature in Pentium performance:
⮚ A separate pipeline processes floating-point numbers.
⮚ Availability of the U and V pipes improves system performance by reducing instruction execution time.

Pentium Features
⮚ Write-back data cache using the MESI (Modified/Exclusive/Shared/Invalid) protocol. The 4 states are encoded in 2 bits: 00, 01, 10, 11.
Use/role of the above feature in Pentium performance:
Cache update policies:
1. Write Back: writing is done only to the cache; the modified line is copied to the backing store later, when it is replaced.
2. Write Through: the write is done synchronously both to the cache and to the backing store.
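
The difference between the two update policies can be made concrete by counting how many writes actually reach the backing store. The following is only a minimal sketch, not the Pentium's real cache logic: the class name `TinyCache`, the direct-mapped organization, the four-line size and the address trace are all assumptions chosen for illustration, and each "address" stands for a whole cache line.

```python
class TinyCache:
    """Toy direct-mapped cache that only counts writes to backing memory.

    Hypothetical model for illustration; sizes and names are assumptions.
    """

    def __init__(self, num_lines, policy):
        if policy not in ("write-through", "write-back"):
            raise ValueError("unknown policy: " + policy)
        self.num_lines = num_lines
        self.policy = policy
        self.tags = [None] * num_lines    # which line address each slot holds
        self.dirty = [False] * num_lines  # only meaningful for write-back
        self.memory_writes = 0

    def write(self, addr):
        index = addr % self.num_lines
        if self.tags[index] != addr:
            # Miss: the old occupant of this slot must leave first.
            self._evict(index)
            self.tags[index] = addr
        if self.policy == "write-through":
            # Cache and backing store are updated together.
            self.memory_writes += 1
        else:
            # Only the cache is updated; memory is updated on eviction.
            self.dirty[index] = True

    def _evict(self, index):
        if self.dirty[index]:
            self.memory_writes += 1  # write the modified line back
            self.dirty[index] = False

    def flush(self):
        for index in range(self.num_lines):
            self._evict(index)


def count_memory_writes(policy, trace, num_lines=4):
    cache = TinyCache(num_lines, policy)
    for addr in trace:
        cache.write(addr)
    cache.flush()  # account for lines still modified at the end
    return cache.memory_writes


# Ten writes to line 0, then one write to line 4 (maps to the same slot).
trace = [0] * 10 + [4]
print(count_memory_writes("write-through", trace))  # 11
print(count_memory_writes("write-back", trace))     # 2
```

Under these assumptions, repeated writes to the same line cost one memory write each with write-through, but only one write in total with write-back, which is why write-back reduces bus traffic.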

Only for understanding (figures)

Pentium Features
System Management Mode:
⮚ Provides power management mechanisms.
Execution Tracing:
⮚ The Pentium 4's execution trace cache stores the micro-operations that result from decoding x86 instructions, so it also provides the functionality of a micro-operation cache. (Execution of an instruction, i.e. the instruction cycle, consists of smaller units such as fetch, indirect, execute and interrupt. Each of these consists of smaller steps called micro-operations.)
⮚ With this, the next time an instruction is needed, it does not have to be decoded into micro-ops again.

Only for understanding (figure)

Pentium Features
Address Parity (parity: from the Latin paritas, meaning equal or equivalent)
AP: This signal can also be used along with APCHK# and EADS# for snooping cycles.
APCHK# (output): When the Pentium processor detects an address parity error on an inquire or snooping cycle, the Address Parity Check (APCHK#) status signal is asserted 2 clock cycles after EADS#.

Pentium Features
Virtual Mode Extension:
⮚ The virtual mode extensions are very useful to memory managers and multitasking operating systems.
⮚ Memory managers benefit primarily from the interrupt redirection bit map, which reduces the number of switches to and from protected mode.
https://www.youtube.com/watch?v=o2_iCzS9-ZQ (only for understanding)

Pentium Features
Internal Parity Checking:
⮚ Parity checking for data integrity is done very rigorously in the Pentium.
⮚ The Pentium checks parity on both the external processor pins and the internal data structures.
⮚ Cache, buffers, and microcode ROM are all parity checked.
⮚ Additionally, the Pentium supports Functional Redundancy Checking (FRC), in which a second Pentium runs as a checker and compares its results with those of the first.

Pentium Features
Bus cycle pipeline:
⮚ The Pentium has a “superscalar pipelined architecture.”
⮚ Superscalar means that the CPU can execute two (or more) instructions per cycle.
⮚ To be more precise: the Pentium can generate the results of two instructions in a single clock cycle.
⮚ The 80486 and the Pentium have five-stage pipelines.

Pentium Superscalar Architecture: Details (figure)

Pentium Architecture: Detailed view (figure)

On Chip Caches
Two caches are integrated on chip: an 8 KB data cache and an 8 KB code cache.
The data cache supports the MESI (Modified/Exclusive/Shared/Invalid) write-back cache consistency protocol.
The data cache is configurable as Write Through or Write Back on a line-by-line basis.
The code cache is write protected and supports only the Shared and Invalid states of the MESI protocol.
Each cache is organized as a 2-way set associative cache.
Replacement in both the data and code caches is handled by the LRU (Least Recently Used) mechanism.
The data cache can be accessed simultaneously from both pipes, as long as the references are to different cache banks.

On Chip Caches
The data cache supports the MESI (Modified/Exclusive/Shared/Invalid) write-back cache consistency protocol, which requires 2 state bits.
The code cache supports the S and I states only and therefore requires only one state bit.

TLB - Translation Look Aside Buffer
⮚ Each cache is accessed with physical addresses and has its own TLB.
⮚ The TLB translates a linear address to a physical address, as required for the paging technique.
⮚ Number of TLB entries = 256

TLB - Translation Look Aside Buffer (to be discussed in Unit 4)
A TLB is organized as a fully associative cache and typically holds 16 to 512 entries.
Each TLB entry holds a virtual page number and its corresponding physical page number.
In general, the processor can keep the last several page table entries in a small cache called a translation lookaside buffer (TLB).
The processor “looks aside” to find the translation in the TLB before having to access the page table in physical memory.
In real programs, the vast majority of accesses hit in the TLB, avoiding the time-consuming page table reads from physical memory.

Pre-Fetch Buffer
⮚ To support branch prediction, the Pentium implements two pre-fetch buffers.
⮚ One pre-fetches code in a linear fashion.
⮚ The other pre-fetches code according to the Branch Target Buffer (BTB), so the needed code is almost always pre-fetched before it is needed for execution.
⮚ Two independent pairs of line-size (32-byte) pre-fetch buffers operate in conjunction with the branch target buffer.
⮚ Only one pre-fetch buffer actively requests pre-fetches at any given time.
⮚ Pre-fetches are requested sequentially until a branch instruction is fetched.

Role of Pre-fetch Buffer in Branch Prediction
⮚ When a branch instruction is fetched, the branch target buffer (BTB) predicts whether the branch will be taken or not.
⮚ If predicted not taken, pre-fetch requests continue linearly.
⮚ On a predicted taken branch, the other pre-fetch buffer is enabled and begins to pre-fetch as though the branch was taken.
⮚ If a branch is discovered to be mispredicted, the instruction pipelines are flushed and prefetching activity starts over.
Example code:
      I0  MOV SI, addr
Bk:   I1  INC SI
      I2  DEC CX
      I3  JNZ BK
      I4  ADD AX,BX
      I5  INC AX
      I6  MOV [SI], AX
Branching instruction: JNZ
Execution sequence: I0, I1, I2 and I3
Decision at I3:
If the condition is true (i.e. ZF = 0), then after I3, I1 and I2 will be executed. So the pipeline must have instructions prefetched as I0 I1 I2 I3 I1 I2 I3 I1 I2 I3 and so on, instead of I0 I1 I2 I3 I4 I5 I6.
If the condition is false (i.e. ZF = 1), then after I3, I4, I5 and I6 will be executed. So the pipeline must have instructions prefetched as I0 I1 I2 I3 I4 I5 I6.

Control Unit
⮚ It interprets the instruction word and the microcode entry point fed to it by the Instruction Decode Unit.
⮚ It handles exceptions, breakpoints and interrupts.
⮚ It controls the integer pipelines and floating-point sequences.
⮚ Microcode ROM: stores microcode sequences.
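
Returning to the JNZ loop from the pre-fetch example, the order in which the instructions I0 to I6 really execute can be generated with a few lines of code. This is only an illustrative sketch: the function name `executed_sequence` is hypothetical, CX is assumed to start at 1 or more, and only the effect of DEC CX on the zero flag is modelled.

```python
def executed_sequence(cx):
    """Return the labels executed by the I0..I6 loop for an initial CX >= 1."""
    if cx < 1:
        raise ValueError("this sketch assumes CX starts at 1 or more")
    trace = ["I0"]                    # MOV SI, addr runs once
    while True:
        trace += ["I1", "I2", "I3"]   # INC SI, DEC CX, JNZ BK
        cx -= 1                       # effect of DEC CX
        if cx == 0:                   # ZF = 1, so JNZ falls through
            break
    trace += ["I4", "I5", "I6"]       # code after the loop
    return trace


print(" ".join(executed_sequence(3)))
# I0 I1 I2 I3 I1 I2 I3 I1 I2 I3 I4 I5 I6
```

The output shows why linear prefetching alone is not enough: after every I3 except the last one, the next instruction needed is I1, not I4.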

Pentium Superscalar Architecture with details of Pipeline Stages
https://www.youtube.com/watch?v=x2yo2aDMHfA

Pentium Superscalar Architecture
The Pentium can execute two integer instructions simultaneously, or two floating-point instructions when the second one is FXCH.
This parallel execution is done through two instruction pipelines, the “u” pipe and the “v” pipe.
The u-pipe can execute all integer and floating-point instructions.
The v-pipe can execute simple integer instructions and the FXCH floating-point instruction.
Processors capable of executing multiple instructions in parallel are known as superscalar machines.

U and V Pipelines
U pipeline: capable of handling the full instruction set. The u-pipe can execute all integer and floating-point instructions.
V pipeline: only simple instructions. The v-pipe can execute simple integer instructions and the FXCH floating-point instruction.
Simple instructions are hardwired, do not need microcode, and execute in one clock cycle. A few are listed below:
mov reg, reg/mem/imm
mov mem, reg/imm
inc reg/mem
dec reg/mem

Pipelining
The Pentium processor integer pipeline has 5 stages:
1. Prefetch (PF)
2. Decode1 (D1)
3. Decode2 (D2)
4. Execute (EX)
5. Write Back (WB)

Pipelining stages: Details
Stage 1: Prefetch (PF)
Instructions are prefetched from the on-chip instruction cache or memory.
Example instruction: ADD AL,BL

Details of Prefetch Pipeline stages of Pentium
The first stage of the pipeline is the Prefetch (PF) stage, in which instructions are prefetched from the on-chip instruction cache or memory.
⮚ Because the Pentium processor has separate caches for instructions and data, prefetches do not conflict with data references for access to the cache.
⮚ In the PF stage, two independent pairs of 32-byte prefetch buffers operate in conjunction with the branch target buffer.
⮚ This allows one prefetch buffer to prefetch instructions sequentially, while the other prefetches according to the branch target buffer predictions.

Prefetch stage
⮚ There are two prefetch buffers, 32 bytes each.
⮚ At any time only one prefetch buffer is connected to the u and v pipes.
⮚ The buffer connected to the pipes fetches instructions from the code cache sequentially.
⮚ The other prefetch buffer fetches the instructions as directed by the BTB.

Pipelining stages: Details
Stage 2: Decode1 (D1)
The decoders decode instructions and decide whether they can be paired (control word generator).
Two parallel decoders attempt to decode and issue the next two sequential instructions.
The decoders determine whether one or two instructions can be issued (the instruction pairing rules).

Pipelining stages: Details
Stage 3: Decode2 (D2)
The D1 stage is followed by Decode2 (D2).
Addresses of memory-resident operands are calculated (decode the control word and generate the memory address).
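
The address calculation done in D2 can be illustrated with the general x86 effective-address formula: offset = base + index × scale + displacement, truncated to the address size. The helper below is a hypothetical sketch (the name `effective_address` and the sample register values are assumptions); it computes only the offset within a segment, not the final linear or physical address.

```python
def effective_address(base=0, index=0, scale=1, disp=0, bits=32):
    """Offset of a memory operand: base + index*scale + disp, wrapped to `bits`."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    if bits not in (16, 32):
        raise ValueError("address size must be 16 or 32 bits")
    return (base + index * scale + disp) & ((1 << bits) - 1)


# 32-bit form, e.g. an operand written as [EBX + ESI*4 + 20h]
print(hex(effective_address(base=0x1000, index=0x10, scale=4, disp=0x20)))  # 0x1060

# 16-bit form, e.g. buffer[bx+si]; the sum wraps around at 64 KB
print(hex(effective_address(base=0xFFF0, index=0x20, disp=0, bits=16)))     # 0x10
```

An operand with both a base and an index needs more work in this stage than a register operand, which is one reason memory operands with complex addressing can hold up the pipeline.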

Pipelining stages: Details
Stage 4: Execute (EX)
Execute the instructions, with data cache and ALU access.

Pipelining stages: Details
Stage 5: Write Back (WB)
Modify the processor state and complete execution.

Instruction flow in Pentium pipeline (figure)

Pipeline performance
Maximum throughput is obtained from a pipeline when:
⮚ every instruction takes exactly one clock cycle to execute
⮚ there are no or minimum stalls in the pipeline
The number of clock cycles = m + (n-1)
where m = number of stages in the pipeline
n = number of instructions in the program
Example: with m = 3, a sequence of 1000 instructions will require 3000 (= 3 * 1000) clock cycles on a non-pipelined machine and only 1002 (= 3 + 1000 - 1) cycles when pipelined.

Pipeline stall at decode and execute stage of pipeline

Execution unit creating a stall / bubble
I1, I3, I4: simple instructions, 1 CLK cycle each
I2: multiplication instruction, 4 CLK cycles
(Figure: I1 to I5 moving through Prefetch Buffer, Decode1, Decode2, Execution and Write Back, with a STALL)

Decode2 (operand fetch, address generation) creating a stall
I1, I2, I4: simple instructions, 1 CLK cycle each
I3: mov ax, buffer[bx+si], 4 CLK cycles
(Figure: I1 to I5 moving through Prefetch Buffer, Decode1, Decode2, Execution and Write Back, with a STALL)

Decode1 (instruction decoding, instruction pairing) creating a stall
I1, I2, I3: simple instructions, 1 CLK cycle each
I4: xchg, 4 CLK cycles
(Figure: I1 to I5 moving through Prefetch Buffer, Decode1, Decode2, Execution and Write Back, with a STALL)

Instruction Pairing Rules

Instruction pairing restrictions / rules for parallel execution
1. Both must be simple instructions.
2. No data dependencies may exist between them.
   Read after Write hazard / flow dependence. Ex.:
      MOV AX, BX
      MOV DX, AX
   Write after Write hazard / output dependence. Ex.:
      MOV AX, BX
      MOV AX, CX
3. For pairing of floating-point instructions: the first instruction of the pair can be FLD, FLD st(i), FADD, FMUL, FDIV, FCOM, FTST (test floating point: compare with 0.0), FABS (absolute value) or FCHS (change sign: complements the sign of ST(0), changing a positive value into a negative value of equal magnitude or vice versa). The second instruction must be FXCH.
https://docs.oracle.com/cd/E18752_01/html/817-5477/eoizy.html
4. Neither instruction may contain both immediate data and a displacement value (e.g. MOV TABLE[SI],7).
5. Prefixed instructions may execute only in the U pipeline (e.g. MOV ES:[DI],AL).

Branch Prediction
https://www.youtube.com/watch?v=rJAEGbpRrL4
https://www.youtube.com/watch?v=yk-U6qqeGE0

Branch Prediction
⮚ The performance gain through pipelining can be reduced by the presence of program transfer instructions (such as JMP, CALL, RET and conditional jumps).
⮚ They change the execution sequence, making all the instructions that entered the pipeline after the program transfer instruction invalid.
      I0  MOV SI, addr
Bk:   I1  INC SI
      I2  DEC CX
      I3  JNZ BK
      I4  ADD AX,BX
      I5  INC AX
      I6  MOV [SI], AX
⮚ Suppose instruction I3 is a conditional jump to I1 at some other address (the target address). Then the instructions that entered after I3 are invalid, and a new sequence beginning with I1 needs to be loaded.
⮚ This causes bubbles in the pipeline, where no work is done while the pipeline stages are reloaded.

Branch Prediction Continued…
⮚ To avoid this problem, the Pentium uses a scheme called Dynamic Branch Prediction.
⮚ In this scheme, a prediction is made with reference to the branch instruction currently in the pipeline.
⮚ The prediction will be either taken or not taken.
⮚ If the prediction turns out to be true, the pipeline will not be flushed and no clock cycles will be lost.
⮚ If the prediction turns out to be false, the pipeline is flushed and started over with the correct instruction.
⮚ The algorithm implemented in the Pentium for reducing pipeline stalls during execution of jump instructions contains a branch prediction state machine with four states (figure).

Branch Prediction Algorithm
(Figure: first and second occurrence of the “JNZ” instruction in the loop I0 to I6)

Branch Prediction
⮚ The Pentium processor uses a Branch Target Buffer (BTB) to predict the outcome of branch instructions, which minimizes pipeline stalls due to pre-fetch delays.
⮚ The algorithm implemented in the Pentium for reducing pipeline stalls during execution of jump instructions contains a branch prediction state machine with four states:
(1) strongly not taken - 00
(2) weakly not taken - 01
(3) weakly taken - 10
(4) strongly taken - 11
(Figure: branch prediction state machine with four states)

Branch Prediction Algorithm
Two 32-byte prefetch buffers work with the BTB. One fetches instructions from the current program address, and the other buffer is activated when the BTB predicts “Taken”.
Initially the history bits are 11, when a new target address is placed in the BTB.
Whenever the branch instruction is encountered, these bits are updated.
If the branch is repeatedly not taken, the history bits become 00.
The BTB is accessed with the address of the instruction during the D1 stage. If it is found and the BTB's prediction is “Taken”, the second prefetch buffer becomes active.
If a new branch instruction is encountered (no target address in the BTB), the prediction is “Not taken”.

Branch Prediction Algorithm (Extra Information) (figure)

Example of branch prediction and subsequent explanation
      I0  MOV SI, addr
Bk:   I1  INC SI
      I2  DEC CX
      I3  JNZ BK
      I4  ADD AX,BX
      I5  INC AX
      I6  MOV [SI], AX
In this example, instruction I3 is a branching instruction that works on a condition based on the zero flag.
The condition is true if ZF = 0 (not set), and branching will happen: the execution sequence changes from the current location (source location, i.e. I3) to the target location (i.e. I1, or label Bk).
If the condition is false, i.e. ZF = 1, the execution sequence continues from the current location (I3) to the next location (I4) and so on.
When I3 is executed for the first time, there is no history associated with it, so the BTB contents will be either 00 or garbage values, i.e. they carry no meaning.
In the D1 stage of the pipeline, the instruction is decoded and the system knows whether it is a branching instruction or not.
At the EX stage of the pipeline, the system comes to know whether branching will take place or not, so at this stage the BTB contents are updated and used for the next iteration of the loop.

Example: first-time execution of the branching instruction JNZ Bk
⮚ During the D1 stage, the JNZ instruction is not known to the BTB, so it is a miss and the prediction is that NO JUMP WOULD BE TAKEN.
⮚ Accordingly, the subsequent instructions I4, I5 … have entered the pipeline. This is illustrated in the next slide.

Control Hazard
(Figure: U pipe contents at clocks T1, T2 and T5, showing instructions advancing through D1, D2 and EX for the loop I0 to I6)

The EX stage (Execute stage) gives feedback to the BTB: the branch is actually to be taken. The actions taken further are:
1. The BTB is updated.
2. The pipeline is flushed, and instructions I4, I5, I6 and I7, which have entered the pipeline, are cleared.
3. Instructions are fetched from the new address “BK” (I1 and I2) into prefetch buffer 2.
4. Prefetch buffer 2 is connected to the pipeline and behaves as the main buffer (buffer 1) as long as branching takes place.
These actions are illustrated in the next slide.
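
The behaviour described so far (BTB miss predicts not taken, a new entry starts with history bits 11, and the bits are updated at the EX stage) can be tried out with a small simulation of one branch. This is a hedged sketch, not the Pentium's actual circuitry: the function names are hypothetical, a new BTB entry is assumed to be created only when a missing branch turns out to be taken, and states 10 and 11 are assumed to predict "taken".

```python
def simulate_branch(outcomes):
    """Simulate the 2-bit history of ONE branch instruction.

    outcomes: list of bools, True meaning the branch was actually taken.
    Returns (mispredictions, history); history is None if the branch
    never entered the BTB. Assumptions are noted in the comments.
    """
    history = None            # None models a BTB miss (no entry yet)
    mispredictions = 0
    for taken in outcomes:
        # Assumption: 10 (weakly taken) and 11 (strongly taken) predict taken.
        predicted_taken = history is not None and history >= 0b10
        if predicted_taken != taken:
            mispredictions += 1
        # Update happens at the EX stage, once the real outcome is known.
        if history is None:
            if taken:
                history = 0b11                    # new entry: strongly taken
        elif taken:
            history = min(history + 1, 0b11)      # saturate at 11
        else:
            history = max(history - 1, 0b00)      # saturate at 00
    return mispredictions, history


def loop_outcomes(cx):
    """Outcomes of JNZ in the I0..I6 loop: taken CX-1 times, then not taken."""
    return [True] * (cx - 1) + [False]


misses, final_state = simulate_branch(loop_outcomes(5))
print(misses, format(final_state, "02b"))  # 2 10
```

Under these assumptions a loop that runs five times is mispredicted only twice: once on the first occurrence of JNZ (BTB miss) and once on the final iteration, when the branch falls through and the history drops from 11 to 10.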

First occurrence of JNZ
(Figure: the pipeline is flushed, the correct instructions are brought from the code cache into prefetch buffer 2 using the IP, and buffer 2 is connected to the pipeline in place of prefetch buffer 1)

Second occurrence of “JNZ” instruction
⮚ BTB hit. Prediction: branch strongly taken when it reaches the Execute stage.
⮚ The BTB instructs the prefetch stage to fetch instructions from address “BK”.
⮚ Buffer 1: prefetches instructions from the target, i.e. I1 and I2.
⮚ Buffer 2: prefetches instructions I4, I5 and so on.
⮚ During the last iteration, when ZF = 1, branching will not happen, and the new instructions will be readily available in one of the buffers, so the correct buffer will be attached to the U and V pipes.

BTB Example
Consider the following loop for computing prime numbers: for(k=i+prime;k
