Part 1 - 2.pdf | Quizgecko

COMPUTER ORGANIZATION: 1. PROCESSORS
Instructor: Dr. Abrar Wafa

PARALLEL PROCESSING

What does parallel processing mean? Making the CPU perform more than one task at the same time.

Two types of parallel processing:
1. Instruction Level Parallel Processing (pipelining): making a single CPU perform several instructions at the same time.
   - This builds on the fetch, decode, execute, store cycle we learned about.
2. Processor Level Parallel Processing: using multiple CPUs to solve a single problem.

PARALLEL PROCESSING: ONE PROCESSOR (Instruction Level Parallel Processing)

INSTRUCTION LEVEL PARALLEL PROCESSING

A program is divided into instructions, and each instruction is executed in five main steps:
1. Fetch instruction
2. Decode instruction
3. Fetch operand
4. Execute
5. Store results

- In a normal CPU all these steps are performed one after the other (sequentially) because they are done by a single hardware unit.

PARALLEL EXECUTION WITHIN ONE CPU

1. Sequential execution: no instruction can start executing until the previous instruction is completed.
2. Pipeline execution: provides partial overlapping of the execution of two instructions.
   - This means that the two instructions must be issued sequentially, not in parallel.
3. Superscalar architecture: the use of multiple pipelines, or a single pipeline with multiple execution units, to allow the processing of more than one instruction at a time.

2. PIPELINE EXECUTION

- Instruction pipelining is the process of making the CPU capable of issuing a new instruction before the previous one has completed execution.
  - This means that the CPU can overlap the execution of different instructions at the same time (concurrently).
- If we assign a separate hardware unit to each of the five execution steps and make them all work in parallel, then we have a pipelined processor.
- A pipeline processor consists of a sequence of M data processing circuits called stages.
  - In the figure, M = 5: the pipeline consists of five stages.
- The output of each stage is the input to the next stage in the pipeline.
  - For example, the input to Decode is the output of Fetch, and so on.

In an ideal pipeline:
- Each stage completes its operation in one clock cycle.
- All stages take an equal amount of time to finish their tasks.
- If different stages require different amounts of time, the system clock period must be equal to the time of the stage that takes longest to complete its operation.

Processor examples:
- The Intel 80486 CPU uses a single five-stage pipeline.
- The Pentium 4 uses a single 20-stage pipeline.

Figure: (a) a single five-stage pipeline (5 execution steps); (b) pipeline execution.
- Note that at time T = 5 the pipeline is full (all the stages are working in parallel on different instructions).

PIPELINING DEFINITIONS

- Pipeline setup time: the time needed for the pipeline to become full.
  - Pipeline setup time = # of stages * time for each stage (clock period).
  - Example from the figure above: # of stages = 5 and time for each stage = 1, so pipeline setup time = 5 * 1 = 5.
- Latency (delay) time: the time needed by the CPU to finish executing a single instruction. This time is equal to:
  - If no pipeline is used: Latency time = # of execution steps * clock period.
  - If the CPU is pipelined: Latency time = clock period (after the setup time). Once the pipeline is full, one instruction completes every clock period, even though each individual instruction still passes through all the stages.
- CPU throughput (also called CPU bandwidth or CPU rate): the maximum number of instructions completed in one second.
  - CPU throughput is measured in Millions of Instructions Per Second (MIPS).
  - CPU Throughput = 1/t, where t is the time needed to execute an instruction. With t in seconds this gives instructions per second; dividing by 10^6 gives MIPS.

CONSTRAINTS ON INSTRUCTION LEVEL PARALLEL PROCESSING

In ideal situations (theoretically), the expected increase in CPU performance from using pipelining is proportional to the number of pipeline stages.
- This can happen only if the pipeline operation continues without any interruption throughout program execution.

In practice this is not the case. Certain constraints and limitations prevent the pipeline from achieving the theoretical increase in performance:
1. Data Dependency
2. Resource Conflict

1- DATA DEPENDENCY

Data dependency: when one instruction depends on the result of the previous instruction.

For example, assume that A = 5 and that the following two instructions are in the program to be executed:
- i1: A = A + 3
- i2: B = A * 4

- Note that the data needed for i2 depends on the result of executing i1.
- What is the result obtained when the two instructions are executed using the pipeline?
  - B = 20 (NOT CORRECT)
  - This incorrect result happens because the original value A = 5 is used, not the new value A = 8 produced by i1.
  - Since i1 was in the execute stage while i2 was in the fetch operand stage, these two instructions cannot be executed in parallel.
- What is the correct result?
  - B = 32
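The stale-operand problem in the data dependency example can be reproduced with a small simulation. This is a minimal sketch rather than a model of any real CPU: the function name `run_two_instructions` and the `wait_for_i1` flag are hypothetical, and the register file is just a Python dictionary. It only shows which value of A reaches i2 when i2's operand fetch overlaps i1's execute stage.

```python
def run_two_instructions(a, wait_for_i1):
    """Simulate i1: A = A + 3 followed by i2: B = A * 4.

    wait_for_i1=False models i2 fetching its operand while i1 is still
    in the execute stage; wait_for_i1=True models i2 waiting until i1
    has stored its result. Both names are illustrative assumptions.
    """
    regs = {"A": a, "B": 0}

    # i2's fetch-operand stage overlaps i1's execute stage,
    # so without waiting it sees the old value of A.
    stale_operand = regs["A"]

    # i1 executes and stores its result.
    regs["A"] = regs["A"] + 3

    # i2 executes with either the stale or the updated operand.
    operand = regs["A"] if wait_for_i1 else stale_operand
    regs["B"] = operand * 4
    return regs


overlapped = run_two_instructions(5, wait_for_i1=False)
sequential = run_two_instructions(5, wait_for_i1=True)
print(overlapped)  # {'A': 8, 'B': 20}  (incorrect B)
print(sequential)  # {'A': 8, 'B': 32}  (correct B)
```

Making i2 wait costs pipeline time, which is one reason the theoretical speedup from pipelining is not reached in practice.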
2- RESOURCE CONFLICT

A resource conflict occurs when two or more instructions compete for the same resource at the same time (memories, cache, buses, register file, functional units such as the ALU).
- For example, two programs want to execute (operate on the CPU) at the same time.
- Conflicts must be handled either by hardware or by specialized compilers and operating systems.

3. SUPERSCALAR ARCHITECTURES

A superscalar architecture can be thought of as a form of "internal multiprocessing", since there really are multiple parallel processors inside the CPU.
- Most modern processors are superscalar. They either:
  a. use multiple pipelines, or
  b. use a single pipeline with multiple execution units.

A) MULTIPLE PIPELINES

- If more than one pipeline is used, the execution of two instructions can be fully overlapped and two instructions can be issued in parallel.
- Figure: two five-stage pipelines with a single fetch unit. This model was used in the Intel Pentium processor.

B) SINGLE PIPELINE WITH MULTIPLE EXECUTION UNITS

- Here we have a single pipeline with multiple execution units within the same pipeline.
- For this configuration to work properly and make use of the multiple execution units, Stage 3 (S3) has to be much faster than Stage 4 (S4).
- This type of pipeline was implemented in the Intel Pentium II processor.

PARALLEL PROCESSING: MULTIPLE PROCESSORS (Processor Level Parallel Processing)

PARALLEL PROCESSING - PROCESSOR LEVEL

There are three architectures under this category:
1) Array processors.
2) Multiprocessors.
3) Multicomputers.

1) ARRAY PROCESSOR

- Array processors are special-function computers used to solve problems in engineering and the physical sciences that require real-time processing of large vectors.
- An array processor has a large number of identical processors that perform the same instruction on different operands (data sets).
- A single control unit broadcasts (sends) the operation to be performed by all the processors on the data stored in the processors' local memories.
- For example, to perform an operation on two vectors, we should have as many processors as there are vector elements.

Examples of array processors:
- ILLIAC-IV, which has 64-bit processors.
- CM-2, which can have up to 65536 processors, each dealing with one bit.
- MP-1216, which has a maximum of 16384 processors, each 4-bit.

2) MULTIPROCESSORS

- Multiprocessor: a system of independent CPUs that share a common memory, where each CPU also has its own local memory.
- All CPUs are connected either by a single bus or, in general, by an interconnection network.
- This can lead to bus conflicts (more than one CPU requesting the bus at once) and requires a specialized operating system to coordinate bus usage.
- Figure: shared-memory multiprocessor with local memories.
- CPUs here communicate with each other by reading and writing messages in the shared main memory.

3) MULTICOMPUTERS

- Multicomputer: standalone interconnected computers that appear to the end user as a single system.
- In this type of system, CPUs communicate by passing messages to each other, like email.
- Figure: examples of computer network topologies.
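The message-passing style described for multicomputers can be sketched in a few lines. This is only an illustration under stated assumptions: Python threads stand in for standalone computers, a `queue.Queue` stands in for the network, and the names `node` and `run_cluster` are hypothetical. Each node works only on its own private data and shares its result by sending a message, never by writing to shared memory.

```python
import queue
import threading


def node(node_id, local_data, outbox):
    """A stand-in for one standalone computer.

    It touches only its private local_data and reports its result
    by sending a (node_id, partial_sum) message.
    """
    partial = sum(local_data)
    outbox.put((node_id, partial))


def run_cluster(data, n_nodes=4):
    """Split data across n_nodes, then combine the messages they send back."""
    inbox = queue.Queue()  # the coordinator's mailbox

    # Give each node its own slice of the data (its "local memory").
    chunks = [data[i::n_nodes] for i in range(n_nodes)]

    workers = [
        threading.Thread(target=node, args=(i, chunks[i], inbox))
        for i in range(n_nodes)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Read exactly one message per node; arrival order is not fixed.
    messages = sorted(inbox.get() for _ in range(n_nodes))
    total = sum(partial for _, partial in messages)
    return total, messages


total, messages = run_cluster(list(range(1, 101)))
print(total)  # 5050
```

In a shared-memory multiprocessor the nodes could instead update one common variable, which is where the bus conflicts and coordination described above come in.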
