Pipelining and Parallel Processing PDF
Document Details
York University
2012
Mokhtar Aboelaze
Summary
These lecture notes from York University's CSE4210 course cover pipelining and parallel processing, an important aspect of computer architecture. The notes include definitions, diagrams, and equations to explain these concepts.
Full Transcript
YORK UNIVERSITY CSE4210
Chapter 3: Pipelining and Parallel Processing
CSE4210, Winter 2012
Mokhtar Aboelaze

Pipelining: Introduction
Pipelining can be used to reduce the critical path. That can lead to either a higher clock speed or lower power consumption.
Multiprocessing can also be used to increase speed or reduce power.

Pipelining
[Figure: 3-tap FIR filter. x(n) passes through two delay elements D to give x(n-1) and x(n-2); the three samples are multiplied by a, b, and c, and two adders sum the products to produce y(n).]
The critical path here is two additions and one multiplication:
$T_s = 2T_{add} + T_{mul}$, $f_s = \frac{1}{2T_{add} + T_{mul}}$

[Figure: (a) Sequential datapath: two adders combine x(n), a(n), and b(n) into y(n). (b) Pipelining: a delay element D sits between the two adders; the second adder takes b(n-1) and the output is y(n-1). (c) Parallel processing: two copies of the datapath, one taking x(2k), a(2k), b(2k) and producing y(2k), the other taking x(2k+1), a(2k+1), b(2k+1) and producing y(2k+1).]

Pipelining
Advantages
– Can be used to reduce power and/or to increase the clock rate (speed)
Disadvantages
– Increases the number of delay elements (latches or flip-flops)
– Increases latency

Pipelining
Cutset: a set of edges of a graph such that, if they are removed, the graph becomes partitioned (disconnected).
Feed-forward cutset: a cutset in which the data moves in the forward direction on all of its edges.
We can place a latch on every edge of a feed-forward cutset without affecting the functionality of the graph.

Pipelining
[Figure: the 3-tap FIR filter with extra delay elements D inserted, including one between the two adders, from x(n) to y(n).]

Example
[Figure: graphs with nodes A1 to A6 and delay elements D.]
Critical path?

Data Broadcast Structures
Reversing the direction of all the edges in a given signal-flow graph (SFG) and interchanging the input and output preserves the functionality of the system.

Data Broadcast
[Figure: SFG of the 3-tap FIR filter with two $z^{-1}$ delays and coefficients a, b, c, from x(n) to y(n), and its reversed (data-broadcast) form with x(n) and y(n) interchanged.]

Fine-Grain Pipelining
Multiplication time = 10
Addition time = 2
Critical path = ?
[Figure: data-broadcast FIR filter. x(n) feeds the multipliers c, b, and a; the products are accumulated through two adders separated by delay elements D to produce y(n).]

Fine-Grain Pipelining

Parallel Processing
$y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$
$y(3k) = a\,x(3k) + b\,x(3k-1) + c\,x(3k-2)$
$y(3k+1) = a\,x(3k+1) + b\,x(3k) + c\,x(3k-1)$
$y(3k+2) = a\,x(3k+2) + b\,x(3k+1) + c\,x(3k)$

x(3k) or x(3k-2)?

Complete Parallel System
[Figure: x(n), with sampling period T/4, enters a serial-to-parallel converter (clock period T/4) that produces x(4k), x(4k+1), x(4k+2), and x(4k+3); a MIMO system (clock period T) computes y(4k), y(4k+1), y(4k+2), and y(4k+3); a parallel-to-serial converter produces y(n).]

S/P and P/S Converter
[Figure: serial-to-parallel converter built from a chain of three delay elements D clocked at T/4 and sampled every T to give x(4k+3), x(4k+2), x(4k+1), and x(4k); parallel-to-serial converter that loads four values every T into a chain of three delay elements D clocked at T/4 to produce y(n).]

Parallel Processing
Why use parallel processing, given that it increases the hardware?
– There is a limit to pipelining: you may not be able to pipeline a functional unit beyond a certain point.
– I/O usually imposes a bound on the cycle time (communication bound).

Combining pipelining and parallel processing

Low Power
$P = C_{total} V_o^2 f$
$T_{pd} = \frac{C_{charge} V_o}{k (V_o - V_t)^2}$ (simple approximation for CMOS)
$C_{total}$ is the total capacitance of the circuit and $V_o$ is the supply voltage. $C_{charge}$ is the capacitance to be charged/discharged in a single clock cycle.
Pipelining and parallel processing can be used to minimize power or execution time.

Low Power
What happens in the case of M-level pipelining?
The critical path is reduced by a factor of M, and so is $C_{charge}$.
If we keep the same f, we have more time to charge $C_{charge}$, which means we can reduce the supply voltage to $\beta V_o$ (with $0 < \beta < 1$).
$T_{seq} = \frac{C_{charge} V_o}{k (V_o - V_t)^2} = T_{pip} = \frac{\frac{C_{charge}}{M}\,\beta V_o}{k (\beta V_o - V_t)^2}$, $P = ?$

Low Power: Example
[Figure: data-broadcast FIR filter from x(n) to y(n) with coefficient multipliers c, b, and a (each labeled m1) and two adders (each labeled a1) separated by delay elements D.]

Example

Parallel Processing
What happens in an L-parallel system?
The total capacitance is increased by a factor of L.
For the same performance, the clock period T increases by a factor of L.
There is more time to charge $C_{charge}$, so the supply voltage can be decreased to $\beta V_o$.
$L\,T_{seq} = L\,\frac{C_{charge} V_o}{k (V_o - V_t)^2} = T_{par} = \frac{C_{charge}\,\beta V_o}{k (\beta V_o - V_t)^2}$, $P = ?$

Parallel Processing Example
Consider the 4-tap FIR filter shown in Fig. 3.18(a) and its 2-parallel version in Fig. 3.18(b). The parallel filter has exactly 2 copies of the original filter. The dashed line denotes the critical path. Assume also that $T_M = 8$, $T_A = 1$, $V_t = 0.45$ V, $V_o = 3.3$ V, and $C_M = 8C_A$.
– What is the supply voltage of the 2-parallel filter?
– What is the power consumption of the 2-parallel filter as a percentage of the original filter?

Example

Different Architecture
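
Returning to the Low Power slides: both delay equations, for M-level pipelining and for an L-parallel system, reduce to the same quadratic in $\beta$, namely $m(\beta V_o - V_t)^2 = \beta (V_o - V_t)^2$ with $m = M$ or $m = L$. The Python sketch below solves it under the assumptions of those slide formulas. The names `solve_beta` and `power_ratio` are illustrative and not part of the notes. The parallel case assumes the same $C_{charge}$ as the sequential circuit, as the general formula does; the Fig. 3.18 example may need a different capacitance ratio, because the critical path of the 2-parallel filter is defined by a figure not reproduced here. The power ratio $\beta^2$ follows from $P = C_{total} V_o^2 f$ with unchanged $C_{total}$ and $f$ for pipelining, or with $L\,C_{total}$ and $f/L$ for the parallel system.

```python
import math


def solve_beta(m: float, vo: float, vt: float) -> float:
    """Solve m * (beta*vo - vt)**2 == beta * (vo - vt)**2 for beta.

    m is the pipelining level M or the parallelism level L (assumption:
    the parallel case charges the same C_charge as the sequential one).
    Returns the root with beta * vo > vt, i.e. a supply above threshold.
    """
    if m < 1 or vt < 0 or vo <= vt:
        raise ValueError("expected m >= 1 and vo > vt >= 0")
    # Expand into a*beta**2 + b*beta + c == 0.
    a = m * vo ** 2
    b = -(2 * m * vo * vt + (vo - vt) ** 2)
    c = m * vt ** 2
    disc = b * b - 4 * a * c  # positive whenever vo > vt
    # The product of the roots is (vt/vo)**2, so only the larger root
    # keeps the scaled supply beta*vo above the threshold voltage.
    return (-b + math.sqrt(disc)) / (2 * a)


def power_ratio(beta: float) -> float:
    """Power relative to the sequential circuit under P = C_total*Vo^2*f."""
    return beta ** 2


if __name__ == "__main__":
    vo, vt = 3.3, 0.45  # supply and threshold voltages from the example slide
    for m in (2, 3, 4):
        beta = solve_beta(m, vo, vt)
        print(f"m={m}: beta={beta:.4f}, supply={beta * vo:.3f} V, "
              f"power={power_ratio(beta):.1%} of sequential")
```

With $m = 1$ the function returns $\beta = 1$ (no scaling), which is a quick sanity check on the algebra.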