1- Introduction to Parallel Computing.pdf
Introduction
Chapter 1
Parallel & Distributed Processing - 1502412
Dr. Ali El-Moursy

Agenda
1.1 Motivating Parallelism
1.2 Scope of Parallel Computing
1.3 Organization and Contents of the Text
Basic Computer System Concepts

The Computational Power Argument – from Transistors to FLOPS
Moore's Law: the amount of computing power available at a given cost doubles approximately every 18 months. This empirical relationship has been remarkably resilient over the years, both for microprocessors and for DRAMs. It is possible to fabricate devices with very large transistor counts; how we use these transistors to achieve increasing rates of computation is the key architectural challenge. A logical recourse is to rely on parallelism, both implicit and explicit.

Performance: Processor vs. Memory
The rate of increase in processor speed is much higher than that of memory. This widens the gap between processor and memory speeds, which mandates intermediate stages to bridge it. However, the CPU has not been the focus in the last decade.

The Memory/Disk Speed Argument
While clock rates of high-end processors have increased at roughly 40% per year over the past decade, DRAM access times have improved at only roughly 10% per year over the same interval. This gap between processor and memory speed presents a tremendous performance bottleneck. Parallel platforms typically yield better memory system performance because they provide
– (i) larger aggregate caches, and
– (ii) higher aggregate bandwidth to the memory system (both typically linear in the number of processors).

The Data Communication Argument
As the networking infrastructure evolves, the vision of using the Internet as one large heterogeneous parallel/distributed computing environment has begun to take shape. For large commercial data-mining datasets distributed over a network, it is infeasible to collect the data at a central location, even when the computing power to accomplish the task without parallel computing is available. In these cases, the motivation for parallelism comes not just from the need for computing resources but also from the infeasibility or undesirability of the alternative (centralized) approaches.

Motivating Parallelism
Why is it critical to learn how to build efficient parallel applications? Development of parallel software has traditionally been thought of as time- and effort-intensive. This can largely be attributed to
– the inherent complexity of specifying and coordinating concurrent tasks,
– the lack of portable algorithms, and
– non-standardized environments and software development toolkits.

Motivating Parallelism
Without standards: if it takes two years to develop a parallel application, during which time the underlying hardware and/or software platform has become obsolete, the development effort is clearly wasted. Considerable progress has been made in the standardization of programming environments to ensure a longer life cycle for parallel applications; MPI, sketched below, is one such standard. All of these present compelling arguments in favor of parallel computing platforms.
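As an illustration of such a standardized environment, the following is a minimal sketch of an MPI program in C (this example is not from the slides; it assumes an MPI implementation such as MPICH or Open MPI is installed). Each process learns its rank and the total number of processes, the two facts almost every MPI program starts from.

    #include <stdio.h>
    #include <mpi.h>    /* the standardized Message Passing Interface */

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);                  /* start the MPI runtime     */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's id         */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */

        printf("Hello from process %d of %d\n", rank, size);

        MPI_Finalize();                          /* shut down the MPI runtime */
        return 0;
    }

Because MPI is a standard, the same source compiles and runs unchanged on platforms ranging from Linux clusters to supercomputers, typically built with mpicc and launched with, for example, mpirun -np 4 ./hello.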
Scope of Parallel Computing
Applications in Engineering and Design: high-speed circuits (layouts for delays and capacitive and inductive effects), structures (optimizing structural integrity, design parameters, cost, etc.), and the design of microelectromechanical and nanoelectromechanical systems (MEMS and NEMS).
Scientific Applications: functional and structural characterization of genes and proteins holds the promise of understanding and fundamentally influencing biological processes. Developing new drugs and cures for diseases and medical conditions requires innovative algorithms as well as large-scale computational power, as in applications in bioinformatics.

Scope of Parallel Computing
Commercial Applications: parallel platforms ranging from multiprocessors to Linux clusters are frequently used as web and database servers, for example by large brokerage houses. Wall Street handles hundreds of thousands of simultaneous user sessions and millions of orders, and some of the largest supercomputing networks are housed there. The availability of large-scale transaction data has also sparked considerable interest in data mining and analysis for optimizing business and marketing decisions.

Scope of Parallel Computing
Applications in Computer Systems: in the area of cryptography, some of the most spectacular applications of Internet-based parallel computing have focused on factoring extremely large integers. Embedded systems increasingly rely on distributed control algorithms to accomplish a variety of tasks; a modern automobile contains tens of processors communicating to perform complex tasks that optimize handling and performance. In such systems, traditional parallel and distributed algorithms for leader selection, maximal independent set, etc., are frequently used.

Organization and Contents of the Course
This course provides a comprehensive and self-contained exposition of problem solving using parallel computers. Algorithms and metrics focus on practical and portable models of parallel machines. Principles of algorithm design focus on desirable attributes of parallel algorithms and techniques for achieving them in the context of a large class of applications and architectures. Programming techniques cover standard paradigms such as MPI and POSIX threads that are available across a range of parallel platforms.

Chapters Organization
(diagram of the chapter organization; not transcribed)

Processes versus Processors
Processes are logical computing agents (instances of an executing program) that perform tasks. Processors are the hardware units that physically perform computations. In this course we choose to express parallel algorithms and programs in terms of processes. In most cases, when we refer to processes in the context of a parallel algorithm, there is a one-to-one correspondence between processes and processors, so it is appropriate to assume that there are as many processes as there are physical CPUs on the parallel computer; the sketch below illustrates this assumption.
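The following is a minimal sketch (not from the slides) of the one-process-per-processor assumption, using POSIX threads as the concurrency mechanism; sysconf(_SC_NPROCESSORS_ONLN) is the POSIX call that reports the number of CPUs currently online.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>     /* sysconf */
    #include <pthread.h>    /* POSIX threads */

    /* Each logical computing agent simply reports its id. */
    static void *worker(void *arg)
    {
        long id = (long)arg;
        printf("process %ld running\n", id);
        return NULL;
    }

    int main(void)
    {
        /* One logical process per physical CPU, as assumed in the course. */
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
        pthread_t *tids = malloc(ncpus * sizeof *tids);

        for (long i = 0; i < ncpus; i++)
            pthread_create(&tids[i], NULL, worker, (void *)i);
        for (long i = 0; i < ncpus; i++)
            pthread_join(tids[i], NULL);    /* wait for every agent */

        free(tids);
        return 0;
    }

Compiled with cc -pthread, this creates exactly as many concurrent workers as there are online CPUs, e.g. four on a 4-CPU machine.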
Program & Instruction
What is a Program?
– A sequence of instructions, written by a programmer, that performs a specific task. An example fragment in 8086 (MASM-style) assembly, in which the SQ and SQF procedures and the DTSEG and STSEG segments are defined elsewhere:

    CDSEG  SEGMENT
           ASSUME CS:CDSEG, DS:DTSEG, SS:STSEG
    START: MOV    AX, DTSEG    ; load the data segment address
           MOV    DS, AX       ;   into the DS register
           CALL   SQ           ; near call to procedure SQ
           CALL   CS:SQF       ; segment-qualified call to SQF
           MOV    AH, 4CH      ; DOS terminate-program function
           INT    21H          ; return to DOS
    CDSEG  ENDS
           END    START

Quantitative Principles of Computer Design
The Processor Performance Equation
– Processors are sequential ICs driven by a constant-rate clock; its discrete time events are called ticks, clock ticks, clock periods, clocks, cycles, or clock cycles.
– Designers refer to the clock either by the duration of a cycle (e.g., 1 ns) or by its rate (e.g., 1 GHz).

    CPU time = CPU clock cycles for a program × Clock cycle time
  or
    CPU time = CPU clock cycles for a program / Clock rate

Quantitative Principles of Computer Design (cont.)
The Processor Performance Equation
– Instruction count (IC): the number of instructions the program executes.
– CPI: the average number of clock cycles per instruction.
– Instructions per cycle (IPC), the reciprocal of CPI, is also widely used.
– For CPI, lower is better; for IPC, higher is better.

    CPI = CPU clock cycles for a program / Instruction count
    CPU time = Instruction count × Cycles per instruction × Clock cycle time
  or, keeping the units explicit,
    Seconds/Program = (Instructions/Program) × (Clock cycles/Instruction) × (Seconds/Clock cycle)

Quantitative Principles of Computer Design (cont.)
The Processor Performance Equation
– CPU time is equally dependent on the three components:
  Clock cycle time: hardware technology and organization
  CPI: organization and instruction set architecture
  Instruction count: instruction set architecture and compiler technology

    CPU time = Instruction count × Cycles per instruction × Clock cycle time
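A short worked example of the performance equation (the numbers below are invented for illustration and are not from the slides):

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative values only -- assumed, not measured. */
        double instruction_count = 2e9;    /* 2 billion instructions       */
        double cpi               = 1.5;    /* average clock cycles / instr */
        double clock_rate        = 1e9;    /* 1 GHz, i.e. 1 ns cycle time  */

        double clock_cycles = instruction_count * cpi;     /* 3.0e9 cycles */
        double cpu_time     = clock_cycles / clock_rate;   /* in seconds   */

        printf("CPU clock cycles: %.2e\n", clock_cycles);  /* 3.00e+09 */
        printf("CPU time: %.2f s\n", cpu_time);            /* 3.00 s   */
        printf("IPC: %.3f\n", 1.0 / cpi);                  /* 0.667    */
        return 0;
    }

Halving the CPI (say, through an organizational improvement) or doubling the clock rate would each halve the CPU time, which is why the three factors are said to contribute equally.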