merged_sample.pdf
Uploaded by Ameera
University of Sharjah
University of Sharjah
College of Computing and Informatics
Department of Computer Engineering
1502412 – Parallel & Distributed Processing
Sample Exam 1

Student Name:
Student ID:

THIS IS A CLOSED BOOK EXAM. SHOW ALL STEPS FOR FULL CREDIT.

A   B   C   D   Total
5   5   5   5   20

Parallel & Distributed Processing, Prof. Ali El-Moursy

A. Answer True or False for each of the following statements

1. True or False (5 points, 0.5 point each)

1.1. MPI is used to communicate data between multiple processing nodes in a shared-memory architecture. T F
1.2. Assuming all other factors are fixed, the higher the IPC is, the higher the microprocessor performance. T F
1.3. Aggregate memory bandwidth is the main reason for superscaling in distributed parallel systems. T F
1.4. Even if the parallel system efficiency is one (1), it may not be cost optimal. T F
1.5. The finer the task decomposition granularity is, the higher the parallel algorithm concurrency is. T F
1.6. The maximum level of concurrency using intermediate decomposition is n^2 for multiplying two n×n matrices. T F
1.7. A parallel algorithm is scalable if it has constant efficiency when both the problem size and the processing element count increase. T F
1.8. NUMA indicates accessibility of a remote cache with longer latency. T F
1.9. The overall overhead of a parallel program increases as the processing element count increases. T F
1.10. The task dependency graph (TDG) shows data dependency among the tasks. T F
1.11. Data replication trades off memory space against performance. T F

B. Select ONE answer ONLY for each of the following questions: (5 points)

2. In the ideal case, a processor with 10 pipeline stages has an IPC of (1 point)
   a) one
   b) 1/10
   c) ten
   d) None of the above

3. Comparing level-one cache (L1) to level-two cache (L2), L1 is (1 point)
   a) Faster than L2 cache
   b) Closer to the processor
   c) Larger in capacity
   d) Both a) and b)

4. Process state includes (1 point)
   a) Architecture/logical registers
   b) Program counter
   c) Memory space state
   d) All of the above

5. Assuming an ideal case with no parallel algorithm overhead, which of the following decomposition technique/s may cause a slowdown? (1 point)
   a) Data decomposition
   b) Exploratory decomposition
   c) Recursive decomposition
   d) All of the above

6. Which of the following technique/s is/are used to minimize the concurrent task interaction overhead? (1 point)
   a) Data replication
   b) Optimized collective communication operations
   c) Overlapping computation with interaction
   d) All of the above

C. Answer each of the following questions:

7. For the task graphs given in the figure below, determine the following: (5 points)
   i. Maximum degree of concurrency.
   ii. Critical path length.
   iii. The maximum achievable speedup if the number of processes is 4.
   iv. The parallel algorithm efficiency if the number of processes is 4.
   Note: Assume each task takes one time unit.

   [Figure: task graphs, not reproduced in this transcript]

   Maximum degree of concurrency:
   Critical path:
   Speedup (4 processors):
   Efficiency (4 processors):

D. Answer each of the following questions:

8. For the following database enquiry: (5 points)
   I. Propose a parallel processing algorithm to perform the enquiry, showing the TDG. Assume that the time of each criterion search in the database is directly dependent on the number of records to be searched/generated. Generate the TDG.
   II. Which decomposition did you use? Circle whichever applies (Recursive, Data, or Exploratory).

   MODEL = "Corolla" AND (YEAR = 1999 OR YEAR = 2000) AND (COLOR = "BLACK" OR COLOR = "WHITE")

   ID#  Model    Year  Color  Dealer  Price
   1    Corolla  2002  BLACK  NY      $17,000
   2    Civic    2002  Blue   MN      $18,000
   3    Corolla  2000  White  FL      $21,000
   4    Camry    2001  Green  NY      $21,000
   5    Prius    2001  Green  CA      $18,000
   6    Corolla  1999  BLACK  FL      $23,000
   7    Civic    2001  White  OR      $17,000
   8    Altima   2001  Green  FL      $19,000
   9    Maxima   2001  Blue   NY      $22,000
   10   Corolla  2000  Green  MN      $22,000
   11   Accord   2000  BLACK  VT      $18,000
   12   Civic    2001  Red    CA      $17,000
   13   Corolla  1999  Green  MN      $18,000
   14   Civic    2002  Red    WA      $18,000
   15   Corolla  2000  White  WA      $22,000


1. Answer True or False for each of the following statements (0.5 point each)

Introduction

1.1. Processors usually refer to the program execution instance, while processes refer to the platform. F
1.2. Moore's Law: The amount of computing power available at a given cost doubles approximately every 18 months. T
1.3. Many applications in different scopes are utilizing parallel processing. T
1.4. Assuming all other factors are fixed, the higher the CPI is, the lower the microprocessor performance. T

ILP

1.5. The parallelism among instructions of the same serial program is called ILP. T
1.6. A structural hazard is a problem that pipelining faces when two pipeline stages access the same hardware. T
1.7. Two instructions are said to be parallel when they are dependent. F
1.8. A superscalar processor indicates separating the cache into a data cache and an instruction cache. F
1.9. Microprocessor pipelining is a technique to improve microprocessor performance by processing instructions concurrently in different pipeline stages. T
1.10. We may stall the pipeline if the instructions are dependent. T
1.11. Stalling means executing multiple instructions on different FUs simultaneously. F
1.12. Dynamic instruction scheduling is used in VLIW machines. F
1.13. A superscalar processor utilizes static instruction scheduling to improve processor performance. F

Cache

1.14. Compilers can access and control the data placement in the cache. F
1.15. Spatial locality is the probability of reusing the same data soon (locality in time). F
1.16. A cache miss happens when the CPU finds the data in the cache. F
1.17. Level-2 cache (L2) is smaller in data capacity than level-1 cache (L1). F

Multiprocessing & Multithreading

1.18. Context switching is the swapping out and in of different applications in multi-tasking. T
1.19. All threads of the same program share the same set of architecture (logical) registers. F
1.20. Superscalar uni-processors usually have both vertical and horizontal waste of issue slots. T
1.21. In multi-core processors, the CPU is shared between different threads/applications. F

Parallel Architecture

1.22. Multi-computers are usually connected through a network. T
1.23. In a distributed-memory architecture, the processors access the memory uniformly (UMA). F
1.24. In multiprocessors, the number of processors is usually 4 to 32. T
1.25. The parallel programming paradigm for centralized-memory architecture is MPI. F

Analytical Model & Parallel Algorithms

1.26. Aggregate memory bandwidth is the main reason for superscaling in distributed parallel systems. T
1.27. Even if the parallel system efficiency is one (1), it may not be cost optimal. F
1.28. Critical path length represents the parallel time (Tp). T
1.29. For a parallel algorithm, the maximum degree of concurrency represents the ideal speedup. F
1.30. The finer the task decomposition granularity is, the higher the parallel algorithm concurrency is. T
1.31. The maximum level of concurrency using intermediate decomposition is n^2 for multiplying two n×n matrices. T
1.32. A parallel algorithm is scalable if it has constant efficiency when both the problem size and the processing element count increase. T
1.33. The overall overhead of a parallel program increases as the processing element count increases. T
1.34. The task dependency graph (TDG) shows data dependency among the tasks. F
1.35. Data decomposition usually follows dynamic task generation. F
1.36. Data replication trades off memory space against performance. T
1.37. Computation replication is always useless. F
1.38. Non-blocking communications support overlapping computations with interactions in a shared-address space. F
1.39. The owner-computes rule states that the process assigned a particular data item is responsible for all computation associated with it. T

B. Select ONE answer ONLY for each of the following questions:

ILP

2. Technique/s used to exploit ILP in microprocessors is/are (1 point)
   a) Pipelining
   b) Out-of-order execution
   c) Multiple issue
   d) All of the above

3. VLIW processors schedule the instructions for execution (1 point)
   a) Dynamically
   b) Statically
   c) Both a) and b)
   d) None of the above

4. If the instructions are truly dependent, which of the following do we have to do for the pipeline? (1 point)
   a) Predict the value
   b) Multi-issue the instructions
   c) Stall
   d) None of the above

5. In the ideal case, a processor with 20 pipeline stages has an IPC of (1 point)
   a) 1/20
   b) one
   c) twenty
   d) None of the above

TLP / Multiprocessor

6. Processors that can run multiple threads/applications simultaneously is/are (1 point)
   a) Uni-processor
   b) Multi-threaded processor
   c) Multi-core processor
   d) Both b) and c)

Cache

7. The cache supports (1 point)
   a) Temporal locality
   b) Spatial locality
   c) Both a) and b)
   d) None of the above

8. Registers are faster than (1 point)
   a) Level-one cache (L1)
   b) Level-two cache (L2)
   c) Main memory
   d) All of the above

9. Registers have more capacity than (1 point)
   a) Level-one cache (L1)
   b) Level-two cache (L2)
   c) Main memory
   d) None of the above

10. If the CPU can find the data in the cache, this is called (1 point)
   a) Cache hit
   b) Cache miss
   c) Cache coherency
   d) Cache block

11. Data is moved from the cache to the memory in units of (1 point)
   a) Bit by bit
   b) Byte by byte
   c) One full block
   d) None of the above

Parallel Architecture

12. Parallel processing includes (1 point)
   a) SISD
   b) MISD
   c) MIMD
   d) Both b) and c)

13. A cache coherency protocol can be (1 point)
   a) Cache update
   b) Cache invalidate
   c) None of the above
   d) Both a) and b)

14. In a symmetric shared-memory multiprocessor, the memory access is called (1 point)
   a) UMA
   b) NUMA
   c) Cache coherency
   d) Both b) and c)

Analytical Model & Parallel Algorithms

15. Assuming an ideal case with no parallel algorithm overhead, which of the following decomposition technique/s may cause a slowdown? (1 point)
   a) Data decomposition
   b) Exploratory decomposition
   c) Recursive decomposition
   d) All of the above

16. The overhead that a parallel program incurs due to interaction among its processes depends on (1 point)
   a) The volume of data exchanged during interaction
   b) The frequency of interaction
   c) The spatial and temporal pattern of interactions
   d) All of the above

17. Which of the following technique/s is/are used to minimize the concurrent task interaction overhead? (1 point)
   a) Data replication
   b) Optimized collective communication operations
   c) Overlapping computation with interaction
   d) All of the above
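
The task-graph quantities that appear in the analytical-model items above and in Question 7 (critical path length, parallel time, speedup, efficiency) can be illustrated with a minimal Python sketch. It assumes every task takes one time unit, as in the note to Question 7. The graph `example_deps`, the function names, and the greedy scheduling policy are illustrative assumptions; the figure from Question 7 is not available here, so the graph below is hypothetical, and a greedy schedule is not guaranteed to be optimal for every task graph.

```python
def critical_path_length(deps):
    """Length of the longest dependency chain, counted in unit-time tasks.

    deps maps each task name to the list of tasks it depends on.
    """
    memo = {}

    def depth(task):
        # Depth of a task = 1 + the deepest task it depends on.
        if task not in memo:
            memo[task] = 1 + max((depth(d) for d in deps[task]), default=0)
        return memo[task]

    return max(depth(t) for t in deps)


def greedy_parallel_time(deps, p):
    """Time steps needed when up to p ready tasks run in each step."""
    remaining = {task: set(d) for task, d in deps.items()}
    done = set()
    steps = 0
    while remaining:
        # A task is ready once all of its dependencies have finished.
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle in task graph")
        for task in ready[:p]:
            done.add(task)
            del remaining[task]
        steps += 1
    return steps


# Hypothetical task graph: four independent leaf tasks reduced pairwise.
example_deps = {
    "a": [], "b": [], "c": [], "d": [],
    "e": ["a", "b"],
    "f": ["c", "d"],
    "g": ["e", "f"],
}

p = 4
t_serial = len(example_deps)                      # one time unit per task
t_parallel = greedy_parallel_time(example_deps, p)
speedup = t_serial / t_parallel
efficiency = speedup / p

print(critical_path_length(example_deps), t_parallel,
      round(speedup, 2), round(efficiency, 2))
# prints: 3 3 2.33 0.58
```

For this hypothetical graph, the parallel time with 4 processes equals the critical path length (3), so the speedup is 7/3 and the efficiency is 7/12, even though four tasks can run concurrently in the first step.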