HPC System Administration and Management - PG-DHPCSA Past Paper

Document Details

Uploaded by HealthfulWombat6056

Bharati Vidyapeeth (Deemed to Be University)

2024

ACTS, Pune

Tags

HPC System Administration Cluster Computing Data Center Management High Performance Computing

Summary

This document provides teaching guidelines for a postgraduate program in HPC System Administration and Management, specifically for March 2024. It outlines the course syllabus, with topics like data center design and management, cluster building, HPC systems management and monitoring, and a case study on Param Shavak. The document includes detailed lecture outlines and assignments.

Full Transcript

ACTS, Pune
Suggested Teaching Guidelines for HPC System Administration and Management – PG-DHPCSA March 2024

Duration: 60 classroom hours + 60 lab hours
Objective: To introduce HPC System Administration and Management.
Prerequisites: Knowledge of Computer Networks

Evaluation method:
o CCEE Theory exam – 40% weightage
o Lab exam (Case Study based) – 40% weightage
o Internal exam – 20% weightage

List of Books / Other training material

Text Book:
o High Performance Cluster Computing: Architectures & Systems (Volume-1) by Rajkumar Buyya, Pearson

Reference:
1. An Introduction to Parallel Computing: Design and Analysis of Algorithms (Authors: Vipin Kumar, Ananth Grama, Anshul Gupta, George Karypis)
2. Parallel Programming in C with MPI and OpenMP (Author: Michael J. Quinn)
3. Distributed Computing and Networking: 11th International Conference, ICDCN 2010, Kolkata, India, January 3-6, 2010, Proceedings (Lecture Notes in Computer... Computer Science and General Issues), 1st Edition
4. Distributed Computing (Authors: Seema Shah, Sunita Mahajan), Oxford Publications

Note: Each session listed is a theory session of 2 hours duration. Lab assignments are indicative; faculty should assign additional assignments for better practice.

Data Center: Design & Management (14 Hrs Theory)

Session 1 & 2
Lecture:
o Data center overview
o Design issues

Session 3 & 4
Lecture:
o HVAC
o Power sizing

Session 5
Lecture:
o Data center metrics and best practices
o Security & safety

Session 6 & 7
Lecture:
o Collection, rejection and reuse of heat
o Liquid cooling in data centers
o Energy use systems
o Cabinet & cable management

Assignment:
o Case study on a data center and a data center visit

Ecosystem: Architecture of HPC Cluster (30 Hrs Theory + 44 Hrs Lab)

Session 8 & 9
Lecture:
o Requirement analysis

Session 10 & 11
Lecture:
o Building blocks of HPC

Session 12 & 13
Lecture:
o Hardware and software selection process
o Cluster planning
o Adapting standard Linux for an HPC environment (configuration and feature selection)

Session 14 & 15
Lecture:
o Design of HPC cluster

Session 16 & 17
Lecture:
o Architecture and cluster software

Session 18 & 19
Lecture:
o Cluster building tools

Session 20 & 21
Lecture:
o Multicore architecture
o Pascal
o Accelerator cards
o Configuring & setting up the environment for accelerator cards (CUDA library)

Session 22
Lecture:
o Latest trends and technologies in HPC
o Case study: PARAM Shavak and use cases of PARAM Shavak for HPC solutions

Assignment:
o Write a survey paper on multicore processors and the latest advancements in this area

HPC System Management and Monitoring (16 Hrs Theory + 36 Hrs Lab)

Session 23
Lecture:
o IPMI
o HMC

Session 24 & 25
Lecture:
o User management using LDAP/NIS
o Processor usage, memory usage
o Network monitoring, network usage
o Ganglia, Nagios
o Node resources

Session 26, 27, 28, 29 & 30
Lecture:
o System benchmarking
o Theoretical peak performance
o HPL benchmark, tuning HPL, problem size, block size, process grid PxQ

Assignment:
o Operate, maintain, integrate, upgrade, and manage all HPC resources, including related hardware and software

Assignments – Lab:
o Data center visit
o Building an HPC cluster manually
o Building an HPC cluster using different cluster building and management tools
o Monitoring tools installation & configuration
o Network monitoring using Nagios
o IPMI configuration
o System benchmarking using HPL
o Case study: HPC solution (PARAM Shavak)
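
As an illustration for the benchmarking topics in Sessions 26 to 30 (theoretical peak performance, HPL problem size, block size, process grid PxQ), the sketch below shows how these quantities are commonly estimated before a first HPL run. It is not part of the original guidelines. The function names, the 80% memory fraction, and the sample hardware figures are assumptions chosen for illustration; actual values depend on the cluster and should be tuned experimentally.

```python
import math


def theoretical_peak_gflops(nodes, sockets_per_node, cores_per_socket,
                            clock_ghz, flops_per_cycle):
    """Rpeak in GFLOPS = nodes * sockets * cores * clock (GHz) * FLOPs per cycle per core.

    flops_per_cycle depends on the CPU's vector units and must be taken
    from the processor documentation (the value used below is an assumption).
    """
    return nodes * sockets_per_node * cores_per_socket * clock_ghz * flops_per_cycle


def hpl_problem_size(total_mem_gib, mem_fraction=0.8, block_size=192):
    """Largest N (a multiple of the block size NB) such that an N x N matrix
    of 8-byte doubles fits in mem_fraction of total memory.

    mem_fraction=0.8 is a common rule of thumb, not a fixed requirement.
    """
    usable_bytes = total_mem_gib * 1024**3 * mem_fraction
    n = int(math.sqrt(usable_bytes / 8))
    return (n // block_size) * block_size


def process_grid(num_procs):
    """Most nearly square P x Q grid with P <= Q and P * Q == num_procs."""
    p = math.isqrt(num_procs)
    while num_procs % p != 0:
        p -= 1
    return p, num_procs // p


if __name__ == "__main__":
    # Hypothetical 4-node cluster: 2 sockets x 16 cores at 2.5 GHz,
    # 16 double-precision FLOPs per cycle per core, 64 GiB RAM per node.
    rpeak = theoretical_peak_gflops(4, 2, 16, 2.5, 16)
    n = hpl_problem_size(total_mem_gib=4 * 64)
    p, q = process_grid(4 * 2 * 16)
    print(f"Rpeak = {rpeak} GFLOPS, N = {n}, NB = 192, P x Q = {p} x {q}")
```

The computed N, NB, P and Q are only starting values for the HPL input file. Measured performance (Rmax) divided by Rpeak gives the efficiency figure usually reported alongside HPL results.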
