MapReduce Code Fragment Example CSC-25

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to Lesson

Podcast

Play an AI-generated podcast conversation about this lesson

Questions and Answers

What is the main challenge in building a high-performance system?

  • Building a fast system (correct)
  • Building a power-efficient system
  • Building a cost-effective system
  • Building a fast CPU

What is the term used to measure the energy efficiency of a data center?

  • Cost-Performance Optimization (CPO)
  • Power Utilization Effectiveness (PUE) (correct)
  • Energy Efficiency Ratio (EER)
  • Server Utilization Factor (SUF)

What is the primary goal of warehouse-scale computing?

  • To increase storage capacity
  • To achieve cost-performance optimization (correct)
  • To build a fast CPU
  • To reduce power consumption

What is the term used to describe the delivery of computing resources over the internet?

<p>Cloud Computing (A)</p> Signup and view all the answers

What is the benefit of using a server from a Google WSC?

<p>Regular machine architecture (D)</p> Signup and view all the answers

What is the term used to describe the process of moving data between nodes in a warehouse-scale computer?

<p>Seeks and Scans (A)</p> Signup and view all the answers

What does PB stand for in the context of data storage?

<p>Petabytes (A)</p> Signup and view all the answers

In the provided MapReduce example, what is the purpose of the 'map' function?

<p>To produce a list of all words (A)</p> Signup and view all the answers

What is the primary purpose of the Power Utilization Effectiveness (PUE) metric?

<p>To measure the energy efficiency of a data center (B)</p> Signup and view all the answers

What is the main difference between the local node and array levels in the memory hierarchy?

<p>The network overhead (D)</p> Signup and view all the answers

What is the typical PUE value for a data center?

<p>Around 1.69 (A)</p> Signup and view all the answers

What is the main advantage of using regular machines in Warehouse-scale Computing?

<p>They are more cost-effective (D)</p> Signup and view all the answers

What is the main lesson learned from the seeks and scans example?

<p>Avoid random seeks (D)</p> Signup and view all the answers

What is the main feature of Cloud Computing, as advocated?

<p>It is a public utility (B)</p> Signup and view all the answers

What is the typical storage capacity of a Google WSC server?

<p>256 GB (D)</p> Signup and view all the answers

What is the main goal of cost-performance optimization in Warehouse-scale Computing?

<p>To optimize the cost-performance ratio (C)</p> Signup and view all the answers

What is the term used to describe a large-scale computing system that acts as a single giant machine?

<p>Warehouse-Scale Computing (WSC) (B)</p> Signup and view all the answers

What is the primary goal of Warehouse-Scale Computing?

<p>Providing information technology for the world (A)</p> Signup and view all the answers

What is the term used to describe the ability of a system to operate continuously with minimal downtime?

<p>Dependability via redundancy (C)</p> Signup and view all the answers

What is the primary benefit of using commercial cloud computing?

<p>Accessibility to anyone with a credit card (A)</p> Signup and view all the answers

What is the term used to describe the ratio of work done per unit of energy consumed?

<p>Work done per joule (B)</p> Signup and view all the answers

What is the primary goal of cost-performance optimization in Warehouse-Scale Computing?

<p>Maximizing the work done per dollar (C)</p> Signup and view all the answers

What is the term used to describe the framework for batch processing in Warehouse-Scale Computing?

<p>MapReduce (B)</p> Signup and view all the answers

What is the primary benefit of using MapReduce in Warehouse-Scale Computing?

<p>Ability to process large amounts of data in parallel (A)</p> Signup and view all the answers

What is the term used to describe the ability of a system to operate with minimal downtime, with a downtime of less than 1 hour per year?

<p>Four nines (C)</p> Signup and view all the answers

What is the primary benefit of using warehouse-scale computing?

<p>Ability to provide information technology for the world (A)</p> Signup and view all the answers

Flashcards are hidden until you start studying

Study Notes

High Performance Architectures

  • The course is about High Performance Architectures, specifically Warehouse-scale Computers (WSCs) and Cloud Computing.

Concepts

  • A Warehouse-scale Computer (WSC) is the foundation of internet services used by billions of people worldwide.
  • WSCs act as one giant machine, with costs running into hundreds of millions of dollars for the building, electrical and cooling infrastructure, servers, and networking equipment.
  • They can house 50,000-100,000 servers.

Main Goals

  • Cost-performance: work done per dollar is critical, with a focus on scale.
  • Energy efficiency: energy consumed is turned into heat, so work done per joule is critical.
  • Dependability via redundancy: 99.99% of availability, with a maximum of 1 hour of downtime per year.

Main Requirements

  • Ample parallelism: data-level parallelism, internet service applications, and request-level parallelism.
  • Operational costs: energy, power distribution, and cooling represent over 30% of costs over 10 years.
  • Location: inexpensive electricity, proximity to Internet backbone optical fibers, and human resources nearby.

Nodes, Racks, and Switches

  • A WSC consists of nodes, racks, and switches, with a hierarchy of switches.

Memory Hierarchy

  • Local Node: 16 GiB DRAM, 128 GiB Flash, 2,048 GiB Disk, and 1 Gbit/s Ethernet port.
  • Rack: 80 nodes, with an array having 30 racks.
  • Array: 30 racks, with a networking hierarchy that increases latency and reduces bandwidth.

Networking Hierarchy

  • Regular Layer 3 routers connect arrays together and to the Internet.
  • Core routers operate in the Internet backbone.

Power Utilization Effectiveness (PUE)

  • PUE is a metric to evaluate the efficiency of a WSC.
  • PUE = Total facility power consumption / IT equipment power consumption.
  • The bigger the PUE, the less efficient the WSC.

Server from a Google WSC

  • Intel Haswell 2.3 GHz CPUs, with 2 sockets, 18 cores, 2 threads, and 72 "virtual cores" per machine.
  • 2.5 MiB last level cache per core, 256 GB DDR3-1600 DRAM, 2 × 8 TB SATA disk drives or 1 TB SSD, and 10 Gbit/s Ethernet link.

Why Regular Machines

  • Comparison of regular machines, including HP Integrity Superdome and HP ProLiant ML350 G5, with different processors, memory, disk storage, and price-performance.

Seeks and Scans

  • Importance of minimizing random seeks, with a scenario of updating 1% of records in a 1 TB database with 100 Bytes records.
  • Two scenarios: random access with 30 ms seek, read, and write time, and rewriting all records with 100 MiB/s throughput.

Cloud Computing

  • Definition: providing information technology for the world, instead of high-performance computing for scientists and engineers.
  • Quick growth of commercial cloud computing, making WSC accessible to anyone with a credit card.
  • Quote from Seymour Cray, considered the father of the supercomputer, on the potential of computing as a public utility.

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Quiz Team

Related Documents

csc25-chapter_10.pdf

More Like This

Big Data
7 questions
MapReduce Data Reading Quiz
5 questions
Understanding Hadoop: MapReduce and HDFS
10 questions
Use Quizgecko on...
Browser
Browser