MapReduce Code Fragment Example CSC-25

ManageableSatire

26 Questions

What is the main challenge in building a high-performance system?

Building a fast system

What is the term used to measure the energy efficiency of a data center?

Power Utilization Effectiveness (PUE)

What is the primary goal of warehouse-scale computing?

To achieve cost-performance optimization

What is the term used to describe the delivery of computing resources over the internet?

Cloud Computing

What is the benefit of using a server from a Google WSC?

Regular machine architecture

What is the term used to describe the process of moving data between nodes in a warehouse-scale computer?

Seeks and Scans

What does PB stand for in the context of data storage?

Petabytes

In the provided MapReduce example, what is the purpose of the 'map' function?

To produce a list of all words

What is the primary purpose of the Power Utilization Effectiveness (PUE) metric?

To measure the energy efficiency of a data center

What is the main difference between the local node and array levels in the memory hierarchy?

The network overhead

What is the typical PUE value for a data center?

Around 1.69

What is the main advantage of using regular machines in Warehouse-scale Computing?

They are more cost-effective

What is the main lesson learned from the seeks and scans example?

Avoid random seeks

What is the main feature of Cloud Computing, as advocated?

It is a public utility

What is the DRAM capacity of a server from a Google WSC?

256 GB

What is the main goal of cost-performance optimization in Warehouse-scale Computing?

To optimize the cost-performance ratio

What is the term used to describe a large-scale computing system that acts as a single giant machine?

Warehouse-Scale Computing (WSC)

What is the primary goal of Warehouse-Scale Computing?

Providing information technology for the world

What is the term used to describe the ability of a system to operate continuously with minimal downtime?

Dependability via redundancy

What is the primary benefit of using commercial cloud computing?

Accessibility to anyone with a credit card

What is the term used to describe the ratio of work done per unit of energy consumed?

Work done per joule

What is the primary goal of cost-performance optimization in Warehouse-Scale Computing?

Maximizing the work done per dollar

What is the term used to describe the framework for batch processing in Warehouse-Scale Computing?

MapReduce

What is the primary benefit of using MapReduce in Warehouse-Scale Computing?

Ability to process large amounts of data in parallel

What is the term used to describe the ability of a system to operate with minimal downtime, with a downtime of less than 1 hour per year?

Four nines

What is the primary benefit of using warehouse-scale computing?

Ability to provide information technology for the world

Study Notes

High Performance Architectures

  • The course is about High Performance Architectures, specifically Warehouse-scale Computers (WSCs) and Cloud Computing.

Concepts

  • A Warehouse-scale Computer (WSC) is the foundation of internet services used by billions of people worldwide.
  • WSCs act as one giant machine, with costs running into hundreds of millions of dollars for the building, electrical and cooling infrastructure, servers, and networking equipment.
  • They can house 50,000-100,000 servers.

Main Goals

  • Cost-performance: work done per dollar is critical, with a focus on scale.
  • Energy efficiency: energy consumed is turned into heat, so work done per joule is critical.
  • Dependability via redundancy: at least 99.99% availability ("four nines"), which allows less than 1 hour of downtime per year.
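
The availability goal above can be turned into a yearly downtime budget with simple arithmetic. The following is a minimal sketch; the function name and the use of a 365-day year are assumptions made for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def max_downtime_minutes(availability: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


# "Four nines" availability, as listed in the notes.
four_nines = max_downtime_minutes(0.9999)
print(f"{four_nines:.1f} minutes of downtime per year")
```

The result is a little under one hour, which is consistent with the "less than 1 hour per year" figure in the notes.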

Main Requirements

  • Ample parallelism: data-level parallelism in batch applications and request-level parallelism in internet service applications.
  • Operational costs: energy, power distribution, and cooling represent over 30% of costs over 10 years.
  • Location: inexpensive electricity, proximity to Internet backbone optical fibers, and human resources nearby.

Nodes, Racks, and Switches

  • A WSC consists of nodes, racks, and switches, with a hierarchy of switches.

Memory Hierarchy

  • Local Node: 16 GiB DRAM, 128 GiB Flash, 2,048 GiB Disk, and 1 Gbit/s Ethernet port.
  • Rack: 80 nodes.
  • Array: 30 racks. Reaching memory or storage in another node means crossing the networking hierarchy, which increases latency and reduces bandwidth.
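
To get a feel for the scale of each level, the per-node figures above can be multiplied out. This is a rough sketch that only sums raw capacity; it ignores the latency and bandwidth penalties of the network, and the dictionary and function names are illustrative assumptions.

```python
# Per-node capacities from the notes, in GiB.
NODE = {"dram_gib": 16, "flash_gib": 128, "disk_gib": 2048}

NODES_PER_RACK = 80
RACKS_PER_ARRAY = 30
NODES_PER_ARRAY = NODES_PER_RACK * RACKS_PER_ARRAY


def aggregate(node_count: int) -> dict:
    """Sum the raw capacity of `node_count` identical nodes."""
    return {name: size * node_count for name, size in NODE.items()}


rack = aggregate(NODES_PER_RACK)
array = aggregate(NODES_PER_ARRAY)

for level, totals in (("rack", rack), ("array", array)):
    print(level, totals)
```

Capacity grows with every level, but so does the cost of reaching it, which is the point of the hierarchy described above.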

Networking Hierarchy

  • Regular Layer 3 routers connect arrays together and to the Internet.
  • Core routers operate in the Internet backbone.

Power Utilization Effectiveness (PUE)

  • PUE is a metric to evaluate the efficiency of a WSC.
  • PUE = Total facility power consumption / IT equipment power consumption.
  • PUE is always at least 1. The bigger the PUE, the less efficient the WSC.
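
The PUE formula above can be sketched in a few lines. The power figures used here are hypothetical values chosen to reproduce the 1.69 answer from the quiz; they are not measurements from any real facility.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_fraction(pue_value: float) -> float:
    """Fraction of total facility power that does not reach the IT equipment."""
    return (pue_value - 1.0) / pue_value


# Hypothetical facility: 1,690 kW drawn in total, 1,000 kW reaching the servers.
example = pue(total_facility_kw=1690.0, it_equipment_kw=1000.0)
print(f"PUE = {example:.2f}, overhead = {overhead_fraction(example):.1%}")
```

A PUE of 1.0 would mean that every watt entering the building reaches the IT equipment, so the overhead fraction would be zero.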

Server from a Google WSC

  • Intel Haswell 2.3 GHz CPUs: 2 sockets, 18 cores per socket, and 2 threads per core, for 72 "virtual cores" per machine.
  • 2.5 MiB last level cache per core, 256 GB DDR3-1600 DRAM, 2 × 8 TB SATA disk drives or 1 TB SSD, and 10 Gbit/s Ethernet link.

Why Regular Machines

  • Comparison of a high-end server (HP Integrity Superdome) with a regular machine (HP ProLiant ML350 G5) in terms of processors, memory, disk storage, and price-performance.

Seeks and Scans

  • Importance of minimizing random seeks, illustrated by updating 1% of the records in a 1 TB database with 100-byte records.
  • Two scenarios: random access, where each update takes 30 ms to seek, read, and write; and rewriting all records sequentially at 100 MiB/s throughput.
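
The two scenarios can be compared with back-of-the-envelope arithmetic. This sketch makes two assumptions that are not stated in the notes: 1 TB is taken as 10^12 bytes, and the rewrite scenario reads the whole database once and writes it once.

```python
DB_BYTES = 10**12            # 1 TB database (decimal terabyte, an assumption)
RECORD_BYTES = 100           # record size from the notes
UPDATE_PERCENT = 1           # share of records to update
SEEK_SECONDS = 0.030         # seek + read + write per random update
THROUGHPUT = 100 * 2**20     # 100 MiB/s sequential throughput, in bytes/s

records = DB_BYTES // RECORD_BYTES
updates = records * UPDATE_PERCENT // 100

# Scenario 1: one random access per updated record.
random_seconds = updates * SEEK_SECONDS
random_days = random_seconds / 86_400

# Scenario 2: stream the whole database in and back out (assumed: read + write).
rewrite_seconds = 2 * DB_BYTES / THROUGHPUT
rewrite_hours = rewrite_seconds / 3_600

print(f"random updates: {random_days:.1f} days")
print(f"full rewrite:   {rewrite_hours:.1f} hours")
```

Under these assumptions, rewriting everything sequentially takes hours while the random updates take weeks, which is why the lesson is to avoid random seeks.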

Cloud Computing

  • Definition: providing information technology for the world, instead of high-performance computing for scientists and engineers.
  • Quick growth of commercial cloud computing, making WSC accessible to anyone with a credit card.
  • Quote from Seymour Cray, considered the father of the supercomputer, on the potential of computing as a public utility.

This quiz is based on a MapReduce code fragment example that indexes words across documents. It includes a map function and a reduce function, with explanations of their inputs and outputs.
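
The original code fragment is not reproduced on this page. As a rough illustration of the map and reduce roles described above, here is a single-process word-count sketch. The function names, the sample documents, and the in-memory grouping step are all assumptions; a real MapReduce framework would distribute this work across many nodes.

```python
from collections import defaultdict


def map_fn(doc_name: str, contents: str):
    """Emit a (word, 1) pair for every word in one document."""
    for word in contents.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: list) -> tuple:
    """Sum all the counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(documents: dict) -> dict:
    """Run map, group intermediate pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for name, text in documents.items():
        for key, value in map_fn(name, text):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


# Hypothetical input documents.
docs = {"doc1": "the quick fox", "doc2": "the lazy dog the end"}
counts = run_mapreduce(docs)
print(counts)
```

The map step only looks at one document at a time and the reduce step only looks at one word at a time, which is what allows both to run in parallel over large amounts of data.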
