MapReduce Code Fragment Example CSC-25
26 Questions

Questions and Answers

What is the main challenge in building a high-performance system?

  • Building a fast system (correct)
  • Building a power-efficient system
  • Building a cost-effective system
  • Building a fast CPU
What is the term used to measure the energy efficiency of a data center?

  • Cost-Performance Optimization (CPO)
  • Power Utilization Effectiveness (PUE) (correct)
  • Energy Efficiency Ratio (EER)
  • Server Utilization Factor (SUF)
What is the primary goal of warehouse-scale computing?

  • To increase storage capacity
  • To achieve cost-performance optimization (correct)
  • To build a fast CPU
  • To reduce power consumption
What is the term used to describe the delivery of computing resources over the internet?

Answer: Cloud Computing

What is the benefit of using a server from a Google WSC?

Answer: Regular machine architecture

What example illustrates the cost of random versus sequential data access in a warehouse-scale computer?

Answer: Seeks and Scans

What does PB stand for in the context of data storage?

Answer: Petabytes

In the provided MapReduce example, what is the purpose of the 'map' function?

Answer: To produce a list of all words

What is the primary purpose of the Power Utilization Effectiveness (PUE) metric?

Answer: To measure the energy efficiency of a data center

What is the main difference between the local node and array levels in the memory hierarchy?

Answer: The network overhead

What is the typical PUE value for a data center?

Answer: Around 1.69

What is the main advantage of using regular machines in Warehouse-scale Computing?

Answer: They are more cost-effective

What is the main lesson learned from the seeks and scans example?

Answer: Avoid random seeks

What did the early vision of cloud computing quoted in the lesson predict computing could become?

Answer: A public utility

What is the typical main-memory (DRAM) capacity of a Google WSC server?

Answer: 256 GB

What is the main goal of cost-performance optimization in Warehouse-scale Computing?

Answer: To optimize the cost-performance ratio

What is the term used to describe a large-scale computing system that acts as a single giant machine?

Answer: Warehouse-Scale Computing (WSC)

What is the primary goal of Warehouse-Scale Computing?

Answer: Providing information technology for the world

What is the term used to describe the ability of a system to operate continuously with minimal downtime?

Answer: Dependability via redundancy

What is the primary benefit of using commercial cloud computing?

Answer: Accessibility to anyone with a credit card

What is the term used to describe the ratio of work done per unit of energy consumed?

Answer: Work done per joule

What is the primary goal of cost-performance optimization in Warehouse-Scale Computing?

Answer: Maximizing the work done per dollar

What is the term used to describe the framework for batch processing in Warehouse-Scale Computing?

Answer: MapReduce

What is the primary benefit of using MapReduce in Warehouse-Scale Computing?

Answer: Ability to process large amounts of data in parallel

What is the term for an availability level that allows less than 1 hour of downtime per year?

Answer: Four nines

What is the primary benefit of using warehouse-scale computing?

Answer: Ability to provide information technology for the world

    Study Notes

    High Performance Architectures

    • The course is about High Performance Architectures, specifically Warehouse-scale Computers (WSCs) and Cloud Computing.

    Concepts

    • A Warehouse-scale Computer (WSC) is the foundation of internet services used by billions of people worldwide.
    • WSCs act as one giant machine, with costs running into hundreds of millions of dollars for the building, electrical and cooling infrastructure, servers, and networking equipment.
    • They can house 50,000-100,000 servers.

    Main Goals

    • Cost-performance: work done per dollar is critical, with a focus on scale.
    • Energy efficiency: energy consumed is turned into heat, so work done per joule is critical.
    • Dependability via redundancy: 99.99% availability ("four nines"), i.e. at most about 53 minutes (under 1 hour) of downtime per year.
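
The downtime budget behind an availability target is simple arithmetic. The sketch below is a hypothetical helper, assuming a 365-day year (leap years ignored), that converts an availability fraction into the maximum downtime it allows.

```python
# Hypothetical helper: turns an availability fraction into the yearly downtime
# budget it allows, assuming a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def max_downtime_minutes(availability: float) -> float:
    """Return the maximum downtime per year, in minutes, for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, a in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
        print(f"{label} ({a:.3%}): {max_downtime_minutes(a):.1f} min/year")
```

Four nines works out to roughly 52.6 minutes per year, which is where the "under 1 hour" figure comes from.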

    Main Requirements

    • Ample parallelism: batch applications exploit data-level parallelism across many independent datasets, and Internet service applications exploit request-level parallelism across many independent users.
    • Operational costs: energy, power distribution, and cooling account for more than 30% of the total cost of a WSC over 10 years.
    • Location: inexpensive electricity, proximity to Internet backbone optical fibers, and human resources nearby.

    Nodes, Racks, and Switches

    • A WSC consists of nodes, racks, and switches, with a hierarchy of switches.

    Memory Hierarchy

    • Local Node: 16 GiB DRAM, 128 GiB Flash, 2,048 GiB Disk, and 1 Gbit/s Ethernet port.
    • Rack: 80 nodes.
    • Array: 30 racks, with a networking hierarchy that increases latency and reduces bandwidth.
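
Multiplying the per-node figures through the hierarchy gives the aggregate capacity of a rack and of an array. The sketch below is plain arithmetic that assumes every node is identical; the dictionary keys are illustrative names, and it does not model the extra latency and reduced bandwidth at each level.

```python
# Per-node resources from the notes, in GiB; assumes all nodes are identical.
NODE = {"dram_gib": 16, "flash_gib": 128, "disk_gib": 2048}
NODES_PER_RACK = 80
RACKS_PER_ARRAY = 30


def aggregate(per_node: dict, count: int) -> dict:
    """Scale every per-node capacity by the number of nodes at a level."""
    return {k: v * count for k, v in per_node.items()}


rack = aggregate(NODE, NODES_PER_RACK)
array = aggregate(NODE, NODES_PER_RACK * RACKS_PER_ARRAY)

if __name__ == "__main__":
    print("nodes per array:", NODES_PER_RACK * RACKS_PER_ARRAY)
    for level, caps in [("rack", rack), ("array", array)]:
        # Convert GiB to TiB (1 TiB = 1024 GiB) for readability.
        print(level, {k: f"{v / 1024:.2f} TiB" for k, v in caps.items()})
```

An array therefore holds 2,400 nodes, for example 37.5 TiB of DRAM in total, but reaching memory on another rack costs more latency than reaching local DRAM.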

    Networking Hierarchy

    • Regular Layer 3 routers connect arrays together and to the Internet.
    • Core routers operate in the Internet backbone.

    Power Utilization Effectiveness (PUE)

    • PUE is a metric to evaluate the efficiency of a WSC.
    • PUE = Total facility power consumption / IT equipment power consumption.
    • The bigger the PUE, the less efficient the WSC.
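
A short sketch of the PUE formula follows. The facility figures in the example are hypothetical, chosen so the result matches the typical value of about 1.69. PUE cannot drop below 1.0, and everything above 1.0 is overhead (cooling, power distribution, and so on) per watt delivered to IT equipment.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Utilization Effectiveness = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_per_it_watt(pue_value: float) -> float:
    """Watts spent outside the IT equipment for every watt the IT equipment uses."""
    return pue_value - 1.0


if __name__ == "__main__":
    # Hypothetical facility: 16,900 kW total draw, 10,000 kW reaching servers and network.
    p = pue(16_900, 10_000)
    print(f"PUE = {p:.2f}, overhead = {overhead_per_it_watt(p):.2f} W per IT watt")
```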

    Server from a Google WSC

    • Intel Haswell 2.3 GHz CPUs: 2 sockets × 18 cores per socket × 2 hardware threads per core = 72 "virtual cores" per machine.
    • 2.5 MiB last level cache per core, 256 GB DDR3-1600 DRAM, 2 × 8 TB SATA disk drives or 1 TB SSD, and 10 Gbit/s Ethernet link.
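
The per-machine totals follow from the per-socket and per-core figures. The sketch below assumes that the 2.5 MiB last-level cache figure applies uniformly to every physical core.

```python
# Google WSC server figures from the notes (Intel Haswell, 2.3 GHz).
SOCKETS = 2
CORES_PER_SOCKET = 18
THREADS_PER_CORE = 2      # two hardware threads per core (simultaneous multithreading)
LLC_MIB_PER_CORE = 2.5    # assumed uniform across all cores

physical_cores = SOCKETS * CORES_PER_SOCKET
virtual_cores = physical_cores * THREADS_PER_CORE
total_llc_mib = physical_cores * LLC_MIB_PER_CORE

if __name__ == "__main__":
    print(f"{physical_cores} physical cores, {virtual_cores} virtual cores")
    print(f"last-level cache: {total_llc_mib} MiB per machine "
          f"({total_llc_mib / SOCKETS} MiB per socket)")
```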

    Why Regular Machines

    • Comparison of a high-end shared-memory server (HP Integrity Superdome) with a commodity server (HP ProLiant ML350 G5), covering processors, memory, disk storage, and price-performance; the commodity machine offers far better price-performance.

    Seeks and Scans

    • Minimizing random seeks matters, as shown by updating 1% of the records in a 1 TB database of 100-byte records.
    • Two scenarios: (1) random access, where each update takes about 30 ms (seek, read, and write); (2) rewriting all records sequentially at 100 MiB/s throughput.
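
The two scenarios can be compared with a few lines of arithmetic. The sketch below assumes 1 TB = 10^12 bytes and that a full rewrite reads and writes every byte once at 100 MiB/s; both assumptions are stated in the comments.

```python
# Worked comparison of the seeks-vs-scans scenario. Assumptions: 1 TB = 10**12
# bytes, 100-byte records, ~30 ms per random update (seek + read + write), and a
# full rewrite that reads and then writes every byte once at 100 MiB/s.
DB_BYTES = 10**12
RECORD_BYTES = 100
UPDATE_PERCENT = 1
SECONDS_PER_RANDOM_UPDATE = 0.030
THROUGHPUT_BYTES_PER_S = 100 * 2**20  # 100 MiB/s

records = DB_BYTES // RECORD_BYTES            # 10**10 records
updates = records * UPDATE_PERCENT // 100     # 10**8 records to update

# Scenario 1: one random access per updated record.
random_seconds = updates * SECONDS_PER_RANDOM_UPDATE

# Scenario 2: stream the whole database in and write it all back out.
scan_seconds = 2 * DB_BYTES / THROUGHPUT_BYTES_PER_S

if __name__ == "__main__":
    print(f"random access: {random_seconds / 86400:.1f} days")
    print(f"full rewrite:  {scan_seconds / 3600:.1f} hours")
```

Under these assumptions the random-access plan takes about 35 days, while the sequential rewrite finishes in roughly 5.3 hours (about 5.6 hours if 100 MB/s is read as 10^8 bytes/s). Hence the lesson: avoid random seeks.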

    Cloud Computing

    • Definition: providing information technology for the world, instead of high-performance computing for scientists and engineers.
    • Quick growth of commercial cloud computing, making WSCs accessible to anyone with a credit card.
    • Quote from John McCarthy (MIT centennial celebration, 1961) on computing someday being organized as a public utility, just as the telephone system is.

    Related Documents

    csc25-chapter_10.pdf

    Description

    This quiz is based on a MapReduce code fragment example that indexes the words in a set of documents. It includes a map function and a reduce function, with explanations of their inputs and outputs.
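
The original fragment is not reproduced on this page. Below is a minimal single-process Python sketch of the same idea. It assumes the fragment follows the classic word-count pattern: map takes a document name and its contents and emits (word, 1) for every word, which produces the list of all words, and reduce sums the counts for each word. The function names and the in-memory shuffle are hypothetical stand-ins for what a real MapReduce runtime does across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    """Map: key = document name, value = document contents.
    Emits (word, 1) for every whitespace-separated word."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce: key = a word, values = its list of counts. Emits the total."""
    return word, sum(counts)


def run_mapreduce(docs: Dict[str, str]) -> Dict[str, int]:
    # Map phase: in a WSC each document could be processed on a different node.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for name, text in docs.items():
        for word, count in map_fn(name, text):
            # Shuffle: group intermediate values by key (the runtime's job).
            intermediate[word].append(count)
    # Reduce phase: each key could likewise be reduced on a different node.
    return dict(reduce_fn(w, c) for w, c in intermediate.items())


if __name__ == "__main__":
    docs = {"doc1": "the quick brown fox", "doc2": "the lazy dog"}
    print(run_mapreduce(docs))
```

Because each map call depends only on its own document and each reduce call only on its own key, both phases parallelize across many machines. That is the data-level parallelism MapReduce exploits. Tokenizing is deliberately naive here (no lowercasing or punctuation stripping).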
