Giant-Scale Internet Services Quiz

Questions and Answers

What is essential to start with for giant-scale services?

A professional data center and layer-7 switches

Which availability metric should be focused on as much as MTBF?

  • Uptime
  • MTTR (correct)
  • Data replication
  • Graceful degradation

Data replication is sufficient for preserving uptime under faults.

False

What does intelligent admission control help implement?

A high-availability strategy

Use DQ analysis on all _____ to ensure reliability.

upgrades

What should be developed to minimize downtime during upgrades?

Mostly automatic upgrade methods like rolling upgrades

Eric A. Brewer founded the Federal Search Foundation in 2001.

False

Who is the Chief Scientist of Inktomi?

Eric A. Brewer

What is one of Eric A. Brewer's research interests?

Mobile and wireless computing

What is one load-management approach that uses custom nodes for session management?

Service-specific layer-7 routers

Which approach includes clients in the load-management process?

Smart client

Round-robin DNS assigns different servers to different clients to achieve simple load balancing.

True

What is the defined formula for yield?

yield = queries completed / queries offered

What does MTTR stand for?

Mean-time-to-repair

What is the preferred focus for giant-scale systems regarding availability?

Improving MTTR

What happens to the effective size of a partitioned persistent store during a node failure?

It decreases

A perfect system would have 100 percent yield and 100 percent harvest.

True

Which metric focuses on the fraction of completed queries?

Yield

Which system aims to maintain 100 percent harvest under a fault?

Replicated system

What are giant web services?

Internet-based systems that provide various services such as instant messaging, wireless services, etc.

The focus of the article is on wide-area issues such as network partitioning.

False

Which of the following is NOT a component of the basic model for giant-scale services?

Database management system

What advantage does centralizing infrastructure services offer?

Lower overall cost and improved efficiency.

What is the primary role of the load manager in giant-scale services?

To balance load among active servers.

Giant-scale services should maintain _____ availability to meet user expectations.

high

What is a significant challenge mentioned in relation to giant-scale services?

High availability

How many people are predicted to have Internet access within the next ten years?

1.1 billion

In which type of traffic do read-only queries outnumber updates?

Read-mostly traffic

Clusters in giant-scale services are used in part because their nodes fail independently.

True

What load redirection method does the Inktomi search engine use?

Randomization

What is the implication of losing two of five nodes in a replica group?

A redirected load of 2/3 extra on each remaining node.

Replication on disk is cheap, but accessing the replicated data requires __________ points.

DQ

Graceful degradation mechanisms are critical for delivering high availability.

True

What can cause traffic to exceed average levels in online ticket sales?

Single-event bursts.

What basic constraints must be taken into account for graceful degradation?

Both A and B

Which method guarantees that stock trade requests will be executed within 60 seconds?

Priority-based Admission Control

What is the role of dynamic database reduction?

Reduce quality

Natural disasters affect only one replica at a time.

False

What is one approach to perform online evolution?

Rolling upgrade.

During the 'big flip', we __________ switch all traffic to the upgraded nodes.

atomically

    Study Notes

    Giant-Scale Services Overview

    • Web portals and ISPs such as AOL and Yahoo have grown more than tenfold in five years.
    • Infrastructure services, such as instant messaging and various remote access applications, are an essential focus.

    Key Requirements of Giant-Scale Services

    • Need for high availability, particularly for major platforms like eBay and CNN.
    • Services must always be available, requiring robust infrastructure to handle growth and evolution.

    Basic Model for Giant-Scale Services

    • Services rely on a load manager to balance traffic among servers, enhancing availability.
    • Clients access services over the internet, utilizing a best-effort IP network.
    • The load manager serves as an intermediary between the service's external name and the servers' IP addresses, keeping the name available despite server failures.

    Advantages of the Basic Model

    • Access Anywhere: Facilitates user access from various locations and devices, including set-top boxes and smart devices.
    • Cost Efficiency: Centralized infrastructure allows for better resource utilization compared to standalone devices.
    • Groupware Support: Centralizes data for collaboration tools, improving functionality for applications like teleconferencing and group management.
    • Efficient Upgrades: Services can be upgraded or added centrally, without physically distributing new software or devices to users.

    Clusters in Giant-Scale Services

    • Clusters consist of multiple commodity servers functioning together to meet high scalability requirements.
    • Example deployments:
      • AOL Web cache: over 1,000 nodes, processing 10 billion queries/day.
      • Inktomi search engine: over 1,000 nodes, more than 80 million queries/day.
    • Nodes generally have a three-year depreciation lifetime, and clusters can grow incrementally as service needs increase.

    Load Management Advances

    • Modern load management utilizes layer-4 and layer-7 switches to monitor server health and distribute traffic effectively.
    • Methods include:
      • Round-robin DNS for basic load balancing.
      • Session management via service-specific front-end nodes.
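
As a rough illustration of the round-robin idea, the following Python sketch hands out servers in strict rotation. It is a simplified in-process model, not an actual DNS implementation; the function name and the server addresses are hypothetical. Like plain rotation in general, it does no health checking, so a dead server would still be handed out.

```python
from itertools import cycle


def round_robin_assigner(server_ips):
    """Return a function that hands out server IPs in strict rotation."""
    rotation = cycle(server_ips)

    def assign(client_id):
        # client_id is ignored: plain round-robin does not inspect the client
        # and does not check whether the chosen server is alive.
        return next(rotation)

    return assign


# Hypothetical server addresses, for illustration only.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
assign = round_robin_assigner(servers)
assignments = [assign(f"client-{i}") for i in range(5)]
print(assignments)
```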

    Challenges and Considerations

    • Downtime prevention is critical, requiring automatic detection and isolation of non-functioning nodes.
    • Multiple load management strategies ensure service continuity and resilience during failures.

    Persistent Data Store

    • Data storage across servers uses replicated or partitioned approaches to maintain data availability and integrity.
    • Includes options for network-attached storage systems to enhance overall system performance.

    Implications for Design and Evolution

    • Focus on scalability, availability, graceful degradation, and ease of upgrading is crucial for meeting user expectations.
    • Equipment and operational costs are often weighed against the service bandwidth demands and service quality.

    System Complexity and Design

    • A simple Web farm utilizes round-robin DNS for load management with a persistent data store achieved by replicating all content to all nodes.
    • Web farms typically experience no coherence traffic and may not require a backplane, although a secondary LAN for manual updates is common.
    • In contrast, a search engine cluster serves other programs (e.g., Web servers) rather than end users directly; these programs connect through layer-4 switches that balance load and mask faults.
    • Persistent data in search engine clusters is partitioned across servers, which increases aggregate capacity but means some data is unavailable while a server is down.

    High Availability Metrics

    • High availability is critical for large-scale systems; uptime, typically expressed in “nines,” measures how often a system is operational.
    • Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR) are key components impacting uptime, with uptime calculable as:
      uptime = (MTBF – MTTR) / MTBF.
    • Focusing on reducing MTTR is encouraged, as repair time is generally easier to measure and improve than MTBF is to increase.
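
The uptime formula above can be checked with a few lines of Python. The MTBF and MTTR figures below are made-up values chosen only to show that, under this formula, halving MTTR gives the same uptime as doubling MTBF.

```python
def uptime(mtbf_hours, mttr_hours):
    """uptime = (MTBF - MTTR) / MTBF, as given in the notes."""
    return (mtbf_hours - mttr_hours) / mtbf_hours


# Illustrative numbers only (not taken from the source).
baseline = uptime(1000.0, 1.0)      # one hour of repair per 1,000 hours
halved_mttr = uptime(1000.0, 0.5)   # repair twice as fast
doubled_mtbf = uptime(2000.0, 1.0)  # fail half as often

print(f"baseline:     {baseline:.4%}")
print(f"halved MTTR:  {halved_mttr:.4%}")
print(f"doubled MTBF: {doubled_mtbf:.4%}")
```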

    Availability Measurement Terms

    • Yield: Fraction of completed queries, calculated as
      yield = queries completed / queries offered.
    • Harvest: Fraction of available data relative to the complete dataset, defined as
      harvest = data available / complete data.
    • A perfect system would achieve 100% yield and 100% harvest, ensuring each query reflects the entire database.
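
A minimal sketch of the two metrics in Python, using invented counts purely for illustration:

```python
def yield_fraction(queries_completed, queries_offered):
    """yield = queries completed / queries offered."""
    return queries_completed / queries_offered


def harvest_fraction(data_available, complete_data):
    """harvest = data available / complete data."""
    return data_available / complete_data


# Hypothetical figures: 990 of 1,000 queries answered,
# with 3 of 4 data partitions reachable when answering them.
y = yield_fraction(990, 1000)
h = harvest_fraction(3, 4)
print(f"yield = {y:.2f}, harvest = {h:.2f}")
```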

    DQ Principle

    • The DQ Principle states that data per query (D) multiplied by queries per second (Q) tends toward a constant for a given system under normal conditions.
    • Design should accommodate capacity constraints tied to physical limitations like total I/O bandwidth, particularly at high utilization common in large-scale systems.
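
Reading the DQ principle as a fixed product of data per query (D) and queries per second (Q), the trade-off can be sketched as below. The capacity figure and its units are hypothetical; the point is only that shrinking D raises the achievable Q in proportion.

```python
def max_queries_per_second(dq_capacity, data_per_query):
    """With D * Q held constant, the achievable Q is DQ / D."""
    if data_per_query <= 0:
        raise ValueError("data_per_query must be positive")
    return dq_capacity / data_per_query


DQ_CAPACITY = 1_000_000  # hypothetical: data units the system can move per second

q_full = max_queries_per_second(DQ_CAPACITY, 1000)  # full data per query
q_half = max_queries_per_second(DQ_CAPACITY, 500)   # half the data per query
print(q_full, q_half)
```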

    Replication vs. Partitioning

    • Replication increases availability by maintaining complete data but reduces yield when failures occur; partitioning maintains yield better under fault conditions.
    • Both methods initially maintain the same DQ value, but under failure, replication sees a yield reduction, while partitioned systems preserve yield but experience reduced harvest.
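
The contrast can be sketched numerically. This is a simplified model that assumes equal-sized nodes, a system running at full utilization, and no spare capacity; the function name is hypothetical.

```python
def after_failures(n_nodes, failed, scheme):
    """Return harvest and yield after `failed` of `n_nodes` equal nodes go down.

    Simplifying assumption: the system was fully utilized before the fault.
    """
    if not 0 <= failed < n_nodes:
        raise ValueError("need 0 <= failed < n_nodes")
    surviving = (n_nodes - failed) / n_nodes
    if scheme == "replicated":
        # Every surviving node holds all the data, but fewer queries get served.
        return {"harvest": 1.0, "yield": surviving}
    if scheme == "partitioned":
        # Every query is still served, but part of the data is missing.
        return {"harvest": surviving, "yield": 1.0}
    raise ValueError(f"unknown scheme: {scheme}")


replicated = after_failures(2, 1, "replicated")
partitioned = after_failures(2, 1, "partitioned")
print(replicated, partitioned)
```

In both cases the same fraction of capacity is lost; the schemes differ only in whether the loss shows up as yield or as harvest.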

    Capacity and Overload Considerations

    • When a system experiences failures, redirected loads can drastically increase stress on remaining nodes, complicating server management.
    • Replication is often deemed inefficient under high utilization without adequate excess capacity to handle failures, emphasizing the need for effective load-balancing solutions.
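
The quiz example of losing two of five nodes, which yields 2/3 extra load, generalizes to k failures out of n equal nodes. The sketch below assumes the lost load is spread evenly over the survivors; the function name is hypothetical.

```python
from fractions import Fraction


def failure_impact(n, k):
    """Impact of k failures among n equal nodes, with load spread evenly."""
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < n")
    return {
        "lost_capacity": Fraction(k, n),        # share of capacity gone
        "redirected_load": Fraction(k, n - k),  # extra load per surviving node
        "overload_factor": Fraction(n, n - k),  # total load per surviving node
    }


impact = failure_impact(5, 2)
for name, value in impact.items():
    print(f"{name}: {value}")
```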

    Graceful Degradation

    • Mechanisms for graceful degradation become crucial during excess load conditions to maintain service availability.
    • The DQ principle offers methods for graceful degradation, including limiting query capacity or reducing data to improve overall performance.
    • Strategies may involve admission control to lessen query load or dynamic database reduction to lower the amount of data processed, thereby increasing the effective operational capacity.
    • Graceful degradation helps manage availability during both failures and saturation.
    • Partitioned systems can replicate key data to enhance reliability, allowing a backup node to take over if the primary fails.

    Cost-Based Admission Control (AC)

    • Inktomi employs dynamic AC based on estimated query costs, balancing data and query metrics.
    • Reducing data (D) during high demand allows for increased query capacity (Q), optimizing service provision.
    • Simplistic policies may reduce D too aggressively, risking performance.
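
One way to picture cost-based admission control is a greedy policy that admits the cheapest queries first until a DQ budget is spent. This is a hypothetical policy written for illustration, not Inktomi's actual algorithm; the query names, costs, and budget are invented.

```python
def admit_by_cost(queries, dq_budget):
    """Admit queries cheapest-first until the DQ budget runs out.

    `queries` is a list of (name, estimated_cost) pairs.
    """
    admitted, denied = [], []
    remaining = dq_budget
    for name, cost in sorted(queries, key=lambda q: q[1]):
        if cost <= remaining:
            admitted.append(name)
            remaining -= cost
        else:
            denied.append(name)
    return admitted, denied


# Hypothetical workload: one expensive query and three cheap ones.
queries = [("a", 50), ("b", 10), ("c", 20), ("d", 15)]
admitted, denied = admit_by_cost(queries, dq_budget=50)
print("admitted:", admitted)
print("denied:  ", denied)
```

Here denying the single expensive query lets three inexpensive ones through, a net gain in completed queries for the same budget.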

    Priority-Based Admission Control

    • Datek prioritizes stock trade queries, ensuring execution within a strict time frame, enhancing user experience.
    • Low-value queries may be denied to preserve resources for higher-priority requests.
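
A priority-based policy can be sketched in the same style: when capacity is short, higher-priority requests are served first and the rest are denied. The priority values and request names are hypothetical, and the sketch does not model the time guarantee itself.

```python
def admit_by_priority(requests, capacity):
    """Admit the `capacity` highest-priority requests and deny the rest.

    `requests` is a list of (name, priority) pairs; a larger number means
    higher priority. Ties keep their arrival order because sorted() is stable.
    """
    ranked = sorted(requests, key=lambda r: r[1], reverse=True)
    admitted = [name for name, _ in ranked[:capacity]]
    denied = [name for name, _ in ranked[capacity:]]
    return admitted, denied


# Hypothetical mix of stock trades (high priority) and quote lookups (low).
requests = [("quote-1", 1), ("trade-1", 10), ("quote-2", 1), ("trade-2", 10)]
served, shed = admit_by_priority(requests, capacity=2)
print("served:", served)
print("shed:  ", shed)
```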

    Data Freshness and Saturation

    • Financial services may allow for less frequent updates on stock quotes during system saturation, trading off data accuracy for performance.
    • Cached data may not reflect the current database, so yield rises at the expense of harvest.

    Disaster Tolerance Strategies

    • Effective disaster recovery involves managing replica groups and implementing graceful degradation to mitigate failover impacts.
    • Diversifying locations for replicas minimizes the risk from localized disasters.

    Online Evolution and Migration

    • Software and system upgrades are crucial for maintaining giant-scale services, with a focus on minimizing downtime and preserving quality.
    • Upgrades can be executed through fast reboot, rolling upgrades, or "big flip," each impacting system availability differently.

    Upgrade Approaches

    • Fast Reboot: Quickly reboots all nodes into the new version at once; some downtime is unavoidable, so a staging area and automation are important to keep the disruption short.
    • Rolling Upgrade: Upgrades nodes one at a time in a wave across the cluster; service continues with only a small capacity loss at any moment.
    • Big Flip: Upgrades the cluster one half at a time and atomically switches all traffic to the upgraded half; complex, but effective for substantial changes.
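
The three approaches differ in how much capacity is offline at once and for how long. The sketch below compares them under a simplified model that is an assumption of this example, not something stated in the notes: every node contributes equal capacity and takes the same time to upgrade.

```python
from fractions import Fraction


def upgrade_profile(n_nodes, per_node_time, method):
    """Return (fraction of capacity offline at a time, total duration)."""
    if method == "fast_reboot":
        return Fraction(1), per_node_time                     # all nodes at once
    if method == "rolling":
        return Fraction(1, n_nodes), n_nodes * per_node_time  # one node at a time
    if method == "big_flip":
        return Fraction(1, 2), 2 * per_node_time              # one half, then the other
    raise ValueError(f"unknown method: {method}")


def capacity_time_lost(n_nodes, per_node_time, method):
    """Offline fraction multiplied by duration, in full-cluster time units."""
    fraction_down, duration = upgrade_profile(n_nodes, per_node_time, method)
    return fraction_down * duration


# Hypothetical cluster: 4 nodes, 10 minutes to upgrade each node.
methods = ("fast_reboot", "rolling", "big_flip")
losses = {m: capacity_time_lost(4, 10, m) for m in methods}
for m in methods:
    print(m, upgrade_profile(4, 10, m), losses[m])
```

Under these assumptions the total capacity lost is the same for all three methods; what changes is whether the loss is concentrated in a short outage or spread thinly over a longer window.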

    Key Lessons for Scalable Systems

    • Establish a professional infrastructure with appropriate metrics for availability, focusing on both uptime and user experience.
    • Monitor and measure performance with tools like DQ analysis to inform system operations and upgrades.

    Automation and Control

    • Maximize automation in upgrades to minimize disruptions, integrating smart clients for enhanced disaster recovery.
    • Anticipate and plan for fault management through intelligent resource allocation and analysis.

    Conclusion

    • Understanding and managing availability metrics is critical in designing resilient giant-scale services.
    • Continuously evolving systems require a balance between minimal changes and effective upgrades to maintain high performance.
