System Design Review
61 Questions

Created by
@AngelicLosAngeles

Questions and Answers

What is synchronous replication?

  • The primary node waits for acknowledgments from secondary nodes. (correct)
  • The primary node discards data updates.
  • The primary node reports success to the client immediately.
  • The primary node does not wait for acknowledgments from secondary nodes.

What is an advantage of asynchronous replication?

  • All secondary nodes are completely up to date.
  • It guarantees low latency.
  • The primary node can continue its work even if all secondary nodes are down. (correct)
  • It ensures data integrity.

What is partitioning?

    Partitioning is the process of dividing a large dataset into smaller chunks stored at different nodes to balance the load.

    Synchronous replication requires __________ from secondary nodes.

    acknowledgments

    Which of the following is a type of horizontal partitioning?

    Key-range based partitioning

    Vertical partitioning involves splitting a table into multiple tables based on rows.

    False

    What are key-value stores?

    Key-value stores are a type of distributed data storage model that binds unique keys to specific values.

    What is one of the main functional requirements of key-value stores?

    The ability to always write data into the key-value storage.

    A Content Delivery Network (CDN) consists of a single server.

    False

    What is the primary purpose of a CDN?

    The primary purpose of a CDN is to improve content delivery speed and reduce latency for end users.

    What is the Domain Name System (DNS)?

    The DNS is the Internet's naming service that maps human-friendly domain names to machine-readable IP addresses.

    Which of the following are types of DNS hierarchy servers? (Select all that apply)

    DNS Resolver

    DNS uses a hierarchical structure for its name servers.

    True

    What is caching in the context of DNS?

    Caching refers to the temporary storage of frequently requested resource records to reduce response time and decrease network traffic.

    What happens to the DNS if the time-to-live (TTL) value is set too large?

    Users will keep receiving outdated records for longer after a change.

    What is the primary function of a load balancer?

    To fairly divide client requests among available servers to avoid overloading any single server.

    Load balancers are considered a single point of failure (SPOF).

    False (load balancers are usually deployed in pairs or clusters, so the failure of one does not take the service down)

    DNS names are processed from the ______ to ______.

    right, left

    What protocol does DNS typically use for query messages?

    UDP

    Match the following DNS terminologies with their descriptions:

    • DNS Resolver = Initiates queries and forwards them to other DNS servers
    • Root-level Name Servers = Receive requests from local servers
    • Authoritative Name Servers = Provide IP addresses of web servers
    • Caching = Temporary storage of resource records

    Why does the DNS use load balancing?

    To distribute client requests to multiple servers to ensure availability and enhance performance.

    Why is a dedicated monitoring solution necessary in a distributed system?

    To provide centralized visibility, detect correlations that individual logs would miss, and enable proactive alerts for faster troubleshooting.

    Which of the following is not included in server monitoring?

    Monitoring user activity

    Monitoring network links and paths' latency is important for identifying bottlenecks.

    True

    The high-level components of our monitoring service include storage and a data ______ service.

    collector

    What is the purpose of a time-series database in a monitoring service?

    To store metrics data such as CPU usage or the number of exceptions.

    Which cloud service provider does not provide a health status page as mentioned in the content?

    IBM

    What type of monitoring involves ensuring efficient communication and connectivity for external clients?

    Network monitoring

    What should be excluded when testing the service's standard functionality?

    Active probing

    Which component of a CDN is responsible for directing clients to the nearest CDN facility?

    Routing system

    Scrubber servers are used to distribute content to edge proxy servers.

    False

    What do origin servers provide to clients?

    Data that is unavailable at the CDN

    The component responsible for distributing content across CDN proxy servers is called the ______.

    distribution system

    Match the following CDN methods with their descriptions:

    • Push CDN = Content is sent automatically from the origin server to proxy servers.
    • Pull CDN = CDN retrieves unavailable data from origin servers when requested.
    • Anycast = Routing methodology sharing a single IP address across multiple edge servers.
    • Client multiplexing = Client receives a list of candidate servers to choose from.

    What is one of the main benefits of the pull CDN model?

    Improved storage consumption

    Dynamic content caching optimization can enhance performance by executing scripts at proxy servers.

    True

    What is the purpose of a management system in a CDN?

    To observe resource usage and statistics

    A unique ID generation system within a distributed system provides ______ for identifying events and objects.

    Uniqueness

    Which of the following methods takes both network distance and request load into consideration to reduce latency?

    DNS redirection

    What does Geographic Sharding involve?

    Allocating ranges based on data center location

    What is the maximum number of IDs that can be generated in a day?

    86,400,000 (assuming one ID per millisecond: 1,000 IDs per second × 86,400 seconds per day)

    The ID-generating server is a single point of failure (SPOF).

    True

    What approach is taken in a proactive method of handling IT infrastructure failures?

    Preventing downtimes before they occur.

    What is one of the issues with using physical clocks for time measurement?

    They can drift by as much as 17 seconds per day (for example, a drift rate of 200 parts per million adds up to about 17 seconds over 86,400 seconds).

    Metrics can be defined as ___ that provide insight into the system.

    measurements

    What are two conventional approaches to handling failures in IT infrastructure?

    Reactive and proactive approaches.

    What does logging help with in monitoring?

    It records CPU usage and application-related information.

    What is one issue that can occur with probers?

    Incomplete coverage or lack of user imitation.

    Pushing metrics is always preferred over pulling metrics.

    False

    How can user privacy be protected in client-side monitoring?

    By allowing users to control what data is collected and sent.

    What is local load balancing?

    Local load balancers reside within a datacenter and divide incoming requests among the pool of available servers.

    Which algorithm forwards each request to a server in a repeating sequential manner?

    Round-robin scheduling

    In a weighted round-robin algorithm, what is used to determine server allocation?

    Weights assigned to nodes

    Dynamic algorithms maintain state by communicating with the server.

    True

    What is a key difference between stateful and stateless load balancing?

    Stateful load balancing maintains session information, while stateless load balancing does not.

    What layer do Layer 7 load balancers operate on?

    Application layer

    What does ACID stand for in relational databases?

    Atomicity, Consistency, Isolation, Durability.

    What do NoSQL databases primarily support?

    Large volumes of semi-structured and unstructured data

    A database is an organized collection of ______.

    data

    What is data replication?

    Data replication is keeping multiple copies of the data at various nodes to achieve availability, scalability, and performance.

    What is a common use case for key-value databases?

    Storing data in key-value pairs for fast access

    Study Notes

    Domain Name System (DNS)

    • DNS is the Internet’s naming service, translating human-friendly domain names to machine-readable IP addresses.
    • When a user enters a domain name in a browser, the browser asks the DNS infrastructure for the corresponding IP address.
    • The DNS operates as a distributed system with multiple name servers rather than a single server.
    • Resource records (RRs) are used to store domain name to IP address mappings in the DNS database.
    • Each resource record has fields such as type, name, and value; the meaning of the name and value fields depends on the record type.

    DNS Hierarchy

    • DNS resolvers initiate query sequences and reside within user networks, sometimes employing caching.
    • Root-level name servers direct queries to the name servers of top-level domains (TLDs) such as .com and .edu.
    • Top-level domain name servers hold IP addresses for authoritative name servers of specific organizations.
    • Authoritative name servers provide IP addresses of web or application servers.
    • DNS names are processed right to left, in contrast to UNIX file paths, which are processed left to right.

    Query Resolution Methods

    • Two types of DNS query resolutions exist:
      • Iterative: The local server itself queries the root, TLD, and authoritative servers one after another.
      • Recursive: The end user queries the local server, which queries the root server; each higher-level server then forwards the request to the next level on the user's behalf.
    • Iterative queries are generally preferred because they place less load on the DNS infrastructure.
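The iterative sequence can be sketched in Python as below. This is a toy model, not a DNS client: the three dictionaries (`ROOT`, `TLD_SERVERS`, `AUTHORITATIVE`), the server names, and the IP addresses (taken from documentation-reserved ranges) are all hypothetical. It shows the local resolver contacting each tier itself while consuming the name's labels from right to left.

```python
# Hypothetical in-memory stand-ins for the three server tiers. Real name
# servers answer over the network with NS and A resource records.
ROOT = {"com": "tld-com", "edu": "tld-edu"}              # TLD label -> TLD server
TLD_SERVERS = {
    "tld-com": {"example.com": "ns.example.com"},         # domain -> authoritative server
    "tld-edu": {"school.edu": "ns.school.edu"},
}
AUTHORITATIVE = {
    "ns.example.com": {"www.example.com": "192.0.2.10"},  # host name -> IP (A record)
    "ns.school.edu": {"www.school.edu": "198.51.100.7"},
}


def resolve_iteratively(name: str) -> tuple[str, list[str]]:
    """Resolve `name` the iterative way: the local resolver itself asks the
    root, then the TLD server, then the authoritative server.

    Returns the IP address and the servers contacted, in order.
    """
    name = name.rstrip(".")
    labels = name.split(".")          # "www.example.com" -> ["www", "example", "com"]
    contacted = ["root"]

    # 1. The root server only looks at the rightmost label (the TLD).
    tld_server = ROOT[labels[-1]]
    contacted.append(tld_server)

    # 2. The TLD server looks at the next label to the left (the domain).
    auth_server = TLD_SERVERS[tld_server][".".join(labels[-2:])]
    contacted.append(auth_server)

    # 3. The authoritative server returns the address for the full name.
    return AUTHORITATIVE[auth_server][name], contacted


ip, path = resolve_iteratively("www.example.com")
print(ip, path)  # 192.0.2.10 ['root', 'tld-com', 'ns.example.com']
```

In the recursive variant, the same three lookups would happen, but each server would forward the query onward instead of the local resolver making every call.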

    Caching in DNS

    • Caching temporarily stores frequently requested RRs, improving response time and reducing network traffic.
    • Caching can occur at various levels, such as browsers, operating systems, and local name servers.
    • Small TTL (Time to Live) values let updates to resource records propagate quickly, which supports high availability, but they increase the number of queries that reach DNS servers.
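A minimal TTL-based record cache, assuming a hypothetical `TTLCache` class with an injectable clock so expiry can be shown without waiting. A cached answer is reused until its TTL elapses, which is why a long TTL can keep serving a stale record after it changes upstream.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Caches resource records until their TTL (in seconds) expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}   # name -> (value, expiry time)

    def put(self, name: str, value: str, ttl: float) -> None:
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None                    # miss: the resolver must query DNS servers
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]        # expired: treat as a miss
            return None
        return value                       # hit, even if the record changed upstream


now = [0.0]                                # fake clock for the example
cache = TTLCache(clock=lambda: now[0])
cache.put("www.example.com", "192.0.2.10", ttl=300)
now[0] = 299.0
print(cache.get("www.example.com"))        # 192.0.2.10
now[0] = 300.0
print(cache.get("www.example.com"))        # None
```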

    DNS Reliability and Scalability

    • DNS reliability comes from caching and server replication; DNS uses UDP for fast query transport, and a client resends a query if no response arrives.
    • There are 13 logical root name servers, operated by several organizations and replicated as many physical instances worldwide, which provides reliability and low latency.
    • The DNS is highly scalable, with numerous replicated instances managing user queries effectively.

    Load Balancers

    • Load balancers distribute client requests among servers to prevent overload and ensure high performance.
    • They can enhance scalability, availability, and resource utilization.
    • Key functions include health checking, TLS termination, predictive analytics, and mitigating DoS attacks.
    • Load balancers can fail; hence they are often deployed in pairs or clusters for high availability.

    Global Server Load Balancing (GSLB)

    • GSLB distributes traffic across geographical regions and reroutes during datacenter failures.
    • Decisions are made based on geographic location and datacenter health.
    • GSLB can be implemented on-premises or through Load Balancing as a Service (LBaaS).

    Load Balancing in DNS

    • DNS can respond with multiple IP addresses for one name, using techniques like round-robin to spread the load.
    • Round-robin cycles through the addresses in circular order, but it does not account for server load or failures, so load can still end up uneven.
    • Load-balanced DNS records typically use short TTLs so that clients re-query DNS often and the rotation actually takes effect.
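DNS round-robin can be sketched as a server that returns the whole address list and rotates it by one position per query. The class name and addresses are hypothetical, and the sketch assumes clients try the first address in the list, as many do. Nothing here checks server health or load, which is the limitation described above.

```python
from collections import deque


class RoundRobinDNS:
    """Returns every address for a name, rotating the list by one per query."""

    def __init__(self, addresses: list[str]):
        self._addresses = deque(addresses)

    def answer(self) -> list[str]:
        response = list(self._addresses)
        self._addresses.rotate(-1)   # the next query gets the next address first
        return response


dns = RoundRobinDNS(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
for _ in range(4):
    print(dns.answer()[0])   # 192.0.2.1, 192.0.2.2, 192.0.2.3, then 192.0.2.1 again
```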

    Local Load Balancing

    • Local load balancers operate within datacenters, acting as reverse proxies that divide incoming requests among the available servers.
    • Common algorithms include:
      • Round-robin: Forwards requests to servers in a repeating sequential order.
      • Weighted round-robin: Distributes requests in proportion to the weight assigned to each server.
      • Least connections: Sends each request to the server with the fewest active connections.
      • Least response time: Selects the server with the lowest response time.
      • IP hash: Hashes the user's IP address to choose a server, which lets a service offer different levels of service to different users.

    URL Hashing and Load Balancing Algorithms

    • URL hashing allocates requests to specific server clusters based on the client's service request URL.
    • Load balancing algorithms can be categorized as static or dynamic, where static ones don't account for the changing state of servers.
    • Dynamic algorithms adapt to the current state of servers, improving forwarding decisions but adding communication overhead among load balancers.
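The algorithms listed above can be sketched as small Python selectors. All class and function names are illustrative assumptions, not any particular load balancer's API. Round-robin, weighted round-robin, and hash-based selection (usable for IP hash or URL hash) are static because they ignore server state; least connections is dynamic because it tracks open connections per server. The weighted version simply repeats each server by its weight, which sends requests in bursts; production balancers often interleave them more smoothly. The hash selector uses SHA-256 rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process and would not map a client consistently across restarts.

```python
import hashlib
import itertools


class RoundRobin:
    """Static: hand out servers in a fixed, repeating order."""

    def __init__(self, servers: list[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class WeightedRoundRobin:
    """Static: a server with weight w appears w times in each cycle."""

    def __init__(self, weights: dict[str, int]):
        expanded = [server for server, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Dynamic: track open connections and pick the least-loaded server."""

    def __init__(self, servers: list[str]):
        self.active = {server: 0 for server in servers}

    def pick(self) -> str:
        # min() returns the first server listed when several are tied.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a connection to `server` closes."""
        self.active[server] -= 1


def hash_pick(key: str, servers: list[str]) -> str:
    """Static: IP hash or URL hash; the same key always maps to the same server."""
    digest = hashlib.sha256(key.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


rr = RoundRobin(["s1", "s2", "s3"])
print([rr.pick() for _ in range(4)])    # ['s1', 's2', 's3', 's1']
wrr = WeightedRoundRobin({"big": 3, "small": 1})
print([wrr.pick() for _ in range(4)])   # ['big', 'big', 'big', 'small']
lc = LeastConnections(["s1", "s2"])
first, second = lc.pick(), lc.pick()    # 's1', then 's2'
lc.release(first)
print(lc.pick())                        # s1 (it is back to 0 open connections)
```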

    Stateful vs Stateless Load Balancing

    • Stateful load balancing maintains session information, which can increase complexity and reduce scalability.
    • Stateless load balancing is faster and lightweight, typically using consistent hashing for request forwarding.
    • Stateful load balancers share state across multiple instances, while stateless ones may require local states only for operational purposes.

    Types of Load Balancers

    • Layer 4 load balancers operate based on transport protocols like TCP/UDP, maintaining connections to forward sessions consistently.
    • Layer 7 load balancers make application-aware routing decisions based on HTTP headers, cookies, and user data, enabling features like rate limiting.
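The difference between the two layers can be sketched as follows, under simplifying assumptions: the `Connection` record, the pool names (`api-pool`, `beta-pool`, `web-pool`), and the `beta=1` cookie are hypothetical. The Layer 4 selector sees only transport-level fields and keeps a per-connection table (state) so every packet of a connection reaches the same server; the Layer 7 selector inspects the URL path and headers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """The transport-level 5-tuple that a Layer 4 balancer can see."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str   # "TCP" or "UDP"


def layer4_pick(conn: Connection, servers: list[str], table: dict[Connection, str]) -> str:
    """Assign new connections round-robin and pin each one to its server."""
    if conn not in table:
        table[conn] = servers[len(table) % len(servers)]
    return table[conn]


def layer7_pick(path: str, headers: dict[str, str]) -> str:
    """Route on application data: URL path first, then a cookie."""
    if path.startswith("/api/"):
        return "api-pool"
    if "beta=1" in headers.get("Cookie", ""):
        return "beta-pool"
    return "web-pool"


table: dict[Connection, str] = {}
conn = Connection("198.51.100.1", 50000, "192.0.2.1", 443, "TCP")
print(layer4_pick(conn, ["s1", "s2"], table))   # s1
print(layer4_pick(conn, ["s1", "s2"], table))   # s1 again: same connection, same server
print(layer7_pick("/api/orders", {}))           # api-pool
```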

    Databases in Distributed Systems

    • Simple file storage is limited in scalability and concurrent access, leading to the need for databases.
    • Databases, structured collections of data, simplify the storage, retrieval, modification, and deletion of information.

    Database Types

    • Relational Databases (SQL): Use structured schemas and SQL for queries; maintain relationships through primary and foreign keys, enforcing ACID properties.
    • Non-Relational Databases (NoSQL): No fixed structure; suitable for unstructured and semi-structured data, offering flexibility and handling large volumes of data.

    Importance of Databases

    • Manage large datasets effectively; ensure data consistency and integrity with constraints; support easy updates and provide robust security measures.
    • Replication capabilities allow high availability and disaster recovery through data duplication across servers.

    Advantages of Relational Databases

    • Enable flexibility in data modification without downtime; reduce redundancy through normalization to enhance data integrity.
    • Handle concurrency control, preventing errors when many users read and modify the same data simultaneously.

    Characteristics of NoSQL Databases

    • Simple designs allow for less code and easier maintenance; horizontal scaling allows for running databases across large clusters, enhancing availability.
    • Support for unstructured data makes NoSQL databases adaptable to diverse data needs, which can include dynamic schemas.

    Types of NoSQL Databases

    • Key-Value Stores: Utilize unique key-value pairs for efficient data access; examples include Redis and Amazon DynamoDB.
    • Document Databases: Store semi-structured documents (e.g., JSON, XML) maintaining a hierarchical structure; MongoDB is a well-known example.
    • Graph Databases: Represent data as nodes and edges to illustrate relationships dynamically, aiding in complex data queries; examples are Neo4J and InfiniteGraph.
    • Columnar Databases: Focus on column-wise data storage for fast access and efficiency; popular ones include Cassandra and HBase.

    Data Replication and Partitioning

    • Replication maintains multiple data copies to ensure availability under varying conditions while improving scalability and performance.
    • Synchronous replication waits for acknowledgments from all secondary nodes before confirming an update, while asynchronous replication lets the primary node confirm updates without waiting for the secondary nodes.
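A toy model of the two replication modes, assuming hypothetical `Replica` and `Primary` classes and ignoring networking, retries, and recovery. In synchronous mode a write succeeds only when every secondary acknowledges it; in asynchronous mode the primary acknowledges immediately and ships queued writes later, so it keeps working even when all secondaries are down, at the cost of secondaries lagging behind.

```python
class Replica:
    """A secondary node; a replica that is down cannot acknowledge writes."""

    def __init__(self, up: bool = True):
        self.up = up
        self.data: dict[str, str] = {}

    def apply(self, key: str, value: str) -> bool:
        if not self.up:
            return False
        self.data[key] = value
        return True


class Primary:
    def __init__(self, secondaries: list[Replica], synchronous: bool):
        self.data: dict[str, str] = {}
        self.secondaries = secondaries
        self.synchronous = synchronous
        self.backlog: list[tuple[str, str]] = []   # async writes not yet replicated

    def write(self, key: str, value: str) -> bool:
        """Return True if the write is reported to the client as successful."""
        self.data[key] = value
        if self.synchronous:
            # Wait for every secondary. (This sketch does not undo the
            # primary's local write when an acknowledgment is missing.)
            acks = [replica.apply(key, value) for replica in self.secondaries]
            return all(acks)
        # Asynchronous: report success now, replicate later.
        self.backlog.append((key, value))
        return True

    def flush(self) -> None:
        """Ship queued writes; a replica that is down simply misses them here."""
        for key, value in self.backlog:
            for replica in self.secondaries:
                replica.apply(key, value)
        self.backlog.clear()


down = [Replica(up=False), Replica(up=False)]
print(Primary(down, synchronous=True).write("k", "v"))    # False
print(Primary(down, synchronous=False).write("k", "v"))   # True
```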

    Partitioning Techniques

    • Vertical Partitioning: Splits a table by columns into separate tables, which can be placed on different database instances and optimized for specific access patterns.
    • Horizontal Partitioning (Sharding): Distributes rows of a table into smaller segments, reducing latency and ensuring balanced load across nodes.

    Conclusion

    • Load balancers and databases are crucial for efficiently managing internet traffic and data storage, enabling scalability and consistent performance across applications.
    • The distinction between different types of load balancing and database management approaches aids in selecting the appropriate solutions for specific application requirements.

    Strategies for Horizontal Partitioning

    • Key-range based partitioning assigns continuous ranges of keys to partitions.
    • Hash-based partitioning applies a hash function to a chosen attribute (the partition key); the resulting hash value determines which partition stores the row.
    • Consistent hashing, list-based, and round-robin are additional horizontal sharding techniques.
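The first two strategies can be contrasted in a few lines; the function names and split points are illustrative assumptions. Key-range partitioning keeps adjacent keys together, so range scans touch few partitions, but popular ranges can become hotspots. Hash partitioning spreads keys evenly but loses key order. With plain modulo hashing as shown here, changing the number of partitions remaps most keys, which is one motivation for consistent hashing.

```python
import bisect
import hashlib


def key_range_partition(key: str, upper_bounds: list[str]) -> int:
    """Key-range partitioning with sorted, inclusive upper bounds.

    Partition i holds keys <= upper_bounds[i] (and > upper_bounds[i - 1]);
    keys above the last bound go to the final partition, so there are
    len(upper_bounds) + 1 partitions in total.
    """
    return bisect.bisect_left(upper_bounds, key)


def hash_partition(key: str, num_partitions: int) -> int:
    """Hash partitioning: the hash of the partition key picks the partition."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


bounds = ["f", "m", "s"]                                                       # 4 partitions
print([key_range_partition(k, bounds) for k in ["apple", "fig", "tomato"]])   # [0, 1, 3]
print(hash_partition("user42", 4) in range(4))                                 # True
```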

    Importance of Databases

    • Efficiently manage large data volumes while ensuring consistency through constraints.
    • Facilitate easy data updates and provide robust security measures.
    • Enhance data integrity and high availability through replication.
    • Support scalability and enable efficient data retrieval and backup.

    Key-Value Store Overview

    • Key-value stores function as distributed hash tables with unique keys bound to various values, including blobs or images.
    • Designed to bypass traditional database complexities, focusing on speed and simplicity.

    Functional Requirements of Key-Value Stores

    • Configurable service: Allows trade-offs between consistency, availability, cost, and performance.
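    • Always-write capability: Applications should always be able to write data into the store, even during failures.
    • Hardware heterogeneity: Nodes may run on different hardware, but no node is special; every node should be able to perform any task.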

    Non-Functional Requirements of Key-Value Stores

    • Scalable: Designed to run on numerous servers globally, managing a substantial user base.
    • High availability: Continuous service provision, with configurable availability.
    • Fault tolerance: System remains operational despite component failures.

    Key Differences Between Key-Value Stores and Traditional Databases

    • Key-value stores prioritize simplicity and quick value retrieval, often sacrificing strict consistency for availability and scalability.
    • Suited for unstructured data and scenarios demanding rapid access.

    Scalability in Key-Value Stores

    • Utilize consistent hashing for balanced data distribution across storage nodes.
    • Implement virtual nodes to spread load evenly: each physical node is placed at several points on the hash ring by applying multiple hash functions to it.
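A minimal consistent-hashing ring with virtual nodes, as a sketch rather than a production implementation. Instead of literally using several hash functions, it hashes the node name with an index suffix (`node#0`, `node#1`, ...), a common way to get the same effect; the class name and the default of 100 virtual nodes per server are assumptions. When a node joins, only the keys that now fall on its virtual nodes move.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes: list[str], vnodes_per_node: int = 100):
        self.vnodes_per_node = vnodes_per_node
        self._ring: list[tuple[int, str]] = []   # sorted (position, physical node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Place each physical node at many ring positions ("virtual nodes").
        # Hashing "node#i" for each i stands in for applying several hash functions.
        for i in range(self.vnodes_per_node):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the first virtual node."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        # (pos,) sorts before every (pos, node), so this finds the first entry at or after pos.
        idx = bisect.bisect_left(self._ring, (ring_position(key),))
        return self._ring[idx % len(self._ring)][1]   # wrap around past the last position


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
moved = sum(before[k] != ring.node_for(k) for k in keys)
print(moved)   # roughly a quarter of the keys, all of them now on node-d
```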

    Data Replication Strategies

    • Employ primary-secondary or peer-to-peer replication methods for data consistency during failures.
    • Utilize vector clocks to maintain causality and manage data versioning across replicas.
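Vector clock comparison can be sketched with plain dictionaries mapping node IDs to counters; the helper names are hypothetical. If one clock is greater than or equal to the other in every component, that version supersedes the other; if neither dominates, the writes were concurrent and the store must keep both versions for later reconciliation.

```python
VectorClock = dict[str, int]   # node ID -> number of writes that node has coordinated


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced by one write."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every event in `b` (a >= b in every component)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a_newer"
    if b_ge_a:
        return "b_newer"
    return "concurrent"   # conflict: keep both versions and reconcile later


v1 = increment({}, "A")    # {'A': 1}
v2 = increment(v1, "A")    # {'A': 2}
v3 = increment(v2, "B")    # {'A': 2, 'B': 1}: one branch, handled by node B
v4 = increment(v2, "C")    # {'A': 2, 'C': 1}: another branch, handled by node C
print(compare(v3, v1))     # a_newer
print(compare(v3, v4))     # concurrent
```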

    Handling Failures in Key-Value Stores

    • Use Merkle trees to quickly synchronize and detect inconsistencies among replicas.
    • Each node maintains a separate Merkle tree for each key range it hosts, so replicas can compare ranges efficiently.
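A sketch of how two replicas can use Merkle trees to find differing data without comparing every item. Assumptions: each leaf is one item's serialized bytes (in a real store a leaf would typically summarize a bucket of keys), both replicas have the same number of leaves over the same key range, and there is at least one leaf. The comparison starts at the root and descends only into subtrees whose hashes differ.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_merkle(leaves: list[bytes]) -> list[list[bytes]]:
    """Return tree levels, from leaf hashes (level 0) up to the root (last level)."""
    level = [_h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # pair the last hash with itself on odd levels
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Walk two same-shaped trees top-down, following only mismatched hashes."""
    suspects = [0] if a[-1][0] != b[-1][0] else []
    for depth in range(len(a) - 1, 0, -1):
        below = depth - 1
        children = []
        for i in suspects:
            for child in (2 * i, 2 * i + 1):
                if child < len(a[below]) and a[below][child] != b[below][child]:
                    children.append(child)
        suspects = children
    return suspects


replica_a = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
replica_b = [b"k1=v1", b"k2=v2", b"k3=XX", b"k4=v4"]
print(differing_leaves(build_merkle(replica_a), build_merkle(replica_b)))  # [2]
```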

    Promoting Membership for Failure Detection

    • Careful handling of node additions and removals ensures stability.
    • Utilize a gossip protocol for synchronizing membership histories among nodes.
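Gossip-based membership can be sketched as nodes periodically swapping membership lists with a random peer and keeping, for each node, the highest heartbeat counter seen. The function names and data layout are assumptions; failure detection itself (suspecting a node whose heartbeat stops increasing for some timeout) is left out.

```python
import random


def merge_views(local: dict[str, int], remote: dict[str, int]) -> dict[str, int]:
    """Merge two membership lists, keeping the highest heartbeat seen per node."""
    merged = dict(local)
    for node, heartbeat in remote.items():
        merged[node] = max(merged.get(node, -1), heartbeat)
    return merged


def gossip_exchange(views: dict[str, dict[str, int]], a: str, b: str) -> None:
    """Nodes a and b swap membership lists and both keep the merged result."""
    merged = merge_views(views[a], views[b])
    views[a] = dict(merged)
    views[b] = dict(merged)


def gossip_round(views: dict[str, dict[str, int]], rng: random.Random) -> None:
    """Every node bumps its own heartbeat, then gossips with one random peer."""
    for node in views:
        views[node][node] += 1
    for node in list(views):
        peer = rng.choice([n for n in views if n != node])
        gossip_exchange(views, node, peer)


# Each node starts out knowing only itself (heartbeat 0).
views = {n: {n: 0} for n in ["n1", "n2", "n3", "n4"]}
rng = random.Random(42)
for _ in range(5):
    gossip_round(views, rng)
print(views["n1"])   # after a few rounds, n1 has usually heard about every node
```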

    Content Delivery Network (CDN) Basics

    • CDNs consist of geographically distributed proxy servers that reduce latency and bandwidth consumption.
    • Proxy servers positioned at network edges provide quick content delivery.

    Advantages of CDNs

    • Enhanced content delivery speed and reduced server load.
    • Increased scalability and availability through load distribution.
    • Global reach allowing geo-targeted content delivery.

    Functional Requirements of CDNs

    • Capability to retrieve and deliver content based on user requests.
    • Ability to update and delete cached entries in accordance with content type.

    Non-Functional Requirements of CDNs

    • Minimized latency, ensuring high performance.
    • Continuous availability, protecting against outages and attacks.
    • Reliable handling of massive traffic loads without single points of failure.

    Key Components of a CDN

    • Clients: End users accessing content via various devices.
    • Routing system: Directs clients to the nearest server based on load and content placement.
    • Proxy servers: Store and serve content quickly to users.
    • Scrubber servers: Filter traffic to protect against attacks like DDoS.
    • Distribution system: Distributes content across all proxy servers efficiently.
    • Origin servers: Store original data and provide it when content is unavailable in the CDN.

    CDN Workflow

    • Origin servers publish content to the distribution system.
    • Routing system directs clients to the appropriate proxy server based on cached content.
    • Client requests are scrubbed for security, then routed to the edge proxy server to deliver desired content.

    Push vs. Pull CDN Models

    • Push CDN automatically sends content to proxy servers from origin servers, best for static content.
    • Pull CDN retrieves unavailable data from origin servers per user request and is more suitable for dynamic content.
    • Mixing both models leverages the strengths of each approach for varying content types.
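Both models can be sketched on a single edge cache; the `EdgeProxy` class and the `fetch_from_origin` callback are hypothetical. `push` places content before anyone asks for it, while `get` pulls from the origin only on a cache miss. Expiry and invalidation, which a real CDN needs for updating and deleting entries, are omitted.

```python
from typing import Callable


class EdgeProxy:
    """A CDN edge server cache that can be filled by push or by pull."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch_from_origin = fetch_from_origin
        self._cache: dict[str, bytes] = {}
        self.origin_fetches = 0

    def push(self, path: str, content: bytes) -> None:
        """Push model: the origin sends content ahead of any request."""
        self._cache[path] = content

    def get(self, path: str) -> bytes:
        """Pull model: on a miss, fetch from the origin once and cache the result."""
        if path not in self._cache:
            self._cache[path] = self._fetch_from_origin(path)
            self.origin_fetches += 1
        return self._cache[path]


origin = {"/logo.png": b"PNG...", "/news": b"<html>today</html>"}
edge = EdgeProxy(fetch_from_origin=lambda p: origin[p])
edge.push("/logo.png", origin["/logo.png"])   # static asset pushed in advance
edge.get("/logo.png")                          # served from cache, no origin trip
edge.get("/news")                              # miss: pulled from the origin
edge.get("/news")                              # hit
print(edge.origin_fetches)                     # 1
```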

    Dynamic Content Caching Optimization

    • Dynamic content creation may leverage script execution on proxy servers to optimize delivery and processing efficiency.
