System Design Review

Questions and Answers

What is synchronous replication?

The primary node waits for acknowledgments from secondary nodes. (correct)
The primary node discards data updates.
The primary node reports success to the client immediately.
The primary node does not wait for acknowledgments from secondary nodes.

What is an advantage of asynchronous replication?

All secondary nodes are completely up to date.
It guarantees low latency.
The primary node can continue its work even if all secondary nodes are down. (correct)
It ensures data integrity.

What is partitioning?

Partitioning is the process of dividing a large dataset into smaller chunks stored at different nodes to balance the load.

Synchronous replication requires __________ from secondary nodes.

acknowledgments

Which of the following is a type of horizontal partitioning?

Key-range based partitioning (A)

Vertical partitioning involves splitting a table into multiple tables based on rows.

False (B)

What are key-value stores?

Key-value stores are a type of distributed data storage model that binds unique keys to specific values.

What is one of the main functional requirements of key-value stores?

The ability to always write data into the key-value storage. (B)

A Content Delivery Network (CDN) consists of a single server.

False (B)

What is the primary purpose of a CDN?

The primary purpose of a CDN is to improve content delivery speed and reduce latency for end users.

What is the Domain Name System (DNS)?

The DNS is the Internet's naming service that maps human-friendly domain names to machine-readable IP addresses.

Which of the following are types of DNS hierarchy servers? (Select all that apply)

DNS Resolver (A), Authoritative Name Servers (B), Top-level Domain (TLD) Name Servers (C), Root-level Name Servers (E)

DNS uses a hierarchical structure for its name servers.

True (A)

What is caching in the context of DNS?

Caching refers to the temporary storage of frequently requested resource records to reduce response time and decrease network traffic.

What happens to the DNS if the time-to-live (TTL) value is set too large?

Users will receive outdated information longer. (A)

What is the primary function of a load balancer?

To fairly divide client requests among available servers to avoid overloading any single server.

Load balancers are considered a single point of failure (SPOF).

False (B)

DNS names are processed from the ______ to ______.

right, left

What protocol does DNS typically use for query messages?

UDP (C)

Match the following DNS terminologies with their descriptions:

DNS Resolver = Initiates queries and forwards them to other DNS servers
Root-level Name Servers = Receive requests from local servers
Authoritative Name Servers = Provide IP addresses of web servers
Caching = Temporary storage of resource records

Why does the DNS use load balancing?

To distribute client requests to multiple servers to ensure availability and enhance performance.

Why is a dedicated monitoring solution necessary in a distributed system?

To provide centralized visibility, detect correlations that individual logs would miss, and enable proactive alerts for faster troubleshooting.

Which of the following is not included in server monitoring?

Monitoring user activity (A)

Monitoring network links and paths' latency is important for identifying bottlenecks.

True (A)

The high-level components of our monitoring service include storage and a data ______ service.

collector

What is the purpose of a time-series database in a monitoring service?

To store metrics data such as CPU usage or the number of exceptions.

Which cloud service provider does not provide a health status page as mentioned in the content?

IBM (C)

What type of monitoring involves ensuring efficient communication and connectivity for external clients?

Network monitoring

What should be excluded when testing the service's standard functionality?

Active probing

Which component of a CDN is responsible for directing clients to the nearest CDN facility?

Routing system (C)

Scrubber servers are used to distribute content to edge proxy servers.

False (B)

What do origin servers provide to clients?

Data that is unavailable at the CDN

The component responsible for distributing content across CDN proxy servers is called the ______.

distribution system

Match the following CDN methods with their descriptions:

Push CDN = Content is sent automatically from the origin server to proxy servers.
Pull CDN = CDN retrieves unavailable data from origin servers when requested.
Anycast = Routing methodology sharing a single IP address across multiple edge servers.
Client multiplexing = Client receives a list of candidate servers to choose from.

What is one of the main benefits of the pull CDN model?

Improved storage consumption (A)

Dynamic content caching optimization can enhance performance by executing scripts at proxy servers.

True (A)

What is the purpose of a management system in a CDN?

To observe resource usage and statistics

A unique ID generation system within a distributed system provides ______ for identifying events and objects.

Uniqueness

Which of the following methods takes both network distance and request load into consideration to reduce latency?

DNS redirection (C)

What does Geographic Sharding involve?

Allocating ranges based on data center location (D)

What is the maximum number of IDs that can be generated in a day?

86400000

The ID-generating server is a single point of failure (SPOF).

True (A)

What approach is taken in a proactive method of handling IT infrastructure failures?

Preventing downtimes before they occur.

What is one of the issues with using physical clocks for time measurement?

They can drift away by 17 seconds per day. (C)

Metrics can be defined as ___ that provide insight into the system.

measurements

What are two conventional approaches to handling failures in IT infrastructure?

Reactive and proactive approaches.

What does logging help in monitoring?

Stores CPU usage and application-related information (C)

What is one issue that can occur with probers?

Incomplete coverage or lack of user imitation.

Pushing metrics is always preferred over pulling metrics.

False (B)

How can user privacy be protected in client-side monitoring?

By allowing users to control what data is collected and sent.

What is local load balancing?

Local load balancers reside within a datacenter and divide incoming requests among the pool of available servers.

Which algorithm forwards each request to a server in a repeating sequential manner?

Round-robin scheduling (C)

In a weighted round-robin algorithm, what is used to determine server allocation?

Weights assigned to nodes (C)

Dynamic algorithms maintain state by communicating with the server.

True (A)

What is a key difference between stateful and stateless load balancing?

Stateful load balancing maintains session information, while stateless load balancing does not.

What layer do Layer 7 load balancers operate on?

Application layer (A)

What does ACID stand for in relational databases?

Atomicity, Consistency, Isolation, Durability.

What do NoSQL databases primarily support?

Large volumes of semi-structured and unstructured data (D)

A database is an organized collection of ______.

data

What is data replication?

Data replication is keeping multiple copies of the data at various nodes to achieve availability, scalability, and performance.

What is a common use case for key-value databases?

Storing data in key-value pairs for fast access (C)

Study Notes

Domain Name System (DNS)

DNS is the Internet’s naming service, translating human-friendly domain names to machine-readable IP addresses.
Users request IP addresses via browsers, prompting queries to DNS infrastructure.
The DNS operates as a distributed system with multiple name servers rather than a single server.
Resource records (RRs) are used to store domain name to IP address mappings in the DNS database.
Each resource record carries a type, a name, and a value; what the name and value hold depends on the record type (for example, an A record maps a hostname to an IPv4 address).

DNS Hierarchy

DNS resolvers initiate query sequences and reside within user networks, sometimes employing caching.
Root-level name servers receive requests from local servers and return the name servers responsible for the requested top-level domain (TLD), such as .com or .edu.
Top-level domain name servers hold IP addresses for authoritative name servers of specific organizations.
Authoritative name servers provide IP addresses of web or application servers.
DNS names are processed right to left, unlike UNIX file paths, which are read left to right.

Query Resolution Methods

Two types of DNS query resolutions exist:
- Iterative: Local server queries root, TLD, and authoritative servers sequentially.
- Recursive: The user asks the local server, which passes the query to the root server; each server forwards the query to the next level and the answer travels back along the same path.
Iterative queries are generally preferred due to lower load on DNS infrastructure.
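
The iterative method can be illustrated with a toy hierarchy. The sketch below is a minimal in-memory model, not a real DNS client: the server names, the zone tables, and the address 192.0.2.10 are hypothetical placeholders, and each dictionary lookup stands in for a network query made by the local server.

```python
# Toy model of iterative resolution: the local server itself queries the
# root, then the TLD server, then the authoritative server.
# All server names and addresses below are made-up placeholders.
ROOT = {"com": "tld-com-server"}                                   # TLD -> TLD name server
TLD_SERVERS = {"tld-com-server": {"example.com": "ns-example"}}    # domain -> authoritative server
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.10"}}  # host -> IP address


def resolve_iteratively(hostname):
    """Return (ip, servers_contacted), reading the name's labels right to left."""
    labels = hostname.split(".")
    tld = labels[-1]                                 # rightmost label is handled first
    domain = ".".join(labels[-2:])
    contacted = ["root"]
    tld_server = ROOT[tld]                           # query 1: root refers to the TLD server
    contacted.append(tld_server)
    auth_server = TLD_SERVERS[tld_server][domain]    # query 2: TLD refers to the authoritative server
    contacted.append(auth_server)
    ip = AUTHORITATIVE[auth_server][hostname]        # query 3: authoritative server answers
    return ip, contacted


ip, path = resolve_iteratively("www.example.com")
print(ip, path)
```

In this model the local server makes all three queries itself, which is the defining trait of the iterative method; in the recursive method each server would forward the query on the caller's behalf.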

Caching in DNS

Caching temporarily stores frequently requested RRs, improving response time and reducing network traffic.
Caching can occur at various levels, such as browsers, operating systems, and local name servers.
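TTL (Time to Live) values should be kept small so that changes to resource records propagate quickly, which supports high availability; if the TTL is too large, users keep receiving outdated records for longer.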
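
The effect of the TTL on a cache can be sketched in a few lines. This is a simplified illustration, not how any particular resolver is implemented: the TTLCache class and the injectable clock are assumptions introduced here so that expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """Cache of resource records that expire `ttl` seconds after insertion."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock (hypothetical helper for testing)
        self._entries = {}       # name -> (value, expires_at)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]   # stale record: caller must do a fresh lookup
            return None
        return value


# Demonstration with a fake clock so time can be advanced by hand.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("www.example.com", "192.0.2.10", ttl=30)
now[0] = 10.0
print(cache.get("www.example.com"))   # still cached
now[0] = 30.0
print(cache.get("www.example.com"))   # expired, so None
```

A larger `ttl` means fewer lookups reach the name servers, but a changed record stays invisible to clients until the old entry expires.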

DNS Reliability and Scalability

DNS reliability is bolstered through caching, server replication, and utilizing UDP for fast performance.
There are 13 logical root name servers globally managed by multiple organizations, ensuring reliability and low latency.
The DNS is highly scalable, with numerous replicated instances managing user queries effectively.

Load Balancers

Load balancers distribute client requests among servers to prevent overload and ensure high performance.
They can enhance scalability, availability, and resource utilization.
Key functions include health checking, TLS termination, predictive analytics, and mitigating DoS attacks.
Load balancers can fail; hence they are often deployed in pairs or clusters for high availability.

Global Server Load Balancing (GSLB)

GSLB distributes traffic across geographical regions and reroutes during datacenter failures.
Decisions are made based on geographic location and datacenter health.
GSLB can be implemented on-premises or through Load Balancing as a Service (LBaaS).

Load Balancing in DNS

DNS can respond with multiple IP addresses, using techniques like round-robin for load balancing.
Round-robin visits IPs in a circular order but can lead to issues like uneven load distribution if not managed.
DNS uses short TTL for caching to improve load balancing effectiveness.

Local Load Balancing

Local load balancers operate within datacenters as reverse proxies to manage incoming requests efficiently.
Common algorithms include:
- Round-robin: Sequentially forwards requests to servers.
- Weighted round-robin: Prioritizes servers based on assigned weight.
- Least connections: Assigns each request to the server with the fewest current connections.
- Least response time: Selects the server with the quickest response time.
- IP hash: Maps each request to a server by hashing the user's IP address, so a given user keeps reaching the same server.

URL Hashing and Load Balancing Algorithms

URL hashing allocates requests to specific server clusters based on the client's service request URL.
Load balancing algorithms can be categorized as static or dynamic, where static ones don't account for the changing state of servers.
Dynamic algorithms adapt to the current state of servers, improving forwarding decisions at the cost of extra communication to keep that state up to date.
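
Several of the algorithms listed under Local Load Balancing are short enough to sketch directly. The functions below are illustrative only; their names, the dictionary-based inputs, and the use of SHA-256 for IP hashing are assumptions made for this example. Round-robin, weighted round-robin, and IP hash are static, while least connections is dynamic because it needs current connection counts.

```python
import hashlib
import itertools


def round_robin(servers):
    """Yield servers in a repeating sequential order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield servers in proportion to integer weights (dict: server -> weight)."""
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def least_connections(active):
    """Pick the server with the fewest active connections (dict: server -> count)."""
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP to a server with a stable hash, so the mapping is repeatable."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


rr = round_robin(["a", "b", "c"])
print([next(rr) for _ in range(4)])                   # ['a', 'b', 'c', 'a']
wrr = weighted_round_robin({"a": 2, "b": 1})
print([next(wrr) for _ in range(6)])                  # ['a', 'a', 'b', 'a', 'a', 'b']
print(least_connections({"a": 5, "b": 2, "c": 7}))    # b
print(ip_hash("203.0.113.7", ["a", "b", "c"]))
```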

Stateful vs Stateless Load Balancing

Stateful load balancing maintains session information, which can increase complexity and reduce scalability.
Stateless load balancing is faster and lightweight, typically using consistent hashing for request forwarding.
Stateful load balancers share state across multiple instances, while stateless ones may require local states only for operational purposes.

Types of Load Balancers

Layer 4 load balancers operate based on transport protocols like TCP/UDP, maintaining connections to forward sessions consistently.
Layer 7 load balancers make application-aware routing decisions based on HTTP headers, cookies, and user data, enabling features like rate limiting.

Databases in Distributed Systems

Simple file storage is limited in scalability and concurrent access, leading to the need for databases.
Databases, structured collections of data, simplify the storage, retrieval, modification, and deletion of information.

Database Types

Relational Databases (SQL): Use structured schemas and SQL for queries; maintain relationships through primary and foreign keys, enforcing ACID properties.
Non-Relational Databases (NoSQL): No fixed structure; suitable for unstructured and semi-structured data, offering flexibility and handling large volumes of data.

Importance of Databases

Manage large datasets effectively; ensure data consistency and integrity with constraints; support easy updates and provide robust security measures.
Replication capabilities allow high availability and disaster recovery through data duplication across servers.

Advantages of Relational Databases

Enable flexibility in data modification without downtime; reduce redundancy through normalization to enhance data integrity.
Handle concurrency, providing error management for simultaneous data access.

Characteristics of NoSQL Databases

Simple designs allow for less code and easier maintenance; horizontal scaling allows for running databases across large clusters, enhancing availability.
Support for unstructured data makes NoSQL databases adaptable to diverse data needs, which can include dynamic schemas.

Types of NoSQL Databases

Key-Value Stores: Utilize unique key-value pairs for efficient data access; examples include Redis and Amazon DynamoDB.
Document Databases: Store semi-structured documents (e.g., JSON, XML) maintaining a hierarchical structure; MongoDB is a well-known example.
Graph Databases: Represent data as nodes and edges to illustrate relationships dynamically, aiding in complex data queries; examples are Neo4J and InfiniteGraph.
Columnar Databases: Focus on column-wise data storage for fast access and efficiency; popular ones include Cassandra and HBase.

Data Replication and Partitioning

Replication maintains multiple data copies to ensure availability under varying conditions while improving scalability and performance.
Synchronous replication requires acknowledgment from all nodes before confirming updates, while asynchronous replication allows primary nodes to operate independently of secondary nodes.
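
The difference between the two modes can be sketched with an in-memory model. This is a simplification under stated assumptions: the Primary and Secondary classes are hypothetical, a method call stands in for a network round trip, and no retry or rollback logic is modeled.

```python
class Secondary:
    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def apply(self, key, value):
        """Apply an update and return True as the acknowledgment; False if down."""
        if not self.up:
            return False
        self.data[key] = value
        return True


class Primary:
    def __init__(self, secondaries):
        self.secondaries = secondaries
        self.data = {}
        self.pending = []    # updates not yet shipped (asynchronous mode)

    def write_sync(self, key, value):
        """Report success only after every secondary has acknowledged."""
        self.data[key] = value
        acks = [s.apply(key, value) for s in self.secondaries]
        return all(acks)

    def write_async(self, key, value):
        """Report success immediately; secondaries catch up later."""
        self.data[key] = value
        self.pending.append((key, value))
        return True

    def flush(self):
        """Ship queued updates to reachable secondaries (a real system would retry the rest)."""
        for key, value in self.pending:
            for s in self.secondaries:
                s.apply(key, value)
        self.pending.clear()


primary = Primary([Secondary(), Secondary(up=False)])
print(primary.write_sync("k1", 1))    # False: one secondary never acknowledged
print(primary.write_async("k2", 2))   # True: the primary does not wait
```

With one secondary down, the synchronous write cannot report success, while the asynchronous write succeeds at the price of secondaries that are temporarily out of date.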

Partitioning Techniques

Vertical Partitioning: Splits a table by columns into multiple tables, which can be placed on different database instances, optimizing for specific access patterns.
Horizontal Partitioning (Sharding): Distributes rows of a table into smaller segments, reducing latency and ensuring balanced load across nodes.

Conclusion

Load balancers and databases are crucial for efficiently managing internet traffic and data storage, enabling scalability and consistent performance across applications.
The distinction between different types of load balancing and database management approaches aids in selecting the appropriate solutions for specific application requirements.

Strategies for Horizontal Partitioning

Key-range based partitioning assigns continuous ranges of keys to partitions.
Hash-based partitioning applies a hash function to an attribute, producing varying hash values that determine partition allocation.
Consistent hashing, list-based, and round-robin are additional horizontal sharding techniques.
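
Key-range and hash-based partitioning can be contrasted with two small functions. This is a sketch under assumptions: the function names, the boundary list format, and the choice of SHA-256 are illustrative and not taken from the source.

```python
import bisect
import hashlib


def key_range_partition(key, boundaries):
    """Return the partition index for `key`.

    `boundaries` is a sorted list holding the first key of each partition
    after the first, so [100, 200] means: keys below 100 go to partition 0,
    keys 100 to 199 go to partition 1, and keys 200 and above go to partition 2.
    """
    return bisect.bisect_right(boundaries, key)


def hash_partition(key, num_partitions):
    """Return a partition index derived from a stable hash of the key."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


print([key_range_partition(k, [100, 200]) for k in (50, 100, 199, 200)])   # [0, 1, 1, 2]
print(hash_partition("customer-42", 4))
```

Key ranges keep neighbouring keys together, which suits range queries but can concentrate load on one partition; hashing spreads keys more evenly but scatters adjacent keys.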

Importance of Databases

Efficiently manage large data volumes while ensuring consistency through constraints.
Facilitate easy data updates and provide robust security measures.
Enhance data integrity and high availability through replication.
Support scalability and enable efficient data retrieval and backup.

Key-Value Store Overview

Key-value stores function as distributed hash tables with unique keys bound to various values, including blobs or images.
Designed to bypass traditional database complexities, focusing on speed and simplicity.

Functional Requirements of Key-Value Stores

Configurable service: Allows trade-offs between consistency, availability, cost, and performance.
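Always-write capability: Applications should always be able to write data into the key-value store.
Hardware heterogeneity: Servers of differing capacities can be added, and every node should be able to perform the same tasks as any other.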

Non-Functional Requirements of Key-Value Stores

Scalable: Designed to run on numerous servers globally, managing a substantial user base.
High availability: Continuous service provision, with configurable availability.
Fault tolerance: System remains operational despite component failures.

Key Differences Between Key-Value Stores and Traditional Databases

Key-value stores prioritize simplicity and quick value retrieval, often sacrificing strict consistency for availability and scalability.
Suited for unstructured data and scenarios demanding rapid access.

Scalability in Key-Value Stores

Utilize consistent hashing for balanced data distribution across storage nodes.
Implement virtual nodes to ensure a uniform load: each physical node is placed at several positions on the ring, for example by applying multiple hash functions to the node's identifier.
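
One common way to realize virtual nodes is to hash each node's identifier several times with a different suffix, which places the node at several ring positions. The sketch below assumes that approach; the HashRing class, the "#i" suffix scheme, and the default of 100 virtual nodes are illustrative choices rather than anything specified in the source.

```python
import bisect
import hashlib


def _hash(value):
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node appears at `vnodes` positions on the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        """Walk clockwise from the key's position to the first virtual node."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
smaller = HashRing(["node-a", "node-b"])
keys = [f"key-{i}" for i in range(1000)]
moved = sum(1 for k in keys if ring.node_for(k) != smaller.node_for(k))
print(moved, "of", len(keys), "keys changed owner after removing node-c")
```

Only keys that were owned by the removed node change owner; keys on the remaining nodes stay where they were, which is the property that makes consistent hashing attractive for scaling.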

Data Replication Strategies

Employ primary-secondary or peer-to-peer replication methods for data consistency during failures.
Utilize vector clocks to maintain causality and manage data versioning across replicas.
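
A vector clock can be represented as a mapping from node name to counter. The helper functions below are a minimal sketch with hypothetical names; they show how two versions are compared to decide whether one supersedes the other or whether they conflict and must be reconciled.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum of two clocks, used after reconciling versions."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def happened_before(a, b):
    """True if version `a` causally precedes version `b`."""
    nodes = a.keys() | b.keys()
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    strictly_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and strictly_less


def concurrent(a, b):
    """True if the versions differ and neither precedes the other (a conflict)."""
    nodes = a.keys() | b.keys()
    differ = any(a.get(n, 0) != b.get(n, 0) for n in nodes)
    return differ and not happened_before(a, b) and not happened_before(b, a)


v1 = increment({}, "A")       # written at node A
v2 = increment(v1, "B")       # updated at node B
v3 = increment(v1, "C")       # updated independently at node C
print(happened_before(v1, v2))   # True
print(concurrent(v2, v3))        # True
print(merge(v2, v3) == {"A": 1, "B": 1, "C": 1})   # True
```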

Handling Failures in Key-Value Stores

Use Merkle trees to quickly synchronize and detect inconsistencies among replicas.
Each node maintains a separate Merkle tree for each key range it hosts, so two replicas can compare a range by exchanging hashes instead of the full data.
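
The comparison step can be illustrated by computing a Merkle root over a replica's key range. This is a simplified sketch: the merkle_root function, the "key=value" leaf encoding, and the rule of duplicating the last hash on odd levels are assumptions made for the example. It compares roots only, whereas a full implementation would descend into the subtrees whose hashes differ to locate the inconsistent keys.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).hexdigest()


def merkle_root(items):
    """Return the root hash over a dict of key -> value, in sorted key order."""
    level = [_h(f"{k}={v}".encode()) for k, v in sorted(items.items())]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        level = [
            _h((level[i] + level[i + 1]).encode()) for i in range(0, len(level), 2)
        ]
    return level[0]


replica_1 = {"k1": "a", "k2": "b", "k3": "c"}
replica_2 = {"k1": "a", "k2": "b", "k3": "c"}
replica_3 = {"k1": "a", "k2": "stale", "k3": "c"}
print(merkle_root(replica_1) == merkle_root(replica_2))   # True: in sync
print(merkle_root(replica_1) == merkle_root(replica_3))   # False: needs repair
```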

Promoting Membership for Failure Detection

Careful handling of node additions and removals ensures stability.
Utilize a gossip protocol for synchronizing membership histories among nodes.

Content Delivery Network (CDN) Basics

CDNs consist of geographically distributed proxy servers that reduce latency and bandwidth consumption.
Proxy servers positioned at network edges provide quick content delivery.

Advantages of CDNs

Enhanced content delivery speed and reduced server load.
Increased scalability and availability through load distribution.
Global reach allowing geo-targeted content delivery.

Functional Requirements of CDNs

Capability to retrieve and deliver content based on user requests.
Ability to update and delete cached entries in accordance with content type.

Non-Functional Requirements of CDNs

Minimized latency, ensuring high performance.
Continuous availability, protecting against outages and attacks.
Reliable handling of massive traffic loads without single points of failure.

Key Components of a CDN

Clients: End users accessing content via various devices.
Routing system: Directs clients to the nearest server based on load and content placement.
Proxy servers: Store and serve content quickly to users.
Scrubber servers: Filter traffic to protect against attacks like DDoS.
Distribution system: Distributes content across all proxy servers efficiently.
Origin servers: Store original data and provide it when content is unavailable in the CDN.

CDN Workflow

Origin servers publish content to the distribution system.
Routing system directs clients to the appropriate proxy server based on cached content.
Client requests are scrubbed for security, then routed to the edge proxy server to deliver desired content.

Push vs. Pull CDN Models

Push CDN automatically sends content to proxy servers from origin servers, best for static content.
Pull CDN retrieves unavailable data from origin servers per user request and is more suitable for dynamic content.
Mixing both models leverages the strengths of each approach for varying content types.
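
The two models differ mainly in who initiates the transfer to the proxy server. The sketch below is a toy model under assumptions: the EdgeProxy class, the fetch callback, and the example paths are hypothetical, and eviction and expiry are left out.

```python
class EdgeProxy:
    """Edge proxy supporting both pull (fetch on miss) and push (pre-populated)."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: path -> content
        self._cache = {}
        self.origin_hits = 0              # how often the origin had to be contacted

    def get(self, path):
        """Pull model: contact the origin only when the content is not cached."""
        if path not in self._cache:
            self._cache[path] = self._fetch(path)
            self.origin_hits += 1
        return self._cache[path]

    def push(self, path, content):
        """Push model: the origin places content before any client asks."""
        self._cache[path] = content


origin = {"/logo.png": "image-bytes", "/news.html": "latest-headlines"}
proxy = EdgeProxy(origin.__getitem__)
proxy.push("/logo.png", origin["/logo.png"])   # static asset pushed ahead of time
proxy.get("/logo.png")                         # served from cache, no origin hit
proxy.get("/news.html")                        # first request pulls from origin
proxy.get("/news.html")                        # second request is a cache hit
print(proxy.origin_hits)                       # 1
```

Pull stores only what clients actually request, which is where its storage benefit comes from; push avoids the first-request delay for content known in advance.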

Dynamic Content Caching Optimization

Dynamic content creation may leverage script execution on proxy servers to optimize delivery and processing efficiency.
