Questions and Answers
What is synchronous replication?
What is an advantage of asynchronous replication?
What is partitioning?
Partitioning is the process of dividing a large dataset into smaller chunks stored at different nodes to balance the load.
Synchronous replication requires __________ from secondary nodes.
Which of the following is a type of horizontal partitioning?
Vertical partitioning involves splitting a table into multiple tables based on rows.
What are key-value stores?
What is one of the main functional requirements of key-value stores?
A Content Delivery Network (CDN) consists of a single server.
What is the primary purpose of a CDN?
What is the Domain Name System (DNS)?
Which of the following are types of DNS hierarchy servers? (Select all that apply)
DNS uses a hierarchical structure for its name servers.
What is caching in the context of DNS?
What happens to the DNS if the time-to-live (TTL) value is set too large?
What is the primary function of a load balancer?
Load balancers are considered a single point of failure (SPOF).
DNS names are processed from the ______ to ______.
What protocol does DNS typically use for query messages?
Match the following DNS terminologies with their descriptions:
Why does the DNS use load balancing?
Why is a dedicated monitoring solution necessary in a distributed system?
Which of the following is not included in server monitoring?
Monitoring network links and paths' latency is important for identifying bottlenecks.
The high-level components of our monitoring service include storage and a data ______ service.
What is the purpose of a time-series database in a monitoring service?
Which cloud service provider does not provide a health status page as mentioned in the content?
What type of monitoring involves ensuring efficient communication and connectivity for external clients?
What should be excluded when testing the service's standard functionality?
Which component of a CDN is responsible for directing clients to the nearest CDN facility?
Scrubber servers are used to distribute content to edge proxy servers.
What do origin servers provide to clients?
The component responsible for distributing content across CDN proxy servers is called the ______.
Match the following CDN methods with their descriptions:
What is one of the main benefits of the pull CDN model?
Dynamic content caching optimization can enhance performance by executing scripts at proxy servers.
What is the purpose of a management system in a CDN?
A unique ID generation system within a distributed system provides ______ for identifying events and objects.
Which of the following methods takes both network distance and request load into consideration to reduce latency?
What does Geographic Sharding involve?
What is the maximum number of IDs that can be generated in a day?
The ID-generating server is a single point of failure (SPOF).
What approach is taken in a proactive method of handling IT infrastructure failures?
What is one of the issues with using physical clocks for time measurement?
Metrics can be defined as ___ that provide insight into the system.
What are two conventional approaches to handling failures in IT infrastructure?
What does logging help in monitoring?
What is one issue that can occur with probers?
Pushing metrics is always preferred over pulling metrics.
How can user privacy be protected in client-side monitoring?
What is local load balancing?
Which algorithm forwards each request to a server in a repeating sequential manner?
In a weighted round-robin algorithm, what is used to determine server allocation?
Dynamic algorithms maintain state by communicating with the server.
What is a key difference between stateful and stateless load balancing?
What layer do Layer 7 load balancers operate on?
What does ACID stand for in relational databases?
What do NoSQL databases primarily support?
A database is an organized collection of ______.
What is data replication?
What is a common use case for key-value databases?
Study Notes
Domain Name System (DNS)
- DNS is the Internet’s naming service, translating human-friendly domain names to machine-readable IP addresses.
- Users request IP addresses via browsers, prompting queries to DNS infrastructure.
- The DNS operates as a distributed system with multiple name servers rather than a single server.
- Resource records (RRs) are used to store domain name to IP address mappings in the DNS database.
- Each resource record carries a type, a name, and a value; what the name and value hold depends on the record type (for example, an A record maps a hostname to an IP address).
DNS Hierarchy
- DNS resolvers initiate query sequences and reside within user networks, sometimes employing caching.
- Root-level name servers return the addresses of the name servers responsible for each top-level domain (TLD), such as .com or .edu.
- Top-level domain name servers hold IP addresses for authoritative name servers of specific organizations.
- Authoritative name servers provide IP addresses of web or application servers.
- DNS names are processed from right to left, unlike UNIX file paths, which are processed from left to right.
Query Resolution Methods
- Two types of DNS query resolution exist:
  - Iterative: The local server queries the root, TLD, and authoritative servers in turn, and each one replies with a referral to the next.
  - Recursive: The user queries the local server, which asks the root server; each server then forwards the query down the hierarchy and relays the answer back.
- Iterative queries are generally preferred because they place less load on the DNS infrastructure.
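Iterative resolution can be sketched with a toy in-memory hierarchy. This is only an illustration: the server names (`tld-com`, `ns-example`), the function `resolve_iteratively`, and the address `192.0.2.10` (taken from the documentation address range) are assumptions, and a real resolver exchanges DNS messages over the network instead of reading dictionaries.

```python
# Toy name-server hierarchy; every name and address here is made up.
ROOT = {"com": "tld-com"}                            # root knows the TLD servers
TLD = {"tld-com": {"example.com": "ns-example"}}     # TLD knows authoritative servers
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.10"}}


def resolve_iteratively(hostname):
    """Return (ip, steps), where steps lists the servers contacted in order."""
    labels = hostname.split(".")
    tld = labels[-1]                    # the rightmost label is examined first
    domain = ".".join(labels[-2:])      # then the next label to the left
    steps = ["root"]

    tld_server = ROOT[tld]              # root refers the resolver to a TLD server
    steps.append(tld_server)

    auth_server = TLD[tld_server][domain]   # TLD refers to the authoritative server
    steps.append(auth_server)

    ip = AUTHORITATIVE[auth_server][hostname]   # authoritative server answers
    return ip, steps
```

Calling `resolve_iteratively("www.example.com")` walks root, TLD, and authoritative servers in that order, with the local resolver making every request itself.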
Caching in DNS
- Caching temporarily stores frequently requested RRs, improving response time and reducing network traffic.
- Caching can occur at various levels, such as browsers, operating systems, and local name servers.
- TTL (Time to Live) values should be kept small so that updates to resource records propagate quickly and availability stays high; an overly large TTL leaves stale records in caches.
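The effect of TTL on a cache can be illustrated with a small sketch. The class `TtlCache` and its injectable `clock` parameter are hypothetical and exist only for this example; they are not part of any DNS software.

```python
import time


class TtlCache:
    """Minimal cache whose entries expire after a per-entry TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # name -> (value, expires_at)

    def put(self, name, value, ttl_seconds):
        self._entries[name] = (value, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # The record is stale: drop it so the caller re-queries upstream.
            del self._entries[name]
            return None
        return value
```

With a short TTL the cached record is dropped soon after it is stored, so a changed record is picked up quickly at the cost of more upstream queries.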
DNS Reliability and Scalability
- DNS reliability is bolstered through caching, server replication, and utilizing UDP for fast performance.
- There are 13 logical root name servers, operated by multiple organizations and replicated as many physical instances worldwide, which provides reliability and low latency.
- The DNS is highly scalable, with numerous replicated instances managing user queries effectively.
Load Balancers
- Load balancers distribute client requests among servers to prevent overload and ensure high performance.
- They can enhance scalability, availability, and resource utilization.
- Key functions include health checking, TLS termination, predictive analytics, and mitigating DoS attacks.
- Load balancers can fail; hence they are often deployed in pairs or clusters for high availability.
Global Server Load Balancing (GSLB)
- GSLB distributes traffic across geographical regions and reroutes during datacenter failures.
- Decisions are made based on geographic location and datacenter health.
- GSLB can be implemented on-premises or through Load Balancing as a Service (LBaaS).
Load Balancing in DNS
- DNS can respond with multiple IP addresses, using techniques like round-robin for load balancing.
- Round-robin visits IPs in a circular order but can lead to issues like uneven load distribution if not managed.
- DNS uses short TTL for caching to improve load balancing effectiveness.
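Round-robin answers from a name server can be sketched as a rotating list. The class `RoundRobinDns` is a hypothetical stand-in, and the addresses come from the documentation range; real DNS servers implement rotation in their own ways.

```python
from collections import deque


class RoundRobinDns:
    """Returns the full address list, rotated by one position per query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        response = list(self._addresses)
        self._addresses.rotate(-1)   # the next query sees the next address first
        return response
```

Clients that pick the first address in each response end up spread across the servers, although caching and unequal request costs can still skew the load.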
Local Load Balancing
- Local load balancers operate within datacenters as reverse proxies to manage incoming requests efficiently.
- Common algorithms include:
  - Round-robin: Forwards requests to servers in a repeating sequence.
  - Weighted round-robin: Gives servers with higher assigned weights a larger share of requests.
  - Least connections: Assigns each request to the server with the fewest active connections.
  - Least response time: Selects the server with the quickest response time.
  - IP hash: Hashes the client's IP address to choose a server, which helps when the level of service depends on the client's IP address.
URL Hashing and Load Balancing Algorithms
- URL hashing allocates requests to specific server clusters based on the client's service request URL.
- Load balancing algorithms can be categorized as static or dynamic, where static ones don't account for the changing state of servers.
- Dynamic algorithms adapt to the current state of servers, improving forwarding decisions, but they must communicate with the servers to track that state, which adds overhead.
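Three of the local load balancing algorithms can be sketched in a few lines. The function names and the server labels are illustrative assumptions. Round-robin and weighted round-robin are static, while least connections is dynamic because it needs the current connection counts.

```python
import itertools


def round_robin(servers):
    """Static: yields servers in a repeating sequence."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Static: weights maps server -> positive integer weight.

    A server with weight 2 appears twice per cycle, so it receives
    twice as many requests as a server with weight 1.
    """
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active):
    """Dynamic: active maps server -> current number of open connections."""
    return min(active, key=active.get)
```

This naive weighted variant sends consecutive requests to the same heavy server; production balancers often interleave the sequence more smoothly.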
Stateful vs Stateless Load Balancing
- Stateful load balancing maintains session information, which can increase complexity and reduce scalability.
- Stateless load balancing is faster and lightweight, typically using consistent hashing for request forwarding.
- Stateful load balancers share state across multiple instances, while stateless ones may require local states only for operational purposes.
Types of Load Balancers
- Layer 4 load balancers operate based on transport protocols like TCP/UDP, maintaining connections to forward sessions consistently.
- Layer 7 load balancers make application-aware routing decisions based on HTTP headers, cookies, and user data, enabling features like rate limiting.
Databases in Distributed Systems
- Simple file storage is limited in scalability and concurrent access, leading to the need for databases.
- Databases, structured collections of data, simplify the storage, retrieval, modification, and deletion of information.
Database Types
- Relational Databases (SQL): Use structured schemas and SQL for queries; maintain relationships through primary and foreign keys, enforcing ACID properties.
- Non-Relational Databases (NoSQL): No fixed structure; suitable for unstructured and semi-structured data, offering flexibility and handling large volumes of data.
Importance of Databases
- Manage large datasets effectively; ensure data consistency and integrity with constraints; support easy updates and provide robust security measures.
- Replication capabilities allow high availability and disaster recovery through data duplication across servers.
Advantages of Relational Databases
- Enable flexibility in data modification without downtime; reduce redundancy through normalization to enhance data integrity.
- Handle concurrency, providing error management for simultaneous data access.
Characteristics of NoSQL Databases
- Simple designs allow for less code and easier maintenance; horizontal scaling allows for running databases across large clusters, enhancing availability.
- Support for unstructured data makes NoSQL databases adaptable to diverse data needs, which can include dynamic schemas.
Types of NoSQL Databases
- Key-Value Stores: Utilize unique key-value pairs for efficient data access; examples include Redis and Amazon DynamoDB.
- Document Databases: Store semi-structured documents (e.g., JSON, XML) maintaining a hierarchical structure; MongoDB is a well-known example.
- Graph Databases: Represent data as nodes and edges to illustrate relationships dynamically, aiding in complex data queries; examples are Neo4j and InfiniteGraph.
- Columnar Databases: Focus on column-wise data storage for fast access and efficiency; popular ones include Cassandra and HBase.
Data Replication and Partitioning
- Replication maintains multiple data copies to ensure availability under varying conditions while improving scalability and performance.
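- In synchronous replication, the primary node waits for acknowledgment from the secondary nodes before confirming an update to the client; in asynchronous replication, the primary confirms the update without waiting for the secondaries, which lets it keep operating even if they are slow or unavailable.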
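The difference between the two replication modes can be shown with a small in-memory sketch. The classes `Primary` and `Secondary` and the `flush` method are hypothetical; a real system ships writes over the network and must handle timeouts and failed acknowledgments.

```python
class Secondary:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value
        return True   # acknowledgment sent back to the primary


class Primary:
    def __init__(self, secondaries, synchronous):
        self.data = {}
        self.secondaries = secondaries
        self.synchronous = synchronous
        self.pending = []   # writes not yet shipped (asynchronous mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Confirm only after every secondary has acknowledged the write.
            return all(s.apply(key, value) for s in self.secondaries)
        # Asynchronous: confirm immediately and replicate later.
        self.pending.append((key, value))
        return True

    def flush(self):
        """Ship queued writes to the secondaries (asynchronous mode)."""
        for key, value in self.pending:
            for s in self.secondaries:
                s.apply(key, value)
        self.pending.clear()
```

In asynchronous mode a secondary can lag behind the primary until `flush` runs, which is the window in which a primary failure could lose confirmed writes.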
Partitioning Techniques
- Vertical Partitioning: Splits a table by columns into multiple tables, which may be placed in different database instances; this keeps wide columns such as large text or blobs apart from frequently accessed ones.
- Horizontal Partitioning (Sharding): Distributes rows of a table into smaller segments, reducing latency and ensuring balanced load across nodes.
Conclusion
- Load balancers and databases are crucial for efficiently managing internet traffic and data storage, enabling scalability and consistent performance across applications.
- The distinction between different types of load balancing and database management approaches aids in selecting the appropriate solutions for specific application requirements.
Strategies for Horizontal Partitioning
- Key-range based partitioning assigns continuous ranges of keys to partitions.
- Hash-based partitioning applies a hash function to an attribute, producing varying hash values that determine partition allocation.
- Consistent hashing, list-based, and round-robin are additional horizontal sharding techniques.
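Key-range and hash-based partitioning can be sketched as two small functions. The names `key_range_partition` and `hash_partition`, the boundary values, and the choice of SHA-256 are assumptions made for this illustration.

```python
import bisect
import hashlib


def key_range_partition(key, upper_bounds):
    """Key-range partitioning.

    upper_bounds is a sorted list of exclusive upper bounds, so with
    [100, 200] keys below 100 go to partition 0, keys from 100 to 199
    go to partition 1, and keys of 200 or more go to partition 2.
    """
    return bisect.bisect_right(upper_bounds, key)


def hash_partition(key, num_partitions):
    """Hash-based partitioning: hash the attribute, then take it modulo N."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Key ranges keep neighboring keys together, which helps range scans but can create hot partitions; hashing spreads keys evenly but scatters adjacent keys.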
Importance of Databases
- Efficiently manage large data volumes while ensuring consistency through constraints.
- Facilitate easy data updates and provide robust security measures.
- Enhance data integrity and high availability through replication.
- Support scalability and enable efficient data retrieval and backup.
Key-Value Store Overview
- Key-value stores function as distributed hash tables with unique keys bound to various values, including blobs or images.
- Designed to bypass traditional database complexities, focusing on speed and simplicity.
Functional Requirements of Key-Value Stores
- Configurable service: Allows trade-offs between consistency, availability, cost, and performance.
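- Always-write capability: Applications should always be able to write to the store, which favors availability over strict consistency.
- Hardware heterogeneity: Servers of differing capacity can be added to the cluster, and every node is able to perform the same tasks.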
Non-Functional Requirements of Key-Value Stores
- Scalable: Designed to run on numerous servers globally, managing a substantial user base.
- High availability: Continuous service provision, with configurable availability.
- Fault tolerance: System remains operational despite component failures.
Key Differences Between Key-Value Stores and Traditional Databases
- Key-value stores prioritize simplicity and quick value retrieval, often sacrificing strict consistency for availability and scalability.
- Suited for unstructured data and scenarios demanding rapid access.
Scalability in Key-Value Stores
- Utilize consistent hashing for balanced data distribution across storage nodes.
- Implement virtual nodes to even out the load: each physical node is placed at several positions on the ring, for example by hashing its identifier with multiple hash functions.
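A consistent hashing ring with virtual nodes might look like the sketch below. The class `HashRing`, the default of 100 virtual nodes per server, and the use of SHA-256 are assumptions. The sketch derives each virtual node's position by hashing the node name with a numeric suffix, which stands in for applying several hash functions to the node.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node occupies `vnodes` positions on the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [position for position, _ in self._ring]

    def node_for(self, key):
        """Return the node owning the first position clockwise from the key."""
        index = bisect.bisect_right(self._positions, _hash(key)) % len(self._ring)
        return self._ring[index][1]
```

When a node is added, only the keys that now fall just before one of its virtual positions change owner; every other key stays where it was.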
Data Replication Strategies
- Employ primary-secondary or peer-to-peer replication so that data stays available when nodes fail.
- Utilize vector clocks to maintain causality and manage data versioning across replicas.
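Vector clocks can be sketched as dictionaries that map a node name to a counter. The function names and node labels below are illustrative assumptions, not the API of any particular key-value store.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum, used when reconciling two versions."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def happened_before(a, b):
    """True if version a causally precedes version b."""
    nodes = a.keys() | b.keys()
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_smaller = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and some_smaller


def concurrent(a, b):
    """True if neither version precedes the other, i.e. the versions conflict."""
    differ = any(a.get(n, 0) != b.get(n, 0) for n in a.keys() | b.keys())
    return differ and not happened_before(a, b) and not happened_before(b, a)
```

Two writes that both descend from the same version but were handled by different nodes come out as concurrent, which signals that the replicas hold conflicting versions that need reconciliation.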
Handling Failures in Key-Value Stores
- Use Merkle trees to quickly synchronize and detect inconsistencies among replicas.
- Each node maintains a separate Merkle tree for each key range it hosts, so replicas can compare trees range by range.
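The idea behind Merkle-tree comparison can be sketched by computing a root hash over a key range. The function `merkle_root`, the `key=value` leaf encoding, and the choice to duplicate the last hash on odd levels are assumptions for this example. A full implementation would keep the intermediate levels so that replicas can descend to the differing leaves, whereas this sketch compares roots only.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).hexdigest()


def merkle_root(items):
    """Return the root hash for a dict of key -> value strings."""
    # Leaves are hashes of individual entries, in sorted key order.
    level = [_h(f"{k}={v}".encode("utf-8")) for k, v in sorted(items.items())]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])   # duplicate the last hash so it has a pair
        # Each parent is the hash of its two children concatenated.
        level = [
            _h((level[i] + level[i + 1]).encode("utf-8"))
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Two replicas whose roots match hold identical data for that range and need no further exchange; a mismatch means at least one entry differs.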
Promoting Membership for Failure Detection
- Careful handling of node additions and removals ensures stability.
- Utilize a gossip protocol for synchronizing membership histories among nodes.
Content Delivery Network (CDN) Basics
- CDNs consist of geographically distributed proxy servers that reduce latency and bandwidth consumption.
- Proxy servers positioned at network edges provide quick content delivery.
Advantages of CDNs
- Enhanced content delivery speed and reduced server load.
- Increased scalability and availability through load distribution.
- Global reach allowing geo-targeted content delivery.
Functional Requirements of CDNs
- Capability to retrieve and deliver content based on user requests.
- Ability to update and delete cached entries in accordance with content type.
Non-Functional Requirements of CDNs
- Minimized latency, ensuring high performance.
- Continuous availability, protecting against outages and attacks.
- Reliable handling of massive traffic loads without single points of failure.
Key Components of a CDN
- Clients: End users accessing content via various devices.
- Routing system: Directs clients to the nearest server based on load and content placement.
- Proxy servers: Store and serve content quickly to users.
- Scrubber servers: Filter traffic to protect against attacks like DDoS.
- Distribution system: Distributes content across all proxy servers efficiently.
- Origin servers: Store original data and provide it when content is unavailable in the CDN.
CDN Workflow
- Origin servers publish content to the distribution system.
- Routing system directs clients to the appropriate proxy server based on cached content.
- Client requests are scrubbed for security, then routed to the edge proxy server to deliver desired content.
Push vs. Pull CDN Models
- Push CDN automatically sends content to proxy servers from origin servers, best for static content.
- Pull CDN retrieves unavailable data from origin servers per user request and is more suitable for dynamic content.
- Mixing both models leverages the strengths of each approach for varying content types.
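The push and pull models can be contrasted with a small proxy-server sketch. The class `ProxyServer`, its `fetch_from_origin` callback, and the `origin_fetches` counter are hypothetical names used only for illustration.

```python
class ProxyServer:
    """Edge proxy that supports both pushed and pulled content."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: path -> content
        self._cache = {}
        self.origin_fetches = 0

    def push(self, path, content):
        """Push model: the origin places content here ahead of any request."""
        self._cache[path] = content

    def get(self, path):
        """Pull model: fetch from the origin only when the content is missing."""
        if path not in self._cache:
            self._cache[path] = self._fetch(path)
            self.origin_fetches += 1
        return self._cache[path]
```

Pushed content never triggers an origin fetch, while pulled content costs one fetch on the first request and is served from the cache afterwards.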
Dynamic Content Caching Optimization
- Dynamic content creation may leverage script execution on proxy servers to optimize delivery and processing efficiency.
Description
System Design Review