42 Questions
What is the first step in designing a system that supports millions of users?
Setting up a single server to run everything initially
Which component is typically not hosted by the system’s servers when users access websites through domain names?
Domain Name System (DNS)
Which protocol do browsers use to send requests to the web server?
Hypertext Transfer Protocol (HTTP)
What is the purpose of the single server setup illustrated in Figure 1?
To illustrate running everything on a single server initially
What is the primary benefit of database replication for system performance?
Improved parallel processing of read operations
In the event of a database failure, how can the system handle the situation?
Promoting a slave database to be the new master
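A minimal Python sketch of this failover rule, assuming a hypothetical `ReplicaSet` class that only tracks node names. It promotes the first remaining slave when the master fails and drops a failed slave otherwise. In a real system the promoted slave may be missing recent writes and would need to be brought up to date before taking over.

```python
from typing import List


class ReplicaSet:
    """Toy model of a master/slave database group (hypothetical, for illustration)."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self.slaves = list(slaves)

    def handle_failure(self, failed: str) -> None:
        if failed == self.master:
            if not self.slaves:
                raise RuntimeError("no slave available to promote")
            # Promote the first healthy slave to be the new master.
            self.master = self.slaves.pop(0)
        elif failed in self.slaves:
            # A failed slave is removed; reads go to the remaining slaves.
            self.slaves.remove(failed)

    def read_targets(self) -> List[str]:
        # With no slaves left, reads temporarily fall back to the master.
        return self.slaves or [self.master]


db = ReplicaSet("db-master", ["db-slave-1", "db-slave-2"])
db.handle_failure("db-master")
print(db.master, db.slaves)  # db-slave-1 ['db-slave-2']
```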
What is the purpose of a cache in a system architecture?
To improve system performance by storing data for faster access
What should be considered when deciding to use a cache in a system?
Frequency of data reads and modifications
How can cached data be prevented from becoming stale?
By implementing an expiration policy for cached data
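The caching questions above can be illustrated with a read-through (cache-aside) pattern plus a time-to-live. This is a sketch under stated assumptions: `TTLCache` is a hypothetical in-process dictionary standing in for a real cache server, `load_from_db` is any function that queries the database, and the clock is injectable so expiry can be shown without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a cache server, with a per-entry expiration time."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry so stale data is never returned.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (value, self.clock() + self.ttl)


def read_through(cache: TTLCache, key: str, load_from_db: Callable[[str], Any]) -> Any:
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # hypothetical database query
        cache.set(key, value)
    return value


now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
db_calls = []


def load(key: str) -> str:
    db_calls.append(key)
    return f"row for {key}"


read_through(cache, "user:1", load)  # miss: loaded from the database
read_through(cache, "user:1", load)  # hit: served from the cache
now[0] = 61.0
read_through(cache, "user:1", load)  # expired: reloaded from the database
print(len(db_calls))  # 2
```

Each entry carries its own expiry time; once it passes, the next read goes back to the database, which bounds how stale a cached value can get.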
What is the purpose of keeping the data store and the cache in sync?
To avoid inconsistencies between the cached data and the data store
Why is it advisable to use multiple cache servers across different data centers?
To maintain system availability and avoid a single point of failure
What is the purpose of overprovisioning cache servers?
To ensure sufficient memory is available and handle increased traffic
What is the purpose of a content delivery network (CDN)?
To deliver static content using a network of geographically distributed servers
Which caching policies are commonly used in cache eviction?
LRU, LFU, and FIFO
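As an illustration of the first policy, here is a minimal LRU cache in Python built on `collections.OrderedDict`, which keeps keys in order and provides `move_to_end` and `popitem(last=False)`. The class name and capacity are illustrative; an LFU or FIFO cache would differ only in which entry it chooses to evict.

```python
from collections import OrderedDict
from typing import Any, Hashable, List, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def keys(self) -> List[Hashable]:
        # Ordered from least to most recently used.
        return list(self._data)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used
cache.put("c", 3)  # cache is over capacity, so "b" is evicted
print(cache.keys())  # ['a', 'c']
```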
What does a stateful server do compared to a stateless server?
Remembers client data from one request to the next
Why is setting an appropriate cache expiry time important in a CDN?
To ensure fresh content delivery
What is the purpose of moving state data out of the web tier in a stateless web tier architecture?
To scale the web tier horizontally
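A small sketch of what moving state out of the web tier means, assuming a hypothetical `SharedSessionStore` that stands in for a shared data store (in practice something like a NoSQL database or an in-memory store). Because `StatelessWebServer` keeps nothing between requests, two different instances can serve the same user.

```python
from typing import Dict, List


class SharedSessionStore:
    """Stand-in for the shared data store that holds session state (assumed)."""

    def __init__(self) -> None:
        self._carts: Dict[str, List[str]] = {}

    def load_cart(self, session_id: str) -> List[str]:
        return list(self._carts.get(session_id, []))  # copy, as if read over the network

    def save_cart(self, session_id: str, cart: List[str]) -> None:
        self._carts[session_id] = list(cart)


class StatelessWebServer:
    """Holds no per-user data between requests, so any instance can serve any request."""

    def __init__(self, name: str, store: SharedSessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> List[str]:
        cart = self.store.load_cart(session_id)
        cart.append(item)
        self.store.save_cart(session_id, cart)
        return cart


store = SharedSessionStore()
web1 = StatelessWebServer("web1", store)
web2 = StatelessWebServer("web2", store)
web1.add_to_cart("session-42", "book")
print(web2.add_to_cart("session-42", "pen"))  # ['book', 'pen']
```

Since either server sees the same cart, a load balancer can route each request anywhere, and web servers can be added or removed without losing session data.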
How does a CDN workflow function when a user visits a website?
A CDN server closest to the user delivers static content; if not available, it requests the file from the origin and caches it.
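The CDN workflow can be modeled in a few lines. This is a simplified sketch: `EdgeServer`, `closest_edge`, and the latency figures are hypothetical, "closest" is approximated by the lowest measured latency, and TTL expiry is left out to keep it short.

```python
from typing import Callable, Dict, List


class EdgeServer:
    """A CDN edge server that caches static files fetched from the origin."""

    def __init__(self, name: str, fetch_from_origin: Callable[[str], bytes]) -> None:
        self.name = name
        self.fetch_from_origin = fetch_from_origin
        self.cache: Dict[str, bytes] = {}

    def serve(self, path: str) -> bytes:
        if path not in self.cache:
            # Cache miss: request the file from the origin, then keep a copy.
            self.cache[path] = self.fetch_from_origin(path)
        return self.cache[path]


def closest_edge(edges: Dict[str, EdgeServer], latency_ms: Dict[str, float]) -> EdgeServer:
    """Pick the edge with the lowest measured latency to this user (assumed metric)."""
    best = min(latency_ms, key=latency_ms.get)
    return edges[best]


origin_hits: List[str] = []


def origin(path: str) -> bytes:
    origin_hits.append(path)
    return b"<image bytes>"


edges = {"us-west": EdgeServer("us-west", origin), "eu-central": EdgeServer("eu-central", origin)}
user_latency = {"us-west": 12.0, "eu-central": 140.0}
edge = closest_edge(edges, user_latency)
edge.serve("/image.png")  # miss: fetched from the origin and cached
edge.serve("/image.png")  # hit: served from the edge cache
print(edge.name, len(origin_hits))  # us-west 1
```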
What is the benefit of fetching static assets from a CDN instead of web servers?
Better performance and reduced load on the web servers
What is the primary function of a load balancer in a web server environment?
Improving performance by evenly distributing incoming traffic
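Round robin is one common way a load balancer spreads traffic evenly; the sketch below assumes that strategy (others, such as least connections, exist) and uses hypothetical server names. It also skips servers marked unhealthy, so traffic keeps flowing when one web server goes offline.

```python
from itertools import cycle
from typing import List, Set


class RoundRobinBalancer:
    """Distributes requests evenly across web servers, skipping unhealthy ones."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self.servers = servers
        self._ring = cycle(servers)
        self.unhealthy: Set[str] = set()

    def pick(self) -> str:
        # Try each server at most once per call.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server not in self.unhealthy:
                return server
        raise RuntimeError("no healthy servers")


lb = RoundRobinBalancer(["web1", "web2"])
print([lb.pick() for _ in range(4)])  # ['web1', 'web2', 'web1', 'web2']
lb.unhealthy.add("web1")
print([lb.pick() for _ in range(2)])  # ['web2', 'web2']
```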
What type of databases are suitable for low latency, unstructured data, and large data sets?
Non-relational (NoSQL) databases
Which method is better for handling high traffic in large-scale applications?
Horizontal scaling
What is the primary benefit of separating web/mobile traffic and database servers?
Allowing independent scaling of servers
In a master/slave relationship, what is the role of slave databases in database replication?
They serve as read-only databases to improve performance and availability
Which feature makes non-relational databases suitable for serialization/deserialization requirements?
Handling unstructured data efficiently
What is the main advantage of using vertical scaling for low-traffic scenarios?
Simple solution without the need for additional servers
What communication protocol does the mobile application use to interact with the web server?
HTTP
Which type of databases are suitable for join operations using SQL?
Relational databases
What is the main purpose of using database replication in a system?
Providing redundancy and failover capabilities
What is the primary function of a relational database?
Storing data in tables and rows, and supporting join operations across tables using SQL
What is the key benefit of using non-relational databases for large data sets?
Efficient handling of unstructured data
What is the primary purpose of GeoDNS in a multi-data center setup?
To distribute traffic to the closest data center and split traffic between data centers
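GeoDNS routing with failover can be sketched as a lookup table of data centers ordered by proximity. The region names, data center names, and the `PREFERENCE` table are hypothetical; a real GeoDNS service infers the user's location from IP addresses and may also split traffic by weight.

```python
from typing import Dict, List, Set

# Assumed example: for each user region, data centers ordered from closest to farthest.
PREFERENCE: Dict[str, List[str]] = {
    "us-east": ["US-East", "US-West"],
    "us-west": ["US-West", "US-East"],
}


def resolve(user_region: str, down: Set[str]) -> str:
    """GeoDNS-style lookup: return the closest data center that is currently healthy."""
    for dc in PREFERENCE[user_region]:
        if dc not in down:
            return dc
    raise RuntimeError("all data centers are down")


print(resolve("us-west", down=set()))        # US-West
print(resolve("us-west", down={"US-West"}))  # US-East (outage failover)
```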
What is the purpose of message queues in a system with independent scaling?
To serve as a buffer for distributing asynchronous requests between producers and consumers
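The producer/consumer buffering can be demonstrated with Python's standard `queue.Queue` and threads. This is an in-process sketch only: the photo-resizing jobs are hypothetical, and a real system would use a separate message broker so producers and consumers can run on different servers.

```python
import queue
import threading
from typing import List

jobs: queue.Queue = queue.Queue()  # the buffer between producers and consumers
results: List[str] = []
results_lock = threading.Lock()
STOP = None  # sentinel telling a consumer to exit


def producer(photo_ids: List[str]) -> None:
    # Publish jobs without waiting for them to be processed.
    for photo_id in photo_ids:
        jobs.put(f"resize {photo_id}")


def consumer() -> None:
    # Pull jobs off the queue and process them at the consumer's own pace.
    while True:
        job = jobs.get()
        if job is STOP:
            jobs.task_done()
            break
        with results_lock:
            results.append(f"done: {job}")
        jobs.task_done()


workers = [threading.Thread(target=consumer) for _ in range(2)]
for w in workers:
    w.start()
producer(["p1", "p2", "p3"])
for _ in workers:
    jobs.put(STOP)  # one sentinel per consumer, queued after all real jobs
for w in workers:
    w.join()
print(sorted(results))  # ['done: resize p1', 'done: resize p2', 'done: resize p3']
```

Producers return as soon as a job is enqueued, and adding consumer threads (or, in a real deployment, consumer servers) increases throughput without touching the producer.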
What is the primary benefit of database replication for system reliability?
Preserving data on multiple servers and improving fault tolerance
What is the purpose of horizontal database scaling?
To handle more data and traffic by adding more database servers (sharding)
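Horizontal database scaling, also called sharding, splits data across several database servers. A minimal sketch, assuming each shard is represented by an in-memory dictionary and the sharding key is `user_id % number_of_shards`:

```python
from typing import Dict, List


class ShardedStore:
    """Horizontal scaling sketch: rows are spread over several database servers (shards)."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for a separate database server.
        self.shards: List[Dict[int, str]] = [{} for _ in range(num_shards)]

    def shard_for(self, user_id: int) -> int:
        # Simple sharding key: user_id modulo the number of shards.
        return user_id % len(self.shards)

    def put(self, user_id: int, row: str) -> None:
        self.shards[self.shard_for(user_id)][user_id] = row

    def get(self, user_id: int) -> str:
        return self.shards[self.shard_for(user_id)][user_id]


store = ShardedStore(num_shards=4)
for uid in range(8):
    store.put(uid, f"user {uid}")
print([len(s) for s in store.shards])  # [2, 2, 2, 2]
print(store.shard_for(6), store.get(6))  # 2 user 6
```

A plain modulo key is simple, but changing the number of shards moves most keys to a different shard, so resharding becomes expensive as the system grows.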
Which protocol suite carries HTTP requests over the network to the web server?
TCP/IP
What is the purpose of a cache in a system architecture?
To temporarily store frequently accessed data for faster retrieval
Why is it advisable to use multiple cache servers across different data centers?
To avoid overloading a single cache server and improve fault tolerance
In the event of a data center outage, how can the system handle the situation?
By directing all traffic to a healthy data center
What should be considered when deciding to use a cache in a system?
Scalability, fault tolerance, and eviction policies for cache management
What is the purpose of overprovisioning cache servers?
To provide a memory buffer in case memory usage increases
What challenges must be solved to build a multi-data center setup?
Traffic redirection using GeoDNS and data synchronization across multiple data centers
Study Notes
- Database replication improves system performance by allowing read operations to be processed in parallel across multiple servers (slave nodes), while writes and updates are processed on the master.
- Replication improves system reliability by preserving data across multiple locations, preventing data loss in the event of a natural disaster or server failure.
- If a slave database goes offline, read operations are redirected to the remaining healthy slaves (or temporarily to the master if no slave is left); if the master goes offline, a healthy slave is promoted to be the new master.
- A cache is a temporary storage area that keeps the results of expensive responses or frequently accessed data in memory for faster access, reducing the load on the database and improving system performance.
- Cache servers provide APIs for common programming languages, making it simple to interact with them.
- Decide when to use a cache based on data access patterns: a cache suits data that is read frequently but modified infrequently. Because cache servers keep data in volatile memory, important data should still be saved in a persistent data store.
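- Implement an expiration policy so that cached data is removed instead of being kept in volatile memory indefinitely and becoming stale. The expiry should be neither too short (the system reloads from the database too often) nor too long (the data goes stale).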
- Keep the data store and the cache in sync to avoid inconsistencies.
- Use multiple cache servers across different data centers to avoid a single point of failure and maintain system availability.
- Overprovision cache server memory by a certain percentage, so there is a buffer when memory usage grows with traffic.
- Once state data is moved out of the web servers, the web tier can be auto-scaled by adding or removing servers based on traffic load.
- In normal operation, users are geo-routed to the closest data center using GeoDNS, and traffic is split between data centers.
- In the event of a data center outage, all traffic is directed to a healthy data center.
- A multi-data center setup requires solving challenges like traffic redirection (using GeoDNS) and data synchronization (replicating data across multiple data centers).
- Message queues decouple the components of the system so that each component can be scaled independently.
- A message queue serves as a buffer for asynchronous requests: producers publish messages to the queue, and consumers read and process them, so the two sides interact asynchronously.
- Logging, monitoring, metrics, and automation are essential for large-scale websites: they help identify errors, provide business insights, and track system health.
- Database scaling approaches include vertical scaling (adding more power to an existing machine) and horizontal scaling (adding more machines).
- Vertical scaling can store and handle large amounts of data using powerful database servers.
- Database servers can be extremely powerful: Amazon RDS offers a database server with 24 TB of RAM.
- In 2013, stackoverflow.com had over 10 million monthly users while running on a single master database, showing how far a powerful database server can be scaled vertically.
Learn about the advantages of database replication, such as improved performance and increased reliability. Explore how the master-slave model distributes read and write operations, enhancing parallel query processing and preserving data in case of server destruction.
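The master/slave split described above can be sketched as a small query router. The `ReplicatedDatabase` class, node names, and the rule that only statements starting with `SELECT` are reads are all simplifying assumptions; replicas are chosen at random here, though round robin would work equally well.

```python
import random
from typing import List, Optional


class ReplicatedDatabase:
    """Routes writes to the master and spreads reads across slave replicas."""

    def __init__(self, master: str, slaves: List[str], rng: Optional[random.Random] = None) -> None:
        self.master = master
        self.slaves = slaves
        self.rng = rng or random.Random()

    def route(self, sql: str) -> str:
        # Assumption: anything that is not a SELECT modifies data and must go to the master.
        if sql.lstrip().upper().startswith("SELECT"):
            # Reads go to a replica; with no replicas left, fall back to the master.
            return self.rng.choice(self.slaves) if self.slaves else self.master
        return self.master


db = ReplicatedDatabase("master", ["slave-1", "slave-2", "slave-3"])
print(db.route("INSERT INTO users VALUES (1, 'ann')"))  # master
print(db.route("SELECT * FROM users"))  # one of slave-1, slave-2, slave-3
```

If every slave is offline, reads fall back to the master, which matches the failure handling described in the notes.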