Considerations for scaling to millions of users

ChivalrousSmokyQuartz

42 Questions

What is the first step in designing a system that supports millions of users?

Setting up a single server to run everything initially

Which component is typically not hosted by the system’s servers when users access websites through domain names?

Domain Name System (DNS)

Which protocol do clients use to send requests to the web server?

Hypertext Transfer Protocol (HTTP)

What is the purpose of the single server setup illustrated in Figure 1?

To illustrate running everything on a single server initially

What is the primary benefit of database replication for system performance?

Improved parallel processing of read operations

In the event of a database failure, how can the system handle the situation?

Promoting a slave database to be the new master

What is the purpose of a cache in a system architecture?

To improve system performance by storing data for faster access

What should be considered when deciding to use a cache in a system?

Frequency of data reads and modifications

How can cached data be prevented from becoming stale?

By implementing an expiration policy for cached data
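An expiration policy can be illustrated with a small read-through cache. This is a minimal sketch, not any particular cache server's API: the class name `TTLCache`, the `load` callback (standing in for a database query), and the injectable `clock` are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Read-through cache whose entries expire after ttl_seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str, load: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: serve from memory
        # Miss or expired entry: read from the data store and cache the result.
        value = load(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

Calling `cache.get("user:1", load_user)` repeatedly within the TTL invokes the hypothetical `load_user` only once. A TTL that is too short causes frequent reloads from the data store, and one that is too long lets data go stale.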

What is the purpose of keeping the data store and the cache in sync?

To avoid inconsistencies and data loss

Why is it advisable to use multiple cache servers across different data centers?

To maintain system availability and avoid a single point of failure

What is the purpose of overprovisioning cache servers?

To ensure sufficient memory is available and handle increased traffic

What is the purpose of a content delivery network (CDN)?

To deliver static content using a network of geographically distributed servers

Which policies are commonly used for cache eviction?

LRU, LFU, and FIFO
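Of the three policies, least-recently-used (LRU) is the simplest to sketch with the standard library. The example below is an illustrative assumption rather than the behaviour of any specific cache server; `LRUCache` and its method names are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Hashable


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key (hypothetical helper)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```

LFU would track access counts instead of recency, and FIFO would evict in insertion order regardless of access.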

What does a stateful server do compared to a stateless server?

Remembers client data from one request to the next

Why is setting an appropriate cache expiry time important in a CDN?

To ensure fresh content delivery

What is the purpose of moving state data out of the web tier in a stateless web tier architecture?

To scale the web tier horizontally

How does a CDN workflow function when a user visits a website?

A CDN server closest to the user delivers static content; if not available, it requests the file from the origin and caches it.

What is the benefit of fetching static assets from a CDN instead of web servers?

Better performance, because static files are served from a location near the user and no longer load the web servers

What is the primary function of a load balancer in a web server environment?

Improving performance by evenly distributing incoming traffic
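Even distribution can be sketched with round-robin selection, which is one common strategy. The quiz does not say which algorithm is meant, so round-robin is an assumption here, and the class name and server labels are hypothetical.

```python
import itertools
from typing import List


class RoundRobinBalancer:
    """Hands out backend servers in rotation (hypothetical helper)."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list endlessly, so each server gets every n-th request.
        self._cycle = itertools.cycle(list(servers))

    def next_server(self) -> str:
        return next(self._cycle)
```

With servers `["web1", "web2"]`, successive calls to `next_server()` alternate between the two. A production load balancer would also run health checks and skip servers that stop responding.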

What type of databases are suitable for low latency, unstructured data, and large data sets?

Non-relational (NoSQL) databases

Which method is better for handling high traffic in large scale applications?

Horizontal scaling

What is the primary benefit of separating web/mobile traffic and database servers?

Allowing independent scaling of servers

In a master/slave relationship, what is the role of slave databases in database replication?

They serve as read-only databases to improve performance and availability

Which feature makes non-relational databases suitable for serialization/deserialization requirements?

Handling unstructured data efficiently

What is the main advantage of using vertical scaling for low traffic scenarios?

Simple solution without the need for additional servers

What communication protocol does the mobile application use to interact with the web server?

HTTP

Which type of databases are suitable for join operations using SQL?

Relational databases

What is the main purpose of using database replication in a system?

Providing redundancy and failover capabilities

What is the primary function of a relational database?

Storing data in tables and rows, and supporting join operations using SQL

What is the key benefit of using non-relational databases for large data sets?

Efficient handling of unstructured data

What is the primary purpose of GeoDNS in a multi-data center setup?

To distribute traffic to the closest data center and split traffic between data centers

What is the purpose of message queues in a system with independent scaling?

To serve as a buffer for distributing asynchronous requests between producers and consumers
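The buffering role can be sketched in-process with Python's standard `queue` module. This is only an analogy for a real message broker: `run_pipeline`, the worker count, and the uppercasing step that stands in for real work are all assumptions for illustration.

```python
import queue
import threading
from typing import List, Optional


def run_pipeline(jobs: List[str], worker_count: int = 2) -> List[str]:
    """Publish jobs to a queue and let worker threads consume them (hypothetical helper)."""
    tasks: "queue.Queue[Optional[str]]" = queue.Queue()
    results: List[str] = []
    lock = threading.Lock()

    def consumer() -> None:
        while True:
            job = tasks.get()
            if job is None:  # sentinel: no more work for this worker
                tasks.task_done()
                return
            processed = job.upper()  # placeholder for the real processing step
            with lock:
                results.append(processed)
            tasks.task_done()

    workers = [threading.Thread(target=consumer) for _ in range(worker_count)]
    for worker in workers:
        worker.start()
    for job in jobs:  # producer side: publish and move on
        tasks.put(job)
    for _ in workers:
        tasks.put(None)
    tasks.join()
    for worker in workers:
        worker.join()
    return results
```

Because the producer only talks to the queue, the number of consumers can be raised or lowered without changing the producer. Results may come back in any order.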

What is the primary benefit of database replication for system reliability?

Data is preserved across multiple servers, which improves fault tolerance

What is the purpose of horizontal database scaling?

To store and handle large amounts of data by adding more database servers rather than a more powerful one

Which underlying protocol suite carries HTTP requests to the web server?

TCP/IP

What is the purpose of a cache in a system architecture?

To temporarily store frequently accessed data for faster retrieval

Why is it advisable to use multiple cache servers across different data centers?

To avoid overloading a single cache server and improve fault tolerance

In the event of a data center outage, how can the system handle the situation?

By directing all traffic to a healthy data center

What should be considered when deciding to use a cache in a system?

Scalability, fault tolerance, and eviction policies for cache management

What is the purpose of overprovisioning cache servers?

To provide a memory buffer as usage and traffic grow

What challenges must be solved to support a multi-data center setup?

Traffic redirection using GeoDNS and data synchronization across multiple data centers.

Study Notes

  • Database replication improves system performance by allowing read operations to be processed in parallel across multiple servers (slave nodes), while writes and updates are processed in master nodes.

  • Replication improves system reliability by preserving data across multiple locations, preventing data loss in the event of a natural disaster or server failure.

  • In case of a database failure, the system can handle the situation by redirecting read operations to available slave databases or promoting a healthy slave database to be the new master.

  • A cache is a temporary storage area that stores the result of expensive responses or frequently accessed data in memory for faster access, reducing the load on the database and improving system performance.

  • Cache servers provide APIs for common programming languages, making it simple to interact with them.

  • Decide when to use a cache based on data access patterns: use it when data is read frequently but modified infrequently, and save important data in persistent data stores.

  • Implement an expiration policy for cached data to prevent it from being stored in volatile memory indefinitely and becoming stale.

  • Keep the data store and the cache in sync to avoid inconsistencies.

  • Use multiple cache servers across different data centers to avoid a single point of failure and maintain system availability.

  • Overprovision cache servers by certain percentages to handle increased traffic and ensure sufficient memory is available.

  • Once state data is moved out of the web servers, the web tier can be auto-scaled by adding or removing servers based on traffic load.

  • In normal operation, GeoDNS routes each user to the closest data center, which splits traffic between data centers.

  • In the event of a data center outage, all traffic is directed to a healthy data center.

  • Multi-data center setup requires solving challenges like traffic redirection (using GeoDNS) and data synchronization (replicating data across multiple data centers).

  • Message queues decouple the components of the system so that each can be scaled independently.

  • A message queue serves as a buffer that distributes asynchronous requests: producers publish messages to the queue, and consumers pick them up and process them independently.

  • Logging, monitoring, metrics, and automation are essential for large-scale websites for error identification, business insights, and system health.

  • Database scaling approaches include vertical scaling (adding more power to an existing machine) and horizontal scaling (adding more machines).

  • Vertical scaling can store and handle large amounts of data using powerful database servers.

  • Database servers can be extremely powerful: Amazon RDS offers a server with 24 TB of RAM.

  • stackoverflow.com in 2013 had over 10 million monthly users, demonstrating the scale that can be achieved with powerful database servers.
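The replication notes above (reads spread across slaves, writes sent to the master, promotion of a slave on failure) can be sketched as a small router. This is a simplified illustration under stated assumptions: `ReplicatedDatabase` and its node names are hypothetical, reads rotate round-robin, reads fall back to the master when no slave is left, and the first slave in the list is the one promoted. Real promotion also has to deal with data the slave may not have received yet.

```python
from typing import List


class ReplicatedDatabase:
    """Routes operations in a master/slave setup (hypothetical helper)."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self.slaves = list(slaves)
        self._next = 0  # counter used to rotate reads across slaves

    def route(self, operation: str) -> str:
        """Return the node that should handle a 'read' or 'write'."""
        if operation not in ("read", "write"):
            raise ValueError("operation must be 'read' or 'write'")
        if operation == "write" or not self.slaves:
            # Writes always go to the master; so do reads when no slave is available.
            return self.master
        node = self.slaves[self._next % len(self.slaves)]
        self._next += 1
        return node

    def fail(self, node: str) -> None:
        """Record that a node went offline and promote a slave if it was the master."""
        if node == self.master:
            if not self.slaves:
                raise RuntimeError("no slave available to promote")
            self.master = self.slaves.pop(0)  # assumed policy: promote the first slave
        elif node in self.slaves:
            self.slaves.remove(node)
```

For example, with master `"m"` and slaves `["s1", "s2"]`, calling `fail("m")` makes `"s1"` the new master and leaves `"s2"` serving reads.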

Learn about the advantages of database replication, such as improved performance and increased reliability. Explore how the master-slave model distributes read and write operations, enhancing parallel query processing and preserving data in case of server destruction.
