Relational Database Management System PDF
Document Details
Uploaded by DecisiveGreatWallOfChina1467
Tags
Summary
This document describes a relational database management system (RDBMS) architecture designed to handle high traffic by distributing read and write operations across multiple instances. It explains the components like web instances, read replicas (R), write master (M), cache, and connections. The document also covers master-slave and master-master replication, sharding, denormalization, and SQL tuning techniques.
Full Transcript
ℹ 12 Relational Database Management System (GitHub System Design Primer) ** Database ** ** Diagram ** *[ Source: Scaling up to your first 10 million users ]()* This diagram illustrates a database architecture designed to handle high...
ℹ 12 Relational Database Management System (GitHub System Design Primer) ** Database ** ** Diagram ** *[ Source: Scaling up to your first 10 million users ]()* This diagram illustrates a database architecture designed to handle high ==** **== traffic by distributing read and write operations across multiple instances. 1. Web Instances : ==** **== There are multiple web instances at the top of the diagram, each of == ** **== which can process incoming user requests. == == These instances interact with the database for read and write operations and utilize a cache to improve performance. 2. Read Replicas (R) : ==** **== == Two read replica databases are labeled "R." ** **== == == == Web instances send read requests (indicated by blue arrows ) to == ==** **== == == these replicas to offload read traffic from the main database. == By distributing reads across multiple replicas , the system can handle == == more requests without overloading a single database. == 3. Write Master (M) : ==** **== The write master database is labeled "M." ==** **== == == This database is designated for write operations ( data ==** **== == modifications ). == Web instances direct write requests to the master database , ** == == ** == == ensuring that data consistency is maintained across the system. == == 4. Cache : ** == == ** A cache sits to the right of the web instances , linked to each ** == == ** == == instance. Web instances use the cache (indicated by green arrows ) to retrieve == == == frequently accessed data , reducing the load on both the read replicas == == and the write master. == Caching helps improve response times and minimizes the need for == == == repetitive database queries. == 5. Connections : ** == == ** Web instances are connected to both read replicas and the write master database, allowing them to direct requests based on the operation type (read or write). Blue arrows indicate read operations from web instances to the read ** ** replicas. Green arrows represent cache interactions from web instances, which ** ** helps to handle frequently requested data. Write operations are sent to the write master, which then handles data ** ** persistence for the entire system. In this setup: ** == Read scalability is achieved by using multiple read replicas , allowing the == ** == == == system to handle a higher volume of read requests without impacting the == == performance of the master database. == ** == Write consistency is maintained by directing all write operations to a == ** == == == == == single write master. == ** == Caching further optimizes performance by reducing the number of direct == ** == == == == == database accesses for frequently requested data , thus enhancing speed == == == == == == and efficiency. == This architecture is suitable for applications that require high read throughput == == == == and efficient data retrieval , as it reduces the load on the central database == == == == == == == == == == == and improves overall performance. == == == **~ Relational database management system ( RDBMS ) == == ~** A relational database like SQL is a collection of data items organized in tables. == == == == == == == == ** == ACID is a set of properties of relational database transactions. == ** == == [ == == ]() ** == Atomicity - Each transaction is all or nothing == ** == == == == == == ** == Consistency - Any transaction will bring the database from one valid state to == ** == == == == another ** == Isolation - Executing transactions concurrently has the same results as if the == ** == == == transactions were executed serially == == == ** == Durability - Once a transaction has been committed , it will remain so == ** == == == == == == ⠀ There are many techniques to scale a relational database: master-slave replication , * == == * * * ** == == ** ** == master-master replication , federation , sharding , denormalization , and SQL == ** ** == == ** ** == == ** ** == == ** ** == tuning. == ** **~ == Master-slave replication == ~** The master serves reads and writes , replicating writes to one or more slaves , == == * * == == == == * * == == * which serve only reads. * == == — Slaves can also replicate to additional slaves in a tree-like fashion. If the master == == * * == == * * *** *** == == == goes offline , the system can continue to operate in read-only mode until a slave is == == == == == * * == == * promoted to a master or a new master is provisioned. * == == * * == == * * ** Diagram ** *[ Source: Scalability, availability, stability, patterns ]()* This diagram illustrates a master-slave replication model , often used to ==** ** == == increase database read scalability and availability. == == == * * == == 1. Master Database : ** == == ** The master database is the main point for all write and update ** ** *** ** * ==* * ** **== * * ==** ** operations. == == When a client needs to perform a write operation , it connects to the == ** == == ** * * == == * * == master , ensuring that all data modifications go through a single == == == == == == source of truth. == The master is also capable of handling read requests if necessary. == == * * == == * * 2. Slave Databases : ** == == ** There are two slave databases in this architecture, which are read- == ** **== == only copies of the master database. == == == The master database replicates its data to the slaves , keeping them == == *** *** == == == == * ** in sync with the latest changes. ** == == * == Slaves are only used for read operations ; they do not handle any == * * ** == == ** * * ==~~ direct write operations from clients. ~~== * * == == 3. Replication Process : ** == == ** == Data from the master is continuously replicated to each slave == == == *** *** * * == database to ensure consistency. == * * == == * This replication enables each slave to provide up-to-date data to == == * == == * * == == == clients for read requests. == == == 4. Client Requests : ** == == ** ** == Write Requests : A client performing write operations connects == ** == == * ** directly to the master database. ** * == == ** == Read Requests : Clients requesting read operations can connect to == ** == == * ** any of the slave databases , which helps distribute the load and ** * == == == == == reduce the burden on the master. == == Multiple clients can read from the slaves simultaneously, which == * * == == *** *** == improves the system’s scalability and performance for high-read ** **== == ** ** == == workloads. == ** Key Benefits of This Architecture: ** ** Improved Scalability: By offloading read requests to the slave databases, ** the system can handle a larger number of simultaneous client requests. ** Data Consistency: The master database remains the single point for all ** write operations, ensuring that all data modifications are centralized. ** High Availability: If the master experiences issues, the read functionality ** remains available through the slaves, allowing the system to continue handling some requests. This master-slave replication model is commonly used in applications where ** ** read operations are more frequent than writes, helping maintain data consistency while enhancing system performance and scalability. ** == Disadvantage(s): master-slave replication == ** == Additional logic is needed to promote a slave to a master. == * * == == * * == == == See Disadvantage(s): replication for points related to both master-slave and [ ]()== ** ** master-master. ** ==~ Master-master replication ~== ** == Both masters serve reads and writes and coordinate with each other on writes. If == * * == == * * == == * * * * *** *** == either master goes down , the system can continue to operate with both reads and == == == ==* * *** *** * * *** writes. *** == ** Diagram ** *[ Source: Scalability, availability, stability, patterns ]()* This diagram shows a master-master replication model , which is designed ==** ** == for high availability and enables both databases to handle read and write operations simultaneously. 1. Master Databases: ** ** There are two master databases in this setup. == ** **== Each master can perform read and write operations, allowing clients ** ** to connect to either database for data access and updates. This configuration increases redundancy and ensures that if one master fails, the other can continue handling operations without service interruption. 2. Replication : ** == == ** == Data changes on one master are continuously replicated to the == * * == == * ** *** ** ** * * == other master. == This replication keeps both databases in sync, ensuring that each * * * ** *** master has an up-to-date copy of the data. The bidirectional replication flow enables both masters to update == == == each other’s records , maintaining consistency across both databases. == 3. Client Requests: ** ** ** Read/Write Requests: Clients can direct read and write requests to ** either master database. This design allows load distribution, as clients can use both masters to access or modify data, reducing bottlenecks and balancing the workload. ** Key Benefits of This Architecture: ** ** High Availability: By allowing both masters to handle reads and writes, the ** system remains operational even if one master becomes unavailable. ** Load Balancing: Distributing client requests between two masters enhances ** scalability and improves response times, as the workload is shared. ** Data Consistency: Continuous replication ensures that data changes made ** on one master are quickly mirrored to the other, keeping both copies in sync. ** == Considerations : == ** ** == Conflict Resolution : In a master-master setup, simultaneous updates to == ** the same record on both masters can lead to data conflicts , requiring == *** ***== ==* ** mechanisms to resolve inconsistencies. ** ** ***== ** == Latency : Replication between masters introduces some latency , so a == ** == == == delay might occur before updates appear on the other master. == == == The master-master replication model is useful for applications needing high ** ** availability and load distribution, though careful attention to conflict resolution and synchronization is necessary for consistent data management. What mechanisms are required to resolve data conflicts (in Master- **~ Master Database Relication)? ~ ** In a master-master setup, simultaneous updates to the same record can lead to ** ** ** data conflicts. The following mechanisms are commonly used to resolve ** ** ** ** inconsistencies: ** 1. Last-Write-Wins (LWW) ** ** ** Description: The system resolves conflicts by retaining the latest ** * version of the data based on a timestamp. * ** ** ** Pros: Simple and fast. ** ** Cons: Risk of data loss if an older write overwrites a more critical ** update. 2. Conflict-Free Replicated Data Types (CRDTs) ** ** ** Description: Structures like sets or counters automatically merge ** ** ** ** ** conflicting updates based on mathematical rules. ** ** ** Pros: Ensures eventual consistency with no manual intervention. ** ** ** ** Cons: Limited to specific use cases. ** 3. Application-Level Conflict Resolution ** ** ** Description: The application uses custom logic to resolve conflicts, ** ** ** such as merging values or prompting users to decide. ** Pros: Highly flexible and domain-specific. ** ** Cons: Adds complexity to the application logic. ** 4. Priority-Based Resolution ** ** ** Description: Conflicts are resolved based on predefined priorities, ** ** ** such as giving preference to one node or user over another. ** ** ** ** ** Pros: Easy to implement. ** ** Cons: May introduce bias or unfairness. ** 5. Manual Conflict Resolution ** ** ** Description: Conflicts are flagged for human intervention to review ** ** ** and resolve. ** Pros: Accurate resolution for critical data. ** ** ** ** Cons: Time-consuming and not scalable. ** 6. Hybrid Approaches ** ** ** Description: Combines automatic mechanisms (e.g., LWW) for most ** ** ** cases with manual intervention for complex conflicts.** ** ** ** ** Pros: Balances scalability and accuracy. ** ** Cons: Still requires careful design. ** ** Summary ** Conflict resolution mechanisms like Last-Write-Wins, CRDTs, or application ** ** ** ** ** logic ensure that data consistency is maintained in a master-master replication ** ** ** ** setup. Each method has its strengths and trade-offs, and the choice depends ** ** ** ** ** on the system's requirements and business priorities. ** ** ** ** ** == Disadvantage(s): master-master replication == ** You'll need a load balancer or you'll need to make changes to your application == == *** *** * * == logic to determine where to write. == * * == == * Most master-master systems are either loosely consistent ( violating ACID ) or * == == == == * have increased write latency due to synchronization. * == == == == * == Conflict resolution comes more into play as more write nodes are added and == * * * == == * * * as latency increases. * == == * * == See Disadvantage(s): replication for points related to both master-slave and [ ]()== ** ** == == == master-master. == ⠀ ** == Disadvantage(s): replication == ** ℹ There is a potential for loss of data if the master fails before any newly == == *** *** == == == == == == * * == written data can be replicated to other nodes. == * == == * == Writes are replayed to the read replicas. If there are a lot of writes , the read == *** *** == == *** *** == == * * == replicas can get bogged down with replaying writes and can't do as many == == == == reads. == * The more read slaves , the more you have to replicate , which leads to greater * == == * * == == ==* *== == replication lag. == On some systems , writing to the master can spawn multiple threads to write == == * * == == * * == == * * == in parallel , whereas read replicas only support writing sequentially with a == * == == * == == * * == == * * == single thread. == == Replication adds more hardware and additional complexity. == == == == == * * == == ⠀ Source(s) and further reading: replication [ Scalability, availability, stability, patterns ]() [ Multi-master replication ]() **~When Replicating Databases, Are They Replicated On Different Servers? ~ ** ** Yes, when replicating databases, they are typically replicated on different servers. This setup enhances reliability, availability, and performance in ** distributed systems. Here's an overview of why and how it's done: ** Why Replicate Databases on Different Servers? ** 1. High Availability ** ** Replicating data across different servers ensures that even if one server fails, the data remains accessible from other servers. 2. Fault Tolerance ** ** Redundancy across multiple servers prevents a single point of failure, increasing the system's resilience. 3. Load Balancing ** ** Requests can be distributed across replicas, reducing the load on any single server and improving overall performance. 4. Geographic Proximity ** ** Replicas can be placed on servers in different regions to reduce latency for users accessing the database from various locations. 5. Backup and Recovery ** ** Replicas act as real-time backups, making it easier to recover from data corruption or accidental deletions. ** How Are Databases Replicated on Different Servers? ** 1. Replication Types: ** ** ** Master-Slave Replication ** Changes are made to the master database, and updates are ** ** propagated to slave databases. ** ** Used for read-heavy systems where slaves handle read operations. ** Master-Master Replication ** Multiple servers can both read and write, with changes replicated bidirectionally. Used for write-heavy or collaborative applications but requires conflict resolution. ** Clustered Databases ** Databases are distributed across multiple servers as a single logical unit, often used in distributed systems like Cassandra or ** ** ** CockroachDB. ** 2. Synchronous vs. Asynchronous Replication: ** ** ** Synchronous Replication: Changes are immediately written to all ** replicas, ensuring consistency but increasing latency. ** Asynchronous Replication: Changes are propagated after a delay, ** improving performance but risking temporary inconsistency. 3. Tools and Technologies: ** ** ** MySQL: Supports master-slave and master-master replication. ** ** PostgreSQL: Uses tools like Streaming Replication or Logical ** ** ** ** Replication. ** ** MongoDB: Built-in replication for distributed systems. ** ** Cloud Databases: AWS RDS, Google Cloud Spanner, etc., provide ** replication as a managed service. ** Additional Considerations: ** ** Network Latency: Replicating across distant servers can increase ** propagation delays. ** Consistency Models: Systems need to decide between strong consistency ** (all replicas have identical data) or eventual consistency (replicas sync over time). ** Costs: Replicating databases across multiple servers increases hardware ** and operational costs. ** In Summary: ** Database replication across different servers is a standard practice for ensuring ** reliability, fault tolerance, and performance. The exact setup depends on the ** system's requirements, such as consistency, latency, and scalability needs. **~ Federation ~** ** Diagram ** ℹ *[ Source: Scaling up to your first 10 million users ]()* This diagram illustrates a federated database architecture where separate ** == == ** * * == databases (DBs) handle distinct types of data , allowing for independent == * * == == * * == scaling and management of each component. == == == Overview of Components: ** ** Each database cluster (e.g., Forums DB , Users DB , Products DB ) has: ** == == ** == == == == == == ** == M ( Master ): The primary database that handles all write operations. == ** == == == == == == ** == S ( Slave ): A replica of the master that handles read operations. == ** == == == == == == ** == R ( Read Replica ): An additional replica used to further distribute read == ** == == == == operations, enhancing read scalability. == Database Clusters : == 1. Forums DB : ** == == ** This database cluster manages data related to forums. == == == == The master (M) processes all writes, while the slave (S) and read ** ** ** ** ** replica (R) handle read requests. ** This structure reduces the load on the master by distributing read == == == operations to the slave and read replica. == == == == == 2. Users DB : ** == == ** This cluster manages data related to users. Similar to the Forums DB, it has a master for writes and a slave and ** ** ** ** ** read replica for reads. ** By separating user-related data into its own database, this cluster can scale independently of the other databases. 3. Products DB : ** == == ** This cluster is dedicated to managing product-related data. It also has a master for write operations and both a slave and a read ** ** ** ** ** replica for handling read requests. ** The separate structure for products allows scaling specifically for product data, without affecting other database clusters. ** == Federation Concept : == ** ** == Federation in this context means that each database cluster operates == ** * * == == ==* independently , with each cluster dedicated to a specific type of data *== == == == == (Forums, Users, Products). This approach allows for scalability and optimization of each cluster ** == == ** ** == == ** == based on its unique data needs and query patterns. == == == Federated databases also improve fault tolerance since each database * * * == == * ** == == ** == operates independently , reducing the impact of potential failures to only == == == == one type of data. == ** == Benefits of Federated Architecture: == ** ** == Independent Scaling : Each database can be scaled based on its specific == ** == == == read and write load without impacting others. == == == == == ** == Enhanced Performance : Separating data by category (Forums, Users, == ** == == == == Products) optimizes performance for each type of request. == == == == ** == Increased Fault Tolerance : Since each database is isolated , failures in == ** == == == == == one (e.g., Users DB) don’t affect others (e.g., Forums DB, Products DB). == == == This federated database model is effective for large-scale applications that ** ** == == == == == need to manage different types of data separately , ensuring efficient handling == == == == of specific workloads while maintaining overall system stability. == == == == == *** Federation (or functional partitioning) splits up databases by function. For *** * * *** *** * ** == == *** * example, instead of a single, monolithic database , you could have three databases : * * * ==~~ ~~== * * == == ** == forums , users , and products , resulting in less read and write traffic to each == ** ** == == ** ** == == ** == == == == == == == database and therefore less replication lag. == == == — Smaller databases result in more data that can fit in memory , which in turn == == == == == == == results in more cache hits due to improved cache locality. == == == == == == == — With no single central master serializing writes you can write in parallel , ==* * == ==~~ ~~== ~~ ~~ == == == increasing throughput. == == == == Disadvantage(s): federation == == Federation is not effective if your schema requires huge functions or tables. *** *** == == == * == == * == == == == == You'll need to update your application logic to determine which database to == == == == == == == == read and write. == == Joining data from two databases is more complex with a == server link == == == == == == == [ == == ](). == == Federation adds more hardware and additional complexity. == == == == == == == == == ⠀ Source(s) and further reading: federation [ Scaling up to your first 10 million users ]() ⠀ ** ~ What is a “Server Link”? ~ ** A "server link" refers to a connection between two database servers that allows ** ** them to communicate and query each other, even though they are hosted on separate machines. This is commonly used in federated database systems to ** ** perform operations like joining data across different databases. ** ** Here’s an explanation with the requested formatting applied: ** What is a Server Link? ** 1. Definition:** ** A server link is a mechanism that enables one database server to access ** ** and query data from another server. Often used in federated architectures where data is distributed across ** ** multiple databases. Allows queries that span across multiple servers. ** ** 2. How It Works: ** ** A server link creates a logical connection between two database ** ** servers. Queries from one server can include data from the linked server using special syntax (e.g., fully-qualified table names). ** ** For example, in SQL Server, you might use Linked Servers to enable ** ** ** ** cross-server querying. 3. Examples of Server Links: ** ** ** Microsoft SQL Server: Uses Linked Servers to join tables across ** ** ** different databases or servers. ```sql SELECT * FROM ServerA.DatabaseA.SchemaA.TableA AS A JOIN ServerB.DatabaseB.SchemaB.TableB AS B ON A.ID = B.ID; ``` ** PostgreSQL: Uses Foreign Data Wrappers (FDW) to query other ** ** ** PostgreSQL databases or even non-PostgreSQL databases. ** MySQL: Uses FEDERATED storage engine to query tables on remote ** ** ** servers. ** Challenges with Server Links ** ** Complexity in Queries: Joining data from two databases requires careful ** ** handling of schemas and potential latency issues. ** ** ** ** Performance Overhead: Querying data across servers introduces network ** ** latency and may require additional optimizations. ** ** Consistency Issues: Maintaining data consistency across linked databases ** can be challenging in distributed systems. ** Additional Setup: Setting up server links often requires extra ** ** configuration, such as permissions, credentials, and network security. ** In the context of federation, server links are useful for enabling communication ** ** between distributed databases but add complexity and hardware ** ** ** requirements, which is why they are listed as a disadvantage in your text. ** ** ** **~ Sharding ~** ** Diagram ** ℹ *[ Source: Scalability, availability, stability, patterns ]()* This diagram illustrates a sharding architecture where a database is ==** **== == == * divided into smaller, more manageable segments , called " shards ," to improve * == == == == == == == scalability and performance. == == == == Components : == 1. Application : ** == == ** The application is the topmost component, handling requests from users. It interacts with a load balancer to determine the appropriate shard ** == == ** == == for each user request. 2. Load Balancer : ** == == ** The load balancer sits between the application and the shards. == == == == == == It directs user requests to the appropriate database shard based on == == == == == specific criteria. == In this example, the load balancer uses user information (such as the ** == == ** first letter of the user's name) to determine which shard to route requests to. 3. Database Shards : ** == == ** The data is divided across multiple database shards , each managing == == == == a subset of users based on the alphabetical range of their names. == == * * * * ** Shard examples: ** ** User [A-C]: This shard stores data for users whose names start ** with A, B, or C. ** User [D-F]: This shard manages data for users whose names start ** with D, E, or F. ** User [G-I]: This shard holds data for users whose names start with ** G, H, or I. ** User [X-Y]: This shard stores data for users whose names fall ** within X and Y. Each shard can be independently scaled, optimized, and maintained, allowing for efficient handling of large datasets. ~** Process: **~ When a user (e.g., "Adam") makes a request , the application forwards it ** == == ** == == == == to the load balancer. == == The load balancer uses the user's information (in this case, the first letter == == == == of the name) to route the request to the appropriate shard. == == == == For "Adam," the request is directed to the User [A-C] shard, which stores his ** ** data. ** == Benefits of Sharding : == ** ** == Improved Scalability : Each shard can be stored and managed == ** == == == == == == == separately , allowing for horizontal scaling as data grows. == == == == == == =