Questions and Answers
What is the minimum size for records to be compressed in WiredTiger?
- 128 bytes (correct)
- 64 bytes
- 256 bytes
- 512 bytes
What is the maximum size limit for WiredTiger journal files?
- 100 MB (correct)
- 150 MB
- 200 MB
- 50 MB
What happens to older journal files in WiredTiger?
- They are compressed to save space.
- They are automatically removed once they are no longer needed to recover from the last checkpoint. (correct)
- They are permanently retained for historical analysis.
- They are archived indefinitely.
Why does WiredTiger pre-allocate journal files?
How should disk space for journal files in WiredTiger be estimated?
What command is used to initiate a replica set for a shard?
Which of the following parameters must be included when starting a shard with mongod?
What method allows you to enable sharding for a specific database?
When configuring a shard collection, which of the following sharding types equally distributes data across shards?
Which command is used to add a shard to the cluster?
What is the purpose of a journal in MongoDB's storage engine?
In the context of MongoDB, what does the command sh.shardCollection() do?
Which command should be omitted for initializing a replica set member while setting up a shard?
What does a shard key do in MongoDB?
What is the maximum size of each data chunk in MongoDB sharding?
Which sharding strategy divides data into ranges based on shard key values?
What role does the balancer serve in MongoDB sharding?
How many mongod processes can a configuration server replica set have at maximum?
Which of the following describes hashed sharding?
What is the purpose of zone sharding in MongoDB?
What command is used to initiate the configuration server in a replica set?
What is the primary advantage of using sharding in MongoDB?
What role does the 'mongos' serve in a sharded cluster?
Which of the following describes a shard in MongoDB?
What is the function of config servers in a MongoDB sharded cluster?
Which scaling method allows adding more servers to handle increased workload in MongoDB?
How does sharding contribute to high availability in MongoDB?
What does zone sharding facilitate in a distributed database context?
How much additional throughput does each shard add in MongoDB?
What is the default storage engine in MongoDB?
Which method does WiredTiger use to handle transaction concurrency?
How many read and write transactions can WiredTiger handle per node at maximum?
What is the purpose of snapshots in WiredTiger?
What is the frequency of checkpoints in WiredTiger?
Which compression method is employed for journal data in WiredTiger?
What happens to performance when WiredTiger encounters heavy write workloads?
What is a characteristic of document-level concurrency in WiredTiger?
Study Notes
MongoDB Sharding
- Allows horizontal scaling for large workloads and data sets
- Employs a shared-nothing architecture, where nodes don't share resources
- Data is divided into shards, each functioning as a replica set for redundancy and availability
- Each shard handles part of the workload, and shards can be added or removed as demand changes
Two Methods of Addressing System Growth
- Vertical Scaling: Expanding the capacity of a single server by upgrading CPU, RAM, and storage space
- Horizontal Scaling: Distributing the system's dataset and load across multiple servers, adding more servers as needed
Sharded Cluster Components
- Shard: Contains a subset of the sharded data; each shard is deployed as a replica set
- Mongos: Serves as a query router, connecting client applications to the sharded cluster
- Config Servers: Store cluster metadata and configuration settings; must be deployed as a replica set, known as the config server replica set (CSRS)
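The way these three components cooperate can be sketched as a toy Python model. This is not MongoDB code: the hypothetical `ConfigServer` class holds chunk-range metadata, and the hypothetical `Mongos` class uses it to route each document or query to the shard that owns the key. Shard names, range boundaries, and the `user_id` shard key are assumptions for illustration; a real mongos caches routing metadata and forwards operations to shard replica sets over the network.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class ConfigServer:
    """Toy stand-in for the config server replica set: it only stores routing metadata."""
    # Sorted, inclusive lower bound of each chunk's shard-key range (first one acts as MinKey).
    chunk_lower_bounds: list[float]
    # chunk_owners[i] is the shard that holds chunk i.
    chunk_owners: list[str]

    def shard_for_key(self, key: float) -> str:
        # Find the chunk whose [lower, next_lower) interval contains the key.
        idx = bisect_right(self.chunk_lower_bounds, key) - 1
        return self.chunk_owners[idx]


@dataclass
class Mongos:
    """Toy query router: asks the config metadata where a key lives, then uses that shard."""
    config: ConfigServer
    shards: dict[str, list[dict]] = field(default_factory=dict)

    def insert(self, doc: dict, shard_key: str) -> str:
        target = self.config.shard_for_key(doc[shard_key])
        self.shards.setdefault(target, []).append(doc)
        return target

    def find_by_key(self, key: float, shard_key: str) -> list[dict]:
        # Targeted query: only the shard that owns the key is searched.
        target = self.config.shard_for_key(key)
        return [d for d in self.shards.get(target, []) if d[shard_key] == key]


config = ConfigServer([float("-inf"), 1_000, 2_000], ["shardA", "shardB", "shardC"])
router = Mongos(config)
print(router.insert({"user_id": 1_500, "name": "Ada"}, "user_id"))  # shardB
```

Because `find_by_key` consults the metadata before touching any data, only the owning shard is searched, which is the targeted-query behavior a good shard key makes possible.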
MongoDB Sharding Benefits
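- Increased Read/Write Throughput: Distributing data across multiple shards lets them process operations in parallel. Example: if each shard can process 1,000 operations per second, each additional shard adds roughly another 1,000 operations per second, assuming operations are spread evenly across the shards.
- Increased Storage Capacity: Adding a shard increases the total storage capacity. Example: if each shard holds 4 TB, adding another shard increases capacity by an additional 4 TB.
- High Availability: Each shard and the config servers are deployed as replica sets, so data stays available when individual servers fail. Even if an entire shard becomes unavailable, the cluster can still perform partial reads and writes on the remaining shards.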
- Data Locality: Zone sharding facilitates the design of distributed databases for geographically dispersed applications, supporting data residency in specific regions.
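The throughput and capacity examples above assume idealized linear scaling. The arithmetic is simple enough to sketch directly; the default figures of 1,000 operations per second and 4 TB per shard come from those examples, and `cluster_capacity` is a hypothetical helper name. Real clusters scale less than linearly when queries must be broadcast to every shard or when data is unevenly distributed.

```python
def cluster_capacity(num_shards: int, ops_per_shard: int = 1_000,
                     tb_per_shard: float = 4.0) -> tuple[int, float]:
    """Idealized totals: assumes every shard contributes equally and independently."""
    if num_shards < 1:
        raise ValueError("a sharded cluster needs at least one shard")
    return num_shards * ops_per_shard, num_shards * tb_per_shard


for n in (1, 2, 3):
    ops, tb = cluster_capacity(n)
    print(f"{n} shard(s): ~{ops:,} ops/s, ~{tb:g} TB")
```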
MongoDB Sharding Data Distribution
- Shard Key: Used to distribute a collection's documents across shards. Data is partitioned into non-overlapping intervals based on shard key values.
- Data Chunk Size: Each chunk (range) has a default maximum size of 128 MB; the size is configurable.
- Balancer: A background process that automatically migrates chunks between shards to keep data evenly distributed.
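The balancer idea can be sketched as a toy loop: repeatedly move one chunk from the shard holding the most chunks to the shard holding the fewest until the counts differ by at most one. Counting chunks is a deliberate simplification and an assumption of this example; recent MongoDB versions balance by data size and use migration thresholds, and the real migrations run in the background. The `balance` function is hypothetical.

```python
DEFAULT_CHUNK_MB = 128  # default maximum chunk (range) size from the notes above


def balance(chunks_per_shard: dict[str, int]) -> list[tuple[str, str]]:
    """Return the (source, destination) migrations a toy balancer would perform.

    Mutates chunks_per_shard in place so the chunk counts end up (almost) even.
    """
    migrations = []
    while True:
        most = max(chunks_per_shard, key=chunks_per_shard.get)
        least = min(chunks_per_shard, key=chunks_per_shard.get)
        # Stop once no shard has more than one chunk above any other shard.
        if chunks_per_shard[most] - chunks_per_shard[least] <= 1:
            return migrations
        chunks_per_shard[most] -= 1
        chunks_per_shard[least] += 1
        migrations.append((most, least))


counts = {"shardA": 6, "shardB": 0, "shardC": 0}
moves = balance(counts)
print(counts)  # {'shardA': 2, 'shardB': 2, 'shardC': 2}
print(f"{len(moves)} migrations, at most {len(moves) * DEFAULT_CHUNK_MB} MB moved")
```

The "at most" reflects that 128 MB is an upper bound per chunk; chunks can be smaller.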
Sharding Strategies
- Ranged Sharding: Divides data into ranges based on shard key values. Documents with close shard key values are likely to be stored in the same chunk, which allows targeted operations because mongos can route a query to only the shards that hold the requested range.
- Hashed Sharding: Computes a hash of the shard key field's value, and each chunk is assigned a range of hashed values. This spreads data more evenly across shards, especially when the shard key changes monotonically.
- Zone Sharding: Separates data into distinct zones based on application requirements. Each zone can be associated with one or more shards, and a shard can be associated with any number of zones.
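The difference between ranged and hashed sharding is easiest to see with a monotonically increasing shard key, such as a counter or timestamp. The toy below assigns keys to three shards either by fixed key ranges or by equal ranges over a hash of the key. The range boundaries, shard names, and the hash (first 8 bytes of MD5) are assumptions for illustration, loosely modeled on the idea of a 64-bit hashed key; this is not how MongoDB computes hashed index values or chooses chunk bounds.

```python
import hashlib
from collections import Counter

SHARDS = ["shardA", "shardB", "shardC"]


def ranged_shard(key: int, upper_bounds: tuple[int, ...] = (1_000, 2_000)) -> str:
    # Chunks cover [MinKey, 1000), [1000, 2000), [2000, MaxKey); one chunk per shard here.
    for shard, bound in zip(SHARDS, upper_bounds):
        if key < bound:
            return shard
    return SHARDS[-1]


def hashed_shard(key: int) -> str:
    # Hash the key to a 64-bit integer, then split the hash space into equal ranges.
    digest = hashlib.md5(str(key).encode()).digest()
    h = int.from_bytes(digest[:8], "big")  # 0 <= h < 2**64
    return SHARDS[h * len(SHARDS) // 2**64]


# Simulate 300 inserts with a monotonically increasing key (e.g. an order counter).
new_keys = range(5_000, 5_300)
print("ranged:", Counter(ranged_shard(k) for k in new_keys))
print("hashed:", Counter(hashed_shard(k) for k in new_keys))
```

With ranged sharding every new key lands in the last chunk, so one shard absorbs all inserts; hashed sharding spreads them out, at the cost that a range query on the key must reach every shard.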
How to Create a Sharded Collection in MongoDB
- Step 1: Set up the config servers:
  - Start each config server replica set member with the --configsvr option.
  - Connect to one of the replica set members using mongosh.
  - Run rs.initiate() on that member to initiate the replica set.
- Step 2: Set up the shards:
  - Start each shard replica set member with the --shardsvr option and a replica set name that is unique to that shard.
  - Connect to one of the replica set members using mongosh.
  - Run rs.initiate() on that member. Do not include the configsvr setting in the replica set configuration; it applies only to config servers.
- Step 3: Start mongos:
  - Start mongos and point it to the config server replica set using mongos --configdb.
- Step 4: Configure and enable sharding:
  - Connect to mongos using mongosh.
  - Add each shard to the cluster using sh.addShard().
  - Enable sharding for the database using sh.enableSharding().
  - Shard the collection with the sh.shardCollection() method. Choose hashed sharding for an even data distribution, or ranged sharding to keep documents with nearby shard key values together.
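The same sequence can be expressed through the underlying database commands (`replSetInitiate`, `addShard`, `enableSharding`, `shardCollection`) and the mongos `--configdb` argument. The sketch below only builds those documents as Python dicts and checks their shape; it does not connect to a server. Host names, ports, replica set names (`cfgRS`, `shard1RS`), the `shop.orders` namespace, and the `customerId` shard key are hypothetical, as are the helper functions.

```python
def replica_set_config(name: str, hosts: list[str], config_server: bool = False) -> dict:
    """Config document passed to rs.initiate(), i.e. the replSetInitiate command."""
    config = {"_id": name, "members": [{"_id": i, "host": h} for i, h in enumerate(hosts)]}
    if config_server:
        # Only config server replica sets (members started with --configsvr) set this flag.
        config["configsvr"] = True
    return config


def seed_list(rs_name: str, hosts: list[str]) -> str:
    # "<replSetName>/host1:port,host2:port" form used by --configdb and addShard.
    return f"{rs_name}/{','.join(hosts)}"


CFG_HOSTS = [f"cfg{i}.example.net:27019" for i in (1, 2, 3)]
SHARD_HOSTS = [f"s1{c}.example.net:27018" for c in "abc"]

# Steps 1 and 2: initiate the config server and shard replica sets.
step1 = {"replSetInitiate": replica_set_config("cfgRS", CFG_HOSTS, config_server=True)}
step2 = {"replSetInitiate": replica_set_config("shard1RS", SHARD_HOSTS)}

# Step 3: command line for starting mongos against the config server replica set.
step3 = ["mongos", "--configdb", seed_list("cfgRS", CFG_HOSTS)]

# Step 4: commands run through mongos.
step4 = [
    {"addShard": seed_list("shard1RS", SHARD_HOSTS)},
    {"enableSharding": "shop"},
    # Use {"customerId": 1} instead for ranged sharding.
    {"shardCollection": "shop.orders", "key": {"customerId": "hashed"}},
]

print(" ".join(step3))
```

With the PyMongo driver, each document in `step4` could be passed to `MongoClient(...).admin.command()` on a connection to mongos, and `step1` or `step2` on a connection to a member of the matching replica set; treat this as an untested suggestion rather than a verified deployment script.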
MongoDB Storage Systems
- Storage Engine: The primary component responsible for managing data.
- GridFS for Self-Managed Deployments: A storage system for handling files exceeding the 16MB document size limit.
- Journal: A write-ahead log that, together with checkpoints, allows MongoDB to recover data after a hard shutdown.
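A toy model shows why a write-ahead journal plus periodic checkpoints is enough for recovery: the state as of the last checkpoint is on disk, every later change is appended to the journal before it is applied, and after a crash the journal entries newer than the checkpoint are replayed. The `ToyStore` class and its methods are hypothetical; WiredTiger's real journal and checkpoint formats are far more involved.

```python
class ToyStore:
    """Hypothetical key-value store illustrating checkpoint + write-ahead journal recovery."""

    def __init__(self) -> None:
        self.memory: dict[str, str] = {}       # in-memory data, lost on a crash
        self.checkpoint: dict[str, str] = {}   # last durable snapshot on "disk"
        self.checkpoint_lsn = 0                # last journal position covered by the checkpoint
        self.journal: list[tuple[int, str, str]] = []  # durable (lsn, key, value) records

    def write(self, key: str, value: str) -> None:
        lsn = len(self.journal) + 1
        self.journal.append((lsn, key, value))  # log the change first (write-ahead)...
        self.memory[key] = value                # ...then apply it in memory

    def take_checkpoint(self) -> None:
        # WiredTiger writes a checkpoint every 60 seconds by default; here it is manual.
        self.checkpoint = dict(self.memory)
        self.checkpoint_lsn = len(self.journal)

    def crash_and_recover(self) -> None:
        self.memory = dict(self.checkpoint)     # restart from the last checkpoint
        for lsn, key, value in self.journal:
            if lsn > self.checkpoint_lsn:       # replay only changes made after it
                self.memory[key] = value


store = ToyStore()
store.write("a", "1")
store.take_checkpoint()
store.write("a", "2")
store.write("b", "3")
store.crash_and_recover()
print(store.memory)  # {'a': '2', 'b': '3'}
```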
WiredTiger Storage Engine
- Default Engine: Automatically selected unless otherwise specified. Supported in all MongoDB deployments.
- Operation & Limitations:
- Uses optimistic concurrency control.
- The number of concurrent storage engine transactions is adjusted dynamically to optimize throughput, up to a limit of 128 read and 128 write transactions per node.
- WiredTiger cache isn't partitioned, potentially degrading performance under heavy write workloads.
- Document-Level Concurrency: Write operations use document-level concurrency control, so multiple clients can modify different documents of a collection at the same time.
- Snapshots & Checkpoints: Using MultiVersion Concurrency Control (MVCC), WiredTiger gives each operation a consistent point-in-time snapshot of the data when the operation starts. Checkpoints make data durable by writing a consistent snapshot to disk every 60 seconds.
- Journal & Compression:
- The journal persists all data modifications made between checkpoints and is compressed by default using the snappy library.
- Compression minimizes storage space, but requires additional CPU usage.
- WiredTiger uses block compression for collections and prefix compression for indexes.
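- Journal File Size Limit: Each journal file has a maximum size of approximately 100 MB; once a file exceeds that limit, WiredTiger starts a new one. WiredTiger automatically removes old journal files, keeping only those needed to recover from the last checkpoint.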
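The journal rules covered in these notes and the quiz (records of 128 bytes or less stay uncompressed, a new file starts once the current one passes roughly 100 MB, and only files needed to recover from the last checkpoint are kept) can be sketched as below. `zlib` stands in for snappy, which is not in the Python standard library, the size limit is scaled down so the example runs instantly, and the `ToyJournal` class is hypothetical.

```python
import zlib

COMPRESS_THRESHOLD = 128  # records of 128 bytes or less are stored uncompressed
MAX_FILE_BYTES = 1_000    # scaled-down stand-in for the ~100 MB journal file limit


class ToyJournal:
    """Hypothetical journal: compresses large records and rotates files by size."""

    def __init__(self) -> None:
        self.files: list[list[bytes]] = [[]]  # each inner list models one journal file
        self.sizes: list[int] = [0]           # bytes written to each file

    def append(self, record: bytes) -> bytes:
        # zlib substitutes for snappy; a real journal would also flag compressed records.
        stored = zlib.compress(record) if len(record) > COMPRESS_THRESHOLD else record
        self.files[-1].append(stored)
        self.sizes[-1] += len(stored)
        if self.sizes[-1] > MAX_FILE_BYTES:  # file passed the limit: start a new one
            self.files.append([])
            self.sizes.append(0)
        return stored

    def checkpoint(self) -> None:
        # Closed files only hold changes now covered by the checkpoint, so drop them.
        self.files = self.files[-1:]
        self.sizes = self.sizes[-1:]


journal = ToyJournal()
for _ in range(25):
    journal.append(bytes(range(100)))  # 100-byte records: below the threshold
print(len(journal.files), "files before checkpoint")
journal.checkpoint()
print(len(journal.files), "file after checkpoint")
```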