Questions and Answers
What is a key characteristic of persistent block storage?
Which type of storage is not meant for sharing across multiple VM instances?
What is a disadvantage of using block storage for application data?
Which of the following is a feature of NoSQL databases?
What is a primary use of block storage in cloud computing?
Which storage solution is specifically mentioned as an example of temporary block storage?
What technique is crucial for ensuring data integrity with persistent block storage?
What do cloud applications mainly utilize for state management?
What is the initial event in the example of event sourcing provided?
What does CQRS stand for in the discussed architecture?
What is a common challenge when managing application state with microservices?
What is a primary feature of Dropbox's client application?
What should be done if a query is frequently used, according to the suggested implementation?
How does Dropbox handle changes notified to clients?
Which type of database is mentioned for its scalability and elasticity?
What type of storage did Dropbox primarily use until 2016?
What is the role of the Order Manager (OM) in the event sourcing example?
In Dropbox, what is the role of metadata servers?
What disadvantage is mentioned regarding materialized views created by a microservice?
What is the maximum size for chunks in which files are divided in Dropbox?
Which type of databases are suitable for specialized data such as documents or graphs?
What type of API does Dropbox utilize for communication between clients and servers?
What unique identifier is used for each chunk in Dropbox's file storage system?
Which of the following describes the sharing capabilities of Dropbox?
What is the primary purpose of database sharding?
Which of the following statements about SQL queries in sharded databases is true?
What is a key limitation of using automatic sharding?
In primary/secondary replication, what must be done with write requests?
What happens when adding or removing a node in a hash-partitioned sharding plan?
What is necessary for effective load balancing in sharding?
Which of the following is a common challenge when scaling relational databases through sharding?
What is the typical method for splitting tables in a sharded database?
What is the primary reason for using two different storage systems in the discussed architecture?
What is a key feature of relational databases?
What does the term 'polyglot architecture' refer to in the context of databases?
What is a key requirement for metadata to ensure integrity in file operations?
Which statement best describes the transactions required for metadata?
What is a common limitation of relational databases when used in large cloud applications?
In the context of sharding, how is data partitioned for metadata?
What does ACID stand for in the context of database transactions?
What type of database is suggested to manage metadata due to its consistency requirements?
What is one advantage of relational databases mentioned in the content?
How do Dropbox clients receive server IPs for load balancing?
Which of the following is a characteristic that may limit relational database usage in cloud applications?
Which database option is a managed service provided by Amazon?
What role does sharding play in the context of metadata?
What is a possible drawback of using a layered monolith architecture?
What is a characteristic feature of the load balancing strategy mentioned?
Study Notes
Cloud Computing - Lesson 4: Storage and State Management
Announcements:
- Feedback for the first quiz will be available after the course.
- The second quiz is available on Moodle at 12:45 today.
- Quiz deadline: Wednesday, Oct. 16, 10:45.
- Review deadline: Wednesday, Oct. 23, 10:45.
Objectives
- Present storage solutions for IaaS environments.
- Discuss relational database (SQL) scalability and introduce NoSQL databases.
- Describe the Dropbox service.
- Explain state management techniques in microservices applications.
Block Storage
- Virtual disks: Available for mounting within a VM instance, acting like local hard drives or SSDs. Guest operating systems must mount storage using a file system.
- Temporary/Local storage: Attached to the host or accessed through a local SAN (Storage Area Network). Content is lost when the VM is shut down. Example: Amazon EC2 Instance Store.
- Persistent block storage: Maintains data across VM shutdowns/restarts. Requires a clean shutdown to preserve file system integrity. Example: Amazon Elastic Block Store (EBS).
Storage for Applications
- Block storage: Used for OS, libraries, and application binaries/containers.
- File systems: A file system inside a single VM is not suitable for application data. It is not resilient to instance failure, and its data is difficult to share across servers.
- Databases: Essential for application data management across multiple servers.
Databases
- Flavors: Databases come in many varieties, including relational (SQL) and NoSQL.
- Layered monoliths: Use a single (or a few) all-purpose databases accessed by all layers.
- Service-Oriented Architecture (SOA): Each microservice chooses the most suitable database (polyglot architecture).
Relational Databases
- Platform availability: Found on many cloud platforms (e.g., Heroku Postgres, Amazon Aurora, Amazon RDS).
- Developer understanding: Well understood by developers, with a standard query language (SQL).
- ACID semantics: Support transactions with Atomicity, Consistency, Isolation, and Durability.
- Structured data queries: Efficient at complex queries on structured data.
- Join queries: Support the creation of join queries between related tables.
- Optimized Query Engines: Employ highly optimized query engines for efficiency.
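As a minimal sketch of the transaction and join support listed above, the following uses Python's built-in sqlite3 module as a stand-in for a cloud-hosted relational database. The tables, columns, and the `transfer` helper are invented for illustration and do not come from the lesson.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE accounts (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            balance INTEGER NOT NULL CHECK (balance >= 0)
        );
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO accounts VALUES (10, 1, 100), (20, 2, 50);
    """)
    return conn

def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` between two accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if the source would go negative,
            # in which case the credit above is rolled back as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn: sqlite3.Connection) -> dict:
    """Join query across the two related tables."""
    rows = conn.execute("""
        SELECT u.name, a.balance
        FROM users AS u JOIN accounts AS a ON a.user_id = u.id
        ORDER BY u.name
    """)
    return dict(rows.fetchall())
```

A failed transfer leaves both balances unchanged, which is the atomicity part of ACID; the join in `balances` is the kind of query that becomes harder once tables are spread over several servers.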
Problems with Relational Databases
- Scalability and elasticity: Cannot always adapt to large cloud application demands.
- Example: A single-node MySQL performance graph shows diminishing returns with increasing thread counts, highlighting the limitations in scalability within basic relational databases.
Scaling relational databases
- Vertical scaling: Increasing the resources of a single server. This is often expensive and does not scale reliably.
- Example: Running a large enterprise Oracle database on an IBM mainframe is a form of vertical scaling, and a less efficient approach than scaling out.
Scalability Cube
- X-axis: Horizontal duplication (scaling by cloning).
- Y-axis: Functional decomposition (scaling by splitting different things).
- Z-axis: Data partitioning (scaling by splitting similar things).
- Now: Microservices & horizontal scalability are preferred, as shown in the scalability cube.
Horizontally scaling a relational database
- Database sharding: Split tables across multiple database instances.
- Proxy: A proxy receives queries and distributes them to shards, aggregating the results.
- Examples: PL/Proxy for PostgreSQL, MySQL Cluster.
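The proxy idea above can be sketched in a few lines. This is a toy model, not how PL/Proxy or MySQL Cluster work internally: each shard is a plain dictionary standing in for a database instance, and the `ShardProxy` class and its method names are hypothetical.

```python
import hashlib

class ShardProxy:
    """Toy proxy in front of several shards (dicts standing in for databases)."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key: str) -> dict:
        # Hash the primary key so rows spread evenly over the shards.
        digest = hashlib.sha256(key.encode()).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key: str, row: dict) -> None:
        self._shard_for(key)[key] = row

    def get(self, key: str):
        # A lookup by primary key touches exactly one shard.
        return self._shard_for(key).get(key)

    def count_where(self, predicate) -> int:
        # A query that is not on the sharding key must visit every shard
        # (scatter); the proxy then merges the partial results (gather).
        return sum(
            sum(1 for row in shard.values() if predicate(row))
            for shard in self.shards
        )
```

Point lookups scale with the number of shards, while `count_where` costs the same work on every shard no matter how many there are.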
Sharding Limits
- Primary key splits: Tables are commonly split based on their primary key, using a hash function for balanced distribution.
- Complex queries: Often involve all shards, hindering scalability.
- Automatic sharding: Generic, application-agnostic sharding tends to scale poorly.
- Specific sharding: Designing a good application-dependent sharding strategy is complex and workload-dependent.
- Load balancing: Ensuring consistent load balancing is vital in sharded systems.
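To make the primary-key hashing concrete, here is a small sketch that assigns keys to shards with `hash(key) mod N` and measures what happens when N changes. The function names and the key format are assumptions made for the example.

```python
import hashlib
from collections import Counter

def shard_index(key: str, num_shards: int) -> int:
    """Map a primary key to a shard with a hash function and a modulo."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def load_per_shard(keys, num_shards: int) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(shard_index(k, num_shards) for k in keys)

def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_index(k, old_n) != shard_index(k, new_n))
    return moved / len(keys)

keys = [f"user-{i}" for i in range(10_000)]
load = load_per_shard(keys, 4)          # roughly even distribution
fraction = moved_fraction(keys, 4, 5)   # most keys change shard
```

With this plain modulo scheme, going from 4 to 5 shards keeps a key in place only when its hash gives the same remainder modulo 4 and modulo 5, which holds for 4 out of every 20 hash values. Roughly 80% of the rows therefore have to be moved, which is one reason resizing a hash-partitioned relational database is costly.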
Primary/Secondary Replication
- Write requests are processed by a single primary instance; reads can be processed by multiple secondary instances. This design is only appropriate for read-dominated use cases.
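The routing rule described above can be sketched as follows. The `ReplicatedStore` class is hypothetical, replicas are plain dictionaries, and replication is done synchronously for simplicity, whereas real systems often replicate asynchronously.

```python
import itertools

class ReplicatedStore:
    """Toy primary/secondary setup: one writer, several read replicas."""

    def __init__(self, num_secondaries: int):
        self.primary = {}
        self.secondaries = [dict() for _ in range(num_secondaries)]
        self.reads_served = [0] * num_secondaries
        self._next = itertools.cycle(range(num_secondaries))

    def write(self, key, value) -> None:
        # Every write goes through the single primary...
        self.primary[key] = value
        # ...which then propagates it to all secondaries.
        for replica in self.secondaries:
            replica[key] = value

    def read(self, key):
        # Reads are spread over the secondaries in round-robin order.
        i = next(self._next)
        self.reads_served[i] += 1
        return self.secondaries[i].get(key)
```

Adding secondaries increases read capacity, but every write still has to pass through the one primary and be applied on all replicas, which is why the design suits read-dominated workloads.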
NoSQL Databases
- "Not Only SQL": A family of databases with different flavors.
- Focus: Horizontal scaling and elasticity, simpler query models than SQL, and usually no fixed schema.
- Usage: In large cloud web applications.
- Examples: Key/Value, Document-oriented, Graph-oriented, Column stores.
Key/Value Stores
- Hash Map: A global hash map enabling fast key lookups (put(key,value), value = get(key)).
- Horizontal scaling: Designed for this, splitting key ranges amongst separate servers.
- Consistent hashing: Used for distributing keys efficiently among servers.
- Examples: Apache Cassandra.
NoSQL: Adding Servers
- The range of keys assigned to each server is re-distributed to accommodate new servers in a NoSQL key/value store. This permits efficient continuous scaling.
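A minimal consistent-hashing sketch shows why adding a server is cheap in this design. The `HashRing` class, the number of virtual nodes, and the server names are all assumptions for illustration; production stores such as Cassandra use more elaborate schemes.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Position of a string on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class HashRing:
    """Consistent hashing with virtual nodes."""

    def __init__(self, servers=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []   # sorted positions on the ring
        self._owner = {}    # position -> server name
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        # Each server owns several points so that load spreads evenly.
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def server_for(self, key: str) -> str:
        # The owner is the first point clockwise from the key's position,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

def moved_keys(ring: HashRing, keys, new_server: str) -> dict:
    """Add a server and return {key: new owner} for the keys that moved."""
    before = {k: ring.server_for(k) for k in keys}
    ring.add_server(new_server)
    return {k: ring.server_for(k) for k in keys if ring.server_for(k) != before[k]}
```

When a fifth server joins a four-server ring, only the keys that now fall just before one of the new server's points change owner, and they all move to the new server. The other servers do not exchange data among themselves.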
Apache Cassandra
- A NoSQL key/value store that prioritizes throughput and linear scalability. Large deployments run on thousands to hundreds of thousands of servers (e.g. at Apple).
- Stores data as BLOBs and supports huge deployments.
Object Stores
- Immutable data: Data created only once; no updates allowed.
- Versioning: May employ a version number for identifying different states of the object.
- URI-based access: Clients access each object through a unique Uniform Resource Identifier (URI).
- Managed service: Typically offered as a managed service by cloud providers (e.g., S3, Azure Blob).
- Caching and deployment: Enabled by Content Delivery Networks (CDNs).
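The immutability and versioning properties listed above can be illustrated with a small in-memory model. The `ObjectStore` class and the `objects.example.com` base URI are placeholders; this is not the API of S3 or Azure Blob Storage.

```python
from __future__ import annotations

import uuid

class ObjectStore:
    """Toy object store: objects are immutable, writes create new versions."""

    def __init__(self, base_uri: str = "https://objects.example.com"):
        self.base_uri = base_uri
        self._versions: dict[str, list[bytes]] = {}

    def put(self, data: bytes, key: str | None = None) -> tuple[str, int]:
        """Store `data` as a new version and return (uri, version number)."""
        key = key or uuid.uuid4().hex
        versions = self._versions.setdefault(key, [])
        versions.append(bytes(data))  # earlier versions are never modified
        return f"{self.base_uri}/{key}", len(versions)

    def get(self, uri: str, version: int | None = None) -> bytes:
        """Fetch the latest version, or a specific one (numbered from 1)."""
        key = uri.rsplit("/", 1)[-1]
        versions = self._versions[key]
        return versions[-1] if version is None else versions[version - 1]
```

Because a given URI and version always return the same bytes, responses can be cached aggressively, which is what makes object stores a good fit for CDNs.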
Example of Amazon S3
- Buckets: Objects are stored inside buckets.
- Versioning: Allows multiple past versions of the object.
- Encrypted: Uses automatic encryption and access control mechanisms.
Creating a Bucket
- Name and Region: Define the bucket's name and location.
- Properties and Permissions: Configure bucket-level settings and access privileges.
Uploading Content
- File Upload: The process of placing files into relevant S3 buckets.
Setting Public Access
- Users: Set which users have permissions to read or write to the specified objects.
Using URI from Client
- Clients access objects in an S3 bucket through a unique Uniform Resource Identifier (URI).
Usage of object storage in project
- Azure Blob Storage: Cloud storage service, similar to Amazon S3.
- Public vs. Local Images: Returning public URIs instead of local images promotes distributed caching.
- REST calls: Use REST calls (PUT) for uploading images to be manipulated by an Azure function.
Document-Oriented NoSQL
- Structured documents: Store structured documents typically as JSON or YAML instead of BLOBs.
- Indexing: Allow indexing on internal fields/values.
- Search capabilities: Support search capabilities.
- Collections and hierarchies: Support for storing data in collections and hierarchies.
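To illustrate indexing on internal fields, here is a toy document collection that keeps an index on a nested field. The `Collection` class and the dotted-path convention are assumptions for the sketch and are not the MongoDB or CouchDB API; updates and deletes are left out.

```python
from collections import defaultdict

class Collection:
    """Toy document collection with optional indexes on (nested) fields."""

    def __init__(self, indexed_fields=()):
        self.docs = {}
        # One index per field: value -> set of document ids.
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    @staticmethod
    def _lookup(doc, path: str):
        """Follow a dotted path such as 'address.city' into nested dicts."""
        for part in path.split("."):
            if not isinstance(doc, dict) or part not in doc:
                return None
            doc = doc[part]
        return doc

    def insert(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = doc
        for field, index in self.indexes.items():
            value = self._lookup(doc, field)
            if value is not None:  # indexed values must be hashable
                index[value].add(doc_id)

    def find(self, field: str, value) -> list:
        if field in self.indexes:
            ids = self.indexes[field].get(value, set())   # index lookup
        else:
            ids = [i for i, d in self.docs.items()        # full scan
                   if self._lookup(d, field) == value]
        return [self.docs[i] for i in sorted(ids)]
```

Unlike a key/value store holding opaque BLOBs, the store can look inside each document, so a query on `address.city` does not have to scan every document once that field is indexed.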
Most popular NoSQL storage system & Demonstration
- MongoDB: The most widely used NoSQL database; its use is demonstrated in a course tutorial.
- Apache CouchDB: A NoSQL database system suitable for microservice scenarios.
Graph-oriented NoSQL Databases
- Store relations: Graph-oriented databases store relations between keys, as a navigable graph.
- Example: Neo4j, used for instance to model social networks.
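The kind of navigation a graph database is built for can be sketched with a breadth-first traversal over an adjacency map. The `friends` data and the `within_hops` function are invented for the example and are unrelated to Neo4j's query language.

```python
from collections import deque

def within_hops(graph: dict, start: str, max_hops: int) -> dict:
    """Breadth-first search: {node: distance} for nodes within max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[start]
    return dist

# A small, made-up social network stored as adjacency sets.
friends = {
    "ann": {"bob", "cy"},
    "bob": {"ann", "dee"},
    "cy": {"ann"},
    "dee": {"bob", "eve"},
    "eve": {"dee"},
}
```

A "friends of friends" query is then `within_hops(friends, "ann", 2)`. In a relational database the same question would need one self-join per hop.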
Column Stores
- Full column: Stores a complete column of values for an attribute.
- Aggregation and queries: Efficient in handling aggregation and search queries.
- Joining: Join operations are complex and are implemented on the client side.
- Examples: Cloud-based solutions such as Apache HBase or Google Bigtable.
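The difference between row and column layout can be shown with plain Python lists. The sample rows and the helper names are made up, and the sketch assumes every row has the same fields.

```python
rows = [
    {"id": 1, "country": "FR", "amount": 20},
    {"id": 2, "country": "DE", "amount": 35},
    {"id": 3, "country": "FR", "amount": 10},
]

def to_columns(rows: list) -> dict:
    """Turn a list of rows into one list of values per attribute."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def sum_where(columns: dict, value_col: str, filter_col: str, wanted) -> int:
    """Aggregate one column, filtered on another; other columns are not read."""
    return sum(
        v for v, f in zip(columns[value_col], columns[filter_col]) if f == wanted
    )

columns = to_columns(rows)
total = sum(columns["amount"])  # scans only the 'amount' column
```

An aggregation reads only the columns it needs, stored contiguously, instead of loading every full row. Reassembling complete rows, or joining with another table, is the expensive direction.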
Conclusion
- Cloud applications depend on database choices, whether relational, NoSQL, or specialized databases for data such as graphs or documents.
- Implementing microservices complicates state management compared to a single database.
- Event sourcing offers an approach for handling database consistency and interactions in complex microservice deployments.
CQRS (Command Query Responsibility Segregation)
- Frequent queries: Microservices may need to frequently query data when accessing across services.
- Event sourcing implementation: Answering queries directly from the stream of events can be inefficient.
- Separate logic: Separate the logic for commands (updates) from the logic for queries to improve efficiency.
- Queries as views: Implement queries by creating materialized views or by subscribing to events.
- Long-term queries: Plan ahead for frequent queries, as implementing SQL on demand might not be efficient.
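The separation described above can be sketched with an append-only event log, a command side that only records events, and a query side that maintains a materialized view by subscribing. The class names, event types, and fields are hypothetical and are not taken from the lesson's own example.

```python
from collections import defaultdict

class EventStore:
    """Append-only event log that notifies subscribers of each new event."""

    def __init__(self):
        self.events = []
        self.subscribers = []

    def subscribe(self, handler) -> None:
        self.subscribers.append(handler)

    def append(self, event: dict) -> None:
        self.events.append(event)
        for handler in self.subscribers:
            handler(event)

class OrderCommands:
    """Command side: records what happened as events, answers no queries."""

    def __init__(self, store: EventStore):
        self.store = store

    def create_order(self, order_id: str, customer: str) -> None:
        self.store.append(
            {"type": "OrderCreated", "order_id": order_id, "customer": customer}
        )

    def add_item(self, order_id: str, price: int) -> None:
        self.store.append({"type": "ItemAdded", "order_id": order_id, "price": price})

class OrderTotalsView:
    """Query side: materialized view of the total spent per customer."""

    def __init__(self, store: EventStore):
        self.total_by_customer = defaultdict(int)
        self._customer_of = {}
        store.subscribe(self.apply)

    def apply(self, event: dict) -> None:
        if event["type"] == "OrderCreated":
            self._customer_of[event["order_id"]] = event["customer"]
        elif event["type"] == "ItemAdded":
            customer = self._customer_of[event["order_id"]]
            self.total_by_customer[customer] += event["price"]

def rebuild_view(events: list) -> OrderTotalsView:
    """Recreate the view from scratch by replaying the full event log."""
    view = OrderTotalsView(EventStore())
    for event in events:
        view.apply(event)
    return view
```

A frequent query such as "total per customer" is answered from the precomputed view instead of replaying the log each time. The cost is that each view has to be planned in advance and kept in sync with the events.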
Dropbox Case Study
- Cloud storage: Provides a private file store, accessible via Web interface, users' devices, and a developer API for third-party applications.
- Sharing: The same folder can be shared across multiple users' spaces.
- Abstraction: Abstracted filesystem (files, directories, links, spaces).
- Client side implementations: Local folder changes get synchronized to the cloud.
- Chunked files: Large files are split into chunks, each identified by its SHA-256 hash. Differences in chunk contents are sent as compressed binary diffs.
- Metadata Servers: The metadata (e.g. who owns the files) is stored on servers within the Dropbox private cloud. The metadata service is backed by a relational database; it plays a role similar to inodes and is responsible for data integrity and consistency.
- Data Storage: Amazon EC2 and the S3 object store were used (until 2016) for keeping, managing, sharing, and retrieving large files. These servers store encrypted files.
- API: REST-style calls over HTTPS, though not strictly RESTful.
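The chunking scheme in the list above can be sketched as content-addressed storage: each chunk is named by its SHA-256 hash, and a client uploads only the chunks the server does not already hold. The `ChunkStore` class is hypothetical, the chunk size is a deliberately tiny placeholder, and the compressed binary diffs mentioned above are not implemented here.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; a tiny placeholder so the example is easy to follow

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Cut the file contents into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def chunk_id(chunk: bytes) -> str:
    """A chunk is identified by the SHA-256 hash of its contents."""
    return hashlib.sha256(chunk).hexdigest()

class ChunkStore:
    """Toy server-side chunk storage keyed by chunk hash."""

    def __init__(self):
        self.chunks = {}

    def upload(self, data: bytes) -> tuple:
        """Return (ordered chunk ids, number of chunks actually uploaded)."""
        ids, uploaded = [], 0
        for chunk in split_into_chunks(data):
            cid = chunk_id(chunk)
            if cid not in self.chunks:  # skip chunks the server already has
                self.chunks[cid] = chunk
                uploaded += 1
            ids.append(cid)
        return ids, uploaded

    def download(self, ids: list) -> bytes:
        """Reassemble a file from its ordered list of chunk ids."""
        return b"".join(self.chunks[cid] for cid in ids)
```

Uploading `b"aaaabbbbcccc"` and then `b"aaaaXXXXcccc"` transfers three chunks the first time and only one the second time, because the unchanged chunks hash to ids the store already knows. The ordered list of chunk ids is the kind of information that belongs in the metadata service.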
Dropbox Architecture
- Storage servers: Store encrypted files based on file identification.
- Processing Servers: Encryption and application services.
- Metadata servers: Store information about the files (metadata service, database).
- Notification servers: Store notifications about file changes.
- Multiple devices: The client devices of end-users accessing Dropbox.
Dropbox Interaction
- Client-side: Clients register with Dropbox, then create and change files and folders within their accounts. They also handle communication with Dropbox servers over HTTPS.
- Server-side: Servers organize the Dropbox services, and manage and handle connections with clients.
- Storage services: Manage the data used by the files.
Reason for using two storage systems (Dropbox)
- Data and metadata consistency and speed: Metadata is critical and must maintain tight consistency; maintaining the appropriate relationships among files and folders speeds up file lookups.
- Data storage efficiency: The data (the file itself) can be cheaply stored and scaled.
Description
Test your knowledge on cloud storage solutions and NoSQL databases with this quiz. Questions cover persistent block storage, data integrity, application state management, and more. Perfect for those studying cloud computing and database management.