Questions and Answers
Which type of storage is available for mounting in a VM instance and behaves like a local hard drive?
What is a characteristic of temporary and local block storage?
What distinguishes persistent block storage from temporary storage?
Which storage solution would be appropriate for storing application data across multiple servers?
In which scenario is using a database more advantageous than block storage?
Why might a file system be inappropriate for storing application data in a VM?
What is a limitation of local block storage mentioned in the content?
What must occur for file system integrity in persistent block storage?
What is a characteristic feature of NoSQL databases?
What is the primary advantage of key/value stores in a NoSQL context?
Which statement is true regarding relational databases?
What does the term 'elasticity' refer to in the context of NoSQL databases?
What is a potential drawback of using relational databases compared to NoSQL?
How do key/value stores manage the distribution of data?
Which of the following is a common reason for opting for NoSQL databases over traditional relational databases?
What is the design goal of horizontal scaling in NoSQL databases?
What is a key characteristic of object stores in the context of NoSQL databases?
Which of the following is a known deployment example of Apache Cassandra?
What type of data does Apache Cassandra primarily focus on managing?
What advantage do object stores offer when resources are accessed by clients?
Which Amazon S3 feature allows users to manage multiple versions of an object?
What is a benefit of using CDNs in conjunction with object stores?
What is the primary purpose of creating a bucket in Amazon S3?
Which mechanism is used for securing access to objects in Amazon S3?
What must be done atomically to prevent inconsistency in a database when managing customer orders?
Why is it important for each individual database to be fault-tolerant?
What issue can occur if a service fails after updating the database?
What happens to the database if there is a crash after an event is published but before it is processed?
What is a crucial step in managing customer credit lines before allowing an order?
What does the term 'event-sourcing' refer to in the context of the provided content?
What is a consequence of not publishing an event after updating a database?
How can services coordinate credit line checks when processing orders?
What functionality does Dropbox provide to synchronize files?
How are files managed within Dropbox in terms of chunking?
What is the role of metadata servers in Dropbox's architecture?
What was Dropbox's storage solution before it moved to on-premises storage?
How does Dropbox handle notifications of changes to files?
Which protocol does Dropbox use for its API between clients and servers?
In terms of file system abstraction, what does Dropbox present to the users?
What happens when a change is made to a file on Dropbox?
What is a key problem associated with tailing the database log?
In the context of using a database as a message queue, what is a major challenge?
What role does a log tailer play in tailing the database log approach?
Why is the implementation of a separate EVENT table in each microservice potentially problematic?
What does the database transaction log primarily record?
In which scenario is the synchronization of event publishing most critical?
What is one significant disadvantage of table-level changes in database log tailing?
What is the primary advantage of maintaining an EVENTS database table on each microservice?
Study Notes
Cloud Computing - Storage and State Management
Announcements:
- Feedback for the first quiz will be available after the course.
- The second quiz will be available on Moodle at 12:45 today.
- Quiz deadline: Wednesday, October 16, 10:45.
- Review deadline: Wednesday, October 23, 10:45.
Objectives
- Present storage solutions for IaaS environments.
- Discuss the scalability of relational databases (SQL) and introduce NoSQL databases.
- Describe the Dropbox service as an example.
- Explain techniques for state management in microservices applications.
Block Storage
- Virtual disks are available for mounting in a VM instance.
- These disks are visible as block devices, similar to local hard drives or SSDs.
- The guest operating system needs to mount the storage using a file system.
- Temporary and local block storage is attached to the host or via a local SAN (Storage Area Network).
- Content stored in temporary/local storage is lost when the VM is shut down. Example: Amazon EC2 Instance Store.
- Persistent block storage persists across VM shutdowns/restarts. Example: Amazon Elastic Block Store (EBS).
Storage for Applications
- Block storage is used for operating systems, libraries, and application binaries/containers.
- A VM's local file system is not suitable for application data, because the data is tied to a single VM instance.
- Sharing such storage between instances is difficult, and the data may be lost when the VM instance fails.
- Databases are better for handling application data over multiple servers.
- Containerized applications can have their own databases. A set of containers can use a platform provider's managed service for storage and databases.
Database Selection
- Many types of databases are available (relational and NoSQL).
- Relational databases (SQL) offer ACID properties (Atomicity, Consistency, Isolation, Durability) and complex queries using joins. They are well understood by developers. Examples include Heroku Postgres, Amazon Aurora, and Amazon RDS.
- Well-known relational databases are often not well-suited for large cloud applications due to issues in scaling and elasticity.
MySQL Single-Node Scalability
- Comparing the performance of a single-node MySQL database on cloud servers with that of cloud databases shows that performance depends on the thread count.
- 4GB MySQL cloud servers perform worse than comparable cloud databases as the thread count increases.
Scaling Relational Databases
- Enterprise-scale relational databases are often vertically scaled on large machines.
- This can be expensive.
- Cloud computing allows scaling horizontally using many commodity servers.
Scalability Cube
- The scalability cube illustrates scaling options within microservices environments.
Horizontally Scaling Relational Databases
- To handle more workload, split database content over multiple machines (database sharding).
- Split tables across several database instances.
- A proxy (e.g., PL/Proxy for PostgreSQL, or MySQL Cluster) routes queries to the right instances and combines their results.
MySQL Cluster Sharding
- This technique splits data based on the primary key, utilizing a hash function for distribution.
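The hash-based split and the proxy's job of combining results can be sketched in Python. This is a toy in-memory model, not how MySQL Cluster is implemented: the class `ShardedTable`, its methods, and the choice of SHA-256 as the hash function are hypothetical, and each dict merely stands in for a separate database instance.

```python
import hashlib

class ShardedTable:
    """Hypothetical proxy that hash-partitions rows by primary key (toy model)."""

    def __init__(self, num_shards):
        # Each dict stands in for one database instance: {primary_key: row}
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key):
        # Stable hash of the primary key (Python's built-in hash() is salted per process)
        digest = hashlib.sha256(str(key).encode()).hexdigest()
        return int(digest, 16) % len(self.shards)

    def insert(self, key, row):
        self.shards[self.shard_for(key)][key] = row

    def get(self, key):
        # A lookup by primary key touches exactly one shard
        return self.shards[self.shard_for(key)].get(key)

    def select_where(self, predicate):
        # A query on any other column must visit every shard; the proxy merges the results
        results = []
        for shard in self.shards:
            results.extend(row for row in shard.values() if predicate(row))
        return results

table = ShardedTable(num_shards=4)
for customer_id in range(1, 101):
    table.insert(customer_id, {"id": customer_id, "country": "CH" if customer_id % 2 else "DE"})

print(table.get(42))                                          # {'id': 42, 'country': 'DE'}
print(len(table.select_where(lambda r: r["country"] == "CH")))  # 50
```

The contrast between `get` (one shard) and `select_where` (all shards) is the reason primary-key access scales well while other queries do not.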
Limits of Sharding
- Sharding is typically based on primary keys, using a hash function for distribution over different instances.
- SQL queries frequently compare or merge data from multiple tables; when those tables are spread over several instances, results must be combined across shards, which limits scalability.
- Automatic sharding alone does not remove this limitation.
Primary/Secondary Replication
- Writes to a relational database must go to a primary replica site.
- Reads can come from multiple read replicas for improved performance.
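A minimal sketch of this routing rule, assuming a hypothetical `ReplicatedDatabase` router in which each replica is an in-memory dict. Replication is applied synchronously here only to keep the example short; real primary/secondary setups often replicate asynchronously, so replicas can briefly lag behind the primary.

```python
import itertools

class ReplicatedDatabase:
    """Hypothetical router: writes go to the primary, reads rotate over read replicas."""

    def __init__(self, num_replicas):
        self.primary = {}
        self.replicas = [dict() for _ in range(num_replicas)]
        # Round-robin over replica indexes to spread the read load
        self._next_replica = itertools.cycle(range(num_replicas))

    def write(self, key, value):
        # Every write goes to the single primary ...
        self.primary[key] = value
        # ... and is copied to the replicas (synchronously, for simplicity)
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)

db = ReplicatedDatabase(num_replicas=3)
db.write("order:1", "paid")
print(db.read("order:1"))  # served by replica 0
print(db.read("order:1"))  # served by replica 1
```

Adding replicas increases read capacity, but write capacity stays bounded by the primary.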
Sharding and Horizontal Elasticity
- Careful sharding can result in scalability to a few dozen database instances.
- Modifications/removal of a shard node, however, require redistribution of all data, negatively impacting throughput.
- Elastic sharding for relational databases is not typically practical.
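Why changing the number of shards is so disruptive can be checked with a small experiment. The sketch assumes simple hash-modulo placement (`hash(key) % num_shards`), which is one common scheme but not the only one; the key names are made up. With this scheme, growing from 4 to 5 shards relocates roughly 80% of the rows, because a row only stays put when its hash gives the same remainder modulo 4 and modulo 5.

```python
import hashlib

def shard_of(key, num_shards):
    # Stable hash-modulo placement, as in simple hash sharding
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % num_shards

def fraction_moved(keys, old_shards, new_shards):
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_of(k, old_shards) != shard_of(k, new_shards))
    return moved / len(keys)

keys = [f"customer:{i}" for i in range(10_000)]
print(f"4 -> 5 shards: {fraction_moved(keys, 4, 5):.0%} of rows move")
```

Consistent hashing, used by many key/value stores, is designed to avoid this mass movement.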
Relational Databases Takeaway
- Relational databases are suitable for many applications with lower data volumes.
- MySQL/PostgreSQL servers can handle significant transaction throughput on large EC2 instances.
- Better scalability is needed when dealing with larger data volumes.
- The lack of join queries or relational schemas in NoSQL databases can be worked around in many applications.
NoSQL Databases
- "Not only SQL" databases are a family of options with multiple flavors.
- Common characteristics of NoSQL databases include horizontal scaling, simpler querying, and flexible schemas. There are no explicit relations between sets of data.
- An example is a key-value store designed for horizontal scaling by splitting key ranges into disjoint subsets. Different servers handle respective subsets. Examples include Apache Cassandra.
Key/Value Stores
- A common NoSQL interface that behaves like a global hash map, with put(key, value) and value ← get(key) operations.
- Design for scaling: subsets of keys are split across multiple servers using consistent hashing principles.
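The put/get interface on top of consistent hashing can be sketched as follows. This is a toy model, not the API of Cassandra or any other product: `ConsistentHashStore`, the server names, and the 64 virtual nodes per server are hypothetical choices. Each server is hashed onto a ring several times, and a key belongs to the first server position at or after the key's own hash, wrapping around at the end of the ring.

```python
import bisect
import hashlib

def ring_position(value: str) -> int:
    # Stable position on the hash ring (Python's hash() is randomized per process)
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)

class ConsistentHashStore:
    """Hypothetical key/value store spreading keys over servers by consistent hashing."""

    def __init__(self, servers, vnodes=64):
        # Each server gets several virtual nodes to even out the key distribution
        self.ring = sorted(
            (ring_position(f"{server}#{i}"), server)
            for server in servers for i in range(vnodes)
        )
        self.positions = [pos for pos, _ in self.ring]
        self.storage = {server: {} for server in servers}  # one dict per server

    def server_for(self, key: str) -> str:
        # First virtual node at or after the key's position; wrap to the start if needed
        idx = bisect.bisect_left(self.positions, ring_position(key))
        return self.ring[idx % len(self.ring)][1]

    def put(self, key: str, value) -> None:
        self.storage[self.server_for(key)][key] = value

    def get(self, key: str):
        return self.storage[self.server_for(key)].get(key)

store = ConsistentHashStore(["server-a", "server-b", "server-c"])
store.put("user:17", {"name": "Alice"})
print(store.get("user:17"), "stored on", store.server_for("user:17"))
```

Because each server owns many small arcs of the ring, adding or removing a server only moves the keys on the arcs it gains or loses, rather than reshuffling everything.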
NoSQL Horizontal Scalability
- Individual servers process requests independently.
- Multiple servers can share the same portion of the key and data range.
- New servers can be quickly added by splitting existing key-ranges.
NoSQL: Adding a Server
- Adding a server (in the case of key-value stores) dynamically partitions the available keys into new storage partitions for additional servers.
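Adding a server by splitting an existing key range can be sketched with range partitioning. This is an illustrative assumption rather than a description of a specific product: `RangePartitionedStore` and its methods are hypothetical, and each server owns exactly one contiguous range of string keys. Only keys of the split range move, and they all go to the new server.

```python
import bisect

class RangePartitionedStore:
    """Hypothetical store: owners[i] holds keys in [bounds[i], bounds[i + 1])."""

    def __init__(self, first_server):
        self.bounds = [""]              # sorted lower bounds; "" covers every string key
        self.owners = [first_server]
        self.storage = {first_server: {}}

    def _index(self, key):
        # Index of the range whose lower bound is the largest one <= key
        return bisect.bisect_right(self.bounds, key) - 1

    def put(self, key, value):
        self.storage[self.owners[self._index(key)]][key] = value

    def get(self, key):
        return self.storage[self.owners[self._index(key)]].get(key)

    def add_server(self, new_server, split_key):
        """Split the range containing split_key; the new server takes the upper part."""
        i = self._index(split_key)
        if self.bounds[i] == split_key:
            raise ValueError("split key must fall strictly inside an existing range")
        old_server = self.owners[i]
        self.bounds.insert(i + 1, split_key)
        self.owners.insert(i + 1, new_server)
        # The old server only holds keys of range i, so keys >= split_key are exactly
        # the ones that now belong to the new range
        old_data = self.storage[old_server]
        moved = {k: v for k, v in old_data.items() if k >= split_key}
        for k in moved:
            del old_data[k]
        self.storage[new_server] = moved
        return len(moved)

store = RangePartitionedStore("server-1")
for name in ["alice", "bob", "carol", "mallory", "trent", "victor"]:
    store.put(name, len(name))
moved = store.add_server("server-2", split_key="m")
print(moved, store.storage)  # 3 keys (mallory, trent, victor) move to server-2
```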
Apache Cassandra
- NoSQL/key-value store with a focus on throughput and linear scalability.
Object Stores
- Object storage is a key-value store for immutable data.
- Immutable data does not change after it has been created.
- Object identifiers can use version numbers.
- Clients can get resources directly from their URIs.
- Typically a managed service, often combined with caching and CDN distribution. Examples include Amazon S3 and Google Cloud Storage.
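The combination of immutable objects, version identifiers, and client-accessible URIs can be sketched as a small in-memory model. It does not reproduce the S3 or Google Cloud Storage APIs: `ObjectStore`, the content-hash version IDs, the base URL, and the `?version=` query parameter are all hypothetical. A write never modifies stored bytes; it appends a new version, so a versioned URI always refers to the same content and can be cached indefinitely.

```python
import hashlib

class ObjectStore:
    """Hypothetical object store: objects are immutable, each write adds a version."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.versions = {}  # key -> list of (version_id, bytes), oldest first

    def put(self, key, data: bytes) -> str:
        # Hypothetical version id: a prefix of the content hash
        version_id = hashlib.sha256(data).hexdigest()[:12]
        self.versions.setdefault(key, []).append((version_id, bytes(data)))
        return version_id

    def get(self, key, version_id=None) -> bytes:
        history = self.versions[key]
        if version_id is None:
            return history[-1][1]  # latest version
        for vid, data in history:
            if vid == version_id:
                return data
        raise KeyError(f"{key} has no version {version_id}")

    def uri(self, key, version_id) -> str:
        # Clients (or a CDN) fetch the object directly from this URI
        return f"{self.base_url}/{key}?version={version_id}"

store = ObjectStore("https://objects.example.com/my-bucket")
v1 = store.put("images/logo.png", b"v1 bytes")
v2 = store.put("images/logo.png", b"v2 bytes")
print(store.get("images/logo.png"))      # b'v2 bytes'
print(store.get("images/logo.png", v1))  # b'v1 bytes'
print(store.uri("images/logo.png", v2))
```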
Amazon S3 Example
- Objects are stored in buckets.
- Multiple past versions of objects can be maintained.
- Objects have automated encryption and access controls provided by Amazon.
Using URI from client
- Each object has a URL that clients can access directly.
Usage of Object Storage in Project
- Azure Blob storage is a similar service to S3.
- Public URIs for objects are generally preferred to local data for improved accessibility.
- REST calls can be used for image uploads, and Azure Functions can be used for image resizing.
- Removal of data is possible at the end of a project.
Document-Oriented NoSQL
- Document-oriented NoSQL stores structured data in formats such as JSON or YAML.
- This type of store provides indexing capabilities by field and/or value.
- It supports collections and hierarchies. MongoDB is a popular choice.
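Field indexing over nested, JSON-like documents can be sketched in a few lines. This is not MongoDB's API (pymongo's `find` takes a filter document, for example); `Collection`, `get_path`, and the dotted-path notation for nested fields are hypothetical choices for illustration. Indexed fields are answered by a lookup, and other fields fall back to a full scan.

```python
from collections import defaultdict

def get_path(doc, path):
    """Follow a dotted path such as 'address.city' into nested dicts; None if absent."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc

class Collection:
    """Hypothetical document collection with secondary indexes on chosen fields."""

    def __init__(self, indexed_fields):
        self.docs = {}  # _id -> document (a dict, i.e. JSON-like)
        self.indexes = {field: defaultdict(set) for field in indexed_fields}
        self._next_id = 1

    def insert(self, doc: dict) -> int:
        doc_id = self._next_id
        self._next_id += 1
        self.docs[doc_id] = doc
        for field, index in self.indexes.items():
            value = get_path(doc, field)
            if value is not None:  # only present, hashable scalar values are indexed
                index[value].add(doc_id)
        return doc_id

    def find(self, field, value):
        if field in self.indexes:
            ids = self.indexes[field].get(value, set())  # index lookup
            return [self.docs[i] for i in sorted(ids)]
        return [d for d in self.docs.values() if get_path(d, field) == value]  # full scan

users = Collection(indexed_fields=["address.city"])
users.insert({"name": "Ada", "address": {"city": "Zurich"}})
users.insert({"name": "Linus", "address": {"city": "Helsinki"}})
users.insert({"name": "Grace", "address": {"city": "Zurich"}, "tags": ["navy"]})
print([d["name"] for d in users.find("address.city", "Zurich")])  # ['Ada', 'Grace']
print(users.find("name", "Linus"))  # unindexed field: full scan
```

Note that documents in the same collection need not share a schema: the third one has an extra `tags` field.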
Other NoSQL Databases
- Graph databases store relationships between keys (as in social networking graphs of friends). An example is Neo4j.
- Column stores, such as Apache HBase and Google Bigtable, store attribute values as full columns for efficient aggregation and search.
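The difference between a row layout and a column layout can be illustrated with plain Python lists. This only shows the general idea of a columnar layout with made-up sample data; it does not model how HBase or Bigtable actually organize data on disk. In a column layout, an aggregate over one attribute reads a single contiguous list instead of every record.

```python
# The same (hypothetical) table in two layouts
rows = [
    {"region": "EU", "product": "disk", "revenue": 120},
    {"region": "US", "product": "cpu", "revenue": 300},
    {"region": "EU", "product": "cpu", "revenue": 80},
]

# Row layout: each record is stored together, so on disk reading every
# revenue value also reads the other fields stored next to it
def total_revenue_rows(table):
    return sum(row["revenue"] for row in table)

# Column layout: each attribute is stored as one contiguous list
columns = {name: [row[name] for row in rows] for name in rows[0]}

def total_revenue_columns(cols):
    # Only the "revenue" column is touched
    return sum(cols["revenue"])

def revenue_by_region(cols):
    # Group-by aggregation reads just the two columns it needs
    totals = {}
    for region, revenue in zip(cols["region"], cols["revenue"]):
        totals[region] = totals.get(region, 0) + revenue
    return totals

print(total_revenue_rows(rows), total_revenue_columns(columns))  # 500 500
print(revenue_by_region(columns))  # {'EU': 200, 'US': 300}
```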
Popularities (DB-Engines Ranking)
- This display shows popularity of different database engines (such as MySQL, Oracle, PostgreSQL, MongoDB) over time.
Case Study: Dropbox
- Dropbox is a cloud storage service that provides personal file storage synced to local devices. It is accessible via a web interface and has an API.
Cloud Storage - Dropbox Client Side
- The client application monitors changes to local files.
- Any changes are compressed and sent to the cloud.
- Local changes are uploaded as soon as the client registers them in the local file system. To learn about changes made elsewhere, the client relies on delayed HTTP calls (long polling): the server holds the request open and only responds when a change occurs or after a timeout of up to about 60 seconds.
- Chunking of files is used for efficiency and resilience.
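Chunking can be sketched as splitting a file into fixed-size blocks identified by their hashes, so that an edited file only re-sends the blocks that changed. The 4 MiB chunk size, the use of SHA-256, and the `upload` helper are assumptions for illustration rather than Dropbox's actual client code; the usage below shrinks the chunk size to 10 bytes so the effect is visible. Compression is omitted.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size (4 MiB)

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (sha256_hex, chunk_bytes) pairs for consecutive fixed-size chunks."""
    return [
        (hashlib.sha256(data[i:i + chunk_size]).hexdigest(), data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]

def upload(data: bytes, server_chunks: dict, chunk_size: int = CHUNK_SIZE):
    """Hypothetical client upload: send only chunks the server lacks.

    Returns the file's manifest (ordered chunk hashes) and the number of bytes sent.
    """
    manifest, sent = [], 0
    for digest, chunk in split_into_chunks(data, chunk_size):
        manifest.append(digest)
        if digest not in server_chunks:  # unchanged chunks are not re-sent
            server_chunks[digest] = chunk
            sent += len(chunk)
    return manifest, sent

server = {}  # stands in for the block storage: hash -> chunk
original = b"A" * 10 + b"B" * 10 + b"C" * 10
_, sent1 = upload(original, server, chunk_size=10)
edited = b"A" * 10 + b"X" * 10 + b"C" * 10
manifest2, sent2 = upload(edited, server, chunk_size=10)
print(sent1, sent2)  # 30 10
```

Each chunk can also be retried independently if a transfer fails. With fixed-size chunks, inserting bytes near the start of a file shifts every later chunk, so this scheme works best for in-place edits.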
Dropbox Implementation
- Dropbox maintains metadata on its own private cloud, distinct from file data.
- File content is stored in Amazon S3.
- Dropbox's API runs over HTTPS in a REST style, although it is not strictly RESTful.
Dropbox Architecture
- This illustration shows the overall Dropbox architecture, which separates metadata storage from file data storage. Dedicated servers handle specific tasks such as metadata storage, processing, and notification.
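A notification server based on delayed HTTP calls (long polling) can be sketched with a condition variable. This is a hypothetical model, not Dropbox's protocol: `NotificationServer`, the integer change cursor, and the timeouts are assumptions. The "request" blocks until a change newer than the client's cursor exists or the timeout expires; a real server would use a timeout on the order of the 60 seconds mentioned in these notes.

```python
import threading

class NotificationServer:
    """Hypothetical long-poll endpoint: a request is held until a change or a timeout."""

    def __init__(self):
        self._cond = threading.Condition()
        self._cursor = 0  # increases by one on every change

    @property
    def cursor(self):
        with self._cond:
            return self._cursor

    def publish_change(self):
        with self._cond:
            self._cursor += 1
            self._cond.notify_all()  # wake every waiting request

    def long_poll(self, client_cursor: int, timeout: float) -> bool:
        """True as soon as a change newer than client_cursor exists, False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._cursor > client_cursor, timeout=timeout)

server = NotificationServer()
cursor = server.cursor
print(server.long_poll(cursor, timeout=0.1))  # False: nothing changed
threading.Timer(0.05, server.publish_change).start()
print(server.long_poll(cursor, timeout=5))    # True: returns right after the change
```

After a `True` response the client would fetch the changed metadata and open the next long poll with its new cursor.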
Example Dropbox Interaction
- Interactions between clients and servers through HTTPS.
- Separate services handle interactions with the primary and secondary instances of each database.
Interaction With Storage Servers
- Dropbox clients receive server IPs for connection using a load balancer.
Why Use Two Different Stores?
- Metadata and data storage have distinct requirements.
- Metadata must be highly consistent.
- File/data distribution should not be visible to the client.
- Data storage should be efficient, cheap, and scalable.
Some Numbers from 2013
- Dropbox grew to a significant number of users and data volume rapidly.
- It used thousands of physical servers and Amazon services for storage coordination.
Dropbox Move Away from Amazon
- Dropbox moved its file storage from Amazon (EC2 and S3) to its own on-premises storage infrastructure.
Storage and Microservices
- This slide emphasizes a polyglot persistence architecture.
Polyglot Architecture
- Each service maintains its own state.
- Database selection depends on each individual service's needs.
State Management with Microservices
- Each microservice maintains only a subset of the enterprise's data (e.g., customer billing information).
- Individual services are independent and can scale independently.
- Microservices use events to coordinate operations across multiple databases.
Service Decomposition
- Example diagrams show how different microservice components decompose data into classes.
Domain Model Pattern
- This pattern decomposes data into classes which match specific microservice components.
Problems
- Data in one database often references data in another database.
- Databases are accessible only through service APIs, making it harder in many instances to change database schema.
Aggregates
- A set of domain objects grouped together and treated as a single unit for better data consistency throughout the microservice system.
- Aggregates have unique keys for referencing by other services (e.g., their own URI).
Foreign Keys for Inter-Aggregate References
- Foreign key relationships for referencing data between different aggregates are employed to improve data integrity.
Single-Aggregate Transactions
- Database transactions are limited to a single aggregate. Consistency across aggregates must be maintained through messaging.
Events and Reliability
- Service-based publishing and consuming of events is a way to handle reliability and potential service/database failures.
Preventing Inconsistency
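- The database update and the event publication must be done atomically: either both happen or neither does, so a failure between the two operations cannot leave the database updated without the event being published (or an event published for an update that never happened).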
Solution 1 (Tailing the database log)
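- A log tailer reads the database transaction log, which records committed changes, and publishes them as events. This is possible but less flexible: the log records low-level, table-level changes, from which business-level events are hard to reconstruct, and the log format is specific to each database.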
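A log tailer can be sketched as a reader that remembers its position in the transaction log and publishes every newer record. This is a toy model: the `LogRecord` fields, the `LogTailer` class, and the event type format are hypothetical, and real tailers read database-specific formats such as the MySQL binlog or the PostgreSQL write-ahead log. The published events are only as rich as the log records, that is, row changes per table.

```python
from dataclasses import dataclass

@dataclass
class LogRecord:
    lsn: int        # log sequence number (position in the log)
    table: str
    operation: str  # "INSERT", "UPDATE" or "DELETE"
    row: dict

class LogTailer:
    """Hypothetical log tailer: publishes committed log records as events."""

    def __init__(self, publish):
        self.publish = publish
        self.last_lsn = 0  # position to resume from after a restart

    def poll(self, transaction_log):
        for record in transaction_log:
            if record.lsn <= self.last_lsn:
                continue  # already published
            # A table-level row change, not a business event such as "OrderPlaced"
            self.publish({"type": f"{record.table}.{record.operation}", "data": record.row})
            # Advancing after publishing gives at-least-once delivery: a crash between
            # the two steps causes the record to be published again on restart
            self.last_lsn = record.lsn

log = [
    LogRecord(1, "orders", "INSERT", {"id": 7, "status": "NEW"}),
    LogRecord(2, "orders", "UPDATE", {"id": 7, "status": "PAID"}),
]
published = []
tailer = LogTailer(published.append)
tailer.poll(log)
log.append(LogRecord(3, "orders", "UPDATE", {"id": 7, "status": "SHIPPED"}))
tailer.poll(log)
print([e["type"] for e in published])  # ['orders.INSERT', 'orders.UPDATE', 'orders.UPDATE']
```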
Solution 2 (Database as a message queue)
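- In this method, the database itself is used as a message queue: each service writes its events to an EVENTS table in its own database, in the same local transaction as the business update, and a separate process later reads the table and publishes the events. This can be problematic because consistency depends on every update also recording its event.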
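The EVENTS-table approach can be sketched with Python's built-in `sqlite3` module. The table layouts, the `OrderCreated` event name, and the `create_order`/`publish_pending` functions are hypothetical. The key point is that the order row and the event row are written in one local transaction (the `with db:` block commits both or rolls back both), while a separate relay later forwards unpublished events to a broker.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL, status TEXT);
    CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT, payload TEXT,
                         published INTEGER DEFAULT 0);
""")

def create_order(customer_id, total):
    # One local transaction covers the business update and the event row:
    # either both are committed or neither is.
    with db:
        cur = db.execute(
            "INSERT INTO orders (customer_id, total, status) VALUES (?, ?, 'PENDING')",
            (customer_id, total),
        )
        order_id = cur.lastrowid
        payload = {"order_id": order_id, "customer_id": customer_id, "total": total}
        db.execute("INSERT INTO events (type, payload) VALUES (?, ?)",
                   ("OrderCreated", json.dumps(payload)))
    return order_id

def publish_pending(send):
    # Separate relay: forward unpublished events, then mark them as published
    rows = db.execute(
        "SELECT id, type, payload FROM events WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, event_type, payload in rows:
        send(event_type, json.loads(payload))
        with db:
            db.execute("UPDATE events SET published = 1 WHERE id = ?", (event_id,))

sent = []
create_order(customer_id=42, total=99.5)
publish_pending(lambda t, p: sent.append((t, p)))
print(sent)  # [('OrderCreated', {'order_id': 1, 'customer_id': 42, 'total': 99.5})]
```

If the relay crashes after `send` but before marking the row, the event is sent again, so consumers should tolerate duplicates.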
Event Sourcing
- A method of managing an aggregate as a sequence of events; changes to an aggregate are documented by recording an event for the aggregate.
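Managing an aggregate as a sequence of events can be sketched as an append-only event list plus a function that rebuilds the current state by replaying it. The `Order` aggregate, the event names, and the in-memory `EventStore` are hypothetical. Appending an event is the only write, which is why the state change and the event cannot diverge.

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    """Hypothetical aggregate whose state is derived entirely from its events."""
    order_id: str
    status: str = "NONE"
    items: dict = field(default_factory=dict)

    def apply(self, event):
        kind, data = event
        if kind == "OrderCreated":
            self.status = "CREATED"
        elif kind == "ItemAdded":
            self.items[data["sku"]] = self.items.get(data["sku"], 0) + data["qty"]
        elif kind == "OrderPaid":
            self.status = "PAID"
        return self

class EventStore:
    """In-memory, append-only event log per aggregate id."""

    def __init__(self):
        self.streams = {}

    def append(self, aggregate_id, event):
        # The single write: recording the event is the state change
        self.streams.setdefault(aggregate_id, []).append(event)

    def load(self, aggregate_id):
        # Rebuild the current state by replaying all events in order
        order = Order(aggregate_id)
        for event in self.streams.get(aggregate_id, []):
            order.apply(event)
        return order

store = EventStore()
store.append("o-1", ("OrderCreated", {}))
store.append("o-1", ("ItemAdded", {"sku": "disk", "qty": 2}))
store.append("o-1", ("ItemAdded", {"sku": "disk", "qty": 1}))
store.append("o-1", ("OrderPaid", {}))
order = store.load("o-1")
print(order.status, order.items)  # PAID {'disk': 3}
```

Other services can subscribe to the same event stream, and the full history remains available for auditing and debugging.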
Advantages of Event Sourcing
- Saving an event is a single atomic operation, so the state change and the event publication cannot get out of step.
- Other services can receive/process events to keep track of data changes.
- The recorded event history helps with debugging.
- Data updates are more structured.
Example of Use of Event Sourcing
- Example diagrams illustrate the flow of events between services.
- Order Service (OM) creates events, and Customer Service (CM) updates Customer data upon receiving the events from OM.
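The order/customer interaction, including the credit-line check before an order is allowed, can be sketched with a tiny in-process event bus. The bus, the event names (`OrderCreated`, `CreditReserved`, `CreditLimitExceeded`), and the service classes are hypothetical. Here the bus delivers events synchronously; a real message broker would deliver them asynchronously, so the order would stay `PENDING` for a short while.

```python
class EventBus:
    """Minimal in-process publish/subscribe (stands in for a message broker)."""

    def __init__(self):
        self.handlers = {}

    def subscribe(self, event_type, handler):
        self.handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, data):
        for handler in self.handlers.get(event_type, []):
            handler(data)

class OrderService:
    def __init__(self, bus):
        self.bus, self.orders = bus, {}
        bus.subscribe("CreditReserved", lambda d: self._set_status(d["order_id"], "APPROVED"))
        bus.subscribe("CreditLimitExceeded", lambda d: self._set_status(d["order_id"], "REJECTED"))

    def create_order(self, order_id, customer_id, total):
        self.orders[order_id] = "PENDING"  # not allowed until the credit check succeeds
        self.bus.publish("OrderCreated",
                         {"order_id": order_id, "customer_id": customer_id, "total": total})

    def _set_status(self, order_id, status):
        self.orders[order_id] = status

class CustomerService:
    def __init__(self, bus, credit_limits):
        self.bus, self.credit_limits = bus, dict(credit_limits)
        self.reserved = {customer: 0 for customer in credit_limits}
        bus.subscribe("OrderCreated", self.on_order_created)

    def on_order_created(self, d):
        c = d["customer_id"]
        if self.reserved[c] + d["total"] <= self.credit_limits[c]:
            self.reserved[c] += d["total"]  # reserve part of the credit line
            self.bus.publish("CreditReserved", {"order_id": d["order_id"]})
        else:
            self.bus.publish("CreditLimitExceeded", {"order_id": d["order_id"]})

bus = EventBus()
om = OrderService(bus)
cm = CustomerService(bus, credit_limits={"c-1": 100})
om.create_order("o-1", "c-1", 60)
om.create_order("o-2", "c-1", 60)
print(om.orders)  # {'o-1': 'APPROVED', 'o-2': 'REJECTED'}
```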
CQRS (Command Query Responsibility Segregation)
- Commands (updates) and queries are handled separately, so that queries can be answered efficiently without adding overhead to command processing.
- Frequently used queries are materialized as database views, which simplifies them.
- Microservices that are commonly queried can cache results, reducing repeated database calls.
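The split between a command side and a materialized query view can be sketched as follows. `OrderCommandSide`, `CustomerOrdersView`, and the event shape are hypothetical, and the in-memory dict stands in for a materialized database view. Commands are validated and stored as events; the view is updated from those events so that a common query becomes a single lookup. In a distributed setup the view would be updated asynchronously and would therefore be eventually consistent.

```python
from collections import defaultdict

class OrderCommandSide:
    """Hypothetical command side: validates commands and records events."""

    def __init__(self):
        self.events = []       # write model (append-only)
        self.subscribers = []  # projections that keep query views up to date

    def place_order(self, order_id, customer_id, total):
        if total <= 0:
            raise ValueError("total must be positive")
        event = {"type": "OrderPlaced", "order_id": order_id,
                 "customer_id": customer_id, "total": total}
        self.events.append(event)
        for notify in self.subscribers:
            notify(event)

class CustomerOrdersView:
    """Hypothetical query side: a denormalized per-customer summary."""

    def __init__(self):
        self.by_customer = defaultdict(lambda: {"order_count": 0, "total_spent": 0})

    def on_event(self, event):
        if event["type"] == "OrderPlaced":
            summary = self.by_customer[event["customer_id"]]
            summary["order_count"] += 1
            summary["total_spent"] += event["total"]

    def summary(self, customer_id):
        # A single lookup; .get avoids creating entries for unknown customers
        return dict(self.by_customer.get(customer_id, {"order_count": 0, "total_spent": 0}))

commands = OrderCommandSide()
view = CustomerOrdersView()
commands.subscribers.append(view.on_event)
commands.place_order("o-1", "c-1", 40)
commands.place_order("o-2", "c-1", 25)
print(view.summary("c-1"))  # {'order_count': 2, 'total_spent': 65}
```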
Conclusion
- Application state management in the cloud relies on databases (various types).
- Managing state with microservices is more complex than with a single database and transactions.
- An event sourcing approach ensures the consistent and proper handling of events within and between microservices systems.
Description
This quiz explores various types of storage solutions available for virtual machine instances, including characteristics of temporary and persistent block storage. Test your understanding of when to use databases versus block storage, and identify limitations associated with local block storage in a cloud environment.