Cloud Storage & NoSQL Databases Quiz
48 Questions

Questions and Answers

What is a key characteristic of persistent block storage?

  • Content is lost when the VM is shut down.
  • It is not visible as block devices.
  • It cannot ensure file system integrity during a clean shutdown.
  • It is retained across VM shut-down/restart cycles. (correct)

Which type of storage is not meant for sharing across multiple VM instances?

  • Local block storage (correct)
  • Shared block storage
  • Temporary block storage
  • Persistent block storage

What is a disadvantage of using block storage for application data?

  • It allows easy partitioning of data over multiple servers.
  • It easily survives the failure of the instance.
  • It is typically local to one VM instance. (correct)
  • It is suitable for storing unstructured data.

Which of the following is a feature of NoSQL databases?

  • They can handle unstructured data.

What is a primary use of block storage in cloud computing?

  • Mounting virtual disks for Virtual Machine instances.

Which storage solution is specifically mentioned as an example of temporary block storage?

  • Amazon EC2 Instance Store

What technique is crucial for ensuring data integrity with persistent block storage?

  • Implementing a clean shutdown.

What do cloud applications mainly utilize for state management?

  • NoSQL databases and state management techniques.

What is the initial event in the example of event sourcing provided?

  • Order created but not validated

What does CQRS stand for in the discussed architecture?

  • Command Query Responsibility Segregation

What is a common challenge when managing application state with microservices?

  • Increased complexity compared to a single database

What is a primary feature of Dropbox's client application?

  • It synchronizes changes made locally with the cloud storage.

What should be done if a query is frequently used, according to the suggested implementation?

  • Create a microservice to materialize the view

How does Dropbox notify clients of changes?

  • By delaying responses to HTTP calls for up to 60 seconds.

Which type of database is mentioned for its scalability and elasticity?

  • NoSQL databases

What type of storage did Dropbox primarily use until 2016?

  • Amazon EC2 and S3 object store.

What is the role of the Order Manager (OM) in the event sourcing example?

  • To validate orders and check customer credit

In Dropbox, what is the role of metadata servers?

  • They handle information regarding files like ownership and chunk lists.

What disadvantage is mentioned regarding materialized views created by a microservice?

  • They are less flexible compared to on-demand SQL queries

What is the maximum size for chunks in which files are divided in Dropbox?

  • 4 MB

Which type of databases are suitable for specialized data such as documents or graphs?

  • NoSQL databases

What type of API does Dropbox utilize for communication between clients and servers?

  • REST over HTTPS.

What unique identifier is used for each chunk in Dropbox's file storage system?

  • SHA-256 hash.

Which of the following describes the sharing capabilities of Dropbox?

  • Folders can be present in two users' spaces simultaneously.

What is the primary purpose of database sharding?

  • To enhance load management by splitting database content across machines

Which of the following statements about SQL queries in sharded databases is true?

  • Join queries typically require merging data from several tables across instances

What is a key limitation of using automatic sharding?

  • It tends to result in poor scalability for the database

In primary/secondary replication, what must be done with write requests?

  • They must go through a single replica site

What happens when adding or removing a node in a hash-partitioned sharding plan?

  • Data redistribution across all nodes occurs

What is necessary for effective load balancing in sharding?

  • A complex and application-specific sharding strategy

Which of the following is a common challenge when scaling relational databases through sharding?

  • Maintaining strict ACID properties during cross-shard queries

What is the typical method for splitting tables in a sharded database?

  • Using a hash function based on primary keys

What is the primary reason for using two different storage systems in the discussed architecture?

  • To cater to the different consistency needs of metadata and data

What is a key feature of relational databases?

  • ACID transaction support

What does the term 'polyglot architecture' refer to in the context of databases?

  • Each microservice selecting the best database suited for its needs

What is a key requirement for metadata to ensure integrity in file operations?

  • All file chunks must be uploaded before the file is displayed

Which statement best describes the transactions required for metadata?

  • Transactions help ensure no intermediate states are visible

What is a common limitation of relational databases when used in large cloud applications?

  • Scalability and elasticity

In the context of sharding, how is data partitioned for metadata?

  • Metadata is partitioned per user identifier

What does ACID stand for in the context of database transactions?

  • Atomicity, Consistency, Isolation, Durability

What type of database is suggested to manage metadata due to its consistency requirements?

  • Traditional relational database

What is one advantage of relational databases mentioned in the content?

  • Optimized query engines for complex queries

How do Dropbox clients receive server IPs for load balancing?

  • Clients periodically obtain a random subset of server IPs

Which of the following is a characteristic that may limit relational database usage in cloud applications?

  • Challenges with scalability

Which database option is a managed service provided by Amazon?

  • Aurora

What role does sharding play in the context of metadata?

  • It allows for partitioning metadata per user on a single server

What is a possible drawback of using a layered monolith architecture?

  • Limits flexibility in database choice for microservices

What is a characteristic feature of the load balancing strategy mentioned?

  • It distributes connections in a round-robin manner

    Study Notes

    Cloud Computing - Lesson 4: Storage and State Management

    • Announcements:
      • Feedback for the first quiz will be available after the course.
      • The second quiz is available on Moodle at 12:45 today.
      • Quiz deadline: Wednesday, Oct. 16, 10:45.
      • Review deadline: Wednesday, Oct. 23, 10:45.

    Objectives

    • Present storage solutions for IaaS environments.
    • Discuss relational database (SQL) scalability and introduce NoSQL databases.
    • Describe the Dropbox service.
    • Explain state management techniques in microservices applications.

    Block Storage

    • Virtual disks: Available for mounting within a VM instance, acting like local hard drives or SSDs. Guest operating systems must mount storage using a file system.
    • Temporary/Local storage: Attached to the host or accessed through local SAN (Storage Area Networks). Content is lost when the VM is shut down. Example: Amazon EC2 Instance store.
    • Persistent block storage: Retains data across VM shutdown/restart cycles. Requires a clean shutdown to ensure file system integrity. Example: Amazon Elastic Block Store (EBS), used with EC2.

    Storage for Applications

    • Block storage: Used for OS, libraries, and application binaries/containers.
    • File systems: A file system on a block device local to a single VM is not suitable for application data. It does not survive the failure of the instance, and its data is difficult to share or partition across servers.
    • Databases: Essential for application data management across multiple servers.

    Databases

    • Flavors: Databases come in many varieties, including relational (SQL) and NoSQL.
    • Layered monolith: Uses a single (or a few) all-purpose databases accessed by all layers.
    • Service-Oriented Architecture (SOA): Each microservice chooses the most suitable database (polyglot architecture).

    Relational Databases

    • Platform availability: Offered on many cloud platforms (e.g., Heroku Postgres, Amazon Aurora, Amazon RDS).
    • Developer understanding: Well understood by developers, with a standard query language (SQL).
    • ACID semantics: Support transactions with Atomicity, Consistency, Isolation, and Durability.
    • Structured data queries: Efficient at complex queries on structured data.
    • Join queries: Support the creation of join queries between related tables.
    • Optimized Query Engines: Employ highly optimized query engines for efficiency.
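
The ACID and join properties listed above can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table layout, the `place_order` helper, and the credit rule are assumptions made for this example only; they are not taken from the lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, "
    "credit INTEGER CHECK (credit >= 0))"
)
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), amount INTEGER)"
)
conn.execute("INSERT INTO customers VALUES (1, 'alice', 100)")
conn.commit()


def place_order(conn, customer_id, amount):
    """Hypothetical helper: record an order and debit credit atomically."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                (customer_id, amount),
            )
            conn.execute(
                "UPDATE customers SET credit = credit - ? WHERE id = ?",
                (amount, customer_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed: the INSERT above is rolled back too.
        return False


first = place_order(conn, 1, 60)   # enough credit
second = place_order(conn, 1, 60)  # would make credit negative

# A join query between the two related tables.
joined = conn.execute(
    "SELECT c.name, o.amount FROM orders o "
    "JOIN customers c ON c.id = o.customer_id"
).fetchall()
```

The second call fails as a whole: neither the order row nor the credit update survives, which is the atomicity part of ACID.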

    Problems with Relational Databases

    • Scalability and elasticity: Cannot always adapt to large cloud application demands.
    • Example: A single-node MySQL performance graph shows diminishing returns with increasing thread counts, highlighting the limitations in scalability within basic relational databases.

    Scaling relational databases

    • Vertical scaling: Increasing the resources of a single server. Often expensive, does not reliably scale.
    • Example: A large enterprise database scaled vertically, such as an Oracle database running on an IBM mainframe. This is a less efficient approach to scaling.

    Scalability Cube

    • X-axis: Horizontal duplication (scaling by cloning).
    • Y-axis: Functional decomposition (scaling by splitting different things).
    • Z-axis: Data partitioning (scaling by splitting similar things).
    • Now: Microservices & horizontal scalability are preferred, as shown in the scalability cube.

    Horizontally scaling a relational database

    • Database sharding: Split tables across multiple database instances.
    • Proxy: A proxy receives queries and distributes them to shards, aggregating the results.
    • Examples: PL/Proxy for PostgreSQL, MySQL Cluster.
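
A minimal in-memory sketch of the proxy idea described above, assuming each shard is a plain dictionary standing in for a database instance. `ShardingProxy` and its methods are hypothetical names for illustration and do not reflect the API of PL/Proxy or MySQL Cluster.

```python
import hashlib


class ShardingProxy:
    """Routes single-key operations to one shard; fans out other queries."""

    def __init__(self, num_shards):
        # Each dict stands in for one database instance (an assumption).
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        digest = hashlib.sha256(str(key).encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def insert(self, key, row):
        self.shards[self._shard_for(key)][key] = row

    def get(self, key):
        # A lookup by primary key touches exactly one shard.
        return self.shards[self._shard_for(key)].get(key)

    def select_where(self, predicate):
        # A query on another attribute must visit every shard,
        # and the proxy aggregates the partial results.
        results = []
        for shard in self.shards:
            results.extend(row for row in shard.values() if predicate(row))
        return results


proxy = ShardingProxy(num_shards=4)
for i in range(100):
    proxy.insert(i, {"id": i, "country": "BE" if i % 2 == 0 else "FR"})

one_row = proxy.get(42)
belgians = proxy.select_where(lambda row: row["country"] == "BE")
```

The contrast between `get` and `select_where` is the point: only queries that can be routed by the sharding key avoid contacting all shards.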

    Sharding Limits

    • Primary key splits: Tables are commonly split based on their primary key, using a hash function for balanced distribution.
    • Complex queries: Often involve all shards, hindering scalability.
    • Automatic sharding: Tends to scale poorly, because it does not take the application's queries into account.
    • Specific sharding: Designing a good application-dependent sharding strategy is complex and workload-dependent.
    • Load balancing: Ensuring consistent load balancing is vital in sharded systems.
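
To make the cost of changing the number of shards concrete, here is a small sketch that assumes the simplest hash partitioning, `hash(key) mod N`. The key names and shard counts are made up for the example.

```python
import hashlib


def shard_of(key, num_shards):
    """Map a primary key to a shard with hash(key) mod num_shards."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys, old_n, new_n):
    """Fraction of keys whose shard changes when going from old_n to new_n."""
    moved = sum(1 for k in keys if shard_of(k, old_n) != shard_of(k, new_n))
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_n=4, new_n=5)
print(f"{fraction:.0%} of the keys change shard when going from 4 to 5 shards")
```

With modulo placement, a key stays put only if its hash gives the same remainder for both shard counts, so in expectation about four fifths of the keys move in this 4-to-5 case. This is why adding or removing a node in such a plan triggers redistribution across all nodes.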

    Primary/Secondary Replication

    • Write requests are processed by a single primary instance; reads can be processed by multiple secondary instances. This design is only appropriate for read-dominated use cases.
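
The routing rule can be sketched as follows. This is an in-memory illustration under a strong simplifying assumption: replication to the secondaries is synchronous, whereas real deployments often replicate asynchronously. `ReplicatedStore` is a hypothetical name.

```python
import itertools


class ReplicatedStore:
    """All writes go to the primary; reads rotate over the secondaries."""

    def __init__(self, num_secondaries):
        self.primary = {}
        self.secondaries = [dict() for _ in range(num_secondaries)]
        self.reads_served = [0] * num_secondaries
        self._next_reader = itertools.cycle(range(num_secondaries))

    def write(self, key, value):
        # The single primary is the only entry point for writes.
        self.primary[key] = value
        # Assumption: changes are pushed to every secondary immediately.
        for replica in self.secondaries:
            replica[key] = value

    def read(self, key):
        index = next(self._next_reader)  # round-robin over secondaries
        self.reads_served[index] += 1
        return self.secondaries[index].get(key)


store = ReplicatedStore(num_secondaries=3)
store.write("greeting", "hello")
values = [store.read("greeting") for _ in range(6)]
```

Read capacity grows with the number of secondaries, but every write still passes through one machine, which is why the design only suits read-dominated workloads.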

    NoSQL Databases

    • "Not Only SQL": A family of databases with different flavors.
    • Focus: Horizontal scaling and elasticity, with simpler query models than SQL and usually no fixed schema.
    • Usage: In large cloud web applications.
    • Examples: Key/Value, Document-oriented, Graph-oriented, Column stores.

    Key/Value Stores

    • Hash Map: A global hash map enabling fast key lookups (put(key,value), value = get(key)).
    • Horizontal scaling: Designed for this, splitting key ranges amongst separate servers.
    • Consistent hashing: Used for distributing keys efficiently among servers.
    • Examples: Apache Cassandra.

    NoSQL: Adding Servers

    • The range of keys assigned to each server is re-distributed to accommodate new servers in a NoSQL key/value store. This permits efficient continuous scaling.
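
A minimal sketch of consistent hashing, assuming a hash ring with virtual nodes. The class name `HashRing`, the number of virtual nodes, and the server names are illustrative assumptions and do not describe how any particular key/value store implements it.

```python
import bisect
import hashlib


def _hash(value):
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Each server owns the key ranges that end at its points on the ring."""

    def __init__(self, servers=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> server name
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        # Several virtual nodes per server smooth out the load.
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def server_for(self, key):
        # First point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["server-1", "server-2", "server-3", "server-4"])
keys = [f"key-{i}" for i in range(10_000)]
before = {k: ring.server_for(k) for k in keys}

ring.add_server("server-5")
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to `server-5`: the new server takes over slices of the existing ranges, and no key is shuffled between the old servers.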

    Apache Cassandra

    • A NoSQL key/value store that prioritizes throughput and linear scalability, deployed on thousands to hundreds of thousands of servers in the largest installations (e.g. at Apple).
    • Stores data as BLOBs and supports huge deployments.

    Object Stores

    • Immutable data: Data created only once; no updates allowed.
    • Versioning: May employ a version number for identifying different states of the object.
    • URI-based access: Clients access data through a unique Uniform Resource Identifier (URI).
    • Managed service: Typically offered as a managed service by cloud providers (e.g., S3, Azure Blob).
    • Caching and deployment: Enabled by Content Delivery Networks (CDNs).

    Example of Amazon S3

    • Buckets: Objects are stored inside buckets.
    • Versioning: Allows multiple past versions of the object.
    • Encrypted: Uses automatic encryption and access control mechanisms.

    Creating a Bucket

    • Name and Region: Define the bucket's name and location.
    • Properties and Permissions: Configure bucket-level settings and access privileges.

    Uploading Content

    • File Upload: The process of placing files into relevant S3 buckets.

    Setting Public Access

    • Users: Set which users have permissions to read or write to the specified objects.

    Using URI from Client

    • Clients access objects in an S3 bucket through a unique Uniform Resource Identifier (URI).

    Usage of object storage in project

    • Azure Blob Storage: Cloud storage service, similar to Amazon S3.
    • Public vs. Local Images: Returning public URIs instead of local images promotes distributed caching.
    • REST calls: Use REST calls (PUT) for uploading images to be manipulated by an Azure function.
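
As a rough sketch of such an upload, the snippet below only builds the HTTP PUT request for Azure Blob Storage without sending it. The account name, container, blob name, and SAS token are placeholders invented for this example; the `x-ms-blob-type` header is the one the Put Blob operation expects for block blobs.

```python
import urllib.request


def build_upload_request(account, container, blob_name, data, sas_token):
    """Build (but do not send) a PUT request that uploads one image."""
    url = (
        f"https://{account}.blob.core.windows.net/"
        f"{container}/{blob_name}?{sas_token}"
    )
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            "x-ms-blob-type": "BlockBlob",  # store the body as a block blob
            "Content-Type": "image/png",
        },
    )


# All values below are hypothetical placeholders.
request = build_upload_request(
    account="exampleaccount",
    container="images",
    blob_name="cat.png",
    data=b"\x89PNG...",
    sas_token="sv=PLACEHOLDER&sig=PLACEHOLDER",
)
# urllib.request.urlopen(request) would perform the upload.
```

Once the object is stored, the application can return its public URI to clients rather than serving the image itself.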

    Document-Oriented NoSQL

    • Structured documents: Store structured documents typically as JSON or YAML instead of BLOBs.
    • Indexing: Allow indexing on internal fields/values.
    • Search capabilities: Support search capabilities.
    • Collections and hierarchies: Support for storing data in collections and hierarchies.
    • MongoDB: A widely used NoSQL database; its use is demonstrated in a tutorial in the course.
    • Apache CouchDB: A NoSQL database system suitable for microservice scenarios.
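
The difference between opaque BLOBs and structured documents is that the store can index fields inside a document. A toy in-memory sketch follows, with hypothetical class and method names that do not correspond to the MongoDB or CouchDB APIs; it assumes indexed values are hashable and document ids are sortable.

```python
from collections import defaultdict


class DocumentCollection:
    """Stores JSON-like dicts and indexes them on dotted field paths."""

    def __init__(self):
        self.docs = {}
        self.indexes = {}  # field path -> {value -> set of document ids}

    @staticmethod
    def _lookup(doc, path):
        # Follow a dotted path such as "address.city" into nested dicts.
        for part in path.split("."):
            if not isinstance(doc, dict) or part not in doc:
                return None
            doc = doc[part]
        return doc

    def create_index(self, path):
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            value = self._lookup(doc, path)
            if value is not None:
                index[value].add(doc_id)
        self.indexes[path] = index

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc
        for path, index in self.indexes.items():
            value = self._lookup(doc, path)
            if value is not None:
                index[value].add(doc_id)

    def find(self, path, value):
        if path in self.indexes:  # indexed lookup, no scan
            ids = self.indexes[path].get(value, set())
        else:                     # fall back to scanning every document
            ids = {i for i, d in self.docs.items()
                   if self._lookup(d, path) == value}
        return [self.docs[i] for i in sorted(ids)]


users = DocumentCollection()
users.create_index("address.city")
users.insert(1, {"name": "Ana", "address": {"city": "Lisbon"}})
users.insert(2, {"name": "Bo", "address": {"city": "Oslo"}})
users.insert(3, {"name": "Cy", "address": {"city": "Lisbon"}})
in_lisbon = users.find("address.city", "Lisbon")
```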

    Graph-oriented NoSQL Databases

    • Store relations: Graph-oriented databases store relations between keys, as a navigable graph.
    • Examples: Neo4j, used for instance to model social networks.

    Column Stores

    • Full column: Stores a complete column of values for an attribute.
    • Aggregation and queries: Efficient in handling aggregation and search queries.
    • Joining: Join operations are complex and must be implemented on the client side.
    • Examples: Cloud-based solutions such as Apache HBase or Google Bigtable.
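
A small sketch of why a columnar layout suits aggregation, using made-up sales records. The row and column layouts below are plain Python structures, not the storage format of HBase or Bigtable.

```python
# Row-oriented layout: one record per entry, all attributes together.
rows = [
    {"id": 1, "city": "Paris", "amount": 30},
    {"id": 2, "city": "Rome", "amount": 12},
    {"id": 3, "city": "Paris", "amount": 8},
]

# Column-oriented layout: one complete list of values per attribute.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregating over rows touches every attribute of every record...
total_from_rows = sum(row["amount"] for row in rows)

# ...while the column layout reads only the "amount" column.
total_from_columns = sum(columns["amount"])

# A search on one attribute also scans a single column.
paris_ids = [
    columns["id"][i]
    for i, city in enumerate(columns["city"])
    if city == "Paris"
]
```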

    Conclusion

    • Cloud applications depend on database choices: relational, NoSQL, or specialized databases for data such as graphs or documents.
    • Implementing microservices complicates state management compared to a single database.
    • Event sourcing offers an approach for handling database consistency and interactions in complex microservice deployments.

    CQRS (Command Query Responsibility Segregation)

    • Frequent queries: Microservices often need to query data that is owned by other services.
    • Event sourcing cost: Answering such queries by replaying events is inefficient.
    • Separate logic: Separate the logic for commands (updates) from the logic for queries to improve efficiency.
    • Queries as views: Implement a query as a materialized view that a microservice keeps up to date by subscribing to events.
    • Long-term queries: Plan ahead for frequent queries, since materialized views are less flexible than on-demand SQL queries.
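
The command/query split can be sketched with an in-memory event log. The event names follow the order example mentioned in the quiz (an order is created, then validated); the classes `EventLog` and `ValidatedTotalsView` and the totals query are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str          # "OrderCreated" or "OrderValidated"
    order_id: int
    customer: str = ""
    amount: int = 0


class EventLog:
    """Append-only log; subscribers are notified of every new event."""

    def __init__(self):
        self.events = []
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def append(self, event):
        self.events.append(event)
        for handler in self._subscribers:
            handler(event)


class ValidatedTotalsView:
    """Query side: a materialized view updated as events arrive."""

    def __init__(self):
        self._created = {}  # order_id -> (customer, amount)
        self.totals = {}    # customer -> sum of validated amounts

    def handle(self, event):
        if event.kind == "OrderCreated":
            self._created[event.order_id] = (event.customer, event.amount)
        elif event.kind == "OrderValidated":
            customer, amount = self._created[event.order_id]
            self.totals[customer] = self.totals.get(customer, 0) + amount


def total_by_replay(log, customer):
    """Without a view: replay the whole log for every single query."""
    view = ValidatedTotalsView()
    for event in log.events:
        view.handle(event)
    return view.totals.get(customer, 0)


log = EventLog()
view = ValidatedTotalsView()
log.subscribe(view.handle)

# Command side: commands only append events to the log.
log.append(Event("OrderCreated", 1, "alice", 40))
log.append(Event("OrderCreated", 2, "alice", 25))
log.append(Event("OrderValidated", 1))

alice_total = view.totals.get("alice", 0)  # answered from the view
```

Both paths give the same answer, but the view answers in constant time while the replay cost grows with the length of the log. The price is that the view only answers the query it was built for.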

    Dropbox Case Study

    • Cloud storage: Provides a private file store, accessible via Web interface, users' devices, and a developer API for third-party applications.
    • Sharing: The same folder can be present in several users' spaces at once.
    • Abstraction: Abstracted filesystem (files, directories, links, spaces).
    • Client side implementations: Local folder changes get synchronized to the cloud.
    • Chunked files: Files are split into chunks of at most 4 MB, each identified by its SHA-256 hash. Changes to a chunk's content are sent as compressed binary diffs.
    • Metadata servers: Metadata (e.g. file ownership and chunk lists) is stored on servers in the Dropbox private cloud. It is kept in a relational database, plays a role similar to inodes, and requires strong consistency to preserve integrity.
    • Data storage: Until 2016, file contents were stored on Amazon EC2 and the S3 object store. The files are stored encrypted.
    • API: REST over HTTPS, but not RESTful.
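
A minimal sketch of the chunking scheme, assuming fixed 4 MB chunks identified by their SHA-256 digest. The helper names are hypothetical, and the comparison below only detects which chunks changed; it does not reproduce the compressed binary diffs mentioned above.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the maximum chunk size in the notes


def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """Split data into chunks and return the SHA-256 hex digest of each."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(old_hashes, new_hashes):
    """Chunks of the new version that the server does not already hold."""
    known = set(old_hashes)
    return [h for h in new_hashes if h not in known]


# A tiny chunk size keeps the demonstration small.
old_version = b"a" * 10 + b"b" * 10 + b"c" * 5
new_version = b"a" * 10 + b"X" * 10 + b"c" * 5

old = chunk_hashes(old_version, chunk_size=10)
new = chunk_hashes(new_version, chunk_size=10)
changed = chunks_to_upload(old, new)
```

Only the middle chunk differs between the two versions, so `changed` holds a single hash. The list of hashes per file is the kind of information a metadata server would keep.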

    Dropbox Architecture

    • Storage servers: Store encrypted files based on file identification.
    • Processing Servers: Encryption and application services.
    • Metadata servers: Store information about the files (metadata service, database).
    • Notification servers: Store notifications about files.
    • Multiple devices: The client devices of end-users accessing Dropbox.

    Dropbox Interaction

    • Client-side: Clients register on Dropbox, then create and change files and folders within their accounts. They also communicate with Dropbox servers over HTTPS.
    • Server-side: Servers organize the Dropbox services, and manage and handle connections with clients.
    • Storage services: Manage the data used by the files.

    Reason for using two storage systems (Dropbox)

    • Metadata consistency and speed: Metadata is critical and must be kept strictly consistent. It also records the relationships among files and folders, which speeds up file lookups.
    • Data storage efficiency: The data (the file itself) can be cheaply stored and scaled.


    Description

    Test your knowledge on cloud storage solutions and NoSQL databases with this quiz. Questions cover persistent block storage, data integrity, application state management, and more. Perfect for those studying cloud computing and database management.
