Cloud Storage & NoSQL Databases Quiz
48 Questions

Questions and Answers

What is a key characteristic of persistent block storage?

  • Content is lost when the VM is shut down.
  • It is not visible as block devices.
  • It cannot ensure file system integrity during a clean shutdown.
  • It is retained across VM shut-down/restart cycles. (correct)

Which type of storage is not meant for sharing across multiple VM instances?

  • Local block storage (correct)
  • Shared block storage
  • Temporary block storage
  • Persistent block storage

What is a disadvantage of using block storage for application data?

  • It allows easy partitioning of data over multiple servers.
  • It easily survives the failure of the instance.
  • It is typically local to one VM instance. (correct)
  • It is suitable for storing unstructured data.

Which of the following is a feature of NoSQL databases?

    • They can handle unstructured data. (correct)

    What is a primary use of block storage in cloud computing?

    • Mounting virtual disks for Virtual Machine instances. (correct)

    Which storage solution is specifically mentioned as an example of temporary block storage?

    • Amazon EC2 Instance Store (correct)

    What technique is crucial for ensuring data integrity with persistent block storage?

    • Implementing a clean shutdown. (correct)

    What do cloud applications mainly utilize for state management?

    • NoSQL databases and state management techniques. (correct)

    What is the initial event in the example of event sourcing provided?

    • Order created but not validated (correct)

    What does CQRS stand for in the discussed architecture?

    • Command/Query Responsibility Segregation (correct)

    What is a common challenge when managing application state with microservices?

    • Increased complexity compared to a single database (correct)

    What is a primary feature of Dropbox's client application?

    • It synchronizes changes made locally with the cloud storage. (correct)

    What should be done if a query is frequently used, according to the suggested implementation?

    • Create a microservice to materialize the view (correct)

    How does Dropbox handle changes notified to clients?

    • By delaying the responses to clients' HTTP calls for up to 60 seconds. (correct)

    Which type of database is mentioned for its scalability and elasticity?

    • NoSQL databases (correct)

    What type of storage did Dropbox primarily use until 2016?

    • Amazon EC2 and S3 object store. (correct)

    What is the role of the Order Manager (OM) in the event sourcing example?

    • To validate orders and check customer credit (correct)

    In Dropbox, what is the role of metadata servers?

    • They handle information regarding files like ownership and chunk lists. (correct)

    What disadvantage is mentioned regarding materialized views created by a microservice?

    • They are less flexible compared to on-demand SQL queries (correct)

    What is the maximum size for chunks in which files are divided in Dropbox?

    • 4 MB (correct)

    Which type of databases are suitable for specialized data such as documents or graphs?

    • NoSQL databases (correct)

    What type of API does Dropbox utilize for communication between clients and servers?

    • REST over HTTPS. (correct)

    What unique identifier is used for each chunk in Dropbox's file storage system?

    • SHA-256 hash. (correct)

    Which of the following describes the sharing capabilities of Dropbox?

    • Folders can be present in two users' spaces simultaneously. (correct)

    What is the primary purpose of database sharding?

    • To enhance load management by splitting database content across machines (correct)

    Which of the following statements about SQL queries in sharded databases is true?

    • Join queries typically require merging data from several tables across instances (correct)

    What is a key limitation of using automatic sharding?

    • It all but guarantees poor scalability for the database (correct)

    In primary/secondary replication, what must be done with write requests?

    • They must go through a single replica site, the primary (correct)

    What happens when adding or removing a node in a hash-partitioned sharding plan?

    • Data redistribution across all nodes occurs (correct)

    What is necessary for effective load balancing in sharding?

    • A complex and application-specific sharding strategy (correct)

    Which of the following is a common challenge when scaling relational databases through sharding?

    • Maintaining strict ACID properties during cross-shard queries (correct)

    What is the typical method for splitting tables in a sharded database?

    • Using a hash function based on primary keys (correct)

    What is the primary reason for using two different storage systems in the discussed architecture?

    • To cater to the different consistency needs of metadata and data (correct)

    What is a key feature of relational databases?

    • ACID transaction support (correct)

    What does the term 'polyglot architecture' refer to in the context of databases?

    • Each microservice selecting the best database suited for its needs (correct)

    What is a key requirement for metadata to ensure integrity in file operations?

    • All file chunks must be uploaded before the file is displayed (correct)

    Which statement best describes the transactions required for metadata?

    • Transactions help ensure no intermediate states are visible (correct)

    What is a common limitation of relational databases when used in large cloud applications?

    • Limited scalability and elasticity (correct)

    In the context of sharding, how is data partitioned for metadata?

    • Metadata is partitioned per user identifier (correct)

    What does ACID stand for in the context of database transactions?

    • Atomicity, Consistency, Isolation, Durability (correct)

    What type of database is suggested to manage metadata due to its consistency requirements?

    • Traditional relational database (correct)

    What is one advantage of relational databases mentioned in the content?

    • Optimized query engines for complex queries (correct)

    How do Dropbox clients receive server IPs for load balancing?

    • Clients obtain a random extract of server IPs periodically (correct)

    Which of the following is a characteristic that may limit relational database usage in cloud applications?

    • Challenges with scalability (correct)

    Which database option is a managed service provided by Amazon?

    • Aurora (correct)

    What role does sharding play in the context of metadata?

    • It partitions metadata per user, so that each user's metadata resides on a single server (correct)

    What is a possible drawback of using a layered monolith architecture?

    • Limits flexibility in database choice for microservices (correct)

    What is a characteristic feature of the load balancing strategy mentioned?

    • It distributes connections in a round-robin manner (correct)

    Flashcards

    Horizontal Database Scaling

    A technique to distribute data from a single database across multiple machines for improved performance and scalability.

    Database Sharding

    A method of horizontally scaling a database by splitting tables into multiple databases.

    Proxy in Sharding

    A software component that routes incoming queries to the shards according to the data distribution and aggregates the results from the different shards.

    Hash-based Sharding

    Typical method for splitting data in sharding, where data is distributed across shards based on a hash function applied to the primary key.

    Complexity of Queries in Sharding

    A limitation of sharding: complex queries often involve several or all shards, which hinders scalability and query optimization.

    Primary/Secondary Replication

    A replication technique where write operations are restricted to a single primary replica, suitable for read-intensive workloads.

    Fixed Sharding Plan

    The limitation of sharding where resharding requires data movement and can disrupt the system.

    Adding/Removing Nodes in Sharding

    The problem faced in sharding when adding or removing a node, requiring data redistribution across all shards.

    Event Sourcing

    A pattern where the state of an application is stored as a sequence of events; the current state is obtained by applying these events in order.

    Command Microservice

    A microservice responsible for handling actions that modify the state of the application.

    Query Microservice

    A microservice responsible for retrieving data from the system, often by building materialized views.

    Command/Query Responsibility Segregation (CQRS)

    The separation of logic for commands (updating) from logic for queries (reading) in microservices.

    Aggregate (Microservices)

    A representation of the state of an entity within a microservice, typically containing data and behavior.

    Event Store

    A specialized database designed for storing and querying events, often used in event-driven architectures.

    Event Replay

    The process of applying historical events to reconstruct the current state of an entity.

    State Management with Microservices

    A method for managing state in microservices by storing a sequence of events that reflect changes to the application, allowing for replayability and auditing.

    Load Balancing

    A method of distributing network traffic across multiple servers to balance workload and prevent server overload, commonly used in cloud environments.

    Sharding

    A method of dividing data into smaller units (chunks/pieces) and distributing these chunks across multiple servers for efficient storage and retrieval.

    Metadata Store

    A type of storage used for storing metadata (information about files, like their names, sizes, and locations) to ensure consistent and reliable data access.

    Data Store

    A type of storage used for storing the actual data content of files, optimized for high data throughput and scalability.

    Strong Consistency

    A guarantee that every read observes the most recent completed write, so that all clients see the same data; crucial for maintaining data integrity in distributed systems.

    Relational Database

    A database specifically designed for storing structured data in tables, commonly used for managing metadata in cloud environments.

    Server IPs

    The addresses of the available Dropbox servers. Clients periodically obtain a random extract of server IPs, which spreads their requests across the servers.

    Two Different Stores

    The use of two different storage systems, a metadata store and a data store, to optimize for different storage needs in cloud environments.

    NoSQL Database

    A database that uses a different approach than SQL, often for unstructured or semi-structured data. Examples include MongoDB and Cassandra.

    Service-Oriented Architecture (SOA)

    A software architecture where different components or services communicate with each other through well-defined interfaces, using protocols like HTTP or messaging. Services are independent and can be developed and deployed separately.

    Polyglot Architecture

    A database architecture where each microservice in an SOA can choose the most appropriate database for its specific needs, leading to a mix of different database technologies within a single application.

    Atomicity

    A characteristic of database transactions that ensures all operations of a transaction are completed entirely or not at all, even in the face of failures or errors.

    Consistency

    A characteristic of a database that ensures data remains accurate and valid. It guarantees that any changes to the database follow predefined rules and constraints.

    Isolation

    A characteristic of a database that ensures transactions do not interfere with each other. Each transaction is isolated and independent.

    Durability

    A characteristic of a database that ensures that data is preserved even if the system crashes or experiences failures. It guarantees that completed transactions are permanently recorded.

    Dropbox

    A cloud storage service that provides a personal file store, synchronizes with local copies on users' devices, and offers a web interface and developer API for third-party integration.

    Dropbox Sharing Capabilities

    A feature that allows multiple users to access and modify the same folder, making collaboration seamless.

    Dropbox Client Application

    The Dropbox client application monitors local folders for changes and sends any updates to the cloud. It learns about remote changes through HTTP calls whose responses the server delays for up to 60 seconds.

    Dropbox File Chunking

    Dropbox breaks large files into chunks of up to 4 MB, treating each chunk as an independent object with a unique SHA-256 hash. This allows for efficient storage and transfer of data.

    Dropbox Infrastructure

    Dropbox uses two separate infrastructures for metadata and data storage. Metadata servers manage information about files, such as owners, chunks, and access rights. Data servers store the actual file content.

    Dropbox Data Storage Evolution

    Before 2016, Dropbox used Amazon EC2 and S3 object store for data storage. Later, they migrated to their own storage infrastructure. This earlier design using external cloud providers demonstrates the flexibility of cloud computing.

    Dropbox API (Application Programming Interface)

    The interface between Dropbox clients and servers uses REST principles over HTTPS but isn't strictly RESTful. This provides a flexible communication protocol.

    Dropbox Architecture

    The Dropbox architecture involves a client application, metadata servers, and data servers, all interconnected through the Dropbox API.

    Block Storage

    Virtual disks attached to a VM, acting like a local hard drive or SSD. The guest operating system mounts the storage using a file system.

    Temporary Block Storage

    Block storage that is attached to the host or available through a local SAN (Storage Area Network). Content is lost when the VM is shut down. Example: Amazon EC2 Instance Store.

    Persistent Block Storage

    Block storage whose content is retained across VM shut-down/restart cycles. File system integrity still requires a clean shutdown.

    Scalability of Relational Databases (SQL)

    Relational databases (SQL) organize structured data in tables and offer ACID transactions and optimized query engines. Their scalability and elasticity are limited for large cloud applications: vertical scaling is expensive, and horizontal scaling through sharding is complex.

    Dropbox - A Cloud Storage Service

    Dropbox is a cloud storage service that combines a strongly consistent metadata store with a scalable store for file chunks to provide users with reliable, shared storage. It exemplifies the use of cloud storage for data synchronization and collaboration.

    State Management in Microservices

    State management addresses how different components of a distributed application retain and share information, ensuring that the application behaves consistently. Techniques include session management, data caching, and message queues.

    Block Storage for Applications

    Block storage is often used for storing OS, libraries and binaries of the application. Due to its local nature, it is not suitable for sharing application data.

    Study Notes

    Cloud Computing - Lesson 5: Storage and State Management

    • Announcements:
      • Feedback for the first quiz will be available after the course.
      • The second quiz is available on Moodle at 12:45 today.
      • Quiz deadline: Wednesday, Oct. 16, 10:45.
      • Review deadline: Wednesday, Oct. 23, 10:45.

    Objectives

    • Present storage solutions for IaaS environments.
    • Discuss relational database (SQL) scalability and introduce NoSQL databases.
    • Describe the Dropbox service.
    • Explain state management techniques in microservices applications.

    Block Storage

    • Virtual disks: Available for mounting within a VM instance, acting like local hard drives or SSDs. Guest operating systems must mount storage using a file system.
    • Temporary/Local storage: Attached to the host or accessed through a local SAN (Storage Area Network). Content is lost when the VM is shut down. Example: Amazon EC2 Instance Store.
    • Persistent block storage: Retains data across VM shutdowns/restarts. Requires a clean shutdown for file system integrity. Example: Amazon Elastic Block Store (EBS).

    Storage for Applications

    • Block storage: Used for OS, libraries, and application binaries/containers.
    • Local file systems: Storing application data in the file system of a single VM is not suitable; the data does not survive instance failure and is difficult to share across servers.
    • Databases: Essential for application data management across multiple servers.

    Databases

    • Flavors: Databases come in many varieties, including relational (SQL) and NoSQL.
    • Layered monolith: Uses a single (or a few) all-purpose database accessed by all layers.
    • Service-Oriented Architecture (SOA): Each microservice chooses the most suitable database (polyglot architecture).

    Relational Databases

    • Platform availability: Found on many cloud platforms (e.g., Heroku PostgreSQL, Amazon Aurora, RDS).
    • Developer understanding: Well understood by developers, with a standard query language (SQL).
    • ACID semantics: Support transactions with Atomicity, Consistency, Isolation, and Durability.
    • Structured data queries: Efficient at complex queries on structured data.
    • Join queries: Support the creation of join queries between related tables.
    • Optimized Query Engines: Employ highly optimized query engines for efficiency.
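
To illustrate the ACID bullet above, here is a minimal sketch of an atomic transfer using Python's built-in sqlite3 module. The `accounts` table, the account names, and the `transfer` helper are assumptions made up for this example; they do not come from the lesson.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 20)")
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Credit dst, then debit src; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if src lacks funds.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

When the debit fails, the credit that was already applied is rolled back as well, so no intermediate state becomes visible.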

    Problems with Relational Databases

    • Scalability and elasticity: Cannot always adapt to large cloud application demands.
    • Example: A single-node MySQL performance graph shows diminishing returns with increasing thread counts, highlighting the limitations in scalability within basic relational databases.

    Scaling relational databases

    • Vertical scaling: Increasing the resources of a single server. Often expensive, does not reliably scale.
    • Example: A large enterprise database scaled vertically, such as an Oracle database running on an IBM mainframe; this approach is costly and bounded by the largest available machine.

    Scalability Cube

    • X-axis: Horizontal duplication (scaling by cloning).
    • Y-axis: Functional decomposition (scaling by splitting different things).
    • Z-axis: Data partitioning (scaling by splitting similar things).
    • Now: Microservices & horizontal scalability are preferred, as shown in the scalability cube.

    Horizontally scaling a relational database

    • Database sharding: Split tables across multiple database instances.
    • Proxy: A proxy receives queries and distributes them to shards, aggregating the results.
    • Examples: PL/Proxy for PostgreSQL, MySQL Cluster.

    Sharding Limits

    • Primary key splits: Tables are commonly split based on their primary key, using a hash function for balanced distribution.
    • Complex queries: Often involve all shards, hindering scalability.
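    • Automatic sharding: Sharding that ignores the application's workload all but guarantees poor scalability, because many queries end up involving every shard.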
    • Specific sharding: Designing a good application-dependent sharding strategy is complex and workload-dependent.
    • Load balancing: Ensuring consistent load balancing is vital in sharded systems.
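
The hash-based split described above can be sketched as follows. This is an illustration under assumptions, not the scheme of any particular proxy: `shard_for` and `moved_fraction` are hypothetical helpers, and SHA-256 modulo the shard count is just one possible hash function.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a primary key to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)

keys = [f"user-{i}" for i in range(10_000)]
print(moved_fraction(keys, 4, 5))  # most keys change shard
```

With modulo placement, going from 4 to 5 shards relocates roughly four keys out of five, which is why adding or removing a node in a fixed hash-partitioned plan forces redistribution across all nodes.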

    Primary/Secondary Replication

    • Write requests are processed by a single primary instance; reads can be processed by multiple secondary instances. This design is only appropriate for read-dominated use cases.
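
A minimal in-memory sketch of this routing rule, assuming synchronous replication for simplicity (real systems often replicate asynchronously, so secondaries may lag). The `ReplicatedStore` class is hypothetical.

```python
import itertools

class ReplicatedStore:
    """Toy primary/secondary store: one writer, several readers."""

    def __init__(self, num_secondaries: int):
        self.primary = {}
        self.secondaries = [{} for _ in range(num_secondaries)]
        self._next = itertools.cycle(range(num_secondaries))

    def write(self, key, value):
        # Every write goes through the single primary...
        self.primary[key] = value
        # ...and is then copied to each secondary (synchronously here).
        for replica in self.secondaries:
            replica[key] = value

    def read(self, key):
        # Reads are spread over the secondaries in round-robin order.
        idx = next(self._next)
        return idx, self.secondaries[idx].get(key)
```

Adding secondaries increases read capacity only; the primary remains the bottleneck for writes, which is why the design fits read-dominated workloads.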

    NoSQL Databases

    • "Not Only SQL": A family of databases with different flavors.
    • Focus: Horizontal scaling and elasticity, simpler query interfaces than SQL, usually no fixed schema.
    • Usage: In large cloud web applications.
    • Examples: Key/Value, Document-oriented, Graph-oriented, Column stores.

    Key/Value Stores

    • Hash Map: A global hash map enabling fast key lookups (put(key,value), value = get(key)).
    • Horizontal scaling: Designed for this, splitting key ranges amongst separate servers.
    • Consistent hashing: Used for distributing keys efficiently among servers.
    • Examples: Apache Cassandra.

    NoSQL: Adding Servers

    • When a server is added to a NoSQL key/value store, key ranges are reassigned so that the new server takes over part of the key space. With consistent hashing, only a fraction of the keys moves, which permits continuous scaling.
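
The effect of adding a server can be sketched with a small consistent-hashing ring. This is an illustrative sketch, not how Cassandra or any specific store implements it: the `HashRing` class, the use of SHA-256, and the 100 virtual nodes per server are assumptions.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

class HashRing:
    """Consistent-hashing ring with virtual nodes (requires >= 1 server)."""

    def __init__(self, servers, vnodes: int = 100):
        self._vnodes = vnodes
        self._points = []  # sorted list of (ring position, server name)
        for server in servers:
            self.add(server)

    def add(self, server: str):
        # Each server owns several positions to even out the key ranges.
        for i in range(self._vnodes):
            bisect.insort(self._points, (_hash(f"{server}#{i}"), server))

    def lookup(self, key: str) -> str:
        # A key belongs to the first point at or after its hash, wrapping around.
        idx = bisect.bisect_left(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]
```

When a fifth server joins a four-server ring, only the keys falling into the ranges it takes over change owner, and all of them move to the new server; the others stay where they are.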

    Apache Cassandra

    • A NoSQL key/value store that prioritizes throughput and linear scalability; large deployments range from thousands to over a hundred thousand servers (e.g., at Apple).
    • Stores data as BLOBs and supports huge deployments.

    Object Stores

    • Immutable data: Data created only once; no updates allowed.
    • Versioning: May employ a version number for identifying different states of the object.
    • URI-based access: Clients access data through a unique Uniform Resource Identifier (URI).
    • Managed service: Typically offered as a managed service by cloud providers (e.g., S3, Azure Blob).
    • Caching and distribution: Enabled by Content Delivery Networks (CDNs).

    Example of Amazon S3

    • Buckets: Objects are stored inside buckets.
    • Versioning: Allows multiple past versions of the object.
    • Encrypted: Uses automatic encryption and access control mechanisms.
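
The object-store model (immutable objects, versions, buckets, URI access) can be mimicked with a small in-memory sketch. It does not use the S3 API; the `ObjectStore` class and the `objstore://` URI scheme are hypothetical and only illustrate the semantics.

```python
class ObjectStore:
    """Toy bucket: objects are never modified, each put adds a version."""

    def __init__(self, bucket: str):
        self.bucket = bucket
        self._versions = {}  # key -> list of immutable byte strings

    def put(self, key: str, data: bytes) -> str:
        versions = self._versions.setdefault(key, [])
        versions.append(bytes(data))  # store a copy; old versions stay intact
        # Hypothetical URI format identifying this exact version.
        return f"objstore://{self.bucket}/{key}?version={len(versions)}"

    def get(self, key: str, version=None) -> bytes:
        versions = self._versions[key]
        if version is None:
            return versions[-1]        # latest version by default
        return versions[version - 1]   # versions are numbered from 1
```

Because a given version never changes, its URI can be cached aggressively, for example by a CDN.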

    Creating a Bucket

    • Name and Region: Define the bucket's name and location.
    • Properties and Permissions: Configure bucket-level settings and access privileges.

    Uploading Content

    • File Upload: The process of placing files into relevant S3 buckets.

    Setting Public Access

    • Users: Set which users have permissions to read or write to the specified objects.

    Using URI from Client

    • Clients access objects in an S3 bucket through a unique Uniform Resource Identifier (URI).

    Usage of object storage in project

    • Azure Blob Storage: Cloud storage service, similar to Amazon S3.
    • Public vs. Local Images: Returning public URIs instead of local images promotes distributed caching.
    • REST calls: Use REST calls (PUT) for uploading images to be manipulated by an Azure function.

    Document-Oriented NoSQL

    • Structured documents: Store structured documents typically as JSON or YAML instead of BLOBs.
    • Indexing: Allow indexing on internal fields/values.
    • Search capabilities: Support search capabilities.
    • Collections and hierarchies: Support for storing data in collections and hierarchies.
    • MongoDB: A widely used document-oriented NoSQL database; its capabilities are demonstrated in a tutorial in the course.
    • Apache CouchDB: A NoSQL database system suitable for microservice scenarios.
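
Indexing on internal fields can be sketched with a tiny in-memory collection. This is not the MongoDB or CouchDB API; `Collection`, `get_path`, and the dotted-path syntax are assumptions for illustration, and indexed values must be hashable.

```python
from collections import defaultdict

_MISSING = object()

def get_path(doc, path: str):
    """Follow a dotted path such as 'address.city' into nested dicts."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current

class Collection:
    """Toy document collection; assumes each document id is inserted once."""

    def __init__(self):
        self._docs = {}
        self._indexes = {}  # path -> {value -> set of document ids}

    def create_index(self, path: str):
        index = defaultdict(set)
        for doc_id, doc in self._docs.items():
            value = get_path(doc, path)
            if value is not _MISSING:
                index[value].add(doc_id)
        self._indexes[path] = index

    def insert(self, doc_id, doc: dict):
        self._docs[doc_id] = doc
        for path, index in self._indexes.items():
            value = get_path(doc, path)
            if value is not _MISSING:
                index[value].add(doc_id)

    def find(self, path: str, value):
        if path in self._indexes:  # index lookup instead of a full scan
            ids = self._indexes[path].get(value, set())
        else:
            ids = {i for i, d in self._docs.items() if get_path(d, path) == value}
        return [self._docs[i] for i in sorted(ids)]
```

Unlike a key/value store that holds opaque BLOBs, the store can look inside documents, which is what makes field-level indexes and searches possible.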

    Graph-oriented NoSQL Databases

    • Store relations: Graph-oriented databases store relations between keys, as a navigable graph.
    • Examples: Neo4j, for use cases such as social networks.
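
As a rough illustration of navigating stored relations, the sketch below finds everyone within a given number of hops in a small social graph. It uses a plain adjacency dictionary rather than any graph database API; `within_hops` is a hypothetical helper.

```python
from collections import deque

def within_hops(graph: dict, start, max_hops: int) -> set:
    """Breadth-first search returning nodes reachable in <= max_hops edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {n for n, d in distance.items() if d > 0}
```

A "friends of friends" query is `within_hops(graph, user, 2)`; in a relational database the same query would need a self-join per hop.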

    Column Stores

    • Full column: Stores a complete column of values for an attribute.
    • Aggregation and queries: Efficient in handling aggregation and search queries.
    • Joining: Join operations are complex and are implemented on the client side.
    • Examples: Apache HBase, Google Bigtable.
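
Why a column layout helps aggregation can be seen in a small sketch that converts rows into per-attribute columns. The helpers `to_columns` and `total_by` are hypothetical, and the sketch assumes every row has the same fields.

```python
def to_columns(rows):
    """Turn a list of row dicts into one list of values per attribute."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def total_by(columns, group_col: str, value_col: str):
    """Aggregate one column grouped by another, reading only those two."""
    totals = {}
    for group, value in zip(columns[group_col], columns[value_col]):
        totals[group] = totals.get(group, 0) + value
    return totals

rows = [
    {"id": 1, "country": "BE", "amount": 10},
    {"id": 2, "country": "FR", "amount": 5},
    {"id": 3, "country": "BE", "amount": 7},
]
columns = to_columns(rows)
print(sum(columns["amount"]))                   # prints 22
print(total_by(columns, "country", "amount"))   # prints {'BE': 17, 'FR': 5}
```

The aggregation touches only the columns it needs, whereas a row layout would read every field of every row.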

    Conclusion

    • Cloud applications depend on database choices: relational, NoSQL, or databases specialized for data such as graphs or documents.
    • Implementing microservices complicates state management compared to a single database.
    • Event sourcing offers an approach for handling database consistency and interactions in complex microservice deployments.

    CQRS (Command Query Responsibility Segregation)

    • Frequent queries: A microservice may frequently need to query data that is owned by other services.
    • Event sourcing cost: Answering such queries by replaying events on every request is inefficient.
    • Separate logic: Separate the logic for commands (updates) from the logic for queries (reads) to improve efficiency.
    • Queries as views: Implement a frequent query as a microservice that subscribes to events and maintains a materialized view.
    • Planning ahead: Frequent queries must be identified in advance; materialized views are less flexible than on-demand SQL queries.
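
Event sourcing and a CQRS-style materialized view can be sketched together as follows. The event kinds ("created", "validated", "rejected") loosely follow the order example from the quiz, but the `Event`, `EventStore`, and `PendingOrdersView` classes are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    order_id: str
    kind: str        # "created", "validated" or "rejected"
    amount: int = 0

def apply(state, event):
    """Return the order state obtained by applying one event."""
    if event.kind == "created":
        return {"id": event.order_id, "amount": event.amount, "status": "pending"}
    if state is None:
        raise ValueError("event for an order that was never created")
    if event.kind in ("validated", "rejected"):
        return {**state, "status": event.kind}
    raise ValueError(f"unknown event kind: {event.kind}")

class EventStore:
    """Append-only log; the command side only ever appends events."""

    def __init__(self):
        self._events = []
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def append(self, event):
        self._events.append(event)
        for handler in self._subscribers:
            handler(event)

    def replay(self, order_id):
        """Rebuild the current state of one order from its events."""
        state = None
        for event in self._events:
            if event.order_id == order_id:
                state = apply(state, event)
        return state

class PendingOrdersView:
    """Query side: a materialized view kept up to date by subscription."""

    def __init__(self):
        self.pending = set()

    def handle(self, event):
        if event.kind == "created":
            self.pending.add(event.order_id)
        else:
            self.pending.discard(event.order_id)
```

Asking "which orders are pending?" reads the view directly instead of replaying the whole log, at the price of having decided in advance that this query matters.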

    Dropbox Case Study

    • Cloud storage: Provides a private file store, accessible via Web interface, users' devices, and a developer API for third-party applications.
    • Sharing: The same folder can appear in several users' spaces.
    • Abstraction: Abstracted filesystem (files, directories, links, spaces).
    • Client side implementations: Local folder changes get synchronized to the cloud.
    • Chunked files: Files are split into chunks of up to 4 MB, each identified by its SHA-256 hash; changes to a chunk are sent as compressed binary diffs.
    • Metadata servers: File metadata (e.g., owners, chunk lists, access rights) is stored on servers in Dropbox's private cloud. The metadata store is a relational database that plays a role similar to inodes in a file system and guarantees metadata integrity and consistency.
    • Data storage: Until 2016, Dropbox used Amazon EC2 and the S3 object store to store and retrieve file chunks; it then migrated to its own storage infrastructure. Chunks are stored encrypted.
    • API: REST over HTTPS, but not RESTful.
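
The chunking scheme can be sketched with Python's hashlib. This only illustrates splitting and hashing; it is not Dropbox's client code, it ignores the binary diffs mentioned above, and the helper names `chunk_ids` and `chunks_to_upload` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the maximum chunk size in the notes

def chunk_ids(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and return their SHA-256 hex digests."""
    return [
        hashlib.sha256(data[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(data), chunk_size)
    ]

def chunks_to_upload(old_ids, new_ids):
    """Chunks of the new version that the server does not already hold."""
    known = set(old_ids)
    return [chunk_id for chunk_id in new_ids if chunk_id not in known]
```

Because a chunk's identifier is derived from its content, an unchanged chunk keeps the same identifier across file versions and need not be sent again.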

    Dropbox Architecture

    • Storage servers: Store encrypted files based on file identification.
    • Processing Servers: Encryption and application services.
    • Metadata servers: Stores information about the files (metadata service, database).
    • Notification servers: Inform clients of changes; a client issues an HTTP call whose response the server delays for up to 60 seconds.
    • Multiple devices: The client devices of end-users accessing Dropbox.

    Dropbox Interaction

    • Client-side: Clients register on Dropbox, create, and change files and folders within their accounts. Also handle communications with Dropbox servers using HTTPS.
    • Server-side: Servers organize the Dropbox services, and manage and handle connections with clients.
    • Storage services: Manage the data used by the files.

    Reason for using two storage systems (Dropbox)

    • Metadata consistency: Metadata is critical and requires strong consistency and transactions (a file becomes visible only once all its chunks are uploaded, and no intermediate state is visible), so it is kept in a relational database, sharded per user.
    • Data storage efficiency: The data (the file itself) can be cheaply stored and scaled.
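
The interplay between the two stores can be sketched as follows: chunks go to a simple, scalable data store, and a file only appears in the metadata once every chunk it references is present. The `FileService` class is hypothetical and keeps both stores in memory; a real metadata store would perform the commit as a database transaction.

```python
import hashlib

class FileService:
    """Toy model of a data store plus a metadata store."""

    def __init__(self):
        self.data_store = {}  # chunk hash -> chunk bytes (cheap, scalable)
        self.metadata = {}    # (user, path) -> ordered list of chunk hashes

    def upload_chunk(self, chunk: bytes) -> str:
        chunk_id = hashlib.sha256(chunk).hexdigest()
        self.data_store[chunk_id] = chunk
        return chunk_id

    def commit_file(self, user: str, path: str, chunk_ids):
        # Refuse to publish a file that references chunks not yet uploaded.
        missing = [c for c in chunk_ids if c not in self.data_store]
        if missing:
            raise LookupError(f"{len(missing)} chunk(s) not uploaded yet")
        self.metadata[(user, path)] = list(chunk_ids)

    def read_file(self, user: str, path: str) -> bytes:
        return b"".join(self.data_store[c] for c in self.metadata[(user, path)])
```

Only the small metadata update needs strong consistency; the chunk uploads themselves are independent writes of immutable data.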
