Cloud Storage & NoSQL Databases Quiz

Questions and Answers

What is a key characteristic of persistent block storage?

  • Content is lost when the VM is shut down.
  • It is not visible as block devices.
  • It cannot ensure file system integrity during a clean shutdown.
  • It is retained across VM shut-down/restart cycles. (correct)

Which type of storage is not meant for sharing across multiple VM instances?

  • Local block storage (correct)
  • Shared block storage
  • Temporary block storage
  • Persistent block storage

What is a disadvantage of using block storage for application data?

  • It allows easy partitioning of data over multiple servers.
  • It easily survives the failure of the instance.
  • It is typically local to one VM instance. (correct)
  • It is suitable for storing unstructured data.

Which of the following is a feature of NoSQL databases?

  • They can handle unstructured data. (correct)

What is a primary use of block storage in cloud computing?

  • Mounting virtual disks for Virtual Machine instances. (correct)

Which storage solution is specifically mentioned as an example of temporary block storage?

  • Amazon EC2 Instance Store (correct)

What technique is crucial for ensuring data integrity with persistent block storage?

  • Implementing a clean shutdown. (correct)

What do cloud applications mainly utilize for state management?

  • NoSQL databases and state management techniques. (correct)

What is the initial event in the example of event sourcing provided?

  • Order created but not validated (correct)

What does CQRS stand for in the discussed architecture?

  • Command/Query Responsibility Segregation (correct)

What is a common challenge when managing application state with microservices?

  • Increased complexity compared to a single database (correct)

What is a primary feature of Dropbox's client application?

  • It synchronizes changes made locally with the cloud storage. (correct)

What should be done if a query is frequently used, according to the suggested implementation?

  • Create a microservice to materialize the view (correct)

How does Dropbox handle changes notified to clients?

  • By delaying responses through HTTP calls for up to 60 seconds. (correct)

Which type of database is mentioned for its scalability and elasticity?

  • NoSQL databases (correct)

What type of storage did Dropbox primarily use until 2016?

  • Amazon EC2 and S3 object store. (correct)

What is the role of the Order Manager (OM) in the event sourcing example?

  • To validate orders and check customer credit (correct)

In Dropbox, what is the role of metadata servers?

  • They handle information regarding files like ownership and chunk lists. (correct)

What disadvantage is mentioned regarding materialized views created by a microservice?

  • They are less flexible compared to on-demand SQL queries (correct)

What is the maximum size for chunks in which files are divided in Dropbox?

  • 4 MB (correct)

Which type of databases are suitable for specialized data such as documents or graphs?

  • NoSQL databases (correct)

What type of API does Dropbox utilize for communication between clients and servers?

  • REST over HTTPS. (correct)

What unique identifier is used for each chunk in Dropbox's file storage system?

  • SHA-256 hash. (correct)

Which of the following describes the sharing capabilities of Dropbox?

  • Folders can be present in two users' spaces simultaneously. (correct)

What is the primary purpose of database sharding?

  • To enhance load management by splitting database content across machines (correct)

Which of the following statements about SQL queries in sharded databases is true?

  • Join queries typically require merging data from several tables across instances (correct)

What is a key limitation of using automatic sharding?

  • It ensures no scalability for the database (correct)

In primary/secondary replication, what must be done with write requests?

  • They must go through a single replica site (correct)

What happens when adding or removing a node in a hash-partitioned sharding plan?

  • Data redistribution across all nodes occurs (correct)

What is necessary for effective load balancing in sharding?

  • A complex and application-specific sharding strategy (correct)

Which of the following is a common challenge when scaling relational databases through sharding?

  • Maintaining strict ACID properties during cross-shard queries (correct)

What is the typical method for splitting tables in a sharded database?

  • Using a hash function based on primary keys (correct)

What is the primary reason for using two different storage systems in the discussed architecture?

  • To cater to the different consistency needs of metadata and data (correct)

What is a key feature of relational databases?

  • ACID transaction support (correct)

What does the term 'polyglot architecture' refer to in the context of databases?

  • Each microservice selecting the best database suited for its needs (correct)

What is a key requirement for metadata to ensure integrity in file operations?

  • All file chunks must be uploaded before the file is displayed (correct)

Which statement best describes the transactions required for metadata?

  • Transactions help ensure no intermediate states are visible (correct)

What is a common limitation of relational databases when used in large cloud applications?

  • Scalability and elasticity (correct)

In the context of sharding, how is data partitioned for metadata?

  • Metadata is partitioned per user identifier (correct)

What does ACID stand for in the context of database transactions?

  • Atomicity, Consistency, Isolation, Durability (correct)

What type of database is suggested to manage metadata due to its consistency requirements?

  • Traditional relational database (correct)

What is one advantage of relational databases mentioned in the content?

  • Optimized query engines for complex queries (correct)

How do Dropbox clients receive server IPs for load balancing?

  • Clients obtain a random extract of server IPs periodically (correct)

Which of the following is a characteristic that may limit relational database usage in cloud applications?

  • Challenges with scalability (correct)

Which database option is a managed service provided by Amazon?

  • Aurora (correct)

What role does sharding play in the context of metadata?

  • It allows for partitioning metadata per user on a single server (correct)

What is a possible drawback of using a layered monolith architecture?

  • Limits flexibility in database choice for microservices (correct)

What is a characteristic feature of the load balancing strategy mentioned?

  • It distributes connections in a round-robin manner (correct)

Flashcards

Horizontal Database Scaling

A technique to distribute data from a single database across multiple machines for improved performance and scalability.

Database Sharding

A method of horizontally scaling a database by splitting tables into multiple databases.

Proxy in Sharding

Software that splits incoming queries according to the data distribution and aggregates the results from the different shards.

Hash-based Sharding

Typical method for splitting data in sharding, where data is distributed across shards based on a hash function applied to the primary key.

Complexity of Queries in Sharding

The challenging aspect of sharding where most queries involve multiple shards, hindering the optimization of query execution.

Primary/Secondary Replication

A replication technique where write operations are restricted to a single primary replica, suitable for read-intensive workloads.

Fixed Sharding Plan

The limitation of sharding where resharding requires data movement and can disrupt the system.

Adding/Removing Nodes in Sharding

The problem faced in sharding when adding or removing a node, requiring data redistribution across all shards.

Event Sourcing

A pattern where the state of an application is stored as a sequence of events; the current state is obtained by applying those events in order.

Command Microservice

A microservice responsible for handling actions that modify the state of the application.

Query Microservice

A microservice responsible for retrieving data from the system, often by building materialized views.

Command/Query Responsibility Segregation (CQRS)

The separation of logic for commands (updating) from logic for queries (reading) in microservices.

Aggregate (Microservices)

A representation of the state of an entity within a microservice, typically containing data and behavior.

Event Store

A specialized database designed for storing and querying events, often used in event-driven architectures.

Event Replay

The process of applying historical events to reconstruct the current state of an entity.

State Management with Microservices

A method for managing state in microservices by storing a sequence of events that reflect changes to the application, allowing for replayability and auditing.

Load Balancing

A method of distributing network traffic across multiple servers to balance workload and prevent server overload, commonly used in cloud environments.

Sharding

A method of dividing data into smaller units (chunks/pieces) and distributing these chunks across multiple servers for efficient storage and retrieval.

Metadata Store

A type of storage used for storing metadata (information about files, like their names, sizes, and locations) to ensure consistent and reliable data access.

Data Store

A type of storage used for storing the actual data content of files, optimized for high data throughput and scalability.

Strong Consistency

Ensuring that all data changes are applied in a consistent and synchronized way, crucial for maintaining data integrity in distributed systems.

Relational Database

A database specifically designed for storing structured data in tables, commonly used for managing metadata in cloud environments.

Server IPs

The addresses of the available servers. Clients periodically obtain a random extract of this list and route their requests to those servers, which spreads the load.

Two Different Stores

The use of two different storage systems, a metadata store and a data store, to optimize for different storage needs in cloud environments.

NoSQL Database

A database that uses a different approach than SQL, often for unstructured or semi-structured data. Examples include MongoDB and Cassandra.

Service-Oriented Architecture (SOA)

A software architecture where different components or services communicate with each other through well-defined interfaces, using protocols like HTTP or messaging. Services are independent and can be developed and deployed separately.

Polyglot Architecture

A database architecture where each microservice in an SOA can choose the most appropriate database for its specific needs, leading to a mix of different database technologies within a single application.

Atomicity

A characteristic of a database that ensures the operations of a transaction are completed entirely or not at all, even in the face of failures or errors.

Consistency

A characteristic of a database that ensures data remains accurate and valid. It guarantees that any changes to the database follow predefined rules and constraints.

Isolation

A characteristic of a database that ensures transactions do not interfere with each other. Each transaction is isolated and independent.

Durability

A characteristic of a database that ensures that data is preserved even if the system crashes or experiences failures. It guarantees that completed transactions are permanently recorded.

Dropbox

A cloud storage service that provides a personal file store, synchronizes with local copies on users' devices, and offers a web interface and developer API for third-party integration.

Dropbox Sharing Capabilities

A feature that allows multiple users to access and modify the same folder, making collaboration seamless.

Dropbox Client Application

The Dropbox client application constantly monitors local folders for changes and sends any updates to the cloud. Changes are notified to clients through HTTP calls whose responses are delayed for up to 60 seconds.

Dropbox File Chunking

Dropbox breaks large files into chunks of up to 4 MB, treating each chunk as an independent object with a unique SHA-256 hash. This allows for efficient storage and transfer of data.

Dropbox Infrastructure

Dropbox uses two separate infrastructures for metadata and data storage. Metadata servers manage information about files, such as owners, chunks, and access rights. Data servers store the actual file content.

Dropbox Data Storage Evolution

Before 2016, Dropbox used Amazon EC2 and S3 object store for data storage. Later, they migrated to their own storage infrastructure. This earlier design using external cloud providers demonstrates the flexibility of cloud computing.

Dropbox API (Application Programming Interface)

The interface between Dropbox clients and servers uses REST principles over HTTPS but isn't strictly RESTful. This provides a flexible communication protocol.

Dropbox Architecture

The Dropbox architecture involves a client application, metadata servers, and data servers, all interconnected through the Dropbox API.

Block Storage

Virtual disks attached to a VM, acting like a local hard drive or SSD. The guest operating system mounts the storage using a file system.

Temporary Block Storage

Block storage that is attached to the host or available through a local SAN (Storage Area Network). Data is lost when the VM is shut down.

Persistent Block Storage

Block storage whose content is retained when the VM is shut down or restarted. File system integrity, however, requires a clean shutdown.

Scalability of Relational Databases (SQL)

Relational databases (SQL) are known for their structured data organization and efficient complex queries, but scalability and elasticity are a common limitation in large cloud applications. They can be scaled vertically or horizontally through sharding, which makes them more complex to manage.

Dropbox - A Cloud Storage Service

Dropbox is a cloud storage service that utilizes a combination of techniques, including data replication and distributed file systems, to provide users with a reliable and shared storage solution. It exemplifies the benefits of cloud storage for data synchronization and collaboration.

State Management in Microservices

State management addresses how different components of a distributed application retain and share information, ensuring that the application behaves consistently. Techniques include session management, data caching, and message queues.

Block Storage for Applications

Block storage is often used for storing OS, libraries and binaries of the application. Due to its local nature, it is not suitable for sharing application data.

Study Notes

Cloud Computing - Lesson 5: Storage and State Management

  • Announcements:
    • Feedback for the first quiz will be available after the course.
    • The second quiz is available on Moodle at 12:45 today.
    • Quiz deadline: Wednesday, Oct. 16, 10:45.
    • Review deadline: Wednesday, Oct. 23, 10:45.

Objectives

  • Present storage solutions for IaaS environments.
  • Discuss relational database (SQL) scalability and introduce NoSQL databases.
  • Describe the Dropbox service.
  • Explain state management techniques in microservices applications.

Block Storage

  • Virtual disks: Available for mounting within a VM instance, where they act like local hard drives or SSDs. The guest operating system mounts the storage using a file system.
  • Temporary/local storage: Attached to the host or accessed through a local SAN (Storage Area Network). Content is lost when the VM is shut down. Example: Amazon EC2 Instance Store.
  • Persistent block storage: Retains data across VM shut-down/restart cycles. File system integrity requires a clean shutdown. Example: Amazon Elastic Block Store (EBS).

Storage for Applications

  • Block storage: Used for OS, libraries, and application binaries/containers.
  • File systems: A file system local to a single VM is not suitable for application data. It is not resilient to instance failure, and its data is difficult to share across servers.
  • Databases: Essential for application data management across multiple servers.

Databases

  • Flavors: Databases come in many varieties, including relational (SQL) and NoSQL.
  • Layered monolith: Uses a single all-purpose database (or a few) accessed by all layers.
  • Service-Oriented Architecture (SOA): Each microservice chooses the most suitable database (polyglot architecture).

Relational Databases

  • Platform availability: Found on many cloud platforms (e.g., Heroku PostgreSQL, Amazon Aurora, RDS).
  • Developer understanding: Well understood by developers, with a standard query language (SQL).
  • ACID semantics: Support transactions with Atomicity, Consistency, Isolation, and Durability.
  • Structured data queries: Efficient at complex queries on structured data.
  • Join queries: Support the creation of join queries between related tables.
  • Optimized Query Engines: Employ highly optimized query engines for efficiency.
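
The ACID and join bullets above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, the credit column, and the place_order helper are hypothetical names chosen for this example; they do not come from the lesson.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT,
        credit INTEGER CHECK (credit >= 0)
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount INTEGER
    );
    INSERT INTO customers VALUES (1, 'Ada', 100);
""")


def place_order(customer_id, amount):
    """Insert an order and debit the customer's credit in one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                (customer_id, amount),
            )
            conn.execute(
                "UPDATE customers SET credit = credit - ? WHERE id = ?",
                (amount, customer_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed: the order insert is rolled back too.
        return False


first = place_order(1, 60)   # enough credit
second = place_order(1, 60)  # would make the credit negative

# Join query between the two related tables.
rows = conn.execute(
    "SELECT c.name, o.amount FROM orders o "
    "JOIN customers c ON c.id = o.customer_id"
).fetchall()
credit = conn.execute("SELECT credit FROM customers WHERE id = 1").fetchone()[0]
```

The second call violates the CHECK constraint, so both statements of that transaction are undone together: the order row does not survive on its own.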

Problems with Relational Databases

  • Scalability and elasticity: Cannot always adapt to large cloud application demands.
  • Example: A single-node MySQL performance graph shows diminishing returns with increasing thread counts, highlighting the limitations in scalability within basic relational databases.

Scaling relational databases

  • Vertical scaling: Increasing the resources of a single server. This is often expensive and does not scale reliably.
  • Example: A large enterprise database scaled vertically, such as an Oracle database running on an IBM mainframe, is a less efficient approach to scaling.

Scalability Cube

  • X-axis: Horizontal duplication (scaling by cloning).
  • Y-axis: Functional decomposition (scaling by splitting different things).
  • Z-axis: Data partitioning (scaling by splitting similar things).
  • Now: Microservices & horizontal scalability are preferred, as shown in the scalability cube.

Horizontally scaling a relational database

  • Database sharding: Split tables across multiple database instances.
  • Proxy: A proxy receives queries and distributes them to shards, aggregating the results.
  • Examples: PL/Proxy for PostgreSQL, MySQL Cluster.

Sharding Limits

  • Primary key splits: Tables are commonly split based on their primary key, using a hash function for balanced distribution.
  • Complex queries: Often involve all shards, hindering scalability.
  • Automatic sharding: Sharding applied without regard to the application's workload scales poorly.
  • Specific sharding: Designing a good application-dependent sharding strategy is complex and workload-dependent.
  • Load balancing: Ensuring consistent load balancing is vital in sharded systems.
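
As a rough illustration of primary-key hashing and of why adding a node is costly, here is a minimal sketch. It assumes simple modulo placement (hash of the key modulo the number of shards); the function names and the user-N key format are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a primary key to a shard index with a stable hash and a modulo."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)  # going from 4 to 5 shards
```

With modulo placement, a key keeps its shard only when its hash gives the same remainder modulo 4 and modulo 5, so the large majority of keys has to move when a fifth shard is added. This is the redistribution problem noted in the flashcards.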

Primary/Secondary Replication

  • Write requests are processed by a single primary instance; reads can be processed by multiple secondary instances. This design is only appropriate for read-dominated use cases.
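
The routing rule above can be sketched as follows. This is a toy in-memory model, not a real database driver: the Replica and ReplicatedStore classes are hypothetical, and replication is done synchronously for simplicity, whereas real systems often replicate asynchronously.

```python
from itertools import cycle


class Replica:
    """A database instance, reduced to a name and a dictionary."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedStore:
    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        self._next_reader = cycle(secondaries)  # round-robin over secondaries

    def write(self, key, value):
        # Every write goes through the single primary...
        self.primary.data[key] = value
        # ...which then propagates it to the secondaries.
        for replica in self.secondaries:
            replica.data[key] = value

    def read(self, key):
        # Reads are spread over the secondaries.
        replica = next(self._next_reader)
        return replica.name, replica.data.get(key)


store = ReplicatedStore(Replica("primary"), [Replica("s1"), Replica("s2")])
store.write("order-1", "validated")
served_by = [store.read("order-1") for _ in range(3)]
```

Adding secondaries raises read capacity only; the single primary remains the limit for writes, which is why the design suits read-dominated workloads.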

NoSQL Databases

  • "Not Only SQL": A family of databases with different flavors.
  • Focus: Horizontal scaling and elasticity, a simpler query model than SQL, usually without a schema.
  • Usage: In large cloud web applications.
  • Flavors: Key/Value, Document-oriented, Graph-oriented, Column stores.

Key/Value Stores

  • Hash Map: A global hash map enabling fast key lookups (put(key,value), value = get(key)).
  • Horizontal scaling: Designed to scale horizontally by splitting key ranges among separate servers.
  • Consistent hashing: Used for distributing keys efficiently among servers.
  • Examples: Apache Cassandra.

NoSQL: Adding Servers

  • The range of keys assigned to each server is re-distributed to accommodate new servers in a NoSQL key/value store. This permits efficient continuous scaling.
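
A minimal sketch of consistent hashing shows why adding a server to a key/value store moves only part of the keys. The HashRing class, the use of 100 virtual nodes per server, and the server and key names are assumptions made for this example; they are not taken from any particular store.

```python
import bisect
import hashlib


def _position(value: str) -> int:
    """Place a string on the ring using the first 8 bytes of its SHA-256."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owners = {}     # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        # Each server owns several points on the ring (virtual nodes).
        for i in range(self.vnodes):
            pos = _position(f"{server}#{i}")
            bisect.insort(self._positions, pos)
            self._owners[pos] = server

    def lookup(self, key: str) -> str:
        # A key belongs to the first server point after it, wrapping around.
        idx = bisect.bisect(self._positions, _position(key)) % len(self._positions)
        return self._owners[self._positions[idx]]


ring = HashRing(["s1", "s2", "s3", "s4"])
keys = [f"user-{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("s5")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key that changes owner moves to the new server, and the other keys stay where they were. In expectation the new server takes over about one fifth of the keys in this setup.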

Apache Cassandra

  • A NoSQL key/value store that prioritizes throughput and linear scalability, commonly deployed with thousands to hundreds of thousands of servers (e.g. at Apple).
  • Stores data as BLOBs and supports huge deployments.

Object Stores

  • Immutable data: Data created only once; no updates allowed.
  • Versioning: May employ a version number for identifying different states of the object.
  • URI-based access: Clients access data through a unique Uniform Resource Identifier (URI).
  • Managed service: Typically offered as a managed service by cloud providers (e.g., S3, Azure Blob).
  • Caching and deployment: Enabled by Content Delivery Networks (CDNs).
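
The immutability, versioning, and URI bullets above can be sketched with a toy in-memory object store. The ObjectStore class, the version query parameter, and the objects.example.com base URI are all hypothetical; this does not reproduce the API of S3 or Azure Blob Storage.

```python
class ObjectStore:
    """Toy object store: stored payloads are never modified in place."""

    def __init__(self, base_uri: str):
        self.base_uri = base_uri.rstrip("/")
        self._versions = {}  # key -> list of immutable payloads

    def put(self, key: str, data: bytes) -> str:
        """Store a new version of the object and return its URI."""
        versions = self._versions.setdefault(key, [])
        versions.append(bytes(data))  # append only, never overwrite
        return f"{self.base_uri}/{key}?version={len(versions)}"

    def get(self, key: str, version=None) -> bytes:
        """Return the latest version, or a specific one (numbered from 1)."""
        versions = self._versions[key]
        return versions[-1] if version is None else versions[version - 1]


store = ObjectStore("https://objects.example.com/photos/")
uri_v1 = store.put("cat.png", b"first upload")
uri_v2 = store.put("cat.png", b"second upload")
latest = store.get("cat.png")
original = store.get("cat.png", version=1)
```

Because a stored version never changes, a URI that names a version can be cached freely, for instance by a CDN.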

Example of Amazon S3

  • Buckets: Objects are stored inside buckets.
  • Versioning: Allows multiple past versions of the object.
  • Encrypted: Uses automatic encryption and access control mechanisms.

Creating a Bucket

  • Name and Region: Define the bucket's name and location.
  • Properties and Permissions: Configure bucket-level settings and access privileges.

Uploading Content

  • File Upload: The process of placing files into relevant S3 buckets.

Setting Public Access

  • Users: Set which users have permissions to read or write to the specified objects.

Using URI from Client

  • Clients access an S3 bucket through a unique Uniform Resource Identifier (URI).

Usage of object storage in project

  • Azure Blob Storage: Cloud storage service, similar to Amazon S3.
  • Public vs. Local Images: Returning public URIs instead of local images promotes distributed caching.
  • REST calls: Use REST calls (PUT) for uploading images to be manipulated by an Azure function.

Document-Oriented NoSQL

  • Structured documents: Store structured documents typically as JSON or YAML instead of BLOBs.
  • Indexing: Allow indexing on internal fields/values.
  • Search capabilities: Support search capabilities.
  • Collections and hierarchies: Support for storing data in collections and hierarchies.
  • MongoDB: The most widely used NoSQL database; its capabilities are demonstrated in a course tutorial.
  • Apache CouchDB: A NoSQL database system suitable for microservice scenarios.
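
To make the indexing bullet concrete, here is a minimal sketch of a schemaless collection with an index on an internal field. The Collection class, the dotted-path notation, and the sample documents are hypothetical; this is not the MongoDB or CouchDB API.

```python
from collections import defaultdict


def get_path(doc: dict, dotted_path: str):
    """Read a nested field such as 'address.city'; None if it is absent."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


class Collection:
    """Documents are plain dicts with no fixed schema."""

    def __init__(self, indexed_field: str):
        self.indexed_field = indexed_field
        self._docs = {}
        self._index = defaultdict(set)  # field value -> document ids
        self._next_id = 1

    def insert(self, doc: dict) -> int:
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = doc
        value = get_path(doc, self.indexed_field)
        if value is not None:
            self._index[value].add(doc_id)
        return doc_id

    def find(self, value):
        """Look up documents through the index instead of scanning them all."""
        return [self._docs[i] for i in sorted(self._index.get(value, ()))]


users = Collection(indexed_field="address.city")
users.insert({"name": "Ada", "address": {"city": "Paris"}})
users.insert({"name": "Bob"})  # no address at all: still accepted
users.insert({"name": "Eve", "address": {"city": "Paris", "zip": "75001"}})
in_paris = [doc["name"] for doc in users.find("Paris")]
```

A key/value store that treats values as BLOBs could not answer this lookup without reading every value; storing structured documents is what makes the index on an inner field possible.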

Graph-oriented NoSQL Databases

  • Store relations: Graph-oriented databases store relations between keys, as a navigable graph.
  • Examples: Neo4j, used for instance in social networks.

Column Stores

  • Full column: Stores a complete column of values for an attribute.
  • Aggregation and queries: Efficient in handling aggregation and search queries.
  • Joining: Join operations are complex and implemented on the client side.
  • Examples: Cloud-based solutions such as Apache HBase or Google Bigtable.
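
The difference between a row layout and a column layout can be sketched in a few lines. The sample records and field names are made up for this example.

```python
# Row layout: one record per entry.
rows = [
    {"id": 1, "country": "BE", "amount": 30},
    {"id": 2, "country": "FR", "amount": 50},
    {"id": 3, "country": "BE", "amount": 20},
]

# Column layout: one complete list of values per attribute.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregation only has to read the columns it needs.
total_amount = sum(columns["amount"])

totals_by_country = {}
for country, amount in zip(columns["country"], columns["amount"]):
    totals_by_country[country] = totals_by_country.get(country, 0) + amount
```

The sum touches only the amount column and never reads the id values, which is the property that makes column stores efficient for aggregation queries.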

Conclusion

  • Cloud applications depend on database choices: relational, NoSQL, or specialized databases for data such as graphs or documents.
  • Implementing microservices complicates state management compared to a single database.
  • Event sourcing offers an approach for handling database consistency and interactions in complex microservice deployments.

CQRS (Command Query Responsibility Segregation)

  • Frequent queries: Microservices often need to query data held by other services.
  • Event sourcing implementation: Answering such queries directly from the events raises efficiency issues.
  • Separate logic: Separate the logic for commands (updates) from the logic for queries (reads) to improve efficiency.
  • Queries as views: Implement queries by creating materialized views or by subscribing to events.
  • Long-term queries: Plan ahead for frequent queries by creating a microservice that materializes the view; this is less flexible than on-demand SQL queries.
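
The separation between commands and queries can be sketched with an in-memory event store. The scenario follows the order example from the quiz (an order is created, then validated against the customer's credit); the class names, event names, and credit table are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    order_id: str
    kind: str  # "OrderCreated", "OrderValidated" or "OrderRejected"


class EventStore:
    """Append-only log of events with simple publish/subscribe."""

    def __init__(self):
        self._events = []
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def append(self, event: Event):
        self._events.append(event)
        for handler in self._subscribers:
            handler(event)

    def events_for(self, order_id: str):
        return [e for e in self._events if e.order_id == order_id]


class OrderCommands:
    """Command side: changes state only by appending events."""

    def __init__(self, store: EventStore, credit: dict):
        self.store = store
        self.credit = credit  # hypothetical customer credit table

    def create_order(self, order_id: str):
        self.store.append(Event(order_id, "OrderCreated"))

    def validate_order(self, order_id: str, customer: str, amount: int):
        ok = self.credit.get(customer, 0) >= amount
        self.store.append(Event(order_id, "OrderValidated" if ok else "OrderRejected"))


class OrderStatusView:
    """Query side: a materialized view kept current by subscribing to events."""

    STATUS = {
        "OrderCreated": "pending",
        "OrderValidated": "validated",
        "OrderRejected": "rejected",
    }

    def __init__(self, store: EventStore):
        self.status = {}
        store.subscribe(self.apply)

    def apply(self, event: Event):
        self.status[event.order_id] = self.STATUS[event.kind]


store = EventStore()
view = OrderStatusView(store)
commands = OrderCommands(store, credit={"ada": 100})

commands.create_order("o1")
status_after_create = view.status["o1"]
commands.validate_order("o1", "ada", 60)
commands.create_order("o2")
commands.validate_order("o2", "ada", 500)
history = [e.kind for e in store.events_for("o1")]
```

Reading an order's status is a dictionary lookup in the view rather than a replay of the event log, at the price of having decided in advance which view to maintain.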

Dropbox Case Study

  • Cloud storage: Provides a private file store, accessible via Web interface, users' devices, and a developer API for third-party applications.
  • Sharing: The same folder can appear in several users' spaces simultaneously.
  • Abstraction: Abstracted filesystem (files, directories, links, spaces).
  • Client-side implementation: Changes to local folders are synchronized to the cloud.
  • Chunked files: Large files are split into chunks of up to 4 MB, each identified by its SHA-256 hash; differences in chunk contents are sent as compressed binary diffs.
  • Metadata servers: The metadata (e.g., who owns the files) is stored on servers within the Dropbox private cloud. It is kept in a relational database and plays a role similar to inodes, ensuring integrity and consistency.
  • Data storage: Until 2016, Amazon EC2 and the S3 object store were used to store, manage, share, and retrieve large files. These servers store encrypted files.
  • API: REST over HTTPS, but not RESTful.
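
The chunking scheme can be sketched as follows, using the 4 MB maximum chunk size given in the quiz and flashcards. The function names are hypothetical, and the sketch simply re-uploads any chunk whose hash is new; it does not implement the compressed binary diffs mentioned above.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB maximum chunk size


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Cut the file content into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_ids(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Identify each chunk by the SHA-256 hash of its content."""
    return [
        hashlib.sha256(chunk).hexdigest()
        for chunk in split_into_chunks(data, chunk_size)
    ]


def chunks_to_upload(old_ids, new_ids):
    """Chunks of the new version that the servers do not already hold."""
    known = set(old_ids)
    return [cid for cid in new_ids if cid not in known]


# Tiny chunk size so that the example stays readable.
old_version = b"a" * 10 + b"b" * 10
new_version = b"a" * 10 + b"c" * 10
old_ids = chunk_ids(old_version, chunk_size=10)
new_ids = chunk_ids(new_version, chunk_size=10)
pending = chunks_to_upload(old_ids, new_ids)
```

Only the second chunk changed, so only one chunk identifier is pending. The list of chunk identifiers per file is the kind of information the metadata servers keep, while the chunk contents go to the data store.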

Dropbox Architecture

  • Storage servers: Store encrypted files based on file identification.
  • Processing Servers: Encryption and application services.
  • Metadata servers: Store information about the files (metadata service, database).
  • Notification servers: Store notifications about files.
  • Multiple devices: The client devices of end-users accessing Dropbox.

Dropbox Interaction

  • Client-side: Clients register on Dropbox and create or change files and folders within their accounts. They also handle communication with Dropbox servers over HTTPS.
  • Server-side: Servers organize the Dropbox services, and manage and handle connections with clients.
  • Storage services: Manage the data used by the files.

Reason for using two storage systems (Dropbox)

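  • Metadata consistency and speed: Metadata is critical and must be kept strictly consistent, which calls for transactions so that no intermediate state is visible (for example, all chunks of a file must be uploaded before the file is displayed). Maintaining the relationships among files and folders also speeds up file lookups.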
  • Data storage efficiency: The data (the file itself) can be cheaply stored and scaled.
