NoSQL Databases PDF
Summary
This document provides an overview of NoSQL databases, including their different types, key features, advantages, use cases, and examples. It details key-value, document, column-family, and graph databases, covering concepts such as data models, schema flexibility, querying, and scalability.
NoSQL 2

NoSQL Overview

| Database Type | Data Model | Key Features | Advantages | Use Cases | Examples |
|---|---|---|---|---|---|
| Key-Value | Key-value pairs | Simple structure, each value is identified by a unique key; highly efficient for simple lookups | Extremely fast, scalable for high-throughput applications, supports in-memory caching | Caching, session management, shopping cart data | Redis, Amazon DynamoDB |
| Document | JSON, BSON, or XML documents | Stores data as documents with flexible schema; supports nested structures | Flexible for semi-structured data, allows schema-less design, supports hierarchical relationships | Content management, user profiles, e-commerce | MongoDB, Couchbase |
| Column-Family | Rows and columns grouped into families | Column-oriented storage; supports dynamic addition/removal of columns | Optimized for write-heavy applications, scalable across distributed systems, supports large datasets | Time-series data, real-time analytics, IoT data | Apache Cassandra, HBase |
| Graph | Nodes, edges, and properties | Represents relationships as first-class citizens; optimized for relationship queries | Ideal for complex relationships, efficient traversals, supports graph-specific algorithms | Social networks, recommendation systems, fraud detection | Neo4j, Amazon Neptune |
| Hybrid | Multi-model (combining various types) | Combines multiple NoSQL paradigms, supports multi-model data handling, unified query interface | Flexible for evolving applications, enables data integration across different formats | Complex, evolving applications | N/A |

Types of NoSQL Databases:
- Key-Value
- Column Family
- Graph
- Document

Key-Value:
- Stores data as a collection of key-value pairs.
- Each data item (value) is associated with a unique key, which is used to retrieve the data when needed.
- Highly efficient for simple read and write operations; very fast access to data.
- Its simplicity and efficiency make it suitable for a wide range of uses, especially high-speed data retrieval and low-latency access.
- No schema;
  data is entirely opaque to the database.

Usage:
- Caching
- Session management
- Shopping cart data

Advantages:
- Extremely fast for simple lookups
- Scalable and efficient for high-throughput apps

Characteristics of Key-Value (FPSS):
- Simple Data Model (S): a straightforward data model where each data item is stored with a unique key. No schema or structure is imposed on the data, which allows flexibility in storing different data types.
- High Performance (P): optimised for fast read and write operations; achieves high performance using in-memory caching, asynchronous replication, and other optimisation techniques.
- Scalability (S): designed to scale horizontally, so it can handle large volumes of data and high traffic loads; data can be distributed across multiple nodes in a cluster to support growing workloads.
- Flexible Data Types (F): can store various types of data, such as strings, integers, and binary data.

Use cases: commonly used in apps that require fast and efficient data access, such as caching, session management, user profiles, real-time analytics, and distributed systems.

Example:
- Redis: a high-performance, in-memory data store that supports various data types and features such as persistence, replication, and pub/sub messaging.
- Amazon DynamoDB: a fully managed NoSQL database provided by AWS, offering seamless scalability, low latency, and flexible data models.

Key: "user123"
Value: {"name":"Jane","age":28,"preferences":["sport","movies"]}

Document:
- Stores data in a semi-structured format known as documents.
- Each document is a self-contained unit of data that can contain key-value pairs, arrays, or nested documents.
- Does not require a fixed schema, allowing greater flexibility in data representation.
- Offers flexibility, scalability, and performance; a popular choice for modern applications that require dynamic and evolving data models.
- Schema-less or dynamic schema: different documents in the same collection can have different structures.

Advantages:
- Highly flexible for semi-structured data
- Supports hierarchical relationships
- Easy to
  scale horizontally

Example:
- MongoDB: a widely used document database that stores data in BSON format and provides powerful querying capabilities, indexing, and horizontal scalability.
- Couchbase: a distributed document database that combines the flexibility of JSON documents with the scalability and performance of a key-value store.

Characteristics of Documents:
- Flexible Schema: each document can have its own structure, meaning that different documents in the same collection can have different fields and data types. This makes it easier to store and query data with evolving or unpredictable schemas.
- JSON/BSON Format: documents are usually stored in JSON (JavaScript Object Notation), which is human-readable, or in BSON (Binary JSON), a binary encoding of JSON-like documents.
- Hierarchical Data: supports nested data structures, which allows complex relationships to be represented within a single document.
- Querying: provides a query language that allows flexible querying based on the content of documents: searching specific fields, filtering documents, and performing complex aggregation and manipulation of data.
- Scalability: designed to scale horizontally; handles large volumes of data and traffic across multiple nodes in a cluster.

Column Family:
- Data is stored in rows and columns, but unlike relational databases, the columns in a row can vary. A column family groups related columns together.
- Designed to handle large volumes of data and support distributed architectures for scalability and fault tolerance.
- Column-family databases offer a unique approach to data storage and querying, making them well-suited for applications with specific requirements around scalability, performance, and flexibility.
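The column-family layout described above (rows whose columns can vary, with related columns grouped into families) can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions: the class `ColumnFamilyTable`, its methods, and the sensor data are hypothetical names invented here, not the API of Cassandra, HBase, or any other real database.

```python
class ColumnFamilyTable:
    """Toy model: row key -> column family -> column name -> value."""

    def __init__(self, families):
        # Column families are declared up front; columns inside them are not.
        self.families = set(families)
        self.rows = {}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {})
        row.setdefault(family, {})[column] = value

    def get_family(self, row_key, family):
        # Return a copy so callers cannot change stored data by accident.
        return dict(self.rows.get(row_key, {}).get(family, {}))

    def columns(self, row_key):
        # List the (family, column) pairs that this particular row holds.
        row = self.rows.get(row_key, {})
        return sorted((fam, col) for fam, cols in row.items() for col in cols)


# Hypothetical IoT data: each row carries a different set of columns.
table = ColumnFamilyTable(families=["profile", "readings"])
table.put("sensor-1", "profile", "location", "roof")
table.put("sensor-1", "readings", "2024-01-01T00:00", 21.5)
table.put("sensor-1", "readings", "2024-01-01T00:05", 21.7)
table.put("sensor-2", "profile", "location", "lab")
table.put("sensor-2", "profile", "owner", "ops")

print(table.columns("sensor-1"))
print(table.columns("sensor-2"))
```

Note that `sensor-2` has an `owner` column that `sensor-1` lacks, and no readings at all; adding a column is just another `put`, which is the kind of flexibility the notes attribute to this model.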
Use Cases:
- Time-series data
- Real-time applications
- IoT sensor data

Advantages:
- Optimised for write-heavy applications
- Supports large-scale distributed systems

Example:
- Apache Cassandra: known for its scalability, high availability, and fault tolerance; handles large datasets and supports real-time workloads with low-latency data access.
- Apache HBase: built on top of the Hadoop Distributed File System (HDFS); designed for random, real-time read/write access to large datasets.

Characteristics of Column Family:
- Column-oriented storage: data is stored by column rather than by row. Each row can have a different set of columns, and each column can have a different data type.
- Scalability: designed to scale horizontally across multiple nodes in a cluster, handling large datasets and traffic.
- Partitioning: data is partitioned based on a partition key or shard key. This allows the database to distribute data evenly across nodes and enables parallel processing of queries.
- Schema Flexibility: offers flexible schemas, allowing dynamic addition and removal of columns without schema changes.
- Querying: provides a query language that allows efficient sorting based on column values.

Graph:
- Designed to represent and store data in the form of graphs: nodes, edges, and properties.
- Nodes represent entities and edges represent the connections between them.
- Optimised for querying and traversing these relationships; suitable for interconnected data.

Advantages:
- Designed for complex queries involving relationships
- Efficient for traversing data

Use Cases:
- Social networks
- Recommendation systems
- Fraud detection

Characteristics of Graph:
- Graph Data Model: used to represent a network of nodes and edges. Both nodes and edges can have properties.
- Relationships as First-Class Citizens: relationships are treated as first-class citizens, meaning they are an integral part of the data model. This allows efficient traversal of relationships and querying of connected data.
- Index-Free Adjacency: allows efficient navigation of the graph structure; each node stores direct pointers to its adjacent nodes, eliminating the need for costly index lookups and enabling fast traversal of relationships.
- Querying: provides query languages that allow expressive and efficient querying of graph data.
- Schema Flexibility: offers a flexible schema, allowing dynamic addition and modification of node and edge types without requiring a predefined schema. This makes it easier to adapt to changing data requirements and evolving data models.

Hybrid NoSQL:

Choosing the right NoSQL database for your project involves considering various factors, such as the nature of your data, your application requirements, scalability needs, consistency requirements, and operational considerations.

Pick the right tool for the right job; there is no perfect answer.
- Understand Your Data
- Define Your Application Requirements
- Evaluate NoSQL Database Types
- Consider Consistency Models
- Assess Scalability and Performance
- Examine Operational Considerations
- Evaluate Ecosystem and Integration
- Perform Proof of Concept (POC)
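When evaluating the database types, it can help to see one small dataset in each shape. The sketch below uses plain Python structures to mimic the key-value, document, and graph models from these notes. It reuses the `user123` example from the key-value section; everything else (`user456`, `user789`, the `_id` field, the `FOLLOWS` edge type, and the helper functions) is a hypothetical assumption for illustration and not the API of any real database.

```python
import json

# Key-value: the store sees only an opaque blob behind each key,
# so the only supported operation is lookup by key.
kv_store = {
    "user123": json.dumps(
        {"name": "Jane", "age": 28, "preferences": ["sport", "movies"]}
    )
}

def kv_get(key):
    raw = kv_store.get(key)
    return None if raw is None else json.loads(raw)

# Document: the store understands fields, so it can filter on content,
# and documents in one collection may have different structures.
documents = [
    {"_id": "user123", "name": "Jane", "age": 28,
     "preferences": ["sport", "movies"]},
    {"_id": "user456", "name": "Omar", "address": {"city": "Leeds"}},
]

def find(collection, field, value):
    return [doc for doc in collection if doc.get(field) == value]

# Graph: each node keeps direct references to its neighbours
# (adjacency lists), so traversal follows pointers, not index lookups.
graph = {
    "user123": {"FOLLOWS": ["user456"]},
    "user456": {"FOLLOWS": ["user789"]},
    "user789": {"FOLLOWS": []},
}

def reachable(graph, start, edge_type):
    """Return every node reachable from start along edges of edge_type."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for neighbour in graph.get(node, {}).get(edge_type, []):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen

print(kv_get("user123")["name"])
print([doc["_id"] for doc in find(documents, "name", "Omar")])
print(sorted(reachable(graph, "user123", "FOLLOWS")))
```

The trade-off is visible in what each helper can do: `kv_get` can only fetch by key, `find` can filter on document content, and `reachable` answers a relationship question that would need repeated lookups in the other two models.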