NoSQL Database, SQL vs. NoSQL (PDF)
Uploaded by DecisiveGreatWallOfChina1467
Summary
This document provides a comparison of NoSQL and SQL databases. It discusses different types of NoSQL databases, including key-value, document, wide column, and graph stores, highlighting their characteristics and use cases. It also compares them to traditional SQL relational databases.
13 NoSQL Database, SQL vs. NoSQL (GitHub System Design Primer)

## NoSQL

NoSQL is a collection of data items represented in a **key-value store**, **document store**, **wide column store**, or a **graph database**. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores lack true ACID transactions and favor eventual consistency.

**BASE** is often used to describe the properties of NoSQL databases. In comparison with the CAP Theorem, BASE chooses availability over consistency.

- **Basically available** - the system guarantees availability.
- **Soft state** - the state of the system may change over time, even without input.
- **Eventual consistency** - the system will become consistent over a period of time, given that the system doesn't receive input during that period.

In addition to choosing between SQL or NoSQL, it is helpful to understand which type of NoSQL database best fits your use case(s). We'll review **key-value stores**, **document stores**, **wide column stores**, and **graph databases** in the next section.

### Key-value store

Abstraction: hash table

A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in lexicographic order, allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.

Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.

A key-value store is the basis for more complex systems such as a document store, and in some cases, a graph database.
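To make the hash-table abstraction concrete, here is a minimal in-memory sketch, not the design of any real product. It assumes a toy class named `SortedKeyValueStore` (a hypothetical name) that pairs a Python `dict` for O(1) point reads and writes with a sorted key list for lexicographic range retrieval, and stores optional metadata next to each value.

```python
import bisect


class SortedKeyValueStore:
    """Toy key-value store: hash table for lookups, sorted keys for ranges."""

    def __init__(self):
        self._data = {}   # key -> (value, metadata)
        self._keys = []   # all keys, kept in lexicographic order

    def put(self, key, value, metadata=None):
        if key not in self._data:
            # Keeping the list sorted costs O(n) here; real stores use
            # structures such as trees or skip lists instead.
            bisect.insort(self._keys, key)
        self._data[key] = (value, metadata or {})

    def get(self, key):
        # O(1) average-case lookup through the hash table.
        return self._data[key][0]

    def metadata(self, key):
        return self._data[key][1]

    def range(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._data[key][0]


store = SortedKeyValueStore()
store.put("user:1001", "Alice", {"ttl": 60})
store.put("user:1003", "Carol")
store.put("user:1002", "Bob")
store.put("session:9", "xyz")

print(store.get("user:1002"))                 # Bob
print(list(store.range("user:", "user;")))    # the three user:* entries, in key order
```

Anything beyond `get`, `put`, and `range` (filtering by value, joins, aggregation) would have to be written by the caller, which is the shift of complexity to the application layer described above.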
Source(s) and further reading: key-value store

- Key-value database
- Disadvantages of key-value stores
- Redis architecture
- Memcached architecture

### Document store

Abstraction: key-value store with documents stored as values

A document store is centered around documents (XML, JSON, binary, etc.), where a document stores all information for a given object. Document stores provide APIs or a query language to query based on the internal structure of the document itself. *Note, many key-value stores include features for working with a value's metadata, blurring the lines between these two storage types.*

Based on the underlying implementation, documents are organized by collections, tags, metadata, or directories. Although documents can be organized or grouped together, documents may have fields that are completely different from each other.

Some document stores like MongoDB and CouchDB also provide a SQL-like language to perform complex queries. DynamoDB supports both key-values and documents.

Document stores provide high flexibility and are often used for working with occasionally changing data.

Source(s) and further reading: document store

- Document-oriented database
- MongoDB architecture
- CouchDB architecture
- Elasticsearch architecture

### Wide column store

Diagram (*Source: SQL & NoSQL, a brief history*)

This diagram illustrates the structure of a **wide column store** database model, commonly used in NoSQL databases (like Cassandra) for handling large volumes of structured data. In a wide column store, data is organized into rows, columns, and families of columns, enabling efficient access and storage of related data.

Components:

1. **Row Key:**
   - Each row in the database is identified by a unique **row key**. In this example, the row key is labeled as **1**.
   - The row key acts as a unique identifier for the data in this row, allowing fast lookup and retrieval.
2. **Super Column Family:**
   - The **super column family** is a high-level grouping of related data.
   - In this example, the super column family is labeled as **companies**.
   - This super column family contains multiple **column families** that are associated with each other by context.
3. **Column Family:**
   - A **column family** is a group of columns that are related and stored together.
   - In this example, there are two column families within the "companies" super column family:
     - **Address**: Contains columns that store location data (city, state, street).
     - **Website**: Contains columns that store website-related information (subdomain, domain, protocol).
4. **Columns:**
   - Each **column** within a column family holds a specific piece of data.
   - In the "Address" column family:
     - **City**: San Francisco
     - **State**: California
     - **Street**: Kearny St.
   - In the "Website" column family:
     - **Subdomain**: www
     - **Domain**: grio.com
     - **Protocol**: http

Hierarchical Structure:

The data is structured in a nested format:

- **Super Column Family** (companies)
  - **Column Families** (address, website)
    - **Columns** (city, state, street, subdomain, domain, protocol).

This structure allows for efficient grouping of related data, enabling fast access patterns based on the hierarchy of keys and families.

Key Benefits of a Wide Column Store:

- **Flexible Schema**: New columns can be added dynamically without altering the existing structure, making it ideal for applications with evolving data needs.
- **Efficient Data Retrieval**: Data is grouped by related context, allowing for fast access to specific data sets without scanning unrelated information.
- **Optimized for Write and Read Performance**: Data retrieval and storage are optimized for large-scale applications with high read and write demands.
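The hierarchy in the diagram can be sketched as a nested map. The example below is only an illustration built on plain Python dictionaries, not how Cassandra, HBase, or Bigtable store data. The helper names `make_store`, `put`, and `get` are hypothetical, the nesting order (super column family, then row key, then column family, then column) is an assumption read off the diagram, and each value carries a timestamp so that the newest write wins.

```python
import time
from collections import defaultdict


def make_store():
    # super column family -> row key -> column family -> column -> (value, timestamp)
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def put(store, super_cf, row_key, cf, column, value, ts=None):
    """Write a column; an older timestamp never overwrites a newer one."""
    ts = time.time() if ts is None else ts
    columns = store[super_cf][row_key][cf]
    current = columns.get(column)
    if current is None or ts >= current[1]:
        columns[column] = (value, ts)


def get(store, super_cf, row_key, cf, column):
    """Read one column value by walking the hierarchy of keys."""
    return store[super_cf][row_key][cf][column][0]


store = make_store()
put(store, "companies", "1", "address", "city", "San Francisco")
put(store, "companies", "1", "address", "state", "California")
put(store, "companies", "1", "address", "street", "Kearny St.")
put(store, "companies", "1", "website", "subdomain", "www")
put(store, "companies", "1", "website", "domain", "grio.com")
put(store, "companies", "1", "website", "protocol", "http")

# A new column can be added to one row without touching any other row.
put(store, "companies", "1", "address", "zip", "94108")  # illustrative value, not from the diagram

print(get(store, "companies", "1", "website", "domain"))   # grio.com
print(sorted(store["companies"]["1"]["address"]))          # column names in the address family
```

Because every level is just a map, adding a column is a single write and reading one column family never touches the others, which mirrors the flexible-schema and grouped-retrieval benefits listed above.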
This **wide column store** model is particularly effective for use cases involving large volumes of structured data, where related information can be grouped for fast access, and the schema needs to adapt over time.

Abstraction: nested map `ColumnFamily`

A wide column store's basic unit of data is a column (name/value pair). Columns can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.

Google introduced Bigtable as the first wide column store, which influenced the open-source HBase, often used in the Hadoop ecosystem, and Cassandra from Facebook. Stores such as Bigtable, HBase, and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges.

Wide column stores offer high availability and high scalability. They are often used for very large data sets.

Source(s) and further reading: wide column store

- SQL & NoSQL, a brief history
- Bigtable architecture
- HBase architecture
- Cassandra architecture

### Graph database

Diagram (*Source: Graph database*)

This diagram represents a **graph database** structure, where data is organized as **nodes** and **relationships** (edges) between nodes. Each node and relationship can have properties, making this structure ideal for representing connected data.

Nodes:

- **Node 1:**
  - **Id**: 1
  - **Name**: Alice
  - **Age**: 18
- **Node 2:**
  - **Id**: 2
  - **Name**: Bob
  - **Age**: 22
- **Node 3:**
  - **Id**: 3
  - **Type**: Group
  - **Name**: Chess

Each node represents an entity (person or group) and has a unique identifier (`Id`) along with other attributes (`Name`, `Age`, or `Type`).

Relationships (Edges):

1. **Relationship from Alice to Bob:**
   - **Id**: 100
   - **Label**: knows
   - **Since**: 2001/10/03
   - **Direction**: Directed from Alice to Bob
2. **Relationship from Bob to Alice:**
   - **Id**: 101
   - **Label**: knows
   - **Since**: 2001/10/04
   - **Direction**: Directed from Bob to Alice
3. **Relationship from Alice to Chess Group:**
   - **Id**: 102
   - **Label**: is_member
   - **Since**: 2005/07/01
   - **Direction**: Directed from Alice to the Chess Group
4. **Relationship from Chess Group to Alice:**
   - **Id**: 103
   - **Label**: Members
   - **Direction**: Directed from the Chess Group to Alice
5. **Relationship from Chess Group to Bob:**
   - **Id**: 104
   - **Label**: Members
   - **Direction**: Directed from the Chess Group to Bob
6. **Relationship from Bob to Chess Group:**
   - **Id**: 105
   - **Label**: is_member
   - **Since**: 2011/02/14
   - **Direction**: Directed from Bob to the Chess Group

Structure Explanation:

- **Bidirectional Friendship**: The "knows" relationships (**Id 100** and **Id 101**) indicate a mutual connection between Alice and Bob, signifying a friendship where each person knows the other.
- **Membership in Group**: The **Chess Group** (Node 3) has **"Members"** relationships with both Alice and Bob, showing that they are members of this group. Additionally, Alice and Bob each have **"is_member"** relationships with the Chess Group, specifying when they joined (2005/07/01 for Alice and 2011/02/14 for Bob).

Summary:

In this **graph database** model:

- **Nodes** represent entities, such as people (Alice, Bob) or groups (Chess Group).
- **Relationships (edges)** define the connections between nodes, such as friendships and group memberships, and may include additional properties like dates.

This structure efficiently handles complex relationships and interconnected data, ideal for applications needing to query relationships and paths between entities.
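The nodes and edges described above can be written down as a small property graph. This is a minimal in-memory sketch under stated assumptions, not the storage model or API of Neo4j or any other graph database. The `Node`, `Edge`, and `Graph` classes and the `neighbors` method are hypothetical names chosen for illustration, and the data is copied from the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    properties: dict


@dataclass
class Edge:
    id: int
    label: str
    source: int   # id of the node the edge starts from
    target: int   # id of the node the edge points to
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}      # node id -> Node
        self.edges = {}      # edge id -> Edge
        self._out = {}       # node id -> ids of outgoing edges

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)
        self._out.setdefault(node_id, [])

    def add_edge(self, edge_id, label, source, target, **properties):
        self.edges[edge_id] = Edge(edge_id, label, source, target, properties)
        self._out[source].append(edge_id)

    def neighbors(self, node_id, label):
        """Nodes reachable from node_id over outgoing edges with this label."""
        return [
            self.nodes[self.edges[e].target]
            for e in self._out[node_id]
            if self.edges[e].label == label
        ]


graph = Graph()
graph.add_node(1, name="Alice", age=18)
graph.add_node(2, name="Bob", age=22)
graph.add_node(3, type="Group", name="Chess")

graph.add_edge(100, "knows", 1, 2, since="2001/10/03")
graph.add_edge(101, "knows", 2, 1, since="2001/10/04")
graph.add_edge(102, "is_member", 1, 3, since="2005/07/01")
graph.add_edge(103, "Members", 3, 1)
graph.add_edge(104, "Members", 3, 2)
graph.add_edge(105, "is_member", 2, 3, since="2011/02/14")

print([n.properties["name"] for n in graph.neighbors(1, "knows")])     # ['Bob']
print([n.properties["name"] for n in graph.neighbors(3, "Members")])   # ['Alice', 'Bob']
```

Following a relationship here is a lookup in the per-node edge list rather than a join across tables, which is the property that makes relationship and path queries cheap in this model.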
This **graph database** model provides flexibility and clarity in representing relationships, allowing queries about friendships, group memberships, or other connections between entities.

Abstraction: graph

In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships.

Graph databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely used; it might be more difficult to find development tools and resources. Many graphs can only be accessed with REST APIs.

Source(s) and further reading: graph

- Graph database
- Neo4j
- FlockDB

Source(s) and further reading: NoSQL

- Explanation of base terminology
- NoSQL databases a survey and decision guidance
- Scalability
- Introduction to NoSQL
- NoSQL patterns

## SQL or NoSQL

Diagram (*Source: Transitioning from RDBMS to NoSQL*)

This diagram contrasts two different types of data models: the **Relational Data Model** (associated with SQL databases) and the **Document Data Model** (associated with NoSQL databases).

Relational Data Model (SQL):

Depicted as a **table structure** with rows and columns. The columns are labeled **C1**, **C2**, **C3**, and **C4**, each representing a specific attribute or field.

Characteristics:

- **Highly-structured organization**, where data is arranged in predefined tables.
- **Rigidly-defined data formats and record structure**: Each row (or record) follows the same schema, with a fixed number of columns and data types.
- **Example**: Traditional databases like MySQL, PostgreSQL, and Oracle use relational models to store data in a structured, consistent format.

Document Data Model (NoSQL):

Depicted as **JSON-like documents**, emphasizing flexibility in data representation.
Characteristics:

- **Collection of complex documents**: Data is stored as documents that can vary in structure, allowing each record to have its own unique fields.
- **Arbitrary, nested data formats and varying "record" format**: Unlike SQL, the structure of each document doesn't have to match the others. Documents can have nested data and different field names.
- **Example**: NoSQL databases like MongoDB and Couchbase use document models to store data in JSON or BSON format, providing flexibility for unstructured or semi-structured data.

Comparison Summary:

- **Relational Data Model**: Suitable for applications needing highly-structured, consistent data storage with predefined schemas, commonly used in transactional applications.
- **Document Data Model**: Ideal for applications requiring flexibility in data structure, such as when handling complex, varied data formats that may change over time.

This comparison highlights how **SQL databases** excel in structured, consistent data environments, while **NoSQL databases** offer more flexibility for evolving and diverse data needs.

Reasons for **SQL**:

- Structured data
- Strict schema
- Relational data
- Need for complex joins
- Transactions
- Clear patterns for scaling
- More established: developers, community, code, tools, etc.
- Lookups by index are very fast

Reasons for **NoSQL**:

- Semi-structured data
- Dynamic or flexible schema
- Non-relational data
- No need for complex joins
- Store many TB (or PB) of data
- Very data intensive workload
- Very high throughput for IOPS

Sample data well-suited for NoSQL:

- Rapid ingest of clickstream and log data
- Leaderboard or scoring data
- Temporary data, such as a shopping cart
- Frequently accessed ('hot') tables
- Metadata/lookup tables

Source(s) and further reading: SQL or NoSQL

- Scaling up to your first 10 million users
- SQL vs NoSQL differences