BDD300-week1-NOSQL.pdf
Seneca Polytechnic
BDD300: Advanced Database Design
Week 1
Chapter 1: Why NoSQL?

This material is based on Pramod Sadalage and Martin Fowler, “NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence”, Addison-Wesley, 2012.

Introduction
- Database: an organized collection of data
- Database Management System (DBMS): a software package with computer programs that controls the creation, maintenance, and use of a database

Relational Databases
Relational databases guarantee:
- Persistent data: stores large amounts of persistent data in the form of records and tables that remain durable when devices and software change
- Concurrency control: database concurrency is the ability of a database to let multiple users run multiple transactions at the same time
- Integration mechanisms: shared database integration, where multiple applications store their data in a single database
- Standard model: different vendors’ SQL dialects are similar, and transactions operate in mostly the same way

Impedance Mismatch
- The difference between the relational data model and the in-memory data structures
- The relational data model organizes data into a structure of tables and rows (relations and tuples)
  - Tuple: a set of name-value pairs
  - Relation: a set of tuples
- All operations in SQL consume and return “relations”

Impedance Mismatch (Cont.)
- A tuple cannot contain any structure, such as a nested record or a list
- In-memory data structures are richer than relations
- Translation between the in-memory and relational representations is needed

A single data structure in memory is split into many rows from many tables in the RDBMS.

Application and integration with DB
- Integration mechanism between applications: storing their data in a common database
- Applications operate on a consistent set of persistent data
- Downsides
  - Structure becomes more and more complex
  - Hard to preserve database integrity
  - Database cannot trust applications

Service-Oriented Architecture: Interoperability
- Requires better interaction protocols between applications and database
- Shift to web services as an integration mechanism
- Requires more flexibility for the structure of the data being exchanged
- XML (or JSON) provides a richer data structure than SQL

Attack of the Clusters: early 2000s
- Large sets of data appeared: links, social networks, activity in logs, mapping data
- We have two choices: UP or OUT
  - Scale up: bigger machines, with more processors, disk storage, and memory
  - Scale out: lots of small machines in a cluster, on commodity hardware

RDBMS for clusters?
- Relational databases are not designed to be run on clusters
- Clustered RDBMS, e.g. Oracle RAC or MS SQL Server
  - Work on the concept of a shared disk subsystem
  - Use a cluster-aware file system
  - Cluster still has the disk subsystem as a single point of failure
- Sharding the database
  - RDBMS run as separate servers for different sets of data
- Running on a cluster raised prices

Inspiring approaches
- Mismatch between relational databases and clusters
- Challenges for Amazon and Google
  - Running large clusters
  - Voluminous datasets
- BigTable from Google: Fay Chang, et al., "Bigtable: A Distributed Storage System for Structured Data," OSDI'06: Seventh Symposium on Operating System Design and Implementation, Seattle, WA, November 2006
- Dynamo from Amazon: Giuseppe DeCandia, et al., "Dynamo: Amazon’s Highly Available Key-value Store," Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, pp. 205-220

NoSQL databases
- Stands for “Not Only SQL”
- Basic idea
  - Operates without a schema
  - Allows users to add fields without having to define any changes in structure first
  - Useful when dealing with nonuniform data and custom fields
- Reasons for considering NoSQL
  - Handles data access with size and performance that demand a cluster
  - Improves the productivity of application development by using a more convenient data interaction style

Types of NoSQL databases
- Key-value store: the simplest form of NoSQL database. This schema-less data model is organized into a dictionary of key-value pairs, where each item has a key and a value. Redis and Memcached are examples of open-source key-value databases. DynamoDB is an example we will learn during this course.
- Document store: as suggested by the name, document databases store data as documents. An example of a document-oriented database is MongoDB.
- Wide-column store: these databases store information in columns, enabling users to access only the specific columns they need without allocating additional memory to irrelevant data. An example is Apache Cassandra.
- Graph store: data elements are stored as nodes, edges, and properties. Any object, place, or person can be a node. An edge defines the relationship between the nodes. An example is Neo4j.
- In-memory store: data resides in main memory rather than on disk, making data access faster than with conventional, disk-based databases. An example is IBM solidDB.

Advantages of NoSQL
- Cost-effectiveness
- Flexibility
- Replication
- Speed

References: https://www.ibm.com/topics/nosql-databases