Summary

This document provides an overview of web databases focusing on SQL and NoSQL databases. It explains various aspects of databases including architecture, data types, queries, and relational database systems.

Full Transcript

Web Databases (SQL and NoSQL) CPEN320 1 Web Application Architecture Web Browser Web Server Storage System HTTP LAN Internet...

Web Databases (SQL and NoSQL) CPEN320 1 Web Application Architecture Web Browser Web Server Storage System HTTP LAN Internet 2 Why/when do we need a Database? Why/when do we need a Database? Need to handle large volumes of data: can’t keep everything in memory store and search for data Need persistent data: Users log in and out Power outages Network failures Need accurate data manipulation e.g., Bank account transactions Web App Storage Properties Always available - Fetch correct app data, store updates ○ Even if many requests come in concurrently - Scalable From all over the world ○ Even if pieces fail - Reliable / fault tolerant Provide a good organization of storing application data ○ Quickly generate the model (data) of a view ○ Handle app evolving over time Good software engineering: Easy to use and reason about Outline Relational Databases (SQL-based) ACID semantics Non-traditional Databases (NoSQL) MongoDB Primer 6 What’s a Database ? Have you used a database? Which? Why? 7 What’s a Database ? In its simplest form, it’s a collection of data – Allows applications to modify/access data through standard interfaces – Separates data storage from logical organization Many types of databases – Hierarchical – Object oriented – Relational – Document-based – Graph-based 8 Relational Database Stores the data in the form of tables (Relations) to map one kind of data to another Why tables ? – Separate data storage from logical view of data – Easy to express relationships between data – Aggregate data from multiple tables on demand (table joins) – Allow declarative queries to be executed 9 Relational Database System A table is made of up of rows (also called tuples or records) A row is made of a fixed (per table) set of typed columns String: VARCHAR(20) Integer: INTEGER Floating-point: FLOAT, DOUBLE Date/time: DATE, TIME, DATETIME Others Example of a Table Much like a spreadsheet, except the columns are of fixed type and rows are identified by a unique key (known as primary key) 11 Database Schema Schema: The structure of the database The table names (e.g. User, Photo, Comments) The names and types of table columns Various optional additional information (constraints, etc.) Column Name Type id INT given_name VARCHAR (20) middle_name VARCHAR (20) family_name VARCHAR (20) date_of_birth Date grade_point_average Floating Point start_date Date Exercise: What is the Schema of the User table below ID first_na last_na location game me me 1 Ian Malcolm Austin, TX 2/11/2024 2 Ellen Ripley Nostromo 11/11/2024 3 Peregrin Took Gondor 14/11/2024 4 Rey Kenobi D'Qar 23/11/2024 5 April Ludgate Awnee, IN 8/12/2024 6 John Ousterho Stanford, CA 21/12/2024 ut Schema of User table ID first_na last_na location game me me 1 Ian Malcolm Austin, TX 2/11/2024 Column types ID - INT 2 Ellen Ripley Nostromo 11/11/2024 first_name - VARCHAR(20) last_name - VARCHAR(20) 3 Peregrin Took Gondor 14/11/2024 location - VARCHAR(100) game - DATE 4 Rey Kenobi D'Qar 23/11/2024 5 April Ludgate Awnee, IN 8/12/2024 6 John Ousterho Stanford, CA 21/12/2024 ut Multiple Unconnected Tables 15 Connected Tables The problem with having multiple unconnected tables is that it’s difficult to tell if the same record is present in both tables – Solution 1 (Ugly): Duplicate the relevant data in each table. Complicates data management, updates and need to anticipate queries in advance – Solution 2 (Preferred): Keep a pointer (foreign key) to the other table so that you can access the data by following the pointer. No need to anticipate queries in advance, easy to modify 16 Connected Tables Each table has what is known as primary key to uniquely identify records in it. Tables keep foreign keys to link to records in other tables. A foreign key is the primary key of the table being linked to. 17 Structured Query Language (SQL) Standard language for accessing relational data ○ Sweet theory behind it: relational algebra Queries: the strength of relational databases ○ Lots of ways to extract information ○ You specify what you want ○ The database system figures out how to get it efficiently ○ Refer to data by contents, not just name SQL Example Commands CREATE TABLE Users ( DELETE FROM Users WHERE id INT AUTO_INCREMENT, last_name='Malcolm'; first_name VARCHAR(20), last_name VARCHAR(20), UPDATE Users location VARCHAR(100)); SET location = 'New York, NY WHERE id = 2; INSERT INTO Users ( first_name, SELECT * FROM Users; last_name, SELECT * from Users WHERE id = 2; location) VALUES ('Ian', 'Malcolm', 'Austin, TX'); Keys and Indexes Consider a model fetch: SELECT * FROM Users WHERE id = 2 Database could implement this by: 1. Scan the Users table and return all rows with id=2 2. Have built an index that maps id numbers to table rows. Lookup result from index. Uses keys to tell database that building an index would be a good idea Primary key: Organize data around accesses PRIMARY KEY(id) on CREATE table CREATE TABLE Persons ( ID INT PRIMARY KEY AUTO_INCREMENT, first_name VARCHAR(50) NOT NULL, Secondary key: Other indexes (UNIQUE) last_name VARCHAR(50) NOT NULL, location VARCHAR(100), UNIQUE (first_name, last_name) ); Table Joins Can be used to combine information from multiple tables together (e.g., through SQL) – Produces a single table containing the information in both tables, without duplication – Joins can involve more than one table For example, we can produce a single join table having the award name and the student details 21 Example of a Join in SQL SELECT * from Employees, Departments where employee.deptID=department.deptID 22 The problem with Joins Joins are expensive as they need to analyze multiple tables (potentially stored elsewhere) Join performance is poor for large tables, though databases are very good at optimizing Requires all the tables to be available during join - otherwise join will fail 23 Outline Relational Databases (SQL-based) ACID semantics Non-traditional Databases (NoSQL) MongoDB Primer 24 SQL Databases have ACID Semantics What does ACID stand for? 25 SQL Databases have ACID Semantics 26 Atomicity of Transactions Transaction is a sequence of operations executed all at once or not at all (Atomicity) If failure occurs, roll-back to the beginning Example: Transfer $1000 from Acct. A to B – Step 1: Locate Account A and check balance – Step 2: Subtract 1000 dollars from Acct A – Step 3: Credit 1000 dollars to Acct B – ERROR -> ROLL BACK steps 3, 2 (and 1 if needed) 27 Consistency Can check one or more constraints on the resulting data, and abort if not satisfied Example: Check status of back account of A, Before transaction and After Does A have enough funds before transaction? Is the funds reduced after the transaction? 28 Isolation Transactions are executed independently from one another, preventing concurrent transactions from interfering. 29 Isolation Transactions are executed independently from one another, preventing concurrent transactions from interfering. 30 Durability Transactions are permanent when committed 31 ACID: Pros and Cons Pros – Simplifies reasoning about actions of the system – Guarantees correctness in presence of failures Cons – Guarantees come with huge performance cost – Cannot guarantee availability when network fails This is due to something called the CAP theorem 32 Class Activity Consider the following transactions T1 and T2 which execute on a bank account database. Which of the four ACID rules, if any, (Atomicity, Consistency, Isolation, Durability) are violated ? Assume initial balance is $100. T1 attempts to deposit $900 to the account. At the same time, T2 checks if the account balance >= 500 and returns true. However, T1 aborts and the account balance becomes $100 again. 33 Solution Isolation The scenario suggests a violation of the isolation property. T2 observes the intermediate state of the database (the account balance as $1000 due to the uncommitted deposit by T1) and acts upon this information. When T1 aborts, the decision made by T2 (that the account balance was >= $500) is based on a transient state that no longer exists. This behavior indicates that the transactions are not properly isolated from each other, as T2 should not be able to see the uncommitted changes made by T1. 34 Outline Relational Databases (SQL-based) ACID semantics Non-traditional Databases (NoSQL) MongoDB Primer 35 What are NoSQL Databases? 36 What are NoSQL Databases? designed to handle unstructured or semi- structured data. Offer flexible schemas, scalability, and are optimized for large-scale data processing Often used in real-time web applications include types like document, key-value, and graph databases. 37 NoSQL Databases Do not natively support Table joins – Are much more scalable and failure tolerant – Must do joins explicitly using program code Do not typically support ACID semantics – So data may be inconsistent or out of sync (provide what is known as eventual consistency) – When failures occur, data may be lost or incorrect 38 CAP Theorem [Brewer’99] You can achieve only two of the following three properties in any database system 39 CAP theorem During a network partition (failure), a system must choose either consistency or availability for it to work through the partition – Traditional SQL-based databases choose consistency and may hence not be available – NoSQL databases choose availability and hence may not be consistent – In web applications, availability often trumps consistency 40 Example of Network Partitioning 41 BASE principle Basically Available Soft state Eventually consistent 42 Eventual Consistency NoSQL databases provide a guarantee that they will eventually be consistent (e.g., when the network partition heals) – Eventually can be a very long time …. – Consistent does not mean correct…. 43 SQL Vs NoSQL - 1 44 SQL Vs. NoSQL - 2 45 SQL Vs. NoSQL - 3 46 SQL Vs. NoSQL - 4 47 Class Activity For each of the following scenarios, will you use a traditional database or no-SQL database. Justify your answer using CAP theorem: – Online photo gallery to browse photos and upload photos occasionally from multiple locations – Large ecommerce store in which the inventory needs to reflect any purchases made instantly in all locations – Shopping cart of customers in an online store in which users can login from different locations 48 Class Activity – Online photo gallery to browse photos and upload photos occasionally from multiple locations – No-SQL – Availability and Partition Tolerance are important – Large ecommerce store in which the inventory needs to reflect any purchases made instantly in all locations – SQL – Consistency and Partition Tolerance are essential – Shopping cart of customers in an online store in which users can login from different locations – No-SQL? – Availability and Partition Tolerance? Or Consistency? 49 Outline Relational Databases (SQL-based) ACID semantics Non-traditional Databases (NoSQL) MongoDB Primer 50 MongoDB Document-oriented NoSQL database – Documents are the equivalent of tables – Stored in JSON format (technically BSON, or binary JSON) – Must be smaller than 16 MB in size No apriori schema needed, or rather schema can be modified dynamically – Can store dissimilar objects in same document – Documents can be embedded/nested in other documents 51 MongoDB: Data types 52 MongoDB: Example Dataset 53 Databases and Collections A MongoDB database consists of multiple databases. Specify db to use by “use test” A database can have multiple collections. Specify collection as db.collectionName.op A collection can have one or more documents – Each record is called a document 54 Insert into a Database db.collectName.insert( document in JSON ) 55 Insert // Insert a single document db.myCollection.insertOne({ name: "John Doe", age: 30, city: "New York" }); // Insert multiple documents db.myCollection.insertMany([ { name: "Jane Doe", age: 25, city: "Los Angeles" }, { name: "Richard Roe", age: 35, city: "Chicago" } ]); 56 Finding objects db.collectName.find() – shows all documents db.collectName.find(JSON object) – shows documents satisfying the given JSON object – Finds all docs with the fields and values equal to the JSON object passed as an argument – Can also specify conditional operations such as $lt, $gt, or logical combinations (using AND, OR) 57 Examples of queries db.restaurants.find( {“cuisine”: “Italian”}) – Finds all restaurants with the cuisine==Italian db.restaurants.find( { “grades.score”: { $gt:30} }) 58 Object_id Every document is given a unique ‘_id’ value – automatically assigned by the MongoDB Object IDs must be unique in a document, and should be of type ObjectID Can be used to remove or update specific objects 59 Update db.collectName.update(objects to be matched, object fields to be updated) Update operator (full list of operators can be found at: https://docs.mongodb.org/manual/reference/operator/update/ 60 ) Remove Can remove documents from a collection using the remove method db.collectName.remove( matching condition ) example: db.restaurants.remove({ “cuisine”: “Italian” }) 61 Operations on each record Example: Print the grades of all restaurants that have more than one grade associated with them. db.restaurants.find().forEach( function(Object) { if (Object.grades.length > 1) printjson(Object.grades); } ) 62 Table Joins in MongoDB Joins are not natively supported in MongoDB and hence need to be written manually – Iterate over each document of the first collection – Lookup the corresponding document in the second collection either by key or by name – Write JavaScript code to merge the information in the relevant fields from the two documents – Return the merged information as the query result 63 Example: Join Operation Assume that you had another collection in the database called “users” which had a list of users who had reviewed each restaurant. Assume this collection is indexed by restaurant name. We wish to write a query to list all the restaurants that have at least one review, and the list of users who reviewed that restaurant. 64 Example Join Operation db.restaurants.find().forEach( function(Object) { if (Object.grades.length > 1) { var user = db.Users.find(Object.name); if (user!=null) { printjson(Object.name); printjson(user); } } } ) 65 Class Activity You have two collections in a MongoDB database. marks contains the list of students in a course with their marks and student number, and students contains the student number along with details such as first name, last name etc. Compute the join of these two collections (in JS code) from the Mongdb shell to print each student along with their marks and details. You can assume the database is already loaded into the shell. 66 Solution to the activity db.marks.find().forEach( function(Object) { var st = db.students.find( {“student no”: Object.studentNo} ); if (st!=null) { printjson(st); printjson(Object.marks); } else { print(“No match found for “ + Object.studentno); } } ) 67

Use Quizgecko on...
Browser
Browser