Module 2 Notes (1) Data Science 02 PDF
Document Details
Uploaded by SpellbindingTopology
Indian Institute of Technology Bombay
Tags
Summary
These notes cover the basics of database management systems, data models, and SQL. Key topics include database management systems, data models, SQL, XML and JSON functionalities and benefits of database management systems
Full Transcript
Learning Objectives o Database Management Systems o Data Models o Introduction to SQL o SQL Queries o XML and JSON Data and Management Data represents the traces of real-world processes. Data is valuable but hard and costly to manage ○ Collection, representation, storage Data management r...
Learning Objectives o Database Management Systems o Data Models o Introduction to SQL o SQL Queries o XML and JSON Data and Management Data represents the traces of real-world processes. Data is valuable but hard and costly to manage ○ Collection, representation, storage Data management refers to the process of collecting, storing, organizing, and maintaining data to ensure its reliability, accessibility, and security. 4 Data Management: Functionalities Describe real-world entities in terms of stored data Create & persistently store large datasets Efficiently query & update ○ Must handle complex questions ○ Must handle sophisticated updates ○ Performance matters 5 Data Management: Functionalities Change structure (e.g., add attributes) Concurrency control: enable simultaneous queries, updates etc. Crash recovery Access control, security, integrity It is difficult and costly to implement all these features! 6 DBMS Relational database management systems (RDBMS) Hadoop Distributed File System (HDFS)-based systems Stream management systems: Apache Kafka 7 Database What is a database? A collection of files storing related data 9 Database Management System What is a Database Management System (DBMS)? o A software package designed to store and manage databases o A large, integrated collection of data o It models real-world enterprise. o Entities (e.g., students, courses) o Relationships (e.g., Jim is taking course DA106) 10 Benefits of Database Management System Data independence and efficient access. Reduced application development time. Data integrity and security. Uniform data administration. Concurrent access, recovery from crash 11 DBMS Concepts: Data Models A data model is a general, conceptual way of structuring data. Suppose we have book data: author, title, publisher, pub date, price, etc. How should we organize such data in files? 13 DBMS Concepts: Data Models Relational Data represented as relations Semi-structured (JSON) Data represented as trees Key-value pairs Used by NoSQL systems Graph Object-oriented 14 DBMS Concepts: Data Models Elements of Data Model Instance ○ The actual data Schema ○ Describe what data is being stored Query language ○ How data can be retrieved and manipulated The relational model of data is the most widely used. Main concept: relation, basically a table with rows and columns. Every relation has a schema, describes the columns 15 DBMS Concepts: Relational Data Models Instance ○ Organized as “table” or “relation” consists of “column” aka “attribute” aka “field” and “row” aka “tuple” aka “record” Schema ○ “table name” aka “relation name” ○ “column name” aka “attribute name” ○ Each attribute has a “type” aka “domain” aka “data type” 17 DBMS Concepts: Relational Data Models “degree” of a relation ○ Number of attributes Example types: ○ Strings: CHAR(20), VARCHAR(50), TEXT ○ Numbers: INT, SMALLINT, FLOAT ○ MONEY, DATETIME, … ○ Usually vendor specific ○ Statically and strictly enforced 18 DBMS Concepts: Relational Data Models Primary Key A Primary Key is the minimal set of attributes of a table that has the task to uniquely identify the rows It uniquely identifies a record Foreign Key A foreign key is a key used to link two tables together It is a column or a combination of columns whose values match a Primary Key in a different table. 19 DBMS Concepts: Relational Data Models 20 DBMS Concepts: Relational Data Models Example: “type” in a Schema cname country no_employees for_profit Canon Japan 50000 Y Hitachi Japan 30000 Y Company(cname: varchar(30), country: char(20), no_employees: int, for_profit: char(1)) 21 DBMS Concepts: Relational Data Models Some key points: Tables in RDBMS are not ordered ○ The rows can be exchanged Tables don’t contain any nested attributes ○ Compare with semi structured data models Tables maintain data independence ○ Applications are insulated from how data are stored ○ Protection against the changes in logical and physical structure of data 23 DBMS Concepts: Relational Data Models Key aspects that we won’t cover (out of scope): Concurrency control Atomicity Logging Layered architecture We will explore the basics of SQL (Structured Query Language) used for query in a RDBMS in the future lectures. 24 DBMS Concepts: Relational Data Models 25 What is SQL SQL stands for Structured Query Language SQL lets you access and manipulate databases SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987 27 What is SQL What Can SQL do? SQL can execute queries against a database SQL can retrieve data from a database SQL can insert records in a database SQL can update records in a database SQL can delete records from a database SQL can create new databases SQL can create new tables in a database SQL can create stored procedures in a database SQL can set permissions on tables, procedures… 28 What is SQL Although SQL is an ANSI/ISO standard, there are different versions of the SQL language. However, to be compliant with the ANSI standard, they all support at least the major commands (such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a similar manner. Most of the SQL database programs also have their own proprietary extensions in addition to the SQL standard! 29 What is SQL Example: To build a website that shows data from a database, you will need: An RDBMS database program (i.e. MS Access, SQL Server, MySQL) To use a server-side scripting language, like PHP or ASP To use SQL to get the data you want To use HTML / CSS to style the page 30 Quick SQL Examples SELECT * FROM Orders SELECT EmployeeID FROM Orders SELECT DISTINCT EmployeeID FROM Orders SELECT * FROM Customers ORDER BY Country; 32 SQL Details Structured Query Language(SQL) commands are categorised into four categories: 1. DDL – Data Definition Language 2. DQl – Data Query Language 3. DML – Data Manipulation Language 4. DCL – Data Control Language 33 SQL: DDL (Data Definition Language) Data Definition Language actually consists of the SQL commands that can be used to define the database schema. DDL is a set of SQL commands used to create, modify, and delete database structures but not data. These commands are normally not used by a general user, who should be accessing the database via an application. 34 SQL: DQL (Data Query Language) DQL statements are used for performing queries on the data within schema objects. It is a component of SQL statement that allows getting data from the database and imposing order upon it. When a SELECT is fired against a table or tables the result is compiled into a further temporary table, which is displayed or perhaps received by the program i.e. a front-end. 35 SQL: DML(Data Manipulation Language) The SQL commands that deals with the manipulation of data present in the database belong to DML or Data Manipulation INSERT: Adds new records to a table. UPDATE: Modifies existing records in a table. DELETE: Removes records from a table. 37 SQL: DCL (Data Control Language) DCL includes commands such as GRANT and REVOKE which mainly deal with the rights, permissions, and other controls of the database system. 38 SQL: TCL There is one more category known as Transaction Control Language (TCL). These commands are used to manage transactions in the database. These are used to manage the changes made by DML-statements. It also allows statements to be grouped together into logical transactions. 39