Database Processing (2024-2025) - OIS PDF
Document Details
Uploaded by StimulativeSydneyOperaHouse1714
Università degli Studi di Padova
2025
Diego Campagnolo - Massimiliano Oleotto
Tags
Summary
This document discusses database processing, from basic concepts to more advanced topics. It includes learning objectives, questions, and explanations about big data and NoSQL databases. The content seems to be from a university course in 2024-2025. It includes detailed information on databases.
Full Transcript
1 Lesson 2 Database Processing from table to big data ing. Massimiliano Oleotto 2 Learning Objectives Q1. Why do business professional...
1 Lesson 2 Database Processing from table to big data ing. Massimiliano Oleotto 2 Learning Objectives Q1. Why do business professionals need to have a minimum knowledge about Database Processing? Q2. What is a DB? Q3. What is SQL? Q4. What is a DBMS? Q5. What is DB Processing? Q6. How is the building of a DB? Q7. Why is NoSQL DBMSs becoming important? Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto 3 Exam Question – text Which of the following is not a characteristic used to describe Big Data according to the “4v” model? 1. Veracity 2. Velocity 3. Volume 4. Viability Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q1. Why do business professionals need to have a minimum 4 knowledge about Database Processing? Businesses of every size organize Data into collections called Database Business professionals work frequently with Database → “work” means use, purchase, design, … In 2024 basic knowledge of Database Processing is a key skill of any business professional Minimum knowledge = basic terms and concepts Because it is complicated (!) Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q1. Why do business professionals need to have a minimum 5 knowledge about Database Processing? The DB is the center of an IS The Information requested by IS users needs definite Data, and therefore it guides the IS DB development As an IS user, you may play a critical role in its development Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q1. Why do business professionals need to have a minimum 6 knowledge about Database Processing? Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto 7 Learning Objectives – recap 1 Q1. Why do business professionals need to have a minimum knowledge about Database Processing? in every organization Data is collected into Database, alias DB have a minimum knowledge to work with it acquire a skill that will be very useful for every career Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 8 Thanks to IT today we are able to store large amounts of Data This ability is useless if we are unable to extract Information from Data To extract Information from Data, Data must be organized in a certain way A convenient way to do this is to use a DB A DB is an organized collection of Data with definite elements … of course, stored in a Computer (!) Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 9 There is more than one definition for information, but all definitions have common elements: it is not Data, it is … more it is located “inside” a person, not a Computer it varies from person to person A very useful definition in the context of Database Processing is that given by the DKIW Pyramid Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 10 The DIKW Pyramid is a hierarchical model that Knowledge Management has developed to categorize and simplify the key concepts involved in cognitive processes in order to create IS capable of managing large amounts of Data Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 11 The 1st element of a DB is a Table, where: a column (Field) defines a Data Type → content and format a cell (Value) defines a Data → a specific content in a specific format a row (Record) define a Data Set → Data related to each other Note that: Tables and Fields are identified by a name Table has a specific number of Fields but can have any number of Records Key is the Field (or Fields) that uniquely identify the Records Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 12 The Key of a Table is called Primary Key and it is unique for a Table → in this way a Table is a list of “things” with some of their specifications identified by the Primary Key Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 13 A Table can be implemented with: a Flat File → e.g. a CSV file a Spreadsheet → e.g. Microsoft Excel Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 14 Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 15 Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 16 The 2nd element of a DB is a Relationship, explained by a … table (!) Reasoning Example In a table there may be In a Table with Fields like Song (Primary Key), Album Redundant Data and Artist, Data for Artist is duplicated To eliminate Redundant Data An album contains different songs and an artist (without losing Data) you must produces different albums first identify Relationships Then split the Table into different Two Tables: one with Fields like Song (Primary Key) Tables based on the and Album, and one with Fields like Album (Primary Relationships Key) and Artist At last connect the two Tables The Album Field of the first Table is linked as with a Foreign Key Foreign Key to the Album Field of the second Table Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 17 Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 18 A Foreign Key is a Field that is a Primary Key of a different Table, not the Table in which it is → in this way a DB is a set of Tables linked together by Keys Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 19 The set of Tables and Relationships of a DB are its structure, called DB Schema → a DB Schema exists before having Data in the DB Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 20 The 3rd element of a DB is the Data Dictionary: it holds Data about the DB Schema and the Data – these Data is called Metadata because it is the Data used to describe Data it means that a DB contains, within itself, a description of its contents – this is the self-describing nature of DB Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q2. What is a DB? 21 A DB obtained with Tables, Relationships and Data Dictionary is called a Relational DB The name derives from the “initial” Table divided into Tables and Relationships, a process that is technically called “Relate” – technically its columns are Attributes and its rows are Tuples (!) Relational DB is currently the most popular DB The reason for its popularity is that you can extract Information from Data easily thanks to SQL Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto 22 Learning Objectives – Recap 2 Q2. What is a DB? Database is an organized collection of Data with definite elements stored in a Computer in order to extract information from it the definite elements of a (Relational) DB are Tables, Relationships (with Keys) and Data Dictionary (a description of DB) Excel is not a DB because it contains only Tables nowadays a (Relational) DB is still the most convenient way to extract Information (meaning) from Data thanks to Structured Query Language, alias SQL Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q3. What is SQL? 23 SQL is a language to query the DB by selecting Data … but not only (!) Because it allows other actions on the DB such as: manipulate the Data by inserting, updating and deleting it modify the DB Schema by creating and altering Tables and Relationships grant User Access to Data Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q3. What is SQL? 24 What is User Access? … Yes, what is it? It is how a user can access the DB, so it is a series of rules that define the privileges of the user, that is: what Data a user can access – if he accesses a part, there is a DB Subschema for him what actions a user can perform on the Data – only selection or manipulation if a user can perform actions on the DB Schema – in this case the user is a DB Administrator Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q3. What is SQL? 25 We can understand how powerful is SQL because: with SQL you create the DB Schema with SQL you insert the Data into the DB with SQL you get Information from Data even with SQL you are able to give access to users But... is it simple? Yes, it is Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q3. What is SQL? 26 To understand the simplicity of SQL, let’s see how easy it is to query the DB by selecting Data What you want is an instruction that says: get a Table → DATA select the Columns and then the Rows → SORTED organize them, for example by ordering the Rows → ARRANGED show them → PRESENTED VISUALLY Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q3. What is SQL? 27 The instruction you want is SELECT Statement, that has a definite syntax Here is the instruction you want: get a Table → FROM TABLE select the Columns and then the Rows → SELECT COLUMNS WHERE ROWS ARE organize them, for example by ordering the Rows → ORDER BY show them → AS ALIAS Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q3. What is SQL? 28 So … what does it mean? Note that it is a correct SQL statement Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q3. What is SQL? 29 What is a Join? … Yes, what is it? We have seen simple cases that involves one Table However, there are less simple cases that involve more than one Table In these cases, the syntax of the SELECT Statement includes the JOIN Clause With the JOIN Clause, SQL uses Relationships to recreate the “initial” Table → we return to simple cases that involve one Table Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto 30 Learning Objectives – Recap 3 Q3. What is SQL? SQL is a language with which you can operate with a DB in a simple way you can select Data to extract information but you can also manipulate Data and DB Schema, and define User Access to DB – because a DB is generally used by more than 1 user Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q4. What is a DBMS? 31 DBMS is an Application used to: create a DB – define DB structure (Tables, Relationships, Metadata) process a DB – read or change Data administer a DB – do a wide variety of activities which serve to make the DB intact DB is a collection of Data stored in a (or more than one) file, DBMS is an application that allows to manage them Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q4. What is a DBMS? 32 Activities to administer a DB: memorization – Data file organization on Mass Storage devices security – User Access concurrency or Data conflict – when a user calls a Data no other users can use it → this is possible through the “lock” mechanism fault tolerance – the actions that modify the Data or the DB are not instantaneous, even if very fast, and a malfunction must not interrupt them in the middle → it is possible through the “commit & rollback” mechanism recovery – DB copy → it is possible through the “backup & restore” mechanism Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q4. What is a DBMS? 33 These are the most used DBMS in the world Commercial DBMS Open Source DBMS Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q4. What is a DBMS? 34 And … Microsoft Access? Well, it’s a DBMS, of course … but it’s a Personal DBMS not an Enterprise DBMS Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto 35 Learning Objectives – Recap 4 Q4. What is a DBMS? DBMS is an Application used to manage a DB “Manage” means create, process and “administer” (and administer means dealing with memorization, security, concurrency, fault tolerance and recovery) DB is often used also for DBMS, but DB and DBMS are different Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q5. What is DB Processing? 36 The DB is valuable but Data is in a raw form, i.e. Rows in a Table … there is no Information here The DBMS is valuable but it is a mean to operate with the DB, i.e. querying DB … there is no Information here So, how do you retrieve Information from Data? … You need an Application that performs specific operations: like entering Data, elaborating it and presenting it as Information using a DBMS to process a DB Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q5. What is DB Processing? 37 That Application is a DB Application And the process of performing those specific operations is DB Processing Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q5. What is DB Processing? 38 Elements of a DB Application: Form part of the Software UI that allows to enter Data: it is a “form to fill in” Report part of the Software UI that shows Data in a structured way (tables, graphs, etc.) Query Program that allows to enter Queries Application Program Program that allows to process logic for a given need → a query hasn’t logic Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q5. What is DB Processing? 39 FORM REPORT QUERY APPLICATION PROGRAM Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q5. What is DB Processing? 40 A DB Application can also serve as an intermediary between DBMS and the Internet Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q5. What is DB Processing? 41 A Database Application System is the sum of three components: a DB, a DBMS and a DB Application Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q5. What is DB Processing? 42 A Database Application System is a Distributed Application that uses 3-Tier architecture Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q5. What is DB Processing? 43 A 1st example of Database Application System is … … a Database System, i.e. itself (!) Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q5. What is DB Processing? 44 A 2nd example of Database Application System is … … a ERP System, e.g. SAP S/4HANA Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q5. What is DB Processing? 45 A 3rd example of Database Application System is … … a Social Networking Service, e.g. Facebook (!) Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto 46 Learning Objectives – Recap 5 Q5. What is DB Processing? Extract information from Data stored in a DB is DB processing DB processing needs a DBMS but also a DB Application, whose elements are Form, Report, Query and Application Program A Database Application System is the sum of DB, DBMS and DB Application – it’s a Distributed Application that uses 3-Tier architecture Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q6. How is the building of a DB? 47 There two main activities in the building of a DB: Data Modeling → create the logical representation of DB Data Database Design → convert a Data Model into a DB Schema Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q6. How is the building of a DB? 48 Data Modeling describes in a Data Model the Data that will be stored in the DB The tool used for Data Modeling is the E-R Data Model: E = Entity – a thing that a user wants to track, e.g. customer, along with its attributes, i.e. company identifier, e-mail address, phone number, … R = Relationship – a link between two entities, e.g. the relationship between a customer and its orders, that is: 1:N, or one-to-many relationship N:M, or many-to-many relationship Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q6. How is the building of a DB? 49 The E-R Model uses a diagram in which: entities are boxes relationships are lines Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q6. How is the building of a DB? 50 Database Design first transforms entities in Tables → this process is called normalization and solves the Data Integrity problem Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q6. How is the building of a DB? 51 Database Design then transforms relationships in … Relationships using Keys → this process is called relationship representation: entity identifier → Primary Key 1:N relationship → Foreign Key on “N” attribute N:M relationship → additional Table (return to 1:N relationship) Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto 52 Learning Objectives – Recap 6 Q6. How is the building of a DB? The building of a DB consists of Data Modeling and Database Design The Data Model uses entities and relationships between entities to define what Data is need and how it is linked to each other The Database Design transform entities in Tables and relationships between entities in Relationships Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q7. Why is NoSQL DBMS becoming important? 53 Thanks to the Internet and smartphones, today we are witnessing the creation of an increasing amount of Data Much of this Data cannot be stored in a relational DB because it has huge volume, high velocity and great variety → this collection of Data is called Big Data Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q7. Why is NoSQL DBMS becoming important? 54 Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q7. Why is NoSQL DBMS becoming important? 55 A common way to describe Big Data is the “4v” description Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q7. Why is NoSQL DBMS becoming important? 56 The Relational DB technology isn’t able to meet Big Data processing needs that arise in big Corporation like Google, Amazon, Facebook, Netflix and so on So, each of them created a own nonrelational DBMS – these are called NoSQL DBMS and process Data: in large amount distributed in the Cloud very rapidly that has simple non transactional Data structure Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto Q7. Why is NoSQL DBMS becoming important? 57 These are examples of NoSQL DBMS Google Amazon Facebook Netflix Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto 58 Learning Objectives – Recap 7 Q7. Why is NoSQL DBMS becoming important? Data is growing more and more Furthermore, Data is becoming more and more Big Data (huge volume, high velocity, high variety, no validation) Relational DBMS cannot support Big Data, so a new DBMS was born, the so-called NoSQL DBMS Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto 59 Exam Question – text Which of the following is not a characteristic used to describe Big Data according to the “4v” model? 1. Veracity 2. Velocity 3. Volume 4. Viability Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto 60 Exam Question – solution Which of the following is not a characteristic used to describe Big Data according to the “4v” model? 1. Veracity → Data must be trusted… correct 2. Velocity → Data is generated rapidly … correct 3. Volume → Data is in large amount … correct 4. Viability → it means that it’s workable, so it has nothing to deal with Big Data… wrong Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto 61 References Course book: ❑ Earl McKinney Jr. David M. Kroenke – Processes, Systems, and Information, An Introduction to MIS – Chapter 4 Other books: ❑ J. Glenn Brookshear Dennis Brylow – Computer Science, An overview – Chapter 9 Organization and Information Systems | a.y. 2024-2025 | Diego Campagnolo – Massimiliano Oleotto