Principles Of Information Systems Lecture 5 - Database Systems(1) PDF
Document Details
Uploaded by HappierBaltimore
Tags
Summary
This document is a lecture on principles of information systems, specifically focused on database systems and the concept of big data, including database paradigms, big data characteristics, and data warehousing. It introduces various concepts related to data and its management.
Full Transcript
Principles of Information Systems, Thirteenth Edition Chapter 5 Database System and Big Data © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a...
Principles of Information Systems, Thirteenth Edition Chapter 5 Database System and Big Data © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain 1 product or service or otherwise on a password-protected website for classroom use. Database Paradigms Three paradigms will be discussed in this lecture: Relational Databases Data Warehouses NoSQL Databases The lecture will also discuss the following concepts Big data Data lakes © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 2 or otherwise on a password-protected website for classroom use. Relational Databases © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Introduction Relational database: an organized collection of data Commonly used to store and manage daily transactions (e.g., sales transactions and bank transactions) Why not store data in spreadsheets? Relational databases ensure data integrity through defined relationships and constraints, maintaining a single source of truth “Single source of truth” means that every piece of information is stored only once This minimizes errors and inconsistencies when updating records © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 4 or otherwise on a password-protected website for classroom use. Introduction A database management system (DBMS) is a group of programs that: Manipulate the database Provide an interface between the database and its users and other application programs © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 5 or otherwise on a password-protected website for classroom use. Data Fundamentals Organizations could not successfully complete most business activities without data and the ability to process it Data consists of raw facts Data must be organized in a meaningful way to transform it into useful information © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 6 or otherwise on a password-protected website for classroom use. Hierarchy of Data A bit (binary digit) represents a circuit that is either on or off A byte is made up of eight bits Each byte represents a character Field: a name, number, or combination of characters that describes an aspect of a business object or activity Record: a collection of related data fields or row or tuple File: a collection of related records Database: a collection of integrated and related files © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 7 or otherwise on a password-protected website for classroom use. Hierarchy of Data Hierarchy of data: bits, characters, fields, records, files, and databases © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 8 or otherwise on a password-protected website for classroom use. Data Entities, Attributes, and Keys Entity: a person, place, or thing for which data is collected, stored, and maintained student Attribute: a characteristic of an entity name ID Data item: the specific value of an attribute Fahad 1136525091 Primary key: a field or set of fields that uniquely identifies the record © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 9 or otherwise on a password-protected website for classroom use. Database Activities CRUD Create: inserting new data Adding the information of a new employee (new record/new row) Read: retrieving data from the database Update: modifying existing data Updating the employees record, such as updating his new phone number Delete: removing data Generating reports See next slide © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 10 or otherwise on a password-protected website for classroom use. Generating Reports A DBMS can produce a wide variety of documents, reports, and other output that can help organizations achieve their goals © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 11 or otherwise on a password-protected website for classroom use. Data Modeling Considerations when building a database Content: What data should be collected? cost? Access: What data should be provided to which users and when? Logical structure: How should data be arranged so that it makes sense? Physical organization: Where should data be physically located? Archiving: How long to store? © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 12 or otherwise on a password-protected website for classroom use. Data Modeling Data model: a diagram of data entities and their relationships Entity-relationship (ER) diagrams: data models that use basic graphical symbols to show the organization of and relationships between data The ER model shows the structure of a database at a high level of abstraction this meansmore than onecustomer one to many © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 13 or otherwise on a password-protected website for classroom use. Data Modeling © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 14 or otherwise on a password-protected website for classroom use. Data Modeling Relational model: a simple but highly useful way to organize data into collections of two-dimensional tables called relations Each row in the table represents an entity Each column represents an attribute of that entity © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 15 or otherwise on a password-protected website for classroom use. Data Modeling © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 16 or otherwise on a password-protected website for classroom use. Linking Multiple Tables is Often Required to Answer a Single Query © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 17 or otherwise on a password-protected website for classroom use. Structured Query Language (SQL) SQL: a special-purpose programming language for accessing and manipulating data stored in a relational database © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 18 or otherwise on a password-protected website for classroom use. Structured Query Language (SQL) © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 19 or otherwise on a password-protected website for classroom use. Data Cleansing Also called data cleaning The process of detecting and then correcting or deleting incomplete, incorrect, inaccurate, irrelevant records that reside in a database The cost of performing data cleansing can be quite high Different from data validation Data validation involves the identification of “bad data” and its rejection at the time of data entry - JavaScript is commonly used for validating web forms. Example: [link] © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 20 or otherwise on a password-protected website for classroom use. Data Cleansing © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 21 or otherwise on a password-protected website for classroom use. Database Administration Database administrators (DBAs) Works with users to define their data needs Applies database programming languages to craft a set of databases to meet those needs Assures that data is secure from unauthorized access The DBA is responsible for planning, designing, creating, operating, securing, monitoring, and maintaining databases © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 22 or otherwise on a password-protected website for classroom use. Popular Database Management Systems Open-source relational DBMS: MySQL Relational DBMS for Enterprise: Microsoft SQL Server IBM DB2 © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 23 or otherwise on a password-protected website for classroom use. Popular Database Management Systems Database as a Service (DaaS) A cloud-based service that provides database storage, management, and administration The database is hosted on the service provider's servers and is accessed by the client over the Internet Example of DaaS: Amazon Relational Database Service (Amazon RDS) Microsoft Azure SQL Database © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 24 or otherwise on a password-protected website for classroom use. In-Memory Databases A database management system that stores the entire database in random access memory (RAM) Provides access to data at rates much faster than storing data on some form of secondary storage An application: real-time ad bidding Advertisers quickly bid for ad space as a webpage loads This process is completed in less than a second Drawbacks Cost Volatility (data loss on power failure) © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 25 or otherwise on a password-protected website for classroom use. Data Warehouses © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Data Warehouses Data warehouse: a large database that collects historical business information, typically spanning multiple years (e.g., 3, 5, or 10 years), from many sources in the enterprise to support decision making ETL process Extract Transform Load ETL moves data from its original source (e.g., relational databases) to the data warehouse The ETL moves data on a regular schedule (e.g., daily) © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 27 or otherwise on a password-protected website for classroom use. Data Warehouses © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 28 or otherwise on a password-protected website for classroom use. Relational Databases vs. Data Warehouses Both relational databases and data warehouses organize data into tables One key difference is the data modelling approach: Relational databases are designed to minimize data redundancy - To maintain a single source of truth Data warehouses are designed to handle read-intensive operations - Achieved by minimizing joins (links) between tables Relational Database Model Data Warehouse Model © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 29 or otherwise on a password-protected website for classroom use. Business Intelligence (BI) Includes a wide range of applications, practices, and technologies for the gathering, integration, visualization, analysis, interpretation, and presentation of data to improve decision making Users of BI systems care most about the interface Usually, the interface is interactive BI tools can be used to analyze data from a data warehouse Examples of BI software: Tableau [link] Microsoft Power BI © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 30 or otherwise on a password-protected website for classroom use. NoSQL Databases © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. NoSQL Databases NoSQL is a type of database that does not use a relational table structure NoSQL databases have a flexible schema NoSQL databases are often used for applications that need to be scalable Instead of upgrading to a larger, more powerful server, you can distribute the data across multiple, smaller servers NoSQL databases lack strong data consistency For example, in social media, posts, likes, or comments might be distributed across several servers. If a user adds a comment, it might not immediately appear to users on other servers © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 32 or otherwise on a password-protected website for classroom use. Types of NoSQL Databases Document databases Document databases store data in semi-structured documents, typically using formats like JSON Each document can have a different structure, and there is no fixed schema Commonly used for web applications (e.g., social media and e-commerce) Key-value databases Graph Databases And others Example of JSON © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 33 or otherwise on a password-protected website for classroom use. Flexible Schema (No Fixed Schema) © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 34 or otherwise on a password-protected website for classroom use. Big Data © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Big Data Extremely large and complex data collections Traditional data management software, hardware, and analysis processes are incapable of dealing with them Three characteristics of big data Volume VVV Velocity - The data comes at us fast and needs to be processed quickly Variety - Structured data (like databases) - Unstructured data (like text, images, and videos) © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 36 or otherwise on a password-protected website for classroom use. Sources of Big Data © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 37 or otherwise on a password-protected website for classroom use. Big Data Uses Social Media Analytics: Volume: huge amounts of user-generated content (posts, images, videos) Velocity: real-time updates and interactions Variety: text, images, videos Healthcare: Volume: patient records, medical imaging data Velocity: real-time monitoring of patient vitals Variety: structured data (patient information), unstructured data (medical images and doctor's notes) Transportation: Volume: data from GPS devices, traffic cameras Velocity: real-time traffic updates Variety: location data, images, video feeds © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 38 or otherwise on a password-protected website for classroom use. Challenges of Big Data Where and how to store and process the data How to find the subset of the data that are relevant to the decision making How to derive value from the relevant data © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 39 or otherwise on a password-protected website for classroom use. Data Lakes © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Data Lakes Data lake: takes a “store everything” approach to big data, saving all the data in its raw and unaltered form Users decide just how they want to use the data Example: https://lake.data.gov.sa/ © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 41 or otherwise on a password-protected website for classroom use. Summary The database approach to data management has become broadly accepted Data modeling is a key aspect of organizing data and information A well-designed and well-managed database is an extremely valuable tool in supporting decision making Recently, organizations deal with a tremendous growth in the amount of data available and struggling how to manage and make use of it Seven Database Paradigms: https://www.youtube.com/watch?v=W2Z7fbCLSTw © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service 42 or otherwise on a password-protected website for classroom use.