Session 11-12 week 6.pdf
Document Details
Uploaded by Deleted User
Tags
Full Transcript
Database Systems and Data Management 8/10/2024 1 Benefits Gained Through Use of High-Quality Data Improves decision making Improves innovation Using high-quality data in decision Improves work...
Database Systems and Data Management 8/10/2024 1 Benefits Gained Through Use of High-Quality Data Improves decision making Improves innovation Using high-quality data in decision Improves worker efficiency, product making and/or service quality, or the Removes guesswork and risk taking customer experience Increases customer satisfaction Raises productivity Leads to high customer satisfaction Employees can focus on the core Bad data causes unfavorable data mission errors No need to correct data errors Increases sales Ensures compliance Provides accurate consumer targeting and communications Enables successful up-sell and cross- sell suggestions 8/10/2024 2 The Data Hierarchy Entity Person, place, or thing (object) for which data is collected, stored, and maintained File Collection of entities Attribute Characteristic of an entity Domain Range of allowable values for a data attribute 8/10/2024 3 The Data Hierarchy Data item Specific value of a data attribute Record Collection of attributes about a specific entity Primary key Attribute or set of attributes that uniquely identifies the record Foreign key Attribute in one table that refers to the primary key in another table 8/10/2024 4 Database Fundamentals Database Well-designed, organized, and carefully managed collection of data Database is a collection of tables plus relationships among the rows in those tables, plus special data, called metadata Should help an organization achieve its goals Can contribute to organizational success Provides managers and decision makers with timely, accurate, and relevant information built on data Most organizations have multiple databases 8/10/2024 5 Quick questions A collection of data fields that are all related to one object, activity, or individual is called a file TRUE/FALSE An attribute is a general class of people, places, or things for which data is collected, stored, and maintained TRUE/FALSE Which component of the data hierarchy is ranked just below the database and represents a collection of similar entities? a. field b. Attribute c. Record d. file 8/10/2024 6 The Database Approach Database approach to data management Multiple information systems share a pool of related data Database management system (DBMS) Group of programs provided by the DBMS supplier Programs used to access and manage a database Provides an interface between the database and its users and other application programs 8/10/2024 7 Database Design Necessary to keep data well Data modelling organized Tool used to design a database Database design Occurs at organizational level Store all relevant data Called enterprise data modelling Provide quick access and easy Occurs at specific business application modification level Reflect organization’s business Enterprise data model processes Identifies data entities and data Considerations attributes of greatest interest to the organization Content and access Identifies their associated standard Logical structure and physical data definitions, data length and organization format, domain of valid values, and Response time, archiving, security any business rules for their use 8/10/2024 8 Entity-relationship (ER) diagram Data model used to analyze and communicate data needs Works at the individual project or application level Uses graphical symbols Identify data entities and their associated data attributes Identify the relationships among the entities of interest Many notation styles exist for drawing an ER diagram 8/10/2024 9 Fitness Center Membership System A fitness center offers various types of memberships to its customers. The center has multiple branches located in different cities. Each branch has facilities like a gym, swimming pool, spa, and fitness classes. Each facility has attributes such as opening hours, capacity, and available equipment. Members can choose from different types of memberships, such as monthly, quarterly, or annual. Each membership type has different pricing and benefits. Members must provide personal details such as name, age, address, and phone number when signing up. Members can book sessions for fitness classes offered by the center. Each class has a specific time, duration, instructor, and capacity. Instructors are employees of the fitness center and have attributes like name, expertise, and contact details. The center also offers personal training services. Members can book sessions with personal trainers, specifying the date, time, and type of training required. The fitness center tracks attendance and usage of facilities by members. Members can also purchase additional services like massages, nutritional consultations, and fitness merchandise. 8/10/2024 10 Data Model for Food delivery app You can do it for Taxi service Hotel booking app 8/10/2024 11 Database Activities Providing a user view Creating and modifying the Schema database Description defining the database’s Data definition language (DDL) logical and physical structure Collection of instructions and Identifies the tables and the commands attributes in each table Defines and describes data and Identifies the relationships between relationships in a specific database attributes and tables Data dictionary DBMS can reference a schema Detailed description of data stored in To access requested data in relation the database to another piece of data Adherence to data dictionary standards Makes it easy to share data among organizations 8/10/2024 12 Database Activities Storing and retrieving data Manipulating data and generating DBMS function reports Interface between application program Database provides reports and and database important information To obtain database data Company uses DBMS to manage Request it through the DBMS database Concurrency control Databases can use Query by Example Addresses situation where two or (QBE) more users or applications access the Visual approach to developing same record at the same time database queries or requests Data manipulation language (DML) A specific language provided with a DBMS Allows users to access and modify the data, make queries, and generate reports 8/10/2024 13 Database Activities Security management Ensures data is protected against access by: Unauthorized users, physical damage, operating system failure, and simultaneous updating Defines and enforces user access privileges Grants users minimum privileges to do their jobs Backup and recovery Backup management Make backup copies of the database Backup copies can be used to restore the database 8/10/2024 14 Data Cleansing Data cleansing Detects and then corrects or deletes incomplete, incorrect, inaccurate, or irrelevant records residing in a database Improves quality of the data used in decision making Different from data validation Methods Cross-checking data against a validated data set Using data enhancement to augment the data in a database by adding related information 8/10/2024 15 Database Systems and Data Management Session 9 8/10/2024 16 Relational Databases Relational database Selecting Fundamental characteristics Eliminating rows according to certain Data is organized into relations criteria Rows represent entities and columns Projecting represent attributes Rows uniquely identified by a primary Eliminating columns in a table key Joining Column table data Integer number, decimal number, Combining two or more tables date, text, etc. through common data attributes to Constrained to be certain type, length, create a new table or to have a value between two limits Primary and foreign keys enable table Data normalization relationships Eliminates data redundancy User queries perform operations on the database 8/10/2024 17 8/10/2024 18 SQL (Structured Query Language) Special-purpose programming language Used for accessing and manipulating relational database data SQL databases conform to ACID properties Atomicity (commit or abort), consistency, isolation, durability Guarantees the database transactions are processed reliably Ensures the integrity of data in the database 8/10/2024 19 Popular Relational Database Management Systems Relational DBMS for Individuals Relational DBMS for Open-Source RDBMS and Workgroups Workgroups and Enterprise MySQL Microsoft Access Oracle PostgreSQL IBM Lotus Approach IBM DB2 MariaDB Google Base Sybase Adaptive Server SQL Lite OpenOffice Base Teradata Airtable Microsoft SQL Server Knack Progress OpenEdge Analyze information needs while selecting a DBMS Check the size of database, number of concurrent users, performance demanded, ability to be integrated with other systems, vendor considerations, and cost 8/10/2024 20 Popular Relational Database Management Systems Database as a service (DAAS) Database stored on a service provider’s servers Database accessed by service subscriber over the Internet Database administration handled by the service provider Customer relies on service provider for backup and data protection DaaS advantage Eliminates the installation, maintenance, and monitoring of in-house databases 8/10/2024 21 Data Management Data management Integrated set of functions Defines the processes by which data is obtained, certified fit for use, stored, secured, and processed Ensures data accessibility, reliability, and timeliness meet the data users’ needs Need of DM: Meet external regulations To avoid accidental release Ensure high quality data for business decisions 8/10/2024 22 Data Management Data governance Database administrator (DBA) Defines roles, responsibilities, Skilled and trained IS professional and processes Holds discussions with business users Ensures data can be trusted and Defines their data needs used by the entire organization Applies database programming languages to Ensures people identified and in craft a set of databases to meet those needs place who are responsible for Tests and evaluates databases fixing and preventing issues with Monitors database performance and data implements changes to improve response Team should be cross functional time for user queries Decides who is accountable for Assures data is secure which aspect Responsible for data security Defines process how to store, archive, backup, and protect data Data steward Two key members are: DBA and Typically a non-IS employee Data steward Manages critical data entities or attributes 8/10/2024 Users ask DS ‘what data to use to answer a 23 business question’ Server Farm & Data Centers Server Farm is a facility that houses a large number of servers in the same room, where access to machines can be controlled and authorized personnel can manage and maintain servers Data Center: a climate-and-access controlled building or set of buildings that houses computer hardware that delivers data and information services Power usage effectiveness Ability to absorb the impact of disaster Four tiers of data center classification 8/10/2024 24 Edge data center 8/10/2024 25 Green computing Efficient and environmentally responsible design, manufacture, operation and disposal of IT related products Goals: Reduce use of hazardous material Lower power related costs Enable safe disposal or recycling of computers or related equipment Best practice would be to eliminate the use of toxic substances Check EPEAT tier (not available in India) 8/10/2024 26 Why do we need to learn data analytics? New data coming from all directions Nearly a zettabyte per year 1 trillion gigabytes or a 1 followed by 21 zeros Must analyze large amounts of data Measure past and current performance Predict the future Forecasts drive anticipatory actions Improve business strategies Strengthen business operations Enrich decision making Organization will become more competitive 8/10/2024 27 Big Data Enormous (terabytes or more) Complex (sensor data to social media data) Traditional processes incapable of dealing with them Key characteristics Volume Velocity: 2.5 quintillion bytes per day Value Variety Veracity (quality of data/ trustworthy) 8/10/2024 28 Haley's employer has asked her to review a database containing thousands of social media posts about their company's products and extract the data the executive team needs to make decisions about these products and their marketing. In terms of the characteristics of big data, Haley is focusing on ________. Guillarme, a data scientist, utilizes data from company documents, machine logs, Data.gov, and Facebook Graph in his work. What characteristic of big data does this best demonstrate? 8/10/2024 29 Sources of Big Data 8/10/2024 30 Big Data Uses Organizations use big data to improve: Day-to-day operations Planning Decision making Any examples? Challenges: How to choose Where and how to store Privacy and security Legal implications 8/10/2024 31 Technologies Used to Manage and Process Big Data Technologies used to manage and process big data Data warehouses Extract Transform Load process Data marts Data lakes NoSQL databases Hadoop In-Memory databases 8/10/2024 32 Data Warehouses, Data Marts, and Data Lakes Online transaction processing Characteristics Description (OLTP) systems Large Holds billions of records and petabytes of data Traditionally used to capture data Multiple sources Data comes from many sources both Do not support data analysis internal and external thus an extract, required today transform, load process is required to ensure quality data Data warehouses and data marts Historical Typically 5 years of data or more Allow organizations to access OLTP Cross organizational Data accessed, used, and analyzed by data access users across the organization to Support decision making more and analysis support multiple business processes effectively and decision making Supports various types Drill down analysis, development of of metrics, identification of trends analyses and reporting 8/10/2024 33 Data Warehouses, Data Marts, and Data Lakes Data warehouse Large database Holds business information from many sources in the enterprise Covers all aspects of the company’s processes, products, and customers Data Quality is an issue (Dirty data) Extract Transform Load (ETL) process Extracts (or discard) data from a variety of sources & convert to single format Edits and transforms data into a data warehouse format Loads data into the warehouse Elements of a data warehouse 8/10/2024 34 Data Marts, and Data Lakes Data mart NoSQL database Subset of a data warehouse Differs from a relational database Used by small and medium-sized Data modeled without two- businesses and departments within dimensional tabular relations large companies Uses horizontal scaling Supports decision making Does not require a predefined schema Does not conform to true ACID Data lake properties when processing Takes a “store everything” approach transactions to big data Structures used by NoSQL databases Saves all data in its raw and unaltered More flexible than relational database form tables Provide improved access speed and Users decide how they want to use redundancy the data 8/10/2024 35 Melanie's company takes a "store everything" approach to big data, saving all of it in a raw, unaltered form. Only when she needs to analyze some of the data is it extracted from this _____. 8/10/2024 36 Analytics and Business Intelligence Business intelligence (BI) Wide range of applications, practices, and technologies Extracts, transforms, integrates, visualizes, analyzes, interprets, and presents data Supports improved decision making Goal is to get most value out of information and present results in an easy to understand manner Analytics Extensive use of data and quantitative analysis Supports fact-based decision making within organizations Reveal patterns, forecast, optimize business process 8/10/2024 37 Benefits Achieved from BI and Analytics Detect fraud Improve forecasting Increase sales Optimize operations Reduce costs 8/10/2024 38 Components Required for Effective BI and Analytics Three key components Existence of a solid data management program Includes governance Creative data scientists Strong commitment to data-driven decision making 8/10/2024 39 The primary difference between business intelligence and analytics is that _______________. a. BI is used to analyze historical data to tell what happened or is happening right now in your business while analytics employs algorithms to determine relationships among data to develop predictions of what will happen in the future. b. analytics employs techniques like optimization, predictive analysis, and simulation while BI employs descriptive analysis and text and visual analysis. c. a data scientist is required to properly employ analytics while an end user working with a database administrator can employ BI. d. organizations used to employ BI but now are moving to greater use of analytics. 8/10/2024 40 Categories of BI/Analytic techniques Descriptive Predictive Text and Video Analysis Analytics Optimization Simulation Analysis Time series Scenario Visual analytics analysis Genetic algorithm analysis Text analysis Regression Monte Carlo analysis Data mining Linear programming simulation Video analysis 8/10/2024 41 Descriptive Analysis Descriptive analysis Visual analytics Preliminary data processing stage Presentation of data pictorially or Identifies data patterns graphically Answers questions Word cloud Who, what, where, when, and to Visual depiction of a set of words what extent Words grouped together Two types Based on frequency of their occurrence Visual analytics Regression analysis Conversion funnel Graphical representation Example: Summary of steps a consumer takes in making the decision to buy a product and become a customer 8/10/2024 42 8/10/2024 43 Descriptive analysis OR predictive analysis Regression analysis Determines the relationship between a dependent variable and one or more independent variables Produces a regression equation Coefficients represent a relationship Between each independent variable and the dependent variable Used to make predictions 8/10/2024 44 Predictive analytics Predictive analytics Data mining Techniques to analyze current data BI analytics tool Identifies future probabilities and Explores large amounts of data for trends hidden patterns Makes predictions about the Predicts future trends and future behaviors Time series analysis Used in decision making Uses statistical methods Three common data mining techniques Analyzes time series data Association analysis (Market basket) Extracts meaningful statistics and Neural computing characteristics Case-based reasoning Seasonality, trends, cycles 8/10/2024 45 Data mining used to boost ticket sales Data mining to improve store assortment Market basket analysis 8/10/2024 46 Optimization Allocate scarce resources To minimize costs or maximize profits Genetic algorithm Employs a natural selection-like process Finds approximate solutions to optimization and search problems Vehicle routing problem, scheduling Linear programming Finds the optimum value of a linear expression Calculated based on the value of a set of decision variables Variables subject to a set of constraints Eg. Production planning 8/10/2024 47 Simulation Emulates the dynamic responses of a real-world system to various inputs Scenario analysis Predicts future values based on certain potential events marketing campaign success, for risk management Monte Carlo simulation Provides a spectrum of thousands of possible outcomes Considers the many variables involved Considers the range of potential values for each variable 8/10/2024 48 8/10/2024 49 Text and Video Analysis Glean insights and data relevant to decision making Text analysis Process for extracting value from large quantities of unstructured text data Video analysis Process of obtaining information or insights from video footage 8/10/2024 50 8/10/2024 51 Descriptive WHAT happened? Diagnostic Why did it happen? Predictive What might happen? Prescriptive What needs to be done? 8/10/2024 52