Document Details

HighQualityCantor

Uploaded by HighQualityCantor

Hashemite University

Arthur P. Aguiar

Tags

multidimensional data data modeling data warehousing database management

Summary

This document discusses multidimensional data models, specifically OLAP and OLTP systems, and concepts like data granularity and data cubes used for data analysis and warehousing. It covers topics like how to treat data, store it efficiently in a data warehouse, and create data cubes, outlining different types of data models, such as star and snowflake, and explaining the advantages and disadvantages of each one.

Full Transcript

MULTIDIMENSIONAL DATA MODEL SLIDES ARE BASED ON ESSAY WRITTEN BY ARTHUR P. AGUIAR SLIDES ARE PREPARED AND ORGANIZED BY DR. MOTAZ ABDUL AZIZ AL-HAMI OVERVIEW BEFORE ANALYZING A DATA, WE MUST FIRST TREAT IT AND STORE I...

MULTIDIMENSIONAL DATA MODEL SLIDES ARE BASED ON ESSAY WRITTEN BY ARTHUR P. AGUIAR SLIDES ARE PREPARED AND ORGANIZED BY DR. MOTAZ ABDUL AZIZ AL-HAMI OVERVIEW BEFORE ANALYZING A DATA, WE MUST FIRST TREAT IT AND STORE IT SOMEWHERE, HOWEVER, DEPENDING HOW THAT DATA IS STORED IT CAN BECOME A LONG-TERM PROBLEM, SUCH AS HIGH MAINTENANCE AND DATA PROCESSING COSTS, CONTAINED THEREIN OR EVEN USELESS DATA THAT IS JUST WASTING OUR MONEY ON STORAGE AND PROCESSING. FOR THIS, WE MUST ORGANIZE OUR DATA BEFORE PASSING IT ON TO OUR DATA WAREHOUSE, ONE OF THE SOLUTIONS IS THE MULTIDIMENSIONAL DATA MODEL, HOWEVER, EVERY WAY, HOWEVER HERE ARE SOME CONCEPTS THAT YOU WILL NEED TO KNOW BEFOREHAND. ONLINE ANALYTICAL PROCESSING (OLAP) & ONLINE TRANSACTION PLATFORM (OLTP) OLAP: IS A TOOL USED FOR THE ANALYSIS AND TREATMENT OF DATA, WIDELY USED TO TREAT A MASSIVE AMOUNT OF INFORMATION IN THE DATA WAREHOUSE. OLAP MIGHT TAKE MINUTES OR HOURS FOR COMPLEX QUERY RESULTS. REAL TIME OLAP (RTOLAP): IS A TOOL USED FOR THE ANALYSIS AND TREATMENT OF DATA IN REAL TIME, WIDELY USED TO TREAT A MASSIVE AMOUNT OF INFORMATION IN THE VARIOUS DIMENSIONS OF A DATA WAREHOUSE. OLTP: REFERS TO SYSTEMS THAT RECORD ALL THE OPERATIONAL ACTIONS OF A STOCK, GUARANTEEING ITS SUCCESS. THIS TYPE OF DATA IS GENERATED MASSIVELY EVERY DAY. OLTP HAS SHORTER RESPONSE TIMES, TYPICALLY IN MILLISECONDS (REAL TIME). EXAMPLE: WE ARE ANALYZING TEMPORAL DATA AND WE WANT TO HAVE VIEWS BY YEAR, DAY, QUARTER OR SEMESTER, THIS INTERACTION WITH THE USER IS DONE BY OLAP. USES OF OLAP & OLTP THE PRIMARY PURPOSE OF ONLINE ANALYTICAL PROCESSING (OLAP) IS TO ANALYZE AGGREGATED DATA, WHILE THE PRIMARY PURPOSE OF ONLINE TRANSACTION PROCESSING (OLTP) IS TO PROCESS DATABASE TRANSACTIONS. YOU USE OLAP SYSTEMS TO GENERATE REPORTS, PERFORM COMPLEX DATA ANALYSIS, AND IDENTIFY TRENDS. IN CONTRAST, YOU USE OLTP SYSTEMS TO PROCESS ORDERS, UPDATE INVENTORY, AND MANAGE CUSTOMER ACCOUNTS. OLAP VS. REAL-TIME OLAP DESPITE THE INCREASING POPULARITY OF REAL-TIME ANALYTICS, ALSO KNOWN AS REAL-TIME ONLINE ANALYTICAL PROCESSING (REAL-TIME OLAP, OCCASIONALLY ABBREVIATED AS RTOLAP), FEW UNDERSTAND WHEN TO USE IT IN THEIR DATA STACK. SOME ANALYTICAL QUERIES CAN MAKE DUE WITH TRADITIONAL OLAP, WHILE OTHERS TRULY NEED TO RUN IN A REAL-TIME OLAP DATABASE. EXAMPLES OF ONLINE ANALYTICAL PROCESSING (OLAP), A TYPE OF DATA STORE USED SPECIFICALLY FOR COMPLEX DATA ANALYSIS. OLAP ENABLES YOU TO PROCESS HISTORICAL DATA AND TRAIN MACHINE LEARNING MODELS. ITS USE CASES TYPICALLY COVER LONG-RUNNING QUERIES THAT FINISH IN MINUTES OR HOURS TO USE IN NEXT-DAY REPORTS OR OCCASIONALLY REFRESHED DASHBOARDS. OLAP VS. REAL-TIME OLAP REAL-TIME OLAP DATA STORES, ON THE OTHER HAND, INTEND TO SERVE MULTI-DIMENSIONAL DATA IN REAL TIME AT LOWER LATENCIES MEASURED IN SECONDS OR MILLISECONDS. IT ALSO SUPPORTS SIGNIFICANTLY MORE END USERS COMPARED TO TRADITIONAL OLAP, REFLECTED BY HIGH RATES OF QUERIES PER SECOND (QPS) MEASURED IN THE THOUSANDS TO HUNDREDS OF THOUSANDS. ONLINE ANALYTICAL PROCESSING (OLAP) & ONLINE TRANSACTION PLATFORM (OLTP) EXAMPLE: A BANK TRANSACTION FOR EXAMPLE, IF IT FAILS THE WHOLE ACTION MUST BE REVERSED, IF IT IS SUCCESSFUL, IT MUST BE RECORDED AND IMMUTABLE. DATA GRANULARITY WE MUST PAY CLOSE ATTENTION TO DATA GRANULARITY AS IT DIRECTLY AFFECTS THE VOLUME IN DATA STORAGE AND THE SPEED OF SEARCH AND THE LEVEL OF DETAIL OF THE INFORMATION. IN A BRIEF EXPLANATION, WHEN WE HAVE A HIGH GRANULARITY OF DATA IT MEANS THAT WE HAVE LESS DETAILS OF THE DATA, WHEN WE HAVE A LOW GRANULARITY, WE HAVE MORE DETAILS OF THE DATA. EXAMPLE: IMAGINE THAT WE HAVE A SALES TABLE WHERE THE NAME OF SALESPEOPLE APPEARS REPEATEDLY. THIS TABLE HAS A LOW GRANULARITY BECAUSE IT HAS A LOT OF INFORMATION, SO MUCH INFORMATION THAT IT CAN MAKE IT DIFFICULT FOR US TO SELECT WHICH SALESPERSON HAD THE HIGHEST NUMBER OF SALES, NOTE THAT THE MORE DECENTRALIZED DATA WE HAVE IN A TABLE, THE GREATER THE DATA AND THEREFORE WE WILL HAVE LONGER ANALYSIS TIME. IF WE HAVE A TABLE OF SELLERS WITH THE TOTAL SALES OF EACH ONE, WE WILL HAVE DATA WITH LESS DETAILS, MORE GRANULARITY AND SPENDING LESS RESOURCES. DATA GRANULARITY WE CAN ALSO UNDERSTAND THIS CONDENSATION AND DETAILING AS DRILL DOWN AND DRILL UP. DRILL DOWN — THE GRANULARITY IS REDUCED AND THE LEVEL OF DETAILS IS INCREASED. DRILL UP — THE GRANULARITY IS INCREASED AND THE LEVEL OF DETAIL IS DECREASED. THEREFORE, IT IS CLEAR THAT WE MUST CONDENSE SOME DATA TO AVOID REDUNDANCY, SAVE PROCESSING AND SPACE. WE SHOULD THINK ABOUT RESOURCES X DATA = INFORMATION. IF OUR DATA IS POORLY MODELED WE WILL THROW AWAY RESOURCES THAT ARE PRECIOUS, SO OUR EQUATION LOOKS LIKE THIS RESOURCES X DATA = WASTE OF TIME AND MONEY. MULTIDIMENSIONAL DATA ANALYSIS IN THIS ANALYSIS WE USE STRUCTURED DATA IN THE FORM OF A CUBE (EACH SIDE OF THE CUBE IS A DIMENSION), THE MULTIDIMENSIONAL MODEL IS THE STANDARD IN THE ANALYSIS TOOLS, FOR EXAMPLE, WHEN WE USE ARITHMETIC QUERIES WITH OLAP. THIS MODEL HAS A HIGHER PERFORMANCE IN QUERIES, BESIDES PROVIDING A FACILITY IN THE CREATION OF COMPLEX QUERY’S. WHEN THE SCOPE OF THE PROJECT IS REDUCED, THIS MODEL ALLOWS A MORE AGILE IMPLEMENTATION. MULTIDIMENSIONAL DATA ANALYSIS STRUCTURE TO VISUALIZE THIS MODEL WE USE A CUBE, WHERE TABLES ARE ASSOCIATED, SUMMARIZED OR AGGREGATED TO RETURN SOME METRICS (SALES PER YEAR, FOR EXAMPLE). EACH TABLE IS SEEN AS A DIMENSION, TOGETHER THEY FORM A CUBE THAT CAN HAVE LOW OR HIGH GRANULARITY, ALWAYS DEPENDING ON THE REQUIREMENTS OF EACH PROJECT. DATA CUBE IS A DATA STRUCTURE FOR STORING AND ANALYZING LARGE AMOUNTS OF MULTIDIMENSIONAL DATA (PEDERSEN, 2009B). MULTIDIMENSIONAL DATA ANALYSIS DATA CUBE: DATA CUBE REPRESENTS THE DATA IN TERMS OF DIMENSIONS AND FACTS. A DATA CUBE IS USED TO REPRESENTS THE AGGREGATED DATA. A DATA CUBE IS BASICALLY CATEGORIZED INTO TWO MAIN KINDS THAT ARE: - MULTIDIMENSIONAL DATA CUBE. - RELATIONAL DATA CUBE. MULTIDIMENSIONAL DATA ANALYSIS WHAT IS DATA CUBE? A DATA CUBE IS A MULTIDIMENSIONAL DATA MODEL THAT STORE THE OPTIMIZED, SUMMARIZED OR AGGREGATED DATA WHICH EASES THE OLAP TOOLS FOR THE QUICK AND EASY ANALYSIS. DATA CUBE STORES THE PRECOMPUTED DATA AND EASES ONLINE ANALYTICAL PROCESSING. WHEN IT COMES TO CUBE, WE, ALL THINK IT AS A THREE-DIMENSIONAL STRUCTURE BUT IN DATA WAREHOUSING, WE CAN IMPLEMENT AN N-DIMENSIONAL DATA CUBE. DATA STORED IN A DATA CUBE IS REPRESENTED IN TERMS OF DIMENSIONS AND FACTS. NOW, WHAT DOES THE DIMENSION EXACTLY REPRESENTS? THE DIMENSIONS OF DATA CUBE ARE THE ATTITUDE, ANGLE OR THE ENTITIES WITH RESPECT TO WHICH THE ENTERPRISE WANTS TO STORE THE DATA. NOW, HOW DOES IT HELP THE ANALYST TO ANALYZE AND EXTRACT THE DATA? MULTIDIMENSIONAL DATA ANALYSIS DATA CUBE EXAMPLE: LET US TAKE AN EXAMPLE, CONSIDER WE HAVE DATA ABOUT ALL ELECTRONICS SALES. HERE WE CAN STORE THE SALES DATA IN MANY PERSPECTIVES OR DIMENSIONS LIKE SALES IN ALL TIME, SALES AT ALL LOCATION, SALES OF ALL ITEMS. THE FIGURE SHOWS THE DATA CUBE FOR ALL ELECTRONICS SALES. MULTIDIMENSIONAL DATA ANALYSIS DATA CUBE EXAMPLE: EACH DIMENSION HAS A DIMENSION TABLE WHICH CONTAINS A FURTHER DESCRIPTION OF THAT DIMENSION. SUCH AS A BRANCH DIMENSION MAY HAVE BRANCH NAME, BRANCH CODE, BRANCH ADDRESS ETC. A MULTIDIMENSIONAL DATA MODEL LIKE DATA CUBE IS ALWAYS BASED ON A THEME WHICH IS TERMED AS FACT. LIKE IN THE ABOVE EXAMPLE OF A DATA SET OF ALL ELECTRONIC WE HAVE STORED DATA BASED ON THE SALES OF THE ELECTRONIC ITEM. SO, HERE THE FACT IS SALES. A FACT HAS A FACT TABLE ASSOCIATED WITH IT. MULTIDIMENSIONAL DATA ANALYSIS DATA CUBE CLASSIFICATION DATA CUBE CAN BE CLASSIFIED INTO TWO MAIN CATEGORIES AS DISCUSSED BELOW: 1. MULTIDIMENSIONAL DATA CUBE (MOLAP) MULTIDIMENSIONAL ARRAYS ARE USED TO STORE DATA THAT ASSURES A MULTIDIMENSIONAL VIEW OF THE DATA. MULTIDIMENSIONAL DATA CUBE HELPS IN STORING A LARGE AMOUNT OF DATA. MULTIDIMENSIONAL DATA CUBE IMPLEMENTS INDEXING TO REPRESENT EACH DIMENSION OF A DATA CUBE WHICH IMPROVES THE ACCESSING, RETRIEVING AND STORING DATA FROM THE DATA CUBE. MULTIDIMENSIONAL DATA ANALYSIS DATA CUBE CLASSIFICATION DATA CUBE CAN BE CLASSIFIED INTO TWO MAIN CATEGORIES AS DISCUSSED BELOW: 2. RELATIONAL DATA CUBE (ROLAP) YOU CAN CONSIDER THE RELATIONAL DATA CUBE AS THE ‘EXTENDED VERSION OF RELATIONAL DBMS’. RELATIONAL TABLES ARE USED TO STORE DATA AND EACH RELATIONAL TABLE REPRESENTS THE DIMENSION OF A DATA CUBE. TO CALCULATE THE AGGREGATED DATA RELATIONAL DATA CUBE IMPLEMENTS SQL BUT WHEN IT COMES TO PERFORMANCE THE RELATIONAL DATA CUBE’S PERFORMANCE IS SLOWER THAN THE MULTIDIMENSIONAL DATA CUBE. BUT THE RELATIONAL DATA CUBE IS SCALABLE FOR STEADILY INCREASING DATA. YOU CAN EVEN GET THE COMBINATION OF BOTH RELATIONAL DATA CUBE AS WELL AS MULTIDIMENSIONAL DATA CUBE WHICH IS TERMED AS A HYBRID DATA CUBE. THE HYBRID DATA CUBE (HOLAP) RETRIEVE FEATURES SUCH AS SCALABILITY FROM RELATIONAL DATA CUBE AND IT RETRIEVES FASTER COMPUTATION FROM MULTIDIMENSIONAL DATA CUBE. MULTIDIMENSIONAL DATA ANALYSIS OPERATIONS ON DATA CUBE NOW, LET US DISCUSS THE OPERATIONS THAT CAN BE CONDUCTED ON DATA CUBE IN ORDER TO VIEW DATA FROM DIFFERENT ANGLES. THERE ARE FOUR BASIC OPERATIONS THAT CAN BE IMPLEMENTED ON A DATA CUBE WHICH ARE DISCUSSED BELOW: 1. ROLL UP ROLL-UP OPERATION SUMMARIZES OR AGGREGATES THE DIMENSIONS EITHER BY PERFORMING DIMENSION REDUCTION OR YOU CAN PERFORM CONCEPT HIERARCHY. THE BELOW FIGURE SHOWS YOU THE EXAMPLE OF A ROLL-UP OPERATION PERFORMED ON THE LOCATION DIMENSION OF THE DATA CUBE WE HAVE SEEN ABOVE. MULTIDIMENSIONAL DATA ANALYSIS OPERATIONS ON DATA CUBE 2. DRILL DOWN WHEN THE DRILL-DOWN OPERATION IS PERFORMED ON ANY DIMENSION THE DATA ON THE DIMENSION IS FRAGMENTED INTO GRANULAR FORM. IN THE FIGURE YOU CAN SEE THE DRILL-DOWN OPERATION ON THE TIME DIMENSION WHERE THE QUARTER Q1, Q2, IS FRAGMENTED INTO MONTHS. MULTIDIMENSIONAL DATA ANALYSIS OPERATIONS ON DATA CUBE 3. SLICE AND DICE THE SLICE AND DICE OPERATION PICK UP ONE DIMENSION OF THE DATA CUBE AND THEN FORMS A SUBCUBE OUT OF IT. THE FIGURE REPRESENTS THE SLICE OPERATION ON A DATA CUBE WHERE THE DATA CUBE IS SLICED BASED ON TIME. THE DICE OPERATION SELECT MORE THAN ONE DIMENSION TO FORM A SUBCUBE. LIKE IN THE FIGURE YOU CAN SEE THAT THE SUBCUBE IS FORMED BY SELECTING THE DIMENSIONS SUCH AS LOCATION, ITEMS AND TIME. MULTIDIMENSIONAL DATA ANALYSIS OPERATIONS ON DATA CUBE 4. PIVOT PIVOT IS NOT A CALCULATIVE OPERATION ACTUALLY IT ROTATES THE DATA CUBE IN ORDER TO VIEW DATA CUBE FROM DIFFERENT DIMENSIONS. THE FIGURE BELOW SHOWS THE PIVOT OPERATION PERFORMED ON THE DATA CUBE. MULTIDIMENSIONAL DATA ANALYSIS ADVANTAGES OF DATA CUBE 1. DATA CUBE EASE IN AGGREGATING AND SUMMARIZING THE DATA. 2. DATA CUBE PROVIDE BETTER VISUALIZATION OF DATA. 3. DATA CUBE STORES HUGE AMOUNT OF DATA IN A VERY SIMPLIFIED WAY. 4. DATA CUBE INCREASES THE OVERALL EFFICIENCY OF THE DATA WAREHOUSE. 5. THE AGGREGATED DATA IN DATA CUBE HELPS IN ANALYZING THE DATA FAST AND THEREBY REDUCING THE ACCESS TIME. TYPES OF MULTIDIMENSIONAL MODELS STAR LAYOUT IT IS THE SIMPLEST MODEL, WHERE THE FACT IS CENTRALIZED AND COMPOSED OF DIMENSIONS WITH A LARGE NUMBER OF DATA, GIVEN THAT, WITHOUT REDUNDANCY, THESE DIMENSIONS ARE DIRECTLY LINKED TO THE FACT THROUGH ITS FOREIGN KEYS. TYPES OF MULTIDIMENSIONAL MODELS SNOWFLAKE LAYOUT IT IS AN EXTENSION OF THE STAR SCHEMA, COMPOSED OF MORE DIMENSIONS REDUCING REDUNDANCY, IN THE END, WE HAVE A GREATER NUMBER OF DIMENSIONS FORMING MORE COMPLEX QUERIES AND REDUCING PERFORMANCE. MODELS SUCH AS THE SNOWFLAKE ARE ARRANGED SO THAT AT EACH END OF THE STAR, IT BECOMES THE CENTER OF ANOTHER STAR (MACHADO, 2013). TYPES OF MULTIDIMENSIONAL MODELS CONSTELLATION SCHEME OR FACT CONSTELLATION, IS A GROUPING OF DIMENSIONS WITH MULTIPLE FACT TABLES, ITS ONLY DISADVANTAGE IS DUE TO THE COMPLEXITY. STAR SCHEMAS VS. OLAP CUBES DIMENSIONAL MODELS IMPLEMENTED IN RELATIONAL DATABASE MANAGEMENT SYSTEMS ARE REFERRED TO AS STAR SCHEMAS. DIMENSIONAL MODELS IMPLEMENTED IN MULTIDIMENSIONAL DATABASE ENVIRONMENTS ARE REFERRED TO AS ONLINE ANALYTICAL PROCESSING CUBES OR (OLAP). WHEN DATA IS LOADED INTO AN OLAP CUBE, IT IS STORED AND INDEXED USING FORMATS AND TECHNIQUES THAT ARE DESIGNED FOR DIMENSIONAL DATA. PERFORMANCE AGGREGATIONS OR PRECALCULATED SUMMARY TABLES ARE OFTEN CREATED AND MANAGED BY OLAP CUBE ENGINES. TYPICALLY OLAP CUBES PROVIDE USERS WITH SUPERIOR QUERY PERFORMANCE BECAUSE OF THE PRECALCULATIONS, INDEXING STRATEGIES AND OTHER OPTIMIZATIONS. STAR SCHEMAS VS. OLAP CUBES A STAR SCHEMA HOSTED IN A RELATIONAL DATABASE IS A GOOD PHYSICAL FOUNDATION FOR BUILDING AN OLAP CUBE, AND IS GENERALLY REGARDED AS A MORE STABLE BASIS FOR BACKUP AND RECOVER OLAP CUBES HAVE TRADITIONALLY BEEN NOTED FOR EXTREME PERFORMANCE ADVANTAGES OVER RDBMS OLAP CUBE DATA STRUCTURES ARE MORE VARIABLE ACROSS DIFFERENT VENDORS THAN RELATIONAL DBMS OLAP CUBES TYPICALLY OFFER MORE SOPHISTICATED SECURITY OPTIONS THAN RDBMS, SUCH AS LIMITING ACCESS TO DETAILED DATA BUT PROVIDING MORE OPEN ACCESS TO SUMMARY DATA OLAP CUBES OFFER SIGNIFICANTLY RICHER ANALYSIS CAPABILITIES THAN RDBMS WHICH ARE HARNESSED BY THE CONSTRAINTS OF SQL OLAP CUBES GRACEFULLY SUPPORT SLOWLY CHANGING DIMENSION, BUT CUBES OFTEN NEED TO BE REPROCESSED PARTIALLY OR TOTALLY WHENEVER DATA IS REWRITTEN OLAP CUBES MAY IMPOSE DETAILED CONSTRAINTS ON THE STRUCTURE OF DIMENSION KEYS THAT IMPLEMENT DRILL-DOWN HIERARCHIES COMPARED TO RELATIONAL DATABASES

Use Quizgecko on...
Browser
Browser