Multidimensional Data Model PDF

Document Details

HighQualityCantor

Uploaded by HighQualityCantor

Hashemite University

Arthur P. Aguiar, Dr. Motaz Abdul Aziz Al-Hami

Tags

data warehousing data modeling multidimensional data business intelligence

Summary

This document details the concepts of multidimensional data models, covering topics like OLAP, OLTP, and data granularity. It explains different approaches to data warehousing, such as the Inmon and Kimball methodologies. The document also discusses the advantages and disadvantages of each approach.

Full Transcript

MULTIDIMENSIONAL DATA MODEL SLIDES ARE BASED ON ESSAY WRITTEN BY ARTHUR P. AGUIAR SLIDES ARE PREPARED AND ORGANIZED BY DR. MOTAZ ABDUL AZIZ AL-HAMI OVERVIEW BEFORE ANALYZING A DATA, WE MUST FIRST TREAT IT AND STORE I...

MULTIDIMENSIONAL DATA MODEL SLIDES ARE BASED ON ESSAY WRITTEN BY ARTHUR P. AGUIAR SLIDES ARE PREPARED AND ORGANIZED BY DR. MOTAZ ABDUL AZIZ AL-HAMI OVERVIEW BEFORE ANALYZING A DATA, WE MUST FIRST TREAT IT AND STORE IT SOMEWHERE, HOWEVER, DEPENDING HOW THAT DATA IS STORED IT CAN BECOME A LONG-TERM PROBLEM, SUCH AS HIGH MAINTENANCE AND DATA PROCESSING COSTS, CONTAINED THEREIN OR EVEN USELESS DATA THAT IS JUST WASTING OUR MONEY ON STORAGE AND PROCESSING. FOR THIS, WE MUST ORGANIZE OUR DATA BEFORE PASSING IT ON TO OUR DATA WAREHOUSE, ONE OF THE SOLUTIONS IS THE MULTIDIMENSIONAL DATA MODEL, HOWEVER, EVERY WAY, HOWEVER HERE ARE SOME CONCEPTS THAT YOU WILL NEED TO KNOW BEFOREHAND. ONLINE ANALYTICAL PROCESSING (OLAP) & ONLINE TRANSACTION PLATFORM (OLTP) OLAP: IS A TOOL USED FOR THE ANALYSIS AND TREATMENT OF DATA, WIDELY USED TO TREAT A MASSIVE AMOUNT OF INFORMATION IN THE DATA WAREHOUSE. OLAP MIGHT TAKE MINUTES OR HOURS FOR COMPLEX QUERY RESULTS. REAL TIME OLAP (RTOLAP): IS A TOOL USED FOR THE ANALYSIS AND TREATMENT OF DATA IN REAL TIME, WIDELY USED TO TREAT A MASSIVE AMOUNT OF INFORMATION IN THE VARIOUS DIMENSIONS OF A DATA WAREHOUSE. OLTP: REFERS TO SYSTEMS THAT RECORD ALL THE OPERATIONAL ACTIONS OF A STOCK, GUARANTEEING ITS SUCCESS. THIS TYPE OF DATA IS GENERATED MASSIVELY EVERY DAY. OLTP HAS SHORTER RESPONSE TIMES, TYPICALLY IN MILLISECONDS (REAL TIME). EXAMPLE: WE ARE ANALYZING TEMPORAL DATA AND WE WANT TO HAVE VIEWS BY YEAR, DAY, QUARTER OR SEMESTER, THIS INTERACTION WITH THE USER IS DONE BY OLAP. USES OF OLAP & OLTP THE PRIMARY PURPOSE OF ONLINE ANALYTICAL PROCESSING (OLAP) IS TO ANALYZE AGGREGATED DATA, WHILE THE PRIMARY PURPOSE OF ONLINE TRANSACTION PROCESSING (OLTP) IS TO PROCESS DATABASE TRANSACTIONS. YOU USE OLAP SYSTEMS TO GENERATE REPORTS, PERFORM COMPLEX DATA ANALYSIS, AND IDENTIFY TRENDS. IN CONTRAST, YOU USE OLTP SYSTEMS TO PROCESS ORDERS, UPDATE INVENTORY, AND MANAGE CUSTOMER ACCOUNTS. OLAP VS. REAL-TIME OLAP DESPITE THE INCREASING POPULARITY OF REAL-TIME ANALYTICS, ALSO KNOWN AS REAL-TIME ONLINE ANALYTICAL PROCESSING (REAL-TIME OLAP, OCCASIONALLY ABBREVIATED AS RTOLAP), FEW UNDERSTAND WHEN TO USE IT IN THEIR DATA STACK. SOME ANALYTICAL QUERIES CAN MAKE DUE WITH TRADITIONAL OLAP, WHILE OTHERS TRULY NEED TO RUN IN A REAL-TIME OLAP DATABASE. EXAMPLES OF ONLINE ANALYTICAL PROCESSING (OLAP), A TYPE OF DATA STORE USED SPECIFICALLY FOR COMPLEX DATA ANALYSIS. OLAP ENABLES YOU TO PROCESS HISTORICAL DATA AND TRAIN MACHINE LEARNING MODELS. ITS USE CASES TYPICALLY COVER LONG-RUNNING QUERIES THAT FINISH IN MINUTES OR HOURS TO USE IN NEXT-DAY REPORTS OR OCCASIONALLY REFRESHED DASHBOARDS. OLAP VS. REAL-TIME OLAP REAL-TIME OLAP DATA STORES, ON THE OTHER HAND, INTEND TO SERVE MULTI-DIMENSIONAL DATA IN REAL TIME AT LOWER LATENCIES MEASURED IN SECONDS OR MILLISECONDS. IT ALSO SUPPORTS SIGNIFICANTLY MORE END USERS COMPARED TO TRADITIONAL OLAP, REFLECTED BY HIGH RATES OF QUERIES PER SECOND (QPS) MEASURED IN THE THOUSANDS TO HUNDREDS OF THOUSANDS. ONLINE ANALYTICAL PROCESSING (OLAP) & ONLINE TRANSACTION PLATFORM (OLTP) EXAMPLE: A BANK TRANSACTION FOR EXAMPLE, IF IT FAILS THE WHOLE ACTION MUST BE REVERSED, IF IT IS SUCCESSFUL, IT MUST BE RECORDED AND IMMUTABLE. DATA GRANULARITY WE MUST PAY CLOSE ATTENTION TO DATA GRANULARITY AS IT DIRECTLY AFFECTS THE VOLUME IN DATA STORAGE AND THE SPEED OF SEARCH AND THE LEVEL OF DETAIL OF THE INFORMATION. IN A BRIEF EXPLANATION, WHEN WE HAVE A HIGH GRANULARITY OF DATA IT MEANS THAT WE HAVE LESS DETAILS OF THE DATA, WHEN WE HAVE A LOW GRANULARITY, WE HAVE MORE DETAILS OF THE DATA. EXAMPLE: IMAGINE THAT WE HAVE A SALES TABLE WHERE THE NAME OF SALESPEOPLE APPEARS REPEATEDLY. THIS TABLE HAS A LOW GRANULARITY BECAUSE IT HAS A LOT OF INFORMATION, SO MUCH INFORMATION THAT IT CAN MAKE IT DIFFICULT FOR US TO SELECT WHICH SALESPERSON HAD THE HIGHEST NUMBER OF SALES, NOTE THAT THE MORE DECENTRALIZED DATA WE HAVE IN A TABLE, THE GREATER THE DATA AND THEREFORE WE WILL HAVE LONGER ANALYSIS TIME. IF WE HAVE A TABLE OF SELLERS WITH THE TOTAL SALES OF EACH ONE, WE WILL HAVE DATA WITH LESS DETAILS, MORE GRANULARITY AND SPENDING LESS RESOURCES. DATA GRANULARITY WE CAN ALSO UNDERSTAND THIS CONDENSATION AND DETAILING AS DRILL DOWN AND DRILL UP. DRILL DOWN — THE GRANULARITY IS REDUCED AND THE LEVEL OF DETAILS IS INCREASED. DRILL UP — THE GRANULARITY IS INCREASED AND THE LEVEL OF DETAIL IS DECREASED. THEREFORE, IT IS CLEAR THAT WE MUST CONDENSE SOME DATA TO AVOID REDUNDANCY, SAVE PROCESSING AND SPACE. WE SHOULD THINK ABOUT RESOURCES X DATA = INFORMATION. IF OUR DATA IS POORLY MODELED WE WILL THROW AWAY RESOURCES THAT ARE PRECIOUS, SO OUR EQUATION LOOKS LIKE THIS RESOURCES X DATA = WASTE OF TIME AND MONEY. MULTIDIMENSIONAL DATA ANALYSIS IN THIS ANALYSIS WE USE STRUCTURED DATA IN THE FORM OF A CUBE (EACH SIDE OF THE CUBE IS A DIMENSION), THE MULTIDIMENSIONAL MODEL IS THE STANDARD IN THE ANALYSIS TOOLS, FOR EXAMPLE, WHEN WE USE ARITHMETIC QUERIES WITH OLAP. THIS MODEL HAS A HIGHER PERFORMANCE IN QUERIES, BESIDES PROVIDING A FACILITY IN THE CREATION OF COMPLEX QUERY’S. WHEN THE SCOPE OF THE PROJECT IS REDUCED, THIS MODEL ALLOWS A MORE AGILE IMPLEMENTATION. MULTIDIMENSIONAL DATA ANALYSIS STRUCTURE TO VISUALIZE THIS MODEL WE USE A CUBE, WHERE TABLES ARE ASSOCIATED, SUMMARIZED OR AGGREGATED TO RETURN SOME METRICS (SALES PER YEAR, FOR EXAMPLE). EACH TABLE IS SEEN AS A DIMENSION, TOGETHER THEY FORM A CUBE THAT CAN HAVE LOW OR HIGH GRANULARITY, ALWAYS DEPENDING ON THE REQUIREMENTS OF EACH PROJECT. DATA CUBE IS A DATA STRUCTURE FOR STORING AND ANALYZING LARGE AMOUNTS OF MULTIDIMENSIONAL DATA (PEDERSEN, 2009B). MULTIDIMENSIONAL DATA ANALYSIS DATA CUBE: DATA CUBE REPRESENTS THE DATA IN TERMS OF DIMENSIONS AND FACTS. A DATA CUBE IS USED TO REPRESENTS THE AGGREGATED DATA. A DATA CUBE IS BASICALLY CATEGORIZED INTO TWO MAIN KINDS THAT ARE: - MULTIDIMENSIONAL DATA CUBE. - RELATIONAL DATA CUBE. MULTIDIMENSIONAL DATA ANALYSIS WHAT IS DATA CUBE? A DATA CUBE IS A MULTIDIMENSIONAL DATA MODEL THAT STORE THE OPTIMIZED, SUMMARIZED OR AGGREGATED DATA WHICH EASES THE OLAP TOOLS FOR THE QUICK AND EASY ANALYSIS. DATA CUBE STORES THE PRECOMPUTED DATA AND EASES ONLINE ANALYTICAL PROCESSING. WHEN IT COMES TO CUBE, WE, ALL THINK IT AS A THREE-DIMENSIONAL STRUCTURE BUT IN DATA WAREHOUSING, WE CAN IMPLEMENT AN N-DIMENSIONAL DATA CUBE. DATA STORED IN A DATA CUBE IS REPRESENTED IN TERMS OF DIMENSIONS AND FACTS. NOW, WHAT DOES THE DIMENSION EXACTLY REPRESENTS? THE DIMENSIONS OF DATA CUBE ARE THE ATTITUDE, ANGLE OR THE ENTITIES WITH RESPECT TO WHICH THE ENTERPRISE WANTS TO STORE THE DATA. NOW, HOW DOES IT HELP THE ANALYST TO ANALYZE AND EXTRACT THE DATA? MULTIDIMENSIONAL DATA ANALYSIS DATA CUBE EXAMPLE: LET US TAKE AN EXAMPLE, CONSIDER WE HAVE DATA ABOUT ALL ELECTRONICS SALES. HERE WE CAN STORE THE SALES DATA IN MANY PERSPECTIVES OR DIMENSIONS LIKE SALES IN ALL TIME, SALES AT ALL LOCATION, SALES OF ALL ITEMS. THE FIGURE SHOWS THE DATA CUBE FOR ALL ELECTRONICS SALES. MULTIDIMENSIONAL DATA ANALYSIS DATA CUBE EXAMPLE: EACH DIMENSION HAS A DIMENSION TABLE WHICH CONTAINS A FURTHER DESCRIPTION OF THAT DIMENSION. SUCH AS A BRANCH DIMENSION MAY HAVE BRANCH NAME, BRANCH CODE, BRANCH ADDRESS ETC. A MULTIDIMENSIONAL DATA MODEL LIKE DATA CUBE IS ALWAYS BASED ON A THEME WHICH IS TERMED AS FACT. LIKE IN THE ABOVE EXAMPLE OF A DATA SET OF ALL ELECTRONIC WE HAVE STORED DATA BASED ON THE SALES OF THE ELECTRONIC ITEM. SO, HERE THE FACT IS SALES. A FACT HAS A FACT TABLE ASSOCIATED WITH IT. MULTIDIMENSIONAL DATA ANALYSIS DATA CUBE CLASSIFICATION DATA CUBE CAN BE CLASSIFIED INTO TWO MAIN CATEGORIES AS DISCUSSED BELOW: 1. MULTIDIMENSIONAL DATA CUBE (MOLAP) MULTIDIMENSIONAL ARRAYS ARE USED TO STORE DATA THAT ASSURES A MULTIDIMENSIONAL VIEW OF THE DATA. MULTIDIMENSIONAL DATA CUBE HELPS IN STORING A LARGE AMOUNT OF DATA. MULTIDIMENSIONAL DATA CUBE IMPLEMENTS INDEXING TO REPRESENT EACH DIMENSION OF A DATA CUBE WHICH IMPROVES THE ACCESSING, RETRIEVING AND STORING DATA FROM THE DATA CUBE. MULTIDIMENSIONAL DATA ANALYSIS DATA CUBE CLASSIFICATION DATA CUBE CAN BE CLASSIFIED INTO TWO MAIN CATEGORIES AS DISCUSSED BELOW: 2. RELATIONAL DATA CUBE (ROLAP) YOU CAN CONSIDER THE RELATIONAL DATA CUBE AS THE ‘EXTENDED VERSION OF RELATIONAL DBMS’. RELATIONAL TABLES ARE USED TO STORE DATA AND EACH RELATIONAL TABLE REPRESENTS THE DIMENSION OF A DATA CUBE. TO CALCULATE THE AGGREGATED DATA RELATIONAL DATA CUBE IMPLEMENTS SQL BUT WHEN IT COMES TO PERFORMANCE THE RELATIONAL DATA CUBE’S PERFORMANCE IS SLOWER THAN THE MULTIDIMENSIONAL DATA CUBE. BUT THE RELATIONAL DATA CUBE IS SCALABLE FOR STEADILY INCREASING DATA. YOU CAN EVEN GET THE COMBINATION OF BOTH RELATIONAL DATA CUBE AS WELL AS MULTIDIMENSIONAL DATA CUBE WHICH IS TERMED AS A HYBRID DATA CUBE. THE HYBRID DATA CUBE (HOLAP) RETRIEVE FEATURES SUCH AS SCALABILITY FROM RELATIONAL DATA CUBE AND IT RETRIEVES FASTER COMPUTATION FROM MULTIDIMENSIONAL DATA CUBE. MULTIDIMENSIONAL DATA ANALYSIS OPERATIONS ON DATA CUBE NOW, LET US DISCUSS THE OPERATIONS THAT CAN BE CONDUCTED ON DATA CUBE IN ORDER TO VIEW DATA FROM DIFFERENT ANGLES. THERE ARE FOUR BASIC OPERATIONS THAT CAN BE IMPLEMENTED ON A DATA CUBE WHICH ARE DISCUSSED BELOW: 1. ROLL UP ROLL-UP OPERATION SUMMARIZES OR AGGREGATES THE DIMENSIONS EITHER BY PERFORMING DIMENSION REDUCTION OR YOU CAN PERFORM CONCEPT HIERARCHY. THE BELOW FIGURE SHOWS YOU THE EXAMPLE OF A ROLL-UP OPERATION PERFORMED ON THE LOCATION DIMENSION OF THE DATA CUBE WE HAVE SEEN ABOVE. MULTIDIMENSIONAL DATA ANALYSIS OPERATIONS ON DATA CUBE 2. DRILL DOWN WHEN THE DRILL-DOWN OPERATION IS PERFORMED ON ANY DIMENSION THE DATA ON THE DIMENSION IS FRAGMENTED INTO GRANULAR FORM. IN THE FIGURE YOU CAN SEE THE DRILL-DOWN OPERATION ON THE TIME DIMENSION WHERE THE QUARTER Q1, Q2, IS FRAGMENTED INTO MONTHS. MULTIDIMENSIONAL DATA ANALYSIS OPERATIONS ON DATA CUBE 3. SLICE AND DICE THE SLICE AND DICE OPERATION PICK UP ONE DIMENSION OF THE DATA CUBE AND THEN FORMS A SUBCUBE OUT OF IT. THE FIGURE REPRESENTS THE SLICE OPERATION ON A DATA CUBE WHERE THE DATA CUBE IS SLICED BASED ON TIME. THE DICE OPERATION SELECT MORE THAN ONE DIMENSION TO FORM A SUBCUBE. LIKE IN THE FIGURE YOU CAN SEE THAT THE SUBCUBE IS FORMED BY SELECTING THE DIMENSIONS SUCH AS LOCATION, ITEMS AND TIME. MULTIDIMENSIONAL DATA ANALYSIS OPERATIONS ON DATA CUBE 4. PIVOT PIVOT IS NOT A CALCULATIVE OPERATION ACTUALLY IT ROTATES THE DATA CUBE IN ORDER TO VIEW DATA CUBE FROM DIFFERENT DIMENSIONS. THE FIGURE BELOW SHOWS THE PIVOT OPERATION PERFORMED ON THE DATA CUBE. MULTIDIMENSIONAL DATA ANALYSIS ADVANTAGES OF DATA CUBE 1. DATA CUBE EASE IN AGGREGATING AND SUMMARIZING THE DATA. 2. DATA CUBE PROVIDE BETTER VISUALIZATION OF DATA. 3. DATA CUBE STORES HUGE AMOUNT OF DATA IN A VERY SIMPLIFIED WAY. 4. DATA CUBE INCREASES THE OVERALL EFFICIENCY OF THE DATA WAREHOUSE. 5. THE AGGREGATED DATA IN DATA CUBE HELPS IN ANALYZING THE DATA FAST AND THEREBY REDUCING THE ACCESS TIME. TYPES OF MULTIDIMENSIONAL MODELS STAR LAYOUT IT IS THE SIMPLEST MODEL, WHERE THE FACT IS CENTRALIZED AND COMPOSED OF DIMENSIONS WITH A LARGE NUMBER OF DATA, GIVEN THAT, WITHOUT REDUNDANCY, THESE DIMENSIONS ARE DIRECTLY LINKED TO THE FACT THROUGH ITS FOREIGN KEYS. TYPES OF MULTIDIMENSIONAL MODELS SNOWFLAKE LAYOUT IT IS AN EXTENSION OF THE STAR SCHEMA, COMPOSED OF MORE DIMENSIONS REDUCING REDUNDANCY, IN THE END, WE HAVE A GREATER NUMBER OF DIMENSIONS FORMING MORE COMPLEX QUERIES AND REDUCING PERFORMANCE. MODELS SUCH AS THE SNOWFLAKE ARE ARRANGED SO THAT AT EACH END OF THE STAR, IT BECOMES THE CENTER OF ANOTHER STAR (MACHADO, 2013). TYPES OF MULTIDIMENSIONAL MODELS CONSTELLATION SCHEME OR FACT CONSTELLATION, IS A GROUPING OF DIMENSIONS WITH MULTIPLE FACT TABLES, ITS ONLY DISADVANTAGE IS DUE TO THE COMPLEXITY. STAR SCHEMAS VS. OLAP CUBES DIMENSIONAL MODELS IMPLEMENTED IN RELATIONAL DATABASE MANAGEMENT SYSTEMS ARE REFERRED TO AS STAR SCHEMAS. DIMENSIONAL MODELS IMPLEMENTED IN MULTIDIMENSIONAL DATABASE ENVIRONMENTS ARE REFERRED TO AS ONLINE ANALYTICAL PROCESSING CUBES OR (OLAP). WHEN DATA IS LOADED INTO AN OLAP CUBE, IT IS STORED AND INDEXED USING FORMATS AND TECHNIQUES THAT ARE DESIGNED FOR DIMENSIONAL DATA. PERFORMANCE AGGREGATIONS OR PRECALCULATED SUMMARY TABLES ARE OFTEN CREATED AND MANAGED BY OLAP CUBE ENGINES. TYPICALLY OLAP CUBES PROVIDE USERS WITH SUPERIOR QUERY PERFORMANCE BECAUSE OF THE PRECALCULATIONS, INDEXING STRATEGIES AND OTHER OPTIMIZATIONS. STAR SCHEMAS VS. OLAP CUBES A STAR SCHEMA HOSTED IN A RELATIONAL DATABASE IS A GOOD PHYSICAL FOUNDATION FOR BUILDING AN OLAP CUBE, AND IS GENERALLY REGARDED AS A MORE STABLE BASIS FOR BACKUP AND RECOVER OLAP CUBES HAVE TRADITIONALLY BEEN NOTED FOR EXTREME PERFORMANCE ADVANTAGES OVER RDBMS OLAP CUBE DATA STRUCTURES ARE MORE VARIABLE ACROSS DIFFERENT VENDORS THAN RELATIONAL DBMS OLAP CUBES TYPICALLY OFFER MORE SOPHISTICATED SECURITY OPTIONS THAN RDBMS, SUCH AS LIMITING ACCESS TO DETAILED DATA BUT PROVIDING MORE OPEN ACCESS TO SUMMARY DATA OLAP CUBES OFFER SIGNIFICANTLY RICHER ANALYSIS CAPABILITIES THAN RDBMS WHICH ARE HARNESSED BY THE CONSTRAINTS OF SQL OLAP CUBES GRACEFULLY SUPPORT SLOWLY CHANGING DIMENSION, BUT CUBES OFTEN NEED TO BE REPROCESSED PARTIALLY OR TOTALLY WHENEVER DATA IS REWRITTEN OLAP CUBES MAY IMPOSE DETAILED CONSTRAINTS ON THE STRUCTURE OF DIMENSION KEYS THAT IMPLEMENT DRILL-DOWN HIERARCHIES COMPARED TO RELATIONAL DATABASES FACT TABLES FOR MEASUREMENTS THE FACT TABLE IN A DIMENSIONAL MODEL STORES THE PERFORMANCE MEASUREMENTS RESULTING FROM AN ORGANIZATION'S BUSINESS PROCESS EVENTS. YOU SHOULD STRIVE TO STORE THE LOW-LEVEL MEASUREMENT DATA RESULTING FROM A BUSINESS PROCESS IN A SINGLE DIMENSIONAL MODEL. MEASUREMENT DATA IS THE LARGEST SET OF DATA. THE TERM FACT REPRESENTS A BUSINESS MEASURE EACH ROW IN A FACT TABLE CORRESPONDS TO A MEASUREMENT EVENT THE DATA ON EACH ROW IS AT A SPECIFIC LEVEL OF DETAIL, REFERRED TO AS THE GRAIN, SUCH AS ONE ROW PER PRODUCT SOLD ON A SALES TRANSACTION ADDITIVITY IS CRUCIAL BECAUSE BI APPLICATIONS RARELY RETRIEVE A SINGLE FACT TABLE ROW. RATHER THEY BRING BACK HUNDREDS, THOUSANDS, OR EVEN MILLIONS OF FACT ROWS AT A TIME, AND THE MOST USEFUL THING TO DO WITH SO MANY ROWS IS TO ADD THEM UP. FACTS ARE OFTEN DESCRIBED AS CONTINUOUSLY VALUED TO HELP SORT OUT WHAT IS A FACT VERSUS A DIMENSION ATTRIBUTE. FACT TABLES FOR MEASUREMENTS THE DESIGNER SHOULD MAKE EVERY EFFORT TO PUT TEXTUAL DATA IN DIMENSIONS WHERE THEY CAN BE CORRELATED MORE EFFECTIVELY WITH THE OTHER TEXTUAL DIMENSION ATTRIBUTES AND CONSUMES MUCH LESS SPACE. FACT TABLES TEND TO BE DEEP IN TERMS OF THE NUMBER OF ROWS, BUT NARROW IN TERMS OF THE NUMBER OF COLUMNS. ALL FACT TABLES HAVE TWO OR MORE FOREIGN KEYS THAT CONNECT TO THE DIMENSION TABLES' PRIMARY KEYS. WHEN ALL THE KEYS IN THE FACT TABLE CORRECTLY MATCH THEIR RESPECTIVE PRIMARY KEYS IN THE CORRESPONDING DIMENSION TABLES, THAT TABLES SATISFY REFERENTIAL INTEGRITY. THE FACT TABLE GENERALLY HAS ITS OWN PRIMARY KEY COMPOSED OF A SUBSET OF THE FOREIGN KEYS. THIS KEY IS CALLED THE COMPOSITE KEY EVERY TABLE THAT HAS A COMPOSITE KEY IS A FACT TABLE DIMENSION TABLES FOR DESCRIPTIVE CONTEXT DIMENSION TABLES ARE INTEGRAL COMPANIONS TO A FACT TABLE. THE DIMENSION TABLES CONTAIN THE TEXTUAL CONTEXT ASSOCIATED WITH A BUSINESS PROCESS MEASUREMENT EVENT. DIMENSION TABLES TEND TO HAVE FEWER ROWS THAN FACT TABLES, BUT CAN BE WIDE WITH MANY LARGE TEXT COLUMNS. EACH DIMENSION IS DEFINED BY A SINGLE PRIMARY KEY WHICH SERVES AS THE BASIS FOR REFERENTIAL INTEGRITY WITH ANY GIVEN FACT TABLE TO WHICH IT IS JOINED. DIMENSION TABLES ATTRIBUTES PLAY A VITAL ROLE IN THE DW/BI SYSTEM. BECAUSE THEY ARE THE SOURCE OF VIRTUALLY ALL CONSTRAINTS AND REPORT LABELS, DIMENSION ATTRIBUTES ARE CRITICAL TO MAKING THE DW/BI USABLE AND UNDERSTANDABLE. FACTS AND DIMENSIONS JOINED IN A STAR SCHEMA THE FIRST THING TO NOTICE ABOUT THE DIMENSIONAL SCHEMA IS ITS SIMPLICITY AND SYMMETRY. THE SIMPLICITY OF THE DIMENSIONAL MODEL HAS PERFORMANCE BENEFITS. DATABASE OPTIMIZERS PROCESS THESE SIMPLE SCHEMAS WITH FEWER JOINS MORE EFFICIENTLY. DIMENSION MODELS ARE GRACEFULLY EXTENSIBLE TO ACCOMMODATE CHANGE. THE PREDICTABLE FRAMEWORK OF A DIMENSIONAL MODEL RESISTS UNEXPECTED CHANGES IN USER BEHAVIOR. ANOTHER WAY TO THINK ABOUT THE RELATIONSHIP BETWEEN A DIMENSION AND A FACT IS TO THINK IN TERMS OF A REPORT, DIMENSION ATTRIBUTES SUPPLY THE REPORT FILTERS AND LABELING, WHEREAS THE FACT TABLES SUPPLY THE REPORT'S NUMERIC VALUES: FACTS AND DIMENSIONS JOINED IN A STAR SCHEMA FACTS AND DIMENSIONS JOINED IN A STAR SCHEMA YOU CAN THINK ABOUT THE SQL REQUIRED TO CREATE THE ILLUSTRATED REPORT FROM ABOVE AS: THE INMON METHODOLOGY BILL INMON, THE FATHER OF DATA WAREHOUSING, CAME UP WITH THE CONCEPT TO DEVELOP A DATA WAREHOUSE. BILL INMON’S DEFINITION OF A DATA WAREHOUSE IS THAT IT IS A “SUBJECT-ORIENTED, NONVOLATILE, INTEGRATED, TIME- VARIANT COLLECTION OF DATA IN SUPPORT OF MANAGEMENT’S DECISIONS.” THE MODEL THEN CREATES A THOROUGH, LOGICAL MODEL FOR EVERY PRIMARY ENTITY. FOR INSTANCE, A LOGICAL MODEL IS CONSTRUCTED FOR PRODUCTS WITH ALL THE ATTRIBUTES ASSOCIATED WITH THAT ENTITY. THE BILL INMON DESIGN APPROACH USES THE NORMALIZED FORM FOR BUILDING ENTITY STRUCTURE, AVOIDING DATA REDUNDANCY AS MUCH AS POSSIBLE. THIS RESULTS IN CLEARLY IDENTIFYING BUSINESS REQUIREMENTS AND PREVENTING ANY DATA UPDATE IRREGULARITIES. MOREOVER, THE ADVANTAGE OF THIS TOP-DOWN APPROACH IN DATABASE DESIGN IS THAT IT IS ROBUST TO BUSINESS CHANGES AND CONTAINS A DIMENSIONAL PERSPECTIVE OF DATA ACROSS DATA MART. THIS BILL INMON MODEL CREATES A SINGLE SOURCE OF TRUTH FOR THE WHOLE BUSINESS. DATA LOADING BECOMES LESS COMPLEX DUE TO THE NORMALIZED STRUCTURE OF THE MODEL. ALL THE DATA ENTERING THE DATA WAREHOUSE IS INTEGRATED. THE DATA WAREHOUSE ACTS AS A SINGLE DATA SOURCE FOR VARIOUS DATA MARTS TO ENSURE INTEGRITY AND CONSISTENCY ACROSS THE ENTERPRISE THE INMON METHODOLOGY THE KIMBALL METHODOLOGY THE KIMBALL DATA MODEL FOLLOWS A BOTTOM-UP APPROACH TO DATA WAREHOUSE ARCHITECTURE DESIGN IN WHICH DATA MARTS ARE FIRST FORMED BASED ON THE BUSINESS REQUIREMENTS. IN RALPH KIMBALL MODELING, THE PRIMARY DATA SOURCES ARE EVALUATED, AND AN EXTRACT, TRANSFORM AND LOAD (ETL) TOOL IS USED TO FETCH DATA FROM SEVERAL SOURCES AND LOAD IT INTO A STAGING AREA OF THE DATA WAREHOUSE. ONCE DATA IS UPLOADED IN THE DATA WAREHOUSE STAGING AREA, THE NEXT PHASE INCLUDES LOADING DATA INTO A DIMENSIONAL DATA WAREHOUSE MODEL THAT’S DENORMALIZED BY NATURE. THIS MODEL PARTITIONS DATA INTO THE FACT TABLE, WHICH IS NUMERIC TRANSACTIONAL DATA OR DIMENSION TABLE, WHICH IS THE REFERENCE INFORMATION THAT SUPPORTS FACTS. STAR SCHEMA IS THE FUNDAMENTAL ELEMENT OF THE DIMENSIONAL DATA WAREHOUSE MODEL. BUS MATRIX OR KIMBALL BUS ARCHITECTURE THAT VERTICALLY RECORDS THE FACTS AND HORIZONTALLY RECORDS THE CONFORMED DIMENSIONS. THE KIMBALL METHODOLOGY TO INTEGRATE DATA, KIMBALL APPROACH TO DATA WAREHOUSE LIFECYCLE SUGGESTS THE IDEA OF CONFORMED DATA DIMENSIONS. IT EXISTS AS A BASIC DIMENSION TABLE SHARED ACROSS DIFFERENT FACT TABLES (SUCH AS CUSTOMER AND PRODUCT) WITHIN A DATA WAREHOUSE OR AS THE SAME DIMENSION TABLES IN VARIOUS KIMBALL DATA MARTS. THIS GUARANTEES THAT A SINGLE DATA ITEM IS USED IN A SIMILAR MANNER ACROSS ALL THE FACTS. THE KIMBALL METHODOLOGY BASIC KIMBALL DATA WAREHOUSE ARCHITECTURE EXPLAINED (SOURCE: ZENTUT) ADVANTAGES OF THE KIMBALL METHODOLOGY KIMBALL DIMENSIONAL MODELING IS FAST TO CONSTRUCT AS NO NORMALIZATION IS INVOLVED AN ADVANTAGE OF STAR SCHEMA IS THAT MOST DATA OPERATORS CAN EASILY COMPREHEND IT BECAUSE OF ITS DENORMALIZED STRUCTURE, WHICH SIMPLIFIES QUERYING AND ANALYSIS. DATA WAREHOUSE SYSTEM FOOTPRINT IS TRIVIAL BECAUSE IT FOCUSES ON INDIVIDUAL BUSINESS AREAS AND PROCESSES RATHER THAN THE WHOLE ENTERPRISE. CONFORMED DIMENSIONAL STRUCTURE FOR DATA QUALITY FRAMEWORK. THE KIMBALL APPROACH TO DATA WAREHOUSE LIFECYCLE IS ALSO REFERRED TO AS THE BUSINESS DIMENSIONAL LIFESTYLE APPROACH BECAUSE IT ALLOWS BUSINESS INTELLIGENCE TOOLS TO DEEPER ACROSS SEVERAL STAR SCHEMAS AND GENERATES RELIABLE INSIGHTS. KIMBALL APPROACH TO DATA WAREHOUSE LIFECYCLE (SOURCE: KIMBALL GROUP) KIMBALL’S DW/BI ARCHITECTURE LET’S BUILD ON YOUR UNDERSTANDING OF DW/BI SYSTEMS AND DIMENSIONAL MODELING FUNDAMENTALS BY INVESTIGATING THE COMPONENTS OF A DW/BI ENVIRONMENT BASED ON THE KIMBALL ARCHITECTURE. THERE ARE FOUR SEPARATE AND DISTINCT COMPONENTS TO CONSIDER IN THE DW/BI ENVIRONMENT: - OPERATIONAL SOURCE SYSTEMS. - ETL SYSTEM. - DATA PRESENTATION AREA - BUSINESS INTELLIGENCE APPLICATIONS. KIMBALL'S DW/BI ARCHITECTURE AS ILLUSTRATED IN THE FIGURE BELOW THERE ARE FOUR SEPARATE AND DISTINCT COMPONENTS TO CONSIDER IN THE DW/BI ENVIRONMENT : OPERATIONAL SOURCE SYSTEMS, ETL SYSTEM, DATA PRESENTATION AREA, AND BUSINESS INTELLIGENCE APPLICATIONS. OPERATIONAL SOURCE SYSTEMS THINK OF THE SOURCE SYSTEMS AS OUTSIDE THE DATA WAREHOUSE BECAUSE PRESUMABLE YOU HAVE LITTLE OR NOT CONTROL OVER THE CONTENT AND FORMAT OF THE DATA IN THESE OPERATIONAL SYSTEMS. THE MAIN PRIORITIES OF THE SOURCE SYSTEM ARE PROCESSING PERFORMANCE AND AVAILABILITY. SOURCE SYSTEMS CONTAIN LITTLE HISTORICAL DATA, A GOOD DATA WAREHOUSE CAN RELIEVE THE SOURCE SYSTEMS OF MUCH OF THE RESPONSIBILITY OF BEING RESPONSIBLE FOR THE PAST. IN MAY CASES THE SOURCE SYSTEMS ARE SPECIAL PURPOSE APPLICATIONS. EXTRACT, TRANSFORMATION AND LOAD SYSTEMS THE EXTRACT, TRANSFORMATION AND LOAD SYSTEMS OF THE DW/BI ENVIRONMENT CONSIST OF A WORK AREA, INSTANTIATED DATA STRUCTURES AND A SET OF PROCESSES. THE ETL SYSTEM IS EVERYTHING BETWEEN OPERATIONAL SOURCE SYSTEMS AND THE DW/BI PRESENTATION AREA. EXTRACTION IS THE FIRST STEP IN THE PROCESS OF GETTING DATA INTO THE DATA WAREHOUSE ENVIRONMENT (EXTRACTING MEANS READING AND UNDERSTANDING THE SOURCE DATA AND COPYING THE DATA NEEDED INTO ETL SYSTEM FOR FURTHER MANIPULATION) THEN THERE ARE NUMEROUS POTENTIAL TRANSFORMATIONS SUCH AS CLEANING THE DATA, COMBINING DATA FROM MULTIPLE SOURCES, AND DE-DUPLICATING DATA IN ADDITION THESE ACTIVITIES CAN BE ARCHITECTED TO CREATE DIAGNOSTIC METADATA, EVENTUALLY LEADING TO BUSINESS PROCESS RE-ENGINEERING TO IMPROVE DATA QUALITY IN THE SOURCE SYSTEMS OVER TIME THE FINAL STEP OF THE ETL PROCESS IS THE PHYSICAL STRUCTURING AND LOADING OF DATA INTO THE PRESENTATION AREA'S TARGET DIMENSIONAL MODEL PRESENTATION AREA TO SUPPORT BUSINESS INTELLIGENCE THE DW/BI PRESENTATION AREA IS WHERE DATA IS ORGANIZED, STORED AND MADE AVAILABLE FOR DIRECT QUERYING BY USERS, REPORT WRITERS, AND OTHER ANALYTICAL BI APPLICATIONS. IDEAS TO CONSIDER FOR THE PRESENTATION AREA: INSIST THAT THE DATA BE PRESENTED, STORED, AND ACCESSED IN DIMENSIONAL SCHEMAS; FOR DIMENSIONAL MODELING IS THE MOST VIABLE TECHNIQUE FOR DELIVERING DATA TO THE DW/BI USERS. THE SECOND STAKE IS THAT IT MUST CONTAIN DETAILED, ATOMIC DATA. ATOMIC DATA IS REQUIRED TO RESIST REQUESTS FROM UNPREDICTABLE AD HOC USER QUERIES. THE MOST FINELY GRAINED DATA SHOULD BE AVAILABLE IN THE PRESENTATION AREA SO THAT USERS CAN ASK THE MOST PRECISE QUESTIONS POSSIBLE. THE PRESENTATION DATA ARE SHOULD BE BUILT AROUND BUSINESS PROCESS MEASUREMENT EVENTS. ALL THE DIMENSIONAL STRUCTURES MUST BE BUILT USING COMMON, CONFORMED DIMENSIONS (I.E. THEY MUST CONFORM TO THE COMMITMENT OF THE ENTERPRISE DATA WAREHOUSE BUS ARCHITECTURE). BUSINESS INTELLIGENCE APPLICATIONS THE TERM BUSINESS INTELLIGENCE LOOSELY REFERS TO THE RANGE OF CAPABILITIES PROVIDED TO BUSINESS USERS TO LEVERAGE THE PRESENTATION AREA FOR ANALYTICAL DECISION MAKING. A BI APPLICATION CAN BE AS SIMPLE AS AN AD HOC QUERY TOOL OR AS COMPLEX AS A SOPHISTICATED DATA MINING OR MODELING APPLICATION. ALTERNATIVE DW/BI ARCHITECTURES INDEPENDENT DATA MART ARCHITECTURE WITH THIS APPROACH, ANALYTIC DATA IS DEPLOYED ON A DEPARTMENTAL BASIS WITHOUT CONCERN TO SHARING AND INTEGRATING INFORMATION ACROSS THE ENTERPRISE. TYPICALLY, A SINGLE DEPARTMENT IDENTIFIES REQUIREMENTS FOR DATA FROM AN OPERATIONAL SOURCE SYSTEM. THE DEPARTMENT WORKS WITH IT STAFF OR OUTSIDE CONSULTANTS TO CONSTRUCT A DATABASE THAT SATISFIES THEIR DEPARTMENTAL NEEDS, REFLECTING THEIR BUSINESS RULES AND PREFERRED LABELING. WORKING IN ISOLATION, THIS DEPARTMENTAL DATA MART ADDRESSES THE DEPARTMENT’S ANALYTIC REQUIREMENTS. ALTERNATIVE DW/BI ARCHITECTURES HUB-AND-SPOKE CORPORATE INFORMATION FACTORY (CIF) INMON ARCHITECTURE WITH THE CIF, DATA IS EXTRACTED FROM THE OPERATIONAL SOURCE SYSTEMS AND PROCESSED THROUGH AN ETL SYSTEM SOMETIMES REFERRED TO AS DATA ACQUISITION. THE ATOMIC DATA THAT RESULTS FROM THIS PROCESSING LANDS IN A 3NF DATABASE; THIS NORMALIZED, ATOMIC REPOSITORY IS REFERRED TO AS THE ENTERPRISE DATA WAREHOUSE (EDW) WITHIN THE CIF ARCHITECTURE. ALTHOUGH THE KIMBALL ARCHITECTURE ENABLES OPTIONAL NORMALIZATION TO SUPPORT ETL PROCESSING, THE NORMALIZED EDW IS A MANDATORY CONSTRUCT IN THE CIF. LIKE THE KIMBALL APPROACH, THE CIF ADVOCATES ENTERPRISE DATA COORDINATION AND INTEGRATION. THE CIF SAYS THE NORMALIZED EDW FILLS THIS ROLE, WHEREAS THE KIMBALL ARCHITECTURE STRESSES THE IMPORTANCE OF AN ENTERPRISE BUS WITH CONFORMED DIMENSIONS ALTERNATIVE DW/BI ARCHITECTURES HYBRID HUB-AND-SPOKE AND KIMBALL ARCHITECTURE THIS ARCHITECTURE POPULATES A CIF-CENTRIC EDW THAT IS COMPLETELY OFF -LIMITS TO BUSINESS USERS FOR ANALYSIS AND REPORTING. IT’S MERELY THE SOURCE TO POPULATE A KIMBALL PRESENTATION AREA IN WHICH THE DATA IS DIMENSIONAL, ATOMIC (COMPLEMENTED BY AGGREGATES), PROCESS CENTRIC, AND CONFORMS TO THE ENTERPRISE DATA WAREHOUSE BUS ARCHITECTURE. IT BLENDS THE TWO ENTERPRISE-ORIENTED APPROACHES. IT LEVERAGE A PREEXISTING INVESTMENT IN AN INTEGRATED REPOSITORY. ADDRESSING THE PERFORMANCE AND USABILITY ISSUES ASSOCIATED WITH THE 3NF EDW BY OFFLOADING QUERIES TO THE DIMENSIONAL PRESENTATION AREA. DATA WAREHOUSE APPLICATION (MICROSOFT FABRIC) MICROSOFT FABRIC PROVIDES CUSTOMERS WITH A UNIFIED PRODUCT THAT ADDRESSES EVERY ASPECT OF THEIR DATA ESTATE BY OFFERING A COMPLETE, SAAS-IFIED DATA, ANALYTICS AND AI PLATFORM, WHICH IS LAKE CENTRIC AND OPEN. THE FOUNDATION OF MICROSOFT FABRIC ENABLES THE NOVICE USER THROUGH TO THE SEASONED PROFESSIONAL TO LEVERAGE DATABASE, ANALYTICS, MESSAGING, DATA INTEGRATION AND BUSINESS INTELLIGENCE WORKLOADS THROUGH A RICH, EASY TO USE, SHARED SAAS EXPERIENCE WITH MICROSOFT ONELAKE AS THE CENTERPIECE.

Use Quizgecko on...
Browser
Browser