DATA WAREHOUSING

What is Data Warehousing?
Data warehousing is a process used to collect and manage data from multiple sources in a centralized repository to drive actionable business insights. With all your data in one place, it becomes simpler to perform analysis and reporting at different aggregate levels. It is the core of the BI system and helps you make better business decisions. In simple words, it is the electronic storage space for all your business data, integrated from marketing and other sources. A data warehouse is sometimes called the "single source of truth".

History of Data Warehousing
The concept of data warehousing dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse".
1960s – General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.
1970s – ACNielsen and IRI provide dimensional data marts for retail sales.
1970s – Bill Inmon begins to define and discuss the term Data Warehouse.
1983 – Teradata introduces the DBC/1012 database computer, specifically designed for decision support.
1988 – Barry Devlin and Paul Murphy publish the article "An architecture for a business and information system", in which they introduce the term "business data warehouse".
1991 – Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.
1992 – Bill Inmon publishes the book Building the Data Warehouse.

How does it work?
Data warehousing combines information collected from multiple sources into one comprehensive database, which companies use to analyze their customers. Data warehousing is also related to data mining, which means looking for meaningful patterns in huge volumes of data and devising new strategies for higher sales and profits.

Why does it matter?
Companies with a dedicated data warehousing team think well ahead of others in product development, marketing, pricing strategy, production time, historical analysis, forecasting, and customer satisfaction. A data warehouse is not a cheap option, but it pays off in the long run. Data warehouses can, however, be very expensive to design and implement, and their construction sometimes makes them unwieldy.

Three types of Data Warehousing: Enterprise Data Warehouse, Operational Data Store, and Data Mart.

Enterprise Data Warehouse: An Enterprise Data Warehouse is a centralized warehouse that provides decision support services across the enterprise. It offers a unified approach to organizing and representing data. It also provides the ability to classify data by subject and to grant access according to those divisions.

Operational Data Store: An Operational Data Store (ODS) is a data store used when neither the data warehouse nor the OLTP systems support an organization's reporting needs. The ODS is refreshed in real time, so it is widely preferred for routine activities such as storing employee records.

Data Mart: A data mart is a subset of the data warehouse, specially designed for a particular segment of the business, such as sales or finance. In an independent data mart, data can be collected directly from the sources.

Data warehouses and data marts are mostly built on dimensional data modeling, in which fact tables relate to dimension tables. This makes the data easy for users to access, since the database can be visualized as a cube of several dimensions. A data warehouse allows a user to slice the cube along each of its dimensions.

Data Warehousing and Data Analytics
People often confuse data warehousing with data analytics. The two terms may seem similar, but they are not the same.
Data warehousing is the process of consolidating all organizational data into one common database. Data analytics, on the other hand, is about analyzing the raw data and drawing conclusions from the information gained. The concepts are interrelated but different: data analytics begins once data warehousing is complete.

General Stages of Data Warehousing
Offline Operational Database: In this stage, data is copied from an operational system to a server so that loading, processing, and reporting on the data do not affect the performance of the operational system.
Offline Data Warehouse: The data in the warehouse is regularly updated from the operational database to derive useful business insights.
Real-time Data Warehouse: Whenever a transaction takes place in the operational database, the same update is applied to the data warehouse.
Integrated Data Warehouse: Every transaction in the operational database is applied to the data warehouse at the same time, and the warehouse in turn generates transactions that are passed back to the operational database.

Data Warehouse Architecture
Data warehouse architecture is based on a relational database management system (RDBMS) server that functions as the central repository for informational data. In this architecture, operational data and processing are kept separate from data warehouse processing. The central repository is surrounded by several key components designed to make the entire environment functional, manageable, and accessible, both to the operational systems that feed data into the warehouse and to the end-user query and analysis tools.

A data warehouse has a three-tier architecture: the bottom tier, the middle tier, and the top tier.

Bottom Tier: The bottom tier of the architecture is the data warehouse database server, usually a relational database system. Back-end tools and utilities are used to feed data into the bottom tier.
These back-end tools and utilities perform the extract, clean, load, and refresh functions.

Middle Tier: The middle tier of a data warehouse holds the OLAP server, which can be implemented in one of two ways. Relational OLAP (ROLAP) is an extended relational database management system that maps operations on multidimensional data to standard relational operations. Multidimensional OLAP (MOLAP) directly implements the multidimensional data and operations.

Top Tier: This tier is the front-end client layer. It holds the query and reporting tools, analysis tools, and data mining tools.

Data Warehouse Appliances
Data warehouse appliances are sets of hardware and software tools used for storing data. Data-driven businesses use these appliances to build a centralized, comprehensive data warehouse in which all kinds of functional business data can be stored. When combined with data warehouses, these appliances help organizations meet their modern data integration requirements. By combining all customer data in a data warehouse, you can get the following benefits:
- Cross-account indexing
- Easy access to customers' historical data
- Enhanced interactive voice response technology
- Customized digital communications
Data warehouse appliances act as the building blocks for creating efficient business data warehouse systems.

Data Warehousing and Business Intelligence
A data warehouse may be described as a consolidation of data from multiple sources that is designed to support strategic and tactical decision-making in organizations. The primary purpose of a data warehouse is to provide a coherent picture of the business at a point in time. Business intelligence (BI), on the other hand, describes a set of tools and methods that transform raw data into meaningful patterns for actionable insights and improved business processes. This usually involves data preparation, data analytics, and data visualization.
Business intelligence is an umbrella term that is either used interchangeably with data analytics or used to describe a process that includes data preparation, analytics, and visualization. Data warehousing, by contrast, describes the tools that join disparate data sources, clean the data, and prepare it for analysis.

DATA MARTS

What is a Data Mart?
A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject area. Data marts can improve team efficiency, reduce costs, and support smarter tactical decision-making in enterprises. A data mart makes specific data available to a defined group of users, who can then reach critical insights quickly without searching through an entire data warehouse. For example, many companies have data marts that align with specific departments, such as finance, sales, or marketing.

History of Data Marts
Data marts do not have a single "father" the way some other technological advances do. They evolved in response to the need for more specialized and focused data warehousing solutions. Bill Inmon and Ralph Kimball are, however, two prominent figures often associated with the development and popularization of data warehousing and related concepts, including data marts. Their approaches differ: Inmon advocates a "top-down" approach, while Kimball promotes a "bottom-up" dimensional modeling approach. Both have influenced the development of data warehousing and, by extension, data marts.

Data Marts vs Data Warehouse
Data marts and data warehouses are both central data repositories, but they serve different needs within an organization.
A data warehouse is a system that aggregates data from multiple sources into a single, central, consistent data store to support data mining, artificial intelligence (AI), and machine learning, which can ultimately enhance sophisticated analytics and business intelligence. Through this collection process, data warehouse solutions consolidate data from different sources and make it available in one unified form.

A data mart (as noted above) is a focused version of a data warehouse that contains a smaller subset of data important to and needed by a single team or a select group of users within an organization. A data mart is built from an existing data warehouse (or other data sources) through a complex procedure that involves multiple technologies and tools to design and construct a physical database, populate it with data, and set up intricate access and management protocols.

Why is a Data Mart important?
Retrieve data more efficiently: With a data mart, companies can access specific information more efficiently. Compared with a data warehouse, a data mart contains the relevant, detailed information that a department accesses frequently, so business managers don't need to search the entire data warehouse to generate performance reports or graphics.
Streamline decision-making: A data mart lets a company create a subset of data from its data warehouse. Employees within the department can then analyze the data and make decisions based on the same set of information.
Control information more effectively: A data mart gives employees highly granular access privileges, so the company can authorize a particular person to view or retrieve specific data. This helps companies improve data governance and enforce information access policies. For example, you can use data marts to give employees access to specific information in a data warehouse.
Manage data flexibly: A data mart is smaller and contains fewer tables than a data warehouse, so data engineers can manage and change information in a data mart without causing major database changes.

How does a Data Mart work?
A data mart turns raw information into structured, meaningful content for a specific business department. To do this, data engineers set up a data mart to receive information either from a data warehouse or directly from external data sources. When it is connected to a data warehouse, the data mart retrieves a selection of information that is relevant to a business unit. Often, this information contains summarized data and excludes unnecessary or overly detailed data.

ETL: Extract, transform, and load (ETL) is a process for integrating and transferring information from various data sources into a single physical database. Data marts use ETL to retrieve information from external sources when it does not come from a data warehouse. The process involves the following steps:
Extract: collect raw information from various sources.
Transform: structure the information into a common format.
Load: transfer the processed data to the database.

Analytics: Business analysts use software tools to retrieve, analyze, and present data from the data mart. For example, they use the information stored in data marts for business intelligence analytics, reporting dashboards, and cloud applications. Each data mart serves a small number of users. For example, if only the marketing manager and senior marketers have access to a data mart, it takes less time to generate reports and graphs or to perform predictive analysis.

What are the types of Data Marts?
Dependent Data Mart: A dependent data mart populates its storage with a subset of information from a centralized data warehouse. The data warehouse gathers all the information from the data sources.
The data mart then queries and retrieves subject-specific information from the data warehouse.
Pros and cons: Most data management and administration work is performed in the data warehouse, so business analysts do not need to be highly skilled in database management to use information from the data mart. Although dependent data marts make retrieving information much easier, they present a single point of failure: if the data warehouse fails, all the connected data marts also fail.

Independent Data Mart: An independent data mart does not rely on a central data warehouse or on any other data mart. Each data mart collects information from its own sources instead of from a data warehouse. Independent data marts suit smaller companies in which only specific departments need to access and analyze information.
Pros and cons: Companies can set up independent data marts with relative ease. Managing them can be difficult, however, because business analysts need to perform database administration work at each data mart. Data can be shared between data marts using strategies like data sharing, in which departments read another department's data and even augment it with their own. A strong data cataloging strategy must be put in place, though, so that each department knows what it is looking at.

Hybrid Data Mart: Hybrid data marts collect information both from a data warehouse and from external sources. This gives companies the flexibility to test independent data sources before they direct the data to the data warehouse. For example, suppose you launch a new product and want to analyze its initial sales data. The data mart uses sales information that comes directly from the e-commerce software and retrieves sales records for other products from the data warehouse.
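The new-product example can be sketched in a few lines of Python. This is a minimal illustration, not a real data mart: the row layouts (`product`/`qty`/`amount` for the warehouse, `sku`/`units`/`total` for the e-commerce feed) and every function name are hypothetical, and both sources are plain in-memory lists standing in for real systems. The point it shows is that the hybrid mart maps two differently shaped sources into one record type, taking existing products from the warehouse and only the new product from the external feed.

```python
from dataclasses import dataclass


# Hypothetical uniform record stored in the data mart.
@dataclass(frozen=True)
class Sale:
    product: str
    units: int
    amount: float
    source: str  # "warehouse" or "ecommerce", kept to trace where a row came from


def from_warehouse(rows):
    """Existing products: rows already consolidated in the data warehouse."""
    return [Sale(r["product"], r["qty"], r["amount"], "warehouse") for r in rows]


def from_ecommerce(rows, new_product):
    """New product only: rows read directly from the e-commerce system.

    Other products in the feed are skipped because they already reach the
    mart through the warehouse; keeping them would double-count sales.
    """
    return [
        Sale(r["sku"], r["units"], r["total"], "ecommerce")
        for r in rows
        if r["sku"] == new_product
    ]


def build_hybrid_mart(warehouse_rows, ecommerce_rows, new_product):
    """Combine both sources into one list of uniform Sale records."""
    return from_warehouse(warehouse_rows) + from_ecommerce(ecommerce_rows, new_product)


def revenue_by_product(mart):
    """A simple report a sales analyst might run against the mart."""
    totals = {}
    for sale in mart:
        totals[sale.product] = totals.get(sale.product, 0.0) + sale.amount
    return totals


if __name__ == "__main__":
    warehouse_rows = [
        {"product": "mug", "qty": 3, "amount": 30.0},
        {"product": "shirt", "qty": 2, "amount": 40.0},
        {"product": "mug", "qty": 1, "amount": 10.0},
    ]
    ecommerce_rows = [
        {"sku": "headphones", "units": 1, "total": 99.0},
        {"sku": "headphones", "units": 2, "total": 198.0},
        {"sku": "mug", "units": 5, "total": 50.0},  # already covered by the warehouse
    ]
    mart = build_hybrid_mart(warehouse_rows, ecommerce_rows, "headphones")
    print(revenue_by_product(mart))
    # prints {'mug': 40.0, 'shirt': 40.0, 'headphones': 297.0}
```

Filtering the external feed down to the new product is one design choice among several; once the product's data is routed through the warehouse, the e-commerce branch can simply be dropped.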
After the product becomes a permanent fixture in your store, you channel its transaction details to the data warehouse.

What are the steps in implementing a Data Mart?
Cloud data engineers set up a data mart by doing the following:
1. Launch their cloud-native data platform.
2. Populate the data mart with business data, ensuring that the data has the correct format and is relevant to the business users.
3. Set up the data mart so that multiple users can access data from it. For example, they install a reporting dashboard in the data mart.
4. Continue to monitor, optimize, and resolve issues while the data mart runs.

What are the structures of a data mart?
1. Star: The star schema is a blueprint that resembles a star shape and consists of fact tables that reference dimension tables in a relational database. The fact table sits at the center of the star and holds a set of metrics related to a specific business process. The star schema requires fewer joins when writing queries, because the dimension tables do not depend on one another. This makes it efficient for accessing and navigating large data sets, and these benefits make star schemas widely used in information technology systems.
2. Snowflake: A snowflake schema extends the star schema with additional dimension tables that are normalized to protect data integrity and minimize data redundancy. The snowflake schema's main benefit is that its dimension tables require less storage space. However, a snowflake structure is harder to maintain, because more tables need to be populated and synchronized. It can also hurt query performance, because queries must join the additional dimension tables.
3. Vault: The vault schema enables users to design agile enterprise data warehouses. It is a fairly modern database modeling technique.
The vault schema is a layered structure that focuses on agility and scalability.

Group Members
JAYSON P. AMARILLO
FERDINAND QUE
JOHN BENEDICT ESCALONA
LUIGI RHENZ BELARAS