Summary

This document provides an overview of data warehousing concepts, including the island system, its causes and consequences, and the characteristics of a data warehouse. It details topics including subject-oriented design, integrated data, non-volatile nature, and time-variant data, along with components of a data warehouse like metadata, OLAP tools, data sources, data marts, ETL processes, and data warehouse databases. It's focused on practical applications within business settings.

Full Transcript

**[DATA WAREHOUSING]** **ISLAND SYSTEM** - Also known as **information island** or **information silo** - Situation where information is **isolated and not shared.** - A **computer application system** where **functions are not related,** **information not shared and disconnected** fro...

**[DATA WAREHOUSING]** **ISLAND SYSTEM** - Also known as **information island** or **information silo** - Situation where information is **isolated and not shared.** - A **computer application system** where **functions are not related,** **information not shared and disconnected** from business processes and applications. A. **CAUSES** - a **cultural problem** when **people do not share information.** B. **CONSEQUENCES** - Leads to **employees focusing on their own work progress.** - **Agencies spreading efforts** across different agencies. - Institutions **not working together.** **DATA WAREHOUSING** - Primary purpose is to **consolidate data form multiple sources** into a **single, unified system** that **supports reporting**, **analysis**, and **decision-making.** **DATA WAREHOUSE** - Large, organized repository **designed to store historical and current data.** - **Optimized for [querying]** and **[analysis] rather than transactional processing.** **[4 KEY CHARACTERISTICS OF A DATA WAREHOUSE]** **Subject-Oriented** - **Organized around key subjects** (customers, sales, products) **rather than around the organization's ongoing operations**. **Integrated** - **Data from various sources** (database, spreadsheets) are **integrated** into the data warehouse, **ensuring consistency in data formats**, **naming conventions**, and **measurements**. **Non-Volatile** - **Once data is entered into the warehouse, it does not change.** - This characteristic **ensures that historical data is preserved for analysis** over time. **Time-Variant** - Data in data warehouse is **typically stored with timestamps** to **allow for historical analysis.** - Enables **businesses to analyze trends over time.** **[6 Components of Data Warehouse (MODDED) ]** 1. **Metadata** 2. **OLAP (Online Analytical Processing) Tools** 3. **Data Sources** 4. **Data Marts** 5. **ETL Process (Extract, Transform, Load)** 6. **Data Warehouse Database** 1. **Metadata** - Metadata **provides information about the data**, such as **definitions**, **source systems**, **data types**, and **transformation rules.** - It helps **users understand and navigate the data warehouse.** 2. **OLAP (Online Analytical Processing) Tools** - These **tools allow users to perform complex queries**, **analysis**, and **reporting on the data stored in the data warehouse.** - OLAP tools **support multidimensional analysis**, enabling **users to slice** and **dice data across various dimensions.** 3. **Data Sources** - These are the **different systems**, **databases**, and **files** from which **data is extracted.** - **Sources can include operational databases**, **CRM** (Customer Relationship Management) systems, **ERP** (Enterprise Resource Planning) systems, and **external data providers.** 4. **Data Marts** - These are **subsets of the data warehouse** that are **tailored to the specific needs of different business units or departments**, such as **marketing**, **finance**, or **sales.** 5. **ETL Process (Extract, Transform, Reload)** - This process **involves extracting data from various sources, transforming it into a consistent format**, and **loading it into the data warehouse**. - ETL tools are **critical for data integration** and **quality control.** 7. **Data Warehouse Database** - The **central repository** where the **integrated**, **cleaned**, and **organized data** **is stored**. - This **database is optimized for query performance** and is **typically designed using a relational database management system (RDBMS).**

Use Quizgecko on...
Browser
Browser