Business Intelligence - Module 1 Coverage PDF
Document Details
Uploaded by SharpestExtraterrestrial
Surigao del Norte State University
Tags
Summary
This document provides an overview of business intelligence (BI), focusing on the concept of Online Analytical Processing (OLAP). It explains different types of OLAP and the role of aggregations in optimizing query performance. The document also highlights the importance of business intelligence in supporting business decisions.
Full Transcript
# COVERAGE ## MODULE 1 ### BUSINESS INTELLIGENCE DEFINITION BI - is neither a product nor a system. It is an architecture and a collection of integrated operational as well as decision-support applications and databases that provide the business community easy access to business data. ### BI Defin...
# COVERAGE ## MODULE 1 ### BUSINESS INTELLIGENCE DEFINITION BI - is neither a product nor a system. It is an architecture and a collection of integrated operational as well as decision-support applications and databases that provide the business community easy access to business data. ### BI Defined - Business Intelligence (BI) encompasses the processes, tools, and technologies required to transform enterprise data into information, and information into knowledge that can be used to enhance decision-making and to create actionable plans that drive effective business activity. - BI can be used to acquire Tactical insight to optimize business processes by identifying trends, anomalies, and behaviors that require management action. Strategic insight to align multiple business processes with key business objectives through integrated performance management and analysis. ### BUSINESS INTELLIGENCE - Increasing potential to support business decisions - Making Decisions - Data Presentation Visualization Techniques - Data Mining - Information Discovery - Data Exploration - Statistical Analysis, Querying and Reporting - Data Warehouses/Data Marts - OLAP - Data Sources: Paper, Files, Information Providers, Database Systems, OLTP - OLAP (Online Analytical Processing) - End User - Business Analyst - Data Analyst - DBA ### OLAP (Online Analytical Processing) - Online Analytical Processing, or OLAP is an approach to swiftly answer multi dimensional analytical queries - OLAP is part of the broader category of business intelligence, which also encompasses and data mining - Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time. They borrow aspects of navigational databases and hierarchical databases that are faster than relational databases. - The core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables. - Fact Table: The fact table holds the main data. It includes a large amount of aggregated data, such as price and units sold. There may be multiple tables in a star schema. - Dimension tables: which are usually smaller than fact tables, include the attributes that describe the facts. Often this is a separate table for each dimension. Dimension tables can be joined to the fact table(s) as needed. - Multidimensional structure is defined as "a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data". The structure is broken into cubes and the cubes are able to store and access data within the confines of each cube. "Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions". Even when data is manipulated it is still easy to access as well as be a compact type of database. The data still remains interrelated. Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications Analytical databases use these databases because of their ability to deliver answers swiftly to complex business queries. ### AGGREGATIONS IN OLAP - It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time for the same query on OLTP relational data. The most important mechanism in OLAP which allows it to achieve such performance is the use of aggregations. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions. The number of possible aggregations is determined by every possible combination of dimension granularities. - The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data. - Because usually there are many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, ### TYPES OF OLAP - Multidimensional (MOLAP): MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP. MOLAP stores this data in an optimized multi-dimensional array storage, rather than in a relational database. Therefore it requires the pre-computation and storage of information in the cube - the operation known as processing. - Relational (ROLAP): ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables and new tables are created to hold the aggregated information. Depends on a specialized schema design. This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement. - Hybrid (HOLAP): There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a database will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data. ### COMPARISON - MOLAP implementations are prone to database explosion, a phenomenon causing vast amounts of storage space to be used by MOLAP databases when certain common conditions are met: high number of dimensions, pre-calculated results and sparse multidimensional data. - MOLAP generally delivers better performance due to specialized indexing and storage optimizations. MOLAP also needs less storage space compared to ROLAP because the specialized storage typically includes compression techniques. - ROLAP is generally more scalable. However, large volume pre-processing is difficult to implement efficiently so it is frequently skipped. ROLAP query performance can therefore suffer tremendously. - Since ROLAP relies more on the database to perform calculations, it has more limitations in the specialized functions it can use. - HOLAP encompasses a range of solutions that attempt to mix the best of ROLAP and MOLAP. It can generally pre-process swiftly, scale well, and offer good function support. ## MODULE 2 ### 2.1 Design of BI "front-end" applications #### Business Query and Reporting Business query and reporting tools are often referred to as "ad hoc query tools." This terminology is a little misleading, as in fact the queries are not always ad hoc (as in spontaneously crafted) but rather are often fixed reports. The difference is that a business User, usually a power User, may have built the report, rather than an information technology (IT) person. The business environment changes at a rapid pace, and unable to wait weeks or months for IT to develop a new report, business Users often demand the ability to create queries and reports themselves. Business query and reporting tools allow for this and are most often used for decision-making and management purposes. The business query and reporting tool is a key module to provide Users with self-service information access. In some cases, a report is truly ad hoc; it's a one-off business question that will never be posed again. Ad hoc queries may be exploratory in nature as Users try to find the root cause of a problem, test a theory, or consider changing a business model. Table 3-1 lists some sample fixed reports that may lead to an ad hoc query. As Users explore the data, what started as an ad hoc query or one-time question may later become a fixed report. It's important to recognize the iterative nature of business intelligence and to ensure you have flexible business intelligence tools. Getting to the data is just one capability of business query tools; the other aspect is presenting and formatting the data in a meaningful way, loosely referred to as reporting. The terms "query" and "reporting" are sometimes used interchangeably because a business query and reporting tool will have both capabilities-getting to the data and formatting it to create a report. | Fixed Report | Purpose | Related Ad Hoc Query | |---------------------------------|---------------------------------------------------------------------------|------------------------------------------------------------------------------------| | Imentory by Product | To determine if an order can be fulfilled today by the primary warehouse | If I'm short at main warehouse, can I supply the product from elsewhere? | | Top 10 Customers By Quarter and Product | To understand which customers generate the most revenue | Who fell off this quarter's lia? Are there certain products we can cross-sell? | | Raw Material Receipts. and Delivery Times | To determine how long it takes to acquire materials and which supplier can fulfill purchase orders fastest | Are there other suppliers who respond faster? | | Patients Per Hour | To understand busy periods and wait times | Do staffing levels correspond to busy times) | Business query and reporting tools vary widely in their formatting capabilities. The most basic of formatting capabilities allow for changing the font of column headings and making them bold and centered. Conditional formatting will, for example, display numeric values red when negative or below target and green when positive or above target. Simple report styles include displaying information in a cross-tab report, a chart, or a master-detail report with groupings and subtotals. Tools may provide a set of templates to create nicely formatted reports Users use to generate reports that use a consistent corporate look and feel. More complex formatting capabilities include the ability to present multiple charts on a page, perhaps coming from different data sources. Examples of business query tools include Business Objects Web Intelligence, Cognos 8 Query Studio, and SAS Web Report Studio. #### A Business View of the Data Business query tools allow business Users to access a data source via business terms without having to write any SQL. The data source could be a data warehouse or it might be direct access to an operational system. A key feature of a business query tool is that it has a business view or metadata layer that hides the complexity of the physical database structure from the business User by: * Using business terminology rather than physical field names. For example, a User may select a dimension such as Customer Name rather than a cryptic field such as CUST.L33_NAME (the physical table and field name in the Relational Database Management System [RDBMS]). * Automatically connecting related tables via joins. * Providing metrics that may calculate and aggregate facts such as revenue, number of customers, number of orders, number of incidents, and average selling price. Figure 3-1 shows an example of building a query with the Business Objects universe, one of the first products to introduce the concept of a business view. This business view is the most important piece of your BI frontend tools and one in which the business and IT must work together to model. For integrated Bl platforms, the business view is common to all the BI tool modules: business query, reporting, analysis, and dashboards. When the business view looks too much like the data warehouse or source system with confusing table and field names, business Users are overwhelmed and can too easily build incorrect queries. Poor business view design also forces Users to put too much logic and too many calculations inside individual reports and dashboards. For these reasons, in some organizations, the Users within a business unit, function, or department, are responsible for building the business view or metadata layer; in others, it is the central BI group or data warehouse team that will build and maintain the business view. #### What Is Structured Query Language (SQL) SQL, pronounced "sequel," is a computer language used to communicate with a relational database. SQL is a common language regardless if you use a database from Oracle, IBM, Microsoft, or Teradata. Querying a database with SQL can be fairly complicated. Business query tools will generate the SQL behind the scenes so business Users don't need to learn how to write SQL code. While there is a common set of SQL commands, such as SELECT and SUM, each database vendor may have its own SQL extensions, or dialect. RANK, for example is a popular SQL expression among business Users but it is an expression that not all relational databases support. Sometimes when trying to develop a complex business query, you may run into limitations inherent in the SQL language. For example, a query about sales for this quarter would generate simple SQL. Asking a query about which products were cross-sold to the same customers this year versus last year would require very complex SQL and may be better answered in an OLAP database. ## MODULE 3 ### Data and data models A data model (or data model) is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. For instance, a data model may specify that the data element representing a car be composed of a number of other elements which, in turn, represent the color and size of the car and define its owner. The term data model can refer to two distinct but closely related concepts. Sometimes it refers to an abstract formalization of the objects and relationships found in a particular application domain: for example the customers, products, and orders found in a manufacturing organization. At other times it refers to the set of concepts used in defining such formalizations: for example concepts such as entities, attributes, relations, or tables. So the "data model" of a banking application may be defined using the entity-relationship "data model". This article uses the term in both senses. ### Types of Data Models #### Conceptual Data Models: High-level, static business structures and concepts The conceptual data model is a structured business view of the data required to support business processes, record business events, and track related performance measures. This model focuses on identifying the data used in the business but not its processing flow or physical characteristics. #### Logical Data Models: Entity types, data attributes and relationships between entities A logical model contains representations of entities and attributes, relationships, unique identifiers, subtypes and super types, and constraints between relationships. A logical model can also contain domain model objects, or reference one or more domain or glossary models. #### Physical Data Models: The internal schema database design A physical data model is a database-specific model that represents relational data objects (for example, tables, columns, primary and foreign keys) and their relationships. A physical data model can be used to generate DDL statements which can then be deployed to a database server. ### Data Processing Data processing is the conversion of data into usable and desired form. This conversion or "processing" is carried out using a predefined sequence of operations either manually or automatically. Most of the processing is done by using computers and other data processing devices, and thus done automatically. #### Data integration technique and the concept of ETL in the BI applications ETL and data integration this is the ongoing process that data warehousing undertakes to transform multiple data sources into useful, consistent information for business intelligence and analytical efforts. #### What is Data Integrations? Data integration is the process of combining data from different sources into a single, unified view. Integration begins with the ingestion process, and includes steps such as cleansing, ETL mapping, and transformation.... The data is extracted from the sources, then consolidated into a single, cohesive data set. #### What is data techniques? - Uniform access integration: A technique that retrieves and uniformly displays data, but leaves it in its original source. - Common storage integration: An approach that retrieves and uniformly displays the data, but also makes a copy of the data and stores it. #### Difference of ETL and data integration The main difference between data integration and ETL is that the data integration is the process of combining data in different sources to provide a unified view to the Users while ETL is the process of extracting, transforming and loading data in a data warehouse environment. The process of ETL plays a key role in data integration strategies. ETL allows businesses to gather data from multiple sources and consolidate it into a single, centralized location. ETL also makes it possible for different types of data to work together. #### ETL Tools Middleware By using a persistent metadata repository, ETL tools can transition from one-time projects to persistent middleware, performing data harmonization and data profiling consistently and in near-real time. ## MODULE 4 ### Self-service business intelligence Self-service business intelligence BI is the strategic process of using data insights to make decisions that help organizations reach their goals. Instead of using gut instinct, precedents, and traditional mind sets, self-service Bl helps to create a new culture around using data every day. Self-service business intelligence is defined here as end Users designing and deploying their own reports and analyses within an approved and supported architecture and tools portfolio. ### TRADITIONAL VS. SELF-SERVICE BI APPROACH Access to data was in the hands of IT, data scientists and analysts. The control, manipulation, extraction and reporting was in the hands of few people in a company that could work with data warehouses, write SQL queries, and analyze large volumes of data. Often, companies haven't even had the chance to employ someone to work with their information, and all the tools they could afford were in the form of spreadsheets and static presentations. In recent years, things have drastically changed and software solutions have developed so quickly that everyone in a company can generate their own actionable insights. Let's explore the differences between the traditional BI and the self-service approach: a) Who requires what? Traditionally, business Users would set up business requirements for generating a report or dashboard. Here is where their involvement pretty much stalls and the IT department takes over the rest of the process. The project/report must be approved, and, later on, the data is ready for extraction, transformation, and load (ETL) into the data warehouse. The IT or BI department generates the report or business dashboard while business Users don't have any control over this process. On the other hand, in a self-service setting, business Users don't need to be tech-savvy to process the data on their own. The IT team works closely with Users' requests to choose the best possible tool, but the analysis, reporting, and dashboard creation is completely under control of the User. Additionally, self-service features give various possibilities based on the level of knowledge a professional needs, so both an average User and the IT specialist can work in the same tool. b) Who loads the data and where? Extracting and loading the data into a warehouse is mostly automated, but in the traditional setting, the IT department is responsible to extract the data out of the warehouse by writing SQL queries to generate reports. On the other hand, in a self-service setting, business Users access data directly and have complete control over the analysis, creation of reports or dashboards. In a traditional setting, the whole analysis process was set up and controlled by the IT, but self-service enables the User to take control over the process with the help of a self-service Bl software. While they do need to cooperate in some of the steps of data preparation, business Users are enabled to help the IT department and unburden many of their tasks. c) Who creates the report or dashboard? Modern business intelligence software usually is comprised of an intuitive, drag-and-drop interface, based on user-friendly navigation that enables non-technical professionals to build their own queries and generate comprehensive reports. In a traditional Bl setting, this was done by IT professionals while the business User would approve or ask for changes in the report or dashboard. This process could take weeks, and often the IT department is already extremely busy with numerous other tasks. Generating reports became a tedious task that no one was happy with. In recent years, average business Users have gained the ability to generate their own analysis, ad hoc reports, explore the data and interact with dashboards with the help of numerous visualizations. The process has become so simplified that the IT department doesn't have to hold full control over the business intelligence of a company. Business Users can take full advantage of self-service reporting tools and tinker the data on their own. Traditional vs. modern BI have a different way of operating within a company, but the goal is the same: actionable insights. Businesses must consider the budget, workforce, and tools they're capable to finance and maintain to be able to choose the best possible solution. We will focus now on the benefits of self-service business intelligence (SSBI) and follow with the most prominent elements that these tools have to offer. ### ORGANIZATION BI Business intelligence, or BI, plays a key role in the strategic planning of organizations and is used for multiple purposes, including measuring performance progress toward business goals, performing quantitative analysis, reporting and data sharing, and identifying customer insights. ### TOP SELF-SERVICE BUSINESS INTELLIGENCE BENEFITS a) Answer critical questions in seconds: Gone are the days of asking for a critical KPI and having to wait hours, if not days, for the busy IT department to pull the report. SSBI puts the User in charge of the data. Data can be accessed on the fly for real-time analysis and immediate actionable insights, giving your team a competitive advantage. b) Combine multiple data sources: Comprehensive business intelligence often requires combining multiple data sources and performing cross-database queries. From online to offline data, a user-friendly SSBI setup will enable you to aggregate and analyze every type of data available in one central place c) Excel just isn't cutting it: When you can't access the data you need it is still common for Users to shortcut their data management processes to get instant answers. Excel is a common go-to tool for short cuts. A 2011 study by Forrester showed that 88 percent of Users rely "heavily" or "exclusively on spreadsheets. Old-style BI is clearly not meeting their needs. While 2011 was years ago, it is safe to assume that in 2020 the number is still high, especially when it comes to ad hoc analysis. Though Excel was never intended to be a Bl tool, it can be tempting to depend on spreadsheets for workflow from familiarity. Relying on Excel is a risky proposition because it requires a lot of manual data entry, leaving it prone to errors, especially when spreadsheets are passed around your organization. The days of exporting large datasets to spreadsheets to be converted into charts and pivot tables are over. With the new generation of Self-service BI tools, data analysis within spreadsheets belongs to the past. d) Democratizing data: The entire team no longer needs to rely on a data analyst poring over spreadsheets, looking for clues to improve services. Self-service BI tools provide information for everyone involved in the project in an easy-to-share format. Because all team members can easily access the same data, SSBI makes the decision-making process more inclusive, puts everyone involved in the decision-making process on the same page, allowing everyone an equal opportunity to impress the boss and customers. e) Mobile matters: Business Intelligence means having the right data at the right time to make the right calls. Nowadays, that means accessing your data wherever you are, which often involves the use of a mobile device. Taking BI mobile is critical for companies to respond to issues in real-time and make informed decisions immediately. Continual access to dashboards means that you and your colleagues will always have the information you need, regardless of your location. f) Free up the IT department: Old data, or different data structures, often bring IT department back into the picture to clean up the data to make it functional. A common issue with this, is that it impacts how timelines are delivered. Across industries, the vast minority of Users get their reports within one and three days of the initial request. With the fast pace of today's business environment, a day long wait can render the initial question moot. g) Data visualization is key: Data visualization plays a key role in buying decisions. Combining millions of data points and advanced math concepts to make a story that people will understand is not an easy task. When it comes to data visualization, simplicity and clear data presented will increase the level of audience engagement. The use of dashboards has become indispensable to filter and manage the vast amounts of data that modern companies generate. h) Telling your story: While data visualization is part of your story, the topic of storytelling deserves to be a stand-alone bullet point. Data will be remembered only if presented in the right way and the right context. A story will always be more effective and memorable than a spreadsheet or a complicated dashboard. Have you been confronted with dashboards chock full of analytics that is not relevant to your question? It is a challenge to make data-driven decisions if you do not understand the story behind the numbers. Modern dashboard software enables the report creator to better understand the underlying data, and craft more compelling stories. Accessible anywhere, through mobile devices and desktops, dashboards drive better team collaboration and greater insights than traditional solutions like PowerPoint or Excel ever could. i) Limit your risk: Many SSBI providers offer their tool as a software-as-a-service (SaaS). In comparison to traditional BI tools, this type of software provides a solution without the time, expense, and hassle of developing a BI infrastructure internally. There are no upfront investments required to start and no maintenance charges. Typically, buyers pay a monthly fee dependent on the required feature set, number of Users and data sources, as well as your individual data volume. Companies no longer need to build their own data centers, hire large IT staff or buy expensive one-time licenses to get started. All you technically need to do is to sign up and commit only for a few months, and easily change the provider. Most vendors also offer a free trial so you can make sure the product is right for your needs before you commit.