Business Intelligence Midterm PDF

Summary

This document is a presentation or lecture outline on Business Intelligence (BI). It covers the process of Business Intelligence, including data transformation to knowledge, advantages of using BI, and users of BI. It also includes a history of BI and its architecture.

Full Transcript


BUSINESS INTELLIGENCE
Dr. Keziban Seckin Codal

OUTLINE
– BI
– The Process of BI
– The Benefits of BI
– The Different Users of BI
– History of BI
– The Architecture of BI
– BI Governance
– Successful BI Implementation
– BI Product Providers

BUSINESS INTELLIGENCE
Business intelligence (BI) is a business management term that refers to the applications and technologies used to gather, provide access to, and analyze data and information about a company's operations.

DATA, INFORMATION AND KNOWLEDGE
– Data: a collection of raw value elements or facts used for calculating, reasoning, or measuring.
– Information: the result of collecting and organizing data in a way that establishes relationships between data items, which thereby provides context and meaning.
– Knowledge: the concept of understanding information based on recognized patterns in a way that provides insight into information.

PROCESS OF BI
Data -> information -> knowledge -> actionable plans
– Data -> information: the process of determining what data is to be collected and managed, and in what context.
– Information -> knowledge: the process involving the analytical components, such as data warehousing, online analytical processing, data quality, data profiling, business rule analysis, and data mining.
– Knowledge -> actionable plans: the most important aspect of a BI process.

THE BENEFITS OF BI
– Time savings
– Single version of truth
– Improved strategies and plans
– Improved tactical decisions
– More efficient processes
– Cost savings
– Faster, more accurate reporting
– Improved decision making
– Improved customer service
– Increased revenue

THE DIFFERENT USERS OF BI
There are many different users who can benefit from business intelligence:
– Executives: those who focus on the overall business
– Business decision makers: usually focused on single areas of the business (finance, HR, manufacturing, and so forth)
– Information workers: typically managers or staff working in the back office
– Line workers: employees who might use BI without knowing it
– Analysts:
  employees who will perform extensive data analysis

HISTORY OF BI
[Figure: generations of analytics and their business use cases. Source: Bill O'Connell, IBM, Aug 2007.]
– Traditional analytics (focus on what did happen): 1st-generation analytics (query and reporting) and 2nd-generation analytics (OLAP, data warehousing). Turning data into information is limited by the relationships which the end user already knows to look for.
– "New traditional" analytics: "2.5-gen" analytics (in-memory OLAP, search-based) and data mining. Data mining determines why something happened by unearthing relationships that the end user may not have known existed.
– Advanced analytics/optimization (focus on what will happen): predictive analytics and rules, that is, analytic applications that apply statistical relationships in the form of rules; real-time thresholds.
– Stream analytics* (focus on what is happening right now): real-time, continuous, sequential analysis, ranging from basic to advanced analytics.
– Example target solutions: fraud detection/risk, CRM analytics, supply chain optimization, RFID/spatial data, other high-volume use cases.
*In lieu of stream analytics, "embedded analytics," although architecturally different, could potentially play the same role.

THE ARCHITECTURE OF BI
BI's architecture and components:
– Data Warehouse
– Business Analytics
– Performance and Strategy
– User Interface (dashboards and other information broadcasting tools)

Data Warehouse
A data warehouse is a pool of data produced to support decision making; it is also a repository of current and historical data of potential interest to managers throughout the organization.

Business Analytics
End users can work with data and information in a data warehouse by using a variety of tools and techniques.
These tools and techniques fit into two major categories:
– Reports and queries
– Data, text, and web mining and other sophisticated mathematical and statistical tools

Performance and Strategy
Business performance management encompasses three key components:
– A set of integrated, closed-loop management and analytic processes, supported by technology, that addresses financial as well as operational activities
– Tools for businesses to define strategic goals and then measure and manage performance against those goals
– A core set of processes, including financial and operational planning, consolidation and reporting, modeling, analysis, and monitoring of key performance indicators, linked to organizational strategy

User Interface
Dashboards provide a comprehensive visual view of corporate performance measures, trends, and exceptions. Dashboards present graphs that show actual performance compared to desired metrics.

BI GOVERNANCE
A typical set of issues for the BI governance team to address:
– Creating categories of projects (investment, business opportunity, strategic, mandatory, etc.)
– Defining criteria for project selection
– Determining and setting a framework for managing project risk
– Managing and leveraging project interdependencies
– Continually monitoring and adjusting the composition of the portfolio

Transaction Processing Versus Analytic Processing
– Online transaction processing (OLTP) systems: systems that handle a company's routine ongoing business.
– Online analytic processing (OLAP): a capability of information systems that supports interactive examination of large amounts of data from many perspectives.

SUCCESSFUL BI IMPLEMENTATION
At the business and organizational level, strategic and operational objectives must be defined while considering the organizational skills available to achieve those objectives. Upper management must consider BI best practices and have plans in place to prepare the organization for change.
Whether the culture is amenable to change is important for a successful adaptation process. This process moves from the top down in the organization.

One important step in that process is to adapt the IS organization, including the skill sets of the potential classes of users. Another critical issue for BI implementation success is the integration of several BI projects with the other IT systems in the organization and those of its business partners.

SUCCESSFUL BI IMPLEMENTATION
– Appropriate planning and alignment with the business strategy
– Establish a BI Competency Center (BICC) within the company
– Real-time, on-demand BI is attainable
– Developing or acquiring BI systems
– Justification and cost/benefit analysis
– Security and protection of privacy
– Integration of systems and applications

BI PRODUCT PROVIDERS
– Microsoft
– SAS
– IBM
– Oracle
– Sybase
– Business Objects

References
– Business Intelligence: A Managerial Approach, 2011, Turban et al.
– Business Intelligence: Data Mining and Optimization for Decision Making, 2010, Vercellis C.
– Business Intelligence for Telecommunications, 2007, Pareek D.
– Information Technology for Management, 2015, Turban et al.

Business Intelligence: A Managerial Approach (2nd Edition)
Chapter 2: Data Warehousing
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall

DW Definition and Concept
Using real-time data warehousing (RDW) in conjunction with decision support system (DSS) and BI tools is an important way to conduct business processes.

What Is a Data Warehouse?
A data warehouse (DW) is a pool of data produced to support decision making; it is also a repository of current and historical data of potential interest to managers throughout the organization. Data are usually structured to be available in a form ready for analytical processing activities.

Characteristics of DW
– Subject-oriented.
  Data are organized by detailed subject, such as sales, products, or customers.
– Integrated. Data come from different sources and must be put into a consistent form. DWs must deal with naming conflicts and discrepancies among units of measure.
– Time variant (time series). A DW usually contains historical data (e.g., daily, weekly, monthly), except in real-time systems. Historical data make it possible to detect trends, deviations, and long-term relationships for forecasting and comparisons, leading to decision making.
– Nonvolatile. Users cannot change data in DWs. Obsolete data are discarded, and changes are recorded as new data.

Additional DW Characteristics
– Web-based. DWs are typically web-based applications.
– Relational/multidimensional. DWs use either a relational structure or a multidimensional structure.
– Client/server. DWs use a client/server architecture to provide easy access to end users.
– Real-time. Newer DWs provide real-time, or active, data access and analysis capabilities.
– Include metadata. DWs contain metadata (data about data) describing how the data are organized.

Parts of DWs
1. Data marts. A data mart is a subset of a DW, typically consisting of a single subject area (e.g., marketing, operations).
   – Dependent data mart: a subset that is created directly from a data warehouse.
   – Independent data mart: a small data warehouse designed for a strategic business unit or a department; its source is not the DW.
2. Operational data stores (ODS). An ODS provides a fairly recent form of customer information file (CIF) and is used for short-term decisions involving mission-critical applications.
3. Enterprise data warehouses (EDW). An EDW is a large-scale DW that is used across the enterprise for decision support.
EDWs are used to provide data for many types of DSS, including:
– customer relationship management (CRM)
– supply chain management (SCM)
– business performance management (BPM)
– business activity monitoring (BAM)
– product lifecycle management (PLM)
– revenue management
– knowledge management systems (KMS), etc.

Metadata
Metadata are data about data. In a data warehouse, metadata describe the contents of the data warehouse and the manner of its acquisition and use. Types include syntactic metadata, structural metadata, and semantic metadata.

DW Process Overview
Components of the data warehousing process:
– Data sources. Data are sourced from multiple independent operational "legacy" systems and possibly from external data providers (such as the U.S. Census). Data may also come from an OLTP or ERP system, or from Web data.
– Data extraction and transformation. Data are extracted and properly transformed using custom-written or commercial software called ETL.
– Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse and/or data marts.
– Comprehensive database. Essentially, this is the EDW, which supports all decision analysis by providing relevant summarized and detailed information originating from many different sources.
– Metadata. Metadata are maintained so that they can be assessed by IT personnel and users. Metadata include software programs about data and rules for organizing data summaries that are easy to index and search, especially with Web tools.
– Middleware tools.
  Middleware tools enable access to the data warehouse. Power users such as analysts may write their own SQL queries.

A Data Warehouse Framework and Views
[Figure: data warehouse framework and views. Labels recoverable from the slide: data sources (legacy, OLTP, external, operational systems/data); preparation steps (select, extract, transform, integrate, maintain); enterprise data warehouse with metadata; replication to data marts; target database; API/middleware access; applications (custom applications, production reporting tools, relational query tools, OLAP, data visualization, Web browsers, data mining).]

2.3 DATA WAREHOUSING ARCHITECTURES
Data warehousing architectures are commonly called client/server or n-tier architectures (3-tier and 2-tier, or 1-tier).
A DW architecture divides into three parts:
1. The data warehouse itself
2. Data acquisition (back-end) software
3. Client (front-end) software

3-Tier DW Architecture
The advantage of the three-tier architecture is its separation of the functions of the data warehouse, which eliminates resource constraints and makes it possible to easily create data marts.

2-Tier DW Architecture
The DSS engine physically runs on the same hardware platform as the data warehouse. Therefore, it is more economical than the three-tier structure.

Web-based DW
[Figure: Web-based DW. A client Web browser connects over the Internet, intranet, and/or extranet to a Web server, which serves Web pages and works with an application server and the data warehouse.]

Alternative Data Warehousing Architectures
Data warehouse architecture design viewpoints can be categorized into enterprise-wide data warehouse (EDW) design and data mart (DM) design.
A.
Independent data marts architecture. This is the simplest and least costly architecture. The data marts are developed to operate independently of each other to serve the needs of individual organizational units.
[Figure: independent data marts architecture. Source systems feed ETL processes that load separate data marts (human resources, sales, financial) on RDBMS or MDB platforms, each with its own local metadata.]
B. Data mart bus architecture. This architecture is a viable alternative to independent data marts; the individual marts are linked to each other via some kind of middleware.
C. Hub-and-spoke architecture (Corporate Information Factory). This is the most famous data warehousing architecture today. It focuses on building a scalable and maintainable infrastructure that includes a centralized data warehouse and several dependent data marts.
D. Centralized data warehouse architecture. There are no dependent data marts; users have access to all data in the data warehouse. It reduces the amount of data the technical team has to transfer or change, therefore simplifying data management and administration.
E. Federated architecture. It uses all possible means to integrate analytical resources from multiple sources to meet changing needs or business conditions.

Ten factors that potentially affect the architecture selection decision:
1. Information interdependence between organizational units
2. Upper management's information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues

2.4 DATA INTEGRATION AND THE EXTRACTION, TRANSFORMATION, AND LOAD (ETL) PROCESSES
A decision maker typically needs access to multiple sources of data that must be integrated. As data warehouses grow in size, the issues of integrating data grow as well.

Data Integration
Data integration comprises three major processes:
– Data access (i.e., the ability to access and extract data from any data source)
– Data federation (i.e., the integration of business views across multiple data stores)
– Change capture (i.e., based on the identification, capture, and delivery of the changes made to enterprise data sources)

Data Integration Techniques
– Enterprise application integration (EAI)
– Service-oriented architecture (SOA)
– Enterprise information integration (EII)
– Extraction, transformation, and load (ETL)

Extraction, Transformation, and Load (ETL)
ETL is the heart of the DW. IT managers are often faced with challenges because the ETL process typically consumes 70 percent of the time in a data-centric project.
ETL is composed of:
– Extraction: reading data from one or more databases
– Transformation: converting the extracted data from its previous form into the form it needs to be in so that it can be placed into a data warehouse or simply another database
– Load: putting the data into the data warehouse
The purpose of the ETL process is to load the warehouse with integrated and cleansed data.
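The three ETL steps described above can be illustrated with a small sketch. This is only a toy example under stated assumptions, not how a commercial ETL tool works: the two source lists, their field names, and the `sales` table are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical records from two operational systems. The inconsistent field
# names and units are invented for illustration; they are not from the lecture.
SOURCE_A = [{"cust": " Alice ", "amount_usd": "120.50"}]
SOURCE_B = [{"customer_name": "BOB", "amount_cents": 9900}]


def extract():
    """Extraction: read raw records from each source, tagged by origin."""
    return [("A", r) for r in SOURCE_A] + [("B", r) for r in SOURCE_B]


def transform(tagged_rows):
    """Transformation: resolve naming conflicts and unit discrepancies."""
    cleaned = []
    for source, row in tagged_rows:
        if source == "A":
            name, amount = row["cust"], float(row["amount_usd"])
        else:
            name, amount = row["customer_name"], row["amount_cents"] / 100
        cleaned.append((name.strip().title(), round(amount, 2)))
    return cleaned


def load(conn, rows):
    """Load: write the integrated, cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(conn, transform(extract()))
rows = conn.execute(
    "SELECT customer, amount_usd FROM sales ORDER BY customer"
).fetchall()
print(rows)
```

The transformation step is where the "integrated" characteristic of a DW shows up: both sources end up with one customer naming style and one unit of measure before anything is loaded.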
Several issues affect whether an organization will purchase data transformation tools or build the transformation process itself:
– Data transformation tools are expensive.
– Data transformation tools may have a long learning curve.
– It is difficult to measure how the IT organization is doing until it has learned to use the data transformation tools.

The following are some of the important criteria in selecting an ETL tool:
– Ability to read from and write to an unlimited number of data source architectures
– Automatic capturing and delivery of metadata
– A history of conforming to open standards
– An easy-to-use interface for the developer and the functional user

2.5 DATA WAREHOUSE DEVELOPMENT
A data warehouse provides several benefits that can be classified as direct and indirect.
Direct benefits include the following:
– End users can perform extensive analysis in numerous ways.
– A consolidated view of corporate data (i.e., a single version of the truth) is possible.
– Better and more timely information is possible.
– Enhanced system performance can result.
– Data access is simplified.
Indirect benefits: data warehouses enhance business knowledge, present competitive advantage, improve customer service and satisfaction, facilitate decision making, and help in reforming business processes.

Data Warehouse Development Approaches
THE INMON MODEL: THE EDW APPROACH
– Bill Inmon is often called "the father of data warehousing."
– Inmon's approach emphasizes top-down development.
– The EDW approach does not preclude the creation of data marts.
THE KIMBALL MODEL: THE DATA MART APPROACH
– It is a "plan big, build small" approach.
– A data mart is a subject-oriented or department-oriented data warehouse.
– It is a scaled-down version of a data warehouse.

Representation of Data in Data Warehouse
– Dimensional modeling: a retrieval-based system that supports high-volume query access.
– Star schema: the most commonly used and the simplest style of dimensional modeling.
  – It contains a fact table surrounded by and connected to several dimension tables.
  – The fact table contains the descriptive attributes (numerical values) needed to perform decision analysis and query reporting.
  – Dimension tables contain classification and aggregation information about the values in the fact table.
– Snowflake schema: an extension of the star schema where the diagram resembles a snowflake in shape.

[Figure: star vs. snowflake schema]

Analysis of Data in Data Warehouse: OLAP versus OLTP
– OLTP (online transaction processing) is a term used for a transaction system that is primarily responsible for capturing and storing data related to day-to-day business functions. The main focus is on the efficiency of routine tasks.
– OLAP (online analytic processing) is a system designed to address the need for information extraction by providing effective and efficient ad hoc analysis of organizational data. The main focus is on effectiveness.

[Figure: OLTP vs. OLAP]

OLAP Operations
The main operational structure in OLAP is based on a concept called the cube. A cube in OLAP is a multidimensional data structure (actual or virtual) that allows fast analysis of data.
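To make the star schema and the cube idea more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), columns, and sample figures are all hypothetical; the point is only that one fact table joins to several dimension tables, and that grouping facts by dimension attributes yields the cells of a small two-dimensional "virtual" cube.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: classification and aggregation attributes.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

-- Fact table: numeric measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    revenue    REAL);

INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Hardware'), (2, 'Mouse', 'Hardware'),
    (3, 'Antivirus', 'Software');
INSERT INTO dim_date VALUES (1, 2010, 'Q1'), (2, 2010, 'Q2');
INSERT INTO fact_sales VALUES
    (1, 1, 1000.0), (2, 1, 50.0), (3, 1, 200.0),
    (1, 2, 1500.0), (3, 2, 300.0);
""")

# Aggregate the facts along two dimensions (category x quarter).
# Each result row corresponds to one cell of a small virtual cube.
cube = conn.execute("""
    SELECT p.category, d.quarter, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date    AS d ON d.date_id    = f.date_id
    GROUP BY p.category, d.quarter
    ORDER BY p.category, d.quarter
""").fetchall()

for category, quarter, revenue in cube:
    print(category, quarter, revenue)
```

A dedicated OLAP engine would typically precompute or cache such aggregates; the query here recomputes them on demand, which is enough to show the structure.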
OLAP Operations (cont)
– Slice: a subset of a multidimensional array
– Dice: a slice on more than two dimensions
– Drill down/up: navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
– Roll up: computing all of the data relationships for one or more dimensions

2.6 DW Implementation Issues
The following is a list of major tasks that could be performed:
1. Establishment of service-level agreements and data-refresh requirements
2. Identification of data sources and their governance policies
3. Data quality planning
4. Data model design
5. ETL tool selection
6. Relational database software and platform selection
7. Data transport
8. Data conversion
9. Reconciliation process
10. Purge and archive planning
11. End-user support

DW Implementation Guidelines
– The project must fit with corporate strategy and business objectives
– There must be complete buy-in to the project by executives, managers, and users
– It is important to manage user expectations about the completed project
– The data warehouse must be built incrementally
– Build in adaptability, flexibility, and scalability
– The project must be managed by both IT and business professionals
– Only load data that have been cleansed and are of a quality understood by the organization
– Do not overlook training requirements
– Be politically aware
Successful DW Implementation: Things to Avoid
– Starting with the wrong sponsorship chain
– Setting expectations that you cannot meet
– Engaging in politically naive behavior
– Loading the data warehouse with information just because it is available
– Believing that data warehousing database design is the same as transactional database design
– Choosing a data warehouse manager who is technology oriented rather than user oriented
– Focusing on traditional internal record-oriented data and ignoring the value of external data and of text, images, etc.
– Delivering data with confusing definitions
– Believing promises of performance, capacity, and scalability
– Believing that your problems are over when the data warehouse is up and running
– Focusing on ad hoc data mining and periodic reporting instead of alerts

2.7 Real-Time Data Warehousing
Traditionally, DWs work mainly on historical data to support strategic and tactical decision making. For many businesses, making fast and consistent decisions across the enterprise requires real-time data warehousing. Decision support has become operational.
Real-time data warehousing (RDW), also known as active data warehousing (ADW), is the process of loading and providing data via the data warehouse as they become available.
Traditional vs. Real-Time Data Warehousing
Traditional | Real-time
Strategic decisions only | Strategic, tactical, and operational decisions
Results sometimes hard to measure | Results measured with operations
Daily, weekly, monthly data currency acceptable; summaries often appropriate | Only comprehensive detailed data available within minutes is acceptable
Moderate user concurrency | High number (1,000 or more) of users accessing and querying the system simultaneously
Highly restrictive reporting used to confirm or check existing processes and patterns; often uses predeveloped summary tables or data marts | Flexible ad hoc reporting, as well as machine-assisted modeling (e.g., data mining) to discover new hypotheses and relationships
Power users, knowledge workers, internal users | Operational staffs, call centers, external users

Critical Concerns about Real-Time BI
– Not all data should be updated continuously.

2.8 DW Administration and Security Issues
– Establishing effective corporate security policies and procedures. An effective security policy should start at the top, with executive management, and should be communicated to all individuals within the organization.
– Implementing logical security procedures and techniques to restrict access. This includes user authentication, access controls, and encryption technology.
– Limiting physical access to the data center environment.
– Establishing an effective internal control review process with an emphasis on security and privacy.
The Future of DW
Sourcing:
– Open source software
– SaaS (software as a service)
– Cloud computing
– DW appliances
Infrastructure:
– Real-time DW
– Data management practices/technologies
– In-memory processing ("super-computing")
– New DBMS
– Advanced analytics

DATA AND TEXT MINING

Data Warehouse
– A data warehouse, or enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence.
– Data warehouses are central repositories of integrated data from one or more disparate sources.

OLTP & OLAP

Data Mining Concepts and Applications
Six factors behind the sudden rise in popularity of data mining:
1. General recognition of the untapped value in large databases
2. Consolidation of database records tending toward a single customer view
3. Consolidation of databases, including the concept of an information warehouse
4. Reduction in the cost of data storage and processing, providing for the ability to collect and accumulate data
5. Intense competition for a customer's attention in an increasingly saturated marketplace
6. The movement toward the de-massification of business practices

Data mining: a process that uses statistical, mathematical, artificial intelligence, and machine-learning techniques to extract and identify useful information and subsequent knowledge from large databases.

How data mining works:
– Data mining tools find patterns in data and may even infer rules from them.
– Data mining tasks can be classified into three main categories:
  1. Prediction
  2. Association
  3. Clustering

[Figure: a taxonomy for data mining tasks]

– Classification: supervised induction used to analyze the historical data stored in a database and to automatically generate a model that can predict future behavior. Common tools used for classification are:
  – Neural networks
  – Decision trees
  – If-then-else rules
– Clustering: partitioning a database into segments in which the members of a segment share similar qualities.
– Association: a category of data mining algorithm that establishes relationships about items that occur together in a given record.
– Sequence discovery: the identification of associations over time.
– Visualization: can be used in conjunction with data mining to gain a clearer understanding of many underlying relationships.
– Regression: a well-known statistical technique that is used to map data to a prediction value.
– Forecasting: estimates future values based on patterns within large sets of data.
– Hypothesis-driven data mining: begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition.
– Discovery-driven data mining: finds patterns, associations, and relationships among the data in order to uncover facts that were previously unknown or not even contemplated by an organization.

Data Mining Applications
– Marketing
– Banking
– Retailing and sales
– Manufacturing and production
– Brokerage and securities trading
– Insurance
– Computer hardware and software
– Airlines
– Health care
– Broadcasting
– Homeland security
– Government and defense
– Police

Data Mining Techniques and Tools
Data mining tools and techniques can be classified based on the structure of the data and the algorithms used:
– Case-based reasoning
– Neural computing
– Intelligent agents
– Genetic algorithms
– Other tools
  – Rule induction
  – Data visualization

Data Mining Project Processes
[Figure: data mining project processes]
Text Mining
Text mining: the application of data mining to nonstructured or less structured text files. It entails the generation of meaningful numerical indices from the unstructured text and then processing these indices using various data mining algorithms.

Text mining helps organizations:
– Find the "hidden" content of documents, including additional useful relationships
– Relate documents across previously unnoticed divisions
– Group documents by common themes

Applications of text mining:
– Automatic detection of e-mail spam or phishing through analysis of the document content
– Automatic processing of messages or e-mails to route a message to the most appropriate party to process that message
– Analysis of warranty claims, help desk calls/reports, and so on to identify the most common problems and relevant responses
– Analysis of related scientific publications in journals to create an automated summary view of a particular discipline
– Creation of a "relationship view" of a document collection
– Qualitative analysis of documents to detect deception

How to mine text:
1. Eliminate commonly used words (stop-words)
2. Replace words with their stems or roots (stemming algorithms)
3. Consider synonyms and phrases
4. Calculate the weights of the remaining terms

Web Mining
Web mining: the discovery and analysis of interesting and useful information from the Web, about the Web, and usually through Web-based tools.
– Web content mining: the extraction of useful information from Web pages.
– Web structure mining: the development of useful information from the links included in Web documents.
– Web usage mining: the extraction of useful information from the data being generated through Web page visits, transactions, etc.
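The four "how to mine text" steps listed earlier in this section can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the stop-word list and synonym map are tiny hypothetical samples, `crude_stem` is a naive suffix stripper (not a real stemming algorithm such as Porter's), and the weight is plain relative term frequency rather than any particular weighting scheme from the lecture.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in"}  # tiny sample list
SYNONYMS = {"mail": "email"}  # hypothetical synonym map


def crude_stem(word):
    """Naive suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_weights(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]   # step 1: stop-words
    tokens = [crude_stem(t) for t in tokens]              # step 2: stemming
    tokens = [SYNONYMS.get(t, t) for t in tokens]         # step 3: synonyms
    counts = Counter(tokens)                              # step 4: weights
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


weights = term_weights("The claims and the claim of filtering spam mails")
print(weights)
```

The resulting dictionary is one simple form of the "numerical indices" mentioned in the text mining definition; such term weights could then be fed to a clustering or classification algorithm.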
Uses for Web mining:
– Determine the lifetime value of clients
– Design cross-marketing strategies across products
– Evaluate promotional campaigns
– Target electronic ads and coupons at user groups
– Predict user behavior
– Present dynamic information to users
