PREDICTIVE ANALYTICS-REVIEWER.docx

Document Details

Uploaded by RaptRetinalite2257

Polytechnic University of the Philippines

Tags

predictive analytics, data mining, business intelligence, statistics

Full Transcript

**PREDICTIVE ANALYTICS FINALS**

**PREDICTIVE ANALYTICS**

**Common Uses of Predictive Analytics in Business**

**DATA MINING DRAWS FROM:**

- Ex: Natural Language Processing
- Ex: Employee Retention and Turnover

**TRADITIONAL TECHNIQUES**

**Common Approaches in Data Mining**

**BENEFITS OF PREDICTIVE ANALYTICS**

**DATA MINING**

+----------------------------------+--------------------------------------------+
| **PROS**                         | **CONS**                                   |
+==================================+============================================+
| Customer Relationship Management | Expensive in the initial stage             |
+----------------------------------+--------------------------------------------+
| Forecasting                      | Security of critical data                  |
+----------------------------------+--------------------------------------------+
| Competitive Advantage            | Data mining violates user privacy          |
+----------------------------------+--------------------------------------------+
| Attract Customers                | Lack of precision or incorrect information |
+----------------------------------+--------------------------------------------+
| Anomaly Detection                |                                            |
+----------------------------------+--------------------------------------------+

**DATA MINING TECHNIQUES**

**TYPES OF DATA MINING ALGORITHMS**

**Classification** - predicts categorical outcomes

**Regression** - predicts numerical values

**TRAINING DATASETS**

**LARGER TRAINING DATASETS**

**Association Analysis** - attempts to find the relationships between items in a dataset

**Data Mining Tools-** software that is usually downloaded for free or bought from third-party providers.

**Cross-Industry Standard Process for Data Mining (CRISP-DM)**

**7 ESSENTIAL STEPS OF THE DATA MINING PROCESS**

**SIX-PHASE PROCESS/LIFE CYCLE OF THE DATA MINING PROCESS**
**BUSINESS OBJECTIVES-** These should be determined first to ensure the project addresses the right questions.

**STATISTICAL ANALYSIS**

**TOOLS IN PREDICTIVE ANALYTICS**

**PREDICTIVE ANALYTICS FRAMEWORK KEY STEPS:**

**MODULE 2: INTRODUCTION TO DATA PREPROCESSING**

"Data in the real world is dirty"

**DATA PREPROCESSING-** aims at assessing and improving the quality of data for secondary statistical analysis.

**METHODS OF DATA PREPROCESSING**

**DATA IMPUTATION-** a method for retaining the majority of a dataset's data and information by substituting missing data with a different value.

**3 MAJOR ISSUES TO CONSIDER DURING DATA INTEGRATION**

**DATA TRANSFORMATION TASKS**

**2 TYPES OF DATA ENCODING**

**BINARY ENCODING-** transformation of categorical data into numerical data by using the values 0 and 1 to indicate the absence or presence of each category.

**CLASS-BASED ENCODING-** replaces one categorical variable with one new numerical variable, replacing each category of the categorical variable with a numerical value.

***TYPES OF SAMPLING***

**B. FEATURE SUBSET SELECTION**

**FEATURE SUBSET SELECTION TECHNIQUES**

**C. FEATURE CREATION**

**FEATURE CREATION METHODOLOGIES**
+-------------------------------+------------------------------+
| **DATA SMOOTHING TECHNIQUES** | **IMPORTANCE**               |
+===============================+==============================+
| Moving Averages               | Identifying Trends           |
|                               |                              |
| Exponential Smoothing         | Removing Noise               |
|                               |                              |
| Seasonal Smoothing            | Handling Outliers            |
|                               |                              |
| Holt-Winters Method           | Improving Seasonal Forecasts |
+-------------------------------+------------------------------+

**APPLICATION IN PREDICTIVE ANALYTICS**

**Employee Retention:** Predictive analytics can identify employees who are likely to leave by analyzing indicators such as job satisfaction, engagement, tenure, and salary. This enables organizations to adopt customized retention strategies.

**Recruitment:** Predictive models assist in finding individuals who are most likely to succeed in a specific role by analyzing their previous job performance, education, abilities, and other relevant factors.
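The first two techniques in the data smoothing table above can be illustrated with a short sketch. This is only a minimal example, not the method used in the course: the function names, the `window` and `alpha` parameters, and the attrition numbers are all assumptions made for illustration.

```python
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Simple moving average: one output value per full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def exponential_smoothing(values: List[float], alpha: float) -> List[float]:
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [values[0]]  # seed the series with the first observation
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly attrition counts (illustrative data only).
attrition = [4, 6, 5, 9, 7, 8]
ma = moving_average(attrition, window=3)
es = exponential_smoothing(attrition, alpha=0.5)
print(ma)
print(es)
```

A larger `window` or a smaller `alpha` smooths more aggressively, which removes more noise but reacts more slowly to a real change in the trend.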
***Data Preprocessing:***

+------------------------------------+---------------+
| **CHALLENGES**                     | **SOLUTIONS** |
+====================================+===============+
| Data Privacy Concerns              |               |
+------------------------------------+---------------+
| Bias in Data and Algorithms        |               |
+------------------------------------+---------------+
| Integrating Disparate Data Sources |               |
+------------------------------------+---------------+
| Skill Gaps in Data Analytics       |               |
+------------------------------------+---------------+

**MODULE 3: SUPERVISED LEARNING**

**SUPERVISED LEARNING**

+-------------------------+------------------------+---------------------------+
| **SUPERVISED LEARNING** | **SIMILARITIES**       | **UNSUPERVISED LEARNING** |
+=========================+========================+===========================+
| Uses labeled training   | Both are subsets of AI | Doesn't use labeled data  |
| data                    | that use machine       | and is often used to      |
|                         | learning algorithms to | understand relationships  |
|                         | solve problems         | within datasets           |
+-------------------------+------------------------+---------------------------+
| Used to classify data   |                        |                           |
| or make predictions     |                        |                           |
+-------------------------+------------------------+---------------------------+

**2 CATEGORIES OF SUPERVISED LEARNING**

**I. CLASSIFICATION**

**ALGORITHMS**

  **ADVANTAGES**          **DISADVANTAGES**
  ----------------------- --------------------------
  High Accuracy           Complexity
  Handling Missing Data   Lack of Interpretability

**TYPES OF ENSEMBLES**
**HYPERPLANE-** the boundary that separates different classes of data points in an SVM model

**TYPES OF SVM CLASSIFIERS**

**Linear SVMs**

**Non-Linear SVMs**

**CLASSIFICATION MODEL EVALUATION**

**MODEL EVALUATION-** a methodology that helps to find the model that best represents the data and to estimate how well the chosen model will work in the future.

**UNDERFITTING**

**OVERFITTING**

**II. REGRESSION**

**USES OF REGRESSION ANALYSIS**

**INDICATOR VARIABLE-** a variable that assigns levels to a qualitative variable (also known as a dummy variable)

**SIMPLE LINEAR REGRESSION MODEL**

**MULTIPLE LINEAR REGRESSION MODEL**

**REGRESSION MODEL EVALUATION**

**EVALUATION METRICS FOR REGRESSION MODELS**

**III. INDICATOR VARIABLES**

**MULTICOLLINEARITY**

**TYPES OF MULTICOLLINEARITY**

**LOGISTIC REGRESSION**

**MAXIMUM LIKELIHOOD ESTIMATION IN LOGISTIC REGRESSION**

**MODULE 4: UNSUPERVISED LEARNING**

**SUPERVISED LEARNING-** uses labeled data to train models and make predictions. Ex: Classification and Regression

**UNSUPERVISED LEARNING-** uses unlabeled data to discover patterns or structures.
Ex: Clustering and Dimensionality Reduction

**ASSOCIATION RULE MINING**

**MARKET BASKET ANALYSIS**

**ITEM SET**

**SUPPORT COUNT**

**SUPPORT**

**FREQUENT ITEM SET**

**ASSOCIATION RULE-** an implication expression of the form X -> Y, where X and Y are itemsets

**SOLVING ASSOCIATION RULE MINING PROBLEMS**

**BRUTE-FORCE APPROACH (Simplest way to do it)**

**FORMULAS:**

***Support = Frequency(X, Y) / N***

***Confidence = Frequency(X, Y) / Frequency(X)***

**TWO-STEP APPROACH**

+---------------------------------+------------------------------------+
| **FREQUENT ITEMSET GENERATION** | **RULE GENERATION**                |
+=================================+====================================+
| Generate all itemsets whose     | Generate high-confidence rules     |
| support > minsup                | from each frequent itemset, where  |
|                                 | each rule is a binary partitioning |
|                                 | of a frequent itemset.             |
+---------------------------------+------------------------------------+
| Computationally expensive       |                                    |
+---------------------------------+------------------------------------+

**APRIORI PRINCIPLE**

**RULE GENERATION**

**LIFT RATIO**

**SEQUENTIAL PATTERN MINING**

**SEQUENCE**

**SUBSEQUENCE**

**THE SPADE ALGORITHM** **(SEQUENTIAL PATTERN DISCOVERY USING EQUIVALENCE CLASSES)**

**APPLICATION OF THE SPADE ALGORITHM**

+------------------------------+---------------------------+-----------------------+
| **DATA SET**                 | **SEQUENTIAL DATA**       | **BENEFITS**          |
+==============================+===========================+=======================+
| - Employee ID                | - Equivalence Classes     | - Career Development  |
| - Job Title                  | - Frequent Pattern Mining | - Talent Management   |
| - Department                 | - Analysis                | - Resource Allocation |
| - Date of Promotion          | - Decision-Making         |                       |
| - Salary                     |                           |                       |
| - Training Programs Attended |                           |                       |
+------------------------------+---------------------------+-----------------------+

**HIERARCHICAL CLUSTERING**
**DENDROGRAM-** a tree-like diagram that records the sequence of merges or splits

**STRENGTHS OF HIERARCHICAL CLUSTERING**

**DENDROGRAM TREE-** a visual representation that uses different branches to show relationships

**2 MAIN TYPES OF HIERARCHICAL CLUSTERING**

+-------------------------------------+------------------------------------+
| **AGGLOMERATIVE HIERARCHICAL        | **DIVISIVE HIERARCHICAL            |
| CLUSTERING (BOTTOM-UP)**            | CLUSTERING (TOP-DOWN)**            |
+=====================================+====================================+
| - Start with every data point as    | - Start with all data points in    |
|   its own cluster.                  |   one big cluster.                 |
| - Gradually combine the closest     | - Gradually split the big cluster  |
|   clusters together, step by step,  |   into smaller ones, step by step, |
|   until everything is in one big    |   until each point is its own      |
|   cluster.                          |   cluster.                         |
| - Most common type of hierarchical  |                                    |
|   clustering, used to group objects |                                    |
|   into clusters.                    |                                    |
+-------------------------------------+------------------------------------+

**CLUSTER SIMILARITY: WARD'S METHOD**

**HIERARCHICAL CLUSTERING: PROBLEMS AND LIMITATIONS**
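The bottom-up (agglomerative) procedure described in this section can be sketched in a few lines. This is a simplified illustration under stated assumptions: it works on one-dimensional points and uses single linkage (the smallest gap between members) as the cluster distance, which is a simpler stand-in for Ward's method named in the notes. The function names and sample values are hypothetical.

```python
from typing import List, Tuple

Cluster = Tuple[float, ...]


def single_linkage(a: Cluster, b: Cluster) -> float:
    """Cluster distance = smallest gap between any two members (assumed linkage)."""
    return min(abs(x - y) for x in a for y in b)


def agglomerate(points: List[float]) -> List[Tuple[Cluster, Cluster, float]]:
    """Repeatedly merge the two closest clusters and return the merge history."""
    clusters: List[Cluster] = [(p,) for p in points]  # every point starts alone
    merges = []
    while len(clusters) > 1:
        # Find the index pair of the two closest clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        dist = single_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical one-dimensional data, for example years of tenure.
history = agglomerate([1.0, 2.0, 6.0, 7.0, 15.0])
for left, right, dist in history:
    print(left, "+", right, "at distance", dist)
```

The returned merge history (which clusters were joined, and at what distance) is the information a dendrogram draws: each merge becomes one branch point, and its height is the merge distance.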
**TEXT MINING**

**NATURAL LANGUAGE PROCESSING (NLP)**

**TEXT MINING TERMINOLOGIES**

**UNSTRUCTURED OR SEMI-STRUCTURED DATA**

**CORPUS (CORPORA)**

**STEMMING**

**STOP WORD**

**TERM**

**TOKENIZING**

**TERM-BY-DOCUMENT MATRIX**

**3 STEPS IN TEXT MINING PROCESS**

  ------------ --------------------------------------------- -----------------------------------------------------
  **STEP 1**   Establish the Corpus                          Collect and organize the specific unstructured data
  **STEP 2**   Create Term-Document Matrix (TDM)             Introduce structure to the corpus
  **STEP 3**   Extract Knowledge from Term-Document Matrix   Discover novel patterns from the T-D matrix
  ------------ --------------------------------------------- -----------------------------------------------------

**SOCIAL MEDIA SENTIMENT ANALYSIS**

**TEXTUAL INFORMATION**

**2 MAIN TYPES OF TEXTUAL INFORMATION**

**SENTIMENT ANALYSIS OR OPINION MINING**

**IMPORTANCE OF OPINIONS**

**WEB AND USER-GENERATED CONTENT IN THE RISE OF SENTIMENT ANALYSIS**

**KEY CONCEPTS OF SOCIAL MEDIA SENTIMENT ANALYSIS**

**OPINION-** The subjective statement or sentiment expressed by a user about a particular subject or entity.

**TARGET-** The specific subject or aspect of the entity that the opinion is directed towards.

**OPINION HOLDER-** The individual or entity who expresses the opinion.

Social media sentiment analysis involves extracting and understanding emotions or opinions expressed in social media posts.

**SENTIMENT CLASSIFICATION-** Determining whether the sentiment of a text is positive, negative, or neutral.

**EMOTION DETECTION-** Identifying specific emotions like joy, anger, or sadness.

**APPLICATION OF SOCIAL MEDIA SENTIMENT ANALYSIS IN THE HR FIELD**
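Steps 1 and 2 of the text mining process listed earlier in this section (establish the corpus, then create the term-document matrix) can be sketched as below. This is a minimal illustration only: the three-document corpus, the tiny stop-word list, and the function names are assumptions, and stemming is left out to keep the example short.

```python
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "is", "a", "and", "to", "of"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def term_document_matrix(corpus: List[str]) -> Dict[str, List[int]]:
    """Map each term to its count in every document (rows = terms, columns = documents)."""
    counts = [Counter(tokenize(doc)) for doc in corpus]
    terms = sorted(set().union(*counts))
    return {term: [c[term] for c in counts] for term in terms}


# Step 1: establish the corpus (hypothetical employee comments).
corpus = [
    "The training is great",
    "The workload is heavy and the pay is low",
    "Great pay, great team",
]

# Step 2: create the term-document matrix.
tdm = term_document_matrix(corpus)
for term, row in tdm.items():
    print(f"{term:10s} {row}")
```

Step 3 (extracting knowledge) would then operate on this matrix, for example by comparing how often a term such as "pay" appears across documents.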

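As a worked illustration of the Module 4 association rule formulas (Support = Frequency(X, Y) / N and Confidence = Frequency(X, Y) / Frequency(X)), the sketch below computes both for one rule. The transactions and function names are hypothetical. The notes list the lift ratio without a formula, so the `lift` function uses the commonly used definition, confidence divided by the support of Y, as an assumption.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def frequency(itemset: FrozenSet[str], transactions: List[Transaction]) -> int:
    """Support count: number of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(x: FrozenSet[str], y: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Support = Frequency(X, Y) / N."""
    return frequency(x | y, transactions) / len(transactions)


def confidence(x: FrozenSet[str], y: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Confidence = Frequency(X, Y) / Frequency(X)."""
    freq_x = frequency(x, transactions)
    return frequency(x | y, transactions) / freq_x if freq_x else 0.0


def lift(x: FrozenSet[str], y: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Lift = Confidence(X -> Y) / Support(Y) (assumed standard definition)."""
    support_y = frequency(y, transactions) / len(transactions)
    return confidence(x, y, transactions) / support_y if support_y else 0.0


# Hypothetical market-basket data (illustrative only).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]

X, Y = frozenset({"diapers"}), frozenset({"beer"})
rule_support = support(X, Y, transactions)
rule_confidence = confidence(X, Y, transactions)
rule_lift = lift(X, Y, transactions)
print(rule_support, rule_confidence, rule_lift)
```

Here the rule {diapers} -> {beer} appears in 3 of 5 transactions, and 3 of the 4 transactions containing diapers also contain beer. A lift above 1 suggests the two items occur together more often than they would if they were independent.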