Motivation For Artificial Intelligence In Management (PDF)
LMU München
Dr. Stefan Feuerriegel
Summary
This presentation discusses the motivation for and impact of artificial intelligence in management: solving management problems of relevance with data science, developing new AI algorithms, and evaluating their added value in management practice. It presents research examples in areas such as police patrolling, disease management, and early warnings for fake news in social media, and details how AI technologies are combined with real-world applications.
Full Transcript
Part 1: Motivation
Prof. Dr. Stefan Feuerriegel, Institute of AI in Management, LMU Munich, https://www.ai.bwl.lmu.de (Sep 5, 2018)

ABOUT OUR INSTITUTE: Solving management problems with artificial intelligence (AI)
What defines our research:
1. Information: We solve management problems of relevance by using data science
2. Innovation: We develop new algorithms from the area of AI (statistics, computer science, etc.)
3. Impact: We evaluate the added value of our tools rigorously in management practice

ABOUT ME: Examples of our research with impact (AI for health management)
1. Effective police patrolling
2. Effective disease management
3. Early warnings for fake news in social media

OUR IMPACT: Our impact during the current COVID-19 epidemic
▪ Data: Nationwide data on micro-level human mobility during the epidemic (~1.5 bn movements of individuals)
▪ Modeling: A new artificial intelligence algorithm to link mobility and case growth
▪ Impact: A tool for public decision-makers for real-time monitoring of compliance with social distancing; knowledge transfer through membership in the COVID-19 Working Group of the World Health Organization; dissemination to the public through media appearances (WEF blog, etc.)
Persson, Parie, Feuerriegel (PNAS 2021): Monitoring the COVID-19 epidemic with nationwide telecommunication data. https://doi.org/10.1073/pnas.2100664118

ABOUT OUR INSTITUTE: Combining AI technologies and real-world applications
▪ Management practice: Real-world AI demonstrations; field experiments; financial implications; organizational and behavioral implications (fairness, accountability, etc.)
▪ AI for decision-making: Focus on sequential settings; causal ML (e.g., sequential deconfounding, sequential ITE); off-policy learning (e.g., dynamic treatment regimens)
▪ AI for Good: Healthcare applications (with medical practitioners); AI to support the Sustainable Development Goals (e.g., sustainability, inequality)
▪ AI & Web: Digital traces; social media data (e.g., fake news); clickstream data; mobility data

ABOUT OUR INSTITUTE: In joint industry collaborations, we strive for lasting impact in practice
▪ Partnership for a 3-year project
▪ Full funding for a PhD position by the company
Typical challenges that we address along the digital transformation from data to decision:
▪ Data: Insufficient data integration, missing policies for data use
▪ Analytics: No use of advanced analytics, often scattered and simple tools
▪ Software: No analytics packages in place, software not user-friendly enough
▪ People: Missing trust in data, no capabilities in using or designing reports
▪ Process: Analytics not embedded into regular decision-making processes
▪ Strategy: No overall strategy for using big data and advanced analytics for business decisions

OUR RESEARCH: We seek to publish in leading outlets from both artificial intelligence and domain applications
▪ Artificial intelligence: Thought leadership in applied AI ensures research that is rigorous; provides state-of-the-art performance. Example outlets: SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), The Web Conference (WWW), EMNLP, ACL, CHI, …
▪ Application domains: Contributes to research that is relevant; demonstrates impact in practice. Example outlets: PNAS, Management Science, Marketing Science.
▪ Research collaborations

VISION: We see 3 challenges for bringing AI into business management
01 Missing accountability of AI decisions: Clarify boundaries of AI interventions; implement governance structures; establish risk management frameworks
02 Incomplete frameworks for human-in-the-loop analytics: Develop human-in-the-loop frameworks from a human-centered perspective; promote frameworks for human learning and exploration; advance prescriptive algorithms
03 Organizational inertia: Appoint a transformation workforce; encourage managers to explore and experiment; incentivize adoption
Feuerriegel, Shrestha, von Krogh, Zhang: Bringing AI into business management

COURSE: Course Overview
▪ 4 SWS / 6 ECTS
▪ Date: 4-day block course (adapted for online use) with optional Q&A
▪ Grading: Exam with programming (demanding!) focusing on implementing an analytics solution
▪ Constraints: BSc BWL
▪ Requirements: Programming skills and basic maths (regression)
▪ Expectation: Practical application of machine learning is an integral element. The course has little focus on mathematical proofs, favoring intuition and overarching concepts; nevertheless, there is a strong method focus.

COURSE: Books
▪ James, Witten, Hastie & Tibshirani. An Introduction to Statistical Learning: with Applications in R. Springer, 2013. PDF: http://www-bcf.usc.edu/~gareth/ISL/
▪ Wickham & Grolemund. R for Data Science. O'Reilly, 2017.
Online version: http://r4ds.had.co.nz/

Homework / Reading

Course Outline
1. Motivation: Organizational details; applications of business analytics; definition of machine learning, y = f_θ(x)
2. Predictive modeling: Taxonomy of predictive modeling; performance assessments
3. Linear modeling (linear f: y = α + βᵀx): Linear model (ordinary least squares); regularization (lasso, ridge regression, elastic net)
4. Nonlinear modeling (non-linear f): Decision trees, random forest; boosting; neural networks
5. Model tuning (how to choose θ?): Train/test split; cross-validation
6. Bringing to practice: Management challenges; pitfalls in practice

Mastering business analytics promises value creation
Business analytics: technologies by which information is collected, analyzed, and visualized to achieve better decisions. Related terms: predictive analytics, machine learning, artificial intelligence, …
▪ Understand business: Obtain business understanding in order to define goals and derive KPIs
▪ Prepare data: Identify and collect internal and external data sources that potentially contain interesting information
▪ Apply analytics: Apply a model from predictive analytics and evaluate its performance for decision-making

Illustrative: Naïve predictions can fuel decision-making, but only sometimes
Example: forecasted sales volume as input to a production quota. In the calibration phase, a model Y = f(x1, …, xn), e.g., a neural network, replicates observed patterns of sales volume from past data and leverages external predictors; at go-live, the model forecasts future sales volume.

Predictive analytics anticipates outcomes for a given input
Four levels of analytics, with increasing business value/complexity and decreasing human input along the path from data to decision to action:
▪ Descriptive: What happened? (human input)
▪ Diagnostic: Why did it happen? (human input)
▪ Predictive: What will happen? (human input)
▪ Prescriptive: What should happen? (decision automation)
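The calibration phase described above, fitting Y = f(x1, …, xn) to past data, can be sketched with the simplest choice of f: a linear model fit by ordinary least squares. A minimal Python sketch on synthetic data (the sales figures and the single predictor are made up for illustration):

```python
import random

# Synthetic past data: sales depend linearly on one external predictor
# (true relationship y = 2 + 3x plus noise; numbers are illustrative)
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 3.0 * x + random.gauss(0, 0.5) for x in xs]

# Calibration via closed-form simple OLS:
# beta = cov(x, y) / var(x), alpha = mean(y) - beta * mean(x)
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
beta_hat = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
alpha_hat = my - beta_hat * mx

def forecast(x_new):
    """Go-live: forecast the sales volume for a new predictor value."""
    return alpha_hat + beta_hat * x_new
```

At go-live, `forecast(x_new)` returns the predicted sales volume that would feed the production quota; on this synthetic data the recovered coefficients land close to the true (2, 3).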
▪ Predictive analytics learns by "demonstration" and thus replicates past decisions (and errors)
▪ Prescriptive analytics promises to identify optimal decisions

Prediction of churn is based on several input variables feeding statistical propensity models
Different variables are identified to build stable predictive models, and the models are continuously improved through "learning" from actual customer behavior (predicted vs. actual churn, fed back via a data-mart into the analytical model).
▪ Customer behavior 6 months prior to churning, e.g., bad debt (in days) and the number of inbound calls to customer care in each month before churn
▪ Customer characteristics 6 months prior to churning, e.g., demographics, contact type, …
Example 1: Logistic regression. A logistic regression model maps these inputs to a probability of churn.
Example 2: Decision tree. Rules such as "calls >= 5 and bad debt >= 30 and …" yield a probability of churn.

COLLECT AND COMBINE DATA: Banking example. A set of typical variables considered in this context:
▪ Customer specifics (current and dynamic): age, gender, marital status, household size, income and job type, ZIP code, segment (value/behavior), tenure (customer since …). Desired but usually hard to get: changes in life stages, personal and professional events (marriage, promotion)
▪ Product holding: product holdings (yes/no per product), average balances, number of accounts, total assets under management, total liabilities
▪ Product usage/transactions: number of transactions, transaction volumes, transaction channels, inflow vs. outflow. Desired but usually hard to get: transaction details, i.e., detailed information on channel and purpose (e.g., whether the recipient of a transaction is another bank, a luxury retailer, or a utility)
▪ Contact history and other: date of last contact, last offer (product or service), last campaign and channel, inbound contacts (call center, etc.), Internet logins/usage (web analytics). Desired but usually hard to get: online data, e.g., clicks on Internet pages, logins, origination pages, browser used, mobile used, …

EXAMPLE: Predicting customers that respond positively to marketing campaigns
Instead of simply treating all customers, treat only those that are "persuadable". Classifying customers by purchase if treated (yes/no) versus purchase if not treated (yes/no) yields four groups: persuadables, sure buyers, lost causes, and sleeping dogs. A clear business KPI is needed in order to focus on persuadables and not target those who would have purchased anyway ("sure buyers"). The solution is AI algorithms that directly predict to which subgroup a customer belongs, instead of targeting all.

Targeting the right customers at the right time to prevent churn
For customers that show a high likelihood of churning, churn early warning systems are implemented based on analytical propensity models, which yield concrete customer interactions in the call center (AI use for customer lifetime management).
The early warning system raises an alert when the propensity to churn for a customer rises above a predefined threshold for the segment the customer belongs to, triggering immediate action to prevent churn. The customer is put on a watchlist with a customer-specific action: the AI provides not only the churn probability but also the variables/reasons, and the system can be enriched with traditional database information, e.g., CLV, segment, billing history, contracts, etc.
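A propensity model like Example 1 above, combined with an alert threshold, can be sketched in a few lines of Python. The coefficients and the threshold below are illustrative assumptions, not values estimated from real data:

```python
import math

def churn_propensity(calls, bad_debt_days, alpha=-6.0, b_calls=0.8, b_debt=0.1):
    """Logistic propensity: sigma(alpha + b_calls*calls + b_debt*bad_debt_days).
    Coefficients are illustrative placeholders, not estimated values."""
    z = alpha + b_calls * calls + b_debt * bad_debt_days
    return 1.0 / (1.0 + math.exp(-z))

def raise_alert(propensity, threshold=0.75):
    """Early-warning rule: alert once the propensity exceeds the segment threshold."""
    return propensity >= threshold

# Customer with 6 inbound calls to customer care and 30 days of bad debt
p = churn_propensity(6, 30)   # approx. 0.86, above the 0.75 threshold
alert = raise_alert(p)
```

In practice the coefficients would be fit on the historical churn data described above, and management would tune the threshold per customer segment.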
The customer is then called with an individualized script: the best available script at the earliest possible time (up to 7 months prior to churn) to avoid churn. The success rate of a call script can again be predicted. Management defines (and experiments with) thresholds for different actions; the best action may differ per customer segment, e.g., for the best customers the thresholds should be lower and the actions probably more expensive (e.g., calling a customer instead of sending an email).

Group work: Pitch your idea of business analytics!

EXAMPLE: Managerial decision support for effective police management
Theory, analytics, and optimization combine along the pipeline: data → crime risk estimation → patrol unit routing → crime risk reduction.
▪ The system periodically identifies areas and times of elevated risk
▪ An operator assesses and prioritizes alerts
▪ Officers on the ground follow instructions, leading to prevented crime

EXAMPLE: Dynamic estimations of crime risk
Current practice is a hotspot map of historic crime; the proposed approach is a dynamic risk map.

EXAMPLE: Data-driven risk model
ℙ(crime_it = 1) = f(spatial_i, temporal_t, crime_{i,t−1}; θ)
Kadar, Maculan & Feuerriegel (2019): Public decision support for low population density areas. Decision Support Systems.

EXAMPLE: Effectiveness

PRIMER

MODELING BASICS: There are 2 approaches to modeling, supervised and unsupervised learning, but only the former uses historical data to predict the future
▪ Supervised learning: Take historical data to learn Y = f(X), i.e., what happened in the past; then take the most recent data to predict the future.
▪ Unsupervised learning: Identify patterns in historical data. Caveats: unclear management implications of clusters; no ground truth for what the "true" labels are; not designed to generalize to unseen instances.

Agenda: 1 Models, 2 Performance, 3 Training & evaluation, 4 Implementation, 5 Managing AI

The general idea is to map input and output into a mathematical space
Example: k-nearest neighbor approach. Customers are placed by purchase volume and time since last purchase; a new customer is classified as churn or no churn by majority vote among the k nearest neighbors (the result can differ for, e.g., k = 3 vs. k = 5).

Discriminatory approaches are needed that directly model the output variable
▪ The k-nearest neighbor approach has a number of limitations: uneven frequency of classes; limited robustness to noise
▪ Instead, discriminatory approaches are preferred

Model choice depends on the trade-off between prediction power and interpretability
▪ A plethora of classifiers exist, but none has been reliably superior (cf. www.kaggle.com)
▪ Unstructured data (e.g., images, text) require custom models
▪ On the interpretability vs. prediction power spectrum: linear model (most interpretable), random forest, xgboost, neural network (highest prediction power)
Recommendations for practice:
▪ Small datasets (< 1,000 observations) are usually well served with linear models
▪ Only large-scale datasets (~1 million observations) benefit from neural networks
▪ Ensembles combine multiple classifiers, but the benefit is ~1%

Linear models are straightforward to fit and interpret
Linear models combine several predictors via an additive scheme: y = α + βᵀx
▪ x: predictor, feature, regressor, independent variable
▪ y: outcome, prediction, label, dependent variable
▪ α, β: intercept and coefficients, which need to be estimated
Example: PRICE_WINE = 3 + 10 × ALCOHOL + …

Decision trees describe a flowchart-like structure for deriving predictions
Example: predict whether ice cream will sell well. Follow the tree downwards to arrive at a prediction; at each node, take the appropriate branch.

Combining decision trees yields a random forest
▪ Decision tree: Flowchart-like structure to display decision criteria; end nodes of each branch indicate outcomes. Pro: easy to understand and apply; reduces the complexity of big data; common in explanatory tasks.
▪ Random forest: Combines an ensemble of trees by majority vote; averages out errors and handles non-linearity. Pro: good out-of-the-box performance for prediction without parameter tuning; highly suitable for prediction.

Neural networks are inspired by the human nervous system
▪ Artificial neural networks are stacked generalized linear models (= layers)
▪ The universal approximation theorem ensures that any f can be modeled
▪ Each neuron computes z = A(w1 x1 + … + wN xN)
Visual demonstration: https://playground.tensorflow.org

Deep neural networks stack multiple hidden layers
Challenge: finding an appropriate network architecture. How many layers? Which type of layers? Which size of layers? Which optimizer?
Which activation function? How to achieve speed (#epochs, batch size, …)? How to prevent overfitting (e.g., dropout, batch normalization, …)?
Example: VGGNet classifier for predicting image content

Why deep learning has become so widespread
Performance grows with the amount of data; example: sales forecasting with ~1 million samples.
Ng, A. (2016). Machine Learning Yearning: Technical Strategy for AI Engineers, In the Era of Deep Learning. Draft Version 0.5.
Kraus, M., Feuerriegel, S., & Oztekin, A. (2019). Deep learning in business analytics and operations research: Models, applications and managerial implications. European Journal of Operational Research. https://doi.org/10.1016/j.ejor.2019.09.018

Managing unstructured data requires tailored modeling approaches
▪ Images: convolutional neural networks (CNN)
▪ Text (natural language processing; variable-length sequences word0, word1, …, wordτ): recurrent neural networks (RNN), long short-term memory (LSTM), language models (e.g., BERT)
▪ Spatial problems: Gaussian processes (for sparse data points), deep spatio-temporal residual networks (for dense data)
Kraus, Feuerriegel, Oztekin (2019): Deep learning in business analytics and operations research: Models, applications and managerial implications. EJOR. https://doi.org/10.1016/j.ejor.2019.09.018

Terminology
▪ Artificial intelligence: In the old days, generating rules like "if X, then predict Y"; nowadays a buzzword covering all aspects of ML, but also other optimization techniques concerned with data-driven decision-making (e.g., Markov decision processes)
▪ Machine learning (ML): Supervised + unsupervised learning (train, then deploy); reinforcement learning (sequential/continuous!)
▪ Predictive analytics: roughly equivalent to supervised machine learning
▪ Reinforcement learning: one form of prescriptive analytics
▪ Statistical learning: specific statistics-driven types of ML (e.g., lasso, ridge, …; mostly regression)

Agenda: 1 Models, 2 Performance, 3 Training & evaluation, 4 Implementation, 5 Managing AI

In classification, performance is measured via the confusion matrix
Note: in regression, we simply compute the deviation |true Y − predicted Y|.

Being aware of class imbalances is imperative for a rigorous performance evaluation
1. Use imbalance-aware metrics: receiver operating characteristic & area under the curve; F1 score; balanced accuracy (but these are more difficult to interpret)
2. Compare the classifier against the majority vote as a naïve baseline
3. Where possible, translate the confusion matrix into a financial KPI (= custom loss function)

Agenda: 1 Models, 2 Performance, 3 Training & evaluation, 4 Implementation, 5 Managing AI

Training and evaluating models follows a standard 3-step process
1. Split your existing data into a training and a test set
2. Use the training set to fit all model parameters
3. Evaluate the performance on unseen data via the test set
The test error measures out-of-sample performance, which ensures that the model is tested according to its ability to generalize. Rule of thumb: 80% for training, 20% for testing (or 90/10).
The train-test procedure avoids overfitting, so that the model generalizes well to unseen data.

Agenda: 1 Models, 2 Performance, 3 Training & evaluation, 4 Implementation, 5 Managing AI

Still, most AI implementations are custom-made, as few general-purpose tools exist
▪ There is no "one-size-fits-all" approach, as all implementations require problem-specific customizations
▪ Common are programming languages that specify the ML pipeline: R (for prototyping); Python (scikit-learn, tensorflow, keras, PyTorch)
▪ Packages for other languages are being developed as template-based wrappers for existing APIs (e.g., ML.NET)
▪ Existing tools are limited in capabilities, e.g., no time series, no merging with external data, low prediction performance

Software, platforms, and packages for business analytics
▪ General systems: AzureML (Microsoft), WEKA, Alteryx, SAS/SPSS
▪ Preprocessing: dplyr, R Markdown, pandas/NumPy
▪ Machine learning: sklearn, R/caret, Julia; deep learning: PyTorch, Keras, TensorFlow
▪ Probabilistic: Pyro, Edward, sklearn

Managing AI projects requires an iterative approach
Cross Industry Standard Process for Data Mining (CRISP-DM):
▪ Requires an interdisciplinary team with data translators
▪ Data preparation usually consumes 80% of the time
▪ Modeling consists of another iterative process
▪ Re-deployment is beneficial if (a) substantially more data is available or (b) the environment is highly dynamic

The step of modeling requires multiple iterations
Recommendations:
▪ Align the process for rapid deployment; build in-house experience
▪ Keep in mind that parallelism in AI development is limited, thereby limiting scalability; leverage tools for reproducibility
▪ Follow a lean paradigm: priorities are on minimum viable products; pause the process early, as modeling is expensive
▪ High performance is usually only achieved by experts
EDA: exploratory data analysis
Kuhn & Johnson: Feature Engineering and Selection: A Practical Approach for Predictive Models. http://www.feat.engineering/

Data is the most important thing, yet methods can tweak the last inches
"Using the right data is more important than using the right modeling technique" (a practitioner's view). Improving the data (same model, better data) typically buys an order of magnitude more improvement (~10%) than improving the model (same data, better model, ~1%). Linear regressions or tree-based models are sufficient in 90% of the cases. Spend your time improving the data first; later, try more sophisticated models.

COLLECT AND COMBINE DATA: The final customer view includes multiple sets of internal and external data
Collect & combine data from multiple sources into an integrated 360° customer view:
▪ Customer journey (stages, needs, touchpoints)
▪ Digital marketing (display, paid search, affiliates, SEO)
▪ Purchase behaviors (e.g., value, products purchased, longitudinal migrations)
▪ Online browsing (e.g., visit, browse, conversion, feature usage)
▪ Social (e.g., linkage of Facebook 'likes', sources of traffic)
▪ Mobile usage (e.g., value, shopping behaviors, feature usage)
▪ Ethnographies (e.g., attitudes, perceptions, sources of shopping inspiration)
▪ 3rd-party payments (e.g., competitor shopping: what they buy, how much they spend, when)

Avoiding pitfalls with AI
1. Before implementation: Ensure that the rules for good AI cases are all satisfied; define your expected/desired level of performance; implement a baseline (random guess, current performance level, etc.)
2. During implementation: Follow rapid prototyping, with a first model within the first day (similar to a minimum viable product, otherwise "kill" the project); at a later stage, consider fusing in additional predictors from public data
3. After deployment: Consider re-training your model from time to time if you have a dynamic environment

Agenda: 1 Models, 2 Performance, 3 Training & evaluation, 4 Implementation, 5 Managing AI

A holistic approach is required, yet faces various challenges along several dimensions

Personal experience on what makes successful AI implementations
The goal of the AI value chain is to identify and prioritize areas of improvement in analytics and big data along a structured framework (a matrix of AI value chain dimensions and application areas). Key principles:
▪ Decision backwards: Build the capability by starting with the business decisions you want to drive and working backwards
▪ Step-by-step: Focus on specific topics and set each element in place; a chain is only as strong as its weakest link
▪ Test and learn: Move from data to decision and from decision back to the data with which to measure the outcome
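Measuring the outcome, as the performance section recommends (confusion matrix, imbalance-aware metrics, majority-vote baseline), can be sketched in plain Python. The small label vectors below are made up for illustration:

```python
def confusion(y_true, y_pred):
    """Confusion-matrix counts for a binary classifier (positive class = 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def balanced_accuracy(tp, tn, fp, fn):
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# Imbalanced test set: 8 non-churners, 2 churners (labels are illustrative)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]   # classifier output
majority = [0] * 10                        # naive majority-vote baseline

tp, tn, fp, fn = confusion(y_true, y_pred)
# Plain accuracy is 8/10 for BOTH the classifier and the majority baseline,
# yet only the classifier detects any churner; F1 and balanced accuracy
# make this difference visible.
f1 = f1_score(tp, fp, fn)                  # 0.5
bal = balanced_accuracy(tp, tn, fp, fn)    # 0.6875
```

Note that F1 is undefined for the all-negative baseline (no predicted positives), which is exactly why such a baseline should be compared on accuracy and then exposed via the imbalance-aware metrics.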