Machine Learning for Business Analytics 2024 PDF

Summary

This presentation from 2024 covers machine learning for business analytics. It discusses the concepts of Business Analytics, Data Science, and Machine Learning, along with related topics.

Full Transcript

Machine Learning for Business Analytics 2024
Lecturers: Dr. Marc Hilbert, Dr. Andrii Kleshchonok
Language of instruction: English

Business Analytics in the data-driven world
Business Analytics “makes extensive use of analytical modeling and numerical analysis, including explanatory and predictive modeling, and fact-based management to drive decision making”
Data Science “is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data”
Machine Learning “is set of methods which ‘learn’ through data a specific task. It is closely linked to statistics and optimization”

BA in data-driven businesses
Artificial Intelligence (AI) is a broad term for intelligence displayed by machines. AI has various subfields centred on specific methods and tasks, e.g. reasoning, knowledge representation, planning and learning. Example: C3PO and R2D2 of Star Wars.
Machine Learning is a set of methods which ‘learn’ a specific task from data. It is closely linked to statistics and optimization. The methods are developing rapidly and include, e.g., linear regression, decision trees, neural networks and k-means clustering. Example: autonomous driving.
Neural Networks are a subset of machine learning based on connected neurons approximated by simple functions. They are one machine learning method capable of handling complex tasks, but they need a large amount of data. They can be used for, e.g., computer vision and natural language processing. Example: identifying handwritten numbers and letters.

The Data Science Pyramid of Needs
https://medium.com/hackernoon/the-ai-hierarchy-of-needs-18f111fcc007

Why is machine learning relevant now?
Machine learning methods are continuously researched and developed. Large models, for example deep neural networks, handle millions of parameters. Pre-trained networks for common tasks are available.
IT hardware and software provide the resources needed to process the data and deploy models at a lower cost. Cloud environments offer almost unlimited computation power. Machine learning can be realized without coding in very few steps.
Significantly more data is collected through digitalization and automation. The number of features and the quality of the data are still increasing.
▪ Sufficient number of observations (samples) for each feature.
[Figure: data table with features x1 to x4 as columns and observations as rows]

Machine Learning
Task: Find a linear relationship between x and y observations, to be able to predict future outputs on unseen inputs.
Data: Data set of measurements including input and output.
Method: Linear regression.
Training (80% of the data): During training, the parameters of the model are searched so that the model has the lowest error. It is important to choose an error measure and a method that fit the data. For common tasks such as computer vision, pre-trained models are available.
Testing (20% of the data): During testing, the trained model with fixed parameters is evaluated on representative unseen data to estimate its performance.
Deployment: During deployment, the model with fixed parameters is run in the final software and produces predictions on unseen data.
*The process is simplified; in practice, intermediate steps are needed to guarantee a correct result.
[Figure: Examples]

Features of Business Analytics
1. Task of BA: providing decision support for specific goals defined in the context of business activities
2. Foundation of BA: relies on empirical information based on data
3. Realization of BA: must be realized as a system using the actual capabilities in information and communication technologies
4. Delivery of BA: deliver information at the right time to the right people in an appropriate form

Questions BA can answer
Manufacturing
‣ How frequently are the same inspection errors being logged year over year?
‣ Can we cut the number of touches in the supply chain process to reduce failures?
Marketing and sales
‣ What are our top three most profitable customer segments?
‣ Do certain regions have an affinity for purchasing our products?
Professional services
‣ Is the business on track to meet sales objectives this quarter?
‣ Can I demonstrate with data 5 ways I have been effective and 5 areas of improvement?

When to Use Machine Learning
‣ When the problem is related to data
‣ When the problem is too complex for coding
‣ When the problem is constantly changing
‣ When it is a perceptive problem
‣ When it is an unstudied phenomenon
‣ When the problem has a simple objective
When Not to Use ML
‣ When it needs to be completely explainable
‣ When the cost of an error is high
‣ When getting the right data is hard
‣ When the operational environment can’t handle ML solutions

Cost of Machine Learning
The drivers that influence costs are:
‣ Complexity (existing algorithms, libraries?)
‣ Data (data exists, annotated, quantity?)
‣ Accuracy (cost of wrong prediction, lowest accuracy level?)
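
The training, testing and deployment steps from the linear-regression example can be sketched in plain Python. This is only an illustrative sketch, not the lecturers' code: the synthetic data (y = 2x + 1 plus noise), the function names fit_line, predict and mean_squared_error, and the choice of mean squared error as the error measure are all assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Training: closed-form least squares for y = m * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

def predict(model, x):
    """Apply a model with fixed parameters (m, b) to one input."""
    m, b = model
    return m * x + b

def mean_squared_error(model, xs, ys):
    """Error measure (an assumption; other measures are possible)."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data set (assumption): y = 2x + 1 with small Gaussian noise.
rng = random.Random(0)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]

# Shuffle the observations, then split 80% training / 20% test.
indices = list(range(len(xs)))
rng.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]
x_train = [xs[i] for i in train_idx]
y_train = [ys[i] for i in train_idx]
x_test = [xs[i] for i in test_idx]
y_test = [ys[i] for i in test_idx]

# Training: search the parameters with the lowest squared error.
model = fit_line(x_train, y_train)

# Testing: evaluate the fixed model on data it has not seen.
test_mse = mean_squared_error(model, x_test, y_test)

# Deployment: run the fixed model on a new, unseen input.
prediction = predict(model, 100.0)

print(f"m={model[0]:.3f}, b={model[1]:.3f}, test MSE={test_mse:.3f}")
print(f"prediction for x=100: {prediction:.1f}")
```

In a real project a library such as scikit-learn would normally replace the hand-written fit, but the three phases stay the same: fit on the training part, measure on the test part, then predict with fixed parameters.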

Experience from Machine Learning Projects
Known unknowns:
‣ Quality that is attainable
‣ Quantity of data that is needed
‣ Which features are relevant
‣ Model size constraints
‣ Experiment setup
Progress:
‣ At first the model error decreases fast, later progress slows down (nonlinear progress)
‣ Measure the progress and communicate risks
Simplifying the problem:
‣ Solve the simple points first
‣ Separate the problem into simpler problems

ML Template
The number one reason why projects fail is a misunderstanding of objectives between business and analyst.
ML Template:
‣ Clarifies Task, Data, Method, Evaluation and Hypothesis
‣ Structures the idea of the project
‣ Documents the approach for all participants
‣ Provides a starting point for the data scientists

ML Template
Task: Describes a business problem which needs to be solved. It is written from the perspective of the business.
Data: Describes the data which is used to solve the task. It includes a description of the information included in the data, the data format and how the data can be accessed.
Method: Describes the high-level machine learning/analytics approach which is used with the data to solve the task. It is written from the perspective of the data scientist.
Evaluation: Clarifies how the solution to the task is evaluated and how success is defined.
Hypothesis: A list of hypotheses formulated by the data analyst. Each hypothesis uses the data and the method and is either accepted or rejected. The list represents intermediate steps towards the solution of the task.

Typology of Tasks and their Goals
Descriptive goals generate a summary description. Example question: Who are my customers?
1. Reporting
2. Segmentation
3. Detect interesting behavior
Predictive goals predict the behavior of instances of the business process. Example question: What are their needs?
1. Regression
2. Classification
Understanding goals support stakeholders in understanding business processes. Example question: How am I doing?
1. Process identification
2. Process analysis
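
One possible way to keep an ML Template consistent across a team is to store it as a small data structure. The sketch below is an assumption-laden illustration, not part of the course material: the class names GoalType and MLTemplate, the helper missing_fields, and the example field values (a hypothetical customer segmentation project) are all invented for the example. Only the five template fields and the three goal types come from the slides.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class GoalType(Enum):
    """The three goal types from the typology of tasks."""
    DESCRIPTIVE = "descriptive"      # reporting, segmentation, ...
    PREDICTIVE = "predictive"        # regression, classification
    UNDERSTANDING = "understanding"  # process identification/analysis

@dataclass
class MLTemplate:
    """The five template fields, plus the goal type of the task."""
    task: str
    goal_type: GoalType
    data: str
    method: str
    evaluation: str
    hypotheses: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of template fields that are still empty."""
        values = {
            "task": self.task,
            "data": self.data,
            "method": self.method,
            "evaluation": self.evaluation,
            "hypotheses": self.hypotheses,
        }
        return [name for name, value in values.items() if not value]

# Hypothetical example values, invented for illustration only.
template = MLTemplate(
    task="Who are my customers?",
    goal_type=GoalType.DESCRIPTIVE,
    data="Customer purchase history (hypothetical data set)",
    method="Segmentation with k-means clustering",
    evaluation="Segments are reviewed by the business (assumed criterion)",
)

missing_before = template.missing_fields()
template.hypotheses.append(
    "Using purchase history with k-means will ensure the number of "
    "distinct segments to be at least three."
)
missing_after = template.missing_fields()

print(missing_before, missing_after)
```

A check such as missing_fields makes it visible when business and analyst have not yet agreed on every part of the template.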

ML Template
Task: What are the top categories of products we sell next year?
Hypothesis: (not yet filled in)
Data: Sales data, i.e. the number of items per category over the last years. The data is stored as a file extract from the data warehouse.
Method: Learning the trend of the data from the last years and extrapolating it to the next year.
Evaluation: Splitting the data into the years from which we learn and the last years, which we keep for testing.
https://amzscout.net/blog/best-items-to-sell-on-amazon/

ML Template
‣ Each hypothesis describes how to achieve a goal stated in the task.
‣ Each hypothesis focuses on a single improvement. Either we add new data or improve a method. We should avoid hypotheses affecting data and a method at the same time. Otherwise, it is hard to understand what drives an improvement.
‣ It is best to start with simple hypotheses and develop more complex ones later. Establish a baseline with the simplest data and method and iteratively move on to the more complex ones.
‣ Each hypothesis requires acceptance criteria. Template: Using [data] with [method] will ensure the [evaluation] to be [value].
Ash Urazbaev “Managing Data Science Products and Projects with Lean Data Science” https://leands.ai/

ML Template
‣ A hypothesis helps to focus on the result.
‣ Hypothesis-based work is easy to schedule and evaluate.
‣ Proper hypothesis processing allows one to articulate outcomes of the experiments clearly, reduce the number of errors, and maintain accurate documentation.
‣ It makes the process transparent for all the stakeholders.
‣ The team can split one product hypothesis into several and work in parallel.
Ash Urazbaev “Managing Data Science Products and Projects with Lean Data Science” https://leands.ai/

ML Template
Task: What are the top categories of products we sell next year?
Hypothesis:
‣ Using just the sales data with linear regression will ensure the correct trend of the prediction.
‣ Using quarterly data of each year with linear regression will ensure quarterly increased sales in addition to the general trend per year.
Data: Sales data, i.e. the number of items per category over the last years. The data is stored as a file extract from the data warehouse.
Method: Learning the trend of the data from the last years and extrapolating it to the next year.
Evaluation: Splitting the data into the years from which we learn and the last years, which we keep for testing.
https://amzscout.net/blog/best-items-to-sell-on-amazon/

ML Template
Task: What should be the next movie we should buy to sell on the platform?
Hypothesis: ?
Data: ?
Method: ?
Evaluation: ?

Machine Learning Lifecycles
A machine learning lifecycle describes the iterative steps taken to build, deliver and maintain a data-driven product. Not all ML projects are built the same, so their lifecycles vary as well. Different versions exist:
‣ CRISP-DM (Cross-Industry Standard Process for Data Mining)
‣ KDD (Knowledge Discovery in Databases)
‣ SEMMA (Sample, Explore, Modify, Model, and Assess)
‣ OSEMN (Obtain, Scrub, Explore, Model, iNterpret)
‣ TDSP (Team Data Science Process)

Cross-industry standard process for data mining - CRISP-DM
CRISP-DM breaks the process into six major phases. The sequence of the phases is not strict, and moving back and forth between different phases is usually required. The arrows in the process diagram indicate the most important and frequent dependencies between phases. The process continues after a solution has been deployed. The lessons learned during the process can trigger new, often more focused business questions, and subsequent data mining processes will benefit from the experiences of previous ones.
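
Returning to the sales example from the ML Template slides, the first hypothesis (sales data with linear regression) and the year-based evaluation split could be tried out roughly as follows. Everything concrete here is an assumption for illustration: the sales figures are made up, the function name fit_trend is hypothetical, and treating the hypothesis as accepted when the predicted category ranking matches the actual ranking in the held-out year is just one possible acceptance criterion.

```python
def fit_trend(years, values):
    """Least-squares linear trend: value = slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up yearly sales (items per category), for illustration only.
sales = {
    "books":       {2019: 120, 2020: 130, 2021: 140, 2022: 150, 2023: 160},
    "electronics": {2019: 100, 2020: 125, 2021: 150, 2022: 175, 2023: 200},
    "toys":        {2019: 90, 2020: 88, 2021: 86, 2022: 84, 2023: 82},
}

# Evaluation: learn from the earlier years, keep the last year for testing.
train_years = [2019, 2020, 2021, 2022]
test_year = 2023

# Method: learn the trend per category and extrapolate to the test year.
predicted = {}
for category, by_year in sales.items():
    slope, intercept = fit_trend(train_years, [by_year[y] for y in train_years])
    predicted[category] = slope * test_year + intercept

# Assumed acceptance criterion: predicted ranking equals actual ranking.
predicted_ranking = sorted(predicted, key=predicted.get, reverse=True)
actual_ranking = sorted(sales, key=lambda c: sales[c][test_year], reverse=True)
hypothesis_accepted = predicted_ranking == actual_ranking

# The hypothesis, phrased with the template from the slides.
hypothesis = "Using {data} with {method} will ensure the {evaluation} to be {value}.".format(
    data="yearly sales data",
    method="linear regression",
    evaluation="predicted top-category ranking for the test year",
    value="identical to the actual ranking",
)

print(hypothesis)
print("Predicted ranking:", predicted_ranking)
print("Accepted:", hypothesis_accepted)
```

Because only the data and the method named in the hypothesis are used, a follow-up hypothesis (for example quarterly data) can be compared against this baseline one change at a time.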

AI Engineering - Model lifecycle
Stages in the diagram: Business Goal / Problem Definition, Data collection & preparation, Feature Engineering, Model Training, Model Evaluation, Model Deployment, Model Serving, Model Monitoring, Model Maintenance.
The diagram is shown twice, once labelled AI Engineering and once labelled “All-in-One”.

Cross-industry standard process for data mining - CRISP-DM
ML lifecycles, such as CRISP-DM, KDD, TDSP or other processes, have much in common and can be adapted to:
‣ Team size
‣ Analytics task / project scope
‣ Organization
‣ Existing pipelines and tools
[Figure: ML Template elements Task, Data, Method, Evaluation and Hypothesis]

BI TOOLS

TOOLS - SPSS
Advantages: Good user interface; easy to start
Disadvantages: High costs; license system

TOOLS - PYTHON
Advantages: Scalability; big community
Disadvantages: Not strong in explanatory analysis; programming skills needed

TOOLS - R
Advantages: Free; big community for empirical methods
Disadvantages: Can be slow; steep learning curve

TOOLS - FOUNDATION MODELS - LLM
Advantages: Create code and pictures; automated content creation
Disadvantages: Higher competitiveness; uniqueness

TOOLS
The course will use the CoLab environment for examples and assignments. This is the place where you can work on the data from your browser without installing anything on your computer (Google account needed): https://colab.research.google.com/
1. Get familiar with CoLab: https://www.youtube.com/watch?v=inN8seMm7UI&ab_channel=TensorFlow
2. Be able to open a notebook.
3. Be able to execute Python code: https://colab.research.google.com/github/data-psl/lectures2020/blob/master/notebooks/01_python_basics.ipynb

ASSIGNMENT 1
Using the article “Competing on Analytics” by Thomas H. Davenport, select a company of your choice and analyse how it leverages analytics to gain a competitive advantage. The goal is to demonstrate your understanding of how companies “compete on analytics” and to apply this knowledge to a real-world example. (Deadline: 29 Oct)
