Manufacturing Data Analytics PDF
Uploaded by TruthfulStrontium (Yonsei University)
Summary
These lecture notes cover manufacturing data analytics, including data-driven decision-making, various types of data analytics, and smart manufacturing principles. The document emphasizes the use of data analytics to optimize manufacturing processes.
Full Transcript
DATA-DRIVEN DECISION MAKING

Decision making (theory)
- A thought process of selecting a course of action among several alternatives

Who is a manager?
- Owns the authority to make decisions
  » Responsible for the consequences
- Successful management → making proper/effective decisions
  » A good manager has the ability to realize successful management

Data-driven decision making (DDDM)
- The process of using data to inform your decision-making process and to validate a course of action before committing to it
  » Evidence vs. "intuition" and "experience"
  » Weaknesses: creativity, lack of "intuition & experience", data availability

Levels of data analytics (from standard reports up to optimization):
- Prescriptive Analytics
  » Optimization: What are the feasible decisions for optimal results?
- Predictive Analytics
  » Predictive Modeling: What will happen in the future?
  » Forecasting: Will the current trends continue?
- Descriptive Analytics
  » Statistical Analysis: What are the causes of such a problem?
  » Alerts: What actions are needed NOW?
  » Query/Drilldown: Where exactly is the problem located?
  » Ad-hoc Reports: How much, how often, and where are problems occurring?
  » Standard Reports: What just happened?
DATA ANALYTICS

Process of a data analytics project
[ Source: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview ]

Process of data analytics practice
- Data acquisition & feature engineering
- Feasibility checking
- Model searching + ML/DL/RL programming
- Model evaluation & deployment
- UI/UX, system integration, support
[ Source: https://www.ibm.com/blogs/think/2018/02/data-science-design/ ]

SMART MANUFACTURING

Manufacturing
- The process of converting raw materials into finished products through the use of labor, machinery, tools, and chemical or biological processing or formulation
  » The key ingredient of the manufacturing industry

Smart Manufacturing
- The integration of advanced data analytics, artificial intelligence (AI), and information and communication technologies (ICT) into manufacturing processes to improve efficiency, productivity, and flexibility
- Sensors from ICT:
  » Temperature, pressure, proximity, force, flow, smoke, motion/position (tabular, numeric)
  » Optical/infrared (tabular & vision)
  » Image (vision)
  » Noise (sound)
[ Source: https://www.mckinsey.com/capabilities/operations/our-insights/industry-40-reimagining-manufacturing-operations-after-covid-19 ]

Services pipeline (upper) and task formulation (lower) + yield prediction
[ Source: DOI:10.1109/JPROC.2021.3056006 ]

RUL ESTIMATION

Health indicator-based remaining useful lifetime (RUL) estimation
[ Source: https://ieeexplore-ieee-org-ssl.access.hanyang.ac.kr:8443/document/4711421/ ]
- [RQ1] How to define a health indicator?
  » Physics-driven / Data-driven
- [RQ2] How to model the degradation pattern?
  » Physics-driven / Data-driven
[ Source: https://www.mathworks.com/company/technical-articles/three-ways-to-estimate-remaining-useful-life-for-predictive-maintenance.html ]
[ Source: https://www.sciencedirect.com/science/article/pii/S0306261923002192 ]
- [RQ3] What if there is a scarcity of failure data? (e.g., space launch vehicles)
  » Physics-informed data generation
[ Source: https://arxiv.org/pdf/2304.11702 ]

FAULT PREDICTION

Image-based wafer defect identification (semiconductor)
- [RQ1] How to choose a proper model?
  » (Probably) a CNN-based classification problem
- [RQ2] How to deal with the data imbalance issue?
  » (Under/over) sampling
  » Model modification
  » Data augmentation (synthetic data generation)
[ Source: https://ieeexplore.ieee.org/document/9093073 ]

Manufacturing sensor data-driven anomaly detection
- [RQ3] What if there is no, or a lack of, labeled data?
  » Anomaly detection (semi- or unsupervised learning)
[ Source: https://ieeexplore.ieee.org/document/10318085 ]
  » Example: chain resistance upset butt welding
[ Source: https://www.sciencedirect.com/science/article/pii/S1526612521004187 ]

Prediction of highly imbalanced semiconductor chip-level defects in module tests
- [RQ4] What if there exists more than one data type for a single purpose?
  » Multimodal machine learning
[ Source: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10144380 ]
- [RQ5] Which factor exerts the most significant influence on the occurrence of defects?
  » Explainable A.I. (LIME, SHAP)
[ Source: https://shap.readthedocs.io/en/latest/ ]

YIELD PREDICTION

Wafer yield prediction
[ Source: https://semiconductor.samsung.com/us/support/tools-resources/dictionary/semiconductor-glossary-yield/ ]
- [RQ1] What is the type of data?
  » Time-series (sequential) & non-time-series (non-sequential)
[ Source: https://ieeexplore.ieee.org/document/9269970 ]
[ Source: https://www.sciencedirect.com/science/article/pii/S2405896322020377 ]

CONCLUDING REMARKS

Manufacturing data analytics requires careful consideration of the following issues:
- What is the business question?
  » RUL estimation, fault/yield prediction, and more
- What is a proper ML/DL/RL model for the question and issue?
  » Supervised: regression, classification
  » Semi-supervised: regression, classification, anomaly detection
  » Unsupervised: anomaly detection
- What are the key issues?
  » Physics/data-driven modeling
  » Lack of failure data
  » Data imbalance
  » Absence of labeled data
  » Multimodality
  » Model explainability
  » Time-series, non-time-series, or BOTH

NEXT?

ML for manufacturing data
- Isolation forest (iForest) for anomaly detection
- Entropy-based fuzzy support vector machine (EFSVM) for imbalanced data classification
- SHapley Additive exPlanations (SHAP) for model explainability

Time-series models
- Autoregressive Moving Average (ARMA)
- Recurrent Neural Network (RNN)
- Long Short-Term Memory (LSTM)
- Gated Recurrent Unit (GRU)

ML FOR MANUFACTURING DATA

Manufacturing data analytics requires careful consideration of the following issues:
- What is the business question?
  » RUL estimation, fault/yield prediction
- What is a proper ML/DL/RL model for the question and issue?
  » Supervised: regression, classification
  » Semi-supervised: regression, classification, anomaly detection
  » Unsupervised: anomaly detection
- What are the key issues?
  » Physics/data-driven modeling
  » Lack of failure data
  » Data imbalance
  » Absence of labeled data
  » Multimodality
  » Model explainability
  » Time-series, non-time-series, or BOTH

ANOMALY DETECTION

Anomaly (outlier) detection
- The identification of observations, events, or data points that deviate from what is usual, standard, or expected, making them inconsistent with the rest of a data set
- Toy data set:

  No. | Feature x | Feature y | Feature z
  1   | 2.3       | 1.1       | 2.6
  2   | 3.4       | 2.9       | 1.3
  3   | 1.1       | 3.1       | 2.7
  4   | 2.8       | 2.2       | 1.5
  5   | 2.7       | 0.7       | 1.8
  6   | 2.0       | 2.0       | 3.8
  ⋮   | ⋮         | ⋮         | ⋮

[ Source: https://www.semanticscholar.org/paper/An-angle-based-subspace-anomaly-detection-approach-Zhang-Lin/2c193a2e1aab048408a9137175c8237ed3297862 ]
  » Anomaly in machine / product quality...
  » Semi-supervised or unsupervised learning
[ Source: https://encord.com/blog/top-tools-for-outlier-detection-in-computer-vision/ ]

Isolation forest (iForest)
- Binary decision tree-based anomaly detection via isolating a single instance
  » One of the simplest yet most powerful ML algorithms in practice
  » iForest can be used for ① anomaly detection and ② outlier elimination
[ Source: https://wiki.datrics.ai/isolation-forest-model ]

Decision tree (revisited)
- Supervised learning for regression & classification
- A method to narrow down the subject by asking binary questions
  » The structure of the model has the shape of a tree (root node → intermediate nodes → terminal nodes)
  » Toy example: classifying animals (Giraffe, Elephant, Tiger, Monkey) from x1 (color) and x2 (height)
[ Source: https://www.simplilearn.com/tutorials/machine-learning-tutorial/decision-tree-in-python ]

Decision tree (revisited)
- Regression example: the input space is split into regions G1, ..., G5 holding n = 2, 4, 5, 6, 3 instances, respectively; e.g., the prediction for G5 with target values {3, 2, 7} is c5 = (3 + 2 + 7)/3 = 4
  » The general equation for prediction with M splits can be defined as follows:
    f(x) = Σ_{m=1}^{M} c_m · 1(x ∈ G_m), where c_m is the mean of y_i over the instances in region G_m
- The optimal division can be obtained by minimizing the following cost function:
  Σ_{m=1}^{M} Σ_{x_i ∈ G_m} (y_i − c_m)², with c_m = (1/|G_m|) Σ_{x_i ∈ G_m} y_i
  » where | ⋅ | indicates the cardinality (number of elements) of a set.
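As a minimal sketch of this cost-minimizing division (toy data and function name are my own, not from the slides), the best split point of a single feature can be found by exhaustively checking every candidate and comparing the summed squared errors of the two region means:

```python
import numpy as np

def best_split(x, y):
    """Grid-search the split point s that minimizes the summed
    squared error around the two region means (CART criterion)."""
    best = (None, float("inf"))
    for s in np.unique(x)[:-1]:              # candidate split points
        left, right = y[x <= s], y[x > s]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[1]:
            best = (s, sse)
    return best

# toy data with a clear jump between x = 3 and x = 4
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2.0, 3.0, 2.5, 7.0, 7.5, 8.0])
s, sse = best_split(x, y)
print(s, sse)
```

With the jump placed between x = 3 and x = 4, the search settles on s = 3, and each region is predicted by its mean.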
Decision tree (revisited)
- Regression example:
  » The optimal splitting variable (j) and splitting point (s) are evaluated via grid search:
    min_{j,s} [ min_{c1} Σ_{x_i ∈ G1(j,s)} (y_i − c1)² + min_{c2} Σ_{x_i ∈ G2(j,s)} (y_i − c2)² ]

Random forest (revisited)
- Ensemble w/ decision trees
  » Powered by bagging (Bootstrap Aggregating) and random subspace
- Bootstrap sampling
  » Sampling w/ replacement
- Result aggregating
  » E.g., majority voting in binary classification:

  Model    | Training accuracy | P(y = 1) in test set | Predicted class label
  Model 1  | 0.80              | 0.90                 | 1
  Model 2  | 0.75              | 0.92                 | 1
  Model 3  | 0.88              | 0.87                 | 1
  Model 4  | 0.91              | 0.34                 | 0
  Model 5  | 0.77              | 0.41                 | 0
  Model 6  | 0.65              | 0.84                 | 1
  Model 7  | 0.95              | 0.14                 | 0
  Model 8  | 0.82              | 0.32                 | 0
  Model 9  | 0.78              | 0.98                 | 1
  Model 10 | 0.83              | 0.57                 | 1

  » Six of the ten models vote for class 1, so the ensemble predicts 1
- Random subspace (a.k.a. attribute bagging)
  » Random feature selection to reduce the correlation between decision trees
  » E.g., from all features X1, ..., X6, a random subset {X1, X3, X4} is drawn and X1 is selected for the split

Isolation forest (iForest)
- Binary decision tree-based anomaly detection via isolating a single instance
  » Anomalies account for fewer instances
  » An anomaly instance's attribute (feature) values are distinct from those of the normal instances
- A binary decision tree can be used to isolate every single instance
  » The number of questions (path length) reflects the instance's degree of anomaly
- Case study: a normal instance requires a path length of 5; an anomaly instance requires a path length of only 1
[ Source: https://ieeexplore.ieee.org/document/4781136 ]

Isolation forest (iForest)
- Construction of an isolation tree (iTree)
  » Partial random sampling (60%–70%)
  » Random splitting variable and splitting point (no target variable required!)
  » One instance per terminal node (fully grown tree)
  » The path length of each instance is stored
[ Source: https://www.researchgate.net/figure/Isolation-Forest-learned-iForest-construction-for-toy-dataset_fig1_352017898 ]

Isolation forest (iForest)
- Novelty (anomaly) score, s(x, n)
  » h(x): the path length of instance x in an iTree
  » E[h(x)]: expectation (average) of h(x) over all iTrees
  » c(n): average path length over all n instances
  » Then, the novelty score can be defined as follows:
    s(x, n) = 2^(−E[h(x)] / c(n))
[ Source: https://ieeexplore.ieee.org/document/4781136 ]

IMBALANCED DATA CLASSIFICATION

Classification
- The action or process of classifying something according to shared qualities or characteristics
  » Based on ML algorithms & features (target: label)
  » Binary / multi-class
[ Source: https://wadhwatanya1234.medium.com/multi-class-classification-one-vs-all-one-vs-one-993dd23ae7ca ]

Data imbalance
- A dataset within which one class has a much greater number of instances than the other
  » Most business questions involve the data imbalance problem!
  » Majority vs. minority class
[ Source: https://link.springer.com/article/10.1007/s10115-021-01560-w ]

Data imbalance – Why does it even matter?
- Case study: defect inspection
  » Historical defect rate: 3.8% (expectation)
  » Using an ML classifier, a data scientist obtains 96.2% accuracy
  » After deploying the model, nothing really improves... (it never detects defective products)
- Confusion matrix on 10,000 products (380 defective products)
[ Source: https://towardsdatascience.com/handling-imbalanced-datasets-in-machine-learning-7a0e84220f28 ]

Data imbalance
- The simplest solution is "sampling techniques"

Entropy-based fuzzy support vector machine (EFSVM)
- A solution to imbalanced data classification via model modification!
  » Majority: negative class
  » Minority: positive class
[ Source: https://www.sciencedirect.com/science/article/abs/pii/S0950705116303495 ]

Support vector machine (SVM, revisited)
- SVM focuses on finding a hyperplane that maximizes the margin between two classes
  » A hard margin does NOT allow instances inside the margin (unrealistic)
  » A soft margin allows instances to cross the margin (reducing the maximum margin).
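A small numerical sketch of the soft-margin idea (the points, w, b, and C below are invented for illustration; the slack ξ_i = max(0, 1 − y_i(wᵀx_i + b)) is the standard soft-margin definition):

```python
import numpy as np

# toy 2-D points with labels in {-1, +1}; w, b, C chosen by hand for illustration
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0], [0.5, 0.0]])
y = np.array([1, 1, -1, -1, 1])
w, b = np.array([1.0, 1.0]), 0.0
C = 1.0

margins = y * (X @ w + b)                 # y_i (w^T x_i + b)
xi = np.maximum(0.0, 1.0 - margins)       # slack: 0 unless the instance crosses its margin
objective = 0.5 * w @ w + C * xi.sum()    # (1/2)||w||^2 + C * sum(xi)
print(xi, objective)
```

Only the last point lies inside the margin (margin 0.5 < 1), so it alone receives a nonzero slack penalty; a larger C would push the optimizer toward a narrower margin that eliminates such slacks.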
[ Source: https://ankitnitjsr13.medium.com/math-behind-svm-support-vector-machine-864e58977fdb ]

Support vector machine (SVM, revisited)
- The soft margin imposes a penalty ξ on instances that cross the plus/minus decision boundaries
- Let w^T x + b be the hyperplane of the soft-margin SVM; then, the objective function of the optimization over n instances can be defined as follows:
  min_{w, b, ξ} (1/2)‖w‖² + C Σ_{i=1}^{n} ξ_i
  » C is the hyperparameter for the margin (the larger the C, the narrower the margin)
- The corresponding constraints are as follows:
  y_i(w^T x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0, for all i
  » Given that the constraint of a hard-margin SVM is y_i(w^T x_i + b) ≥ 1, ξ_i allows the margin to be reduced for individual instances
  » Note that ξ_i = 0 for instances that do not cross the margin (non-negativity condition)
- Using Lagrange multipliers, the optimization can be transformed into the Lagrange primal problem (L_p):
  L_p = (1/2)‖w‖² + C Σ_i ξ_i − Σ_i α_i [y_i(w^T x_i + b) − 1 + ξ_i] − Σ_i μ_i ξ_i
  » Since the constraints of the original optimization problem are inequalities, the multipliers of L_p must satisfy α_i ≥ 0 and μ_i ≥ 0
- The Karush–Kuhn–Tucker (KKT) conditions state that the minimum of L_p is obtained when the partial derivative of L_p with respect to each unknown is zero:
  ∂L_p/∂w = 0 → w = Σ_i α_i y_i x_i
  ∂L_p/∂b = 0 → Σ_i α_i y_i = 0
  ∂L_p/∂ξ_i = 0 → C − α_i − μ_i = 0
  » Substituting these equations into L_p transforms the Lagrange primal problem into the dual problem
- The dual problem of the soft-margin SVM (L_D) is as follows:
  max_α L_D = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j
  » Based on the KKT conditions, the constraints of L_D are 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0

Entropy-based fuzzy support vector machine (EFSVM)
- EFSVM focuses on reducing the importance of the negative (majority) class on the decision surface
- Entropy (H_i) measures the class uncertainty of each instance as follows:
  H_i = −p_+ ln p_+ − p_− ln p_−
  » where p_+ and p_− are the fractions of positive and negative instances among the k nearest neighbors of x_i (Euclidean distance), and k is the parameter determining the number of nearest neighbors
- The higher the H_i, the closer the instance is to the margin, and the lower its class certainty
  » Case study: k = 7
- The entropies of the negative samples are H = {H_1, ..., H_{n−}}
  » where n− is the number of negative samples
- Let H_min and H_max be the minimum and maximum values of H; the entropy-based fuzzy membership can be defined as follows:
  » First, separate the negative samples into m (parameter) subsets Sub_1, ..., Sub_m in increasing entropy order, so that H_Sub1 < H_Sub2 < ... < H_Subm on average
  » For each Sub_l, the entropy-based fuzzy membership (FM_l) is imposed as follows:
    FM_l = 1 − β(l − 1)
  » β is a parameter for the rate of importance reduction with the boundary condition 0 < β ≤ 1/(m − 1)
  » Based on FM_l, the importance s_i of the instances in the negative class is modified to s_i = FM_l for Sub_l, while s_i = 1 for the positive class
  » Note that the importance of the positive and negative classes becomes indifferent as β → 0
- Quadratic optimization problem for EFSVM w/ kernel φ:
  min_{w, b, ξ} (1/2)‖w‖² + C Σ_i s_i ξ_i, s.t. y_i(w^T φ(x_i) + b) ≥ 1 − ξ_i, ξ_i ≥ 0
- Quadratic optimization problem for the soft-margin SVM: the same with s_i = 1 for all instances
- Dual problem for EFSVM w/ kernel K:
  max_α Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j), s.t. 0 ≤ α_i ≤ s_i C, Σ_i α_i y_i = 0
- Dual problem for the soft-margin
SVM w/ kernel: the same as the EFSVM dual with s_i = 1 for all instances, i.e., 0 ≤ α_i ≤ C

MODEL EXPLAINABILITY

Business decision-making requires answers to many "WHY" questions
- ML models provide results for regression, classification, and anomaly detection
  » Which features affect a result the most?
- Multiple linear regression w/ three features:
  » Student's t-test on the coefficients (H0: β = 0)
- Machine/deep learning models: black-box mappings f(X_i) from inputs X_i to predictions y_i
[ Source: https://www.sciencedirect.com/science/article/pii/S1566253521001597 ]
[ Source: https://www.edaily.co.kr/news/read?newsId=01197206635575136&mediaCodeNo=257 ]
[ Source: https://assets.new.siemens.com/siemens/assets/api/uuid:3b4de373-57e2-4329-b025-2825db0172aa/WhitepaperXAI.pdf ]

SHapley Additive exPlanations (SHAP)
- SHAP, rooted in the Shapley value, pertains to scenarios where an original black-box model f exists, facilitating predictions based on it
  » Rather than inputting original input values, a simplified version of the input (coalition) is employed to find a surrogate, explanatory model g
  » This process enables both local and global interpretations

- Case study: 3 features (x1, x2, x3) w/ an average target value y of 10 in the train set
  » Calculate the Shapley value (φ_i) of x1:
    φ_i = Σ_{S ⊆ F∖{i}} [|S|! (|F| − |S| − 1)! / |F|!] · [f(S ∪ {i}) − f(S)]
  » where i is the variable of interest, S is a variable subset from which the variable of interest is excluded, and F is the set of all variables

  Coalition | x1 | x2 | x3 | ŷ  | Marginal contribution of x1 | Weight
  ①         | 0  | 0  | 0  | 10 |                             |
  ②         | 1  | 0  | 0  | 5  | ② − ① = −5                  | 1/3 (one-feature coalition)
  ③         | 0  | 1  | 0  | 12 |                             |
  ④         | 0  | 0  | 1  | 15 |                             |
  ⑤         | 1  | 1  | 0  | 7  | ⑤ − ③ = −5                  | 1/6 (two-feature coalition)
  ⑥         | 1  | 0  | 1  | 14 | ⑥ − ④ = −1                  | 1/6 (two-feature coalition)
  ⑦         | 0  | 1  | 1  | 18 |                             |
  ⑧         | 1  | 1  | 1  | 20 | ⑧ − ⑦ = 2                   | 1/3 (three-feature coalition)

  » φ_1 = (1/3)(−5) + (1/6)(−5) + (1/6)(−1) + (1/3)(2) = −2

- Case study: Analysis of Regional Fertility Gap Factors Using Explainable Artificial Intelligence
  » Global interpretation (waterfall plot)
  » Local interpretation (force plot), SVR @ 2022
[ Source: https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART003063114 ]

CONCLUDING REMARKS Manufacturing Data
Analytics

There exist many ML models that address the key issues in manufacturing data:
- Anomaly detection
  » Solution: unsupervised ML-based anomaly detection (e.g., iForest)
- Data imbalance
  » Solution: sampling-driven or model modification-driven approaches (e.g., EFSVM)
- Model explainability
  » Solution: explainable A.I. (e.g., SHAP)

TIME-SERIES MODELS

Manufacturing data analytics requires careful consideration of the following issues:
- What is the business question?
  » RUL estimation, fault/yield prediction
- What is a proper ML/DL/RL model for the question and issue?
  » Supervised: regression, classification
  » Semi-supervised: regression, classification, anomaly detection
  » Unsupervised: anomaly detection
- What are the key issues?
  » Physics/data-driven modeling
  » Lack of failure data
  » Data imbalance
  » Absence of labeled data
  » Multimodality
  » Model explainability
  » Time-series, non-time-series, or BOTH

AUTOREGRESSIVE MOVING AVERAGE (ARMA)

The main objective of time-series modelling is to develop reasonably simple models capable of forecasting, interpreting, and testing hypotheses concerning time-series data.
- The original use of time-series analysis was primarily as an aid to forecasting; as such, a methodology was developed to decompose a series into a trend, a seasonal, a cyclical, and an irregular component.
  » The trend component represents the long-term behavior of the series, and the cyclical component represents the regular periodic movements.
  » The irregular component is stochastic, and the goal of the modelling is to estimate and forecast this component.
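The classical decomposition described above can be illustrated with a small synthetic series (the trend slope, seasonal period, and noise level below are arbitrary choices of mine, not from the lecture); a centered moving average over one full period averages the seasonality out and leaves a trend estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(120)
trend = 0.05 * t                                  # long-term behavior
seasonal = 2.0 * np.sin(2 * np.pi * t / 12)       # period-12 periodic movement
irregular = rng.normal(0, 0.3, t.size)            # stochastic component
y = trend + seasonal + irregular

# a 12-point moving average cancels the period-12 seasonality,
# leaving an estimate of the trend component
kernel = np.ones(12) / 12
trend_hat = np.convolve(y, kernel, mode="valid")
detrended = y[5:5 + trend_hat.size] - trend_hat   # roughly aligned residual
print(trend_hat[:3], detrended[:3])
```

Subtracting the estimated trend leaves the seasonal-plus-irregular residual, whose spread is visibly smaller than that of the raw series.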
Autoregressive Moving Average (ARMA)
- Combines a moving average process with a linear difference equation
- Consider the p-th order difference equation:
  y_t = a_0 + Σ_{i=1}^{p} a_i y_{t−i} + x_t
  » Now let {x_t} be the MA(q) process x_t = Σ_{i=0}^{q} β_i ε_{t−i}, so that we can write
    y_t = a_0 + Σ_{i=1}^{p} a_i y_{t−i} + Σ_{i=0}^{q} β_i ε_{t−i}
  » We follow the convention of normalizing units so that β_0 is always equal to unity (β_0 ε_t = ε_t)
- If the characteristic roots of Eq. (2.5) are all in the unit circle (stationary process), {y_t} is called an ARMA model for y_t
  » If the autoregressive part of the difference equation contains p lags and the model for x_t contains q lags, the model is called an ARMA(p, q) model
- Formally, a time series (y_t) having a finite mean and variance is covariance stationary if, for all t and t − s,
  E(y_t) = E(y_{t−s}) = μ
  E[(y_t − μ)²] = E[(y_{t−s} − μ)²] = σ²
  E[(y_t − μ)(y_{t−s} − μ)] = E[(y_{t−j} − μ)(y_{t−j−s} − μ)] = γ_s
  » Simply put, a time series is covariance stationary if its mean and all autocovariances are unaffected by a change of time origin.

① ARMA(1,0) example:

  t  | y_t     | y_{t−1} | ŷ_t (①) | ε_t (①)
  1  | 0.4967  |         | -0.6588 | 1.1555
  2  | 0.3088  | 0.4967  | 0.3561  | -0.0473
  3  | 0.7915  | 0.3088  | 0.1910  | 0.6005
  4  | 2.1922  | 0.7915  | 0.6150  | 1.5772
  5  | 1.5381  | 2.1922  | 1.8452  | -0.3072
  6  | 0.6185  | 1.5381  | 1.2707  | -0.6523
  7  | 1.8801  | 0.6185  | 0.4630  | 1.4170
  8  | 2.3692  | 1.8801  | 1.5711  | 0.7982
  9  | 1.1823  | 2.3692  | 2.0007  | -0.8184
  10 | 1.1111  | 1.1823  | 0.9582  | 0.1529
  11 | 0.3660  | 1.1111  | 0.8957  | -0.5297
  12 | -0.3852 | 0.3660  | 0.2413  | -0.6264
  13 | -0.1288 | -0.3852 | -0.4185 | 0.2896
  14 | -1.9180 | -0.1288 | -0.1933 | -1.7247
  15 | -3.4497 | -1.9180 | -1.7648 | -1.6849
  16 | -3.1496 | -3.4497 | -3.1101 | -0.0395
  17 | -3.0713 | -3.1496 | -2.8465 | -0.2248
  18 | -1.8324 | -3.0713 | -2.7777 | 0.9453
  19 | -1.9132 | -1.8324 | -1.6895 | -0.2236
  20 | -2.8326 | -1.9132 | -1.7605 | -1.0721

② ARMA(1,1) example:

  t  | y_t     | y_{t−1} | ε_{t−1} | ŷ_t (②) | ε_t (②)
  1  | 0.4967  |         |         | -0.4789 | 0.9756
  2  | 0.3088  | 0.4967  | 0.5510  | 0.3367  | -0.0279
  3  | 0.7915  | 0.3088  | 0.3525  | 0.0368  | 0.7547
  4  | 2.1922  | 0.7915  | 0.9539  | 0.8870  | 1.3052
  5  | 1.5381  | 2.1922  | 1.4834  | 2.2929  | -0.7548
  6  | 0.6185  | 1.5381  | -0.3092 | 0.2876  | 0.3309
  7  | 1.8801  | 0.6185  | 0.6770  | 0.5287  | 1.3513
  8  | 2.3692  | 1.8801  | 1.6515  | 2.2293  | 0.1399
  9  | 1.1823  | 2.3692  | 0.5043  | 1.5576  | -0.3753
  10 | 1.1111  | 1.1823  | 0.0048  | 0.3219  | 0.7892
  11 | 0.3660  | 1.1111  | 1.1425  | 1.2669  | -0.9009
  12 | -0.3852 | 0.3660  | -0.5171 | -0.6836 | 0.2985
  13 | -0.1288 | -0.3852 | 0.6647  | -0.1583 | 0.0295
  14 | -1.9180 | -0.1288 | 0.3988  | -0.2176 | -1.7004
  15 | -3.4497 | -1.9180 | -1.3193 | -2.9227 | -0.5270
  16 | -3.1496 | -3.4497 | -0.1546 | -2.9382 | -0.2114
  17 | -3.0713 | -3.1496 | 0.1591  | -2.4622 | -0.6091
  18 | -1.8324 | -3.0713 | -0.2376 | -2.7557 | 0.9233
  19 | -1.9132 | -1.8324 | 1.2910  | -0.5868 | -1.3264
  20 | -2.8326 | -1.9132 | -0.9544 | -2.6010 | -0.2316

Linear regression-based modelling
- Advantages
  » Simplicity
  » Explainability
  » Low computing cost
  » Powerful and easy to implement
  » Expandable to SARIMAX
- Drawbacks
  » Assumption of a linear relationship between variables
  » Stationarity condition
  » Heteroscedasticity (time-varying variance)
  » Sensitivity to outliers
  » Low model scalability

RECURRENT NEURAL NETWORK (RNN)

Recurrent neural network (RNN)
- An artificial neural network designed to recognize patterns in sequences of data, such as time series, text, or audio
  » Unlike feedforward neural networks, which process input data in a strictly forward manner, RNNs have connections that form directed cycles, allowing them to exhibit dynamic temporal behavior
  » The key feature of RNNs is their ability to maintain a memory of past inputs using recurrent connections, which enables them to capture temporal dependencies and context within sequential data.
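A minimal numpy sketch of this recurrent memory, assuming the standard vanilla-RNN update h_t = tanh(W_hh h_{t−1} + W_xh x_t) with toy dimensions (3 input features, 2 hidden nodes); all weights here are random placeholders, not trained values:

```python
import numpy as np

rng = np.random.default_rng(42)

D, H, T = 3, 2, 100                 # features, hidden nodes, time steps
Wxh = rng.normal(0, 0.1, (H, D))    # input-to-hidden weights
Whh = rng.normal(0, 0.1, (H, H))    # hidden-to-hidden (recurrent) weights
Why = rng.normal(0, 0.1, (1, H))    # hidden-to-output weights

x_seq = rng.normal(0, 1, (T, D))    # one instance: T time steps of sensor readings
h = np.zeros(H)
for x_t in x_seq:
    # h_t = tanh(Whh h_{t-1} + Wxh x_t): the hidden state carries past inputs forward
    h = np.tanh(Whh @ h + Wxh @ x_t)
y_hat = (Why @ h).item()            # one prediction after the final step
print(y_hat)
```

Because the same Whh is reused at every step, the final hidden state is a (compressed) function of the entire sequence, which is exactly the "memory of past inputs" described above.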
Conventional (feedforward) neural-network architecture for non-sequential data
- 100 instances with 3 features (𝑥1, 𝑥2, 𝑥3) and 1 target (𝑦)
- 2 hidden nodes
  » [ Figure: fully connected network mapping each instance 𝐱𝑖 = (𝑥1,𝑖, 𝑥2,𝑖, 𝑥3,𝑖) through hidden nodes ℎ1, ℎ2 to 𝑦𝑖, next to the 100-row data table ]

Case study: Quality control in a 1 L water bottle manufacturing factory
- 𝑥𝑖,𝑡: the sensor data from machine 𝑖 at time 𝑡
- 𝑦𝑡: the amount of water filled in a bottle
  » Assume that the past 100 sensor readings affect the accuracy of bottle filling
  » 200 instances, built as sliding windows: instance #1 uses 𝑡 = 1, …, 100 to predict 𝑦100; instance #2 uses 𝑡 = 2, …, 101 to predict 𝑦101; and the last instance uses 𝑡 = 𝑇 − 99, …, 𝑇 to predict 𝑦𝑇
  » The input is therefore an 𝑁(instance) × 𝑇(time) × 𝐷(variable) = 200 × 100 × 3 array

(Vanilla) RNN architecture for sequential (time-series) data
- 2 hidden nodes
  » [ Figure: instance #1 unrolled over 𝑡 = 1, …, 100, with hidden nodes ℎ1,𝑡 and ℎ2,𝑡 passed forward in time to predict 𝑦100 ]
- 200 instances to learn from
  » [ Figure: the same network unrolled as 𝐱𝑇−99, …, 𝐱𝑇 → 𝐡𝑇−99, …, 𝐡𝑇 → 𝑦𝑇 ]

Information flow in a vanilla RNN
- 𝑓( ) and 𝑔( ): activation functions
- For 𝑡 = 1, …, 𝑇:
  h𝑡 = 𝑓(Wℎℎ h𝑡−1 + W𝑥ℎ x𝑡),  𝑦𝑡 = 𝑔(Wℎ𝑦 h𝑡)
- Parameters (weight matrices) to be estimated: Wℎℎ, W𝑥ℎ (and the output weights Wℎ𝑦)
  » h𝑡: hidden nodes' values at time 𝑡
  » Wℎℎ h𝑡−1: hidden nodes' values stored up to time 𝑡 − 1
  » W𝑥ℎ x𝑡: new input values at time 𝑡

Backpropagation in RNN
- Let 𝑓( ) be the hyperbolic tangent (Tanh); the cost is evaluated at the final output
  » [ Figure: two-step example 𝐱1, 𝐱2 → 𝐡1, 𝐡2 → 𝑦2 → 𝐶𝑜𝑠𝑡, with gradients flowing back through 𝐡2 and 𝐡1 to 𝐡0 ]
- Generalization (backpropagation through time): the gradient of the cost with respect to each weight matrix sums contributions propagated back through every time step
- Since 𝑓( ) is the hyperbolic tangent
(Tanh), its derivative is 𝑓′(𝑧) = 1 − tanh²(𝑧)
  » The partial derivatives at each step therefore yield ∂h𝑡/∂h𝑡−1 = diag(1 − h𝑡²) Wℎℎ

Gradient exploding/vanishing problem
- Note that the gradient flowing from the cost at time 𝑇 back to time 𝑡 contains the repeated product
  ∂h𝑇/∂h𝑡 = ∏𝑘=𝑡+1,…,𝑇 diag(1 − h𝑘²) Wℎℎ
  » Because |1 − tanh²( )| ≤ 1, this product shrinks toward zero as the chain grows (gradient vanishing) unless Wℎℎ is large, in which case it can instead blow up (gradient exploding)
[ Source: https://www.sciencedirect.com/science/article/abs/pii/S0925231219309464 ]
- If gradient vanishing occurs even once in the chain, the information from the long past will be lost
  » [ Figure: unrolled RNN 𝐱𝑇−99, …, 𝐱𝑇 → 𝐡𝑇−99, …, 𝐡𝑇 → 𝑦𝑇; if a gradient vanishes at one step, all past information before that step is lost to 𝑦𝑇 ]
- Solution: change or develop an activation function, or upgrade the RNN architecture

LONG SHORT-TERM MEMORY (LSTM) | Time-series Models

RNN-based model to resolve the gradient exploding/vanishing problem
- Enables the learning of long-term dependencies
  » Selectively remembering or forgetting information over long sequences
  » [ Figures: operation symbols; the vanilla RNN cell; the LSTM cell ]
[ Sources: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ and https://towardsdatascience.com/animated-rnn-lstm-and-gru-ef124d06cf45 ]
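The recursion h𝑡 = 𝑓(Wℎℎ h𝑡−1 + W𝑥ℎ x𝑡) and the vanishing Jacobian product described above can be checked numerically. This is a minimal sketch under illustrative assumptions (random data for one 100-step, 3-feature instance of the case study, small random weights, identity output activation), not the lecture's code.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, H = 100, 3, 2                       # time steps, features, hidden nodes (case-study sizes)
X = rng.standard_normal((T, D))           # one instance of sensor data (illustrative values)
W_xh = rng.standard_normal((D, H)) * 0.1  # input-to-hidden weights
W_hh = rng.standard_normal((H, H)) * 0.1  # hidden-to-hidden weights
W_hy = rng.standard_normal((H, 1)) * 0.1  # hidden-to-output weights

h = np.zeros(H)
J = np.eye(H)                             # running product of the Jacobians dh_t/dh_{t-1}
for t in range(T):
    h = np.tanh(h @ W_hh + X[t] @ W_xh)   # h_t = f(W_hh h_{t-1} + W_xh x_t), f = tanh
    J = np.diag(1.0 - h**2) @ W_hh.T @ J  # chain-rule factor diag(1 - h_t^2) W_hh (row-vector convention)
y_hat = h @ W_hy                          # y_T = g(W_hy h_T) with g taken as the identity
grad_norm = np.linalg.norm(J)             # ||dh_T/dh_0||: effectively zero after 100 steps
```

Because every chain-rule factor here has norm well below 1, `grad_norm` collapses toward zero; this is exactly the mechanism that motivates the LSTM architecture introduced above.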
» [ Figure: animated LSTM cell; source: https://towardsdatascience.com/animated-rnn-lstm-and-gru-ef124d06cf45 ]

Cell state
- The distinctive part of the LSTM, passing along the top of the architecture
  » The cell state is kind of like a conveyor belt
  » It runs straight down the entire chain, with only some minor linear interactions
  » It is very easy for information to just flow along it unchanged
- From the cell state, the model selectively remembers or forgets the past information
[ Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ ]

Step 1: Decide what information the model is going to throw away from the cell state
- Forget gate: outputs numbers between 0 and 1 based on the previous hidden state h𝑡−1 and the input x𝑡
  » Sigmoid (forget) layer: f𝑡 = 𝜎(W𝑓·[h𝑡−1, x𝑡] + b𝑓)
- From the forget gate, we evaluate the value of the past information
  » 1: completely keep all information from the cell state
  » 0: completely forget all information from the cell state

Step 2: Decide what new information we are going to store in the cell state
- Input gate (sigmoid) layer: decides which values we will update, i𝑡 = 𝜎(W𝑖·[h𝑡−1, x𝑡] + b𝑖)
- Tanh layer: creates a vector of new candidate values that could be added to the cell state, C̃𝑡 = tanh(W𝐶·[h𝑡−1, x𝑡] + b𝐶)

Step 3: Update the old cell state C𝑡−1 into the new cell state C𝑡
- Forget, then add the scaled candidates: C𝑡 = f𝑡 ∗ C𝑡−1 + i𝑡 ∗ C̃𝑡

Step 4: Decide what we are going to output (h𝑡)
- Output (sigmoid) gate layer: decides what parts of the cell state we are going to output, o𝑡 = 𝜎(W𝑜·[h𝑡−1, x𝑡] + b𝑜)
  » Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, h𝑡 = o𝑡 ∗ tanh(C𝑡), so that we
only output the parts we decided to.
[ Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ ]

Weight matrices to be estimated
- Vanilla RNN: a single recurrent pair, W𝑥ℎ and Wℎℎ (plus the output weights)
- LSTM: one weight matrix per layer — W𝑓, W𝑖, W𝐶, and W𝑜 — i.e., roughly four times the recurrent parameters of the vanilla RNN

GATED RECURRENT UNIT (GRU) | Time-series Models

RNN-based model to resolve the gradient exploding/vanishing problem
- A simpler alternative to the more complex Long Short-Term Memory (LSTM) units
  » [ Figure: animated GRU cell; source: https://towardsdatascience.com/animated-rnn-lstm-and-gru-ef124d06cf45 ]

Reduced form of LSTM
- Cell state: removed; the hidden state h𝑡 carries the memory by itself
- Update gate (z𝑡): combines the forget and input gates of the LSTM, z𝑡 = 𝜎(W𝑧·[h𝑡−1, x𝑡])
- Reset gate (r𝑡): decides how much of the previous hidden state to forget, r𝑡 = 𝜎(W𝑟·[h𝑡−1, x𝑡])
  » The new state blends old and candidate values: h𝑡 = (1 − z𝑡) ∗ h𝑡−1 + z𝑡 ∗ h̃𝑡, with h̃𝑡 = tanh(W·[r𝑡 ∗ h𝑡−1, x𝑡])
  » [ Figures: side-by-side cells of the vanilla RNN, LSTM, and GRU ]
[ Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ ]

CONCLUDING REMARKS | Manufacturing Data Analytics

There are many time-series (sequential) data in manufacturing
- ARMA: a linear regression-based approach
  » Advantages: simple, easy to implement, model explainability via coefficients
  » Disadvantages: assumes linear relationships, requires stationarity, low model scalability
- Vanilla RNN: the simplest form of recurrent neural network
  » Advantages: captures non-linear relationships; the simplest of the RNN-based models
  » Disadvantages: gradient exploding/vanishing problem, limited model explainability
- LSTM: an upgrade of the vanilla RNN with a cell state and forget/input/output gate layers
  » Advantages: resolves the gradient exploding/vanishing problem; powerful performance
  » Disadvantages: high model and computational complexity; sensitivity to hyperparameters
- GRU: a simpler version of the LSTM
  » Advantages: powerful performance with lower model complexity
  » Disadvantages: less effective at capturing very long-term dependencies and at learning from noisy data
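As a closing illustration, the four LSTM steps described earlier translate directly into NumPy. This is a minimal sketch, not the lecture's implementation: the weight shapes, random initialization, and the 3-feature/2-hidden-node sizes are illustrative assumptions borrowed from the case-study dimensions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step; each W[k] maps the concatenation [h_{t-1}, x_t] to a gate pre-activation."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])      # Step 1: forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # Step 2: input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])  #         candidate values
    C_t = f_t * C_prev + i_t * C_tilde      # Step 3: cell-state update
    o_t = sigmoid(W["o"] @ z + b["o"])      # Step 4: output gate
    h_t = o_t * np.tanh(C_t)                #         gated, squashed output
    return h_t, C_t

D, H = 3, 2                                 # features, hidden nodes (illustrative)
rng = np.random.default_rng(1)
W = {k: rng.standard_normal((H, H + D)) * 0.1 for k in "fiCo"}
b = {k: np.zeros(H) for k in "fiCo"}
h, C = np.zeros(H), np.zeros(H)
for t in range(100):                        # unroll over a length-100 sequence of random inputs
    h, C = lstm_step(rng.standard_normal(D), h, C, W, b)
```

Note how the cell state C is updated only by elementwise multiplication and addition (the "conveyor belt"), which is what lets gradients survive many steps; h stays bounded because it is a sigmoid-gated tanh.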