Logistic Regression and KNN AI Summary (PDF)

Summary

This document summarizes logistic regression and KNN concepts in machine learning, including overfitting, regularization techniques (L1 and L2), and applications in different fields such as healthcare. The document also describes using these methods to train and test models.

Full Transcript

‫‪Logistic Regression and KNN‬‬ ‫علَى آ ِل ُم َح َّمد‪،‬‬ ‫علَى ُم َح َّم ٍد َو َ‬ ‫علَى إِب َْرا ِهي َم‪َ ،‬وبَ ِار ْك َ‬ ‫علَى آ ِل ُم َح َّمدٍ‪َ ،‬ك َما َ‬...

‫‪Logistic Regression and KNN‬‬ ‫علَى آ ِل ُم َح َّمد‪،‬‬ ‫علَى ُم َح َّم ٍد َو َ‬ ‫علَى إِب َْرا ِهي َم‪َ ،‬وبَ ِار ْك َ‬ ‫علَى آ ِل ُم َح َّمدٍ‪َ ،‬ك َما َ‬ ‫صلَّيْتَ َ‬ ‫علَى ُم َح َّم ٍد َو َ‬ ‫اللَّ ُه َّم َ‬ ‫ص ِ ِّل َ‬ ‫علَى آ ِل إِب َْرا ِهي َم‪ ،‬فِي ا ْلعَالَ ِم َ‬ ‫ين‪ ،‬إِنَّكَ َح ِمي ٌد َم ِجيد‬ ‫ار ْكتَ َ‬ ‫َك َما بَ َ‬ ‫‪By: Muhammed Gamal Maklad‬‬ ‫‪My LinkedIn‬‬ ‫ طب أنا أيه يضمنلي ان أصل أضمن ان في‬Best Fit Line ‫ كنا بنحاول نوصل ل‬Linear Regression ‫احنا في‬ ‫ اللي عندي ؟ يعني زي اللي في الصوره علشان كده في ساعات علشان اقسم ال‬Classes ‫خط ممكن يقسم ال‬ ‫ زي ما شرحنا اني هي عباره عن‬sigmoid ‫ فعلشان كده هنستخدم ال‬Curve ‫ ممكن أكون محتاج‬Classes ‫ لو القيمه‬0 ‫ برجعها ب‬0.5 ‫ و لو لقيمه اللي جايه ليها أصغر من‬0.5 ‫ بتاعها ب‬Threeshold ‫ بيكون‬Function 1 ‫ هرجعها ب‬0.5 ‫أكبر من‬ ‫ راجع بقيمو‬Z ‫ فلو ال‬Linear Regression ‫ دي عباره عن قانون‬Z ‫ايوه يعني الكلم ده هيتعمل ازاي يا جيمي بص ال‬ ‫ تمام ؟‬1 ‫ فهرجعه ب‬0.5 ‫ فكده اكبر من‬0.89 ‫ فكده ناتج هيكون ب‬2 ‫سالبه فالسالب ده هيبقي موجب يعني لو راجع ب‬ ‫ أحتمال‬1 -Y ‫ األول و‬Output ‫ بتاعت حدوث ال‬Probability ‫ هي ال‬Y ‫ فكده ال‬2 Output ‫هنا لو عندي‬ ‫ التاني تمام ؟‬Output ‫حدوث ال‬ ❖ Regularization: Overfitting: Creating a model that matches the training data so closely that the model fails to make correct predictions on new data.. ‫ ميعرفش يجاوب‬Test ‫ عليه تيجي في‬Accuracy ‫ مطلع‬Train ‫من االخر الموديل بيحفظ الداتا تلقيه في‬ Regularization: any mechanism that reduces overfitting. Regularization Rate 𝝀 A number that specifies the relative importance of regularization during training. Raising the regularization rate reduces overfitting but may reduce the model's predictive power (increase loss). Conversely, reducing or omitting the regularization rate increases overfitting. ‫ عن طريق أول حاجه اني اشيل‬Overfitting ‫ هو محاوله مننا اننا نقضي علي اصعب مشكله في الماشيين ليرنج اال و هي ال‬regularization ‫ال‬. L1 ,L 2 regularization ‫ عندي منها نوعين‬Overfitting ‫الفيتشر اللي ملهاش أي تلته الزمه اللي بتؤدي ان الموديل بحصله‬ L1 regularization L2 regularization Penalizes weights based on the sum of their absolute values Penalizes weights based on the sum of their squared values. Used in Feature selection Suitable if data contains outliers makes models simpler by removing some features keeps all features but reduces the impact of less completely. important ones. 1 Muhammed Gamal Maklad ‫صل علي النبي‬ ‫ فبناء عليه هيشيلها بحيث انها‬0 ‫ بتاعها ب‬Weight ‫ اللي ملهاش الزم ال‬Feature ‫ هو كل اللي بيعمله انه بيحاول يخلي ال‬Lasso ‫ أو‬L1 ‫ اللي‬Feature ‫ و ال‬Weight ‫ هو بيجيب ال التربيع لل‬Ridge ‫ أو‬L2 ‫ اما‬Feature selection ‫متلغبطش ال موديل و ده حلو أوي في‬. Lambda ‫ و يروح يضرب القيمه دي في‬Outlier ‫ فده كويس جدا لما يكون في‬0 ‫ بتاعها بيقترب من ال‬Weight ‫ملهاش الزمه بيخلي ال‬ Regularization rate (lambda): The Greek character lambda typically symbolizes the regularization rate. high regularization rate low regularization rate reducing the chances of overfitting. increasing the chances of overfitting. normal distribution ,a mean weight of 0. a histogram of model weights with a flat distribution. ❖ K-Nearest Neighbors (KNN): K-Nearest Neighbors algorithm is a supervised learning classifier, which uses proximity to make predictions/classifications about the group of individual data points. It is based on the assumption that similar points can be found near one another. ‫ بيعتمد علي المسافه بمعني انا دلوقتي عندي النجمه دي انا عاووز أحدد هي تبع انهي‬KNN ‫في ال‬ 3 ‫ اللي هبصلهم فهو بيشوف ايه أقرب‬neighbors ‫ هنا عدد‬K ‫ و‬K = 3 ‫ لو انا قولتله‬Class 3 ‫عناصر منها مسافة و بناء ع الكلس اللي هيكون له أكبر عدد فهيحطها معاه يعني هنا أقرب‬ ‫ قوانين حساب المسافه‬. ‫ مربع معني كده انه هيعتبرها انها مربع‬1 ‫ مثلث و‬2 ‫عناصؤ عباره عن‬ ‫تحت اهي‬ Euclidean Distance Manhattan distance Minkowski Distance Hamming Distance ‫المسافه بين نقطتين ع خط مستقيم‬ ‫ المسافه بين نقطتين ع مجسم زي‬distance measure is the used with Boolean or ‫ المسافه بين بلدين‬generalized form of Euclidean string vectors, and Manhattan distance identifying the points metrics. where the vectors do not match. it is recommended to have an odd number for k to avoid ties in classification 2 Muhammed Gamal Maklad ‫صل علي النبي‬ ❖ Cross-Validation: k-fold Cross-Validation: ✓ Partitions data into k equal subsets (or folds). ✓ Each subset is used once as a validation set, while the model is trained on the remaining k- 1 subsets. ✓ Process repeats k times, with each subset serving as the validation set once. ✓ The average error across all k runs is reported as E. ✓ Popular for cross-validation but can be time-consuming due to repeated model training. ‫ ك‬800 ‫ هتعمل منهم‬5 folds ‫ صف قولت هتقسمهم‬1000 ‫تخيل انت جايلك داتا وليكن من‬ 5 ‫ انت هتروح بدل قسمتهم‬Cross-Validation ‫ فعلشان تعمل‬Test ‫ ك‬200 ‫ و‬Train 800 ‫ و ال‬Test ‫ ف هيكونوا‬200 ‫ مرات مره أول‬5 ‫ يبقي زي متقول هتدرب الموديل‬folds ‫ و‬Test ‫ صف هيكونوا‬200 ‫ تاني مره تاني‬Accuracy ‫ و نحسب‬Train ‫التانين هيبقوا‬ ‫ و هكذا و في االخر نطلع ال‬Accuracy ‫ و نحسب‬Train ‫ التانين هيبقوا‬800 ‫ال‬. Average Accuracy ❖ Applications of KNN in Machine Learning: Data preprocessing: Datasets frequently have missing values, but the KNN algorithm can estimate for those values in a process known as missing data imputation. Recommendation Engines: Using clickstream data from websites, the KNN algorithm has been used to provide automatic recommendations to users on additional content. a user is assigned to a particular group, and based on that group’s user behavior, they are given a recommendation. Healthcare: KNN has also had application within the healthcare industry, making predictions on the risk of heart attacks and prostate cancer. The algorithm works by calculating the most likely gene expressions. Pattern Recognition: KNN has also assisted in identifying patterns, such as in text and digit classification. This has been particularly helpful in identifying handwritten numbers that you might find on forms or mailing envelopes. 3 Muhammed Gamal Maklad ‫صل علي النبي‬

Use Quizgecko on...
Browser
Browser