Logistic Regression and KNN AI Summary (PDF)
Muhammed Gamal Maklad
Summary
This document summarizes logistic regression and KNN concepts in machine learning, including overfitting, regularization techniques (L1 and L2), and applications in different fields such as healthcare. The document also describes using these methods to train and test models.
Full Transcript
Logistic Regression and KNN

O Allah, send blessings upon Muhammad and the family of Muhammad, as You sent blessings upon Ibrahim and the family of Ibrahim, and bless Muhammad and the family of Muhammad, as You blessed Ibrahim and the family of Ibrahim. In all the worlds, You are indeed Praiseworthy, Glorious.

By: Muhammed Gamal Maklad
My LinkedIn

❖ Logistic Regression:
In Linear Regression we were trying to reach the Best Fit Line. But what guarantees that a straight line can separate the classes I have, like the ones in the picture? Sometimes I need a curve to separate the classes, and that is why we use the sigmoid. As we explained, it is a function used with a threshold of 0.5: if the value it produces is less than 0.5 I return 0, and if it is greater than 0.5 I return 1.

So how does this actually happen? Z is the Linear Regression equation, and the sigmoid is applied to it as 1 / (1 + e^(-Z)). Because of the minus sign in the exponent, a negative Z turns into a positive exponent. For example, if Z comes back as 2, the result is about 0.88, which is greater than 0.5, so I return 1. Okay?

If I have 2 outputs, Y is the probability of the first output occurring and 1 - Y is the probability of the second output occurring. Okay?

❖ Regularization:
Overfitting: Creating a model that matches the training data so closely that the model fails to make correct predictions on new data. In short, the model memorizes the data: it scores a high Accuracy on Train, but when it gets to Test it cannot answer.
Regularization: Any mechanism that reduces overfitting.
Regularization Rate 𝝀: A number that specifies the relative importance of regularization during training. Raising the regularization rate reduces overfitting but may reduce the model's predictive power (increase loss). Conversely, reducing or omitting the regularization rate increases overfitting.

Regularization is our attempt to get rid of the hardest problem in machine learning, which is overfitting.
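As a rough illustration of the logistic regression step described earlier (Z from the linear equation, then the sigmoid, then the 0.5 threshold), here is a minimal Python sketch. The weight `w`, bias `b`, and input `x` are made-up values chosen so that Z = 2; they are not from the notes.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(z: float, threshold: float = 0.5) -> int:
    """Return 1 when the sigmoid output is above the threshold, else 0."""
    return 1 if sigmoid(z) > threshold else 0


# Hypothetical weight, bias, and input so that z = w * x + b = 2.
w, b, x = 0.5, 1.0, 2.0
z = w * x + b

y = sigmoid(z)  # probability of the first output
print(round(y, 2), round(1 - y, 2), predict_class(z))  # prints: 0.88 0.12 1
```

With Z = -2 instead, the same functions give a probability of about 0.12, which is below the threshold, so the predicted class is 0.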
The first way regularization reduces overfitting is by removing the features that serve no purpose and push the model toward overfitting. There are two types: L1 and L2 regularization.

L1 regularization:
✓ Penalizes weights based on the sum of their absolute values.
✓ Used in feature selection.
✓ Makes models simpler by removing some features completely.

L2 regularization:
✓ Penalizes weights based on the sum of their squared values.
✓ Suitable if data contains outliers.
✓ Keeps all features but reduces the impact of less important ones.

L1, or Lasso: all it does is try to push the weight of a useless feature to 0, which removes that feature so it does not confuse the model. That is great for feature selection. L2, or Ridge: it squares the weights and multiplies that value by lambda. For a useless feature it makes the weight approach 0, which is very good when there are outliers.

Regularization rate (lambda): The Greek character lambda typically symbolizes the regularization rate.

High regularization rate:
✓ Reduces the chances of overfitting.
✓ Gives a histogram of model weights with a normal distribution and a mean weight of 0.

Low regularization rate:
✓ Increases the chances of overfitting.
✓ Gives a histogram of model weights with a flat distribution.

❖ K-Nearest Neighbors (KNN):
The K-Nearest Neighbors algorithm is a supervised learning classifier that uses proximity to make predictions or classifications about the grouping of an individual data point. It is based on the assumption that similar points can be found near one another.

KNN depends on distance. Say I have this star and I want to decide which class it belongs to. If I set K = 3, where K is the number of neighbors I look at, it finds the 3 elements nearest to the star by distance and puts the star with the class that has the largest count among them.
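The neighbor vote described above can be sketched in a few lines of Python. This is only an illustration: the points, the labels, and the position of the star are invented, and Euclidean distance is assumed as the distance measure.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(points, labels, query, k=3):
    """Label the query with the majority class among its k nearest points."""
    # Sort the labeled points by distance to the query and keep the k closest.
    nearest = sorted(zip(points, labels), key=lambda pl: euclidean(pl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two squares and three triangles.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 3.5), (5.0, 5.0), (6.0, 5.5)]
labels = ["square", "square", "triangle", "triangle", "triangle"]
star = (2.0, 2.0)

print(knn_predict(points, labels, star, k=3))  # prints: square
```

With these made-up points, the 3 nearest neighbors of the star are two squares and one triangle, so the vote goes to square. Raising k to 5 brings in all three triangles and flips the answer, which shows how much the choice of K matters.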
Here the 3 nearest elements are 2 squares and 1 triangle, which means KNN will treat the star as a square. The distance measures are listed below.

Euclidean Distance: the distance between two points along a straight line.
Manhattan Distance: the distance between two points across a shape (along a grid rather than a straight line), like the distance between two towns.
Minkowski Distance: the generalized form of the Euclidean and Manhattan distance metrics.
Hamming Distance: used with Boolean or string vectors, identifying the points where the vectors do not match.

It is recommended to have an odd number for k to avoid ties in classification.

❖ Cross-Validation:
k-fold Cross-Validation:
✓ Partitions data into k equal subsets (or folds).
✓ Each subset is used once as a validation set, while the model is trained on the remaining k-1 subsets.
✓ Process repeats k times, with each subset serving as the validation set once.
✓ The average error across all k runs is reported as E.
✓ Popular for cross-validation but can be time-consuming due to repeated model training.

Imagine you receive a dataset of, say, 1000 rows, and you decide to split it into 5 folds. Instead of splitting it once into 800 rows for Train and 200 rows for Test, Cross-Validation trains the model 5 times, once per fold. The first time, 200 rows are the Test set and the other 800 are the Train set, and we compute the Accuracy. The second time, a different 200 rows are the Test set and the other 800 are the Train set, and we compute the Accuracy again, and so on. At the end we report the Average Accuracy.

❖ Applications of KNN in Machine Learning:
Data preprocessing: Datasets frequently have missing values, but the KNN algorithm can estimate those values in a process known as missing data imputation.
Recommendation Engines: Using clickstream data from websites, the KNN algorithm has been used to provide automatic recommendations to users on additional content. A user is assigned to a particular group, and based on that group's behavior, they are given a recommendation.
Healthcare: KNN has also been applied within the healthcare industry, making predictions on the risk of heart attacks and prostate cancer. The algorithm works by calculating the most likely gene expressions.
Pattern Recognition: KNN has also assisted in identifying patterns, such as in text and digit classification. This has been particularly helpful in identifying handwritten numbers that you might find on forms or mailing envelopes.
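The 5-fold example from the Cross-Validation section (1000 rows, with 200 for Test and 800 for Train in each round) can be sketched in plain Python as follows. This is only an illustrative sketch: `k_fold_indices`, `cross_validate`, and the `score_fn` callback are hypothetical names, no shuffling is done, and letting the last fold absorb leftover rows is an assumption for row counts that do not divide evenly.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs, one per fold, over row indices 0..n_rows-1."""
    fold_size = n_rows // k
    for fold in range(k):
        start = fold * fold_size
        # Assumption: the last fold takes any remainder rows.
        stop = n_rows if fold == k - 1 else start + fold_size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_rows))
        yield train_idx, test_idx


def cross_validate(n_rows, k, score_fn):
    """Average the score returned by score_fn(train_idx, test_idx) over k folds.

    score_fn is a hypothetical callback that would train a model on the
    train rows and return its accuracy (or error) on the test rows.
    """
    scores = [score_fn(train, test) for train, test in k_fold_indices(n_rows, k)]
    return sum(scores) / len(scores)


# 1000 rows split into 5 folds: every round uses 800 Train rows and 200 Test rows.
sizes = [(len(train), len(test)) for train, test in k_fold_indices(1000, 5)]
print(sizes)  # prints: [(800, 200), (800, 200), (800, 200), (800, 200), (800, 200)]
```

In practice the rows are usually shuffled before splitting so that each fold is representative of the whole dataset. Every row lands in a Test set exactly once, which is what makes the averaged score a fairer estimate than a single Train/Test split.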