Fundamentals of Machine Learning
Pr. Narjisse Nejjari ([email protected])
November 2023

FUNDAMENTALS OF MACHINE LEARNING: BASIC CONCEPTS

A BRIEF HISTORY OF AI

PRE-1950S: EARLY CONCEPTS AND PHILOSOPHIES
17th-19th century: Philosophers like Leibniz and Boole develop mathematical logic, laying the groundwork for computational thinking.
Science fiction: Tales of mechanical men and artificial beings appear in myths and folklore.
[Timeline graphic, 1950 to present, repeated on each slide of this section.]

1950S: THE BIRTH OF AI
1950 - Turing Test: Alan Turing proposes the Turing Test, setting a foundational goal for AI: creating machines that can mimic human intelligence.
1956 - Dartmouth Conference: John McCarthy coins the term "Artificial Intelligence" at the Dartmouth Conference, considered the official birth of AI as a field. Here, the vision was set to create machines capable of general intelligent action.

1960S-1970S: EARLY ENTHUSIASM AND RESEARCH
Early successes: Researchers develop programs that can solve algebra problems, prove theorems, and speak English.
AI Winter (late 1970s): Initial optimism leads to inflated expectations, followed by reduced funding and interest as early AI fails to meet them.

1980S: AI'S RESURGENCE
Rise of expert systems: AI gains commercial success with the development of expert systems, computer programs that simulate the decision-making ability of a human expert.
Renewed funding and interest: The success of expert systems leads to renewed interest and funding in AI research.

1990S: THE ERA OF MACHINE LEARNING
Shift to machine learning: Focus shifts from hard-coded expert systems to machine learning, where computers develop the ability to learn from data.
Important algorithms: Development of algorithms like backpropagation for neural networks paves the way for more advanced AI.

2000S: BIG DATA AND ADVANCED AI
Big data: The explosion of data on the internet and improvements in computer hardware enable AI to scale to new heights.
Advances in deep learning: Breakthroughs in deep learning, especially in areas like image and speech recognition.

2010S-PRESENT: AI IN EVERYDAY LIFE
Mainstream adoption: AI becomes a part of everyday technology, in smartphones, search engines, and personal assistants.
Significant milestones:
2011: IBM's Watson wins Jeopardy!
2014: Google acquires DeepMind; later, DeepMind's AlphaGo beats the world champion in Go.
2018-present: Advancements in natural language processing, exemplified by systems like OpenAI's GPT series.

THE FUTURE OF AI
Ethical considerations: As AI becomes more integrated into society, ethical considerations and debates about privacy, autonomy, and the role of AI in society intensify.
Continuing advancements: Ongoing research in areas like AI ethics, autonomous vehicles, healthcare, and more promises to further integrate AI into various aspects of human life.

FUNDAMENTALS OF MACHINE LEARNING: THE MACHINE LEARNING PROCESS

THE MACHINE LEARNING PROCESS
TRAINING SET & TEST SET
FEATURE SCALING
[A series of slides on feature scaling follows; the graphics are not captured in the transcript.]

FUNDAMENTALS OF MACHINE LEARNING: DATA PREPROCESSING - LAB N°1

MISSING DATA
The SimpleImputer class in scikit-learn is designed to handle missing data.
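As a concrete sketch of mean imputation (the toy data here is illustrative, not from the lab):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix with one missing entry (np.nan) per column.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])

# Replace each np.nan with the mean of its column.
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X)            # learns the column means: [2.0, 15.0]
X = imputer.transform(X)  # fills in the missing entries

print(X)  # [[1. 10.] [2. 20.] [3. 15.]]
```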
SimpleImputer class parameters:
missing_values (default=np.nan): Specifies which values should be treated as missing. By default, the imputer identifies np.nan values as missing.
strategy (default='mean'): Determines the imputation strategy to use. There are several options:
'mean': Replaces missing values with the mean of the non-missing values in the column.
'median': Replaces missing values with the median of the non-missing values in the column.
'most_frequent': Replaces missing values with the most frequent value in the column.
'constant': Replaces missing values with a constant specified by the fill_value parameter.

SimpleImputer methods:
fit(X): Computes the mean, median, most frequent value, or constant fill value (depending on the chosen strategy) for each feature in the dataset.
transform(X): Applies the imputation strategy, replacing missing values in the input dataset X with the values computed during fitting.

ENCODING THE INDEPENDENT VARIABLE
The ColumnTransformer class from scikit-learn applies specific transformations to different columns of a dataset.
The OneHotEncoder from scikit-learn is a preprocessing tool that converts categorical variables into a binary matrix representation, commonly known as one-hot encoding.

ColumnTransformer allows you to apply different transformers to different columns. In this case, it specifies one transformer named 'encoder' that uses OneHotEncoder and is applied to the columns specified in the list. remainder='passthrough' means that columns not specified in transformers are left unchanged (passed through).

The fit_transform method fits the transformer to the dataset X and transforms it in a single step. np.array(...) converts the result of the transformation into a NumPy array. The result is assigned back to the variable X, replacing the original dataset.

ENCODING THE DEPENDENT VARIABLE
LabelEncoder from scikit-learn encodes the labels (target variable) y into numerical format.

We create an instance of the LabelEncoder class. fit_transform is a method of LabelEncoder: it fits the encoder to the unique values in y and transforms the labels into numerical format. The transformed labels are then assigned back to the variable y.

Do I have to apply scaling before or after splitting?

SPLITTING THE DATASET INTO THE TRAINING SET AND TEST SET
The train_test_split function from scikit-learn splits the dataset into training and testing sets.

X and y are the features and labels, respectively. test_size=0.2 specifies that 20% of the data will be used for testing, and the remaining 80% will be used for training. random_state=1 is used to ensure reproducibility.
Returned values:
X_train: training set features
X_test: testing set features
y_train: training set labels
y_test: testing set labels

FEATURE SCALING
StandardScaler is a class from scikit-learn used to standardize the features in the training and testing sets.

We create an instance sc of the StandardScaler class. fit_transform is used on the training set (X_train): it calculates the mean and standard deviation of each feature in the training set and then standardizes the data. transform is used on the testing set (X_test): it applies the same transformation (the mean and standard deviation learned from the training set).

Do I have to apply scaling to the dummy variables in the dataset?
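A minimal sketch of the independent-variable encoding described above, assuming the categorical column sits at index 0 (the toy country/age data is illustrative, not from the lab):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy dataset: column 0 is categorical, column 1 is numeric.
X = np.array([['France', 44.0],
              ['Spain', 27.0],
              ['France', 30.0]], dtype=object)

# One transformer named 'encoder' applied to column 0;
# remainder='passthrough' leaves the other columns unchanged.
ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), [0])],
    remainder='passthrough')

# fit_transform fits and transforms in one step; np.array(...)
# ensures the result is a plain NumPy array.
X = np.array(ct.fit_transform(X))
print(X)  # two dummy columns (France, Spain) followed by the numeric column
```

Note that the dummy columns come first in the output, ordered by sorted category name, with the passed-through columns appended after them.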
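A sketch of encoding the dependent variable, using hypothetical 'Yes'/'No' labels (LabelEncoder assigns integers in sorted order of the unique values):

```python
from sklearn.preprocessing import LabelEncoder

y = ['No', 'Yes', 'No', 'Yes']

# fit_transform learns the unique labels and maps them to integers
# in one step; the result is assigned back to y.
le = LabelEncoder()
y = le.fit_transform(y)

print(y)            # [0 1 0 1]  ('No' -> 0, 'Yes' -> 1)
print(le.classes_)  # ['No' 'Yes']
```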
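The train/test split described above can be sketched as follows (the data is a toy array; the lab's actual X and y would come from the preprocessing steps):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.arange(10)                 # 10 labels

# test_size=0.2: 20% of rows go to the test set, 80% to training.
# random_state=1 fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```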
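On the two questions the slides leave open, the usual practice is: scale after splitting, fitting the scaler on the training set only (fitting on the full dataset would leak test-set statistics into training), and leave one-hot dummy variables unscaled, since they are already 0/1 and standardizing them destroys their interpretability. A minimal sketch of the scaling step, on synthetic data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 2))
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 2))

sc = StandardScaler()
# fit_transform: learn each feature's mean/std from the training
# set, then standardize the training set with them.
X_train = sc.fit_transform(X_train)
# transform only: reuse the training-set mean/std on the test set,
# so no information from the test set leaks into the model.
X_test = sc.transform(X_test)

# Training features now have zero mean and unit variance.
print(X_train.mean(axis=0), X_train.std(axis=0))
```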