CSC311S3/306M3: Machine Learning
Prof. A. Ramanan
Department of Computer Science, Faculty of Science, University of Jaffna, Jaffna, Sri Lanka.
Academic Year: 2021/2022

Course Structure

Contents
1. Introduction to machine learning
2. Supervised learning
3. Unsupervised learning
4. Reinforcement learning
5. Introduction to Deep Learning
6. Dimensionality reduction
7. Experimental setup and evaluation

www.csc.jfn.ac.lk/index.php/courses-level-3s

Assessment Strategy

[Slides: example applications of machine learning, including reinforcement learning, Google's GNMT, automatic friend-tagging suggestions, long short-term memory networks, feed-forward neural networks, Google Maps, Google Assistant, Siri, Alexa, Amazon and Netflix recommendations, MLPs, decision trees, Tesla, and the Naïve Bayes classifier; the figures are not reproduced in this transcript.]

Data Mining

Data mining is about solving problems by analyzing data already present in databases. It is defined as the process of discovering patterns in data. The process must be automatic or (more usually) semiautomatic, and the patterns discovered must be meaningful in that they lead to some advantage, usually an economic advantage.

Data Cleaning

The Contact Lenses Data

Rule Set for this Problem

If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no and tear production rate = normal then recommendation = soft
If age = pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft
If age = presbyopic and spectacle prescription = myope and astigmatic = no then recommendation = none
If spectacle prescription = hypermetrope and astigmatic = no and tear production rate = normal then recommendation = soft
If spectacle prescription = myope and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = young and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = pre-presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
If age = presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none

A code sketch of this rule set follows.
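To make the rule set concrete, here is a minimal Python sketch that applies it. The function name recommend_lens, the argument names, and the assumption that rules are tried in order with the first match winning are illustrative choices, not part of the lecture.

```python
def recommend_lens(age, prescription, astigmatic, tear_rate):
    """Apply the contact-lens rule set, trying rules in the order
    listed above; the first rule whose conditions hold fires.
    All arguments take the nominal values used in the dataset."""
    if tear_rate == "reduced":
        return "none"
    if age == "young" and astigmatic == "no" and tear_rate == "normal":
        return "soft"
    if age == "pre-presbyopic" and astigmatic == "no" and tear_rate == "normal":
        return "soft"
    if age == "presbyopic" and prescription == "myope" and astigmatic == "no":
        return "none"
    if prescription == "hypermetrope" and astigmatic == "no" and tear_rate == "normal":
        return "soft"
    if prescription == "myope" and astigmatic == "yes" and tear_rate == "normal":
        return "hard"
    if age == "young" and astigmatic == "yes" and tear_rate == "normal":
        return "hard"
    if age == "pre-presbyopic" and prescription == "hypermetrope" and astigmatic == "yes":
        return "none"
    if age == "presbyopic" and prescription == "hypermetrope" and astigmatic == "yes":
        return "none"
    return "none"  # assumed default when no rule fires

# Example: recommend_lens("young", "myope", "no", "normal") -> "soft"
```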

Decision Tree for this Problem

[Slide: decision tree for the contact-lens data; the figure is not reproduced in this transcript.]

Inferring Rudimentary Rules

1R, for "1-rule", generates a one-level decision tree expressed in the form of a set of rules that all test one particular attribute. 1R is simple and frequently achieves surprisingly high accuracy. The idea is this: we make rules that test a single attribute and branch accordingly. Each branch corresponds to a different value of the attribute. It is obvious what the best classification for each branch is: use the class that occurs most often in the training data. The error rate of the rules can then easily be determined. Just count the errors that occur on the training data, that is, the number of instances that do not have the majority class.

1R

[Slides: one-level rule sets for each attribute of the weather data, with their training error counts; the tables are not reproduced in this transcript.]

1R chooses the attribute that produces rules with the smallest number of errors, that is, the first and third rule sets. Arbitrarily breaking the tie between these two rule sets gives:

Outlook: sunny → no
         overcast → yes
         rainy → yes

1R: Missing values and numeric attributes

1R accommodates both missing values and numeric attributes, dealing with them in simple but effective ways. Missing is treated as just another attribute value so that, for example, if the weather data had contained missing values for the outlook attribute, a rule set formed on outlook would specify four possible class values, one each for sunny, overcast, and rainy, and a fourth for missing. A minimal sketch of 1R, including this treatment of missing values, appears below.
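Here is a minimal Python sketch of 1R under stated assumptions: each instance is a dict mapping attribute names to values, a missing value is simply absent (so dict.get returns None, treated as just another value), and the name one_r is hypothetical.

```python
from collections import Counter

def one_r(instances, attributes, class_attr):
    """1R: for each attribute, build one rule per attribute value
    (predicting the majority class among training instances with
    that value), then keep the attribute whose rule set makes the
    fewest errors on the training data."""
    best = None  # (errors, attribute, rules)
    for attr in attributes:
        classes_by_value = {}
        for inst in instances:
            # A missing value (None) is treated as just another value.
            classes_by_value.setdefault(inst.get(attr), []).append(inst[class_attr])
        rules, errors = {}, 0
        for value, classes in classes_by_value.items():
            majority_class, count = Counter(classes).most_common(1)[0]
            rules[value] = majority_class
            errors += len(classes) - count  # instances outside the majority
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    return best
```

On the weather data this procedure would return the outlook rules shown above (humidity ties, and the tie is broken arbitrarily, here by attribute order).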
We can convert numeric attributes into nominal ones using a simple discretization method. First, sort the training examples according to the values of the numeric attribute; this produces a sequence of class values. Discretization involves partitioning this sequence by placing breakpoints in it. One possibility is to place breakpoints wherever the class changes, which on the temperature attribute of the weather data produces eight categories. [Slide: the sorted temperature values with their class sequence and the resulting breakpoints; not reproduced in this transcript.] A breakpoint cannot separate two examples with the same value, as happens at temperature 72; the simplest fix is to move the breakpoint at 72 up one example, to 73.5, producing a mixed partition in which no is the majority class. A sketch of this class-change discretization follows.
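The class-change partitioning can be sketched as follows. The representation (a list of (value, class) pairs sorted by value) and the exact rule for relocating a breakpoint that would fall between equal values are assumptions made for this illustration.

```python
def class_change_breakpoints(pairs):
    """pairs: (numeric value, class label) tuples sorted by value.
    Place a breakpoint (midpoint of adjacent values) wherever the
    class changes. A breakpoint can never separate two examples with
    the same value, so it is moved up one example instead, yielding
    a mixed partition (as with the move from 72 to 73.5 above)."""
    breakpoints = []
    for i in range(len(pairs) - 1):
        (v1, c1), (v2, c2) = pairs[i], pairs[i + 1]
        if c1 == c2:
            continue                                 # no class change here
        if v1 != v2:
            breakpoints.append((v1 + v2) / 2)        # normal breakpoint
        elif i + 2 < len(pairs) and pairs[i + 2][0] != v2:
            # Equal values on both sides: move the breakpoint up one
            # example (e.g. from between the two 72s to 73.5).
            breakpoints.append((v2 + pairs[i + 2][0]) / 2)
    return breakpoints
```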
1R: Overfitting

The 1R method will naturally gravitate toward choosing an attribute that splits into many categories, because this partitions the dataset into many small pieces, making it more likely that instances will have the same class as the majority in their partition. The limiting case is an attribute that has a different value for each instance, that is, an identification code that pinpoints instances uniquely. Such an attribute yields a zero error rate on the training set, because each partition contains just one instance. Highly branching attributes do not usually perform well on test examples; indeed, the identification code attribute will never predict any examples outside the training set correctly. This phenomenon is known as overfitting.

When discretizing a numeric attribute, a rule is therefore adopted that dictates a minimum number of examples of the majority class in each partition. This leads to a new division in which each partition contains at least three instances of the majority class, except the last one, which will usually have fewer. Partition boundaries always fall between examples of different classes. Whenever adjacent partitions have the same majority class, as do the first two partitions in the new division, they can be merged without affecting the meaning of the rule sets. [Slide: the final discretization of the temperature attribute and the rule set it leads to; not reproduced in this transcript.] A sketch of this minimum-bucket discretization with merging follows.
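A minimal sketch of the minimum-bucket discretization with merging, using the same (value, class) representation as above; the threshold of three majority-class examples comes from the lecture, while the names and tie-breaking behavior are illustrative.

```python
from collections import Counter

def majority(partition):
    """Most frequent class label in a list of (value, class) pairs."""
    return Counter(c for _, c in partition).most_common(1)[0][0]

def discretize_min_bucket(pairs, min_majority=3):
    """pairs: (numeric value, class label) tuples sorted by value.
    Grow each partition until it holds at least min_majority examples
    of its majority class, closing it only where both the class and
    the value change; then merge adjacent partitions that share a
    majority class. The last partition may hold fewer examples."""
    partitions, current = [], []
    for i, (v, c) in enumerate(pairs):
        current.append((v, c))
        top_count = Counter(cls for _, cls in current).most_common(1)[0][1]
        if i + 1 < len(pairs):
            nv, nc = pairs[i + 1]
            if top_count >= min_majority and nc != c and nv != v:
                partitions.append(current)
                current = []
    if current:
        partitions.append(current)
    merged = []
    for p in partitions:
        if merged and majority(merged[-1]) == majority(p):
            merged[-1] = merged[-1] + p  # same majority class: merge
        else:
            merged.append(p)
    return merged
```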