Machine Learning - Output Representation & Decision Trees PDF
Summary
This document provides an overview of different machine-learning models, specifically covering output representation techniques, decision tree concepts, and rule-based systems. It explores various approaches used in machine learning to represent and utilize knowledge.
Full Transcript
Output Knowledge Representation: Table
– E.g. weather data.
– Decision table or regression table.
– Might involve selecting a subset of the attributes.

Output: Linear Model
– E.g. CPU performance.
– For binary classification problems, the line produced by the model separates the two classes (called the decision boundary).
– Line vs hyperplane.

Trees
– Decision trees (contact lens, labor data).
– Nodes involve testing an attribute, usually by comparing it with a constant.
– Leaf nodes give a classification, a group of classifications, or probabilities.
– Nominal vs numeric attributes: a nominal attribute is not usually re-tested later in the tree, but a numeric one may be re-tested.
– Testing a numeric attribute usually results in two branches (< constant, >= constant), but may result in a third branch (for a missing value).

Trees
– For real-valued attributes, exact equality may be rare. May specify an interval, with branches for Above, Within, Below.
– What branch to take for a missing value? Sometimes a missing value is treated as a value by itself and has its own branch. If it is not, it must be treated in a special way, e.g. sent down the most popular branch.

Decision Trees
– Typically compare an attribute value to a constant.
– Some trees compare one attribute to another; some compute a function of several attributes.
– Some trees allow alternative splits (called option nodes), resulting in several leaves being reached with several alternative predictions, which must then be combined (e.g. by majority voting).
– Try the User Classifier within the Weka Explorer to build a decision tree interactively.

Decision Trees for Regression
– For numeric outcomes, each leaf contains a numeric value that is the average of the training instances that reach it. Called a regression tree.
– The tree is much larger and more complex than a linear equation (see next slide), but is usually more accurate (for data that is not really linear).
– However, the tree is more cumbersome and less comprehensible.

Model Tree
– Possible to combine regression equations with regression trees: the leaves contain linear expressions. Called a model tree.
– A continuous function is modeled by linear patches.
– Smaller, more comprehensible, and usually more accurate.

Output: Rules
– Antecedent – consequent.
– Antecedent: usually a conjunction, but could be a general logic expression.
– Consequent: class or probability distribution.
– Classification rules – association rules.

Classification Rules
– Classification rules can be read directly off a decision tree. However, such rule sets are more complex than necessary; they can be pruned to simplify them.
– A tree cannot always easily be obtained from rules: the replicated subtree problem.

Rules
– One reason why rules are so popular is that each new rule seems to represent an independent “nugget” of knowledge. New rules can be added to an existing rule set, whereas adding to a tree may require reshaping the whole tree. However, this independence is an illusion.
– Decision list: rules executed in order.
– Unordered list: what to do in case of conflict?

Unordered Rule Sets
– Multiple classifications apply to a given example: give no classification at all, or give the most popular one (based on the training set). These choices can lead to radically different results.
– No classification applies to a given example: give no classification, or give the most popular class (as a default class).

Simple Example
– Boolean class; only rules leading to one outcome are expressed (a form of the closed world assumption).
– Nice because conflicts are not possible.

Rules
– Unordered rule sets are problematic if no conflict resolution strategy is given.
– Ordered rule sets sacrifice modularity (each rule is no longer an independent nugget).
– Most algorithms produce ordered rule sets (sacrificing modularity).
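To make the decision-list behaviour above concrete, here is a minimal Python sketch: rules are tried strictly in order, the first matching antecedent fires, and a default class is returned when nothing matches. The attribute names, thresholds, and classes are invented for illustration, not taken from the slides.

```python
# Minimal decision-list sketch: ordered rules, first match wins, default class otherwise.
# The rules and attribute values below are made up for illustration.

rules = [
    (lambda x: x["outlook"] == "overcast",             "play"),
    (lambda x: x["humidity"] > 85,                     "don't play"),
    (lambda x: x["outlook"] == "sunny" and x["windy"], "don't play"),
]
default_class = "play"   # usually the most popular class in the training set

def classify(instance):
    for antecedent, consequent in rules:
        if antecedent(instance):        # rules are tested strictly in order
            return consequent
    return default_class                # no rule fires: fall back to the default

print(classify({"outlook": "sunny", "humidity": 90, "windy": False}))  # don't play
```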
Association Rules
– Like classification rules, but can predict any attribute. Not intended to be used as a rule set.
– Support and confidence (a.k.a. coverage and accuracy); see the sketch after this section.

Association Rules
– A rule with multiple consequences is not simply shorthand for several single-consequence rules.
– In general, we exclude any rule that is implied by another rule.

Rules with Exceptions
– For classification rules, we may allow a rule to have an exception clause, to facilitate incremental modifications to a rule set.
– Suppose we encounter a new plant that an expert says is Iris setosa.

Exceptions
– Fixing a rule set is not as easy as it seems. Changing the bounds of attribute values in existing rules can result in “old” examples being misclassified.
– An expert may be consulted to explain why the new flower violates the rules, giving explanations that can be used to extend only the relevant rules.

Exceptions
– Instead of changing the bounds of attributes already tested in the rule, an exception can be made on some other attribute.
– We can have exceptions to exceptions to exceptions, resulting in a rule that looks more like a tree. This can also allow an entire concept description to consist of one rule with nested exceptions.

Exceptions
– This rule classifies the iris data perfectly. The default is usually the most popular class.

Exceptions
– People often think of real problems in terms of rules, exceptions, and exceptions to exceptions, so this is often a good way to express a complex rule set.

More Expressive Rules
– Consider the example on the next slide.

Rule Expressiveness
– Most ML schemes do not consider relations between attributes because there is a considerable cost to doing so.
– Secondary attributes (e.g. “is width > height” or “width minus height”) can be used instead.

Instance-Based Representation
– Lazy learning versus eager learning.
– Nearest-neighbor (parameter k).
– Distance metric: Euclidean? Need to normalize; nominal attributes; attribute weighting. A nearest-neighbor sketch appears after this section.

Instance Selection
– It may not be necessary or desirable to store all training instances (storing everything makes classification slow).
– Some regions of attribute space are more stable than others with regard to class: fewer exemplars are needed inside stable regions, more may be needed near class boundaries.

IBL
– A drawback of instance-based representations is that they do not make explicit the structures that are learned (they do not really “describe” the data).
– However, the instances combine with the distance metric to carve out boundaries in instance space that distinguish one class from another; this can be considered an explicit representation of knowledge.

Rectangular Regions
– Some instance-based representations go further and explicitly generalize the instances, typically by creating rectangular regions that enclose examples of one class.

Rectangular Regions
– Similar to rules with conditions that test a numeric attribute against an upper and lower bound.
– Such rules are generally more conservative than those produced by rule-based methods, because any instance not covered by a rule defaults to nearest-neighbor classification.
– Possible to allow rectangles to nest (analogous to rules with exceptions).
– Rectangular generalizations don't extend well to nominal attributes.

Prototypes
– Nearest-prototypes (instead of nearest-neighbors).

Cluster
– When clusters rather than a classifier are learned, the output is a diagram that shows how the instances fall into clusters.

Multiple Membership
– Some methods allow an instance to belong to more than one cluster: Venn diagram.

Probabilities or Fuzzy Memberships
– Some algorithms associate instances with clusters with a probability or degree of membership.
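The association-rule slides above name support (coverage) and confidence (accuracy). A minimal sketch of computing them for a single rule follows; the toy dataset and the rule itself are invented for illustration.

```python
# Support and confidence of an association rule "antecedent => consequent",
# computed over a toy weather-style dataset (values invented for illustration).

data = [
    {"outlook": "sunny",    "windy": False, "play": "no"},
    {"outlook": "sunny",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "rainy",    "windy": False, "play": "yes"},
    {"outlook": "rainy",    "windy": True,  "play": "no"},
]

antecedent = lambda x: x["outlook"] == "sunny"
consequent = lambda x: x["play"] == "no"

n_antecedent = sum(antecedent(x) for x in data)
n_both = sum(antecedent(x) and consequent(x) for x in data)

support = n_both / len(data)        # coverage: fraction of instances the rule predicts correctly
confidence = n_both / n_antecedent  # accuracy: how often the rule is right when it applies

print(f"support = {support:.2f}, confidence = {confidence:.2f}")  # support = 0.40, confidence = 1.00
```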
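The instance-based slides above also note that attributes usually need normalizing before a Euclidean distance is meaningful. A minimal k-nearest-neighbor sketch under that assumption follows; the training data, query, and value of k are invented for illustration.

```python
# Minimal k-nearest-neighbor sketch with min-max normalization of numeric attributes.
# Training data, query, and k are invented for illustration.
from collections import Counter
import math

train_X = [[2.0, 150.0], [1.5, 300.0], [3.0, 120.0], [2.8, 500.0]]
train_y = ["a", "b", "a", "b"]
k = 3

# Min-max normalize each attribute so large-scale attributes do not dominate the distance.
mins = [min(col) for col in zip(*train_X)]
maxs = [max(col) for col in zip(*train_X)]

def normalize(x):
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(x, mins, maxs)]

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn_classify(query):
    q = normalize(query)
    dists = sorted((euclidean(q, normalize(x)), y) for x, y in zip(train_X, train_y))
    votes = Counter(y for _, y in dists[:k])   # majority vote among the k nearest instances
    return votes.most_common(1)[0][0]

print(knn_classify([2.5, 200.0]))
```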
Hierarchical Clusters
– Some algorithms produce a hierarchical structure: a dendrogram.

Clustering
– Clustering is often followed by a stage in which a decision tree or rule set is inferred that allocates each instance to its cluster; see the sketch below.
– In this situation, the clustering operation becomes just one step towards a structural description.
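As a rough illustration of that two-stage idea, the sketch below clusters a toy dataset and then fits a decision tree that allocates instances to their clusters. It assumes scikit-learn is available; the data is invented for illustration.

```python
# Sketch of the two-stage idea: cluster the data, then fit a decision tree
# that allocates each instance to its cluster, giving a structural description.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],    # one group of instances
              [5.0, 5.2], [5.3, 4.8], [4.9, 5.1]])   # another group

cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

tree = DecisionTreeClassifier(max_depth=2).fit(X, cluster_ids)
print(export_text(tree, feature_names=["attr1", "attr2"]))  # rules describing the clusters
```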