Decision Trees and ID3 Algorithm


Questions and Answers

Which of the following best describes the primary focus of machine learning as highlighted in the text?

  • Acquiring structured knowledge to build high-performance systems. (correct)
  • Developing algorithms for balancing poles and solving mathematical problems exclusively.
  • Adjusting internal parameters of adaptive systems exclusively.
  • Building expert systems through direct knowledge elicitation.

What challenge does Feigenbaum's 'bottleneck' problem address?

  • The slow pace of knowledge acquisition via interviews for expert systems. (correct)
  • The limitations of machine learning methods in handling complex data.
  • The inadequacy of current algorithms to improve themselves without external input.
  • The difficulty of adaptive systems in monitoring their own performance.

Which of the following is NOT a dimension along which machine learning systems can be classified, according to Carbonell, Michalski, and Mitchell?

  • The application domain of the system.
  • The computational efficiency of the system. (correct)
  • The learning strategies used.
  • The representation of acquired knowledge.

What is a key characteristic of the application domain for the family of learning systems discussed?

  • General-purpose, applicable to a wide range of classification tasks. (correct)

In the context of decision tree induction, what does it mean for attributes to be considered 'inadequate'?

  • The attributes do not allow differentiation between objects belonging to different classes. (correct)

What is the primary goal of induction in the context of decision trees?

  • Constructing a decision tree that accurately classifies unseen objects, not just the training set. (correct)

How does ID3 address the challenge of forming a decision tree for an arbitrary collection of objects?

  • By iteratively forming a tree from a random subset of the training data, expanding the subset as needed. (correct)

What is the purpose of the 'window' in the ID3 algorithm?

  • To provide a subset of the training set from which to initially form the tree. (correct)

How does ID3 typically handle an attribute value for which there are no corresponding objects in the training set?

  • It generalizes from the set and assigns the leaf the more frequent class from the original set. (correct)

What is 'predictive accuracy' in the context of decision tree learning?

  • The accuracy with which the decision tree classifies objects other than those in the training set. (correct)

What is the computational complexity of the ID3 procedure at each node of the decision tree?

  • O(|C| * |A|), where |C| is the number of objects and |A| is the number of attributes. (correct)

What is 'noise' in the context of the training set?

  • Non-systematic errors in the values of attributes or class information. (correct)

What are the two modifications required for a tree-building algorithm to operate with a 'noise-affected' training set?

  • The algorithm must be able to work with inadequate attributes, and it must be able to decide that testing further attributes won't improve predictive accuracy. (correct)

How can the chi-square test be used in decision tree induction to handle noise?

  • By preventing the testing of any attribute whose irrelevance cannot be rejected with a high confidence level. (correct)

According to the content, which approach minimizes the sum of absolute errors over objects in a collection C when generating a decision tree for a noisy dataset?

  • Assigning the leaf to the more numerous class. (correct)

What phenomenon is observed when a correct decision tree formed from uncorrupted data performs worse on corrupted data than an imperfect tree formed from similarly corrupted data?

  • Overfitting (correct)

How does ASSISTANT use a Bayesian formalism to address unknown attribute values?

  • By determining the probability that the object has a particular value based on the distribution of values in its class. (correct)

What is the 'token' value used for when classifying with a decision tree?

  • A fractional weight representing the probability of taking each subsequent path. (correct)

According to the content, what is the problem with simply treating unknown as a separate value?

  • It increases the desirability of an attribute, which runs counter to common sense. (correct)

In handling unknown attribute values during information gain assessment, how are object values distributed?

  • In proportion to the relative frequency of these values in C. (correct)

What is Catlett's approach to dealing with partial knowledge of a value?

  • Stating a value in Shafer notation. (correct)

Why does the gain criterion give a preference to attributes with many values?

  • Excessive fineness of partition, to the point that the partition is less useful. (correct)

An attribute with v values has 2^(v-1) - 1 nontrivial ways of partitioning its values into two subsets. Given that this computation rises steeply with v, what did the text say about practicality?

  • It is infeasible for attributes with around 20 values. (correct)

What aspect does IV(A) measure?

  • The amount of information in an answer. (correct)

According to the content, which produces smaller decision trees, subset or gain ratio criteria?

  • Subset criteria. (correct)

According to Hart, how may the hypothesis that an attribute A's value is irrelevant to an object's class be addressed?

  • Find the highest confidence level. (correct)

When developing algorithms for noisy, incomplete, real-world data, what is something this work aims to do?

  • Improve performance. (correct)

What obstacle exists to induction due to generated decision trees?

  • Lack of familiarity. (correct)

What does structured induction tackle in terms of programming style?

  • Using structure, like structured programming. (correct)

In order to develop rules, consider the example: 'monkey, elephant, house, giraffe'. Which approach gave more accurate classification of objects not used in the training set?

  • A tree to discriminate. (correct)

Flashcards

Machine Learning

A central research area in AI since the 1950s, focusing on the ability of machines to learn and improve from experience.

Learning as a Hallmark

The ability to acquire knowledge is a defining characteristic of intelligence, making learning essential for understanding intelligence.

Learning Methodology

In machine learning, this provides the means to develop systems that can achieve high performance through the acquisition of knowledge.

Adaptive Systems

Systems that observe their own performance and modify internal settings to enhance their efficacy.


Learning in the form of knowledge

The acquisition of new knowledge structured as concepts, discrimination nets, or production rules.


Knowledge Acquisition Bottleneck

The bottleneck in expert systems knowledge acquisition, where knowledge is manually extracted from experts at a slow pace.


Machine Learning as a Solution

Machine learning methods can serve as a means of explicating knowledge, addressing the knowledge acquisition bottleneck.


Microcosm of machine learning

A specific area of machine learning focused on systems used for building knowledge-based systems of a simple kind.


TDIDT Family

A family of machine learning systems that construct decision trees from examples.


Dimensions of Machine Learning

The learning strategies used, how knowledge is represented, and the application domain.


Application Domain

The area to which the learning system is applied, like chemistry or chess.


Classification

The learning activity of assigning an object to one of several disjoint classes.


Decision Tree Systems

Systems that develop decision trees for classifying data, beginning at the root and proceeding to the leaves.


TDIDT Name Emphasis

Systems that carry out Top-Down Induction of Decision Trees.


Classification Rules

Rules developed from example objects described by properties or attributes.


Existing Database

A database providing a history of observations for machine learning algorithms.


"Underlying Strategy"

A non-incremental learning approach that uses a set of cases to develop a decision tree from the top down.


Classification Task Examples

A medical condition diagnosis based on symptoms, where classes are disease states or therapies; helps determine game-theoretic chess values; determines thunderstorm likelihood from atmospheric observations.


Knowledge Representation

Representing acquired knowledge as decision trees, a relatively simple knowledge formalism.


Zeroth-order characterization

Attributes of the objects in the universe that provide a language for characterizing these objects. For example: outlook: overcast; temperature: cool; humidity: normal; windy: false.


Induction Task Basis

The universe of objects described in terms of a collection of attributes.


Mutually Exclusive Classes

Each object belongs to one of a set of mutually exclusive classes denoted P or N.


Positive and Negative Instances

Objects of class P and N are often referred to as positive instances and negative instances of the concept being learned.


Induction task

To develop a classification rule that can determine the class of any object from its values of the attributes.


Inadequate Attributes

A training set containing two objects that have identical values for each attribute but belong to different classes.


Study Notes

  • This paper summarizes an approach to synthesizing decision trees and describes ID3
  • Recent studies show the methodology can be modified to deal with noisy and/or incomplete information
  • Discusses a reported shortcoming of the basic algorithm and compares two means of overcoming it

Introduction

  • Machine learning has been a central research area since artificial intelligence first achieved recognition as a discipline in the mid-1950s
  • Ability to learn is a hallmark of intelligent behavior, so any attempt to understand intelligence as a phenomenon must include an understanding of learning
  • Learning provides a potential methodology for building high-performance systems
  • Research on learning is made up of diverse subfields, ranging from adaptive systems adjusting internal parameters to learning as the acquisition of structured knowledge in the form of concepts

Knowledge-Based Expert Systems

  • Systems are powered by knowledge explicitly represented
  • Knowledge needed to drive the pioneering expert systems was codified through protracted interaction between a domain specialist and a knowledge engineer
  • Typical rate of knowledge elicitation by this method is a few rules per man-day, while an expert system for a complex task may require hundreds or thousands of such rules
  • Interview approach to knowledge acquisition cannot keep pace with the burgeoning demand for expert systems; Feigenbaum (1981) terms this the 'bottleneck' problem, which has stimulated the investigation of machine learning methods as a means of explicating knowledge (Michie, 1983).

Learning Systems

  • The paper focuses on one microcosm of machine learning and on a family of learning systems used to build knowledge-based systems of a simple kind
  • Section 2 outlines the features of this family and introduces its members
  • These systems address the same task of inducing decision trees from examples
  • One system (ID3) is described in detail in Section 4
  • Sections 5 and 6 present extensions to ID3 that enable it to cope with noisy and incomplete information
  • A review of a central facet of the induction algorithm reveals possible improvements that are set out in Section 7

The TDIDT Family of Learning Systems

  • Carbonell, Michalski and Mitchell (1983) determine machine learning systems can be classified by:
  • The underlying learning strategies used
  • The representation of knowledge acquired by the system
  • The application domain of the system
  • TDIDT family concerned with a family of learning systems that have strong common bonds in all dimensions
  • Application domain of these systems is not limited to any particular area of intellectual activity such as Chemistry or Chess
  • They can be applied to any such area and are general-purpose systems
  • Applications that they address all involve classification
  • Product of learning can assign a hitherto-unseen object to one of a specified number of disjoint classes
  • Diagnosis of a medical condition from symptoms, in which the classes could be either the various disease states or the possible therapies
  • Determining the game-theoretic value of a chess position, with the classes won for white, lost for white, and drawn
  • Deciding from atmospheric observations whether a severe thunderstorm is unlikely, possible or probable
  • Classification tasks are only a minuscule subset of procedural tasks, but even activities such as robot planning can be recast as classification problems (Dechter and Michie, 1985)
  • Members of this family are sharply characterized by their representation of acquired knowledge as decision trees
  • This is a relatively simple knowledge formalism that lacks the expressive power of semantic networks or other first-order representations
  • Learning methodologies used in the TDIDT family are considerably less complex than those employed in systems that can express the results of their learning in a more powerful language
  • It is still possible to generate knowledge in the form of decision trees that is capable of solving difficult problems of practical significance
  • Underlying strategy is non-incremental learning from examples
  • Systems are presented with a set of cases relevant to a classification task and develop a decision tree from the top down, guided by frequency information in the examples but not by the particular order in which they are given
  • The systems described here search for patterns in the given examples and so must be able to examine and re-examine all of them at many stages during learning

Decision Trees

  • Systems described develop decision trees for classification tasks
  • Trees are constructed beginning with the root of the tree and proceeding down to its leaves
  • Palindromic name emphasizes that its members carry out the Top-Down Induction of Decision Trees
  • Example objects from which a classification rule is developed are known only through their values of a set of properties or attributes
  • Decision trees in turn are expressed in terms of these same attributes
  • Examples can be assembled in two ways
  • From an existing database that forms a history of observations, such as patient records in some area of medicine that have accumulated at a diagnosis center
  • As a carefully culled set of tutorial examples prepared by a domain expert, each with some particular relevance to a complete and correct classification rule
  • Expert might take pains to avoid redundancy and to include examples of rare cases

TDIDT Systems

  • Earlier TDIDT systems were designed with the 'historical record' approach in mind, but all systems described here are now often used with tutorial sets (Michie, 1985)
  • The patriarch of this family is Hunt's Concept Learning System framework (Hunt, Marin and Stone, 1966). CLS constructs a decision tree that attempts to minimize the cost of classifying an object
  • Cost has components of two types
  • Measurement cost of determining the value of property A exhibited by the object
  • Misclassification cost of deciding that the object belongs to class J when its real class is K
  • CLS uses a lookahead strategy similar to minimax. At each stage, CLS explores the space of possible decision trees to a fixed depth, chooses an action to minimize cost in this limited space, then moves one level down in the tree
  • Depending on the depth of lookahead chosen, CLS can require a substantial amount of computation, but has been able to unearth subtle patterns in the objects shown to it
  • ID3 (Quinlan, 1979, 1983a) is one of a series of programs developed from CLS in response to a challenging induction task posed by Donald Michie, viz. to decide from pattern-based features alone whether a particular chess position in the King-Rook vs King-Knight endgame is lost for the Knight's side in a fixed number of ply
  • ID3 embeds a tree-building method in an iterative outer shell, and replaces the cost-driven lookahead of CLS with an information-driven evaluation function
  • ACLS (Paterson and Niblett, 1983) is a generalization of ID3; CLS and ID3 both require that each property used to describe objects take values from a specified set
  • ACLS permits properties that have unrestricted integer values, allowing it to be applied to difficult tasks such as image recognition (Shepherd, 1983)
  • ASSISTANT (Kononenko, Bratko and Roskar, 1984) also acknowledges ID3 as its direct ancestor
  • It differs from ID3 in many ways, some of which are discussed in detail in later sections.
  • ASSISTANT further generalizes on the integer-valued attributes of ACLS by permitting attributes with continuous (real) values
  • Rather than insisting that the classes be disjoint, ASSISTANT allows them to form a hierarchy, so that one class may be a finer division of another
  • ASSISTANT does not form a decision tree iteratively in the manner of ID3, but it does include algorithms for choosing a 'good' training set from the objects available; it has been used in several medical domains with promising results
  • Bottom-most three systems in the figure are commercial derivatives of ACLS
  • They incorporate many user-friendly innovations and utilities that expedite the task of generating and using decision trees
  • They have industrial successes to their credit: Westinghouse Electric's Water Reactor Division, for example, points to a fuel-enrichment application in which the company was able to boost revenue by 'more than ten million dollars per annum' through the use of one of them

Induction Task

  • Basic universe of objects that are described in terms of a collection of attributes
  • Each attribute measures some important feature of an object and will be limited here to taking a (usually small) set of discrete, mutually exclusive values
  • Attributes provide a zeroth-order language for characterizing objects in the universe
  • Each object in the universe belongs to one of a set of mutually exclusive classes
  • Here there are only two such classes, denoted P and N, although the extension to any number of classes is not difficult
  • In two-class induction tasks, objects of class P and N are sometimes referred to as positive instances and negative instances, respectively, of the concept being learned
  • Task is to develop a classification rule that can determine the class of any object from its values of the attributes. The immediate question is whether or not the attributes provide sufficient information to do this
  • If the training set contains two objects that have identical values for each attribute and yet belong to different classes, it is clearly impossible to differentiate between them with reference only to the given attributes; the attributes are then termed inadequate for the training set and hence for the induction task
  • A classification rule will be expressed as a decision tree
  • Leaves of a decision tree are class names, other nodes represent attribute-based tests with a branch for each possible outcome
  • In order to classify an object, we start at the root of the tree, evaluate the test, and take the branch appropriate to the outcome until a leaf is encountered, at which time the object is asserted to belong to the class named by the leaf
  • If the attributes are adequate, it is always possible to construct a decision tree that correctly classifies each object in the training set, and usually there are many such correct decision trees
  • The essence of induction is to construct a decision tree that correctly classifies not only objects from the training set but other (unseen) objects as well
  • The decision tree must capture some meaningful relationship between an object's class and its values of the attributes
  • Simpler tree is more likely to capture structure inherent in the problem
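The root-to-leaf classification walk described above can be sketched as follows; the nested-dict tree representation and the weather-style attribute names are assumptions for illustration, not structures from the original systems:

```python
# A minimal sketch (not the paper's own code) of classifying an object with
# a decision tree. A leaf is a class name string; an internal node tests one
# attribute and has a branch for each possible value.

def classify(tree, obj):
    """Start at the root, evaluate the test, and take the branch matching
    the object's value until a leaf names the class."""
    while isinstance(tree, dict):
        tree = tree["branches"][obj[tree["attribute"]]]
    return tree

# Weather attributes echo the example attributes mentioned in the text;
# the structure of this particular tree is illustrative.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "N", "normal": "P"}},
        "overcast": "P",
        "rain": {"attribute": "windy",
                 "branches": {"true": "N", "false": "P"}},
    },
}

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # P
```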

ID3

  • One approach to the induction task above is to generate all possible decision trees that correctly classify the training set and to select the simplest of them
  • Presented here as a common-sense application of Occam's Razor, this preference for simpler trees is also supported by analysis (Pearl, 1978b; Quinlan, 1983a)
  • The number of such trees is finite but very large, so this approach would only be feasible for small induction tasks
  • ID3 was designed for the other end of the spectrum, where there are many attributes and the training set contains many objects, but where a reasonably good decision tree is required without much computation
  • ID3 has generally been found to construct simple decision trees, but the approach it uses cannot guarantee that better trees have not been overlooked
  • Basic structure of ID3 is iterative
  • A subset of the training set called the window is chosen at random and a decision tree formed from it
  • This tree correctly classifies all objects in the window
  • All other objects in the training set are then classified using the tree
  • If the tree correctly classifies all objects the process terminates
  • If not, a selection of the incorrectly classified objects is added to the window and the process continues
  • Decision trees have been found after only a few iterations for training sets of up to thirty thousand objects described in terms of up to 50 attributes
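The iterative window procedure above might be sketched like this; the `build_tree` and `classify` interfaces and the way misclassified objects are added back are assumptions for illustration, and, per O'Keefe's observation, termination is only assured if the window can grow to include the entire training set:

```python
import random

def id3_iterative(training_set, build_tree, classify, window_size, seed=0):
    """Sketch of ID3's iterative outer shell (assumed interfaces):
    build_tree(window) returns a tree that correctly classifies the window;
    classify(tree, obj) returns a class name."""
    rng = random.Random(seed)
    # Choose a random subset of the training set as the initial window.
    window = rng.sample(training_set, min(window_size, len(training_set)))
    while True:
        tree = build_tree(window)              # tree correct on the window
        wrong = [obj for obj in training_set
                 if classify(tree, obj) != obj["class"]]
        if not wrong:                          # all objects classified: done
            return tree
        window.extend(wrong[:window_size])     # add a selection and repeat
```

With a training set of a single class and trivial stub procedures, the loop terminates on the first iteration; in practice only a few iterations were needed even for large training sets.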

Issues with the Iterative Framework

  • Empirical evidence suggests that a correct decision tree is usually found more quickly by this iterative method than by forming a tree directly from the entire training set
  • O'Keefe (1983) has noted that the iterative framework cannot be guaranteed to converge on a final tree unless the window can grow to include the entire training set. This potential limitation has not yet arisen in practice
  • How to form a decision tree for an arbitrary collection C of objects
  • If C is empty or contains only objects of one class, the simplest decision tree is just a leaf labelled with the class
  • Otherwise, let T be any test on an object with possible outcomes O1, O2, ..., Ow
  • Each object in C will give one of these outcomes for T, so T produces a partition {C1, C2, ..., Cw} of C, with Ci containing those objects having outcome Oi
  • If each subset Ci in the figure could be replaced by a decision tree for Ci, the result would be a decision tree for all of C
  • Provided that a test can always be found that gives a non-trivial partition of any set of objects, this procedure will always produce a decision tree that correctly classifies each object in C
    • The choice of test is crucial if the decision tree is to be simple
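The partitioning step above can be sketched as a small helper; keying the subsets by test outcome is an assumed representation:

```python
def partition(objects, test):
    """Partition a collection C under a test T: each object lands in the
    subset Ci for its outcome Oi, giving {C1, ..., Cw} keyed by outcome."""
    parts = {}
    for obj in objects:
        parts.setdefault(test(obj), []).append(obj)
    return parts

print(partition([1, 2, 3, 4], lambda x: x % 2))  # {1: [1, 3], 0: [2, 4]}
```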

Tests to Branch

  • A test will be restricted to branching on the values of an attribute, so choosing a test comes down to selecting an attribute for the root of the tree
  • The first induction programs in the ID series used a working evaluation function
  • Following a suggestion of Peter Gacs, ID3 adopted an information-based method that depends on two assumptions. Let C contain p objects of class P and n of class N:
  • Any correct decision tree for C will classify objects in the same proportion as their representation in C
  • An arbitrary object will be determined to belong to class P with probability p/(p + n) and to class N with probability n/(p + n)
  • When a decision tree is used to classify an object, it returns a class
  • A decision tree can thus be regarded as a source of a message 'P' or 'N', with the expected information needed to generate this message given by I(p, n) = -(p/(p + n)) log2(p/(p + n)) - (n/(p + n)) log2(n/(p + n))
  • If attribute A with values {A1, A2, ..., Av} is used for the root of the decision tree, it will partition C into {C1, C2, ..., Cv}, where Ci contains those objects in C that have value Ai of A
  • The expected information required for the subtree for Ci is I(pi, ni), so the expected information required for the tree with A as root is the weighted average E(A) = sum over i of ((pi + ni)/(p + n)) I(pi, ni)
  • The information gained by branching on A is therefore gain(A) = I(p, n) - E(A)
  • ID3 examines all candidate attributes, chooses A to maximize gain(A), forms the tree as above, and then uses the same process recursively to form decision trees for the residual subsets C1, C2, ..., Cv
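The information measures above can be written out directly; the 9-positive/5-negative counts and the three-way split used in the demonstration below are assumed illustrative figures in the spirit of the weather example:

```python
from math import log2

def info(p, n):
    """I(p, n): expected information of the message 'P' or 'N' for a set
    with p positive and n negative instances."""
    total = p + n
    bits = 0.0
    for count in (p, n):
        if count:
            f = count / total
            bits -= f * log2(f)
    return bits

def gain(p, n, subsets):
    """gain(A) = I(p, n) - E(A), where E(A) is the weighted average of
    I(pi, ni) over the subsets [(p1, n1), ..., (pv, nv)] induced by A."""
    total = p + n
    expected = sum((pi + ni) / total * info(pi, ni) for pi, ni in subsets)
    return info(p, n) - expected

# Illustrative counts: 14 objects (9 of P, 5 of N) split three ways.
print(round(info(9, 5), 3))                            # 0.94
print(round(gain(9, 5, [(2, 3), (4, 0), (3, 2)]), 3))  # 0.247
```

ID3 would evaluate `gain` for every candidate attribute and branch on the one with the largest value.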

Attribute Results

  • Tree-forming method used in ID3 would choose outlook as the attribute for the root of the decision tree. The objects would then be divided into subsets according to their values of the outlook attribute. In a similar fashion for the other attributes, a decision tree for each subset would be induced.
  • A special case arises if C contains no objects with some particular value Aj of A, giving an empty Cj
  • ID3 labels such a leaf as 'null' so that it fails to classify any object arriving at that leaf
  • A better solution would generalize from the set C from which Cj came, and assign this leaf the more frequent class in C
  • The worth of ID3's attribute-selecting heuristic can be assessed in two ways:
  • By the simplicity of the resulting decision trees, or
  • By the accuracy with which they classify objects other than those in the training set (their predictive accuracy). A straightforward method of assessing this predictive accuracy is to use only part of the given set of objects as a training set, and to check the resulting decision tree on the remainder
  • Discussion of ID3 is rounded off by looking at the computational requirements of the procedure
  • At each non-leaf node of the decision tree, the gain of each untested attribute A must be determined
  • Gain depends on the values pi and ni for each value Ai of A, so every object in C must be examined to determine its class and its value of A
  • Consequently, the computational complexity of the procedure at each such node is O(|C|•|A|), where |A| is the number of attributes
  • ID3's total computational requirement per iteration is thus proportional to the product of the size of the training set, the number of attributes and the number of non-leaf nodes in the decision tree
  • Same relationship appears to extend to the entire induction process, even when several iterations are performed
  • No exponential growth in time or space has been observed as the dimensions of the induction task increase, so the technique can be applied to large tasks
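The holdout assessment of predictive accuracy described above might look like this in outline; `build_tree` and `classify` are assumed interfaces, and the 70/30 split is an arbitrary choice:

```python
import random

def holdout_accuracy(objects, build_tree, classify, train_fraction=0.7, seed=0):
    """Estimate predictive accuracy: induce a tree from part of the given
    objects and check it on the remainder (assumed interfaces)."""
    rng = random.Random(seed)
    pool = objects[:]
    rng.shuffle(pool)
    cut = int(len(pool) * train_fraction)
    train, held_out = pool[:cut], pool[cut:]
    tree = build_tree(train)
    hits = sum(classify(tree, obj) == obj["class"] for obj in held_out)
    return hits / len(held_out)
```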

Noise

  • The information supplied in the training set has been assumed to be entirely accurate
  • Sadly, induction tasks based on real-world data are unlikely to find this assumption to be tenable
  • Description of objects may include attributes based on measurements or subjective judgements, both of which may give rise to errors in the values of attributes
  • Some of the objects in the training set may even have been misclassified
  • Errors in the training set may cause the attributes to become inadequate, or may lead to decision trees of spurious complexity
  • Non-systematic errors of this kind in either the values of attributes or class information are usually referred to as noise
  • Two modifications are required if the tree-building algorithm is to be able to operate with a noise-affected training set
  • Algorithm must be able to work with inadequate attributes, because noise can cause even the most comprehensive set of attributes to appear inadequate
  • The algorithm must be able to decide that testing further attributes will not improve the predictive accuracy of the decision tree. In the last example above, it should refrain from increasing the complexity of the decision tree to accommodate a single noise-generated special case

How to Handle Noise

  • One approach to deciding whether an attribute is genuinely relevant to classification is to require that the information gain of any tested attribute exceed some absolute or percentage threshold
  • Experiments with this approach suggest that a threshold large enough to screen out irrelevant attributes also excludes attributes that are relevant, and the performance of the tree-building procedure is degraded in the noise-free case
  • Alternative method based on the chi-square test for stochastic independence has been found to be more useful
  • To deal with leaves containing objects of different classes, the notion of class could be generalized to a value in the interval (0, 1), with p/(p + n) as the class value; a class of 0.8 would be interpreted as 'belonging to class P with probability 0.8'
  • Alternative approach would be to opt for the more numerous class, i.e. to assign the leaf to class P if p > n, to class N if p < n, and to either if p = n.
  • Studies have been carried out to see how this modified procedure holds up under varying levels of noise.
  • ASSISTANT uses an information-based measure to perform much the same function, but no comparative results are available to date
  • It might seem that corrupting an attribute completely should mean replacing each value by an incorrect value
  • If the value of each object were replaced by the (only) incorrect value, however, the original attribute would merely have been inverted, with no loss of information
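A chi-square screen of the kind described might be sketched as follows; the statistic compares observed per-branch class counts with those expected if the attribute were irrelevant, and in use it would be compared against a chi-square critical value with v - 1 degrees of freedom at the chosen confidence level (the helper is an illustration, not the paper's code):

```python
def chi_square(p, n, subsets):
    """Chi-square statistic for attribute relevance: compare the observed
    class counts (pi, ni) in each branch with those expected under
    irrelevance, i.e. pi' = p * (pi + ni) / (p + n)."""
    total = p + n
    stat = 0.0
    for pi, ni in subsets:
        size = pi + ni
        for observed, overall in ((pi, p), (ni, n)):
            expected = overall * size / total
            if expected:
                stat += (observed - expected) ** 2 / expected
    return stat

# An even split of an 8-object set gives statistic 0 (looks irrelevant);
# a clean split gives a large value.
print(chi_square(4, 4, [(2, 2), (2, 2)]))  # 0.0
print(chi_square(4, 4, [(4, 0), (0, 4)]))  # 8.0
```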

Attribute Information

  • With attribute values replaced by incorrect values some proportion of the time, noise in class information resulted in a 3% degradation; comparable figures have been obtained for other induction tasks
  • One interesting point emerged from other experiments in which a correct decision tree formed from an uncorrupted training set was used to classify objects whose descriptions were corrupted
  • This scenario corresponds to forming a classification rule under controlled and sanitized laboratory conditions, then using it to classify objects in the field
  • For higher noise levels, the performance of the correct decision tree on corrupted data was found to be inferior to that of an imperfect decision tree formed from data corrupted to a similar level
  • It therefore does not pay to eliminate noise from the attribute information in the training set if these same attributes will be subject to high noise levels when the induced decision tree is put to use

Unknown Attribute Values

  • Having seen how the tree-building process can be made to deal with noisy or corrupted values, we turn to an allied problem that also arises in practice: unknown attribute values
  • One way around the problem attempts to fill in an unknown value using information provided by context
  • ASSISTANT uses a Bayesian formalism to determine the probability that the object has value Ai for attribute A by examining the distribution of values of A among objects of the same class
  • For an object of class P, this probability can be estimated as prob(A = Ai | class = P) = pi / p, where pi is the number of class-P objects with value Ai and p is the number of class-P objects whose value of A is known
  • Once the distribution of values of A has been determined, one could either choose the most likely value or divide the object into fractional objects, each with one possible value of A, weighted according to the probabilities above
  • Alen Shapiro has suggested using a decision-tree approach to determine the unknown values of an attribute
  • From the subset of the training set whose value of A is known, a decision tree C' is built with the original class treated as just another attribute and the value of A as the 'class' to be determined; C' can then classify any object to predict its unknown value of A
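A minimal sketch of the Bayesian fill-in scheme, assuming objects are dicts with a 'class' key and None marking an unknown value (the function and field names are assumptions, not ASSISTANT's actual code):

```python
def value_distribution(training_set, attr, obj_class):
    """Estimate prob(attr = value | class) from the objects of that class
    whose value of attr is known."""
    known = [o for o in training_set
             if o['class'] == obj_class and o[attr] is not None]
    dist = {}
    for o in known:
        dist[o[attr]] = dist.get(o[attr], 0) + 1
    return {v: c / len(known) for v, c in dist.items()}

def fractional_objects(obj, attr, dist):
    """Divide an object with an unknown value of attr into fractional
    objects, one per possible value, weighted by the distribution."""
    pieces = []
    for value, weight in dist.items():
        piece = dict(obj)
        piece[attr] = value
        piece['weight'] = obj.get('weight', 1.0) * weight
        pieces.append(piece)
    return pieces
```

Choosing the most likely value is then just the arg-max of the distribution; the fractional-object route keeps all possibilities, with weights summing to 1.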

Attribute Tasks

  • Experiments have examined how well these methods perform when asked to fill in a single unknown attribute value
  • As a baseline, one method replaces an unknown value by its correct value
  • A simple strategy is to always replace an unknown value of an attribute with its most common value
  • The decision-tree method gives better results than this most-common-value strategy
  • These methods can, however, lead to an anomalous situation in which an attribute's apparent information gain increases when some of its values are unknown
  • One strategy that avoids this anomaly, and has been found to work well, treats an object with an unknown value as a collection of fractional objects
  • When evaluating an attribute, only the objects with known values are used, and the resulting gain is scaled down by the fraction of objects whose value of the attribute is known
  • When the set is partitioned, an object with an unknown value is distributed among the subsets in proportion to the relative frequency of each value in the collection
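The gain-scaling part of this strategy might be sketched as below; the dict-based object representation and the function names are assumptions:

```python
import math

def entropy(counts):
    """Entropy in bits of a {class: count} dict."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total)
                for c in counts.values() if c)

def class_counts(objects):
    counts = {}
    for o in objects:
        counts[o['class']] = counts.get(o['class'], 0) + 1
    return counts

def adjusted_gain(objects, attr):
    """Information gain of attr computed from the objects whose value is
    known, scaled down by the fraction of objects with a known value
    (None marks an unknown value)."""
    known = [o for o in objects if o[attr] is not None]
    if not known:
        return 0.0
    before = entropy(class_counts(known))
    after = 0.0
    for v in {o[attr] for o in known}:
        subset = [o for o in known if o[attr] == v]
        after += len(subset) / len(known) * entropy(class_counts(subset))
    return len(known) / len(objects) * (before - after)
```

With one of four objects unknown, even a perfectly discriminating attribute keeps only three quarters of its apparent gain, so missing values can never make an attribute look better.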

Error Production

  • The distribution of values over the possible classes might also be used to compute a confidence level for the classification
  • Straightforward though it may be, this procedure has been found to give a very graceful degradation as the incidence of unknown values increases
  • A graph summarizes the results of an experiment on the now-familiar task, showing classification performance at various levels of unknown attribute values
  • Catlett (1985) has taken this approach a stage further by allowing partial knowledge, namely that an object has one of a subset of the possible values

Selection Criterion

  • Attention has recently been refocused on the evaluation function for selecting the best attribute-based test to form the root of a decision tree
  • Recall that ID3 selects the attribute that gains the most information
  • It has been found that the gain criterion tends to favor attributes with many values
  • Indeed, it can be proved that refining an attribute's values can never decrease its apparent gain: if A' is formed from A by splitting one of A's values in two, the gain of A' is at least that of A, with equality only when the two new values occur in the same proportions in every class
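The bias can be seen numerically in a small sketch; the gain ratio criterion discussed below divides the gain by the information in the split itself, so a test that fragments the data into many small subsets is penalized (function names here are illustrative):

```python
import math

def entropy(counts):
    """Entropy in bits of a list of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def gain(class_counts, subsets):
    """Information gained by a test whose outcomes split the objects into
    `subsets`, each a list of per-class counts."""
    total = sum(class_counts)
    after = sum(sum(s) / total * entropy(s) for s in subsets)
    return entropy(class_counts) - after

def gain_ratio(class_counts, subsets):
    """Gain divided by the entropy of the split itself; penalizes tests
    that fragment the data into many small subsets."""
    return gain(class_counts, subsets) / entropy([sum(s) for s in subsets])

# 4 objects, 2 of class P and 2 of class N.  A sensible two-valued
# attribute and an id-like four-valued attribute both achieve maximal gain:
print(gain([2, 2], [[2, 0], [0, 2]]))                  # 1.0
print(gain([2, 2], [[1, 0], [1, 0], [0, 1], [0, 1]]))  # 1.0
# ... but gain ratio prefers the two-valued one:
print(gain_ratio([2, 2], [[2, 0], [0, 2]]))                  # 1.0
print(gain_ratio([2, 2], [[1, 0], [1, 0], [0, 1], [0, 1]]))  # 0.5
```

The id-like attribute pays for its two bits of split information, halving its score.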

The Final Criterion for Binary

  • Limiting decision trees to a binary format harks back to earlier systems in which each test was of the form 'does attribute A have value Ai?'; such a test is clearly a special case of the binary tests considered here
  • This permits a set of values, rather than a single value, to be distinguished from the others
  • The method of dealing with attributes having continuous values follows the same binary approach
  • The values of an attribute are grouped into two subsets, and the binary test distinguishes one subset from the other
  • The selection criterion is computed for candidate groupings, and the grouping that scores best is retained
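For continuous values, the binary approach reduces to choosing a cut point; a common sketch (not necessarily the exact procedure used in the systems above) evaluates the midpoint between each pair of adjacent sorted values:

```python
import math

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum(labels.count(c) / total * math.log2(labels.count(c) / total)
                for c in set(labels))

def best_threshold(values, labels):
    """Find the cut point t for the binary test 'value <= t?' with the
    highest information gain, trying midpoints of adjacent sorted values."""
    pairs = sorted(zip(values, labels))
    base = entropy([c for _, c in pairs])
    best_t, best_gain = None, -1.0
    for i in range(len(pairs) - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue                      # no cut between equal values
        t = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [c for v, c in pairs if v <= t]
        right = [c for v, c in pairs if v > t]
        g = base - (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        if g > best_gain:
            best_t, best_gain = t, g
    return best_t, best_gain
```

For example, best_threshold([1, 2, 8, 9], ['P', 'P', 'N', 'N']) yields the cut 5.0 with gain 1.0, since that threshold separates the classes perfectly.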

Goal to Produce

  • In these experiments, the gain ratio criterion gave decision trees with the greatest predictive accuracy
  • In all, experiments suggest that the gain ratio criterion does pick a good attribute for the root of the tree
  • Hart (1985) has proposed that this same test could function directly as a selection criterion: pick the attribute for which this confidence level is highest.
  • This measure takes explicit account of the number of values of an attribute and so may not exhibit the gain criterion's bias toward many-valued attributes
  • He notes, however, that the chi-square approximation cannot be trusted when the expected frequencies in the data are small

Conclusion

  • The aim of this paper has been to show that the technology for building decision trees from examples is fairly robust
  • Commercial systems have achieved noteworthy successes
  • Groundwork has been done for advances that will permit such tools to deal even with noisy, incomplete data
  • Work is continuing to improve the performance of the algorithms
  • Recent work (1983) offers a promising approach in which the induction task is tackled as a collection of subtasks
