ML 02 Concept Learning
Outline
Learning from examples
General-to-specific ordering of hypotheses
Version spaces and the candidate elimination algorithm
Inductive bias
Concept Learning
Inferring a boolean-valued function from training examples of its input and output; a form of supervised learning.
Given:
Training examples <x, f(x)> of some unknown function f. Each instance is described by the attributes Sky, Temp, Humid, Wind, Water, and Forecast:

Sky    Temp  Humid   Wind    Water  Forecast  EnjoySport
Sunny  Warm  Normal  Strong  Warm   Same      Yes
Sunny  Warm  High    Strong  Warm   Same      Yes
Rainy  Cold  High    Strong  Warm   Change    No
Sunny  Warm  High    Strong  Cool   Change    Yes

Find:
A good approximation to f
Representing Hypotheses
A hypothesis h is a conjunction of constraints on the attributes. Each constraint can be:
A specific value, e.g. Water=Warm
A "don't care" value, e.g. Water=?
No value allowed (the null constraint), e.g. Water=∅

Example hypothesis h:
 Sky    Temp  Humid  Wind    Water  Forecast
<Sunny  ?     ?      Strong  ?      Same>
Find-S Algorithm
Begin with the most specific hypothesis, and generalize it each time it fails to cover an observed positive training example.
Find-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x:
   For each attribute constraint ai in h:
     If the constraint ai in h is satisfied by x, do nothing;
     otherwise replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
Properties of Find-S
The hypothesis space is described by conjunctions of attribute constraints.
Find-S will output the most specific hypothesis within H that is consistent with the positive training examples.
The output hypothesis will also be consistent with the negative examples, provided the target concept is contained in H and the training examples are correct.
Find-S trace on the EnjoySport examples (h0 is the most specific hypothesis; the negative example x3 is ignored):

h0 = <∅, ∅, ∅, ∅, ∅, ∅>
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +  →  h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +    →  h2 = <Sunny, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -  →  h3 = h2
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +  →  h4 = <Sunny, Warm, ?, Strong, ?, ?>
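This loop is easy to write down in code. A minimal Python sketch (mine, not from the slides): hypotheses are tuples, with '?' for a don't-care constraint and None standing in for the null constraint ∅.

# Find-S: minimally generalize the most specific hypothesis on each
# positive example; negative examples are ignored.
def find_s(examples, n_attrs=6):
    h = [None] * n_attrs                 # h0: the all-null hypothesis
    for x, positive in examples:
        if not positive:
            continue                     # Find-S skips negative examples
        for i in range(n_attrs):
            if h[i] is None:             # null constraint: take the observed value
                h[i] = x[i]
            elif h[i] != x[i] and h[i] != '?':
                h[i] = '?'               # conflicting values: generalize to don't-care
    return tuple(h)

# The four EnjoySport examples: (instance, label)
D = [(('Sunny','Warm','Normal','Strong','Warm','Same'), True),
     (('Sunny','Warm','High','Strong','Warm','Same'),   True),
     (('Rainy','Cold','High','Strong','Warm','Change'), False),
     (('Sunny','Warm','High','Strong','Cool','Change'), True)]

print(find_s(D))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?'), i.e. h4 above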
Version Spaces
A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example <x, c(x)> in D:

Consistent(h, D) ≡ (∀ <x, c(x)> ∈ D) h(x) = c(x)

The version space VS_H,D, with respect to hypothesis space H and training set D, is the subset of hypotheses from H consistent with all training examples:

VS_H,D ≡ { h ∈ H | Consistent(h, D) }
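The two definitions translate directly into code; a minimal sketch in the same representation as the Find-S sketch above (None for ∅, '?' for don't care):

def covers(h, x):
    # h classifies x as positive iff every attribute constraint is satisfied
    return all(a is not None and (a == '?' or a == xi)
               for a, xi in zip(h, x))

def consistent(h, D):
    # Consistent(h, D): h(x) = c(x) for every <x, c(x)> in D
    return all(covers(h, x) == c for x, c in D)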
List-Then-Eliminate Algorithm
List all the possible hypotheses in H, then eliminate those found inconsistent with any training example.
List-Then-Eliminate Algorithm
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>:
   remove from VersionSpace any hypothesis h that is inconsistent with the training example, i.e. h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
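A brute-force sketch of List-Then-Eliminate, reusing covers, consistent, and the data D from the sketches above. The attribute value sets are an assumption: the data only show two Sky values, so a third (Cloudy, as in Mitchell's EnjoySport task) is added to make |X| = 3 * 2^5 = 96, the count used on a later slide.

from itertools import product

VALUES = [('Sunny','Cloudy','Rainy'), ('Warm','Cold'), ('Normal','High'),
          ('Strong','Light'), ('Warm','Cool'), ('Same','Change')]

def list_then_eliminate(D):
    # 1. VersionSpace <- every conjunctive hypothesis (the null-constraint
    #    hypotheses are omitted: they can never cover a positive example)
    version_space = list(product(*[vals + ('?',) for vals in VALUES]))
    # 2. keep only the hypotheses consistent with every training example
    return [h for h in version_space if consistent(h, D)]

print(len(list_then_eliminate(D)))  # 6: the version space shown below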
Candidate-Elimination Algorithm
Main idea:
A version space can be represented by its most general and least general (most specific) members. These members form the general and specific boundary sets that delimit the version space.
The version space for the four EnjoySport training examples:

S: { <Sunny, Warm, ?, Strong, ?, ?> }

   <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

G: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }

Training examples:
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +
Candidate-Elimination Algorithm

Initialize G to the set of maximally general hypotheses in H, and S to the set of maximally specific hypotheses in H. For each training example d:

If d is a positive example:
  Remove from G any hypothesis that is inconsistent with d
  For each hypothesis s in S that is not consistent with d:
    Remove s from S
    Add to S all minimal generalizations h of s such that
      h is consistent with d, and some member of G is more general than h
  Remove from S any hypothesis that is more general than another hypothesis in S

If d is a negative example:
  Remove from S any hypothesis that is inconsistent with d
  For each hypothesis g in G that is not consistent with d:
    Remove g from G
    Add to G all minimal specializations h of g such that
      h is consistent with d, and some member of S is more specific than h
  Remove from G any hypothesis that is less general than another hypothesis in G

Example Trace

x1: <Sunny, Warm, Normal, Strong, Warm, Same>, Yes
x2: <Sunny, Warm, High, Strong, Warm, Same>, Yes
x3: <Rainy, Cold, High, Strong, Warm, Change>, No
x4: <Sunny, Warm, High, Strong, Cool, Change>, Yes

S0 = { <∅, ∅, ∅, ∅, ∅, ∅> }
S1 = { <Sunny, Warm, Normal, Strong, Warm, Same> }
S2 = S3 = { <Sunny, Warm, ?, Strong, Warm, Same> }
S4 = { <Sunny, Warm, ?, Strong, ?, ?> }

G0 = G1 = G2 = { <?, ?, ?, ?, ?, ?> }
G3 = { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same> }
G4 = { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }

The final version space consists of S4, G4, and the three hypotheses between them:
<Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>
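A compact Python sketch of the whole algorithm for this conjunctive hypothesis language, reusing covers, VALUES, and D from the sketches above; more_general, min_generalizations, and min_specializations are written for this representation only, and the names are mine.

def more_general(h1, h2):
    # h1 >= h2: h1 covers every instance that h2 covers
    return all(a1 == '?' or a2 is None or a1 == a2
               for a1, a2 in zip(h1, h2))

def min_generalizations(s, x):
    # the unique minimal generalization of a conjunction s that covers x
    return [tuple(xi if a is None else (a if a == xi else '?')
                  for a, xi in zip(s, x))]

def min_specializations(g, x, values=VALUES):
    # replace one '?' in g by a specific value x does not have; by
    # construction every result excludes x, i.e. is consistent with it
    return [g[:i] + (v,) + g[i+1:]
            for i, a in enumerate(g) if a == '?'
            for v in values[i] if v != x[i]]

def candidate_elimination(D, n_attrs=6):
    S = [(None,) * n_attrs]               # most specific boundary
    G = [('?',) * n_attrs]                # most general boundary
    for x, positive in D:
        if positive:
            G = [g for g in G if covers(g, x)]
            for s in [s for s in S if not covers(s, x)]:
                S.remove(s)
                S += [h for h in min_generalizations(s, x)
                      if any(more_general(g, h) for g in G)]
            S = [s for s in S                # drop members more general than another
                 if not any(t != s and more_general(s, t) for t in S)]
        else:
            S = [s for s in S if not covers(s, x)]
            for g in [g for g in G if covers(g, x)]:
                G.remove(g)
                G += [h for h in min_specializations(g, x)
                      if any(more_general(h, s) for s in S)]
            G = [g for g in G                # drop members less general than another
                 if not any(t != g and more_general(t, g) for t in G)]
    return S, G

S, G = candidate_elimination(D)
print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]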
The target concept is exactly learned when the S and G boundary sets converge to a single, identical hypothesis.
If the training data contain an error, the algorithm is certain to remove the correct target concept from the version space: given enough training data, S and G will converge to an empty version space.
How should new instances be classified? Take a vote among the version space hypotheses:

x5 = <Sunny, Warm, Normal, Strong, Cool, Change>   +  (6/0 votes)
x6 = <Rainy, Cold, Normal, Light, Warm, Same>      -  (0/6 votes)
x7 = <Sunny, Warm, Normal, Light, Warm, Same>      ?  (3/3 votes)
x8 = <Sunny, Cold, Normal, Strong, Warm, Same>     ?  (2/4 votes)
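The vote counts can be reproduced by applying all six version space hypotheses (the ones in the diagram above) to each instance; a sketch reusing covers:

VS6 = [('Sunny','Warm','?','Strong','?','?'),     # S boundary
       ('Sunny','?','?','Strong','?','?'),
       ('Sunny','Warm','?','?','?','?'),
       ('?','Warm','?','Strong','?','?'),
       ('Sunny','?','?','?','?','?'),             # G boundary
       ('?','Warm','?','?','?','?')]              # G boundary

for x in [('Sunny','Warm','Normal','Strong','Cool','Change'),   # x5
          ('Rainy','Cold','Normal','Light','Warm','Same'),      # x6
          ('Sunny','Warm','Normal','Light','Warm','Same'),      # x7
          ('Sunny','Cold','Normal','Strong','Warm','Same')]:    # x8
    pos = sum(covers(h, x) for h in VS6)
    print(pos, '/', len(VS6) - pos)   # 6/0, 0/6, 3/3, 2/4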
What would be a good query for the learner to pose at this point? Choose an instance that is classified positive by some of the version space hypotheses and negative by the others, for example <Sunny, Warm, Normal, Light, Warm, Same>. If the example turns out to be positive, S can be generalized; if it is negative, G can be specialized.
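The query-selection heuristic itself can be sketched: enumerate all 96 instances and pick the one on which the version space votes are most evenly split (reusing product, VALUES, covers, and VS6 from above):

def best_query(version_space, values=VALUES):
    # prefer the instance on which the version space is most evenly split
    return min(product(*values),
               key=lambda x: abs(2 * sum(covers(h, x) for h in version_space)
                                 - len(version_space)))

print(best_query(VS6))
# ('Sunny', 'Warm', 'Normal', 'Light', 'Warm', 'Same'): the 3/3 query above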
Unbiased Learner
Idea: choose an H that expresses every teachable concept; that is, H is the set of all possible subsets of X, the power set P(X).
|X| = 96, so |P(X)| = 2^96 ≈ 10^28 distinct concepts.
H then contains disjunctions, conjunctions, and negations, e.g.
<Sunny, Warm, Normal, ?, ?, ?> ∨ <?, ?, ?, ?, ?, Change>
Unbiased Learner
What are S and G in this case? Assume positive examples (x1, x2, x3) and negative examples (x4, x5):
G : { ¬(x4 ∨ x5) }
S : { (x1 ∨ x2 ∨ x3) }

The only examples that are classified unambiguously are the training examples themselves. In other words, in order to learn the target concept one would have to present every single instance in X as a training example. Each unobserved instance will be classified positive by precisely half the hypotheses in the version space and negative by the other half: for any consistent h that classifies an unobserved x positive, the hypothesis identical to h except on x is also consistent.
With the unbiased H the target concept is surely contained in H; by contrast, the conjunctive hypothesis space is able to represent only 973 target concepts.
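The counts quoted on these slides can be checked in a couple of lines (the factor 3 for Sky is the same Cloudy assumption as in the earlier sketches):

print(3 * 2**5)        # 96 instances in X
print(2**96)           # |P(X)| = 2^96, about 7.9e28 distinct concepts
# semantically distinct conjunctive hypotheses: each attribute takes a
# specific value or '?', plus the single always-negative null hypothesis
print(1 + 4 * 3**5)    # 973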
Inductive Bias
Consider:
Concept learning algorithm L
Instances X, target concept c
Training examples Dc = {<x, c(x)>}
Let L(xi, Dc) denote the classification assigned to instance xi by L after training on Dc.

Definition: the inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training data Dc:

(∀ xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ]

where A ⊢ B means that A logically entails B.
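As a worked instance of this definition (from Mitchell): the inductive bias of the Candidate-Elimination algorithm is the single assertion B = { c ∈ H }. Assuming the target concept lies in H, any classification the algorithm outputs for an instance on which the version space votes unanimously follows deductively from B ∧ Dc ∧ xi.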
(Figure: an inductive system can be modeled by an equivalent deductive system, i.e. a theorem prover that takes the inductive bias B as an additional input.)
Inductive Bias
More strongly biased methods make more inductive leaps, classifying a greater proportion of unseen instances. Some inductive biases:
Rule out certain concepts
Order the hypotheses
Are implicit and unchangeable by the learner
Are explicit as a set of assertions manipulated by the learner
Homework
Sky    Temp  Humid   Wind    Water  Forecast  EnjoySport
Sunny  Warm  High    Strong  Cool   Change    Yes
Sunny  Cold  High    Strong  Warm   Change    No
Rainy  Warm  High    Strong  Warm   Same      Yes
Sunny  Warm  Normal  Strong  Warm   Same      Yes
Exercise
Consider the instance space consisting of the integer points in the x, y plane (with 0 ≤ x, y ≤ 10) and the hypothesis space H consisting of rectangles.
Hypotheses have the form a ≤ x ≤ b, c ≤ y ≤ d.
(Figure: a rectangle hypothesis spanning [a, b] on the x-axis and [c, d] on the y-axis; positive examples lie inside the rectangle, negative examples outside.)
If the training examples are given in reverse order, will the Version Space remain the same? What is the sequence of S and G in this case?
Exercise
What is the S boundary of this version space?
What is the G boundary of this version space?
What is the smallest number of training examples we can provide so that the Candidate-Elimination algorithm will perfectly learn the target concept 3 ≤ x ≤ 5, 2 ≤ y ≤ 9?