ML 02 Concept


Outline

- Learning from examples
- General-to-specific ordering of hypotheses
- Version spaces and the candidate-elimination algorithm
- Inductive bias

Machine Learning
Concept Learning

Concept Learning

Inferring a boolean-valued function from training examples of its inputs and outputs (supervised learning).

Given: training examples <x, f(x)> of some unknown function f
Find: a good approximation to f

Training Examples for Concept Enjoy Sport

Concept: days on which my friend Aldo enjoys his favourite water sports.
Task: predict the value of Enjoy Sport for an arbitrary day, based on the values of the other attributes.

Sky   | Temp | Humid  | Wind   | Water | Forecast | Enjoy Sport
Sunny | Warm | Normal | Strong | Warm  | Same     | Yes
Sunny | Warm | High   | Strong | Warm  | Same     | Yes
Rainy | Cold | High   | Strong | Warm  | Change   | No
Sunny | Warm | High   | Strong | Cool  | Change   | Yes

Representing Hypotheses

A hypothesis h is a conjunction of constraints on the attributes. Each constraint can be:
- a specific value, e.g. Water = Warm
- a "don't care" value, e.g. Water = ?
- no value allowed (the empty constraint), e.g. Water = ∅

Prototypical Concept Learning Task

Given:
- Instances X: possible days, described by the attributes Sky, Temp, Humidity, Wind, Water, Forecast
- Target function c: EnjoySport : X → {0, 1}
- Hypotheses H: conjunctions of literals, e.g. <Sunny, ?, ?, Strong, ?, Same>
- Training examples D: positive and negative examples of the target function: <x1, c(x1)>, ..., <xn, c(xn)>

Determine:
- A hypothesis h in H such that h(x) = c(x) for all x in D.

Example hypothesis h (Sky, Temp, Humid, Wind, Water, Forecast): <Sunny, ?, ?, Strong, ?, Same>

Inductive Learning Hypothesis

Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over unobserved examples.

Number of Instances, Concepts, Hypotheses

- Sky: Sunny, Cloudy, Rainy
- AirTemp: Warm, Cold
- Humidity: Normal, High
- Wind: Strong, Weak
- Water: Warm, Cold
- Forecast: Same, Change

- # distinct instances: 3·2·2·2·2·2 = 96
- # distinct concepts: 2^96
- # syntactically distinct hypotheses: 5·4·4·4·4·4 = 5120
- # semantically distinct hypotheses: 1 + 4·3·3·3·3·3 = 973
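These counts are easy to verify programmatically; the following is a minimal Python sketch (the dictionary and variable names are illustrative only):

```python
# A quick check of the counts above (illustrative names only).
attribute_values = {"Sky": 3, "AirTemp": 2, "Humidity": 2,
                    "Wind": 2, "Water": 2, "Forecast": 2}

n_instances = 1
for k in attribute_values.values():
    n_instances *= k                 # 3*2*2*2*2*2 = 96

n_concepts = 2 ** n_instances        # every subset of X is a possible concept: 2^96

n_syntactic = 1
for k in attribute_values.values():
    n_syntactic *= k + 2             # each attribute: k values, '?', or the empty constraint

n_semantic = 1
for k in attribute_values.values():
    n_semantic *= k + 1              # '?' or a specific value; all hypotheses containing
n_semantic += 1                      # an empty constraint collapse into one "always negative"

print(n_instances, n_concepts, n_syntactic, n_semantic)   # 96, 2**96, 5120, 973
```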

General-to-Specific Order

Consider two hypotheses:
- h1 = <Sunny, ?, ?, Strong, ?, ?>
- h2 = <Sunny, ?, ?, ?, ?, ?>

Sets of instances covered by h1 and h2: h2 imposes fewer constraints than h1 and therefore classifies more instances x as positive (h(x) = 1).

Definition: Let hj and hk be boolean-valued functions defined over X. Then hj is more general than or equal to hk (written hj ≥g hk) if and only if

∀x ∈ X : [(hk(x) = 1) → (hj(x) = 1)]

The ≥g relation imposes a partial order over the hypothesis space H that is used by many concept learning methods.

Instances, Hypotheses, and the More-General Relation

[Figure: instances x1, x2 on the left; hypotheses h1, h2, h3 on the right, ordered from specific to general, with h2 more general than both h1 and h3.]

x1 = <Sunny, Warm, High, Strong, Cool, Same>
x2 = <Sunny, Warm, High, Light, Warm, Same>

h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
h3 = <Sunny, ?, ?, ?, Cool, ?>
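For conjunctive hypotheses the ≥g test reduces to an attribute-by-attribute check: hj covers everything hk covers exactly when each constraint of hj is at least as permissive. A minimal sketch, assuming hypotheses are tuples with '?' for "don't care" and '0' standing in for the empty constraint (both encodings are this sketch's own choices):

```python
# "More general than or equal to" for conjunctive hypotheses.
def more_general_or_equal(hj, hk):
    """hj >=g hk: every constraint of hj is at least as permissive as hk's."""
    return all(cj == '?' or cj == ck or ck == '0' for cj, ck in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
h3 = ('Sunny', '?', '?', '?', 'Cool', '?')

print(more_general_or_equal(h2, h1))   # True: h2 covers every instance h1 covers
print(more_general_or_equal(h1, h2))   # False
print(more_general_or_equal(h2, h3))   # True
print(more_general_or_equal(h1, h3), more_general_or_equal(h3, h1))
# False False: h1 and h3 are incomparable under the partial order
```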

Find-S Algorithm

Idea: begin with the most specific hypothesis and generalize it each time it fails to cover an observed positive training example. (A runnable sketch follows below.)

1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x:
   For each attribute constraint ai in h:
     If the constraint ai in h is satisfied by x, then do nothing;
     else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
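A minimal Python sketch of Find-S on the EnjoySport data above; the '0' marker for the empty constraint and the function names are this sketch's own choices, not part of the lecture:

```python
# Find-S for conjunctive hypotheses; '?' = don't care, '0' = empty constraint.
def find_s(examples):
    n = len(examples[0][0])
    h = ['0'] * n                       # 1. most specific hypothesis
    for x, label in examples:
        if label != 'Yes':              # Find-S ignores negative examples
            continue
        for i, (hc, xc) in enumerate(zip(h, x)):
            if hc == '0':
                h[i] = xc               # first positive example: copy the value
            elif hc != xc:
                h[i] = '?'              # conflicting values: generalize to '?'
        print('after', x, '->', h)
    return h                            # 3. output hypothesis h

training_data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
print(find_s(training_data))            # -> ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```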

Hypothesis Space Search by Find-S

[Figure: the search moves from the most specific hypothesis h0 towards more general hypotheses as the positive examples x1, x2, x4 arrive; the negative example x3 is ignored.]

h0 = <∅, ∅, ∅, ∅, ∅, ∅>
x1 = <Sunny, Warm, Normal, Strong, Warm, Same> +  →  h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
x2 = <Sunny, Warm, High, Strong, Warm, Same> +    →  h2 = <Sunny, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Cold, High, Strong, Warm, Change> −  →  h3 = h2
x4 = <Sunny, Warm, High, Strong, Cool, Change> +  →  h4 = <Sunny, Warm, ?, Strong, ?, ?>

Properties of Find-S

- The hypothesis space is described by conjunctions of attribute constraints.
- Find-S outputs the most specific hypothesis in H that is consistent with the positive training examples.
- The output hypothesis is also consistent with the negative examples, provided the target concept is contained in H and the training examples are correct.

Complaints about Find-S

- It cannot tell whether the learner has converged to the target concept, in the sense that it cannot determine whether it has found the only hypothesis consistent with the training examples.
- It cannot tell when the training data are inconsistent, since it ignores negative training examples.
- Why prefer the most specific hypothesis?
- What if there are multiple maximally specific hypotheses?

Version Spaces

A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example <x, c(x)> in D:

Consistent(h, D) ≡ ∀<x, c(x)> ∈ D : h(x) = c(x)

The version space VS_{H,D}, with respect to hypothesis space H and training set D, is the subset of hypotheses from H consistent with all training examples:

VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}

List-Then-Eliminate Algorithm

Idea: list all hypotheses in H and eliminate every hypothesis found inconsistent with some training example.

1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>: remove from VersionSpace any hypothesis h that is inconsistent with the training example, i.e. h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
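A brute-force sketch of List-Then-Eliminate for the EnjoySport task. It enumerates the 972 conjunctive hypotheses without an empty constraint (the all-∅ hypothesis is omitted here since any positive example eliminates it) and filters them against the training data; the names and encodings are illustrative only:

```python
# List-Then-Eliminate by brute force; feasible only because H is tiny here.
from itertools import product

domains = [
    ('Sunny', 'Rainy', 'Cloudy'), ('Warm', 'Cold'), ('Normal', 'High'),
    ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change'),
]

def matches(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(hc == '?' or hc == xc for hc, xc in zip(h, x))

# 1. VersionSpace <- every conjunctive hypothesis (value or '?' per attribute)
version_space = list(product(*[d + ('?',) for d in domains]))

training_data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]

# 2. Remove every hypothesis inconsistent with some training example
for x, positive in training_data:
    version_space = [h for h in version_space if matches(h, x) == positive]

# 3. Output the remaining hypotheses (the six-member version space shown later)
for h in version_space:
    print(h)
```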

Candidate-Elimination Algorithm

Main idea:
- A version space can be represented by its most general and least general (most specific) members.
- These members form the general and specific boundaries that delimit the version space.

Example Version Space

Training examples:
x1 = <Sunny, Warm, Normal, Strong, Warm, Same> +
x2 = <Sunny, Warm, High, Strong, Warm, Same> +
x3 = <Rainy, Cold, High, Strong, Warm, Change> −
x4 = <Sunny, Warm, High, Strong, Cool, Change> +

S: {<Sunny, Warm, ?, Strong, ?, ?>}

<Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

Representing Version Spaces

- The general boundary G of version space VS_{H,D} is the set of maximally general members of H consistent with D.
- The specific boundary S of version space VS_{H,D} is the set of maximally specific members of H consistent with D.
- Every member of the version space lies between these boundaries:

VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥g h ≥g s)}

where x ≥g y means x is more general than or equal to y.

Candidate-Elimination Algorithm

G ← maximally general hypotheses in H
S ← maximally specific hypotheses in H
For each training example d = <x, c(x)>:

If d is a positive example:
- Remove from G any hypothesis that is inconsistent with d
- For each hypothesis s in S that is not consistent with d:
  - Remove s from S
  - Add to S all minimal generalizations h of s such that h is consistent with d and some member of G is more general than h
- Remove from S any hypothesis that is more general than another hypothesis in S

Candidate-Elimination Algorithm (continued)

If d is a negative example:
- Remove from S any hypothesis that is inconsistent with d
- For each hypothesis g in G that is not consistent with d:
  - Remove g from G
  - Add to G all minimal specializations h of g such that h is consistent with d and some member of S is more specific than h
- Remove from G any hypothesis that is less general than another hypothesis in G

(A runnable sketch of the full algorithm follows the trace below.)

Example Trace

S0: {<∅, ∅, ∅, ∅, ∅, ∅>}
G0 = G1 = G2: {<?, ?, ?, ?, ?, ?>}

x1: <Sunny, Warm, Normal, Strong, Warm, Same>, Yes
S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}

x2: <Sunny, Warm, High, Strong, Warm, Same>, Yes
S2 = S3: {<Sunny, Warm, ?, Strong, Warm, Same>}

x3: <Rainy, Cold, High, Strong, Warm, Change>, No
G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}

x4: <Sunny, Warm, High, Strong, Cool, Change>, Yes
S4: {<Sunny, Warm, ?, Strong, ?, ?>}
G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
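The following is a compact sketch of the Candidate-Elimination algorithm for conjunctive hypotheses over finite attribute domains, run on the EnjoySport data to reproduce the trace above. The helper names and the '0' marker for the empty constraint are this sketch's own choices, not a definitive implementation:

```python
# Candidate-Elimination for conjunctive hypotheses ('?': any value, '0': empty).
def matches(h, x):
    return all(hc == '?' or hc == xc for hc, xc in zip(h, x))

def more_general_or_equal(hj, hk):
    return all(cj == '?' or cj == ck or ck == '0' for cj, ck in zip(hj, hk))

def min_generalizations(s, x):
    """The minimal generalization of s that covers the positive example x."""
    h = list(s)
    for i, (sc, xc) in enumerate(zip(s, x)):
        if sc == '0':
            h[i] = xc
        elif sc != xc:
            h[i] = '?'
    return [tuple(h)]                    # exactly one for conjunctions

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude the negative example x."""
    out = []
    for i, gc in enumerate(g):
        if gc == '?':
            for value in domains[i]:
                if value != x[i]:        # rule out the offending value
                    h = list(g)
                    h[i] = value
                    out.append(tuple(h))
    return out

def candidate_elimination(examples, domains):
    n = len(domains)
    S = {tuple(['0'] * n)}               # maximally specific boundary
    G = {tuple(['?'] * n)}               # maximally general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            for s in list(S):
                if not matches(s, x):
                    S.discard(s)
                    for h in min_generalizations(s, x):
                        if any(more_general_or_equal(g, h) for g in G):
                            S.add(h)
            S = {s for s in S if not any(
                s != t and more_general_or_equal(s, t) for t in S)}
        else:
            S = {s for s in S if not matches(s, x)}
            for g in list(G):
                if matches(g, x):
                    G.discard(g)
                    for h in min_specializations(g, x, domains):
                        if any(more_general_or_equal(h, s) for s in S):
                            G.add(h)
            G = {g for g in G if not any(
                g != t and more_general_or_equal(t, g) for t in G)}
        print('S =', sorted(S))
        print('G =', sorted(G))
    return S, G

domains = [
    ('Sunny', 'Rainy', 'Cloudy'), ('Warm', 'Cold'), ('Normal', 'High'),
    ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change'),
]
training_data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
candidate_elimination(training_data, domains)   # reproduces the trace above
```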

Will the Algorithm Converge to the Correct Hypothesis?

- The algorithm converges towards the hypothesis that correctly describes the target concept, provided:
  - there are no errors in the training examples, and
  - there is some hypothesis in H that correctly describes the target concept.
- The target concept is exactly learned when the S and G boundary sets converge to a single, identical hypothesis.
- If the training data contain an error, the algorithm is certain to remove the correct target concept from the version space; given enough training data, S and G will converge to an empty version space.

Classification of Unseen Data

S: {<Sunny, Warm, ?, Strong, ?, ?>}

<Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

x5 = <Sunny, Warm, Normal, Strong, Cool, Change>  →  +  (6/0)
x6 = <Rainy, Cold, Normal, Light, Warm, Same>     →  −  (0/6)
x7 = <Sunny, Warm, Normal, Light, Warm, Same>     →  ?  (3/3)
x8 = <Sunny, Cold, Normal, Strong, Warm, Same>    →  ?  (2/4)

(The vote m/n counts how many version-space hypotheses classify the instance as positive/negative.)
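These votes can be checked by matching each unseen instance against the six version-space hypotheses listed above; a small illustrative sketch (the ordering of hypotheses is arbitrary):

```python
# Vote each unseen instance against the six version-space hypotheses.
version_space = [
    ('Sunny', 'Warm', '?', 'Strong', '?', '?'),   # S
    ('Sunny', '?',    '?', 'Strong', '?', '?'),
    ('Sunny', 'Warm', '?', '?',      '?', '?'),
    ('?',     'Warm', '?', 'Strong', '?', '?'),
    ('Sunny', '?',    '?', '?',      '?', '?'),   # G
    ('?',     'Warm', '?', '?',      '?', '?'),   # G
]

def matches(h, x):
    return all(hc == '?' or hc == xc for hc, xc in zip(h, x))

unseen = {
    'x5': ('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'),
    'x6': ('Rainy', 'Cold', 'Normal', 'Light',  'Warm', 'Same'),
    'x7': ('Sunny', 'Warm', 'Normal', 'Light',  'Warm', 'Same'),
    'x8': ('Sunny', 'Cold', 'Normal', 'Strong', 'Warm', 'Same'),
}

for name, x in unseen.items():
    pos = sum(matches(h, x) for h in version_space)
    print(name, f'{pos}/{len(version_space) - pos}')
# expected: x5 6/0, x6 0/6, x7 3/3, x8 2/4
```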

What Example to Query Next?

S: {<Sunny, Warm, ?, Strong, ?, ?>}

<Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

- What would be a good query for the learner to pose at this point?
- Choose an instance that is classified positive by some of the hypotheses and negative by the others, e.g. <Sunny, Warm, Normal, Light, Warm, Same>.
- If the example is positive, S can be generalized; if it is negative, G can be specialized.

Biased Hypothesis Space

Our hypothesis space is unable to represent a simple disjunctive target concept: (Sky = Sunny) ∨ (Sky = Cloudy)

x1 = <Sunny, Warm, Normal, Strong, Cool, Change> +   S1: {<Sunny, Warm, Normal, Strong, Cool, Change>}
x2 = <Cloudy, Warm, Normal, Strong, Cool, Change> +  S2: {<?, Warm, Normal, Strong, Cool, Change>}
x3 = <Rainy, Warm, Normal, Strong, Cool, Change> −   S3: {}

The third example x3 contradicts S2, the specific boundary that is already overly general for this target concept.

Unbiased Learner

- Idea: choose an H that expresses every teachable concept, i.e. H is the set of all possible subsets of X, the power set P(X).
- |X| = 96, |P(X)| = 2^96 ≈ 10^28 distinct concepts.
- H contains disjunctions, conjunctions and negations, e.g. <Sunny, Warm, Normal, ?, ?, ?> ∨ <?, ?, ?, ?, ?, Change>.
- H surely contains the target concept; the conjunctive hypothesis space was able to represent only 973 of these concepts.

Unbiased Learner (continued)

- What are S and G in this case? Assume positive examples (x1, x2, x3) and negative examples (x4, x5):
  S: {(x1 ∨ x2 ∨ x3)}
  G: {¬(x4 ∨ x5)}
- The only examples that are classified unambiguously are the training examples themselves. In other words, to learn the target concept one would have to present every single instance in X as a training example.
- Each unobserved instance will be classified positive by precisely half the hypotheses in the version space and negative by the other half.

Futility of Bias-Free Learning

A learner that makes no prior assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances (cf. the No Free Lunch theorem).

Assumptions of the Candidate-Elimination Algorithm

- The target concept can be represented by a conjunction of attribute values.
- Each learning algorithm is characterized by the prior assumptions, or inductive bias, it employs.

Inductive Bias

Consider:
- a concept learning algorithm L
- instances X and a target concept c
- training examples Dc = {<x, c(x)>}
- L(xi, Dc) denotes the classification assigned to instance xi by L after training on Dc.

Definition: The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training data Dc

(∀xi ∈ X) [(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)]

where A ⊢ B means that A logically entails B.

Inductive Systems and Equivalent Deductive Systems

[Figure: the inductive system feeds the training examples and a new instance into the candidate-elimination algorithm using hypothesis space H, which outputs a classification of the new instance or "don't know". The equivalent deductive system feeds the training examples, the new instance, and the assertion "H contains the target concept" into a theorem prover, which outputs a classification of the new instance or "don't know".]

Three Learners with Different Biases

- Rote learner: stores examples and classifies x if and only if it matches a previously observed example. No inductive bias.
- Version-space candidate-elimination algorithm. Bias: the hypothesis space contains the target concept.
- Find-S. Bias: the hypothesis space contains the target concept, and all instances are negative instances unless the opposite is entailed by its other knowledge.

Inductive Bias

More strongly biased methods make more inductive leaps, classifying a greater proportion of unseen instances. Some inductive biases:
- rule out certain concepts
- order the hypotheses
- are implicit and unchangeable by the learner
- are explicit as a set of assertions manipulated by the learner

Homework

Sky   | Temp | Humid  | Wind   | Water | Forecast | Enjoy Sport
Sunny | Warm | High   | Strong | Cool  | Change   | Yes
Sunny | Cold | High   | Strong | Warm  | Change   | No
Rainy | Warm | High   | Strong | Warm  | Same     | Yes
Sunny | Warm | Normal | Strong | Warm  | Same     | Yes

If the training examples are given in reverse order, will the version space remain the same? What is the sequence of S and G in this case?

Exercise

Consider the instance space consisting of integer points in the x, y plane (with 0 ≤ x, y ≤ 10) and the set of hypotheses H consisting of rectangles:

Hypotheses: a ≤ x ≤ b, c ≤ y ≤ d

[Figure: a rectangle hypothesis with x-extent from a to b and y-extent from c to d, together with positive and negative example points.]

Exercise

- What is the S boundary of this version space?
- What is the G boundary of this version space?
- What is the smallest number of training examples we can provide so that the Candidate-Elimination algorithm will perfectly learn the target concept 3 ≤ x ≤ 5, 2 ≤ y ≤ 9?
