How Many Samples To Learn A Finite Class?
6.089 GITCS
1 April 2008
Lecture 20
Lecturer: Scott Aaronson
For
$$m = O\!\left(\frac{1}{\epsilon}\log\frac{|C|}{\delta}\right)$$
samples drawn from D, any hypothesis h ∈ C we can find that agrees with all of these samples
(i.e., such that h(x_i) = c(x_i) for all i) will satisfy
$$\Pr_{x \sim D}[h(x) = c(x)] \geq 1 - \epsilon$$
with probability at least 1 − δ over the choice of x_1, ..., x_m.
We can prove this theorem by the contrapositive. Let h ∈ C be any bad hypothesis: that
is, such that Pr_{x∼D}[h(x) = c(x)] < 1 − ε. Then if we independently pick m points from the sample
distribution D, the hypothesis h will be correct on all of these points with probability at most
(1 − ε)^m. So by the union bound, the probability that there exists a bad hypothesis in C that
nevertheless agrees with all our sample data is at most |C|(1 − ε)^m (the number of hypotheses,
good or bad, times the maximum probability of each bad hypothesis agreeing with the sample
data). Now we just do algebra:
$$\begin{aligned}
\delta &= |C|(1-\epsilon)^m \\
m &= \log_{1-\epsilon}\frac{\delta}{|C|} \\
  &= \frac{\log(\delta/|C|)}{\log(1-\epsilon)} \\
  &= O\!\left(\frac{1}{\epsilon}\log\frac{|C|}{\delta}\right).
\end{aligned}$$
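In exact form, the requirement |C|(1 − ε)^m ≤ δ gives m ≥ ln(|C|/δ) / (−ln(1 − ε)). Here is a small Python sketch of that bound (the function name and the example numbers are just illustrative, not from the notes):

    import math

    def sample_bound(num_concepts, epsilon, delta):
        # Smallest m with |C| * (1 - epsilon)^m <= delta,
        # i.e. m >= ln(|C| / delta) / (-ln(1 - epsilon)).
        return math.ceil(math.log(num_concepts / delta) / -math.log(1.0 - epsilon))

    # Example: a million hypotheses, 5% error tolerance, 1% failure probability.
    print(sample_bound(10**6, epsilon=0.05, delta=0.01))  # about 360 samples

Note how m grows only logarithmically with |C|: squaring the number of hypotheses merely doubles the log term.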
Note that there always exists a hypothesis in C that agrees with c on all the sample points:
namely, c itself (i.e. the truth)! So as our learning algorithm, we can simply do the following:
1. Find any hypothesis h ∈ C that agrees with all the sample data (i.e., such that h(x_i) = c(x_i)
for all i = 1, ..., m).
2. Output h.
Such an h will always exist, and by the theorem above it will probably be a good hypothesis.
All we need is to see enough sample points.
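As a toy illustration of this algorithm (the threshold concept class and all names below are hypothetical, chosen only to make the sketch self-contained):

    import random

    def consistent_hypothesis(concept_class, samples):
        # Return any hypothesis agreeing with every labeled sample (h(x_i) == y_i).
        for h in concept_class:
            if all(h(x) == y for x, y in samples):
                return h
        return None  # cannot happen if the true concept is in the class

    # Toy finite class: thresholds t = 0, 1, ..., 100 on the integers.
    concept_class = [(lambda x, t=t: int(x >= t)) for t in range(101)]
    true_concept = concept_class[42]

    # Draw m samples from the uniform distribution D on {0, ..., 100}.
    m = 50
    samples = [(x, true_concept(x)) for x in (random.randrange(101) for _ in range(m))]

    h = consistent_hypothesis(concept_class, samples)

By the theorem, with m around (1/ε) ln(|C|/δ) samples, the returned h will, with probability at least 1 − δ, agree with the true concept on all but an ε fraction of D.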
The bound m = O((1/ε) log(|C|/δ)) works so long as |C| is finite, but it breaks down when |C| is infinite. How can we formalize the
intuition that the concept class of lines is learnable, but the concept class of arbitrary squiggles is
not? A line seems easy to guess (at least approximately), if I give you a small number of random
points and tell you whether each point is above or below the line. But if I tell you that these points
are on one side of a squiggle, and those points are on the other side, then no matter how many
points I give you, it seems impossible to predict which side the next point will be on.
So what's the difference between the two cases? It can't be the number of lines versus the number
of squiggles, since they're both infinite (and can be taken to have the same infinite cardinality).
From the floor: Isn't the difference just that you need two parameters to specify a line, but
infinitely many parameters to specify a squiggle?
That's getting closer! The trouble is that the notion of a "parameter" doesn't occur anywhere
in the theory; it's something we have to insert ourselves. To put it another way, it's possible to
come up with silly parameterizations where even a line takes infinitely many parameters to specify,
as well as clever parameterizations where a squiggle can be specified with just one parameter.
Well, the answer isn't obvious! The idea that finally answered the question is called VC-dimension
(after two of its inventors, Vapnik and Chervonenkis). We say the set of points x_1, ..., x_m
is shattered by a concept class C if for all 2^m possible settings of c(x_1), ..., c(x_m) to 0 or 1 (reject
or accept), there is some concept c ∈ C that agrees with those values. Then the VC-dimension
of C, denoted VCdim(C), is the size of the largest set of points shattered by C. If we can find
arbitrarily large (finite) sets of points that can be shattered, then VCdim(C) = ∞.
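For a finite class over a finite domain, shattering can be checked by brute force. Here is a Python sketch (mine, purely illustrative); with threshold concepts c_t(x) = 1 iff x ≥ t, every single point is shattered but no pair is, so the computed VC-dimension is 1:

    from itertools import combinations

    def is_shattered(points, concept_class):
        # The set is shattered iff all 2^len(points) labelings are realized by some concept.
        realized = {tuple(c(x) for x in points) for c in concept_class}
        return len(realized) == 2 ** len(points)

    def vc_dimension(domain, concept_class):
        # Size of the largest subset of the (finite) domain that is shattered.
        best = 0
        for size in range(1, len(domain) + 1):
            if any(is_shattered(s, concept_class) for s in combinations(domain, size)):
                best = size
        return best

    thresholds = [(lambda x, t=t: int(x >= t)) for t in range(5)]
    print(vc_dimension(range(4), thresholds))  # prints 1

Of course this brute force only applies to finite classes over finite domains; the interesting cases, like lines in the plane below, need a geometric argument instead.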
If we let C be the concept class of lines in the plane, then VCdim(C) = 3. Why? Well, we can
put three points in a triangle, and each of the eight possible classifications of those points can be
realized by a single line. On the other hand, there's no set of four points such that all sixteen possible
classifications of those points can be realized by a line. Either the points form a quadrilateral, in
which case we can't realize the labeling that gives each pair of opposite corners the same classification
but the two pairs different ones; or they form a triangle and an interior point, in which case we
can't make the interior point have a different classification from the other three points; or three of
the points are collinear, in which case we can't give the middle of those three points a different
classification from the outer two.
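The positive half of this claim (that three points in a triangle are shattered) can even be checked numerically. The following sketch is only an illustration: it represents a line as w·x + b = 0, labels points by the side they fall on, and does a crude grid search over angles and offsets (the grid sizes are arbitrary, just fine enough to find a witness for each of the 8 labelings):

    import math
    from itertools import product

    def halfplane_realizes(points, labels, steps=72):
        # Search lines w.x + b = 0 for one putting each labeled point on the required side.
        for k in range(steps):
            theta = 2 * math.pi * k / steps
            w = (math.cos(theta), math.sin(theta))
            for b in [i / 4.0 for i in range(-8, 9)]:  # offsets -2.0, -1.75, ..., 2.0
                signs = [int(w[0] * x + w[1] * y + b > 0) for (x, y) in points]
                if signs == list(labels):
                    return True
        return False

    triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    # Every one of the 2^3 labelings should be realized, so the triangle is shattered.
    print(all(halfplane_realizes(triangle, labels) for labels in product([0, 1], repeat=3)))

It prints True for the triangle; by the argument above, no such search could succeed on all sixteen labelings of four points.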
Blumer et al.¹ proved that a concept class is PAC-learnable if and only if its VC-dimension is
finite, and that
$$m = O\!\left(\frac{\mathrm{VCdim}(C)}{\epsilon}\log\frac{1}{\delta}\right)$$
samples suffice. Once again, a learning algorithm that works is just to output any hypothesis h in
the concept class that agrees with all the data. Unfortunately we don't have time to prove that
here.
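Ignoring the constant hidden inside the O(), the bound grows like (VCdim(C)/ε)·log(1/δ). A tiny sketch (illustrative only; the constant is deliberately omitted) for lines in the plane, where VCdim(C) = 3:

    import math

    # Growth rate of the Blumer et al. bound, with the O() constant omitted.
    def vc_sample_scale(vcdim, epsilon, delta):
        return (vcdim / epsilon) * math.log(1.0 / delta)

    print(vc_sample_scale(3, 0.05, 0.01))  # about 276, times an unknown constant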
A useful intuition is provided by a corollary of Blumer et al.'s result called the Occam's Razor
Theorem: whenever your hypothesis has sufficiently fewer bits of information than the original
data, it will probably correctly predict most future data drawn from the same distribution.
¹ A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth, "Learnability and the Vapnik-Chervonenkis dimension," Journal of the ACM, 1989.
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.