
9.520 Statistical Learning Theory and Applications

Sasha Rakhlin, Andrea Caponnetto, Ryan Rifkin and Tomaso Poggio

9.520, spring 2006


Learning: Brains and Machines

Learning is the gateway to understanding the brain and to making intelligent machines.

Problem of learning: a focus for
o modern math
o computer algorithms
o neuroscience

9.520, spring 2006


Learning: much more than memory

o The role of learning (theory and applications in many different domains) has grown substantially in CS

o Plasticity and learning have taken center stage in the neurosciences

o Until now the math and engineering of learning have developed independently of neuroscience… but that may begin to change: we will see the example of learning + computer vision…
Learning: math, engineering, neuroscience

$$\min_{f \in \mathcal{H}} \left[ \frac{1}{l} \sum_{i=1}^{l} V(y_i, f(x_i)) + \mu \, \|f\|_K^2 \right]$$

Theorems on foundations of learning:

Learning theory + algorithms → Predictive algorithms

ENGINEERING APPLICATIONS:
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

Computational Neuroscience (models + experiments): how visual cortex works, and how it may suggest better computer vision systems
Class

Rules of the game:
o problem sets (2)
o final project (min = review; max = journal paper)
o grading
o participation!
o mathcamps? Monday late afternoon?

Web site: http://www.mit.edu/~9.520/

9.520, spring 2006


9.520 Statistical Learning Theory and Applications
Class 24: Project presentations

2:30-2:45 "Adaboosting SVMs to recover motor behavior from motor data", Neville Sanjana
2:45-3:00 "Review of Hierarchical Learning", Yann LeTallec
3:00-3:15 "An analytic comparison between SVMs and Bayes Point Machines", Ashish Kapoor
3:15-3:30 "Semi-supervised learning for tree-structured data", Charles Kemp
3:30-3:45 "Unsupervised Clustering with Regularized Least Squares classifiers", Ben Recht
3:40-3:50 "Multi-modal Human Identification", Brian Kim
3:50-4:00 "Regret Bounds, Sequential Decision-Making and Online Learning", Sanmay Das

9.520, spring 2003


9.520 Statistical Learning Theory and Applications
Class 25: Project presentations

2:35-2:50 "Learning card playing strategies with SVMs", David Craft and Timothy Chan
2:50-3:00 "Artificial Markets: Learning to trade using Support Vector Machines", Adlar Kim
3:00-3:10 "Feature selection: literature review and new development", Wei Wu
3:10-3:25 "Man vs machines: A computational study on face detection", Thomas Serre

9.520, spring 2003


Overview of overview

o The problem of supervised learning: the "real" math behind it

o Examples of engineering applications (from our group)

o Learning and the brain (example of object recognition)

9.520, spring 2006


Learning from examples: the goal is not to memorize but to generalize, i.e. predict.

INPUT → f → OUTPUT

Given a set of l examples (data)

$$\{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}$$

Question: find a function f such that

$$f(x) = \hat{y}$$

is a good predictor of y for a future input x (fitting the data is not enough!).
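A minimal numerical sketch of this setup (my toy example, not course code; assumes numpy): l = 5 examples, a learned f, and a prediction ŷ at a future input x.

```python
# Toy instance of learning from examples: fit f on l = 5 data points,
# then predict at an input that was not in the training set.
import numpy as np

X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # x_1 ... x_l
Y = np.array([0.1, 1.9, 4.2, 5.8, 8.1])   # y_1 ... y_l

w, b = np.polyfit(X, Y, deg=1)            # f(x) = w*x + b, fit to the data
x_future = 5.0
y_hat = w * x_future + b                  # y-hat = f(x)
print(f"prediction at x = {x_future}: {y_hat:.2f}")
# What matters is not how well f fits X, Y but whether y_hat is close
# to the y actually observed at the future x.
```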
Reason for you to know theory

We will speak today and later about applications… they are not simply about using a black box. The best ones are about the right formulation of the problem: choice of representation (inputs, outputs), choice of examples, validating predictivity, not datamining.

… f(x) = wx + b
Notes

Two strands in learning theory:

o Bayes, graphical models…

o Statistical learning theory, regularization (closer to classical math: functional analysis + probability theory + empirical process theory…)

Interesting development: the theoretical foundations of learning are becoming part of mainstream mathematics.
Learning from examples: predictive, multivariate function estimation from sparse data (not just curve fitting)

[Figure: in the (x, y) plane, data sampled from f, the function f, and its approximation]
Generalization: estimating the value of the function where there are no data (good generalization means predicting the function well; most important is for the empirical or validation error to be a good proxy for the prediction error).

Regression: the function is real valued

Classification: the function is binary

9.520, spring 2006
Thus… the key requirement (and the main focus of learning theory) for solving the problem of learning from examples is generalization (and possibly even consistency).

A standard way to learn from examples is ERM (empirical risk minimization).

The problem does not have a predictive solution in general (just fitting the data does not work). Choosing an appropriate hypothesis space H (for instance a compact set of continuous functions) can guarantee generalization (how good depends on the problem and other parameters). The sketch below illustrates the point.
9.520, spring 2006
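A small illustration (my sketch, not course code; assumes numpy): ERM over a modest hypothesis space versus an overly rich one, using polynomials of low versus high degree as H.

```python
# ERM in two hypothesis spaces: in the rich space the empirical risk
# is driven to ~0 but the error away from the sample blows up.
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-1, 1, 40))
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(40)
x_tr, y_tr = x[::2], y[::2]          # training examples
x_new, y_new = x[1::2], y[1::2]      # inputs with no training data

for degree in (3, 19):               # small H vs. very rich H
    f = np.poly1d(np.polyfit(x_tr, y_tr, deg=degree))
    emp = np.mean((f(x_tr) - y_tr) ** 2)    # empirical risk
    new = np.mean((f(x_new) - y_new) ** 2)  # proxy for prediction error
    print(f"degree {degree:2d}: empirical risk {emp:.4f}, off-sample {new:.4f}")
```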
Learning from examples: another goal (from inverse problems) is to ensure that the problem is well-posed (the solution exists, is unique, and is stable).

A problem is well-posed (J. S. Hadamard, 1865-1963) if its solution

o exists,
o is unique, and
o is stable, i.e. depends continuously on the data (here, the examples).

9.520, spring 2006


Thus… two key requirements for solving the problem of learning from examples: well-posedness and generalization.

Consider the standard learning algorithm, i.e. ERM. The main focus of learning theory is predictivity of the solution, i.e. generalization. The problem is, in addition, ill-posed. It was known that choosing an appropriate hypothesis space H ensures predictivity; it was also known that an appropriate H provides well-posedness. A couple of years ago it was shown that generalization and well-posedness are equivalent, i.e. one implies the other. Thus a stable solution is predictive and (for ERM) also vice versa. A numerical illustration of stability follows.
9.520, spring 2006
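A numerical illustration of stability as continuous dependence on the data (my sketch, not from the course; assumes numpy): solve regularized least squares twice, with a single training label perturbed, and compare the two solutions.

```python
# Stability check: solve regularized least squares twice, with one
# training label perturbed, and see how far the solution moves.
import numpy as np

rng = np.random.default_rng(1)
l = 30
t = rng.uniform(-1, 1, l)
# two nearly collinear input features make the problem ill-posed
X = np.column_stack([t, t + 1e-6 * rng.standard_normal(l), np.ones(l)])
y = 2 * t + 0.1 * rng.standard_normal(l)

def rls(X, y, lam):
    # w = argmin (1/l) ||Xw - y||^2 + lam ||w||^2
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * l * np.eye(d), X.T @ y)

y_pert = y.copy()
y_pert[0] += 0.1                     # change a single example slightly

for lam in (0.0, 1e-1):              # unregularized vs. regularized
    dw = np.linalg.norm(rls(X, y, lam) - rls(X, y_pert, lam))
    print(f"lambda = {lam:g}: solution moved by {dw:.2e}")
# Without regularization the tiny data change swings the solution by
# orders of magnitude more than with lam > 0: ill-posed vs. stable.
```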
More later…..

9.520, spring 2006


Learning theory and the natural sciences

Conditions for generalization in learning theory have deep, almost philosophical, implications: they may be regarded as conditions that guarantee a theory to be predictive (that is, scientific).
We have used a simple algorithm -- one that ensures generalization -- in most of our applications:

$$\min_{f \in \mathcal{H}} \left[ \frac{1}{l} \sum_{i=1}^{l} V(f(x_i) - y_i) + \lambda \, \|f\|_K^2 \right]$$

implies

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i)$$

The equation includes Regularization Networks (special cases are splines, Radial Basis Functions and Support Vector Machines). The function is nonlinear and a general approximator…

For a review, see Poggio and Smale, The Mathematics of Learning, Notices of the AMS, 2003.
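For the square loss V, this minimizer has a well-known closed form: the coefficients solve (K + λlI)α = y, where K is the kernel matrix on the examples. A minimal implementation sketch (mine, not the course's code; assumes numpy, Gaussian kernel chosen for illustration):

```python
# Regularized least squares in an RKHS with a Gaussian kernel.
# Minimizes (1/l) sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2 over f in H_K;
# by the representer theorem f(x) = sum_i alpha_i K(x, x_i).
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def rls_fit(X, y, lam=1e-2, sigma=0.5):
    l = len(X)
    K = gaussian_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + lam * l * np.eye(l), y)
    return lambda X_new: gaussian_kernel(X_new, X, sigma) @ alpha

# Usage: regression from sparse, noisy examples.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (30, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(30)
f = rls_fit(X, y)
print(f(np.array([[0.5]])))   # prediction at a new input
```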
Classical framework but with a more general loss function

The algorithm uses a quite general space of functions or "hypotheses": RKHSs. An extension of the classical framework can provide a better measure of "loss" (for instance for classification)…

$$\min_{f \in \mathcal{H}} \left[ \frac{1}{l} \sum_{i=1}^{l} V(f(x_i) - y_i) + \lambda \, \|f\|_K^2 \right]$$

9.520, spring 2006                Girosi, Caprile, Poggio, 1990


Another remark: equivalence to networks

Many different V lead to the same form of solution…

$$f(x) = \sum_{i=1}^{l} c_i K(x, x_i) + b$$

…and the solution can be "written" as the same type of network, where the value of K(x, x_i) corresponds to the "activity" of a "unit" and the c_i correspond to (synaptic) "weights".

[Network diagram: inputs x_1 … x_d feed kernel units K, whose outputs are weighted by the c_i and summed to give f]
Theory summary

In the course we will introduce:

• Generalization (predictivity of the solution)
• Stability (well-posedness)
• RKHSs as hypothesis spaces
• Regularization techniques leading to RNs and SVMs
• Manifold regularization (semi-supervised learning)
• Unsupervised learning
• Generalization bounds based on stability
• Alternative classical bounds (VC and V_γ dimensions)
• Related topics
• Applications

9.520, spring 2006
Syllabus

9.520, spring 2006


Overview of overview

o Supervised learning: real math

o Examples of recent and ongoing in-house engineering applications

o Learning and the brain

9.520, spring 2006


Learning from Examples: engineering
applications

INPUT OUTPUT

Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Graphics
Text Classification
…..
9.520, spring 2006
Bioinformatics application: predicting the type of cancer from DNA chip signals

Learning from examples paradigm:

[Diagram: Examples → Statistical Learning Algorithm → Prediction; a New sample fed to the trained algorithm yields a Prediction]
9.520, spring 2006


Bioinformatics application: predicting the type of cancer from DNA chips

New feature selection SVM:

o Only 38 training examples, 7100 features

o AML vs ALL: with 40 genes, 34/34 correct, 0 rejects; with 5 genes, 31/31 correct, 3 rejects of which 1 is an error.

Pomeroy, S.L., P. Tamayo, M. Gaasenbeek, L.M. Sturla, M. Angelo, M.E. McLaughlin, J.Y.H. Kim, L.C. Goumnerova, P.M. Black, C. Lau, J.C. Allen, D. Zagzag, M.M. Olson, T. Curran, C. Wetmore, J.A. Biegel, T. Poggio, S. Mukherjee, R. Rifkin, A. Califano, G. Stolovitzky, D.N. Louis, J.P. Mesirov, E.S. Lander and T.R. Golub. Prediction of Central Nervous System Embryonal Tumour Outcome Based on Gene Expression, Nature, 2002.

9.520, spring 2006
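The slides do not spell out the selection algorithm, so the following is only an illustrative stand-in: recursive feature elimination with a linear SVM (SVM-RFE, in the style of Guyon et al., 2002), a standard way to cut thousands of expression features down to a few dozen. It assumes scikit-learn, and the data here are random placeholders, not the leukemia data.

```python
# Sketch of SVM-based feature selection on gene-expression-shaped data:
# repeatedly fit a linear SVM and discard the genes with the smallest
# weights until only 40 remain.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.standard_normal((38, 7100))   # 38 samples x 7100 "genes" (fake)
y = rng.integers(0, 2, 38)            # AML vs. ALL labels (fake)

selector = RFE(LinearSVC(C=1.0, dual=False), n_features_to_select=40,
               step=0.5)              # drop half the remaining genes per round
selector.fit(X, y)
kept_genes = np.flatnonzero(selector.support_)   # indices of the 40 kept genes
print(kept_genes[:10])
```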


Learning from Examples: engineering
applications

INPUT OUTPUT

Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Graphics
Text Classification
…..
9.520, spring 2006
Face identification: example

An old view-based system: 15 views

Performance: 98% on a 68-person database

Beymer, 1995

9.520, spring 2006


Learning from Examples: engineering
applications

INPUT OUTPUT

Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Graphics
Text Classification
…..
9.520, spring 2006
System Architecture

[Pipeline: scanning in x, y and scale → preprocessing with an overcomplete dictionary of Haar wavelets → SVM classifier; the classifier is trained offline from a training database via a QP solver]

9.520, spring 2006                Sung, Poggio 1994; Papageorgiou and Poggio, 1998
People classification/detection: training the system

[Two sets of training images: 1848 patterns and 7189 patterns]

Representation: overcomplete dictionary of Haar wavelets; high-dimensional feature space (>1300 features)

Core learning algorithm: Support Vector Machine classifier

→ pedestrian detection system

9.520, spring 2006
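A schematic sketch of that pipeline (my code, not the original CBCL system; assumes numpy and scikit-learn): Haar wavelet responses computed via an integral image feed a support vector classifier. The window size, wavelet layout and data below are all placeholders.

```python
# Haar-wavelet features + SVM, in the spirit of the pedestrian detector.
import numpy as np
from sklearn.svm import SVC

def integral_image(img):
    return np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def box(ii, r0, c0, r1, c1):
    # sum of img[r0:r1, c0:c1] from the integral image
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def haar_features(img, h=8, w=4, stride=4):
    # one vertical-edge wavelet type at one scale, densely placed;
    # the real dictionary was overcomplete: several types and scales
    ii = integral_image(img)
    H, W = img.shape
    return np.array([box(ii, r, c, r + h, c + w) -
                     box(ii, r, c + w, r + h, c + 2 * w)
                     for r in range(0, H - h, stride)
                     for c in range(0, W - 2 * w, stride)])

rng = np.random.default_rng(0)
windows = rng.random((100, 64, 32))        # stand-ins for 64x32 image windows
labels = rng.integers(0, 2, 100)           # pedestrian / non-pedestrian (fake)
X = np.stack([haar_features(im) for im in windows])
clf = SVC(kernel="rbf").fit(X, labels)     # core learning algorithm
print(clf.predict(X[:3]))
```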
Trainable System for Object Detection: Pedestrian detection - Results

Papageorgiou and Poggio, 1998

The system was tested in a test car (Mercedes): a fast version, integrated with a real-time obstacle detection system, was installed in an experimental Mercedes.

[MPEG video]

Constantine Papageorgiou
Face classification/detection: training the system

[Training patterns]

Representation: grey levels (normalized) or overcomplete dictionary of Haar wavelets

Core learning algorithm: Support Vector Machine classifier

→ face detection system

9.520, spring 2006
Face identification: training the system

[Training patterns]

Representation: grey levels (normalized) or overcomplete dictionary of Haar wavelets

Core learning algorithm: Support Vector Machine classifier

→ face identification system

9.520, spring 2006
Computer vision: new StreetScenes Project
Learning Algorithms for Scene Understanding

Project timeline: construction of the StreetScenes Database → automatic learning of object-specific features or parts → recognition of 10 object categories → automatic scene description

Lior Wolf, Stan Bileschi, …


Learning from Examples: Applications

INPUT OUTPUT

Object identification
Object categorization
Image analysis
Graphics
Finance
Bioinformatics

9.520, spring 2006
Image Analysis

IMAGE ANALYSIS: OBJECT RECOGNITION AND POSE


ESTIMATION

⇒ Bear (0° view)

⇒ Bear (45° view)

9.520, spring 2006


Computer vision: analysis of facial expressions


The main goal is to estimate basic facial parameters, e.g.


degree of mouth openness, through learning. One of the main
applications is video-speech fusion to improve speech
recognition systems.
9.520, spring 2002 Kumar, Poggio, 2001
Learning from Examples: engineering
applications
CBCL MIT

INPUT OUTPUT

Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Image synthesis, e.g. Graphics
Text Classification
…..
9.520, spring 2003
Image Synthesis

Metaphor for UNCONVENTIONAL GRAPHICS

Θ = 0° view ⇒

Θ = 45° view ⇒

9.520, spring 2006


Reconstructed 3D Face Models from 1 image

Blanz and Vetter, MPI, SigGraph '99

9.520, spring 2006


Reconstructed 3D Face Models from 1 image

Blanz and Vetter, MPI, SigGraph '99
V. Blanz, C. Basso, T. Poggio and T. Vetter, 2003

9.520, spring 2006
Extending the same basic learning techniques (in 2D):
Trainable Videorealistic Face Animation
(voice is real, video is synthetic)

Ezzat, Geiger, Poggio, SigGraph 2002


Trainable Videorealistic Face Animation

1. Learning: the system learns from 4 mins of video the face appearance (Morphable Model) and the speech dynamics of the person.

2. Run Time: for any speech input the system provides as output a synthetic video stream.

[Pipeline: phone stream (e.g. /SIL/ /B/ /AE/ /JH/ /SIL/) → phonetic models → trajectory synthesis → MMM + image prototypes]

Tony Ezzat, Geiger, Poggio, SigGraph 2002


A Turing test: what is real and what is synthetic?

We assessed the realism of the talking face with psychophysical experiments. The data suggest that the system passes a visual version of the Turing test.
Overview of overview

o Supervised learning: the problem and how to frame it within classical math

o Examples of in-house applications

o Learning and the brain

9.520, spring 2006


Learning to recognize objects and the ventral stream in visual cortex

Some numbers:

Human brain
o 10^11 - 10^12 neurons
o 10^14+ synapses

Neuron
o Fine dendrites: 0.1 µm diameter
o Lipid bilayer membrane: 5 nm thick
o Specific proteins: pumps, channels, receptors, enzymes
o A synaptic packet of transmitter opens 2 x 10^3 channels (with 10^4 ACh molecules)
o Each channel: conductance g = 10^-11 mho
o Fundamental time scale: 1 msec
A theory of the ventral stream of visual cortex

Thomas Serre, Minjoon Kouh, Charles Cadieu, Ulf Knoblich and Tomaso Poggio

The McGovern Institute for Brain Research,
Department of Brain and Cognitive Sciences,
Massachusetts Institute of Technology
The Ventral Visual Stream: From V1 to IT

[Figure modified from Ungerleider and Haxby, 1994; Hubel & Wiesel, 1959; Desimone, 1991]
Summary of "basic facts"

Accumulated evidence points to three (mostly accepted) properties of the ventral visual stream architecture:

• Hierarchical build-up of invariances (first to translation and scale, then to viewpoint etc.), of the size of the receptive fields, and of the complexity of the preferred stimuli

• Basic feed-forward processing of information (for "immediate" recognition tasks)

• Learning of an individual object generalizes to scale and position
Mapping the ventral stream into a model

Serre, Kouh, Cadieu, Knoblich, Poggio, 2005;
Riesenhuber et al., Nat. Neurosci., 1999, 2000 …
The model

It claims to interpret or predict several existing data in microcircuits and system physiology, and also in cognitive science:

• What some complex cells in V1 and V4 do, and why: MAX…
• View-tuning of IT cells (Logothetis)
• Response to pseudo-mirror views
• Effect of scrambling
• Multiple objects
• Robustness/sensitivity to clutter
• K. Tanaka's simplification procedure
• Categorization tasks (cats vs dogs)
• Invariance to translation, scale etc.
• Read-out data…
• Gender classification
• Face inversion effect: experience, viewpoint, other-race, configural vs. featural representation
• Binding problem, no need for oscillations…
Neural Correlate of Categorization (NCC)

Define categories in morph space:

[Morph-space diagram: 100% Cat prototypes and 100% Dog prototypes at the extremes; 80% and 60% Cat morphs and 60% and 80% Dog morphs in between, separated by the category boundary]

9.520, spring 2006
Categorization task

Train the monkey on a categorization task:

[Trial sequence: Fixation (500 ms) → Sample (600 ms) → Delay (1000 ms) → Test (Match), or Test (Nonmatch) followed by another Delay and a Test (Match)]

After training, record from neurons in IT & PFC.


9.520, spring 2006
Single cell example: a "categorical" PFC neuron that responds more strongly to DOGS than to CATS

[Plot: firing rate (Hz) vs. time from sample stimulus onset (-500 to 2000 ms), spanning the Fixation, Sample, Delay and Choice epochs, for Dog 100%/80%/60% and Cat 100%/80%/60% morphs]

D. Freedman, E. Miller, M. Riesenhuber and T. Poggio (Science, 2001)

9.520, spring 2006
The model fits many physiological data and predicts several new ones…

…recently it provided a surprise (for us) when we compared its performance with machine vision…

Sample results on the Caltech 101-object dataset: the model performs at the level of the best computer vision systems.
…and another surprise was the comparison with human performance (Thomas Serre with Aude Oliva) on rapid categorization of complex natural images.

Experiment: rapid (to avoid backprojections) animal detection in natural images

[Paradigm: image (20 msec) → interstimulus interval (30 msec) → 1/f-noise mask (80 msec) → report: animal present or not?]

[Thorpe et al., 1996; Van Rullen & Koch, 2003; Oliva & Torralba, in press]
Targets and distractors

[Serre, Oliva & Poggio, in prep]


Humans achieve model-level performance

Model results obtained without tuning a single parameter!

Human: 80% correct vs. Model: 82% correct

[Serre, Oliva & Poggio, in prep]


Theory supported by data in V1, V4 and IT; works as well as the best computer vision; mimics human performance.

Freedman et al., Science, 2002
Logothetis et al., Curr. Biol., 1995
Gawne et al., J. Neurophysiol., 2002
Lampl et al., J. Neurophysiol., 2004
A challenge for learning theory:

an unusual, hierarchical architecture with unsupervised and supervised learning, and learning of invariances…

We will see later why this is unusual and interesting for learning theory!
