Lecture Slide 01_ML
Pattern Recognition
◼ Concentrates more on “tools” rather than theory
Data Mining
◼ More specific about discovery
The following fields provide useful techniques for machine learning, or may give insight into it:
Probability and Statistics
Information theory
◼ 1960s:
Neural networks: Perceptron
Minsky and Papert prove limitations of
Perceptron
◼ 1970s:
Expert systems and the knowledge acquisition
bottleneck
Mathematical discovery with AM
Symbolic concept induction
History of Machine Learning (cont.)
◼ 1980s:
Resurgence of neural networks (connectionism,
backpropagation)
Advanced decision tree and rule learning
Learning, planning and problem solving
Utility theory
Analogy
◼ 1990s
Data mining
Reinforcement learning (RL)
Inductive Logic Programming (ILP)
Ensembles: Bagging, Boosting, and Stacking
History of Machine Learning (cont.)
◼ 2000s
Kernel methods
◼ Support vector machines
Graphical models
Statistical relational learning
Transfer learning
◼ Applications
Adaptive software agents and web applications
Learning in robotics and vision
E-mail management (spam detection)
…
What is Machine Learning ?
◼ Example:
T: Cancer diagnosis
E: A set of diagnosed cases
P: Accuracy of diagnosis on new cases
Z: Noisy measurements, occasionally misdiagnosed training cases
M: A program that runs on a general purpose computer; the
learner
What is Machine Learning ?
◼ Regression
◼ Reinforcement Learning
◼ Unsupervised Learning
◼ Semi-supervised learning
◼ …
Rote Learning is Limited
◼ Basket analysis:
P (Y | X ): the probability that somebody who buys X also buys Y, where X and Y
are products/services.
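As a sketch, P(Y | X) can be estimated directly from transaction records as the fraction of baskets containing X that also contain Y; the basket contents below are made up for illustration:

```python
# Hypothetical transaction data: each basket is a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "chips"},
    {"milk"},
    {"bread", "chips"},
    {"bread", "milk", "beer"},
]

def conditional_prob(transactions, x, y):
    """Estimate P(Y | X): fraction of baskets containing x that also contain y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

# 3 of the 4 baskets containing bread also contain milk.
print(conditional_prob(transactions, "bread", "milk"))  # 0.75
```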
[Figure: example images of tangerines and oranges]
a) Classification:
• We are given the label of each training object: {(x1, x2, y=T/O)},
with features such as x1 = size
◼ Regression
The target function is continuous rather than a class membership.
For example, suppose you have the selling prices of houses in a particular
location as their size (sq-mt) changes. You may hypothesize that the prices
are governed by a particular function f(x). Once you have this function that
“explains” the relationship, you can guess a given house’s value from its
sq-mt. The learning here is the selection of this function f(). Note that
the problem is more meaningful and challenging if you imagine several input
parameters, resulting in a multi-dimensional input space.
[Figure: scatter plot of y = price against x = size (60, 70, 90, 120, 150 sq-mt) with a fitted curve f(x)]
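A minimal sketch of this idea, assuming a linear hypothesis f(x) = w1·x + w0 and made-up size/price data:

```python
import numpy as np

# Hypothetical data: house sizes (sq-mt) and their selling prices.
sizes = np.array([60, 70, 90, 120, 150], dtype=float)
prices = np.array([120, 140, 175, 230, 290], dtype=float)  # made-up values

# Hypothesize f(x) = w1*x + w0 and select it by least-squares fitting.
w1, w0 = np.polyfit(sizes, prices, deg=1)

def f(x):
    """The learned function that 'explains' the price-size relationship."""
    return w1 * x + w0

# Guess the value of a 100 sq-mt house from the learned function.
print(round(f(100.0), 1))
```

With several input features (location, age, number of rooms, …) the same least-squares idea applies, but f becomes a function over a multi-dimensional input space.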
Classification
◼ Example: Credit scoring
◼ Differentiating between
low-risk and high-risk
customers from their
income and savings
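A classifier of this kind might learn a simple rule (a discriminant) over the two inputs; the thresholds below are hypothetical, standing in for values a learner would estimate from past customer data:

```python
# Hypothetical learned thresholds on income and savings.
THETA1, THETA2 = 30_000, 10_000

def credit_risk(income, savings):
    """IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

print(credit_risk(50_000, 20_000))  # low-risk
print(credit_risk(20_000, 5_000))   # high-risk
```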
◼ Pattern Recognition
[Figure: test face images from the ORL dataset, AT&T Laboratories, Cambridge UK]
Unsupervised Learning: Uses
◼ Example applications
Customer segmentation in CRM
Image compression: Color quantization
Bioinformatics: Learning motifs
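Color quantization is commonly done with a clustering method such as k-means: pixels are grouped into k clusters and each pixel is replaced by its cluster's mean color. A minimal sketch on a handful of made-up RGB pixels:

```python
import numpy as np

def kmeans_quantize(pixels, k=4, iters=20, seed=0):
    """Quantize RGB pixels to k colors with a basic k-means loop."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each pixel to its nearest center (squared Euclidean distance).
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned pixels.
        for j in range(k):
            if (labels == j).any():
                centers[j] = pixels[labels == j].mean(axis=0)
    return centers[labels]  # each pixel replaced by its cluster's mean color

# Two reddish and two bluish pixels, quantized down to 2 colors.
pixels = np.array([[255, 0, 0], [250, 5, 5], [0, 0, 255], [5, 5, 250]], float)
quantized = kmeans_quantize(pixels, k=2)
print(len(np.unique(quantized, axis=0)))  # at most 2 distinct colors remain
```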
Reinforcement Learning
◼ What to learn: a way of behaving that is highly rewarding in the long run;
learning a policy: a sequence of outputs (actions)
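A minimal sketch of policy learning, using tabular Q-learning (one standard RL algorithm, chosen here for illustration) on a toy chain of states where reward arrives only at the goal:

```python
import random

# A tiny deterministic chain: states 0..3, actions 0 = left, 1 = right.
# Reward 1 only on reaching state 3, so the long-run best policy is "always right".
N_STATES, GOAL = 4, 3

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0)

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]      # Q[state][action]
alpha, gamma, epsilon = 0.5, 0.9, 0.2          # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy action selection, then a one-step Q-learning update.
        a = random.randrange(2) if random.random() < epsilon else (0 if Q[s][0] > Q[s][1] else 1)
        s2, r = step(s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned policy: the greedy action in each non-goal state.
policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(N_STATES)]
print(policy[:GOAL])  # [1, 1, 1] -- move right everywhere
```

Note the delayed reward: early states are never rewarded directly, yet the learner still discovers that moving right is what pays off in the long run.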
◼ Experimental
Conduct controlled cross-validation experiments to compare
various methods on a variety of benchmark datasets.
Gather data on their performance, e.g. test accuracy,
training-time, testing-time…
Analyze differences for statistical significance.
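A sketch of such an experiment, comparing a majority-class baseline against a simple nearest-centroid classifier with k-fold cross-validation on synthetic data (significance testing omitted; both methods and the data are illustrative choices):

```python
import random
import statistics

random.seed(1)
# Synthetic benchmark: 1-D feature, two well-separated classes.
data = [(random.gauss(0, 1), 0) for _ in range(50)] + \
       [(random.gauss(4, 1), 1) for _ in range(50)]
random.shuffle(data)

def majority_baseline(train, test):
    """Always predict the most frequent training label."""
    labels = [y for _, y in train]
    guess = max(set(labels), key=labels.count)
    return sum(1 for _, y in test if y == guess) / len(test)

def nearest_centroid(train, test):
    """Predict the class whose training mean is closest to x."""
    mean = lambda c: statistics.fmean(x for x, y in train if y == c)
    m0, m1 = mean(0), mean(1)
    pred = lambda x: 0 if abs(x - m0) < abs(x - m1) else 1
    return sum(1 for x, y in test if pred(x) == y) / len(test)

def cross_validate(method, data, k=5):
    """Mean test accuracy over k folds."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [d for j, f in enumerate(folds) if j != i for d in f]
        scores.append(method(train, test))
    return statistics.fmean(scores)

print(cross_validate(majority_baseline, data))  # around chance (~0.5)
print(cross_validate(nearest_centroid, data))   # near 1.0 on separable data
```

A real comparison would repeat this over multiple datasets, also record training and testing time, and test whether the accuracy differences are statistically significant.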
◼ Theoretical
Analyze algorithms mathematically and prove theorems about
their:
◼ Computational complexity
◼ Loss functions