Artificial Intelligence: Learning from Examples
AIMA Chapter 19
Slides by Michael Hahsler
Based on slides by Dan Klein, Pieter Abbeel, Sergey Levine and A. Farhadi (http://ai.berkeley.edu)
with figures from the AIMA textbook.
Outline:
• ML & Agents
• Types of Data
• Supervised Learning
• Supervised ML Models
• Training & Testing
• Use in AI
Learning from Examples: Machine Learning
Up until now in this course:
• Hand-craft algorithms to make rational/optimal or at least good decisions.
Examples: Search strategies, heuristics.
Issues
• Designer cannot anticipate all possible future situations.
• Designer may have examples but does not know how to program a solution.
Machine Learning
• Learning = Improve performance after making observations about the world.
That is, learn what works and what doesn’t.
• We learn a model that decides on the actions to take. This is called the
“performance element.”
• The goal is to get closer to optimal decisions. I.e., it is an optimization problem.
From Chapter 2: Agents that Learn
The learning element modifies the performance element to improve its
performance.
• Supervised Learning: Uses a data set with correct answers. Learn a function (model) to map an input (e.g., state) to an output (e.g., action or utility). We focus here on supervised learning.
  Examples:
  • Use a naïve Bayesian classifier to distinguish between spam/no spam.
  • Learn a playout policy to simulate games (current board -> good move).
• Reinforcement Learning: Learn from rewards/punishment (e.g., winning a game) obtained via interaction with the environment over time.
Supervised Learning
• Examples
  • We assume there exists a target function y = f(x) that produces iid (independent and identically distributed) examples, possibly with noise and errors.
  • Examples are observed input-output pairs E = (x_1, y_1), ..., (x_i, y_i), ..., (x_N, y_N), where x is a vector called the feature vector.
• Learning problem
  • Given a hypothesis space H of representable models (a subset of the set of all functions).
  • Find a hypothesis h ∈ H such that ŷ_i = h(x_i) ≈ y_i for all i.
  • That is, we want to approximate f by h using E.
• Supervised learning includes
  • Classification (outputs = class labels). E.g., x is an email and f(x) is spam/ham.
  • Regression (outputs = real numbers). E.g., x is a house and f(x) is its selling price.
Consistency vs. Simplicity
Example: Univariate curve fitting (regression, function approximation)
[Figure: examples (points) and learned models h(x) approximating the target function f(x). A straight line is very simple, but not very consistent with the data!]
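The consistency/simplicity trade-off can be illustrated with a small curve-fitting sketch (illustrative only; the data, noise level, and polynomial degrees below are assumptions, not from the slides). A degree-1 polynomial is simple but not very consistent with the examples; a degree-9 polynomial fits them almost perfectly but chases the noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed target function f(x) observed with noise
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# Two hypotheses of different complexity (polynomial degree)
h_simple = np.polynomial.Polynomial.fit(x, y, deg=1)   # simple: a straight line
h_complex = np.polynomial.Polynomial.fit(x, y, deg=9)  # complex: interpolates the noise

# Training error: the complex model is far more "consistent" with the examples,
# but it will generalize worse to new samples from f
print("line     MSE:", np.mean((h_simple(x) - y) ** 2))
print("degree-9 MSE:", np.mean((h_complex(x) - y) ** 2))
```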
Bayes Classifier

h*(x) = argmax_y P(Y = y | X = x) = argmax_y P(x | y) P(y) / P(x) = argmax_y P(x | y) P(y)

Optimality: The Bayes classifier is optimal for 0/1 loss. It is the most consistent classifier possible, with the lowest possible error, called the Bayes error rate. No better classifier is possible!

Issue: The classifier requires learning P(x | y) P(y) = P(x, y) from the examples.
• It needs the complete joint probability, which in the general case requires a probability table with one entry for each possible value of the feature vector x.
• This is impractical (unless a simple Bayes network exists), and most classifiers try to approximate the Bayes classifier using a simpler model with fewer parameters.
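As a minimal sketch of what the Bayes classifier computes, consider a made-up joint probability table over one binary feature and two classes (all numbers are assumptions for illustration; with many features such a table becomes impractically large):

```python
# Made-up joint probability table P(x, y) for one binary feature and two classes
P_joint = {
    ("contains 'free'", "spam"): 0.20,
    ("contains 'free'", "ham"):  0.05,
    ("no 'free'",       "spam"): 0.10,
    ("no 'free'",       "ham"):  0.65,
}

def bayes_classify(x, classes=("spam", "ham")):
    # argmax_y P(Y=y | X=x) equals argmax_y P(x, y) because P(x) does not depend on y
    return max(classes, key=lambda y: P_joint[(x, y)])

print(bayes_classify("contains 'free'"))  # -> spam
print(bayes_classify("no 'free'"))        # -> ham
```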
Simplicity
• Ease of use: Simpler hypotheses have fewer model parameters to estimate and store.
[Figure: points show two samples drawn from the same function f to illustrate variance.]
Examples (also called instances or observations):
Find a hypothesis (called “model”) to predict the class given the features.
Feature Engineering
• Add information sources as new variables to the model.
• Add derived features that help the classifier (e.g., x1*x2, x1^2); see the sketch below.
• Embedding: E.g., convert words to vectors such that similarity between vectors reflects semantic similarity.
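A sketch of adding derived features with scikit-learn (the use of PolynomialFeatures and the toy values are assumptions; any way of computing x1*x2 and x1^2 works):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy data: two original features x1, x2 for three examples
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Add all degree-2 derived features (x1^2, x1*x2, x2^2) to help a linear classifier
poly = PolynomialFeatures(degree=2, include_bias=False)
X_derived = poly.fit_transform(X)

print(poly.get_feature_names_out(["x1", "x2"]))  # ['x1' 'x2' 'x1^2' 'x1 x2' 'x2^2']
print(X_derived)
```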
• Testing loss: Calculate the empirical loss for predictions on a testing data set T that is different from the data used for training.

  EmpLoss_{L,T}(h) = 1/|T| * Σ_{(x,y) ∈ T} L(y, h(x))

• For classification we often use the accuracy measure, the proportion of correctly classified test examples.

  accuracy(h, T) = 1/|T| * Σ_{(x,y) ∈ T} [h(x) = y] = 1 − EmpLoss_{L0/1,T}(h)
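A minimal sketch of these two quantities in code (the toy model h and the test set T below are made up for illustration):

```python
# Empirical loss of model h on a test set T of (x, y) pairs
def empirical_loss(h, T, loss):
    return sum(loss(y, h(x)) for x, y in T) / len(T)

def zero_one_loss(y_true, y_pred):
    return 0 if y_pred == y_true else 1

def accuracy(h, T):
    return sum(h(x) == y for x, y in T) / len(T)

# Toy classifier: predict spam if the message contains the word "free"
h = lambda x: "spam" if "free" in x else "ham"
T = [("free money now", "spam"), ("meeting at noon", "ham"), ("free lunch", "ham")]

print(accuracy(h, T))                           # 2/3
print(1 - empirical_loss(h, T, zero_one_loss))  # same value: accuracy = 1 - 0/1 loss
```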
• Notes:
  • The validation set was not used for training, so we get generalization accuracy for the different hyperparameter settings.
  • If no model selection is necessary, then no validation set is used.
Testing a Model
• After the model is selected, the final model is evaluated against the test set to estimate the final model accuracy.
• Very important: never "peek" at the test set during training!
How to Split the Dataset
• Random splits: Split the data randomly into, e.g., 60% training, 20% validation, and 20% testing.
• Stratified splits: Like random splits, but balance classes and other properties of the examples.
• k-fold cross validation: Uses the training & validation data better (see the sketch below).
  • Split the training & validation data randomly into k folds.
  • For k rounds, hold one fold back for testing and use the remaining k − 1 folds for training.
  • Use the average error/accuracy as a better estimate.
  • Some algorithms/tools do this internally.
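A sketch of a stratified split plus k-fold cross-validation using scikit-learn (the library, dataset, and hyperparameter values are assumptions made for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out a stratified 20% test set; never use it for training or model selection
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Model selection: 5-fold cross-validation on the training & validation data
for k in (1, 5, 15):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X_trainval, y_trainval, cv=5)
    print(f"k={k}: mean CV accuracy = {scores.mean():.3f}")

# Only at the very end: evaluate the selected model once on the test set
best = KNeighborsClassifier(n_neighbors=5).fit(X_trainval, y_trainval)
print("test accuracy:", best.score(X_test, y_test))
```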
Linear Regression

Analytical solution: w* = (X^T X)^{-1} X^T y, where (X^T X)^{-1} X^T is the pseudoinverse of the data matrix X.
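A minimal NumPy sketch of this analytical solution (the data is made up; in practice np.linalg.lstsq or np.linalg.pinv is preferred over forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=50)   # assumed true weights (3, 2) plus noise

X = np.column_stack([np.ones_like(x), x])            # design matrix with a bias column

w_star = np.linalg.inv(X.T @ X) @ X.T @ y            # w* = (X^T X)^{-1} X^T y
w_pinv = np.linalg.pinv(X) @ y                       # same result via the pseudoinverse

print(w_star)   # approximately [3., 2.]
print(w_pinv)
```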
Naïve Bayes Classifier
• Approximates a Bayes classifier with the naïve independence assumption that all n features are conditionally independent given the class:

  h(x) = argmax_y P(y) ∏_{i=1}^{n} P(x_i | y)

• For continuous features, the parameters of the normal distribution N(μ_y, σ_y) are estimated from the data.
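A sketch using scikit-learn's Gaussian naïve Bayes (the choice of library and dataset are assumptions; it estimates a per-class normal distribution for each feature):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)   # estimates mu_y, sigma_y per class and feature
print("per-class feature means:", nb.theta_.shape)   # (n_classes, n_features)
print("test accuracy:", nb.score(X_test, y_test))
```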
Decision Trees

k-Nearest Neighbors Classifier
• The class is predicted by looking at the majority in the set of the k nearest neighbors. k is a hyperparameter; larger k smooths the decision boundary.
• Neighbors are found using a distance measure (e.g., Euclidean distance between points).
• Approximates a Bayes classifier by h(x) = argmax_y P(Y = y | neighborhood(x))
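A sketch of a kNN classifier with scikit-learn (library, dataset, and k = 5 are assumptions for illustration):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)        # "training" essentially just stores the examples
print("test accuracy:", knn.score(X_test, y_test))
```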
Support Vector Machine (SVM)
[Figure: maximum-margin decision boundary; the margin and the support vectors are marked.]
• Linear classifier that finds the maximum margin separator using only the points
that are “support vectors” and quadratic optimization.
• The kernel trick can be used to learn non-linear decision boundaries.
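A sketch of an SVM with scikit-learn (library, dataset, and the RBF kernel are assumptions; the kernel gives a non-linear decision boundary):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print("number of support vectors:", svm.n_support_.sum())
print("test accuracy:", svm.score(X_test, y_test))
```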
Artificial Neural Networks / Deep Learning

[Figure: computational graph with an input layer, a hidden layer, and an output layer. Each unit (perceptron) computes a weighted sum with a bias term followed by a non-linear activation function; for classification, the output layer typically uses a softmax activation function returning P(y|x).]

• Represent ŷ = h(x) as a network of weighted sums with non-linear activation functions g (e.g., logistic, ReLU).
• Learn the weights w from examples using backpropagation of the prediction errors L(ŷ, y) (gradient descent).
• ANNs are universal approximators: large networks can approximate any function (no bias). Regularization is typically used to avoid overfitting.
• Deep learning adds more hidden layers and layer types (e.g., convolution layers) for better learning.
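A minimal NumPy sketch of what the forward pass of such a network computes (sizes and random weights are made up; learning the weights by backpropagation is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    z = z - z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Assumed sizes: 4 input features, 8 hidden units, 3 classes
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

def h(x):
    hidden = relu(W1 @ x + b1)           # weighted sums + non-linear activation
    return softmax(W2 @ hidden + b2)     # output read as P(y | x) over 3 classes

x = rng.normal(size=4)                   # one feature vector
print(h(x), h(x).sum())                  # class probabilities summing to 1
```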
Many other models exist

Use in AI
• Learn a policy: Directly learn the best action from examples, action = h(state). This model can also be used as a playout policy for Monte Carlo tree search with data from self-play.
• Learn an evaluation function for states, eval = h(state). Can learn a heuristic for minimax search from examples.
• Natural language processing: Use deep learning / word embeddings / language models to understand concepts, translate between languages, or generate text. Speech recognition: identify the most likely sequence of words. Vision: object recognition in images/videos; generate images/videos.
• Neural networks can be used as a compact representation of tables that do not fit in memory, e.g., a joint probability table or a state utility table. The tables can be learned from data.