3. Introduction to Machine Learning

The document provides an introduction to machine learning, covering its definition, key concepts, and various algorithms including supervised and unsupervised learning. It discusses data representation, feature extraction, and the importance of training and testing data in building machine learning models. Additionally, it highlights applications of machine learning in various fields such as recommendation systems, virtual assistants, and autonomous vehicles.

Uploaded by

mathurarushi4

Introduction to Machine Learning
What to expect from the course?
• What is Machine Learning?
• Data Visualization/Analysis, Pandas, NumPy, …
• Dimensionality Reduction
• Different Machine Learning Algorithms (Supervised, Unsupervised, metrics)
• Deep Learning (Neural Networks, backpropagation, loss functions)
• CNN, RNN, LSTM
• … and more
Hands-on sessions will be conducted in parallel.


Introduction
[Figure: nested diagram with Deep Learning inside Machine Learning inside Artificial Intelligence]
• Artificial Intelligence: any technique which enables computers to mimic human behaviour
• Learning is much deeper than memorization and information recall
• Learning is “a process that leads to change, which occurs as a result of experience and increases the potential for improved performance and future learning” (Ambrose et al., 2010, p. 3)
Machine Learning
• Machine learning is a “field of study that gives computers the ability to learn without being explicitly programmed.” (Arthur Samuel)
• The function of a machine learning system can be:
  • descriptive, meaning that the system uses the data to explain what happened
  • predictive, meaning the system uses the data to predict what will happen
  • prescriptive, meaning the system will use the data to make suggestions about what action to take
[Figure: Input, Data → Intelligent System → Decisions, Output, Actions]
Data Driven Problem Solving
Case 1:
Area (sq.ft)   Price
250            250000
120            120000
310            310000
290            290000
Simple, well-known solution (Price = Area * 1000). The above relation can be obtained in a trivial way, with one example.

Case 2:
Area (sq.ft)   Price
250            145500
120            212800
310            194390
290            ?
Not a trivial solution. There should be more parameters (e.g., Age, Location). A lot more data is needed to solve the above.
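
The contrast between the two price tables can be checked numerically. The sketch below is only an illustration: it assumes a one-parameter model, price = w * area, fitted by least squares. The helper names `fit_slope` and `max_abs_error` are made up for this example and are not part of the course material.

```python
def fit_slope(areas, prices):
    """Least-squares slope w for the assumed model price = w * area."""
    return sum(a * p for a, p in zip(areas, prices)) / sum(a * a for a in areas)


def max_abs_error(w, areas, prices):
    """Largest gap between the model's prediction and the listed price."""
    return max(abs(w * a - p) for a, p in zip(areas, prices))


areas = [250, 120, 310]
easy_prices = [250000, 120000, 310000]   # first table
hard_prices = [145500, 212800, 194390]   # second table

w_easy = fit_slope(areas, easy_prices)
w_hard = fit_slope(areas, hard_prices)

# First table: one parameter explains every row, so 290 sq.ft is easy to price.
print("slope:", w_easy, "predicted price for 290 sq.ft:", w_easy * 290)
print("worst error, first table:", max_abs_error(w_easy, areas, easy_prices))

# Second table: the same kind of model leaves very large errors, which
# suggests that area alone does not determine the price.
print("worst error, second table:", max_abs_error(w_hard, areas, hard_prices))
```

A large remaining error on the second table is what motivates adding more features (such as Age or Location) and collecting more data.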
Remarks
General Strategy: Given many examples of (X, Y), learn an automated solution to predict Y. Given a new X, Y = F(X).
• Main Challenge: The data is becoming complex
• What if X is not a simple number?
  • An N-dim vector? e.g. (3.1, -2.6, 0.41, 1.89, 15.2, …, 9.23)
  • Entities other than numbers? e.g. 3.9 m, ₹ 8.2 L, Blue, Sedan
  • A picture?
  • A sound bite?
How do we get the machine to do this?
General Strategy: Given many examples of (X, Y), learn an automated solution to predict Y. Given a new X, Y = F(X).
• There is too much information in raw data
• The relevant information is probably hidden
• This leads to Feature Extraction: extracting useful information (X) from raw data
[Figure: raw attributes (3.9 m, ₹ 8.2 L, Blue, Sedan) shown next to a numeric feature vector (3.1, -2.6, 0.41, 1.89, 15.2, 9.23)]
Representation: From Raw data to Features
Area  Bedrooms  Bathrooms  Age  Parking  Basement  Price
240   3         2          10   No       Yes       250000

• Convert all data into a vector of real numbers: Raw Data → X_i
  • Points in a feature space
• Convert all predictions into an integer/real number: Y_i
• How do we deal with categorical data?

Categorical Data
• Ordinal Data – The categories have a meaningful order or ranking, but the intervals between the categories are not necessarily equal. e.g. Satisfaction Rating: Poor, Fair, Good, Excellent.
• Nominal Data – The categories are names or labels with no inherent order or ranking. e.g. Colors: Red, Green, Blue; Types of Pets: Dog, Cat, Bird, Fish.
• Use Integer Encoding for ordinal data, where the order of categories is meaningful.
  • Categories like "low," "medium," and "high" can be represented as 1, 2, and 3, respectively. The numerical values reflect the order or ranking among the categories.
• Use One-Hot Encoding for nominal data, where there is no natural order, to prevent algorithms from mistakenly inferring an ordinal relationship between categories.
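
Integer encoding of ordinal data can be sketched in a few lines of Python. The function name `integer_encode` and the sample values are hypothetical and used only for illustration. The one real requirement is that the caller supplies the categories in their meaningful order.

```python
def integer_encode(values, ordered_categories):
    """Map each value to its 1-based rank in the given category order."""
    rank = {category: i for i, category in enumerate(ordered_categories, start=1)}
    return [rank[value] for value in values]


levels = ["low", "medium", "high"]          # order is meaningful
sample = ["medium", "low", "high", "low"]   # illustrative values only

encoded = integer_encode(sample, levels)
print(encoded)  # [2, 1, 3, 1]
```

The same helper works for the satisfaction ratings above if the order Poor, Fair, Good, Excellent is passed in.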
One-Hot Encoding
• Most widespread approach used for categorical data, unless your categorical variable takes on a large number of values.

Pets   →   Cat  Dog  Fish
Cat        1    0    0
Cat        1    0    0
Dog        0    1    0
Fish       0    0    1
Dog        0    1    0
Cat        1    0    0
Fish       0    0    1

• Can lead to a significant increase in the number of features, especially if the categorical feature has many unique values.
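
The Pets table can be reproduced with a small pure-Python sketch. The function `one_hot_encode` is a name chosen for this example. Sorting the categories alphabetically to fix the column order is an assumption made here, and it happens to match the Cat, Dog, Fish order of the table.

```python
def one_hot_encode(values):
    """Return the category columns and one 0/1 row per input value."""
    categories = sorted(set(values))  # assumed column order: alphabetical
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


pets = ["Cat", "Cat", "Dog", "Fish", "Dog", "Cat", "Fish"]
categories, rows = one_hot_encode(pets)

print(categories)
for pet, row in zip(pets, rows):
    print(pet, row)
```

Each row contains exactly one 1, and the number of columns equals the number of distinct categories, which is why a feature with many unique values inflates the feature count.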
Representation: From Raw data to Features
Area  Bedrooms  Bathrooms  Age  Parking  Basement  Price
240   3         2          10   No       Yes       250000

• We are given a set of n examples: {(X_i, Y_i)}, i = 1, …, n
• Our goal is to learn a model, Y = F(X), that captures the pattern of the training samples
  • We can assume a model and learn its parameters
• Once we learn the model, we can predict the output corresponding to any new input X’: Y’ = F(X’)
Usual Programming vs Machine Learning
Programming:
• Data + Program → Computer → Output
Machine Learning:
• Training phase: Data: X + Output: Y → Computer → Program: F(X, Y)
• Testing phase: New Data: X’ + Program: F(X, Y) → Computer → Output: Y’
ML Based on Training-Testing Data
Labelled Data [ samples] is split into:
• Training Data [ samples]
• Test Data, for the remaining samples
• Take care not to leak information from Test Data into the Model
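
A minimal sketch of a leak-free split follows. The names `train_test_split`, `fit_min_max` and `apply_min_max` are hypothetical helpers written for this example, the 80/20 ratio is an assumption, and `data` is placeholder data rather than anything from the slides. The point it illustrates is that any statistic used for preprocessing (here, the minimum and maximum of each column) is computed from the training data only and then reused unchanged on the test data.

```python
import random


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle indices reproducibly and split the samples into two lists."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test


def fit_min_max(rows):
    """Per-column minimum and maximum, computed on the rows given."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, lows, highs):
    """Rescale rows with previously fitted minimum and maximum values."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


# Placeholder data: ten two-column samples, for illustration only.
data = [(float(i), float(2 * i + 1)) for i in range(10)]

train, test = train_test_split(data, train_fraction=0.8, seed=0)
lows, highs = fit_min_max(train)                  # training data only
train_scaled = apply_min_max(train, lows, highs)
test_scaled = apply_min_max(test, lows, highs)    # reuses training statistics
```

Because the scaling parameters never see the test rows, a scaled test value may fall outside the range 0 to 1. That is expected and is preferable to fitting the scaler on all of the data.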
[Figure: training and testing pipeline]
• Training Data → Feature extraction, with the representation of feature space → Building a model (Goal: to predict f(); learn about f() from training data) → Model Design and Validation → Trained Model
• Test Data → Feature extraction, with the representation of feature space → Trained Model → Compute Prediction for the test data → Model Evaluation and Deployment
Data Representation
Area  Age  Property
230   15   A
120   6    B
202   2    B
398   11   A
274   8    ?
[Figure: scatter plot of the samples, with Area and Age as the axes]
Feature Space Representation
[Figure: the training and testing pipeline, annotated for the property example]
• Training Data (Property Type) → Feature extraction, with the representation of feature space (Area, Age as points in 2D space) → Building a model (Finding the best fit line; Goal: to predict f(); learn about f() from training data) → Model Design and Validation → Trained Model (Equation of the line)
• Test Data (Unknown Property Type) → Feature extraction, with the representation of feature space (Area, Age as points in 2D space) → Trained Model → Compute Prediction for the test data (Point vs. Line) → Property Type – A/B → Model Evaluation and Deployment
Learning is concerned with accurate prediction of future data, not accurate prediction of training or available data.
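
The "point vs. line" idea can be sketched with the four labelled properties from the Data Representation table. The slides do not say which algorithm finds the line, so the perceptron used here is an assumption, as are the min-max scaling step and all function names. The scaling parameters are taken from the training points only.

```python
def fit_min_max(points):
    """Per-feature minimum and maximum of the training points."""
    columns = list(zip(*points))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(point, lows, highs):
    """Rescale one (area, age) point using the training minimum and maximum."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(point, lows, highs)]


def train_perceptron(points, signs, max_epochs=1000):
    """Learn w, b so that the sign of w.x + b separates the two classes."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, signs):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # wrong side of the line (or on it)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:       # every training point is on its own side
            break
    return w, b


def predict(point, w, b):
    """Point vs. line: positive side is type A, the other side is type B."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return "A" if score > 0 else "B"


train_points = [(230, 15), (120, 6), (202, 2), (398, 11)]  # (Area, Age)
train_types = ["A", "B", "B", "A"]

lows, highs = fit_min_max(train_points)
scaled = [scale(p, lows, highs) for p in train_points]
signs = [1 if t == "A" else -1 for t in train_types]
w, b = train_perceptron(scaled, signs)

query = scale((274, 8), lows, highs)   # the row with unknown property type
print("line: w =", w, "b =", b)
print("predicted type for (274, 8):", predict(query, w, b))
```

With only four training points, many different lines separate the classes, so the prediction for the unknown row depends on the algorithm chosen. That is a small-scale reminder that fitting the training data is not the same as predicting future data well.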
Summary – Machine Learning Framework
y = f(x), where y is the output, f is the prediction function, and x is the feature or representation
• Note: The training set and the testing set come from the same distribution
• Training: given a training set of labeled examples {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, estimate the prediction function f by minimizing the prediction error
• Testing: apply f to the test example x’ and output the predicted value y = f(x’)
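
One common way to write "minimizing the prediction error" is as an average loss over the training set. The slides do not specify the loss or the family of candidate functions, so the loss $L$, the function class $\mathcal{F}$, and the squared-error example below are assumptions made for illustration.

```latex
\hat{f} \;=\; \arg\min_{f \in \mathcal{F}} \; \frac{1}{N} \sum_{i=1}^{N} L\bigl(f(x_i),\, y_i\bigr),
\qquad \text{for example } L(\hat{y}, y) = (\hat{y} - y)^2 .
```

At test time, the learned function is applied to a new example: $y = \hat{f}(x')$.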
Summary – Machine Learning Framework
y = f(x), where y is the output, f is the prediction function, and x is the feature or representation
• The input is converted to a vector x
• The output is a value indicated by y
• Depending on the nature of x and y, we define
  1) Regression
  2) Classification
  3) …
Representations
• Representations in machine learning refer to the way data is transformed or encoded into a format that is suitable for a learning algorithm to process

Sepal Length  Sepal Width  Petal Length  Petal Width  Species
5.1           3.5          1.4           0.2          A
5.4           3.7          1.1           0.1          A
5.2           2.7          3.9           1.0          B
6.6           2.9          3.5           1.2          B
5.8           2.8          5.1           2.4          C
7.7           3.7          6.7           2.2          C
...           ...          ...           ...          ...
Each row is a point in the Feature Space.
Representations
• Images: Raw Pixel Representation, Deep Learning Based Features
  • The sum of all the pixels
  • The number of boundary pixels
  • Edge detection
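
Two of the hand-crafted image features listed above can be computed on a tiny binary image. The image is an illustrative example, not data from the slides, and the definition of a boundary pixel used here is an assumption: a foreground pixel with at least one of its four neighbours being background or lying outside the image.

```python
def pixel_sum(image):
    """Sum of all pixel values in a 2D list of numbers."""
    return sum(sum(row) for row in image)


def count_boundary_pixels(image):
    """Count foreground pixels that touch background or the image border."""
    height, width = len(image), len(image[0])
    count = 0
    for r in range(height):
        for c in range(width):
            if image[r][c] == 0:
                continue  # background pixels are never boundary pixels
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < height and 0 <= nc < width)
                if outside or image[nr][nc] == 0:
                    count += 1
                    break
    return count


# Illustrative 5x5 binary image containing a 3x3 square of foreground pixels.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

features = [pixel_sum(image), count_boundary_pixels(image)]
print(features)  # [9, 8]
```

The 25 raw pixels are reduced to a two-number feature vector. Whether two numbers are enough depends entirely on the task.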
Representations
• Sound: Waveform Representation, Spectrogram Representation, Mel-Frequency Cepstral Coefficients

Reference: Towards Low-Complexity Wireless Technology Classification Across Multiple Environments, DOI:10.1016/j.adhoc.2019.101881
Representations – Textual Data
• Text Data: N-grams, Bag of Words, Term Frequency-Inverse Document Frequency, Word Embeddings
Sentence: The weather is sunny today

N-gram            N-grams Generated from the Sentence           Number of N-gram Features
Unigram (1-Gram)  “The”, “weather”, “is”, “sunny”, “today”      5
Bigram (2-Gram)   “The weather”, “weather is”, “is sunny”, …    4
Trigram (3-Gram)  “The weather is”, “weather is sunny”, …       3
An N-gram is a sequence of N consecutive words, so a sentence of 5 words yields 5 − N + 1 N-grams.
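
N-gram generation can be sketched as a sliding window over the words of the sentence. The function name `ngrams` and the whitespace tokenization are assumptions made for this example. An N-gram is taken to mean N consecutive words, which gives 5, 4 and 3 features for the unigrams, bigrams and trigrams of the five-word sentence.

```python
def ngrams(sentence, n):
    """Return every run of n consecutive words in the sentence."""
    words = sentence.split()  # simple whitespace tokenization
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "The weather is sunny today"

for n, name in ((1, "Unigram"), (2, "Bigram"), (3, "Trigram")):
    grams = ngrams(sentence, n)
    print(name, len(grams), grams)
```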
Representations – Textual Data
• Text Data: N-grams, Bag of Words, Term Frequency-Inverse Document Frequency, Word Embeddings
• Sentence 1: The weather is sunny today
• Sentence 2: The weather was rainy yesterday

Index       1    2        3   4      5      6    7      8
Word        The  weather  is  sunny  today  was  rainy  yesterday  Length
Sentence 1  1    1        1   1      1      0    0      0          5
Sentence 2  1    1        0   0      0      1    1      1          5

• Vector of Sentence 1: [1 1 1 1 1 0 0 0]
• Vector of Sentence 2: [1 1 0 0 0 1 1 1]
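
The two Bag of Words vectors can be reproduced with a short sketch. The helper names are hypothetical, and two simplifying assumptions are made: words are split on whitespace without lowercasing, and the vocabulary keeps words in the order they first appear, which matches the column order of the table.

```python
def build_vocabulary(sentences):
    """Collect distinct words in order of first appearance."""
    vocabulary = []
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocabulary:
                vocabulary.append(word)
    return vocabulary


def bag_of_words(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    words = sentence.split()
    return [words.count(term) for term in vocabulary]


sentences = [
    "The weather is sunny today",
    "The weather was rainy yesterday",
]

vocabulary = build_vocabulary(sentences)
vectors = [bag_of_words(s, vocabulary) for s in sentences]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Both vectors have the same length (the vocabulary size, 8) regardless of sentence length. That fixed length is what makes them usable as points in a common feature space.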
Why sudden interest in AI?
• Appearance of large, high-quality labeled datasets
• Massively parallel computing with GPUs
• Backprop-friendly activation functions, improved architectures
• Software platforms, Cloud Compute, APIs, Libraries
• New regularization techniques, robust optimizers
More People, Papers, Results, Funding, Positive Feedback.
Where is Machine Learning?
• Recommendation Systems
• Virtual Assistants
• Facial Recognition
• E-Commerce
• Create Photographs, Paintings
• Chess/Go Champions
• Autonomous Cars/Navigation
• Speech Recognition
• Segmentation
Image Courtesy: Google
Other Applications
• Surveillance
• Automated Assembly
• Mail Sorting
• Face detection (photography)
• Robot Navigation
• Content-Based Image Retrieval
• Entertainment
• And many more…

Image Courtesy: Google
