NLP Course, Lecture 02 (Huawei Noah's Ark Lab)
Spring 2020
A course delivered at MIPT, Moscow
Qun Liu & Valentin Malykh (Huawei) Natural Language Processing Spring 2020 1 / 135
Content
3 Text Classification
Machine Learning basics
Machine Learning basics What is machine learning?
John Kelleher and Brian Mac Namee and Aoife D’Arcy, Fundamentals of Machine Learning for Predictive Data Analytics
Figure: Using the model to make predictions for new query instances.
ID  OCCUPATION    AGE  LOAN-SALARY RATIO  OUTCOME
1   industrial    34   2.96               repaid
2   professional  41   4.64               default
3   professional  36   3.22               default
4   professional  41   3.11               default
5   industrial    48   3.80               default
6   industrial    61   2.52               repaid
7   professional  37   1.50               repaid
8   professional  40   1.93               repaid
9   industrial    33   5.25               default
10  industrial    32   4.15               default
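A minimal sketch of what "learning a model" from this table could look like. The one-feature threshold rule and the helper names (`predict`, `training_errors`, `fit_threshold`) are illustrative assumptions, not something stated on the slides; the data rows are copied from the table above.

```python
# Rows of the table above: (id, occupation, age, loan_salary_ratio, outcome).
DATA = [
    (1, "industrial", 34, 2.96, "repaid"),
    (2, "professional", 41, 4.64, "default"),
    (3, "professional", 36, 3.22, "default"),
    (4, "professional", 41, 3.11, "default"),
    (5, "industrial", 48, 3.80, "default"),
    (6, "industrial", 61, 2.52, "repaid"),
    (7, "professional", 37, 1.50, "repaid"),
    (8, "professional", 40, 1.93, "repaid"),
    (9, "industrial", 33, 5.25, "default"),
    (10, "industrial", 32, 4.15, "default"),
]


def predict(ratio, threshold):
    """Hypothetical rule: a high loan-salary ratio predicts a default."""
    return "default" if ratio > threshold else "repaid"


def training_errors(threshold):
    """Number of training rows the rule gets wrong for this threshold."""
    return sum(predict(r, threshold) != y for _, _, _, r, y in DATA)


def fit_threshold():
    """Try the midpoints between consecutive sorted ratios; keep the best."""
    ratios = sorted(r for _, _, _, r, _ in DATA)
    candidates = [(a + b) / 2 for a, b in zip(ratios, ratios[1:])]
    return min(candidates, key=training_errors)


threshold = fit_threshold()
```

On these ten rows the search finds a threshold between 2.96 and 3.11 that makes no training errors; whether such a rule generalizes to new applicants is a separate question.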
Machine Learning basics Machine learning – an example
ID  Amount   Salary  Loan-Salary Ratio  Age  Occupation    House      Type  Outcome
1   245,100  66,400  3.69               44   industrial    farm       stb   repaid
2   90,600   75,300  1.2                41   industrial    farm       stb   repaid
3   195,600  52,100  3.75               37   industrial    farm       ftb   default
4   157,800  67,600  2.33               44   industrial    apartment  ftb   repaid
5   150,800  35,800  4.21               39   professional  apartment  stb   default
6   133,000  45,300  2.94               29   industrial    farm       ftb   default
7   193,100  73,200  2.64               38   professional  house      ftb   repaid
8   215,000  77,600  2.77               17   professional  farm       ftb   repaid
9   83,000   62,500  1.33               30   professional  house      ftb   repaid
10  186,100  49,200  3.78               30   industrial    house      ftb   default
11  161,500  53,300  3.03               28   professional  apartment  stb   repaid
12  157,400  63,900  2.46               30   professional  farm       stb   repaid
13  210,000  54,200  3.87               43   professional  apartment  ftb   repaid
14  209,700  53,000  3.96               39   industrial    farm       ftb   default
15  143,200  65,300  2.19               32   industrial    apartment  ftb   default
16  203,000  64,400  3.15               44   industrial    farm       ftb   repaid
17  247,800  63,800  3.88               46   industrial    house      stb   repaid
18  162,700  77,400  2.1                37   professional  house      ftb   repaid
19  213,300  61,100  3.49               21   industrial    apartment  ftb   default
20  284,100  32,300  8.8                51   industrial    farm       ftb   default
21  154,000  48,900  3.15               49   professional  house      stb   repaid
22  112,800  79,700  1.42               41   professional  house      ftb   repaid
23  252,000  59,700  4.22               27   professional  house      stb   default
24  175,200  39,900  4.39               37   professional  apartment  stb   default
25  149,700  58,600  2.55               35   industrial    farm       stb   default
Machine Learning basics Model spaces and inductive bias
BBY  ALC  ORG  GRP  M1      M2      M3      M4      M5      ...  M6561
no   no   no   ?    couple  couple  single  couple  couple       couple
no   no   yes  ?    single  couple  single  couple  couple       single
no   yes  no   ?    family  family  single  single  single       family
no   yes  yes  ?    single  single  single  single  single       couple
...
yes  no   no   ?    couple  couple  family  family  family       family
yes  no   yes  ?    couple  family  family  family  family       couple
yes  yes  no   ?    single  family  family  family  family       single
yes  yes  yes  ?    single  single  family  family  couple       family
Table: A sample of the models that are consistent with the training data
BBY  ALC  ORG  GRP     M1      M2      M3      M4      M5      ...  M6561
no   no   no   couple  couple  couple  single  couple  couple       couple
no   no   yes  couple  single  couple  single  couple  couple       single
no   yes  no   ?       family  family  single  single  single       family
no   yes  yes  single  single  single  single  single  single       couple
...
yes  no   no   ?       couple  couple  family  family  family       family
yes  no   yes  family  couple  family  family  family  family       couple
yes  yes  no   family  single  family  family  family  family       single
yes  yes  yes  ?       single  single  family  family  couple       family
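The size of this model space can be checked by brute force. The sketch below assumes, as the table suggests, three binary descriptive features (BBY, ALC, ORG) and three target levels (single, couple, family), and it takes the five rows with a known GRP value as the training data. The variable names are illustrative.

```python
from itertools import product

LEVELS = ("single", "couple", "family")
# All 2**3 = 8 combinations of the binary features (BBY, ALC, ORG).
INPUTS = list(product(("no", "yes"), repeat=3))

# Rows of the table above whose GRP value is known.
TRAINING = {
    ("no", "no", "no"): "couple",
    ("no", "no", "yes"): "couple",
    ("no", "yes", "yes"): "single",
    ("yes", "no", "yes"): "family",
    ("yes", "yes", "no"): "family",
}

# A model assigns one target level to each of the 8 feature combinations.
models = [
    dict(zip(INPUTS, outputs))
    for outputs in product(LEVELS, repeat=len(INPUTS))
]


def is_consistent(model):
    """True if the model reproduces every known training label."""
    return all(model[x] == y for x, y in TRAINING.items())


consistent = [m for m in models if is_consistent(m)]
print(len(models), len(consistent))  # 6561 27
```

There are 3^8 = 6561 candidate models, and 3^3 = 27 of them still agree with the five labelled rows. The data alone cannot choose among those 27 for the three unlabelled combinations, which is where an inductive bias is needed.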
Machine Learning basics Classification and regression
Classification
Regression
Machine Learning basics Overfitting and underfitting
Figure: scatter plot of Income (vertical axis, 20000 to 80000) against Age (horizontal axis, 0 to 100).
Figure: four side-by-side panels, each plotting Income (vertical axis, 20000 to 80000) for the same data points.
Machine Learning basics Unsupervised learning and semi-supervised learning
Unsupervised learning
Supervised learning
Unsupervised learning
Semi-supervised learning
Clustering
Clustering – An example
Classification and logistic regression
Classification and logistic regression Classification - an example
A power generator
Figure: plot with Vibration on the vertical axis (ticks from 100 to 600).
Classification and logistic regression Decision boundary
Figure: plot with Vibration on the vertical axis (ticks from 100 to 600), illustrating the decision boundary.
Notation
1 x2
Classification and logistic regression Model definition
\[
\mathrm{Heaviside}(x) =
\begin{cases}
1 & \text{if } x \ge 0 \\
0 & \text{if } x < 0
\end{cases}
\]
\[ h_\theta(x) = \mathrm{Heaviside}(d_\theta(x)) \]
Figure: (a) A surface showing the value of Equation (6)[21] for all values of RPM and VIBRATION. The decision boundary given in Equation (6)[21] is highlighted. (b) The same surface linearly thresholded at zero to operate as a predictor.
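A minimal sketch of the thresholded predictor, assuming a linear decision function d_θ(x) = θ^T x with the input x = (1, RPM, VIBRATION). The weights in `THETA` are made-up illustrative values, not parameters given on the slides.

```python
def heaviside(z):
    """Step function: 1 for z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0


def decision(theta, x):
    """Linear decision function d_theta(x) = theta^T x; x[0] is the bias input 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def predict(theta, rpm, vibration):
    """h_theta(x) = Heaviside(d_theta(x)) for one generator reading."""
    return heaviside(decision(theta, (1.0, rpm, vibration)))


# Hypothetical weights, chosen only to illustrate the mechanics.
THETA = (-830.0, 0.667, 1.0)
```

With these weights a reading such as (RPM 500, vibration 200) falls on the negative side of the boundary and is labelled 0, while (RPM 900, vibration 600) is labelled 1. The hard threshold gives no indication of how close a reading is to the boundary.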
\[ \mathrm{Logistic}(x) = \frac{1}{1 + e^{-x}} \]
Logistic function
Figure: plot of logistic(x) for x from −10 to 10; the value rises from 0.00 to 1.00.
\[ h_\theta(x) = \mathrm{Logistic}(d_\theta(x)) = \frac{1}{1 + e^{-d_\theta(x)}} = \frac{1}{1 + e^{-\theta^T x}} \]
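The model above can be sketched in a few lines. The two-branch form of `logistic` is an implementation choice made here to avoid overflow for very negative inputs; it computes the same function as the formula.

```python
import math


def logistic(z):
    """1 / (1 + exp(-z)), evaluated without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def h(theta, x):
    """Logistic regression model h_theta(x) = Logistic(theta^T x)."""
    return logistic(sum(t * xi for t, xi in zip(theta, x)))
```

Unlike the Heaviside version, the output is a value strictly between 0 and 1 for moderate inputs and can be read as a confidence that the label is 1; thresholding it at 0.5 recovers the same decision boundary θ^T x = 0.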
Classification and logistic regression Cost function
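\[ J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Cost}\big(h_\theta(x^{(i)}), y^{(i)}\big) \]
where:
\[
\mathrm{Cost}(h_\theta(x), y) =
\begin{cases}
-\log(h_\theta(x)) & \text{if } y = 1 \\
-\log(1 - h_\theta(x)) & \text{if } y = 0
\end{cases}
= -\big[\, y \log(h_\theta(x)) + (1 - y) \log(1 - h_\theta(x)) \,\big]
\]
and every label satisfies $y^{(i)} \in \{0, 1\}$.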
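A small sketch of this cost, assuming natural logarithms and predictions strictly between 0 and 1 (the logarithm is undefined at exactly 0 or 1). The function names mirror the symbols above.

```python
import math


def cost(h, y):
    """Per-example cost: -log(h) if y == 1, -log(1 - h) if y == 0."""
    return -(y * math.log(h) + (1 - y) * math.log(1.0 - h))


def J(predictions, labels):
    """Average cost over n examples; predictions are values h_theta(x)."""
    n = len(labels)
    return sum(cost(h, y) for h, y in zip(predictions, labels)) / n
```

For a positive example (y = 1), a confident correct prediction such as h = 0.9 costs about 0.105, while the confident wrong prediction h = 0.1 costs about 2.303, so wrong answers given with high confidence are penalized most heavily.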
Classification and logistic regression Stochastic gradient descent
Gradient descent
Outline:
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum
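The outline can be sketched on a toy problem. The quadratic `J` below, its minimum at (3, −1), the step size `alpha`, and the step count are all assumptions chosen for illustration; they are not the cost function from the slides.

```python
def J(t0, t1):
    """Hypothetical convex cost with its minimum at (3, -1)."""
    return (t0 - 3.0) ** 2 + (t1 + 1.0) ** 2


def grad_J(t0, t1):
    """Partial derivatives of J with respect to t0 and t1."""
    return 2.0 * (t0 - 3.0), 2.0 * (t1 + 1.0)


def gradient_descent(t0, t1, alpha=0.1, steps=200):
    """Start somewhere, then repeatedly step against the gradient."""
    for _ in range(steps):
        g0, g1 = grad_J(t0, t1)
        # Both parameters are updated from the same gradient evaluation.
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1
    return t0, t1


t0, t1 = gradient_descent(0.0, 0.0)
```

Starting from (0, 0), the iterates approach (3, −1) and the cost falls toward zero. On a non-convex surface the same loop may settle in a different minimum depending on the starting point, which is why the outline says "hopefully".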
Andrew Ng, Machine Learning, Coursera course
Figure: surface plot of the cost J(θ0, θ1) over the parameters θ0 and θ1.
The derivative ∂J(θ)/∂θ gives the direction of the movement: each step moves θ against it.
The learning rate α is used to adjust the size of each step.
Figure source: https://machinelearningmedium.com/2017/08/15/gradient-descent/
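Model
\[ h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} \]
Parameters
\[ \theta_0, \theta_1 \]
Cost Function
\[ J(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \Big[\, y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \,\Big] \]
Goal
\[ \min_{\theta_0, \theta_1} J(\theta) \]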
Gradient descent:
\[ J(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \Big[\, y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \,\Big] \]
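\[ J(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \Big[\, y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \,\Big] \]
\[ \frac{\partial}{\partial \theta_j} J(\theta) = -\frac{1}{n} \frac{\partial}{\partial \theta_j} \sum_{i=1}^{n} \Big[\, y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \,\Big] \]
\[ = \cdots \]
\[ = \frac{1}{n} \sum_{i=1}^{n} \big( h_\theta(x^{(i)}) - y^{(i)} \big)\, x_j^{(i)} \]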
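The intermediate steps are elided on the slide. One standard way to fill them in is sketched below; it assumes natural logarithms and $h_\theta(x) = \mathrm{Logistic}(\theta^T x)$, and it is not taken from the slides themselves. Writing $h = h_\theta(x)$ for a single example $(x, y)$:

```latex
% Derivative of the logistic model with respect to one parameter
\frac{\partial h}{\partial \theta_j}
  = \frac{e^{-\theta^T x}}{\left(1 + e^{-\theta^T x}\right)^2}\, x_j
  = h\,(1 - h)\, x_j

% Derivative of the bracketed term for one example
\frac{\partial}{\partial \theta_j}\Big[\, y \log h + (1 - y)\log(1 - h) \,\Big]
  = \left( \frac{y}{h} - \frac{1 - y}{1 - h} \right) h\,(1 - h)\, x_j
  = \big[\, y\,(1 - h) - (1 - y)\, h \,\big]\, x_j
  = (y - h)\, x_j

% Averaging over the n examples and applying the leading minus sign
\frac{\partial}{\partial \theta_j} J(\theta)
  = -\frac{1}{n}\sum_{i=1}^{n} \big( y^{(i)} - h_\theta(x^{(i)}) \big)\, x_j^{(i)}
  = \frac{1}{n}\sum_{i=1}^{n} \big( h_\theta(x^{(i)}) - y^{(i)} \big)\, x_j^{(i)}
```

The factor $h(1-h)$ from the logistic function cancels against the denominators from the logarithms, which is why the final gradient has such a simple form.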
Fortunately the error surface for logistic regression is convex.
Figure: A selection of the logistic regression models developed during the gradient descent process for the extended generators dataset in Table 35 [35]. Each model panel plots Vibration against RPM, both scaled to the range −1.0 to 1.0. The bottom-right panel shows the sum of ... against Training Iteration (0 to 800).
Ruder, Sebastian. "An overview of gradient descent optimization algorithms." arXiv:1609.04747 (2016).
• Disadvantages of batch gradient descent
  • It can be very slow.
  • It is intractable for datasets that do not fit in memory.
  • It does not allow us to update our model online.
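The disadvantages listed above are the ones Ruder's overview attributes to batch gradient descent, where every update uses the gradient over the whole dataset. A minimal sketch for logistic regression follows; the toy data, the step size `eta`, and the epoch count are hypothetical values chosen for illustration.

```python
import math


def h(theta, x):
    """Logistic regression model; x[0] is the constant bias input 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def batch_gradient(theta, X, y):
    """Gradient of the cost J averaged over ALL n training examples."""
    n = len(X)
    return [
        sum((h(theta, xi) - yi) * xi[j] for xi, yi in zip(X, y)) / n
        for j in range(len(theta))
    ]


def batch_gd(X, y, eta=0.5, epochs=2000):
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = batch_gradient(theta, X, y)  # one full pass per update
        theta = [t - eta * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data: x = (1, feature); larger features have label 1.
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [0, 0, 1, 1]
theta = batch_gd(X, y)
```

Each of the 2000 updates touches every row, so the cost of one step grows linearly with the dataset size, and a newly arriving example cannot influence the model until a full pass is recomputed.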
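Update equation
\[ \theta = \theta - \eta \cdot \nabla_\theta J(\theta; x^{(i)}; y^{(i)}) \]
Batch gradient descent needs to calculate the gradients for the whole dataset to perform just one update; this update uses a single training example $(x^{(i)}, y^{(i)})$ instead.
Note: we shuffle the training data at every epoch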
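A minimal sketch of this per-example update for logistic regression, including the shuffle at every epoch. The toy data, `eta`, the epoch count, and the fixed `seed` are hypothetical choices made for illustration.

```python
import math
import random


def h(theta, x):
    """Logistic regression model; x[0] is the constant bias input 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def sgd(X, y, eta=0.1, epochs=500, seed=0):
    rng = random.Random(seed)  # seeded so the run is reproducible
    theta = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # shuffle the training data at every epoch
        for i in order:
            # The gradient for ONE example is (h(x) - y) * x.
            err = h(theta, X[i]) - y[i]
            theta = [t - eta * err * xj for t, xj in zip(theta, X[i])]
    return theta


# Hypothetical toy data: x = (1, feature); larger features have label 1.
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [0, 0, 1, 1]
theta = sgd(X, y)
```

Every example triggers its own update, so the parameters change four times per epoch here instead of once, and a new example can be folded in as soon as it arrives.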
• Disadvantages of stochastic gradient descent
  • It performs frequent updates with a high variance that cause the objective function to fluctuate heavily.
Update equation
\[ \theta = \theta - \eta \cdot \nabla_\theta J(\theta; x^{(i:i+n)}; y^{(i:i+n)}) \]
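A minimal sketch of the mini-batch update for logistic regression, where each step averages the gradient over a small slice of the shuffled data. The toy data, `eta`, `batch_size`, the epoch count, and the `seed` are hypothetical choices made for illustration.

```python
import math
import random


def h(theta, x):
    """Logistic regression model; x[0] is the constant bias input 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def minibatch_gd(X, y, eta=0.3, epochs=500, batch_size=2, seed=0):
    rng = random.Random(seed)  # seeded so the run is reproducible
    theta = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Average the per-example gradients over the mini-batch only.
            grad = [
                sum((h(theta, X[i]) - y[i]) * X[i][j] for i in batch) / len(batch)
                for j in range(len(theta))
            ]
            theta = [t - eta * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data: x = (1, feature); larger features have label 1.
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [0, 0, 1, 1]
theta = minibatch_gd(X, y)
```

Setting `batch_size=1` reduces this loop to the stochastic variant, and setting it to `len(X)` gives one full-dataset update per epoch, so the batch size is the knob that moves between the two extremes.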
Trade-off
• Depending on the amount of data, the three variants make a trade-off between:
  • The accuracy of the parameter update
  • The time it takes to perform an update

Method                       Accuracy  Time    Memory Usage  Online Learning
Batch gradient descent       ○         Slow    High          ×
Stochastic gradient descent  △         Fast    Low           ○
Mini-batch gradient descent  ○         Medium  Medium        ○
Classification and logistic regression Multiclass classification
Binary classification:
Figure: two classes plotted in the (x1, x2) plane.
One-vs-all (one-vs-rest):
[Figure: a three-class data set (Class 1, Class 2, Class 3) on axes x1 and x2, shown with three further panels on the same axes]
Andrew Ng, Machine Learning, Coursera course
One-vs-all
Andrew Ng, Machine Learning, Coursera course
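A minimal sketch of one-vs-all, assuming its usual formulation: train one binary logistic regression classifier per class (that class against all the others), then assign a new input to the class whose classifier gives the highest score. The function names, the toy points, and the values of `eta` and `steps` are hypothetical and not taken from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(theta, x):
    """Estimated probability of the positive class for x (theta[0] is the bias)."""
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))

def train_binary(xs, ys, eta=0.1, steps=500):
    """Fit a binary logistic regression by batch gradient descent."""
    theta = [0.0] * (len(xs[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            error = score(theta, x) - y
            grad[0] += error
            for j, v in enumerate(x):
                grad[j + 1] += error * v
        theta = [t - eta * g / len(xs) for t, g in zip(theta, grad)]
    return theta

def train_one_vs_rest(xs, labels):
    """One classifier per class: that class is labelled 1, every other class 0."""
    return {c: train_binary(xs, [1 if label == c else 0 for label in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    """Pick the class whose classifier is most confident."""
    return max(models, key=lambda c: score(models[c], x))

# Toy data (hypothetical): three well-separated groups in the (x1, x2) plane.
X = [(0, 3), (0, 4), (-3, -2), (-4, -2), (3, -2), (4, -2)]
labels = [1, 1, 2, 2, 3, 3]

models = train_one_vs_rest(X, labels)
predictions = [predict(models, x) for x in X]
```

With K classes this trains K independent binary classifiers, so the cost grows linearly with the number of classes.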
Text Classification
Applications
The News Category Dataset contains around 200k news headlines from 2012 to 2018, obtained from HuffPost.
https://www.kaggle.com/rmisra/news-category-dataset
Kavita Ganesan, Build Your First Text Classifier in Python with Logistic Regression
https://kavita-ganesan.com/news-classifier-with-logistic-regression-in-python
Procedure
Text preprocessing
Feature extraction
Model training
Model application
Evaluation
Text preprocessing
Stop words
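A minimal sketch of text preprocessing with stop-word removal: lowercase the text, split it into alphanumeric tokens, and drop very frequent function words. The stop-word list below is a tiny illustrative sample rather than a standard list, and the headline is invented for the example, not taken from the dataset.

```python
import re

# Assumption: a small hand-picked stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "in", "to", "and", "is", "for", "on"}

def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, tokenize on runs of letters and digits, remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words]

tokens = preprocess("The Senate Votes on a New Budget for 2018")
# tokens == ['senate', 'votes', 'new', 'budget', '2018']
```

Real stop-word lists are longer and language-specific, and whether to remove stop words at all depends on the task.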
Feature extraction
Boolean weighting: $w_{ik} = \begin{cases} 1 & \text{if } f_{ik} > 0 \\ 0 & \text{otherwise} \end{cases}$
Word frequency weighting: $w_{ik} = f_{ik}$
TF-IDF weighting: $w_{ik} = f_{ik} \times \log \frac{N}{n_i}$
$i$: word index
$k$: document index
$f_{ik}$: frequency of word $i$ in document $k$
$N$: number of documents in the corpus
$n_i$: number of documents containing word $i$
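The three weighting schemes can be sketched directly from these definitions. The function name `term_weights`, the toy documents, and the use of the natural logarithm are assumptions (the slide does not state the base of the logarithm). Words that do not occur in a document are left out of its dictionary, which corresponds to a weight of 0.

```python
import math
from collections import Counter

def term_weights(docs, scheme="tfidf"):
    """Return one {word: weight} dict per tokenized document."""
    N = len(docs)                                      # documents in the corpus
    counts = [Counter(doc) for doc in docs]            # f_ik
    doc_freq = Counter(w for c in counts for w in c)   # n_i
    weighted = []
    for c in counts:
        if scheme == "boolean":
            weighted.append({w: 1 for w in c})
        elif scheme == "frequency":
            weighted.append(dict(c))
        elif scheme == "tfidf":
            weighted.append({w: f * math.log(N / doc_freq[w])
                             for w, f in c.items()})
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return weighted

# Toy corpus (hypothetical), already tokenized.
docs = [["budget", "vote", "budget"], ["vote", "senate"], ["film", "review"]]

boolean = term_weights(docs, "boolean")
frequency = term_weights(docs, "frequency")
tfidf = term_weights(docs, "tfidf")
```

In this corpus "budget" occurs twice in the first document and in no other document, so its TF-IDF weight there is 2 × log(3/1), while "vote" occurs in two of the three documents and gets the lower weight 1 × log(3/2).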
Algorithms
Logistic regression
Nearest neighbor
Decision trees
Support vector machines
Neural networks
Further topics
Feature selection
Dimension reduction
Document embeddings
Summary
Content
3 Text Classification