Machine Learning
Practice 2
Quan Minh Phan & Ngoc Hoang Luong
University of Information Technology (UIT)
April 7, 2021
Table of contents
1 Perceptron
2 Linear Regression
3 Adaptive Linear Neuron (Adaline)
4 Logistic Regression
Figure: The general concept of the perceptron
Recall
Problem
Using the perceptron model to classify the species of flowers ('setosa' or
'versicolor') based on the sepal and petal width.
2 features: the sepal width x1; the petal width x2
x = [x1, x2]
2 labels: 'setosa' ← −1; 'versicolor' ← 1
w = [w1, w2]
Net input function (z)
z = w1 ∗ x1 + w2 ∗ x2
Unit step function (φ(·))
φ(z) = 1 if z ≥ θ, −1 otherwise
Recall (next)
w = [w0, w1, w2]
Net input function (z)
z = w0 + w1 ∗ x1 + w2 ∗ x2 = wᵀx
Unit step function (φ(·))
φ(z) = 1 if z ≥ 0, −1 otherwise
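As a quick check of these formulas, here is a minimal NumPy sketch; the weight and feature values are made up for illustration:

import numpy as np

w = np.array([-0.5, 0.8, 1.2])      # [w0, w1, w2]
x = np.array([0.3, 0.4])            # [x1, x2]

z = w[0] + np.dot(w[1:], x)         # net input: z = w0 + w1*x1 + w2*x2
y_hat = np.where(z >= 0.0, 1, -1)   # unit step with threshold 0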
Unit Step Function
Figure: The general concept of the perceptron
Training process
Algorithm 1 Pseudocode for the training process
1: Initialize the weights, w
2: while Stopping Criteria is not satisfied do
3: for x ∈ X do
4: Compute the output value, ŷ
5: Update the weights
6: end for
7: end while
Updating the weights
w = w + ∆w
∆wi = η ∗ (y − ŷ) ∗ xi
where:
η: the learning rate
y: the true class label
ŷ: the predicted class label
Examples
∆w0 = η ∗ (y − ŷ)
∆w1 = η ∗ (y − ŷ) ∗ x1
∆w2 = η ∗ (y − ŷ) ∗ x2
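A concrete update step, assuming η = 0.1 and a sample x = [2.0, 0.5] with true label y = 1 that the model currently misclassifies as ŷ = −1:

∆w0 = 0.1 ∗ (1 − (−1)) = 0.2
∆w1 = 0.1 ∗ (1 − (−1)) ∗ 2.0 = 0.4
∆w2 = 0.1 ∗ (1 − (−1)) ∗ 0.5 = 0.1

If the sample is classified correctly, y − ŷ = 0 and the weights do not change.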
Components
Hyperparameters
eta → the learning rate
max_iter → the maximum number of epochs
random_state → to make the results reproducible
Parameters
w → the weights of the model
errors → the number of misclassifications in each epoch
Methods
fit(X, y) → to train the model
predict(X) → to predict the output value
net_input(X) → to combine the features with the weights
Implement (code from scratch)
class Perceptron:
    def __init__(self, eta=0.01, max_iter=20, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.errors = []

    def net_input(self, X):
        # z = w0 + w1*x1 + ... + wn*xn
        return np.dot(X, self.w[1:]) + self.w[0]

    def predict(self, X):
        # unit step: 1 if z >= 0, else -1
        return np.where(self.net_input(X) >= 0.0, 1, -1)
    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.errors = []
        for n_iter in range(self.max_iter):
            n_wronglabels = 0
            # shuffle the training data every epoch
            idx = rgen.permutation(len(y))
            X, y = X[idx], y[idx]
            for xi, yi in zip(X, y):
                error = yi - self.predict(xi)
                self.w[1:] += self.eta * error * xi
                self.w[0] += self.eta * error
                n_wronglabels += int(error != 0.0)
            self.errors.append(n_wronglabels)
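A minimal usage sketch for this class on a made-up, linearly separable toy set (the data here is only for illustration, not the iris data used below):

>> import numpy as np
>> X_toy = np.array([[1.0, 2.0], [2.0, 1.5], [-1.0, -2.0], [-2.0, -1.5]])
y_toy = np.array([1, 1, -1, -1])
>> ppn = Perceptron(eta=0.01, max_iter=10, random_state=1)
ppn.fit(X_toy, y_toy)
>> ppn.errors   # misclassifications per epoch; should drop to 0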
Implement (library)
from sklearn.linear_model import Perceptron
Hyperparameters
eta0
max_iter
random_state
Parameters
coef_
intercept_
Methods
fit(X, y)
predict(X)
Practice
Using the 'iris.csv' dataset
How can we use the 'sepal length' and 'sepal width' to classify the
species of flowers?
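One possible way to prepare X_train and y_train for this task; the exact column names in 'iris.csv' are an assumption here:

>> import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
>> df = pd.read_csv('iris.csv')
df = df[df['species'].isin(['setosa', 'versicolor'])]   # keep only the 2 classes
X = df[['sepal_length', 'sepal_width']].values
y = np.where(df['species'] == 'setosa', -1, 1)          # 'setosa' -> -1, 'versicolor' -> 1
>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)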
Data visualization
>> import matplotlib.pyplot as plt
>> idx_setosa = y_train == -1
idx_versicolor = y_train != -1
>> plt.scatter(X_train[idx_setosa, 0], X_train[idx_setosa, 1], color='red',
marker='s', label='setosa')
plt.scatter(X_train[idx_versicolor, 0], X_train[idx_versicolor, 1],
color='blue', marker='x', label='versicolor')
plt.xlabel('sepal length [cm]')
plt.ylabel('sepal width [cm]')
plt.legend(loc='upper left')
plt.show()
Data visualization
Practice
>> ppn = Perceptron(eta=0.001, max_iter=30, random_state=1)   # from-scratch model
ppn.fit(X_train, y_train)
>> from sklearn.linear_model import Perceptron
>> ppn = Perceptron(eta0=0.001, max_iter=30, random_state=1)  # scikit-learn model (note: eta0)
ppn.fit(X_train, y_train)
Plotting the errors
>> plt.plot(range(1, len(ppn.errors) + 1), ppn.errors, marker='o')
plt.xlabel('Epochs')
plt.ylabel('# Misclassifications')
plt.show()
Plotting the errors
Practice
>> w_ppn = ppn.w
w_ppn
>> [-0.01575655 0.06508244 -0.11108172]
>> w_ppn = np.append(ppn.intercept_, ppn.coef_)
w_ppn
>> [-0.006 0.0196 -0.0325]
Visualization
>> plot_decision_regions(X_train, y_train, classifier=ppn)
plt.xlabel('sepal length [cm]')
plt.ylabel('sepal width [cm]')
plt.legend(loc='upper left')
plt.show()
Visualization
Practice (next)
Create a new model and train it on the standardized data, as in the sketch below.
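A sketch of this step with scikit-learn's StandardScaler, reusing the from-scratch Perceptron; the scaler is fit on the training set only and then applied to both splits:

>> from sklearn.preprocessing import StandardScaler
>> sc = StandardScaler()
sc.fit(X_train)                       # learn the mean and standard deviation per feature
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
>> ppn_std = Perceptron(eta=0.001, max_iter=30, random_state=1)
ppn_std.fit(X_train_std, y_train)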
Data visualization
Plotting the errors
Plotting the results
Table of contents
1 Perceptron
2 Linear Regression
3 Adaptive Linear Neuron (Adaline)
4 Logistic Regression
Figure: The general concept of Linear Regression
Minimizing cost functions with gradient descent
Cost function:

J(w) = (1/2) · Σ_i (y^(i) − φ(z^(i)))²

Update the weights:

w := w + ∆w
∆w = −η∇J(w)

∂J/∂w_j = −Σ_i (y^(i) − φ(z^(i))) · x_j^(i)

∆w_j = −η · ∂J/∂w_j = η · Σ_i (y^(i) − φ(z^(i))) · x_j^(i)
Minimizing cost functions with gradient descent
w_j := w_j + η ∗ X^T.dot(y − φ(z))   if j ∈ [1, . . . , n]
w_j := w_j + η ∗ sum(y − φ(z))       if j = 0
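Written out in NumPy, one full-batch update epoch follows directly from this formula (w, X, y, and eta are assumed to be defined; φ is the identity for linear regression):

output = X.dot(w[1:]) + w[0]      # φ(z) for all training samples at once
errors = y - output
w[1:] += eta * X.T.dot(errors)    # update for j = 1, ..., n
w[0] += eta * errors.sum()        # update for j = 0 (x0 = 1)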
Pseudocode of Training process
Algorithm 2 Gradient Descent
1: Initialize the weights, w
2: while Stopping Criteria is not satisfied do
3: Compute the output value, ŷ
4: Update the weights
5: end while
Components
Hyperparameters: eta, max_iter, random_state
Parameters: w, costs
Methods: fit(X, y), predict(X), net_input(X)
Implement (code from scratch)
class LinearRegression_GD:
    def __init__(self, eta=0.001, max_iter=20, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def net_input(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

    def predict(self, X):
        # linear regression: the activation is the identity
        return self.net_input(X)
    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iter in range(self.max_iter):
            errors = y - self.predict(X)
            self.w[1:] += self.eta * X.T.dot(errors)
            self.w[0] += self.eta * errors.sum()
            cost = (errors ** 2).sum() / 2
            self.costs.append(cost)
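A small usage sketch on synthetic data; the data-generating line y ≈ 0.5x is an arbitrary choice for illustration:

>> rng = np.random.RandomState(1)
X_demo = rng.uniform(-1, 1, size=(100, 1))
y_demo = 0.5 * X_demo.ravel() + rng.normal(scale=0.1, size=100)
>> reg = LinearRegression_GD(eta=0.001, max_iter=50, random_state=1)
reg.fit(X_demo, y_demo)
>> reg.w       # learned [w0, w1]
reg.costs[-1]  # the cost should have decreased over the epochs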
Implement (library)
Stochastic Gradient Descent
from sklearn.linear_model import SGDRegressor
Hyperparameters: eta0, max_iter, random_state
Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)
Implement (library)
Normal Equation
from sklearn.linear_model import LinearRegression
Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)
Differences
Gradient Descent
w := w + ∆w
∆w = η · Σ_i (y^(i) − φ(z^(i))) · x^(i)
Stochastic Gradient Descent
w := w + ∆w
∆w = η · (y^(i) − φ(z^(i))) · x^(i)   (one sample at a time)
Normal Equation
w = (XᵀX)⁻¹Xᵀy
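The normal equation itself is a one-liner in NumPy; a sketch, with a bias column of ones prepended so that w0 is solved for as well:

Xb = np.hstack([np.ones((X.shape[0], 1)), X])      # add the x0 = 1 column
w = np.linalg.inv(Xb.T.dot(Xb)).dot(Xb.T).dot(y)   # w = (X^T X)^(-1) X^T y

In practice, np.linalg.lstsq(Xb, y, rcond=None) is numerically safer than forming the inverse explicitly.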
Practice
Using the 'housing.csv' dataset
How can we use the 'average number of rooms' (RM) to estimate the
'price' of houses (MEDV)?
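A possible data-preparation sketch; the column names and the standardization of both variables (the plots below are labeled 'standardized') are assumptions:

>> import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
>> df = pd.read_csv('housing.csv')
X = df[['RM']].values
y = df['MEDV'].values
>> sc_x, sc_y = StandardScaler(), StandardScaler()
X_std = sc_x.fit_transform(X)
y_std = sc_y.fit_transform(y[:, np.newaxis]).ravel()
>> X_train, X_test, y_train, y_test = train_test_split(X_std, y_std, test_size=0.3, random_state=1)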
Plotting data
>> plt.scatter(X_train, y_train, c='steelblue', edgecolor='white', s=70)
plt.xlabel('Average number of rooms [RM] (standardized)')
plt.ylabel('Price in $1000s [MEDV] (standardized)')
plt.show()
Practice
Practice
Gradient Descent
>> reg_GD = LinearRegression_GD(eta=0.001, max_iter=20, random_state=1)
reg_GD.fit(X_train, y_train)
Stochastic Gradient Descent
>> reg_SGD = SGDRegressor(eta0=0.001, max_iter=20, random_state=1,
l1_ratio=0, tol=None, learning_rate='constant')
reg_SGD.fit(X_train, y_train)
Normal Equation
>> reg_NE = LinearRegression()
reg_NE.fit(X_train, y_train)
Plotting the cost
>> plt.plot(range(1, len(reg_GD.costs) + 1), reg_GD.costs)
plt.xlabel('Epochs')
plt.ylabel('Cost')
plt.title('Gradient Descent')
plt.show()
Practice
>> w_GD = reg_GD.w
w_GD
>> [0.00767139 0.64623542]
>> w_SGD = np.append(reg_SGD.intercept_, reg_SGD.coef_)
w_SGD
>> [0.00783841 0.64551218]
>> w_NE = np.append(reg_NE.intercept_, reg_NE.coef_)
w_NE
>> [0.00773059 0.64638912]
Plotting the results
>> plt.scatter(X_train, y_train, c='steelblue', edgecolor='white', s=70)
plt.plot(X_train, reg_GD.predict(X_train), color='red', lw=10,
label='Gradient Descent')
plt.plot(X_train, reg_SGD.predict(X_train), color='blue', lw=6,
label='Stochastic Gradient Descent')
plt.plot(X_train, reg_NE.predict(X_train), color='black', lw=2,
label='Normal Equation')
plt.xlabel('Average number of rooms [RM] (standardized)')
plt.ylabel('Price in $1000s [MEDV] (standardized)')
plt.legend()
plt.show()
Plotting the results
Practice
>> y_pred_1 = reg_GD.predict(X_test)
>> y_pred_2 = reg_SGD.predict(X_test)
>> y_pred_3 = reg_NE.predict(X_test)
Performance Evaluation
>> from sklearn.metrics import mean_absolute_error as MAE
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import r2_score as R2
Mean Absolute Error
>> print('MAE of GD:', round(MAE(y_test, y_pred_1), 6))
print('MAE of SGD:', round(MAE(y_test, y_pred_2), 6))
print('MAE of NE:', round(MAE(y_test, y_pred_3), 6))
Mean Squared Error
>> print('MSE of GD:', round(MSE(y_test, y_pred_1), 6))
print('MSE of SGD:', round(MSE(y_test, y_pred_2), 6))
print('MSE of NE:', round(MSE(y_test, y_pred_3), 6))
R² score
>> print('R2 of GD:', round(R2(y_test, y_pred_1), 6))
print('R2 of SGD:', round(R2(y_test, y_pred_2), 6))
print('R2 of NE:', round(R2(y_test, y_pred_3), 6))
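These three metrics are easy to verify by hand from their definitions; a sketch using the GD predictions:

>> errors = y_test - y_pred_1
>> mae = np.abs(errors).mean()                 # mean absolute error
mse = (errors ** 2).mean()                     # mean squared error
sst = ((y_test - y_test.mean()) ** 2).sum()
r2 = 1 - (errors ** 2).sum() / sst             # R² = 1 − SSE/SST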
Learning rate too large
Polynomial Regression
Example
X = [258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0, 480.0, 586.0]
y = [236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0, 391.2, 390.8]
>> X = np.array([258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0,
480.0, 586.0])[:, np.newaxis]
y = np.array([236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0,
391.2, 390.8])
>> plt.scatter(X, y, label='Training points')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
Plotting data
Polynomial Regression
>> from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X, y)
Polynomial Regression
Syntax
from sklearn.preprocessing import PolynomialFeatures
>> from sklearn.preprocessing import PolynomialFeatures
pr = LinearRegression()
quadratic = PolynomialFeatures(degree=2)
X_quad = quadratic.fit_transform(X)
pr.fit(X_quad, y)
>> X_fit = np.arange(250, 600, 10)[:, np.newaxis]
>> y_fit_linear = lr.predict(X_fit)
y_fit_quad = pr.predict(quadratic.fit_transform(X_fit))
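For intuition, PolynomialFeatures(degree=2) maps a single feature x to the columns [1, x, x²], which the linear model then fits with one weight per column; a quick check:

>> quadratic.fit_transform(np.array([[3.0]]))
>> [[1. 3. 9.]]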
>> plt.scatter(X, y, label='Training points')
plt.xlabel('X')
plt.ylabel('y')
plt.plot(X_fit, y_fit_linear, label='Linear fit', linestyle='--')
plt.plot(X_fit, y_fit_quad, label='Quadratic fit')
plt.legend()
plt.tight_layout()
plt.show()
Practice
Linear regression
>> from sklearn.linear_model import LinearRegression as LR
lr = LR()
lr.fit(X_train, y_train)
Polynomial regression (quadratic)
>> quadratic = PolynomialFeatures(degree=2)
X_quad = quadratic.fit_transform(X_train)
pr_quad = LR()
pr_quad = pr_quad.fit(X_quad, y_train)
Polynomial regression (cubic)
>> cubic = PolynomialFeatures(degree=3)
X_cubic = cubic.fit_transform(X_train)
pr_cubic = LR()
pr_cubic = pr_cubic.fit(X_cubic, y_train)
>> X_fit = np.arange(X_train.min(), X_train.max(), 0.1)[:, np.newaxis]
>> y_lin_fit = lr.predict(X_fit)
y_quad_fit = pr_quad.predict(quadratic.fit_transform(X_fit))
y_cubic_fit = pr_cubic.predict(cubic.fit_transform(X_fit))
Plotting the results
>> plt.scatter(X_train, y_train, c='steelblue', edgecolor='white', s=70)
plt.plot(X_fit, y_lin_fit, label='Linear (d=1)', color='blue', lw=2,
linestyle=':')
plt.plot(X_fit, y_quad_fit, label='Quadratic (d=2)', color='red', lw=2,
linestyle='-')
plt.plot(X_fit, y_cubic_fit, label='Cubic (d=3)', color='green', lw=2,
linestyle='--')
plt.xlabel('Average number of rooms [RM] (standardized)')
plt.ylabel('Price in $1000s [MEDV] (standardized)')
plt.legend()
plt.show()
Table of contents
1 Perceptron
2 Linear Regression
3 Adaptive Linear Neuron (Adaline)
4 Logistic Regression
Figure: Differences between Perceptron and Adaline
Training process
Algorithm 3 Pseudocode for the training process
1: Initialize the weights, w
2: while stopping criteria is not satisfied do
3: for x ∈ X do
4: Compute the output value, ŷ
5: Update the weights
6: end for
7: end while
Updating the weights
w = w + ∆w
∆wi = η ∗ (y − ŷ) ∗ xi
where:
η: the learning rate
y: the true class label
ŷ: the continuous output of the activation, φ(z), not the thresholded label
Examples
∆w0 = η ∗ (y − ŷ)
∆w1 = η ∗ (y − ŷ) ∗ x1
∆w2 = η ∗ (y − ŷ) ∗ x2
Components
Hyperparameters
eta
max_iter
random_state
Parameters
w
costs
Methods
fit(X, y)
predict(X)
net_input(X)
activation(X)
Implement (code from scratch)
class Adaline:
    def __init__(self, eta=0.01, max_iter=50, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def net_input(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

    def activation(self, X):
        # linear activation: the identity of the net input
        return self.net_input(X)

    def predict(self, X):
        return np.where(self.activation(X) >= 0.0, 1, -1)
    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iter in range(self.max_iter):
            # shuffle the training data every epoch
            idx = rgen.permutation(len(y))
            X, y = X[idx], y[idx]
            cost = 0
            for xi, yi in zip(X, y):
                # Adaline updates on the continuous activation output,
                # not on the thresholded prediction
                error = yi - self.activation(xi)
                self.w[1:] += self.eta * error * xi
                self.w[0] += self.eta * error
                cost += error ** 2
            cost /= 2
            self.costs.append(cost)
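A usage sketch, assuming the standardized iris data (X_train_std, y_train) from the Perceptron practice; standardization matters here, since gradient descent on unscaled features typically needs a much smaller η to converge:

>> ada = Adaline(eta=0.01, max_iter=15, random_state=1)
ada.fit(X_train_std, y_train)
>> plt.plot(range(1, len(ada.costs) + 1), ada.costs, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Sum-squared-error')
plt.show()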
Table of contents
1 Perceptron
2 Linear Regression
3 Adaptive Linear Neuron (Adaline)
4 Logistic Regression
Figure: Differences between Adaline and Logistic regression
Components
Hyperparameters
eta
max_iter
random_state
Parameters
w
costs
Methods
fit(X, y)
predict(X)
net_input(X)
activation(X)
Implement (code from scratch)
class LogisticRegression:
    def __init__(self, eta=0.01, max_iter=50, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def net_input(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

    def activation(self, X):
        # sigmoid, clipped to avoid overflow in np.exp
        return 1. / (1. + np.exp(-np.clip(self.net_input(X), -250, 250)))

    def predict(self, X):
        return np.where(self.activation(X) >= 0.5, 1, 0)
    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iter in range(self.max_iter):
            output = self.activation(X)
            errors = y - output
            self.w[1:] += self.eta * X.T.dot(errors)
            self.w[0] += self.eta * errors.sum()
            # negative log-likelihood cost
            cost = -y.dot(np.log(output)) - (1 - y).dot(np.log(1 - output))
            self.costs.append(cost)
Practice
>> clf_LR = LogisticRegression(eta=0.01, max_iter=20, random_state=1)
clf_LR.fit(X_train, y_train)
>> y_pred = clf_LR.predict(X_test)
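Because the activation is the sigmoid, its output can be read as the probability of class 1; a quick sketch with the trained model, assuming the labels were re-encoded as 0/1 for this section:

>> clf_LR.activation(X_test[:3])   # P(y = 1 | x) for the first three test samples
>> clf_LR.predict(X_test[:3])      # the same samples thresholded at 0.5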
Implement (library)
Syntax (import)
from sklearn.linear_model import LogisticRegression
Examples
>> from sklearn.linear_model import LogisticRegression as LogisticRegression_lib
# alias avoids clashing with the from-scratch class of the same name
>> clf_LR_lib = LogisticRegression_lib(random_state=1)
clf_LR_lib.fit(X_train, y_train)
>> y_pred_lib1 = clf_LR_lib.predict(X_test)
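To compare the from-scratch and the library model, a simple accuracy check with accuracy_score from sklearn.metrics:

>> from sklearn.metrics import accuracy_score
>> print('Accuracy (from scratch):', accuracy_score(y_test, y_pred))
print('Accuracy (library):', accuracy_score(y_test, y_pred_lib1))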