Machine learning with Python
ESCP-Paris 2021
Slides (or images, contents) adapted from D. Dligach, C. Müller, E.
Duchesnay, M. Defferrard, E. Eaton, S. Sankararaman and many others (who
made their course materials freely available online).
Anh-Phuong TA
Chief data scientist at Le Figaro CCM-Benchmark group
[email protected]
Exercise
Testing kNN with the Boston dataset
A little bit …
Today’s lecture
• KNN (last time)
• Linear regression
• Model evaluation
• How to avoid overfitting
KNN
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load the iris data and hold out a test set
iris_dataset = load_iris()
X = iris_dataset.data
y = iris_dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    random_state=0)

# Fit a 1-nearest-neighbor classifier and predict on the test set
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Both lines print the same test accuracy
print("Score: {:.2f}".format(np.mean(y_pred == y_test)))
print("Score: {:.2f}".format(knn.score(X_test, y_test)))
Overfitting & Underfitting
Avoid overfitting
• Reduce the number of features manually or do feature
selection.
• Do a model selection.
• Use regularization (keep the features but reduce their
importance by setting small parameter values).
• Do a cross-validation to estimate the test error.
Cross-validation
pro: more stable, more data
con: slower
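A minimal sketch of k-fold cross-validation, assuming the iris data and a kNN classifier as in the earlier KNN slide (the choice of 5 folds and n_neighbors=3 is only illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: every sample is used for testing exactly once
knn = KNeighborsClassifier(n_neighbors=3)
scores = cross_val_score(knn, X, y, cv=5)

print("Fold accuracies:", scores)
print("Mean accuracy: {:.2f} (std {:.2f})".format(scores.mean(), scores.std()))
```

The spread of the fold scores gives a rough idea of how stable the estimate is; the price is fitting the model once per fold.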
GridSearchCV
GridSearchCV results
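A possible sketch of a grid search and of reading its results, assuming a kNN classifier on iris (the candidate values for n_neighbors are hypothetical):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate hyperparameter values (illustrative choice)
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}

# Each candidate is scored with 5-fold cross-validation on the training set
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5,
                    return_train_score=True)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best CV score: {:.3f}".format(grid.best_score_))
print("Test score: {:.3f}".format(grid.score(X_test, y_test)))

# cv_results_ holds one entry per candidate
results = grid.cv_results_
for k, train, test in zip(results["param_n_neighbors"],
                          results["mean_train_score"],
                          results["mean_test_score"]):
    print("k={}: train={:.3f}, validation={:.3f}".format(k, train, test))
```

The test set is touched only once, after the search, so the reported test score is not biased by the hyperparameter choice.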
CV strategies
Stratified: Ensure relative class frequencies in each fold reflect
relative class frequencies on the whole dataset.
Repeated KFold and LeaveOneOut
• LeaveOneOut: KFold(n_splits=n_samples). High
variance, takes a long time.
• Better: RepeatedKFold. Apply KFold or
StratifiedKFold multiple times with shuffled data.
Reduces variance!
StratifiedShuffleSplit
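The strategies above can be compared on a toy problem. This is only a sketch: the imbalanced labels (90 vs. 10) and the helper positive_fractions are hypothetical, made up for illustration.

```python
import numpy as np
from sklearn.model_selection import (
    KFold,
    LeaveOneOut,
    RepeatedStratifiedKFold,
    StratifiedKFold,
    StratifiedShuffleSplit,
)

# Hypothetical imbalanced labels: 90 samples of class 0, then 10 of class 1
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # feature values do not matter for splitting


def positive_fractions(cv):
    """Fraction of class-1 samples in each test fold produced by cv."""
    return [float(y[test_idx].mean()) for _, test_idx in cv.split(X, y)]


# Plain KFold on sorted labels: all positives end up in the last fold
kfold_fracs = positive_fractions(KFold(n_splits=5))
# Stratified variants keep the 10% positive rate in every test fold
strat_fracs = positive_fractions(StratifiedKFold(n_splits=5))
shuffle_fracs = positive_fractions(
    StratifiedShuffleSplit(n_splits=4, test_size=0.2, random_state=0))

print("KFold:                 ", kfold_fracs)
print("StratifiedKFold:       ", strat_fracs)
print("StratifiedShuffleSplit:", shuffle_fracs)

# Number of model fits each strategy requires
repeated = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
print("Repeated (5 folds x 3):", repeated.get_n_splits(X, y))
print("LeaveOneOut:           ", LeaveOneOut().get_n_splits(X))
```

LeaveOneOut needs one fit per sample, which is why it is slow on anything but small datasets.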
Using Cross-Validation Generators
cross_validate Function
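A short sketch of passing a cross-validation generator to cross_validate, assuming a kNN classifier on iris; the two metrics are chosen only as examples:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Any splitter object can be passed as cv instead of an integer
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

# cross_validate returns a dict with timings and one score array per metric
res = cross_validate(KNeighborsClassifier(n_neighbors=5), X, y, cv=cv,
                     scoring=["accuracy", "f1_macro"],
                     return_train_score=True)

print(sorted(res.keys()))
print("Validation accuracy: {:.3f}".format(res["test_accuracy"].mean()))
print("Train accuracy:      {:.3f}".format(res["train_accuracy"].mean()))
```

Unlike cross_val_score, cross_validate can report several metrics, training scores, and fit times in one call.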
Feature engineering: scaling
Standard Scaler Example
from sklearn.preprocessing import StandardScaler, RobustScaler, MinMaxScaler, Normalizer
Standard Scaler + pipeline
Pipeline
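A sketch of standard scaling, first by hand and then inside a pipeline. The wine dataset and the kNN classifier are assumptions picked for illustration, because kNN is sensitive to feature scales.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# By hand: learn mean and std on the training set only, then transform
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
print("Column means after scaling:", np.round(X_train_scaled.mean(axis=0), 2))
print("Column stds after scaling: ", np.round(X_train_scaled.std(axis=0), 2))

# In a pipeline: the scaler is re-fit inside every CV training fold,
# so no information from the validation fold leaks into the scaling
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
raw = cross_val_score(KNeighborsClassifier(), X_train, y_train, cv=5).mean()
scaled = cross_val_score(pipe, X_train, y_train, cv=5).mean()
print("kNN without scaling: {:.3f}".format(raw))
print("kNN with scaling:    {:.3f}".format(scaled))
```

Scaling the whole dataset before cross-validation would leak validation statistics into training; the pipeline avoids that.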
Categorical variables (credit: C. Müller)
Ordinal encoding
One-Hot (Dummy) Encoding
Categorical columns with Pandas
Or you can use:
from sklearn.preprocessing import OneHotEncoder
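Both options can be sketched on a tiny table. The DataFrame below (columns boro and salary) is hypothetical data made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical and one numeric column
df = pd.DataFrame({
    "boro": ["Manhattan", "Queens", "Brooklyn", "Queens"],
    "salary": [103, 89, 142, 54],
})

# Option 1: pandas creates one indicator column per category
dummies = pd.get_dummies(df, columns=["boro"])
print(dummies.columns.tolist())

# Option 2: scikit-learn learns the categories on the training data
# and can ignore categories it has never seen at prediction time
enc = OneHotEncoder(handle_unknown="ignore")
encoded = enc.fit_transform(df[["boro"]]).toarray()
print(enc.categories_)
print(encoded)

# An unseen category is encoded as an all-zero row
unseen = enc.transform(pd.DataFrame({"boro": ["Bronx"]})).toarray()
print(unseen)
```

get_dummies is convenient for exploration, while OneHotEncoder remembers the training categories, which matters when the test data contains different values.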
Dealing with Missing Values
Among others, Imputation Methods
• Mean / Median
• kNN
• Regression model
• Matrix factorization
Baseline: Dropping Columns
# True for every column that contains at least one NaN
nan_columns = np.any(np.isnan(X_train), axis=0)
X_drop_columns = X_train[:, ~nan_columns]
And then, use X_drop_columns to train your model.
Imputation: Median, Mean
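A minimal sketch of mean and median imputation with scikit-learn's SimpleImputer. The small X_train array is hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical training data with missing entries
X_train = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [np.nan, 30.0],
    [9.0, 50.0],
])

# Each NaN is replaced by the mean (or median) of its column
mean_imp = SimpleImputer(strategy="mean")
X_mean = mean_imp.fit_transform(X_train)

median_imp = SimpleImputer(strategy="median")
X_median = median_imp.fit_transform(X_train)

print("Column means:  ", mean_imp.statistics_)    # [ 4. 30.]
print("Column medians:", median_imp.statistics_)  # [ 2. 30.]
print(X_mean)
print(X_median)
```

The outlier 9.0 pulls the mean of the first column up to 4.0 while the median stays at 2.0, which is why the median is often preferred for skewed features.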
Linear regression
• If your data:
Good to use linear regression, and
our goal is to find:
Note that: if there is more than one variable
=> multiple linear regression
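A minimal sketch of fitting a line with scikit-learn. The synthetic data (true slope 3, intercept 2, Gaussian noise) is an assumption made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 3x + 2 plus noise (hypothetical values)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=100)

# scikit-learn expects a 2D feature matrix of shape (n_samples, n_features)
model = LinearRegression().fit(x.reshape(-1, 1), y)

print("Estimated slope:     {:.2f}".format(model.coef_[0]))
print("Estimated intercept: {:.2f}".format(model.intercept_))
```

With several input columns the same call fits a multiple linear regression, and coef_ then holds one weight per feature.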
Linear regression
• If your data:
Hmm!!!
Or do some data transformations first
Loss/Cost function
It compares all the predictions against their actual
values and provides us with a score value.
Training and Testing
=> f(x_i) is used interchangeably with h(x_i)
Training and Testing: linear regression
=> z is used interchangeably with phi
Linear regression: loss functions
Loss/Cost function
ML algorithms often define an objective function.
This function is optimized during learning.
It is often a cost function we want to minimize.
The function J below is the sum of squared errors (SSE); the weights are learned by minimizing it.
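A sketch of the SSE cost for a linear hypothesis h(x) = w · x. The factor 1/2 and the convention of a leading column of ones for the bias are assumptions; the slide's own formula may use a different scaling.

```python
import numpy as np


def predict(X, w):
    """Linear hypothesis h(x) = w . x for every row of X."""
    return X @ w


def sse_cost(X, y, w):
    """J(w) = 1/2 * sum_i (h(x_i) - y_i)^2 (the 1/2 is an assumed convention)."""
    residuals = predict(X, w) - y
    return 0.5 * np.sum(residuals ** 2)


# Toy data generated by y = 1 + 2x; the first column of ones carries the bias
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

print(sse_cost(X, y, np.array([1.0, 2.0])))  # 0.0: the true weights fit exactly
print(sse_cost(X, y, np.array([0.0, 2.0])))  # 1.5: every prediction is off by 1
```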
Learning as optimization
The fundamental difficulty of machine learning
Picture was taken from some ML courses at Stanford
How to optimize?
What is the gradient?
Gradient descent?
Gradient descent: an intuition
Gradient descent
Gradient computation
We update all weights simultaneously:
Partial derivatives
Whiteboard!!!!
What should the step size be?
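A sketch of batch gradient descent for linear regression, assuming the cost J(w) = 1/2 * sum of squared errors, whose gradient is X^T (Xw - y). The toy data and the two step sizes are hypothetical, chosen to show one run that converges and one that diverges.

```python
import numpy as np


def gradient(X, y, w):
    """Gradient of J(w) = 1/2 * ||Xw - y||^2 with respect to w."""
    return X.T @ (X @ w - y)


def gradient_descent(X, y, alpha, n_iters):
    """Start at w = 0 and repeatedly step against the gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # All weights are updated simultaneously from the same gradient
        w = w - alpha * gradient(X, y, w)
    return w


# Noise-free toy data from y = 1 + 2x, with a column of ones for the bias
x = np.linspace(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

# A small step size converges to the true weights [1, 2]
w_good = gradient_descent(X, y, alpha=0.01, n_iters=2000)
print("alpha=0.01:", w_good)

# A step size that is too large overshoots more at every step and diverges
w_bad = gradient_descent(X, y, alpha=0.05, n_iters=100)
print("alpha=0.05:", w_bad)
```

If the step size is too small, convergence is slow; if it is too large, each update overshoots the minimum and the weights blow up.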
Stochastic gradient descent (SGD)
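A sketch of SGD for linear regression with the squared-error loss, where each update uses the gradient of a single sample. The toy data, step size, and number of epochs are assumptions for illustration.

```python
import numpy as np


def sgd(X, y, alpha, n_epochs, rng):
    """Stochastic gradient descent: one weight update per training sample."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        # Visit the samples in a fresh random order every epoch
        for i in rng.permutation(len(y)):
            error = X[i] @ w - y[i]
            # Gradient of 1/2 * (x_i . w - y_i)^2 is error * x_i
            w = w - alpha * error * X[i]
    return w


# Noise-free toy data from y = 1 + 2x, with a column of ones for the bias
x = np.linspace(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

rng = np.random.default_rng(0)
w_sgd = sgd(X, y, alpha=0.1, n_epochs=200, rng=rng)
print("SGD weights:", w_sgd)
```

Each update is much cheaper than a full-batch gradient step. On noisy data the weights keep fluctuating around the minimum unless the step size is decreased over time.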
Avoid overfitting
• Reduce the number of features manually or do feature
selection.
• Do a model selection.
• Use regularization (keep the features but reduce their
importance by setting small parameter values).
• Do a cross-validation to estimate the test error.
Avoid overfitting: Regularization
Idea: regularized Empirical Risk Minimization
Ridge Regression (L2)
Lasso (least absolute shrinkage and selection operator) (L1)
Understanding L1 and L2 Penalties
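A sketch contrasting the two penalties on synthetic data. The make_regression settings (20 features, only 5 informative) and alpha=1.0 are assumptions picked for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic problem: only 5 of the 20 features actually influence y
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# L2 penalty: sum of squared weights; L1 penalty: sum of absolute weights
l2_penalty = np.sum(ridge.coef_ ** 2)
l1_penalty = np.sum(np.abs(lasso.coef_))
print("L2 penalty of ridge weights: {:.1f}".format(l2_penalty))
print("L1 penalty of lasso weights: {:.1f}".format(l1_penalty))

# Ridge shrinks weights towards zero; lasso sets some exactly to zero
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
print("Exactly-zero weights, ridge:", n_zero_ridge)
print("Exactly-zero weights, lasso:", n_zero_lasso)
```

Because lasso drives some weights to exactly zero, it also acts as a feature selector, whereas ridge keeps all features with smaller weights.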
Example
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression, Lasso
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version
from sklearn.datasets import load_boston

boston = load_boston()
X, y = boston.data, boston.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Baseline: mean 10-fold cross-validated R^2 of plain linear regression
score = cross_val_score(LinearRegression(), X_train, y_train, cv=10)
print(np.mean(score))

# Grid-search the Ridge regularization strength alpha on a log scale
param_grid = {'alpha': np.logspace(-3, 3, 13)}
print(param_grid)
# The iid argument was removed in scikit-learn 0.24, so it is dropped here
grid = GridSearchCV(Ridge(), param_grid, cv=10, return_train_score=True)
grid.fit(X_train, y_train)
print(grid.best_params_)
print(grid.best_score_)
L1 + L2 = Elastic Net
In sklearn
Grid-searching ElasticNet
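A possible sketch of grid-searching ElasticNet over both of its hyperparameters. The diabetes dataset and the grid values are assumptions chosen so the example runs on current scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# alpha: overall penalty strength
# l1_ratio: mix between L1 and L2 (1.0 is pure lasso, near 0 is close to ridge)
param_grid = {
    "alpha": np.logspace(-4, 1, 6),
    "l1_ratio": [0.1, 0.5, 0.9, 1.0],
}

grid = GridSearchCV(ElasticNet(max_iter=10000), param_grid, cv=10)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best CV R^2:  {:.3f}".format(grid.best_score_))
print("Test set R^2: {:.3f}".format(grid.score(X_test, y_test)))
```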
Assignment 3