Machine learning (1)
DATA ENGINEER - the job of a data engineer is to gather information from different places and
place it in a common environment.
(Data cost example: 1 GB - 800 Rs; 15 GB - 800 Rs.)
DATA SCIENTIST -
Applies mathematical statistics to the data and tries to identify relationships within it.
MACHINE LEARNING -
A set of mathematical equations/formulas which, when applied to a given set of inputs, can predict
the output.
Linear Regression (early 1800s) -
Y = c + mx
When you have more than one input and one output - multiple linear regression:
Y = c + m1x1 + m2x2 + ... + mnxn
In B-notation: Y = B0 + B1X
Y = 3.3518 + 0.0527 X
Model building:
Imagine you have 100 records: split 70 records for training and the remaining 30 records for
testing.
Using the above formula, we are going to calculate B0 and B1 from the training records and predict the output (a sketch follows below).
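A minimal sketch of this flow with scikit-learn; the data here is made up (the 3.3518 and 0.0527 just echo the fitted line above), and the 70/30 split matches the example:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up data standing in for the 100 records in the example.
X = np.random.rand(100, 1)
y = 3.3518 + 0.0527 * X[:, 0] + 0.01 * np.random.randn(100)

# 70 records for training, 30 for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print(model.intercept_, model.coef_[0])   # B0 and B1 learned from training
print(model.score(X_test, y_test))        # R^2 on the 30 held-out records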
ERROR:
Sensitivity - out of the positive outcomes, how many the model predicted correctly.
Specificity - out of the negative outcomes, how many the model predicted correctly.
If the model performs very well in training but shows much more variance/error in testing,
then it is overfitting.
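A small sketch of computing both metrics from made-up true/predicted labels:

import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])   # made-up actual outcomes
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])   # made-up model predictions

tp = np.sum((y_true == 1) & (y_pred == 1))    # positives predicted correctly
fn = np.sum((y_true == 1) & (y_pred == 0))    # positives missed
tn = np.sum((y_true == 0) & (y_pred == 0))    # negatives predicted correctly
fp = np.sum((y_true == 0) & (y_pred == 1))    # negatives flagged wrongly

print(tp / (tp + fn))   # sensitivity: 3 of 4 positives caught -> 0.75
print(tn / (tn + fp))   # specificity: 3 of 4 negatives caught -> 0.75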
Column (feature) selection techniques:
1, Correlation - check correlated columns and drop the redundant ones.
2, Remove columns that have no influence on any other column, like Employee Id or Employee Name.
3, Derived metrics.
4, All the data-analysis steps you learned, e.g. removing outliers.
Hypothesis testing - how it is useful in column selection:
The null hypothesis corresponding to each p-value is that the corresponding independent variable does
not impact the dependent variable.
The alternate hypothesis is that the corresponding independent variable impacts the dependent variable.
The p-value is the probability of seeing data this extreme if the null hypothesis were true. Therefore a low
p-value, i.e. less than 0.05, indicates that you can reject the null hypothesis.
p > 0.05 - fail to reject the null hypothesis - the independent variable does not impact the dependent variable (candidate to drop)
p < 0.05 - reject the null hypothesis - the independent variable impacts the dependent variable (keep the column)
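A sketch of reading these p-values from a statsmodels OLS fit; the data is random and made up (column 0 drives y, column 1 is pure noise):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))            # column 0 matters, column 1 is noise
y = 2.0 * X[:, 0] + rng.normal(size=100)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.pvalues)   # p < 0.05 -> keep the column; p > 0.05 -> candidate to drop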
5, Normalizing
6, Regularization models (lasso and ridge, discussed below)
7, VIF - Variance Inflation Factor
Y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
VIF thresholds for deciding whether to keep a column:
VIF < 2 - very, very good fit; you can keep the column.
VIF < 5 - very good; you can keep the column.
VIF < 10 - okay; you can keep the column.
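A sketch of computing VIF per column with statsmodels; the data is made up, with x3 built mostly from x1 so its VIF comes out high:

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["x3"] = 0.9 * df["x1"] + 0.1 * rng.normal(size=100)   # nearly a copy of x1

for i, col in enumerate(df.columns):
    print(col, variance_inflation_factor(df.values, i))   # x1 and x3 come out high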
8, RFE - Recursive Feature Elimination:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
rfe = RFE(LinearRegression(), n_features_to_select=5)   # estimator and feature count are example choices
rfe.fit(X_train, y_train)   # reusing X_train/y_train from the sketch above
print(rfe.support_)    # True for the columns RFE keeps
print(rfe.ranking_)    # 1 = selected; larger = eliminated earlier
Lasso - L1 Regularization:
1, Initialize b0.
2, Initialize b1.
3, y_pred = b0 + b1x
4, error = (1/n) * sum((y - y_pred)^2) + penalty term
Penalty term = alpha * (sum of the absolute values of all coefficients)
Alpha - a number chosen by the user (e.g. 1.0, 0.1, 0.001)
5, Gradient descent:
b0_new = b0_old - (learning_rate * d(error)/d(b0))
b1_new = b1_old - (learning_rate * d(error)/d(b1))
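A minimal sketch of these lasso steps as a gradient-descent loop, using the small x/y table at the end of these notes; alpha and the learning rate are example values, and the intercept b0 is left unpenalized (the usual convention):

import numpy as np

x = np.array([2, 3, 4, 5, 6], dtype=float)
y = np.array([3, 4, 2, 5, 7], dtype=float)
b0, b1 = 0.0, 0.0          # steps 1-2: initialize b0 and b1
alpha, lr = 0.1, 0.01      # example penalty strength and learning rate

for _ in range(5000):
    y_pred = b0 + b1 * x                                        # step 3
    error = np.mean((y - y_pred) ** 2) + alpha * abs(b1)        # step 4: MSE + L1 penalty
    d_b0 = -2 * np.mean(y - y_pred)                             # step 5: gradients
    d_b1 = -2 * np.mean((y - y_pred) * x) + alpha * np.sign(b1)
    # (for ridge/L2 the penalty part of d_b1 would be 2 * alpha * b1 instead)
    b0 -= lr * d_b0
    b1 -= lr * d_b1

print(b0, b1)   # lands near the plain least-squares fit (0.6, 0.9), slightly shrunk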
Ridge - L2 Regularization:
1, Initialize b0.
2, Initialize b1.
3, y_pred = b0 + b1x
4, error = (1/n) * sum((y - y_pred)^2) + penalty term
Penalty term = alpha * (sum of the squares of all coefficients)
Alpha - a number chosen by the user (e.g. 1.0, 0.1, 0.001)
5, Gradient descent:
b0_new = b0_old - (learning_rate * d(error)/d(b0))
b1_new = b1_old - (learning_rate * d(error)/d(b1))
Note: in the process of reducing the error by adding a penalty term, both lasso and ridge reduce the
coefficient values. In the process of reducing the coefficients:
Lasso - will make some of the coefficients exactly zero (unfit for some business use cases - my perspective).
Ridge - will make some of the coefficients closer to zero, but not exactly zero.
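A quick sketch of that difference using sklearn's Lasso and Ridge on made-up data where only the first column really matters; alpha=1.0 is just an example value:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + rng.normal(size=100)   # only the first column drives y

print(Lasso(alpha=1.0).fit(X, y).coef_)  # weak coefficients become exactly 0
print(Ridge(alpha=1.0).fit(X, y).coef_)  # weak coefficients stay small but nonzero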
Polynomial regression -
In lasso or ridge regression, I initialized b0 and b1 with zero, then did gradient descent and reduced the
error.
In lasso and ridge we added a penalty term to the error formula.
In plain linear regression's error there is no penalty term; the penalty term is only for ridge and lasso.
Sample data:
x    y
2    3
3    4
4    2
5    5
6    7
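A worked least-squares fit of Y = B0 + B1X on this table, using b1 = sum((x - mean x)(y - mean y)) / sum((x - mean x)^2) and b0 = mean y - b1 * mean x:

import numpy as np

x = np.array([2, 3, 4, 5, 6], dtype=float)
y = np.array([3, 4, 2, 5, 7], dtype=float)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)   # 0.6 0.9  ->  Y = 0.6 + 0.9*X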