Chapter8 Regression Exercises v2 20230112

The document provides exercises on linear and logistic regression, including parameter estimation using normal equations and predicting values based on a regression line. It details a practical application using a dataset to predict house prices in Ames, Iowa, and outlines steps for training a logistic regression model for wine classification. The document serves as a guide for implementing regression techniques in data science.

Uploaded by

maha.kandadai

Regressions:

Exercise
Exercise 1
Linear Regression

Guide to Intelligent Data Science Second Edition, 2020 2


Linear Regression

- Consider the following dataset:

x    1    2    6    4    5
y    1    2    4    3    3

1. Calculate the parameters of the linear regression using the normal
   equations given in the lecture. Don’t forget the solution approach.
2. Use the regression line to predict (or calculate) the values of the
   function for all .



1. Parameter Estimation

Calculate the parameters of the linear regression using the normal equations
given in the lecture.

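Data and sums

         i=1   i=2   i=3   i=4   i=5   Sum
x         1     2     6     4     5     18
y         1     2     4     3     3     13
x^2       1     4    36    16    25     82
x*y       1     4    24    12    15     56

Normal equations (for the line $y = a + bx$ fitted to $n = 5$ points)

$$n\,a + b\sum_i x_i = \sum_i y_i, \qquad a\sum_i x_i + b\sum_i x_i^2 = \sum_i x_i y_i$$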


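Inserting the sums into the normal equations:

$$5a + 18b = 13, \qquad 18a + 82b = 56$$

Solving the second equation for $a$:

$$a = \frac{56 - 82b}{18}$$

Substituting into the first equation:

$$5\cdot\frac{56}{18} - 5\cdot\frac{82b}{18} + 18b = 13 \quad\Longrightarrow\quad \frac{(324 - 410)\,b}{18} = \frac{234 - 280}{18}$$

$$86b = 46 \quad\Longrightarrow\quad b = \frac{46}{86} = \frac{23}{43} \approx 0.5349$$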


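Inserting $b = \frac{23}{43}$ into the expression for $a$:

$$a = \frac{56 - 82b}{18} = \frac{56}{18} - \frac{82}{18}\cdot\frac{23}{43} = \frac{2408 - 1886}{774} = \frac{522}{774} = \frac{29}{43} \approx 0.6744$$

Regression line

$$y = \frac{29}{43} + \frac{23}{43}\,x \approx 0.6744 + 0.5349\,x$$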
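The hand calculation can be cross-checked with a short script. The following is a minimal sketch, not part of the original slides: it builds the sums from the dataset and solves the two normal equations with Cramer's rule, using exact fractions so the result can be compared directly with 29/43 and 23/43. The variable names (`sx`, `sxx`, `sxy`, `det`) are chosen here for illustration.

```python
from fractions import Fraction

# Dataset from Exercise 1
x = [1, 2, 6, 4, 5]
y = [1, 2, 4, 3, 3]

# Sums that appear in the normal equations
n = len(x)
sx = sum(x)                                   # 18
sy = sum(y)                                   # 13
sxx = sum(xi * xi for xi in x)                # 82
sxy = sum(xi * yi for xi, yi in zip(x, y))    # 56

# Normal equations:  n*a + sx*b = sy   and   sx*a + sxx*b = sxy
# Solved with Cramer's rule; Fraction keeps the arithmetic exact.
det = n * sxx - sx * sx
a = Fraction(sy * sxx - sx * sxy, det)   # intercept
b = Fraction(n * sxy - sx * sy, det)     # slope

print(a, b)                                   # 29/43 23/43
print(round(float(a), 4), round(float(b), 4)) # 0.6744 0.5349
```

Cramer's rule is used only because the system is 2x2; the substitution approach shown on the slide gives the same values.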



2. Prediction

Use the regression line to predict (or calculate) the values of the function for all .

Prediction at and
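The points at which the slide evaluates the line were lost in extraction. As an illustration only, the sketch below assumes the five x values of the dataset and evaluates the regression line y = 29/43 + (23/43) x at each of them; the residual column is an addition made here for checking, not something the exercise asks for.

```python
from fractions import Fraction

# Regression line from part 1: y = a + b*x
a = Fraction(29, 43)
b = Fraction(23, 43)

# Assumption: predict at the x values of the original dataset
x = [1, 2, 6, 4, 5]
y = [1, 2, 4, 3, 3]

y_hat = [a + b * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

for xi, yi, yh in zip(x, y, y_hat):
    print(f"x={xi}  observed y={yi}  predicted y={float(yh):.4f}")

# For a least-squares line with an intercept, the residuals sum to zero.
print("sum of residuals:", sum(residuals))
```

With exact fractions the residuals sum to exactly zero, which is a quick sanity check that the parameters solve the first normal equation.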



Exercise 2
Practice with KNIME



1. Linear Regression

Predict the price of a house in Ames (Iowa, USA) given a number of features
(size, neighborhood, heating...) using Linear Regression.

1. Read dataset AmesHousing_simple.csv. It contains information about houses
   sold in Ames (only numerical values) as well as the SalePrice.
2. Add Partitioning node to File Reader output
   - Top port should have 70 % of the rows
   - Draw these rows randomly
3. Add Linear Regression Learner to top output port of Partitioning node
   - Select price column to be learned
   - Execute the node and open its scatter plot view. Which column is most
     correlated to the price (column selection tab)?
4. Add Regression Predictor
   - Predict test set (remaining 30 % of the rows) by simply connecting the
     remaining unconnected output ports
5. Remove rows with missing prediction
6. Add Numeric Scorer to Regression Predictor output
   - Reference Column: the column you learned
   - Predicted Column: the new column created by the predictor node
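The exercise itself is meant to be solved with KNIME nodes. For readers who want to see the same sequence of steps outside KNIME, here is a rough Python sketch using only the standard library. It is a stand-in, not the KNIME workflow: the data is synthetic (a single hypothetical `area` feature instead of the columns of AmesHousing_simple.csv), and R² and mean absolute error are computed by hand as examples of scoring metrics.

```python
import random

random.seed(0)

# Synthetic stand-in for AmesHousing_simple.csv (assumption: one feature).
# Each row is (area, sale_price); the true relation is chosen arbitrarily.
rows = []
for _ in range(200):
    area = random.uniform(50.0, 300.0)
    price = 20000.0 + 1000.0 * area + random.gauss(0.0, 10000.0)
    rows.append((area, price))

# Step 2: random partition, 70 % training rows and 30 % test rows
random.shuffle(rows)
cut = int(0.7 * len(rows))
train, test = rows[:cut], rows[cut:]

def fit_line(points):
    """Least-squares fit of y = a + b*x via the normal equations."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Step 3: learn the model on the training partition
a, b = fit_line(train)

# Step 4: predict the test partition
# (Step 5 is a no-op here: the synthetic data has no missing values.)
actual = [price for _, price in test]
predicted = [a + b * area for area, _ in test]

# Step 6: score predictions against the reference column
mean_actual = sum(actual) / len(actual)
ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
ss_tot = sum((y - mean_actual) ** 2 for y in actual)
r2 = 1.0 - ss_res / ss_tot
mae = sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)

print(f"intercept={a:.1f} slope={b:.1f} R^2={r2:.3f} MAE={mae:.1f}")
```

With the real dataset there are several numeric columns, so the fit becomes a multiple linear regression; the partition, predict, and score steps stay the same.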




2. Logistic Regression

Train a Logistic Regression model that predicts whether a wine is red or white.

1. Read the data file wine.csv (Hint: drag and drop)
2. Use the Normalizer (PMML) node to z-normalize all numerical columns
3. Partition the dataset into a training set (80%) and a test set (20%).
   a. Apply stratified sampling on the color column.
4. Train a logistic regression model on the training set, and apply the
   model to the test set
5. Use the Scorer node to evaluate the accuracy of the model
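The steps above can also be sketched in plain Python to make each stage explicit. This is an illustration under stated assumptions, not the KNIME solution: the data is synthetic (two hypothetical numeric features per wine instead of the columns of wine.csv), and the logistic regression is trained with a simple batch gradient descent, which is not necessarily the solver a KNIME learner node would use.

```python
import math
import random

random.seed(1)

# Synthetic stand-in for wine.csv (assumption): two numeric features, one label
def make_rows(color, center, count):
    return [([random.gauss(center, 1.0), random.gauss(center, 1.0)], color)
            for _ in range(count)]

rows = make_rows("red", 0.0, 100) + make_rows("white", 3.0, 300)
n_features = 2

# Step 2: z-normalize every numeric column (subtract mean, divide by std)
for j in range(n_features):
    col = [features[j] for features, _ in rows]
    mean = sum(col) / len(col)
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / (len(col) - 1))
    for features, _ in rows:
        features[j] = (features[j] - mean) / std

# Step 3: stratified 80/20 split, done separately per color
train, test = [], []
for color in ("red", "white"):
    group = [r for r in rows if r[1] == color]
    random.shuffle(group)
    cut = int(0.8 * len(group))
    train += group[:cut]
    test += group[cut:]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 4: logistic regression via batch gradient descent (white = 1, red = 0)
w = [0.0] * n_features
bias = 0.0
learning_rate = 0.5
for _ in range(300):
    grad_w = [0.0] * n_features
    grad_b = 0.0
    for features, color in train:
        target = 1.0 if color == "white" else 0.0
        z = bias + sum(wj * fj for wj, fj in zip(w, features))
        error = sigmoid(z) - target
        for j in range(n_features):
            grad_w[j] += error * features[j]
        grad_b += error
    w = [wj - learning_rate * g / len(train) for wj, g in zip(w, grad_w)]
    bias -= learning_rate * grad_b / len(train)

def predict(features):
    z = bias + sum(wj * fj for wj, fj in zip(w, features))
    return "white" if sigmoid(z) >= 0.5 else "red"

# Step 5: accuracy on the held-out test set
correct = sum(predict(features) == color for features, color in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.3f}")
```

Splitting per color keeps the red/white ratio the same in both partitions, which is the point of stratified sampling when the classes are imbalanced.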




Thank you
For any questions please contact: [email protected]
