Applied Linear Regression
References:
“Applied Linear Regression in Matlab.” Accessed: May 17, 2023. [Online]. Available: https://courses.engr.illinois.edu/bioe298b/sp2018/Course%20Notes%20%28Text%29/Applied_Linear_Regression.pdf
“Chapter 2 Polynomial Interpolation §2.1 The Vandermonde Approach §2.2 The Newton Approach §2.3 Properties §2.4 Special Topics.” Accessed: May 17, 2023. [Online]. Available: https://www.cs.cornell.edu/courses/cs4210/2015fa/CVLBook/CVL2.PDF
The independent variable x holds the acidic pH levels from 1 to 6, in increments of one pH unit. The
dependent variable y holds the amount of Fe2O3 dissolved in the corresponding acidic ionic
liquid. Once the amounts are entered, they are tabulated automatically.
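The data entry described above can be sketched as follows. The y values are the six amounts quoted in this section; storing both variables as column vectors is an assumption, made so that they can be placed directly into a design matrix.

```matlab
% pH levels 1 through 6 as a column vector (independent variable)
x = (1:6)';
% Amount of Fe2O3 dissolved, in ppm (dependent variable)
y = [112; 119; 125; 133; 141; 156];
% Show the two variables side by side as a simple two-column table
disp([x y])
```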
Once the data set is established, a scatter plot is drawn. The plot shows whether the variables x
and y have a positive or a negative relationship.
scatter(x,y)
xlabel('x')
ylabel('y')
A positive relationship shows up as data points that rise toward the right. For example, the
amounts of Fe2O3 dissolved increase as 112, 119, 125, 133, 141 and 156 ppm, so both the
tabulation and the scatter plot show y rising with x.
Clearly there is some positive relationship between x and y. Let's begin by fitting the simplest
linear model, $y = \beta_1 x$.
In this case there is only one parameter to estimate ($\beta_1$), and the design matrix has only a
single column containing the 6 x values.
X = [x];
The general linear model $y = X\beta$ can be solved for $\beta$ by finding the pseudoinverse of the
design matrix $X$. Our estimate for $\beta$ can then be found via matrix multiplication. Solve for the
parameter estimates by pseudoinversion (pinv(X)*y), or, equivalently, using the backslash operator.
b = X \ y
b = 31.8462
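The equivalence between the two approaches can be checked numerically. The sketch below assumes x and y are the column vectors used in this section; the normal-equations line is an extra comparison added here for illustration and is not part of the original analysis.

```matlab
x = (1:6)';
y = [112; 119; 125; 133; 141; 156];
X = x;                              % design matrix with a single column

b_backslash = X \ y;                % least-squares estimate via backslash
b_pinv      = pinv(X) * y;          % same estimate via the pseudoinverse
b_normal    = (X' * X) \ (X' * y);  % same estimate via the normal equations

% The three estimates should agree to within rounding error
disp([b_backslash b_pinv b_normal])
```

The backslash form is usually preferred in practice because it avoids forming the pseudoinverse explicitly.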
Let's plot our model on the same plot as the original data. First, plot the data.
scatter(x,y)
Then, tell Matlab to "hold on". This prevents Matlab from making a new figure for subsequent plots
(until we tell it to "hold off").
hold on
Now plot a line with the model. The easiest way is to multiply the design matrix by the parameter estimates.
plot(x, X*b)
title('y = \beta_1 x', 'FontSize',18)
hold off
This does not seem to be a great fit. Clearly, an intercept term is needed, giving the model
$y = \beta_0 + \beta_1 x$. The design matrix gains a column of ones:
X = [ones(size(x)) x];
The ones function is used to create a column of ones. The ones function accepts either two
values giving the dimensions (e.g. ones(3,4)) or the size of a similar matrix (ones(size(x))).
b = X \ y
b=
101.6000
8.4000
The vector b now has two entries. The first is our estimate for $\beta_0$, the second is the estimate for $\beta_1$.
scatter(x,y)
hold on
plot(x, X*b)
title('y = \beta_0 + \beta_1 x', 'FontSize',18)
hold off
The line representing our model can also be constructed manually as b(1) + b(2).*x.
Notice two things. First, Matlab indexes vectors starting at one, so b(1) is actually the
estimate for $\beta_0$, not for $\beta_1$. Second, Matlab distinguishes between matrix multiplication (*) and
element-by-element multiplication (.*).
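A short check that the matrix product and the manual construction give the same fitted values, assuming the x and y column vectors used in this section. The names y_matrix and y_manual are introduced here only for illustration.

```matlab
x = (1:6)';
y = [112; 119; 125; 133; 141; 156];
X = [ones(size(x)) x];
b = X \ y;

y_matrix = X * b;             % matrix product: (6x2) times (2x1) gives 6x1
y_manual = b(1) + b(2) .* x;  % b(1) is the intercept, b(2) is the slope

% The two constructions differ only by rounding error
disp(max(abs(y_matrix - y_manual)))
```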
This is still not the best fit. Based on the slight upward curve in the data, a quadratic model
may be appropriate.
Topic 2: Quadratic Regression
Quadratic polynomials can still be fit by linear regression, since all polynomials are linear with
respect to the parameters. To fit a quadratic, add a column to the design matrix that contains the
square of each element in the vector x. (The element-by-element exponentiation operator .^ is used
here; matrix exponentiation is completely different.)
X = [ones(size(x)) x x.^2];
b = X \ y
b=
109.6000
2.4000
0.8571
The vector of estimates now has three entries corresponding to $\beta_0$, $\beta_1$, and $\beta_2$. Let's plot the
quadratic model.
scatter(x,y)
hold on
plot(x, X*b)
title('y = \beta_0 + \beta_1 x + \beta_2 x^2', 'FontSize',18)
hold off
That looks like a much better fit. These data appear to have a quadratic relationship.
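The visual comparison can be backed by a number. The sketch below computes the residual sum of squares (SSE) for the three models fit so far; the variable names X1, X2, X3 and sse1, sse2, sse3 are introduced here for illustration and do not appear in the original analysis.

```matlab
x = (1:6)';
y = [112; 119; 125; 133; 141; 156];

% Design matrices for the three models fit so far
X1 = x;                        % y = beta_1 x
X2 = [ones(size(x)) x];        % y = beta_0 + beta_1 x
X3 = [ones(size(x)) x x.^2];   % y = beta_0 + beta_1 x + beta_2 x^2

% Residual sum of squares for each model (smaller means a closer fit)
sse1 = sum((y - X1 * (X1 \ y)).^2);
sse2 = sum((y - X2 * (X2 \ y)).^2);
sse3 = sum((y - X3 * (X3 \ y)).^2);
fprintf('SSE: %.4f  %.4f  %.4f\n', sse1, sse2, sse3)
```

Adding a column to the design matrix can never increase the SSE, so a smaller value alone does not prove that the quadratic is the right model; it only quantifies how much closer the curve lies to the data.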
Topic 3: Newton’s Interpolating Polynomials
Newton’s polynomial interpolation is another popular way to fit a set of data points exactly.
With four terms, the interpolating polynomial is written in the form
$$y = \beta_0 + \beta_1 (x - x_0) + \beta_2 (x - x_0)(x - x_1) + \beta_3 (x - x_0)(x - x_1)(x - x_2)$$
As with the quadratic model, the Newton form can still be fit by linear regression because it is
linear with respect to the parameters. Each Newton basis term, evaluated at every element of the
vector x, becomes one column of the design matrix. (The element-by-element multiplication
operator .* is used to form the products.)
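The code for this step is not shown in the text. A minimal sketch that is consistent with the reported estimates follows, assuming the nodes x_0, x_1 and x_2 are the first three pH levels (1, 2 and 3); that choice of nodes is an assumption, not something the text states.

```matlab
x = (1:6)';
y = [112; 119; 125; 133; 141; 156];

% Assumed nodes: the first three data points
x0 = x(1); x1 = x(2); x2 = x(3);

% One column per Newton basis term
X = [ones(size(x)), ...
     (x - x0), ...
     (x - x0) .* (x - x1), ...
     (x - x0) .* (x - x1) .* (x - x2)];
b = X \ y
```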
b=
111.9127
7.2381
-0.5595
0.3148
The vector of estimates now has four entries corresponding to $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$. Let's plot the model.
scatter(x,y)
xlabel('acidic pH Level')
ylabel('Amount of Fe2O3 dissolved (ppm)')
hold on
plot(x,X*b)
title('y= \beta_0 + \beta_1 (x-x_0) + \beta_2 (x-x_0)(x-x_1) + \beta_3 (x-x_0)(x-x_1)(x-x_2)','FontSize',12)
hold off
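As observed, the fitted Newton polynomial passes close to every data point. Because this four-term
(cubic) model has fewer coefficients than the six data points, it is a least-squares fit and does
not pass through each point exactly.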
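An exact fit through all six points needs as many Newton terms as there are data points. The sketch below is an addition for illustration, not part of the original analysis: it builds the full six-column Newton basis using every x value as a node and checks that the residuals vanish. The names n and max_residual are hypothetical.

```matlab
x = (1:6)';
y = [112; 119; 125; 133; 141; 156];
n = numel(x);

% Column j holds the product (x - x(1)) ... (x - x(j-1)); column 1 is all ones
X = ones(n, n);
for j = 2:n
    X(:, j) = X(:, j-1) .* (x - x(j-1));
end

b = X \ y;                            % X is square and lower triangular
max_residual = max(abs(y - X * b))    % zero up to rounding error
```

With six coefficients for six points the system is square, so the polynomial (degree five) reproduces the data exactly rather than in the least-squares sense.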