Linear Regression Analysis. Statistics 2 Notes
Definition: Linear Regression Analysis is a forecasting procedure that uses the least squares approach on one or more independent variables to develop a forecasting model, for example for the capacity required by a business in the future.
Synopsis
We begin this chapter with the objective of determining the normal capacity required by a company for the medium to long term. The decision will be based on a demand forecast. When forecasting future demand for capacity planning purposes, fluctuations in demand are expected but, to some extent, ignored. The aim is to get a general picture of future demand. Take as an example the forecast demand pattern shown in Figure 1.1 below.
Where a relationship between variables exists, it can be used for forecasting. The relationship is expressed mathematically by means of a Regression Equation.
1.0 Introduction
1.1.0 Regression Analysis
Linear regression analysis is a very useful tool for today's manager. Two purposes of regression analysis are to understand the relationship between variables and to predict the value of one variable based on another (such as the cost of advertising and sales). Regression has been used to model such things as the relationship between level of education and income, or the price of a house and its square footage.
In any regression model, the variable to be predicted is called the dependent variable
or response variable. The value of this is said to be dependent upon the value of an
independent variable, which is sometimes called an explanatory variable or a
predictor variable.
Table 1.1: Tororo Construction Company Sales and Local Incomes (Payroll)

TCC Sales ($100,000s)    Local Payroll ($100,000,000s)
6                        3
8                        4
9                        6
5                        4
4.5                      2
9.5                      5

The Table 1.1 data for TCC are used to draw the scatter diagram in Figure 1.1 below.
This graph indicates that higher values for local income (payroll) seem to result in higher sales for the company. The relationship is not perfect, because not all the points lie on a straight line, but a relationship clearly exists. A line has been drawn through the data to help show the relationship between income (payroll) and sales. The points do not all lie on the line, so there would be some error involved if we tried to predict sales based on payroll using this or any other line. The question is how to find the line that best fits the data; regression analysis provides the answer.
1.1.4 Estimation of the Slope and Intercept
Estimates of the slope and intercept are found from sample data. The underlying linear model is

Y = β₀ + β₁X + ε

where β₀ is the intercept, β₁ is the slope, and ε is the error. The true values of β₀ and β₁ are not known, so they are estimated using sample data. The regression equation based on sample data is

Ŷ = b₀ + b₁X

Where:
Ŷ = predicted value of Y
b₀ = estimate of β₀, based on sample results
b₁ = estimate of β₁, based on sample results.
Therefore, in the case of Tororo Construction Company, we are trying to predict sales, so the dependent variable (Y) is sales. The variable we use to help predict sales is the Podut area income (payroll), so this is the independent variable (X). Although any number of lines can be drawn through the points in Figure 1.1 above to show a relationship between X and Y, the line that is chosen is the one that in some way minimises the errors. Error is defined as

ε = Y − Ŷ
2.0 Minimisation of Squared Errors
Since errors may be positive or negative, the average error could be zero even though there are extremely large errors, both positive and negative. To eliminate the difficulty of negative errors cancelling positive errors, the errors are squared.
Thus the best regression line is defined as the one with the minimum sum of the squared errors. For this reason, regression analysis is sometimes called least-squares regression.
2.1.0 Statisticians
Statisticians have developed formulas that we can use to find the equation of a straight
line that would minimise the sum of the squared errors. The simple linear regression
equation is
Ŷ = b₀ + b₁X

The following formulas can be used to compute the intercept and the slope:

X̄ = ΣX / n = average (mean) of X values
Ȳ = ΣY / n = average (mean) of Y values
b₁ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²

b₀ = Ȳ − b₁X̄
Table 1.2: Regression Calculations for Tororo Construction Company

Y         X         (X − X̄)²         (X − X̄)(Y − Ȳ)
6         3         (3 − 4)² = 1     (3 − 4)(6 − 7) = 1
8         4         (4 − 4)² = 0     (4 − 4)(8 − 7) = 0
9         6         (6 − 4)² = 4     (6 − 4)(9 − 7) = 4
5         4         (4 − 4)² = 0     (4 − 4)(5 − 7) = 0
4.5       2         (2 − 4)² = 4     (2 − 4)(4.5 − 7) = 5
9.5       5         (5 − 4)² = 1     (5 − 4)(9.5 − 7) = 2.5
ΣY = 42   ΣX = 24   Σ(X − X̄)² = 10   Σ(X − X̄)(Y − Ȳ) = 12.5
Ȳ = 42/6 = 7   X̄ = 24/6 = 4
X̄ = ΣX / n = 24/6 = 4

Ȳ = ΣY / n = 42/6 = 7

b₁ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = 12.5/10 = 1.25

b₀ = Ȳ − b₁X̄

Substituting:

b₀ = 7 − (1.25)(4) = 2

Ŷ = 2 + 1.25X
Or
Sales = 2 + 1.25(Payroll)
If the payroll next year is $600 million (X = 6), then the predicted value would be

Ŷ = 2 + 1.25(6) = 9.5

or sales of $950,000, since sales are measured in $100,000s.
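As a check on the arithmetic, the slope and intercept formulas can be applied in a few lines of code. The following is a minimal Python sketch using only the standard library; the variable names are illustrative, not part of the original example.

```python
# Least-squares slope and intercept for the Tororo Construction data
# (sales in $100,000s, payroll in $100,000,000s, from Table 1.1).
sales = [6, 8, 9, 5, 4.5, 9.5]     # Y values
payroll = [3, 4, 6, 4, 2, 5]       # X values

n = len(sales)
x_bar = sum(payroll) / n           # mean of X = 4
y_bar = sum(sales) / n             # mean of Y = 7

# b1 = sum of (X - Xbar)(Y - Ybar) divided by sum of (X - Xbar)^2
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(payroll, sales)) \
     / sum((x - x_bar) ** 2 for x in payroll)    # 12.5 / 10 = 1.25
b0 = y_bar - b1 * x_bar                          # 7 - 1.25 * 4 = 2

print(f"Sales = {b0} + {b1} * Payroll")          # Sales = 2.0 + 1.25 * Payroll
print("Prediction for X = 6:", b0 + b1 * 6)      # 9.5, i.e. $950,000
```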
2.1.1 Deduction
One of the purposes of regression is to understand the relationship among variables. This model tells us that for each $100 million increase in the payroll (X), we would expect sales to increase by $125,000, since b₁ = 1.25 ($100,000s). For example, if payroll rises from $400 million to $500 million, predicted sales rise from Ŷ = 2 + 1.25(4) = 7 to Ŷ = 2 + 1.25(5) = 8.25, an increase of 1.25 units, i.e. $125,000. This model helps Tororo Construction Company see how the local economy and company sales are related.
3.1.1 The SST Measures the Total Variability in Y about the Mean
Consider the deviations of each Y value from the mean, Y − Ȳ. Simply summing these values would be misleading, because the negatives would cancel out the positives, making it appear that the numbers are closer to the mean than they actually are. To prevent this problem, we use the Sum of Squares Total (SST) to measure the total variability in Y:

SST = Σ(Y − Ȳ)²
3.1.2 The SSE Measures the Variability in Y about the Regression Line
If we did not use X to predict Y, we would simply use the mean of Y as the prediction, and the SST would measure the accuracy of our predictions. However, a regression line may be used to predict the value of Y, and while there are still errors involved, the sum of these squared errors will be less than the total sum of squares just computed. The Sum of Squares Error (SSE) is

SSE = Σe² = Σ(Y − Ŷ)²
Table 1.3: Sum of Squares Calculations for Tororo Construction Company

Y      X     (Y − Ȳ)²    Ŷ       (Y − Ŷ)²     (Ŷ − Ȳ)²
6      3     1           5.75    0.0625       1.5625
8      4     1           7.00    1.0000       0.0000
9      6     4           9.50    0.2500       6.2500
5      4     4           7.00    4.0000       0.0000
4.5    2     6.25        4.50    0.0000       6.2500
9.5    5     6.25        8.25    1.5625       1.5625
Totals:      SST = 22.5          SSE = 6.875  SSR = 15.625

Table 1.3 provides these calculations for the Tororo Construction example. The mean (Ȳ = 7) is compared to each value, and we get

SST = 22.5

The prediction (Ŷ) for each observation is computed and compared to the actual value. This results in

SSE = 6.875
The SSE is much lower than the SST. Using the regression line has reduced the variability in the sum of squares by 22.5 − 6.875 = 15.625. This is called the Sum of Squares due to Regression (SSR) and indicates how much of the total variability in Y is explained by the regression model. Mathematically,

SSR = Σ(Ŷ − Ȳ)² = SST − SSE

Table 1.3 indicates that SSR = 15.625.
There is a very important relationship between the sums of squares that we have computed:

(Sum of squares total) = (Sum of squares due to regression) + (Sum of squares error)

SST = SSR + SSE
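The three sums of squares, and the identity relating them, can be verified directly. Below is a short Python sketch under the same assumptions as before (the fitted line Ŷ = 2 + 1.25X); the names are illustrative.

```python
# SST, SSE and SSR for the Tororo Construction data,
# using the fitted line Yhat = 2 + 1.25 * X.
sales = [6, 8, 9, 5, 4.5, 9.5]
payroll = [3, 4, 6, 4, 2, 5]
y_bar = sum(sales) / len(sales)                          # 7

y_hat = [2 + 1.25 * x for x in payroll]                  # predictions
sst = sum((y - y_bar) ** 2 for y in sales)               # 22.5
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat))  # 6.875
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # 15.625

print(sst, sse, ssr)                    # 22.5 6.875 15.625
print(abs(sst - (ssr + sse)) < 1e-9)    # True: SST = SSR + SSE
```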
Figure 1.2: Deviations from the Regression Line and from the Mean
Figure 1.2 above displays the data for Tororo Construction Company. The regression line is shown, as is a line representing the mean of the Y values. The errors used in computing the sums of squares are shown on this graph. Notice how the sample points are closer to the regression line than they are to the mean.
The proportion of the variability in Y explained by the regression equation is called the coefficient of determination, r²:

r² = SSR / SST = 15.625 / 22.5 = 0.6944
This means that about 69% of the variability in sales (Y) is explained by the regression equation based on payroll (X).
If every point in the sample were on the regression line (meaning all errors are 0), then 100% of the variability in Y could be explained by the regression equation, so r² = 1 and SSE = 0. The lowest possible value of r² is 0, indicating that X explains 0% of the variability in Y. Thus, r² can range from a low of 0 to a high of 1. In developing regression equations, a good model will have an r² value close to 1.
The correlation coefficient, r, is the square root of r² and takes the sign of the slope. Since b₁ = 1.25 is positive,

r = +√0.6944 = 0.8333
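Continuing the sums-of-squares sketch above, r² and r follow in two lines (the variables ssr and sst are those computed in the earlier snippet):

```python
# Coefficient of determination and correlation coefficient,
# from the ssr and sst values computed in the previous sketch.
r_squared = ssr / sst        # 15.625 / 22.5 = 0.6944...
r = r_squared ** 0.5         # 0.8333..., positive because b1 > 0
print(round(r_squared, 4), round(r, 4))   # 0.6944 0.8333
```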
Figure 1.3 (a, b, c, d): Possible Scatter Diagrams for Different Values of r
Using Computer Software (Excel) for Regression Analysis
Software such as QM for Windows and Excel QM is often used for regression calculations. We will rely on Excel for most of the calculations in the rest of this chapter. When using Excel to develop a regression model, the input and output for Excel 2007 and 2010 are the same.
The Tororo Construction Company example will be used to illustrate how to develop a regression model in Excel 2010. Go to Data Analysis in the Excel menu. When the Data Analysis window opens, scroll down to and highlight Regression and click OK (Table 1.4).
Table 1.4: How to Access the Regression Option in Excel 2007 or 2010
The Regression window will open, and you can input the X and Y ranges (values). Check the Labels box, because the cells with the variable names were included in the first row of the X and Y ranges (Table 1.5).
Table 1.5: Data Input for Regression in Excel
To have the output presented on this page rather than on a new worksheet, select Output
Range and give a cell address for the start of the output. Click the OK button, and the
output appears in the output range specified.
Table 1.6: Excel Output for the Tororo Construction Company example
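For readers working outside Excel, the same output can be reproduced in Python. The sketch below assumes the third-party statsmodels library is installed; any regression package would serve equally well.

```python
# Simple linear regression with statsmodels, reproducing the main
# numbers in the Excel output for the Tororo Construction data.
import statsmodels.api as sm

payroll = [3, 4, 6, 4, 2, 5]           # X
sales = [6, 8, 9, 5, 4.5, 9.5]         # Y

X = sm.add_constant(payroll)           # adds the intercept column
model = sm.OLS(sales, X).fit()

print(model.params)                    # intercept 2.0, slope 1.25
print(model.rsquared)                  # 0.6944...
print(model.fvalue, model.f_pvalue)    # F statistic and Significance F
```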
Assumptions of the Regression Model
If we can make certain assumptions about the errors in a regression model, we can
perform statistical tests to determine if the model is useful. The following assumptions
are made about the errors:
1. The errors are independent
2. The errors are normally distributed
3. The errors have a mean of zero
4. The errors have a constant variance (regardless of the value of X)
Plotting Errors
It is possible to check the data to see whether these assumptions are met, and a plot of the errors (residuals) may highlight problems with the model. Often a residual plot will reveal any glaring violations of the assumptions. When the errors are plotted against the independent variable, the pattern should appear random, as shown in the sketch below.
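A residual plot of this kind takes only a few lines of code. The sketch below assumes matplotlib is available and reuses the TCC model Ŷ = 2 + 1.25X.

```python
# Residuals-versus-X plot for checking the regression assumptions.
import matplotlib.pyplot as plt

payroll = [3, 4, 6, 4, 2, 5]
sales = [6, 8, 9, 5, 4.5, 9.5]
residuals = [y - (2 + 1.25 * x) for x, y in zip(payroll, sales)]

plt.scatter(payroll, residuals)
plt.axhline(0, linestyle="--")         # reference line at zero error
plt.xlabel("Payroll ($100,000,000s)")
plt.ylabel("Residual (Y - Yhat)")
plt.title("Residuals vs X")            # the pattern should look random
plt.show()
```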
The Figure 1.4 series presents some typical error patterns. Figure 1.4A displays the pattern expected when the assumptions are met and the model is appropriate: the errors are random, and no discernible pattern is present.
Figure 1.4A: Pattern of Errors Indicating Randomness
Figure 1.4B demonstrates an error pattern in which the errors increase as X increases, violating the constant variance assumption.
Figure 1.4C: Pattern of Errors Indicating that the Relationship Is Not Linear
Estimating the Variance
While the errors are assumed to have constant variance (σ²), this is usually not known. It can be estimated from the sample results. The estimate of σ² is the mean squared error (MSE), denoted s². The MSE is the sum of squares due to error divided by its degrees of freedom:

s² = MSE = SSE / (n − k − 1)
Where
n = number of observations in the sample
k = number of independent variables
In this example, n = 6 and k = 1, so

s² = MSE = SSE / (n − k − 1) = 6.8750 / (6 − 1 − 1) = 6.8750 / 4 = 1.7188
From this we can estimate the standard deviation as

s = √MSE

This is called the standard error of the estimate or the standard deviation of the regression. In the example shown,

s = √MSE = √1.7188 = 1.31
This is used in many of the statistical tests about the model. It is also used to find interval estimates for both Y and the regression coefficients. (The MSE is a common measure of accuracy in forecasting. When used with techniques besides regression, it is common to divide the SSE by n rather than by n − k − 1.) The computation is sketched below.
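The arithmetic above is easy to script. A minimal Python sketch, assuming the SSE value from Table 1.3:

```python
# MSE and standard error of the estimate for the TCC model.
sse = 6.875                    # from Table 1.3
n, k = 6, 1                    # observations, independent variables
mse = sse / (n - k - 1)        # 6.875 / 4 = 1.71875
s = mse ** 0.5                 # standard error of the estimate
print(round(mse, 4), round(s, 2))   # 1.7188 1.31
```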
Testing the Model for Significance (An F test is used to determine whether there is a relationship between X and Y)
Both the MSE and r² (the coefficient of determination) provide a measure of accuracy in a regression model. However, when the sample size is too small, it is possible to get good values for both of these even if there is no relationship between the variables in the regression model. To determine whether these values are meaningful, it is necessary to test the model for significance.
To see if there is a linear relationship between X and Y, a statistical hypothesis test is
performed. The underlying linear model was given earlier as
Y = β₀ + β₁X + ε
If β₁ = 0, then Y does not depend on X in any way. The null hypothesis says there is no linear relationship between the two variables (i.e. β₁ = 0). The alternate hypothesis is that there is a linear relationship (i.e. β₁ ≠ 0). If the null hypothesis can be rejected, we have statistically significant evidence that a linear relationship exists, so X is helpful in predicting Y. The F distribution is used to test this hypothesis.
The F statistic used in the hypothesis test is based on the MSE and the mean squared regression
(MSR). The MSR is calculated as
MSR = SSR / k

where
k = number of independent variables in the model
The F statistic is
F = MSR / MSE
Based on the assumptions regarding the errors in a regression model, this calculated F
statistic is described by the F distribution with
• degrees of freedom for the numerator = df1 = k
• degrees of freedom for the denominator = df2 = n – k – 1
Where
k = the number of independent (X) variables.
If there is very little error, the denominator (MSE) of the F statistic is very small relative to the numerator (MSR), and the resulting F statistic is large. This is an indication that the model is useful. A significance level related to the value of the F statistic is then found. Whenever the F value is large, the significance level (p-value) will be low, indicating that it is extremely unlikely that this could have occurred by chance. When the F value is large (with a resulting small significance level), we can reject the null hypothesis that there is no linear relationship. This means that there is a linear relationship and the values of MSE and r² are meaningful. The hypothesis test just described can be summarised step by step as follows.
Steps in Hypothesis Test for a Significant Regression Model
1. Specify the null and alternative hypotheses: H₀: β₁ = 0 (no linear relationship) versus H₁: β₁ ≠ 0 (a linear relationship exists).
2. Select the level of significance, α.
3. Calculate the test statistic F = MSR/MSE, with df₁ = k and df₂ = n − k − 1 degrees of freedom.
4. Make the decision: reject H₀ if the calculated F value exceeds the critical value from the F table (equivalently, if the p-value is less than α); otherwise, do not reject H₀.
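These steps can be carried out numerically for the TCC example. The sketch below assumes SciPy is available for the F-distribution tail probability; the figures come from the sums of squares computed earlier.

```python
# F test for overall significance of the TCC regression model.
from scipy import stats

ssr, sse = 15.625, 6.875       # from Table 1.3
n, k = 6, 1
msr = ssr / k                  # 15.625
mse = sse / (n - k - 1)        # 1.71875
f_stat = msr / mse             # about 9.09

# p-value = P(F > f_stat) with df1 = k and df2 = n - k - 1
p_value = stats.f.sf(f_stat, k, n - k - 1)
print(round(f_stat, 2), round(p_value, 4))   # 9.09, roughly 0.039
# p-value < 0.05, so reject H0: beta1 = 0; the model is significant.
```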
Figure 1.5: F Distribution for KYU Construction Company Test for Significance
The Analysis of Variance (ANOVA) Table
When software such as Excel or QM for Windows is used to develop regression models, the output provides the observed significance level, or p-value, for the calculated F value. This is then compared to the level of significance (α) to make the decision.
Table 1.7: Analysis of Variance (ANOVA) Table for Regression

Source       DF           SS     MS                       F          Significance F
Regression   k            SSR    MSR = SSR/k              MSR/MSE    P(F > MSR/MSE)
Residual     n − k − 1    SSE    MSE = SSE/(n − k − 1)
Total        n − 1        SST
Table 1.7 summarises the general form of the ANOVA table, showing how the numbers in the last three columns are computed. The last column, labelled Significance F, is the p-value, or observed significance level, which can be used in the hypothesis test about the regression model.
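If statsmodels is used (as in the earlier sketch, an assumption rather than part of the original notes), the full ANOVA table can be printed directly:

```python
# ANOVA table for the TCC regression via the statsmodels formula API.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({"sales": [6, 8, 9, 5, 4.5, 9.5],
                   "payroll": [3, 4, 6, 4, 2, 5]})
model = smf.ols("sales ~ payroll", data=df).fit()
print(sm.stats.anova_lm(model))   # df, sum_sq, mean_sq, F, PR(>F)
```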