Chapter 3. Regression
Linear Regression
Our work on correlation shows that it is a useful measure for understanding relationships between
variables, telling us both what type of relationship exists (positive or negative) and how strong it is.
Note that correlation only measures the strength of linear relationships. Correlation can also tell you
things like “when one variable goes up, the other tends to go up too”, but it cannot quantify by how
much. It can, however, quantify how reliable the relationship is; recall also that rank correlation ignores
the sizes of those changes and looks only at their order.
In this chapter we will study regression, which again is about understanding the relationship between
variables. But it can tell us something different from correlation. Whereas correlation says how strong
a relationship is, regression says exactly how large a change in one variable to expect from a particular
change in the other. Together, regression and correlation tell us everything we need about these types
of relationships.
Regression is a statistical technique for fitting a function to data, that is, drawing a line or a curve that
goes through data points. If you have some data, and you can fit a function to it, then you can make
predictions about what new data will look like. In a business context, making predictions is very useful,
as it allows us to answer questions such as “how much would a new shop sell in a city we have not yet entered?”
We will look first at linear regression, which it turns out is closely related to correlation. Linear models
are simple but surprisingly powerful.
Recall that a line can be written with the equation y = a + bx, where x and y are the two variables,
and a and b are constants (i.e. just numbers). That means that if x = 0, then y = a (a is called the
intercept). It also means that for every increment of one unit in x, y will increase by b (b is called the
slope). We say that y is a linear function of x.
Consider the data shown in Figure 3.1. It represents sales of a franchise-type shop (like GAP, Subway,
etc) in cities with different population sizes. We can plot this data using a scatter plot, by using
Population as the x variable, and Sales as the y variable. Notice that city 5 appears twice, meaning that
there are two shops in that city, so there are two sales values for that city.
It looks as if sales are somewhat related to city population. In fact, the relationship looks somewhat
linear. That is, one could draw a straight line through the data, and although it would not go through
every single data point, it would be quite close. We could even use that line to answer a question like
“what value for sales would we expect if a new location for this franchise was started in a new city with
population x = 1 million?”
One problem is that the process of drawing the line is quite subjective. Two different people might think
that slightly different lines were correct. If one has a particular bias or an interest in the decision of
whether or not to open a new franchise, he or she might try to push the line a little in a favourable
direction. Clearly, it is better to use an objective method of drawing the line and answering these
questions. To do that, we use a method called Least Squares.
Figure 3.1. Data on the sales of a chain of shops and the populations of the cities where
they are located.
1. Least Squares
Let us start by looking at two different lines: see Figure 3.2. The line on the left has the equation
y = 3790.1x − 1000, and we can see that it is pretty good, but it misses some points. The line on the
right has the equation y = 2522x + 56.3, and it is also pretty good, but it too misses some points. Which
is better?
[Figure: two scatter charts of Sales in euros (thousands) against City population (millions), each showing a candidate straight line with its equation displayed.]
Figure 3.2. Two linear approximations to the city population–sales data. Which is better?
Clearly, we want to choose a line that is as close to all the points as possible. And clearly there is no
straight line which goes through all the points (for this data, and generally for real-world problems).
The objective way to draw the best line is to draw the line which minimises the distance of the line to
all the points.
We can view the vertical distance from each point to the line as the error of the linear model at that
point. So we choose to draw the line which gives the smallest total error, taking into account that an
individual error can be positive or negative (if a point is above or below the line). There are several
error measures that one can use for this (a short code sketch after the list below illustrates them), such as:
• Mean Absolute Error (MAE);
• Mean Squared Error (MSE);
• Root Mean Squared Error (RMSE).
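As a minimal illustration of these three measures (the population/sales values below are invented, not the exact figures from Figure 3.1), here is a short Python sketch:

```python
# Illustrative sketch: computing MAE, MSE and RMSE for a candidate line y = a + b*x.
# The population/sales values below are invented for illustration.
import math

populations = [0.4, 0.6, 0.8, 1.0, 1.2]   # x values, in millions
sales = [1100, 1600, 2100, 2500, 3100]    # observed y values, in thousands of euros

a, b = 56.3, 2522                         # a candidate intercept and slope

errors = [y - (a + b * x) for x, y in zip(populations, sales)]

mae = sum(abs(e) for e in errors) / len(errors)   # Mean Absolute Error
mse = sum(e ** 2 for e in errors) / len(errors)   # Mean Squared Error
rmse = math.sqrt(mse)                             # Root Mean Squared Error

print(f"MAE = {mae:.1f}, MSE = {mse:.1f}, RMSE = {rmse:.1f}")
```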
It might be useful to think of a picture like Figure 3.3. The springs “want” to contract. As they
contract, they move the pole. The vertical distance from each hook to the pole represents the “error”
in a regression sense. Eventually the pole settles into a position which corresponds to the best possible
regression line.
In the context of linear regression, the RMSE of a line is the root mean square of the differences between
the data points yi and the values of the line at those points. How do we calculate those differences? See
Figure 3.4.
Figure 3.3. Each regression data point imagined as a fixed hook, attached by a spring to
a movable pole. The springs will try to contract, that is try to minimise the distance from
the pole to the hooks. Image from the LION book intelligent-optimization.org.
[Figure: a line y = f(x) = a + bx together with one data point (xi, yi); the vertical gap Ei between the point and the line at xi is the error.]
Figure 3.4. Calculating the error of a line y = f(x) = a + bx at a point xi. The
diamond-shaped point has coordinates (xi, yi), so the error of the line at that point is
Ei = yi − f(xi).
The y-value of a point (xi, yi) is just yi. The y-value of the line at the point xi is f(xi) = a + bxi.
Therefore the error is |yi − (a + bxi)|, i.e. the absolute difference between the observed yi value and the
corresponding f(xi) = a + bxi value.
We can then calculate the RMSE as such:
r Pn
(a + bxi ))2
i=1 (yi
RMSE =
n
where n is the number of data points. Note that the errors are squared so that positive and negative
differences do not cancel each other out.
Smaller RMSE values imply a good fit between the line and the data, i.e. the given line models or
describes well the relationship between x and y.
Exercise 3.1. Write out the formula for the error at a single point (xi , yi ). Then write out the formula
for the sum of squared errors. Then write out the formula for the mean square error. Then write out
the formula for the root mean square error. This should help you see how each part of the name root
mean square error corresponds to a “layer” in the RMSE formula.
How can we actually find the values for a and b which give the very best possible line? There is a set of
equations which find the best (optimal) a and b values for given data, that is, the a and b values which
minimise the RMSE for the line y = a + bx:

b = \dfrac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}, \qquad a = \bar{y} - b\bar{x}
where ȳ and x̄ are the average values of all y and x values, respectively.
These equations allow us to find the best-fitting straight line to model the relationship between any two
variables. This line is called the line of best fit.
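As a sketch of how these equations translate into code (the data values below are invented for illustration, not taken from Figure 3.1), the following Python function computes b and a directly from the formulas above:

```python
# Minimal sketch: least squares coefficients a and b from the formulas above.
def least_squares(xs, ys):
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * (sum_x / n)                               # intercept: a = y-bar - b * x-bar
    return a, b

# Illustrative usage with invented population (millions) / sales (thousands of euros) values:
xs = [0.4, 0.6, 0.8, 1.0, 1.2]
ys = [1100, 1600, 2100, 2500, 3100]
a, b = least_squares(xs, ys)
print(f"line of best fit: y = {a:.1f} + {b:.1f}x")
```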
Exercise 3.2. Given these equations, and the data in Figure 3.1, write the equation of the line of best
fit.
Once we have fitted a regression line, we can use it to make predictions. To be specific, if we want
to estimate the likely value of y given some value of x, we can do so using a model built with linear
regression.
In the example of sales versus city population, that would allow us to make a prediction about the likely
sales if we opened a new shop in a new city. We can do this just using the chart, as follows. We find
the line of best fit using linear regression. Then we find the x-value for the new city: suppose we are
considering a city where x = 1 million. We find that point on the x-axis, and we go vertically up from
there to hit the regression line, as shown in Figure 3.5. Then we go horizontally from that point to hit
the y-axis. The y-value here — about 2600 — is our prediction for this city. It is in thousands of Euros,
hence our estimate for sales in a new shop in a city with population of 1 million is €2.6M.
Figure 3.5. Linear regression finds the line of best fit. We can find an estimate for sales
in a new shop in a new city of (e.g.) 1 million people.
We can carry out the same process more precisely. Our question is: what is the y-value of the regression
line, when x = 1.0? We would feed in the city population (x) to the equation and get back an estimate
for sales (y). This city has x = 1. The line has the equation y = 56.3 + 2522x. We calculate y by putting
x = 1 in the equation, obtaining y = 56.3 + 2522 × 1 = 2578.3, quite close to the 2600 we estimated
graphically. As always, we must interpret the figure correctly: it means €2,578,300, because the y-values
are in thousands.
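In code, this prediction step is just a matter of evaluating the fitted equation at the new x value; a small sketch using the coefficients quoted above:

```python
# Predicting sales (thousands of euros) from city population (millions),
# using the line of best fit y = 56.3 + 2522x quoted in the text.
a, b = 56.3, 2522

def predict_sales(population_millions):
    return a + b * population_millions

print(predict_sales(1.0))   # 2578.3, i.e. about 2,578,300 euros
```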
2.1. Interpolation and Extrapolation. In either case, this is an example of interpolation: making
predictions inside the range of the existing data. It is the opposite of extrapolation: making predictions
outside the range of existing data points.
We could also carry out extrapolation here. We might be interested in a new city where population
is 1.3 million. That is outside the range of data we have previously observed, but we can just extend
the regression line so that it intersects with our straight line up from x = 1.3. So we can just put
x = 1.3 straight into the equation, getting y = 56.3 + 2522(1.3) = 3334.9, which we interpret as sales of
€3,334,900. See Figure 3.6.
Figure 3.6. Extrapolation means making a prediction for y at an x-value outside the
range of existing x values.
Interpolation is more “reliable” than extrapolation, all else being equal. Extrapolation is more likely to
lead to an incorrect estimate, because with extrapolation we do not have evidence that the existing linear
trend will continue. Later we will see examples where a trend changes over time, making extrapolation
into the future unreliable.
Exercise 3.3. Make a sales prediction for a city with population of 0 people. What is the predicted
sales value? Does it make sense? What does this tell you about interpolation versus extrapolation?
3.1. Interpreting Regression Models. In the example above, we found a simple model for sales
(y) in terms of city population (x): y = a + bx. It allowed us to make predictions, and that was useful.
We can also gain some business insight from it, by interpreting the values of a and b.
Interpreting the coefficient, b: suppose we are thinking about our shop in the city of population 0.9
million. Suppose we hear that this city is growing fast. b tells us how fast the sales might change in
response to the change in population. To be specific: if x increased by 1 unit (from 0.9 to 1.9), then y
would increase by b. That is what b (the slope) means. In the current example, growing from 0.9 to 1.9
is not realistic, so we might instead consider: what if it grew from 0.9 to 1.0? That is an increase of 0.1
in x, so y would increase by 0.1b.
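For example, using the slope of the line fitted earlier, b = 2522 (thousands of euros of sales per million people), an increase of 0.1 million in population corresponds to an estimated increase in sales of 0.1 × 2522 = 252.2, i.e. roughly €252,200.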
Interpreting the intercept, a: let us pretend there was a city of population 0. That would mean
x = 0. That would imply y = a. In other words, a is the value that we might expect for sales if population
was zero. That is not a very realistic situation; maybe we can imagine a shop in the middle of nowhere?
Now is a good time to consider the principle illustrated in Figure 3.7: extrapolating too far can give you
nonsensical results. A linear model should only be trusted for interpolation or for extrapolation “near”
the original data.
Figure 3.7. Extrapolating too far backwards gives nonsensical results. From
smbc-comics.com.
Exercise 3.4. Based on the graphic in Figure 3.5, give a visual estimate for the correlation between
population and sales. Now calculate that correlation, using the equation given in Section 6.2. How close
was your estimate to the actual measured correlation value?
Exercise 3.5. Given the data shown in Figure 3.1, and the equations for the a and b least squares
coefficients, derive a linear model to predict sales (in thousands of Euros), given population (in millions).
Exercise 3.6. Calculate the RMSE for the model you derived. Compare that value to the distances
between the datapoints and the model line, as seen in Figure 3.5.
Exercise 3.7. Suppose you have data on sales versus shop size (in square metres). Provide a verbal
interpretation of what a and b would mean.
Note that the model we derived allows us to predict what the value of sales is, for a given population
size. Unlike correlation, this is not a symmetric technique: we specifically chose x to be the population
size, and y to be the corresponding sales, and a model¹ y = a + bx is made specifically to predict the
value of y.
In statistics, the variable we are trying to model (and thus predict) is usually represented with the letter
y, and can have several names; the most common are:
• dependent variable;
• response variable.
Dependent on what? Response to what? To the other variable (or, more generally, variables; though we
will not cover the topic of multiple regression). The variable we are using to predict sales is population
size; it can thus be referred to as:
• predictor variable;
• independent variable;
• explanatory variable.
It is typically represented with the letter x.

¹We could easily change this model to predict values of x: x = (y − a)/b. But we would also need to recalculate the RMSE.
5.1. Testing Goodness of Fit. Note that the line of best fit is not necessarily a good fit. Even
if there is no relationship between the variables, some line will still be found using the above methods.
Probably it will not fit the data well: it will miss the data points, and the RMSE will be large. So it
would be good to have some measure of how good a fit it is. We could use RMSE for this purpose, but
the disadvantage is that RMSE depends on the scale of the data, that is, on how large the y values
are overall.
For example, suppose you are doing a linear regression where the dependent variable is a person’s income,
and you are studying a sample of people on typical wages. You might find, say, that RMSE is 5000.
Now suppose you have a sample of professional footballers. You might find that RMSE is 12000 on
this sample. But because the footballers’ wages are larger to start with, you expect RMSE to be larger
(Exercise: think about this, and make a plot to help if needed), but it is hard to be sure how much
larger. You cannot compare the two values for RMSE. You cannot say which regression has found a
better fit.
Another way to measure how well the line fits the data is to use the coefficient of determination R²,
which is based on Linear Correlation. To be specific: we measure the correlation r between the dependent
values yi and the values predicted by the linear regression, i.e. y′ = a + bx. We regard these two things
as a paired data-set (y, y′), so we can calculate r and, from that, R², which is just defined as r². Higher
values for R² indicate a better fit, i.e. the linear model describes the relationship between x and
y well. Note carefully that this is not the same as measuring the correlation between the original variables x
and y.
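A minimal Python sketch of this calculation, assuming a fitted line with coefficients a and b and using invented data values:

```python
# Sketch: R^2 as the squared correlation between observed y values and the
# values y' predicted by the fitted line. Data and coefficients are illustrative.
import statistics

xs = [0.4, 0.6, 0.8, 1.0, 1.2]
ys = [1100, 1600, 2100, 2500, 3100]
a, b = 56.3, 2522                        # assumed fitted coefficients

y_pred = [a + b * x for x in xs]         # y' values from the regression line
r = statistics.correlation(ys, y_pred)   # Pearson correlation (Python 3.10+)
r_squared = r ** 2

print(f"R^2 = {r_squared:.3f}")
```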
The advantage of using R² in this way is that it does not depend on the scale of the data. An R² value
of 0.85 always indicates that the linear regression is giving a very good fit, whether we are talking about
millionaires or people on typical wages. An R² value of 0.05 indicates a pretty poor fit; in other words,
it indicates that linear regression is not succeeding in “modelling” the data well.
Note that RMSE is still a useful measure. In many business scenarios, you might be interested in the
mean error of your predictions; for example, if you want to predict the price of an asset, R² will tell you
if your model is adequate, but RMSE will give you an idea of how good or bad that prediction will be.
If you have a tight budget, that might be vital information.
So when testing how good a linear model is, R² gives you insight into how appropriate the model is
(with the given data), whereas RMSE gives you insight into the expected error of that model.
Exercise 3.8. Calculate the R² for the model you derived in the previous sub-section. Interpret this
value; is your model a good fit for the data?
With the model we have derived, we can predict the sales in a new city with any population (although
better predictions will be achieved when doing interpolation). We can estimate how good that model
is by comparing the estimated values with the actual datapoints (0.4, 0.5, etc.) that we used to build
the linear model, and then calculating the RMSE or R² between the actual observed y values and the
estimated y′ values.
However, one major problem in analytics methods is the difference between in-sample and out-of-sample
data. In-sample data is the data which we use to train the model (as seen in Figure 3.1). Out-of-sample
data, on the other hand, is data on which the model will be run later. The problem is that sometimes a
model will perform well (low RMSE or high R²) on the in-sample data, but badly on the out-of-sample
data.
One reason for this to occur is that the model is just not able to generalise well. Such a model is a bit
like a sports pundit who is always able to provide a good explanation for what just happened in a game,
but is never able to predict what will happen in an upcoming game.
The second reason is that the data on which the model will be used might not be similar to that on
which the model was trained. For example, the training data seen in Figure 3.1 might come from a
Western country, where prices are relatively high. So if the model is used to make predictions for cities
in a third world country, chances are that its predictions of sales figures will be wrong.
6.1. Training and Testing Error. If we have a model that performs well on in-sample data and
then makes bad predictions in practice, we may lose a lot of money! We mitigate this risk in an obvious
way: we only use some of the available data for training. We keep some of it back, “unseen” by the
model, and we test whether the model makes good predictions on it.
So in essence, we are dividing the available data into two sets: Training Data and Testing Data. We
use only the training data to create our model: we choose the model that gives us a low training error
(RMSE) or high accuracy (R²). But when estimating how well that model will do on unseen data, we
measure its test error (or test accuracy), that is, its RMSE or R² on the test data.
How many samples to use for training versus testing? It depends on the modelling technique. As a rule
of thumb, 66% of the data might be used for training, and 33% for testing (or other variants: 75/25,
80/20, etc).
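A rough sketch of the procedure (the data values and the roughly 70/30 split below are assumptions for illustration): fit the line on the training portion only, then compute the error separately on the held-out test portion.

```python
# Sketch: train/test split with a simple least squares fit.
# Data values and the 70/30 split are illustrative assumptions.
import math

xs = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]
ys = [1050, 1300, 1600, 1900, 2100, 2350, 2600, 2800, 3100, 3300]

split = int(0.7 * len(xs))                     # first ~70% for training, rest for testing
x_train, y_train = xs[:split], ys[:split]
x_test, y_test = xs[split:], ys[split:]

# Fit a and b on the training data only, using the least squares formulas from earlier.
n = len(x_train)
sx, sy = sum(x_train), sum(y_train)
sxy = sum(x * y for x, y in zip(x_train, y_train))
sx2 = sum(x * x for x in x_train)
b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
a = sy / n - b * sx / n

def rmse(x_vals, y_vals):
    errs = [y - (a + b * x) for x, y in zip(x_vals, y_vals)]
    return math.sqrt(sum(e ** 2 for e in errs) / len(errs))

print("train RMSE:", round(rmse(x_train, y_train), 1))
print("test RMSE: ", round(rmse(x_test, y_test), 1))
```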
For example, Figure 3.8 shows the same data as before, but with four additional cities. In this case, we
decided to use the first 10 cities as our training set, and the last four as our test set. So we built a model
as before, using only the blue (training) points, and calculated the training R² and RMSE. Then, using that same
model built using only the first ten observations, we calculated the R² and RMSE values associated with
the test samples. This gives us an idea of how well or badly our model will predict unseen data, that is,
data not used to build the model.
Figure 3.8. Further data on the sales of a chain of shops and the populations of the
cities where they are located, divided into train (blue) and test (red) sets.
Note that we cannot always do this. Sometimes, our datasets are very small, and further breaking them
down into train and test data would leave very few samples to build a model. Ideally, we have many
data samples (thousands of observations), so that we create a model using a large training set, and test
its accuracy on unseen data using a realistic test set.
7.1. Least Squares in Excel. We can create linear regression models in Excel using the LINEST
function: see Figure 3.9.
Alternatively, we can start with a scatter plot of the data. Right-click a data point, choose to add
a trendline, and choose the linear trendline type. Then, under Options, check Display equation on
chart. That will show an equation with the values of b and a (yes, in this order) which Excel has
calculated using the above equations.
Figure 3.9. Calculating a regression in Excel. The x and y data are in two columns. The
linear regression will return the two values which specify a line: the slope and the constant
(i.e. b and a, in our terminology). Therefore you must select the 2x1 rectangle of cells
as shown to tell Excel that you expect two results. Enter the formula =LINEST(C2:C11,
B2:B11). Note the order of B and C here: we put the dependent variable first. Type
Ctrl+Shift+Enter to execute. The rectangle of cells is filled with the results, as shown.
The two cells give b = 2522 and a = 56.3, in this order, which specifies the line of best fit.
The spreadsheet also shows how to calculate the values of a and b using the least squares
equations. Finally, the train and test RMSE and R² values are also calculated.
Exercise 3.9. Given the x-values in column B, and y-values in column C (as in Figure 3.9), and given
the equation of the line f (x) = 56.3 + 2522x as shown, how can you calculate the RMSE? Hints: start
by calculating a new column (say G) with the values of the line at the x-points, in other words put
the formula for f (xi ) into a new column. Then calculate the line’s errors at those points in another
new column, by subtracting the line’s value from the correct value. Then use the Excel functions SQRT,
SUMSQ, and COUNTA (look up Excel help to find out what they do).
Regression can be carried out on multiple independent variables. In this case, it is usually referred to as
multiple regression. Instead of using a model like y = a + bx, we use a model like y = a + b1 x1 + b2 x2 ,
supposing there are 2 independent variables, x1 and x2 . There is always only one dependent variable, y.
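As a brief sketch of what fitting such a model can look like in code (all data values and the second variable here are invented for illustration), numpy's least squares routine handles several predictors at once:

```python
# Sketch: multiple regression y = a + b1*x1 + b2*x2 using numpy's least squares.
# All data values below are invented for illustration.
import numpy as np

x1 = np.array([0.4, 0.6, 0.8, 1.0, 1.2])      # e.g. city population (millions)
x2 = np.array([150, 200, 180, 260, 300])      # e.g. shop size (square metres)
y = np.array([1100, 1700, 2000, 2700, 3200])  # sales (thousands of euros)

# Design matrix with a column of ones so the model includes the intercept a.
X = np.column_stack([np.ones_like(x1), x1, x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coeffs
print(f"y = {a:.1f} + {b1:.1f}*x1 + {b2:.1f}*x2")
```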
Sometimes a linear model is unsuitable for the data. For example, what if we plot one independent
variable against a dependent variable and instead of seeing an approximately linear relationship, we see
a relationship that looks logarithmic (like that in Figure 3.10)?
We can see a “plateau” effect, which makes sense in context. If our advertising spend is currently low,
we can gain a lot by increasing it. But if our advertising spend is already high, adding a lot more does
not seem worth it. Clearly, we could fit a straight line to this data, but it would not fit the data well,
particularly the initial strong growth of sales with small advertising spending.
60000"
50000"
40000"
Sales&
30000"
20000"
10000"
0"
0" 5000" 10000" 15000" 20000" 25000" 30000"
Adver+sing&spend&
Figure 3.10. Sales versus advertising spend. Clearly, spending some money on advertising
is a good idea, but the effect seems to plateau. This is a logarithmic relationship.
In this case, a good approach is to carry out a variable transformation, to make the variable “look linear”.
In the current case, because the relationship “looks” logarithmic, we carry out a logarithmic transform. Let
us always use log2 (log to the base 2) for ease of interpretation (see below); it does not really matter
which base we use so long as we are consistent. We define a new variable x′ = log2(x), where x was
the original variable and x′ is the new one. We replace x with x′. Then we will see that x′ is in an
approximately linear relationship with y. Then we can run linear regression as before.
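A minimal sketch of this transform-then-fit idea (the advertising/sales values are invented; in practice you would use the real figures behind Figure 3.10):

```python
# Sketch: log2-transform the x variable, then fit an ordinary straight line to (x', y).
# The advertising/sales values below are invented for illustration.
import math

advertising = [1000, 2000, 4000, 8000, 16000, 32000]   # euros spent
sales = [10000, 18000, 27000, 35000, 44000, 52000]     # euros of sales

x_prime = [math.log2(x) for x in advertising]           # x' = log2(x)

# Ordinary least squares on (x', y), exactly as before.
n = len(x_prime)
sx, sy = sum(x_prime), sum(sales)
sxy = sum(xp * y for xp, y in zip(x_prime, sales))
sx2 = sum(xp * xp for xp in x_prime)
b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
a = sy / n - b * sx / n

# To predict sales for a new advertising spend, transform the input first.
new_spend = 10000
prediction = a + b * math.log2(new_spend)
print(f"y = {a:.1f} + {b:.1f} * log2(x); predicted sales at {new_spend}: {prediction:.0f}")
```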
We have to be careful when interpreting and using the resulting model, though. Remember that x′ is in
different units from the original x.
If the Advertising variable has a coefficient b in the model, then we cannot say “for every 1 euro increase
in Advertising, we see a b euro increase in Sales”. Instead, you must interpret as follows: “for every 1
unit increase in log2 (Advertising), we see a b euro increase in Sales”. But what does a 1 unit increase
in log2 (Advertising) mean? This is a crucial point. If log2 (Advertising) increases by 1, that means
Advertising itself has doubled.
If we want to predict the value of y (Sales) for some given value of x (Advertising), clearly we must
calculate x′ = log2(Advertising) before putting that into our linear model. We do not put in x itself.
Above we used a logarithmic transform to deal with logarithmic data. We can also do the opposite:
if there seems to be an exponential relationship (an “accelerating” curve, in contrast to the “slowing”
curve above), then we can transform x′ = e^x.
It is quite common to receive a data file with some missing data. Correlation still works fine: it acts as
if the whole row was deleted. However, regression does not. We can address this by deleting the row
manually.
Sometimes missing data is not shown as a blank value in datafiles. It is sometimes shown as “na”, “NA”,
“n/a” or similar, which stands for “not available”, or as “?”. In these cases we will have to filter the
data, to find those values and remove the samples containing them.
There are even cases where missing data is encoded by a special numerical value. E.g. a record of student
grades (percentages) might indicate a missing grade using the value −1. Obviously, if we do not notice
that and we calculate a mean, a correlation, or a regression, we will get misleading results. When we
receive a data file, we have to check these issues carefully before carrying out our analysis.
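As an illustrative sketch of this kind of clean-up (using the pandas library; the file name, column name and the −1 sentinel are assumptions, not part of the original example):

```python
# Sketch: filtering out missing values before running a regression.
# File name, column names and the -1 sentinel are illustrative assumptions.
import pandas as pd

# Treat common "not available" strings as missing while reading the file.
df = pd.read_csv("grades.csv", na_values=["na", "NA", "n/a", "?"])

# Also treat a special numeric code (here -1) as missing.
df["grade"] = df["grade"].replace(-1, pd.NA)

# Drop the rows with a missing grade before computing means, correlations or regressions.
df = df.dropna(subset=["grade"])
print(len(df), "complete rows remain")
```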