
SVCE TIRUPATI

COURSE MATERIAL

DATA SCIENCE & ANALYTICS


SUBJECT (CA20FPC302)

UNIT 2

COURSE MCA

DEPARTMENT MCA

SEMESTER 21

PREPARED BY A.JYOTHSNA
(Faculty Name / s) Assistant Professor

Version V-1

PREPARED / REVISED DATE 17-03-2022


TABLE OF CONTENTS – UNIT 2


S. NO  CONTENTS
1      COURSE OBJECTIVES
2      PREREQUISITES
3      SYLLABUS
4      COURSE OUTCOMES
5      CO - PO/PSO MAPPING
6      LESSON PLAN
7      ACTIVITY BASED LEARNING
8      LECTURE NOTES
       2.1  LINEAR REGRESSION
       2.2  ESTIMATING THE COEFFICIENTS
       2.3  ASSESSING THE ACCURACY OF THE COEFFICIENT ESTIMATES
       2.4  HYPOTHESIS TESTING
       2.5  ASSESSING THE ACCURACY OF THE MODEL
       2.6  MULTIPLE LINEAR REGRESSION
       2.7  OTHER CONSIDERATIONS IN THE REGRESSION MODEL
       2.8  COMPARISON OF LINEAR REGRESSION WITH K-NEAREST NEIGHBORS
9      PRACTICE QUIZ
10     ASSIGNMENTS
11     QUESTIONS & ANSWERS
12     SUPPORTIVE ONLINE CERTIFICATION COURSES
13     REAL TIME APPLICATIONS
14     CONTENTS BEYOND THE SYLLABUS
15     PRESCRIBED TEXT BOOKS & REFERENCE BOOKS
16     MINI PROJECT SUGGESTION

1. Course Objectives
The objectives of this course are to:
1. Provide a set of practical skills for handling data that comes in a variety of
formats and sizes, such as text, spatial, and time-series data.
2. Cover the data analysis lifecycle, from initial access and acquisition through
modelling, transformation, integration, querying, application of statistical
learning and data mining methods, and presentation of results.
3. Introduce data wrangling, the process of converting raw data into a more
useful form that can subsequently be analysed.

2. Prerequisites
Students should have knowledge of
1. Basic Mathematics
2. Basic understanding of programming

3. Syllabus
UNIT II
Linear Regression, Simple Linear Regression, Multiple Linear Regression, Other
Considerations in the Regression Model, Comparison of Linear Regression with K-
Nearest Neighbours, Linear Regression.

4. Course outcomes
1. Understand business intelligence and business and data analytics.
2. Understand business data analysis through powerful data application tools.
3. Understand the methods of data mining.
4. Apply basic tools (plots, graphs, summary statistics) to carry out EDA.
5. Understand the key elements of a data science project.
6. Identify the appropriate data science technique and/or algorithm to use for the
major data science tasks.

5. Co-PO / PSO Mapping

DSA PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 P10 PO11 PO12 PSO1 PSO2

CO1 3 3 2 2

CO2 3 3 2 2

CO3 3 3 2 2

CO4 3 3 2 2

CO5 3 3 2 2

6. Lesson Plan

Lecture No.  Topics to be covered                                        References
1            Linear Regression                                           T1
2            Simple Linear Regression                                    T1
3            Multiple Linear Regression                                  T1
4            Other Considerations in the Regression Model                T1
5            Comparison of Linear Regression with K-Nearest Neighbors    T1
6            Comparison of Linear Regression with K-Nearest Neighbors    T1
7            Linear Regression                                           T1
8            Revision on Unit-2                                          T1
(The unit is covered over 5 weeks.)

7. Activity Based Learning

1. Develop a regression model using Linear Regression.
2. Analyze the performance of a Linear Regression model versus KNN.

8. Lecture Notes

2.1 LINEAR REGRESSION


Linear regression is a very simple approach for supervised learning. It is a useful
tool for predicting a quantitative response, and it is used for finding a linear
relationship between a target and one or more predictors.
There are two types of linear regression:
• Simple Linear Regression
• Multiple Linear Regression
2.1.1 Simple Linear Regression:
Simple linear regression is a very straightforward approach for predicting
a quantitative response Y on the basis of a single predictor variable X. It
assumes that there is approximately a linear relationship between X and
Y. Mathematically, we can write this linear relationship as
Y ≈ β0 + β1X
 β0 and β1 are two unknown constants that represent the intercept
and slope terms in the linear model.
 β0 and β1 together are known as the model coefficients or parameters.
2.2 ESTIMATING THE COEFFICIENTS:
 In practice, β0 and β1 are unknown, so we must use the data to estimate
the coefficients.
 Let y^i = β^0 + β^1 xi be the prediction for Y based on the ith value of X.

 ei = yi − y^i represents the ith residual: the difference between the
ith observed response value and the ith response value that is predicted
by our linear model.
 The residual sum of squares (RSS) is defined as
o RSS = e1² + e2² + … + en²
 The least squares approach chooses β^0 and β^1 to minimize the RSS.
Using some calculus, one can show that the minimizers are

β^1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²   (sums over i = 1, …, n)
β^0 = ȳ − β^1 x̄
where x̄ and ȳ are the sample means.
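To make this concrete, here is a minimal sketch in R (the language of the
prescribed textbook); the data vectors x and y are made up purely for
illustration:

# Illustration data (hypothetical)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)

# Closed-form least squares estimates, matching the formulas above
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# R's built-in lm() computes the same coefficients
fit <- lm(y ~ x)
coef(fit)
c(beta0_hat, beta1_hat)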
2.3 ASSESSING THE ACCURACY OF THE COEFFICIENT ESTIMATES:
Y = β0 + β1X + ε
where β0 = intercept term
β1 = slope
ε = mean-zero random error term
 The standard error of the sample mean μ^ is needed to assess
how accurate μ^ is as an estimate of μ.
 From the well-known formula
Var(μ^) = SE(μ^)² = σ²/n
where σ is the standard deviation of each observation.
 For the standard errors associated with β^0 and β^1, we use the
following formulas:
SE(β^0)² = σ² [1/n + x̄² / Σ(xi − x̄)²]
SE(β^1)² = σ² / Σ(xi − x̄)²
where σ² = Var(ε)

 These standard errors can be used to compute confidence intervals. A
95% confidence interval is defined as a range of values such that with
95% probability the range will contain the true unknown value of the
parameter. It has the form
β^1 ± 2 · SE(β^1)
o That is, there is approximately a 95% chance that the interval
[β^1 − 2 · SE(β^1), β^1 + 2 · SE(β^1)]
will contain the true value of β1.
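As a quick illustration, continuing the R sketch above, confint() reports
these intervals directly:

# Approximate 95% confidence intervals for beta0 and beta1
confint(fit, level = 0.95)

# summary() reports the standard errors used to build them
summary(fit)$coefficients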


2.4 HYPOTHESIS TESTING

 Standard errors can also be used to perform hypothesis tests on the
coefficients. The most common hypothesis test involves testing the null
hypothesis
H0: There is no relationship between X and Y, versus the alternative hypothesis
Ha: There is some relationship between X and Y.
Mathematically, this corresponds to testing
H0: β1 = 0
versus
Ha: β1 ≠ 0
since if β1 = 0 then the model reduces to Y = β0 + ε and X is not associated
with Y.
 To test the null hypothesis, we compute a t-statistic given by
t = (β^1 − 0) / SE(β^1)
This will have a t-distribution with n − 2 degrees of freedom, assuming β1 = 0.

 Using statistical software, it is easy to compute the probability of
observing any value equal to |t| or larger. We call this probability the p-
value.
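In R, the coefficient table printed by summary() carries out exactly this test
for each coefficient (illustrative, using the fit from the earlier sketch):

# Each row: estimate, standard error, t-statistic, and p-value.
# A small p-value on the slope row lets us reject H0: beta1 = 0.
summary(fit)$coefficients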
2.5 ASSESSING THE ACCURACY OF THE MODEL
 Once the null hypothesis is rejected in favor of the alternative
hypothesis, it is natural to want to quantify the extent to which the model
fits the data.
 The quality of a linear regression fit is typically assessed using two
related quantities:
 The Residual Standard Error (RSE)
 The R² Statistic
Residual Standard Error (RSE)
 We compute the RSE as
RSE = √(RSS / (n − 2))
where RSS = Residual Sum of Squares = Σ(yi − y^i)²

 The RSE is considered a measure of the lack of fit of the model to the data.
 If the predictions obtained using the model are very close to the true
outcome values, i.e., if y^i ≈ yi for i = 1, 2, …, n, then the RSE will be small,
and we can conclude that the model fits the data very well.
 On the other hand, if y^i is very far from yi for one or more observations,
then the RSE may be quite large, indicating that the model doesn't fit the
data well.
R² Statistic
 The R² statistic provides an alternative measure of fit.

 It takes the form of a proportion, the proportion of variance explained, and
so it always takes on a value between 0 and 1, independent of the
scale of Y.
R² = (TSS − RSS) / TSS = 1 − RSS / TSS
where TSS = Total Sum of Squares = Σ(yi − ȳ)²

 An R² statistic that is close to 1 indicates that a large proportion of the
variability in the response has been explained by the regression.
 The R² statistic has an interpretational advantage over the RSE, since
unlike the RSE, it always lies between 0 and 1.
 The R² statistic is a measure of the linear relationship between X
and Y. In simple linear regression, R² = r², where r is the correlation between X and Y.
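The following sketch computes both quantities by hand for the illustrative fit
above and checks them against what summary() reports:

n   <- length(y)
rss <- sum(residuals(fit)^2)          # residual sum of squares
tss <- sum((y - mean(y))^2)           # total sum of squares

rse <- sqrt(rss / (n - 2))            # residual standard error
r2  <- 1 - rss / tss                  # proportion of variance explained

c(rse, summary(fit)$sigma)            # the two values should match
c(r2,  summary(fit)$r.squared)        # likewise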
2.6 MULTIPLE LINEAR REGRESSION

 Simple linear regression is a useful approach for predicting a response on
the basis of a single predictor variable. However, in practice we often have
more than one predictor.
 Multiple linear regression can accommodate multiple predictors:
Y = β0 + β1X1 + β2X2 + … + βpXp + ε
 We interpret βj as the average effect on Y of a one-unit increase in Xj,
holding all other predictors fixed.
 For example, in the advertising setting,
sales = β0 + β1 × TV + β2 × radio + β3 × newspaper + ε
2.6.1 Estimating the Regression Coefficients

 Given estimates β^0, β^1, β^2, …, β^p, we can make predictions
using the formula
y^ = β^0 + β^1x1 + β^2x2 + … + β^pxp
 We estimate β0, β1, …, βp as the values that minimize the sum of
squared residuals
RSS = Σ(yi − y^i)² = Σ(yi − β^0 − β^1xi1 − … − β^pxip)²
 This is done using standard statistical software. The values β^0, β^1,
β^2, …, β^p that minimize the RSS are the multiple least squares regression
coefficient estimates.
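A minimal R sketch with two simulated predictors (x1, x2, and y2 are made up
for illustration), showing how lm() computes these estimates:

# Simulated illustration data: y2 depends linearly on x1 and x2
set.seed(1)
x1 <- runif(50)
x2 <- runif(50)
y2 <- 1 + 2 * x1 - 3 * x2 + rnorm(50, sd = 0.3)

mfit <- lm(y2 ~ x1 + x2)   # least squares over all predictors at once
coef(mfit)                 # estimates of beta0, beta1, beta2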

2.6.2 SOME IMPORTANT QUESTIONS
 When we perform multiple linear regression, we are usually interested in
answering a few important questions:
1. Is at least one of the predictors x1, x2, …, xp useful in predicting
the response?
2. Do all the predictors help to explain Y, or is only a subset of the
predictors useful?
3. How well does the model fit the data?
4. Given a set of predictor values, what response value should we predict,
and how accurate is our prediction?
Is There a Relationship between the Response and Predictors?
 In the simple linear regression setting, in order to determine the relationship
between the response and the predictor, we simply check whether β1 = 0.
 In the multiple regression setting with p predictors, we need to ask
whether all of the regression coefficients are zero, i.e., whether β1 = β2 =
… = βp = 0.
 As in simple linear regression, we use a hypothesis test to answer
the question:
H0: β1 = β2 = … = βp = 0
versus the alternative
Ha: at least one βj is non-zero.
 This hypothesis test is performed by computing the F-statistic
F = [(TSS − RSS) / p] / [RSS / (n − p − 1)]
 If the linear model assumptions are correct, then
E{RSS / (n − p − 1)} = σ²
and, provided H0 is true,
E{(TSS − RSS) / p} = σ²
 Hence, when there is no relationship between the response and
predictors, one would expect the F-statistic to take on a value close
to 1. On the other hand, if Ha is true, then E{(TSS − RSS) / p} > σ².
So we expect F to be greater than 1.
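Continuing the multiple regression sketch, summary() reports this overall
F-statistic:

# Overall F-test of H0: beta1 = beta2 = 0 for the sketch model
summary(mfit)$fstatistic   # F value, numerator df, denominator df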

Deciding on Important Variables

 The task of determining which predictors are associated with the
response, in order to fit a single model involving only those predictors,
is referred to as variable selection.
 There are three classical approaches to choosing a smaller set of
models to consider:
 Forward Selection
 Backward Selection
 Mixed Selection
Forward Selection
 We begin with the null model, which contains an intercept but no
predictors.
 We then fit p simple linear regressions and add to the null model the
variable that results in the lowest RSS.
 This approach is continued until some stopping rule is satisfied.
Backward Selection
 We start with all variables in the model, and remove the variable with
the largest p-value, that is, the variable that is least statistically significant.
 The new (p − 1)-variable model is fit, and the variable with the largest
p-value is again removed.
 This procedure continues until a stopping rule is reached.
Mixed Selection
 This is a combination of forward and backward selection.
 We start with no variables in the model and, as with forward selection,
we add the variable that provides the best fit.
 We continue to perform these forward and backward steps until all
variables in the model have a sufficiently low p-value (see the sketch below).
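A sketch of all three strategies using base R's step(), with the caveat that
step() uses the AIC criterion as its stopping rule rather than the raw p-values
described above; the model objects reuse the earlier multiple regression sketch:

null <- lm(y2 ~ 1)                 # intercept-only (null) model
full <- lm(y2 ~ x1 + x2)           # all candidate predictors

# Forward selection: start small, add variables while AIC improves
step(null, scope = formula(full), direction = "forward")

# Backward selection: start full, drop the least useful variables
step(full, direction = "backward")

# Mixed (stepwise) selection: allow both adding and dropping
step(null, scope = formula(full), direction = "both")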
Model Fit
 Two of the most common numerical measures of model fit are the RSE
and R².
 These quantities are computed and interpreted in the same fashion
as for simple linear regression.
 In simple regression, R² is the square of the correlation between the
response and the variable.
 In multiple regression, it equals the square of the correlation
between the response and the fitted linear model.
 An R² value close to 1 indicates that the model explains a large
portion of the variance in the response variable.
Predictions
 Once we have fit the multiple regression model, there are three sources
of uncertainty associated with its predictions.
 The least squares plane is only an estimate of the true population
regression plane. The inaccuracy in the coefficient estimates is related
to the reducible error.
Model Bias

In practice, assuming a linear model for f(X) is almost always an
approximation of reality, so there is an additional source of potentially
reducible error, called model bias. Finally, even if we knew f(X) exactly,
the response could not be predicted perfectly because of the random error
ε in the model: this is the irreducible error.
2.7 OTHER CONSIDERATIONS IN THE REGRESSION MODEL
2.7.1 Qualitative Predictors

 Some predictors are not quantitative but are qualitative, taking a
discrete set of values.
 These are also called categorical predictors or factor variables.
 For example: investigate differences in credit card balance
between males and females, ignoring the other variables. We create a
new dummy variable
xi = 1 if the ith person is female; xi = 0 if the ith person is male.
Resulting model:
yi = β0 + β1xi + εi
so the average balance is β0 for males and β0 + β1 for females.
 With more than two levels, we create additional dummy
variables; the level with no dummy variable is known as the baseline.
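In R, lm() builds these dummy variables automatically for factor predictors;
a small sketch with made-up balance data:

# Made-up illustration data: credit card balance by gender
gender  <- factor(c("Male", "Female", "Female", "Male", "Female", "Male"))
balance <- c(480, 510, 530, 470, 520, 460)

qfit <- lm(balance ~ gender)
# Intercept = baseline level (Female, the first level alphabetically);
# one dummy coefficient per remaining level (here, genderMale)
coef(qfit)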
2.7.2 Extensions of the Linear Model
 Two of the most important assumptions state that the relationship
between the predictors and the response is additive and linear.
 The additive assumption means that the effect of changes in a
predictor Xj on the response Y is independent of the values of the
other predictors.
 The linear assumption states that the change in the response Y due to
a one-unit change in Xj is constant, regardless of the value of Xj.
Removing the Additive Assumption
 In the model
Sales = β0 + β1 × TV + β2 × Radio + β3 × Newspaper + ε
if TV increases by one unit, then sales will increase by β1 units,
independently of the amount spent on radio.
 This simple model may be wrong: it may be the case that the
effect of TV on sales should increase as radio increases.
 How can we extend the standard linear regression model by
"relaxing" the additive assumption?


 An interaction term β3X1X2 is added to capture the interaction effect:
Y = β0 + β1X1 + β2X2 + β3X1X2 + ε
  = β0 + (β1 + β3X2)X1 + β2X2 + ε
 Since β˜1 = β1 + β3X2 changes with X2, the effect of X1 on Y is no longer
constant: adjusting X2 will change the impact of X1 on Y.
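In R's formula notation, x1 * x2 expands to x1 + x2 + x1:x2, where x1:x2 is the
interaction (product) term; a sketch on the earlier illustration data:

ifit <- lm(y2 ~ x1 * x2)   # main effects plus the x1:x2 interaction
coef(ifit)                 # the x1:x2 row is the beta3 estimate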
Non-Linear Relationships
 The linear regression model assumes a linear relationship between
the response and predictors.
 But in some cases, the true relationship between the response and
the predictors may be non-linear.
 A very simple way to directly extend the linear model to
accommodate non-linear relationships is to use polynomial
regression.
 Polynomial regression is a form of regression analysis in which the
relationship between the independent variable X and the
dependent variable Y is modelled as an nth-degree polynomial in X.
 It fits a non-linear relationship between the value of X and the
corresponding conditional mean of Y.
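Because such a model is still linear in its coefficients, lm() handles it
directly; a sketch using poly() on the earlier (x, y) data:

# Quadratic fit; poly(x, 2) supplies the x and x^2 terms
# (orthogonalised by default)
pfit <- lm(y ~ poly(x, 2))
summary(pfit)$r.squared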
2.8 COMPARISON OF LINEAR REGRESSION WITH K-NEAREST NEIGHBORS
 Linear regression is an example of a parametric approach because it
assumes a linear functional form for f(X).
 Parametric methods have several advantages. They are often easy to fit,
because one need estimate only a small number of coefficients.
 But parametric methods do have a disadvantage: by construction, they
make strong assumptions about the form of f(X).
 If the specified functional form is far from the truth, and prediction
accuracy is our goal, then the parametric method will perform poorly.
 Non-parametric methods do not explicitly assume a parametric form for
f(X), and thereby provide an alternative and more flexible approach for
performing regression.
 One of the simplest and best-known non-parametric methods is KNN regression.
 The KNN regression method is closely related to the KNN classifier.
 KNN regression first identifies the K training observations that are closest
to a prediction point x0, represented by N0. It then estimates f(x0) using the
average of the training responses in N0:
f^(x0) = (1/K) Σ (xi in N0) yi
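A tiny base-R sketch of this averaging rule for one query point, reusing the
(x, y) data from the simple regression example; in practice a dedicated package
would typically be used instead:

# KNN regression prediction at a single point x0
knn_predict <- function(x0, x, y, k = 3) {
  nearest <- order(abs(x - x0))[1:k]  # indices of the k closest training points
  mean(y[nearest])                    # average their responses
}

knn_predict(4.5, x, y, k = 3)         # non-parametric estimate of f(4.5)
predict(fit, data.frame(x = 4.5))     # linear regression estimate, for comparison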

9. Practice Quiz
1. Linear Regression is a supervised machine learning algorithm.
a) true
b) false
2. Which of the following methods do we use to find the best-fit line for data in Linear
Regression?
a) Least Square Error
b) Maximum Likelihood
c) Logarithmic Loss
d) Both A and B
3. Which of the following evaluation metrics can be used to evaluate a model while
modeling a continuous output variable?
a) AUC-ROC
b) Accuracy
c) Logloss
d) Mean-Squared-Error
4. Which of the following is true about residuals?
a) Lower is better
b) Higher is better
c) A or B, depending on the situation
d) None of these
5. Which of the following statements is true about outliers in Linear Regression?
a) Linear regression is sensitive to outliers
b) Linear regression is not sensitive to outliers
c) Can't say
d) None of these
6. Which of the following metrics can be used for evaluating regression models?
1) R Squared
2) Adjusted R Squared
3) F Statistics
4) RMSE / MSE / MAE
a) 2 and 4.
b) 1 and 2.
c) 3 and 4.
d) All of the above.
7. A regression model in which more than one independent variable is used to
predict the dependent variable is called
a) a simple linear regression model
b) a multiple regression model
c) an independent model
d) none of the above
8. A multiple regression model has the form: y = 2 + 3x1 + 4x2. As x1 increases by 1
unit (holding x2 constant), y will?

a) increase by 3 units
b) decrease by 3 units
c) increase by 4 units
d) decrease by 4 units
9. A multiple regression model has
a) only one independent variable
b) more than one dependent variable
c) more than one independent variable
d) none of the above
10. A measure of goodness of fit for the estimated regression equation is the
a) multiple coefficient of determination
b) mean square due to error
c) mean square due to regression
d) none of the above

10. Assignments

S.No  Question                                                          BL  CO
1     Define simple linear regression and explain how to estimate
      the coefficients.                                                  2   1
2     Define hypothesis testing and explain hypothesis testing with
      an example.                                                        2   1
3     Compare Linear Regression with K-Nearest Neighbors.                2   1

11. Questions

S.No  Question                                                          BL  CO
1     Define simple linear regression and explain how to estimate
      the coefficients.                                                  1   1
2     Define hypothesis testing and explain hypothesis testing with
      an example.                                                        2   1
3     Compare Linear Regression with K-Nearest Neighbors.                2   1

12. Supportive Online Certification Courses

1. Essentials of Data Science With R Software - 2: Sampling Theory and Linear
Regression Analysis, by Prof. Shalabh, IIT Kanpur (12 weeks).

13. Real Time Applications
S.No  Application                                                               CO
1     Predictive Analytics:                                                     1
      Predictive analytics, i.e., forecasting future opportunities and risks,
      is the most prominent application of regression analysis in business.
2     Operational Efficiency:                                                   1
      Regression models can also be used to optimize business processes. A
      factory manager, for example, can create a statistical model to
      understand the impact of oven temperature on the shelf life of the
      cookies baked in those ovens.
3     Supporting Decisions:                                                     1
      Businesses today are overloaded with data on finances, operations, and
      customer purchases; regression analysis helps turn this data into
      evidence for decision-making.
4     Correcting Errors:                                                        1
      Regression is not only great for lending empirical support to
      management decisions but also for identifying errors in judgment.
5     New Insights:                                                             1
      Over time, businesses have gathered a large volume of unorganized data
      that has the potential to yield valuable insights.

14. Contents Beyond the Syllabus

1. Multiple Linear Regression Analysis with R
Applying the multiple linear regression model using R.
2. Variable Selection using LASSO Regression
Data analysts and data scientists use different regression methods for different
kinds of analytics problems, from the simplest to the most complex. One of the
most talked-about methods is the Lasso. The Lasso is often described as one of
the most useful linear regression tools, and we are about to find out why.

15. Prescribed Text Books & Reference Books

Text Books
1. Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, An
Introduction to Statistical Learning with Applications in R, Springer, 2013,
web link: www.statlearning.com.
2. Mark Gardener, Beginning R: The Statistical Programming Language, Wiley, 2015.
3. J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd
edition, Morgan Kaufmann, 2012.
References:
1. Sinan Ozdemir, Principles of Data Science, Packt Publishing, 2016.
2. Joel Grus, Data Science from Scratch, O'Reilly Media, 2015.

16. Mini Project Suggestion
1. Budget a Long Drive
Suppose you want to go on a long drive (from Delhi to Lonawala). Before going
on a trip this long, it’s best to prepare a budget and figure out how much you
need to spend on a particular section. You can use a linear regression model
here to determine the cost of gas you’ll have to get.
2. Compare Unemployment Rates with Gains in Stock Market
If you’re an economics enthusiast, or if you want to use your knowledge of
Machine Learning in this field, then this is one of the best linear regression project
ideas for you. We all know how unemployment is a significant problem for our
country. In this project, we’d find the relation between the unemployment
rates and the gains happening in the stock market.
3. Compare Salaries of Batsmen with The Average Runs They Score per Game
Cricket is easily the most popular game in India. You can use your knowledge
of machine learning in this simple yet exciting project where you’ll plot
the relationship between the salaries of batsmen and the average runs they
score in every game. Our cricketers are among some of the highest-earning
athletes in the world. Working on this project would help you find out how
much their batting averages are responsible for their earnings.
4. Compare the Dates in a Month with the Monthly Salary
This project explores the application of machine learning in human resources
and management. It is among the beginner-level linear regression projects, so
if you haven't worked on such a project before, you can start with this
one. Here, you'll take the dates present in a month and compare them with the
monthly salary.
5. Compare Average Global Temperatures and Levels of Pollution
Pollution and its impact on the environment is a prominent topic of discussion.
The recent pandemic has also shown us how we can still save our
environment. You can use your machine learning skills in this field too. This
project would help you in understanding how machine learning can solve the
various problems present in this domain as well.
