Stephen and Senthamarai Kannan (2017) - Detection of Outliers in Regression Model For Medical Data

This paper focuses on detecting outliers in simple linear regression models using medical data, specifically examining the relationship between age and systolic blood pressure. It discusses various methods for outlier detection, including residual analysis and a new approach based on standardized scores, and verifies the performance of these methods using real-life data. The findings indicate that outlier detection is crucial for improving data quality and ensuring accurate regression model estimations.

Available online at www.ijmrhs.com
International Journal of Medical Research & Health Sciences, 2017, 6(7): 50-56
ISSN No: 2319-5886

Detection of Outliers in Regression Model for Medical Data

Stephen Raj S1* and Senthamarai Kannan K2
1 Research Scholar, Department of Statistics, Manonmaniam Sundaranar University, Abhishekapatti, Tirunelveli, Tamil Nadu, India
2 Professor, Department of Statistics, Manonmaniam Sundaranar University, Abhishekapatti, Tirunelveli, Tamil Nadu, India
*Corresponding e-mail: stephenstats17@gmail.com

ABSTRACT
In regression analysis, an outlier is an observation for which the residual is large in magnitude compared to other
observations in the data set. The detection of outliers and influential points is an important step of the regression
analysis. Outlier detection methods have been used to detect and remove anomalous values from data. In this paper,
we detect the presence of outliers in simple linear regression models for a medical data set. Chatterjee and Hadi
mentioned that the ordinary residuals are not appropriate for diagnostic purposes; a transformed version of them is
preferable. First, we investigate the presence of outliers based on the existing procedures of residuals and standardized
residuals. Next, we use a new approach based on standardized scores for detecting outliers without the use of
predicted values. The performance of the new approach was verified with real-life data.
Keywords: Medical data, Outlier, Residual analysis, Regression and residual analysis

INTRODUCTION
Regression analysis is a statistical technique for analysing and modelling the relationship between a dependent variable
and one or more independent variables. This technique uses a mathematical equation to establish the relationship
between the variables. It is a predictive modelling technique used for forecasting and for finding causal relationships
between variables. Applications of regression analysis are found in almost every field, including physical and
chemical sciences, engineering, economics, finance, pharmacology, life and biological sciences, social sciences, and
other fields of study. In the simple linear regression model, only one independent variable (x) is used to predict a single
dependent variable (y). A scatter diagram is used to display the relationship between the independent
variable and the dependent variable.
The equation of a straight line relating these two variables is given by Montgomery, et al. [1]:
$$ y = \beta_0 + \beta_1 x \tag{1} $$

where β0 is the intercept and β1 is the slope.

The difference between the observed value of y and the fitted straight line is a statistical error ε. It is a random variable
that accounts for the failure of the model to fit the data exactly.
Hence the model is given by
$$ y = \beta_0 + \beta_1 x + \varepsilon \tag{2} $$

which is called a ‘simple linear regression model’.

An important objective of regression analysis is to estimate the unknown parameters β0 and β1 in the regression
model. Several techniques are available for estimating the unknown parameters; here we use the ‘method
of least squares’. In the method of least squares, we estimate β0 and β1 so that the sum of the squares of the
differences between the observations yi and the straight line is a minimum. The ordinary least squares (OLS) method
has been used to fit the model and to estimate the parameter values. Several assumptions must be
fulfilled for the OLS model to be valid. When the regression model does not satisfy the fundamental assumptions of
the model, predictions and estimations based on the model may be biased [2].
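As a brief sketch of the reasoning behind the least squares criterion (a standard textbook derivation, not reproduced from this paper), the quantity being minimised and the resulting normal equations can be written as follows. The symbol S for the sum of squares is introduced here only for illustration.

```latex
% Least squares criterion: sum of squared vertical deviations from the line
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

% Setting both partial derivatives to zero at the minimiser
\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0,
\qquad
\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) x_i = 0

% Normal equations
n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i,
\qquad
\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
```

Solving these two linear equations for the two unknowns yields the intercept and slope estimators stated in equations (3) and (4).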
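The least squares estimator of the intercept β0 is:

$$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \tag{3} $$

where $\bar{x}$ and $\bar{y}$ are the sample means of x and y.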

The least squares estimator of the slope β1 is:

$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} \tag{4} $$

Then the fitted simple linear regression model is given by:

$$ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \tag{5} $$

which gives a point estimate of the mean of y for a particular x.
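Equations (3) and (4) can be illustrated with a minimal Python sketch. It assumes the 30 (age, SBP) pairs listed in Table 1 as input; the names `AGE`, `SBP` and `least_squares_fit` are hypothetical and chosen only for this example.

```python
# Age (x) and systolic blood pressure (y) for the 30 subjects in Table 1.
AGE = [39, 47, 45, 47, 65, 46, 67, 42, 67, 56, 64, 56, 59, 34, 42,
       48, 45, 17, 20, 19, 36, 50, 39, 21, 44, 53, 63, 29, 25, 69]
SBP = [144, 220, 138, 145, 162, 142, 170, 124, 158, 154, 162, 150, 140, 110, 128,
       130, 135, 114, 116, 124, 136, 142, 120, 120, 160, 158, 144, 130, 125, 175]


def least_squares_fit(x, y):
    """Return (intercept, slope) computed from equations (3) and (4)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    # Equation (4): slope from raw sums.
    slope = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    # Equation (3): intercept from the sample means and the slope.
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


b0, b1 = least_squares_fit(AGE, SBP)
print(f"fitted line: y_hat = {b0:.2f} + {b1:.2f} x")
```

Rounded to two decimals, the intercept and slope agree with the coefficients 98.71 and 0.97 reported for the full data set in Table 3.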

The difference between the observed value yi and the corresponding fitted value is called the residual. Mathematically, the
ith residual is given by Bipin et al. [3]:

$$ e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i), \quad i = 1, 2, \ldots, n \tag{6} $$

Residuals play an important role in investigating the adequacy of the fitted regression model and in detecting departures
from the underlying assumptions.
After obtaining the least squares fit, we should check for the following:
• How well does this equation fit the data?
• Is the model likely to be useful as a predictor?
• Are any of the basic assumptions (such as constant variance and uncorrelated errors) violated?

All of these issues must be investigated before the model is finally adopted for use. Outliers or bad values can seriously
disturb the least-squares fit. Suppose an observation falls far away from the line implied by the rest of the data. If this point is
really an outlier, then the estimate of the intercept may be incorrect. On the other hand, the data point may not be a
bad value and may be a highly useful piece of evidence concerning the process under investigation.
The major assumptions of regression analysis are as follows [4]:
i. The relationship between the response y and the regressor x is linear, at least approximately.
ii. The error term ε has zero mean.
iii. The error term ε has constant variance σ².
iv. The errors are uncorrelated.
v. The errors are normally distributed.
Assumptions (iv) and (v) imply that the errors are independent random variables. Assumption (v) is required for
hypothesis testing and interval estimation.
The appropriateness of the model is studied, and the quality of the fit is ascertained, by model adequacy checking using the t or
F statistics or R².
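The adequacy statistics mentioned above can be sketched in Python as follows. This is an illustrative calculation using the standard textbook definitions of R², the t statistic of the slope and the overall F statistic, applied to the Table 1 data. The function name `adequacy_statistics` is hypothetical, and the paper itself reports only R².

```python
import math

# Age (x) and systolic blood pressure (y) for the 30 subjects in Table 1.
AGE = [39, 47, 45, 47, 65, 46, 67, 42, 67, 56, 64, 56, 59, 34, 42,
       48, 45, 17, 20, 19, 36, 50, 39, 21, 44, 53, 63, 29, 25, 69]
SBP = [144, 220, 138, 145, 162, 142, 170, 124, 158, 154, 162, 150, 140, 110, 128,
       130, 135, 114, 116, 124, 136, 142, 120, 120, 160, 158, 144, 130, 125, 175]


def adequacy_statistics(x, y):
    """Return (R^2, t statistic of the slope, F statistic) for a straight-line fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_xx
    ss_reg = slope * s_xy          # variation explained by the fitted line
    ss_res = ss_total - ss_reg     # residual sum of squares
    ms_res = ss_res / (n - 2)      # two parameters are estimated
    r_squared = ss_reg / ss_total
    t_slope = slope / math.sqrt(ms_res / s_xx)
    f_stat = ss_reg / ms_res
    return r_squared, t_slope, f_stat


r2, t_slope, f_stat = adequacy_statistics(AGE, SBP)
print(f"R^2 = {r2:.2f}, t = {t_slope:.2f}, F = {f_stat:.2f}")
```

In simple linear regression the F statistic equals the square of the slope's t statistic, so the two tests carry the same information. The R² value rounds to the 0.43 given in Table 3.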


Outliers are observations that appear inconsistent with the remainder of the data set [5]. Outliers may be mistakes,
or else accurate but unexpected observations which could shed new light on the phenomenon under study [6]. In this
study, we have concentrated on outlier detection methods for the linear regression model. Specifically, we are concerned
with observations that differ from the regression plane defined by the bulk of the data. It is important to identify these
types of outliers in regression modelling because such observations, when undetected, can lead to erroneous parameter
estimates and inferences from the model [7]. Identifying outliers in a real-world database is important for improving
the quality of the original data and for reducing the impact of outliers [8]. The standard outlier detection procedures are
based on residuals, which require the predicted values. Hence, we have used a new approach that does not use residuals.
The performance of the new approach was verified using a real-life medical data set on the age and systolic blood
pressure (mm Hg) of 30 people of different ages, retrieved from the web site of
Florida State University [9]. The increase in blood pressure with age is mostly associated with structural changes
in the arteries and especially with large artery stiffness, which is associated with increased cardiovascular risk [10].
On average, systolic blood pressure increases with age, while diastolic blood pressure increases up to age 50 and then
decreases [2].
MATERIALS AND METHODS
Methods of outlier detection in regression
Many methods already exist for the detection of outliers in linear regression. They may be classified into
two groups, namely graphical and analytical methods [11,12].
Outliers are detected using the following methods:
i. Residual analyses using standardized residuals, studentized residuals, jackknife residuals and predicted residuals;
ii. Residual plots such as the graph of predicted residuals, the Williams graph, and the Rankit Q-Q plot;
iii. Scalar measures of influence such as Cook’s Di (measures the change in the estimates that results from
deleting each observation), DFFITSi (measures the change in the predicted value of the dependent variable when the
current observation is omitted from the calculations), DFBETASj(i) (measures the influence on the regression coefficients),
Atkinson measures, and the covariance ratio (a measure of model performance).
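To make one of the influence measures in item (iii) concrete, the following Python sketch computes Cook's distance for a straight-line fit. It uses the standard textbook expression D_i = e_i² h_ii / (p · MS_Res · (1 − h_ii)²), with leverage h_ii = 1/n + (x_i − x̄)²/S_xx and p = 2 parameters. The paper does not compute Cook's distance, so this is an illustration on the Table 1 data rather than a reproduction of the authors' results, and the function name `cooks_distance` is hypothetical.

```python
# Age (x) and systolic blood pressure (y) for the 30 subjects in Table 1.
AGE = [39, 47, 45, 47, 65, 46, 67, 42, 67, 56, 64, 56, 59, 34, 42,
       48, 45, 17, 20, 19, 36, 50, 39, 21, 44, 53, 63, 29, 25, 69]
SBP = [144, 220, 138, 145, 162, 142, 170, 124, 158, 154, 162, 150, 140, 110, 128,
       130, 135, 114, 116, 124, 136, 142, 120, 120, 160, 158, 144, 130, 125, 175]


def cooks_distance(x, y):
    """Return Cook's distance for each observation of a simple linear regression."""
    n, p = len(x), 2  # p counts the intercept and the slope
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ms_res = sum(e * e for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - x_bar) ** 2 / s_xx  # h_ii for a straight line
        distances.append(e * e * leverage / (p * ms_res * (1 - leverage) ** 2))
    return distances


D = cooks_distance(AGE, SBP)
most_influential = max(range(len(D)), key=D.__getitem__) + 1  # 1-based index
print("observation with the largest Cook's distance:", most_influential)
```

Cook's distance combines the size of the residual with the leverage of the point, so an observation can have a large residual yet only moderate influence if its x value lies near the mean.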
Diagnostics based on residual analysis
1. Residuals: The residual is defined as:

$$ e_i = y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n \tag{7} $$

where yi is an observation (dependent variable) and ŷi is the corresponding fitted value. Since a residual may be viewed
as the deviation between the data and the fit, it is a measure of the variability in the dependent variable not explained
by the regression model.
2. Standardized residuals (normalized): Chatterjee and Hadi note that the ordinary residuals are not appropriate
for diagnostic purposes; a transformed version of them would be better. Transformations of residuals, such as
standardized residuals, studentized residuals, jackknife residuals and predicted residuals, are often preferred over raw
residuals because they overcome some of the limitations of raw residuals.
A logical scaling for the residuals is the standardized residual, given by Pimpan et al. [13]:

$$ d_i = \frac{e_i}{\sqrt{MS_{RES}}}, \quad i = 1, 2, 3, \ldots, n \tag{8} $$

where MSRES is the mean square residual. The standardized residuals have mean zero and approximately unit
variance. Consequently, large standardized residuals (di>3) potentially indicate an outlier.
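The standardized residuals of equation (8) can be sketched in Python as below. This sketch assumes MS_RES = SS_RES/(n − 2), the usual residual mean square for a two-parameter model, and flags observations with |d_i| > 3 (the absolute value is an assumption made here so that large negative residuals are also caught). Spreadsheet output may scale the residuals by a slightly different divisor, so the values need not match column 6 of Table 1 exactly. The function name `standardized_residuals` is hypothetical.

```python
import math

# Age (x) and systolic blood pressure (y) for the 30 subjects in Table 1.
AGE = [39, 47, 45, 47, 65, 46, 67, 42, 67, 56, 64, 56, 59, 34, 42,
       48, 45, 17, 20, 19, 36, 50, 39, 21, 44, 53, 63, 29, 25, 69]
SBP = [144, 220, 138, 145, 162, 142, 170, 124, 158, 154, 162, 150, 140, 110, 128,
       130, 135, 114, 116, 124, 136, 142, 120, 120, 160, 158, 144, 130, 125, 175]


def standardized_residuals(x, y):
    """Return d_i = e_i / sqrt(MS_RES) for a straight-line least squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ms_res = sum(e * e for e in residuals) / (n - 2)  # assumed divisor n - 2
    scale = math.sqrt(ms_res)
    return [e / scale for e in residuals]


d = standardized_residuals(AGE, SBP)
# Report 1-based observation numbers, matching the S. no column of Table 1.
flagged = [i + 1 for i, di in enumerate(d) if abs(di) > 3]
print("observations with |d_i| > 3:", flagged)
```

With the cut-off of 3, only observation 2 is flagged, which is the same data point identified in the Results section.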
3. Standardized scores: In this paper, a new approach to outlier detection is used to detect outlying values in linear
regression models. This method is based on the individual standardized scores of the dependent variable (y) and the
independent variable (x):

$$ z_{x_i} = \frac{x_i - \mu_x}{\sigma_x}, \quad i = 1, 2, \ldots, n \tag{9} $$

$$ z_{y_i} = \frac{y_i - \mu_y}{\sigma_y}, \quad i = 1, 2, \ldots, n \tag{10} $$

where μ and σ denote the mean and standard deviation of the corresponding variable.
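The standardized score approach can be sketched in Python as follows. The sketch assumes that the mean and standard deviation are the sample values (standard deviation with an n − 1 divisor), which is consistent with the descriptive statistics in Table 2, and it applies a cut-off of 3 to the absolute score. The function name `standardized_scores` is hypothetical.

```python
import statistics

# Age (x) and systolic blood pressure (y) for the 30 subjects in Table 1.
AGE = [39, 47, 45, 47, 65, 46, 67, 42, 67, 56, 64, 56, 59, 34, 42,
       48, 45, 17, 20, 19, 36, 50, 39, 21, 44, 53, 63, 29, 25, 69]
SBP = [144, 220, 138, 145, 162, 142, 170, 124, 158, 154, 162, 150, 140, 110, 128,
       130, 135, 114, 116, 124, 136, 142, 120, 120, 160, 158, 144, 130, 125, 175]


def standardized_scores(values):
    """Return (value - mean) / standard deviation for each value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return [(v - mean) / sd for v in values]


z_age = standardized_scores(AGE)
z_sbp = standardized_scores(SBP)
# An observation is flagged when either score exceeds the cut-off in magnitude.
flagged = [i + 1 for i, (zx, zy) in enumerate(zip(z_age, z_sbp))
           if abs(zx) > 3 or abs(zy) > 3]
print("observations with a standardized score beyond 3:", flagged)
```

No regression fit is needed for this check, which is the practical appeal of the approach: the scores depend only on the marginal mean and standard deviation of each variable.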
RESULTS AND DISCUSSION
In this paper, the presence of outliers in blood pressure data is studied using residuals obtained from the fitted simple linear
regression model, and the relationship between age and SBP is investigated. Furthermore, we
investigate the presence of outliers based on residuals and standardized residuals (Table 1).
Figures 1 and 2 show the scatter plots, which suggest that there is a moderate statistical relationship between age
and SBP, and the tentative assumption of the straight-line model y = β0 + β1x + ε appears to be reasonable.

Figure 1 Scatter plot for the data set with outlier (SBP versus age)

Using MS-Excel, the following regression model is fitted to the medical data on systolic blood pressure
measured for 30 people of different ages (n=30):

$$ \hat{y} = 98.71 + 0.97x \tag{11} $$

with R2=43% (where y is the systolic blood pressure and x is the age). The residuals (Column 5) and the standard
residuals (Column 6) shown in Table 1 were taken from the Excel output. Table 1 displays the observed
values yi, the fitted values, residuals, standard residuals and standardized xi and yi scores.
From Table 1, we can observe that the residual e2=75.65 is very large, the standardized residual d2=4.45 exceeds
the cut-off value of 3, and the standardized yi score y2=3.43 exceeds the cut-off value of 3; therefore, the observation at
data point 2 is considered an outlier. Table 2 displays the descriptive statistics and Table 3 shows the model
fitting information and summary statistics for the dependent and independent variables. The outliers detected by the
method of residuals and standard residuals are similar to those detected by the standardized score approach.
Table 1 Tabulation of residuals, standard residuals and difference method (n=30)

| S. no | Age (xi) | SBP (yi) | Predicted SBP (ŷi) | Residuals (ei) | Standard Residuals (di) | Standardized (xi) | Standardized (yi) |
|---|---|---|---|---|---|---|---|
| (1) | (2) | (3) | (4) | (5) | (6) | (8) | (9) |
| 1 | 39 | 144 | 136.58 | 7.42 | 0.44 | -0.4 | 0.06 |
| 2 | 47 | 220 | 144.35 | 75.65 | 4.45 | 0.12 | 3.43 |
| 3 | 45 | 138 | 142.4 | -4.4 | -0.26 | -0.01 | -0.2 |
| 4 | 47 | 145 | 144.35 | 0.65 | 0.04 | 0.12 | 0.11 |
| 5 | 65 | 162 | 161.82 | 0.18 | 0.01 | 1.3 | 0.86 |
| 6 | 46 | 142 | 143.37 | -1.37 | -0.08 | 0.06 | -0.02 |
| 7 | 67 | 170 | 163.76 | 6.24 | 0.37 | 1.43 | 1.22 |
| 8 | 42 | 124 | 139.49 | -15.49 | -0.91 | -0.2 | -0.82 |
| 9 | 67 | 158 | 163.76 | -5.76 | -0.34 | 1.43 | 0.68 |
| 10 | 56 | 154 | 153.08 | 0.92 | 0.05 | 0.71 | 0.51 |
| 11 | 64 | 162 | 160.85 | 1.15 | 0.07 | 1.23 | 0.86 |
| 12 | 56 | 150 | 153.08 | -3.08 | -0.18 | 0.71 | 0.33 |
| 13 | 59 | 140 | 156 | -16 | -0.94 | 0.91 | -0.11 |
| 14 | 34 | 110 | 131.72 | -21.72 | -1.28 | -0.73 | -1.44 |
| 15 | 42 | 128 | 139.49 | -11.49 | -0.68 | -0.2 | -0.64 |
| 16 | 48 | 130 | 145.32 | -15.32 | -0.9 | 0.19 | -0.56 |
| 17 | 45 | 135 | 142.4 | -7.4 | -0.44 | -0.01 | -0.33 |
| 18 | 17 | 114 | 115.22 | -1.22 | -0.07 | -1.84 | -1.26 |
| 19 | 20 | 116 | 118.13 | -2.13 | -0.13 | -1.64 | -1.18 |
| 20 | 19 | 124 | 117.16 | 6.84 | 0.4 | -1.71 | -0.82 |
| 21 | 36 | 136 | 133.67 | 2.33 | 0.14 | -0.6 | -0.29 |
| 22 | 50 | 142 | 147.26 | -5.26 | -0.31 | 0.32 | -0.02 |
| 23 | 39 | 120 | 136.58 | -16.58 | -0.97 | -0.4 | -1 |
| 24 | 21 | 120 | 119.1 | 0.9 | 0.05 | -1.58 | -1 |
| 25 | 44 | 160 | 141.43 | 18.57 | 1.09 | -0.07 | 0.77 |
| 26 | 53 | 158 | 150.17 | 7.83 | 0.46 | 0.51 | 0.68 |
| 27 | 63 | 144 | 159.88 | -15.88 | -0.93 | 1.17 | 0.06 |
| 28 | 29 | 130 | 126.87 | 3.13 | 0.18 | -1.05 | -0.56 |
| 29 | 25 | 125 | 122.99 | 2.01 | 0.12 | -1.32 | -0.78 |
| 30 | 69 | 175 | 165.7 | 9.3 | 0.55 | 1.56 | 1.44 |

Table 2 Descriptive statistics for the data set

| Variables | Age | SBP |
|---|---|---|
| Mean | 45.13 | 142.53 |
| Median | 45.5 | 141 |
| Mode | 39 | 144 |
| Standard Deviation | 15.29 | 22.58 |
| Range | 52 | 110 |
| Minimum | 17 | 110 |
| Maximum | 69 | 220 |
| Count | 30 | 30 |

Table 3 Summary output for the data set

| Regression Statistics | Value |
|---|---|
| Multiple R | 0.66 |
| R Square | 0.43 |
| Intercept | 98.71 |
| Slope (Age) | 0.97 |

In regression analysis, the effect of a case can be studied by deleting that case from the data and analysing
the remaining observations. The results after deleting the 2nd observation (sample size n=29) are shown below
(Tables 4-6).

Figure 2 Scatter plot for the data set without outlier (SBP versus age)


Table 4 Tabulation of residuals, standard residuals and difference method (n=29)

| S. no | Age (xi) | SBP (yi) | Predicted SBP (ŷi) | Residuals (ei) | Standard Residuals (di) | Standardized (xi) | Standardized (yi) |
|---|---|---|---|---|---|---|---|
| (1) | (2) | (3) | (4) | (5) | (6) | (8) | (9) |
| 1 | 39 | 144 | 134.1 | 9.9 | 1.05 | -2.83 | -1.99 |
| 2 | 45 | 138 | 139.8 | -1.8 | -0.19 | -2.91 | -2.68 |
| 3 | 47 | 145 | 141.7 | 3.3 | 0.35 | -2.87 | -2.39 |
| 4 | 65 | 162 | 158.78 | 3.22 | 0.34 | -2.87 | -2.45 |
| 5 | 46 | 142 | 140.75 | 1.25 | 0.13 | -2.89 | -2.51 |
| 6 | 67 | 170 | 160.68 | 9.32 | 0.99 | -2.83 | -2.11 |
| 7 | 42 | 124 | 136.95 | -12.95 | -1.38 | -2.99 | -3.31 |
| 8 | 67 | 158 | 160.68 | -2.68 | -0.29 | -2.91 | -2.79 |
| 9 | 56 | 154 | 150.24 | 3.76 | 0.4 | -2.87 | -2.39 |
| 10 | 64 | 162 | 157.83 | 4.17 | 0.44 | -2.87 | -2.39 |
| 11 | 56 | 150 | 150.24 | -0.24 | -0.03 | -2.9 | -2.62 |
| 12 | 59 | 140 | 153.09 | -13.09 | -1.39 | -2.99 | -3.36 |
| 13 | 34 | 110 | 129.35 | -19.35 | -2.06 | -3.03 | -3.65 |
| 14 | 42 | 128 | 136.95 | -8.95 | -0.95 | -2.96 | -3.08 |
| 15 | 48 | 130 | 142.64 | -12.64 | -1.35 | -2.98 | -3.31 |
| 16 | 45 | 135 | 139.8 | -4.8 | -0.51 | -2.93 | -2.85 |
| 17 | 17 | 114 | 113.22 | 0.78 | 0.08 | -2.89 | -2.45 |
| 18 | 20 | 116 | 116.06 | -0.06 | -0.01 | -2.9 | -2.51 |
| 19 | 19 | 124 | 115.11 | 8.89 | 0.95 | -2.84 | -1.99 |
| 20 | 36 | 136 | 131.25 | 4.75 | 0.51 | -2.86 | -2.28 |
| 21 | 50 | 142 | 144.54 | -2.54 | -0.27 | -2.91 | -2.73 |
| 22 | 39 | 120 | 134.1 | -14.1 | -1.5 | -2.99 | -3.36 |
| 23 | 21 | 120 | 117.01 | 2.99 | 0.32 | -2.88 | -2.33 |
| 24 | 44 | 160 | 138.85 | 21.15 | 2.25 | -2.75 | -1.36 |
| 25 | 53 | 158 | 147.39 | 10.61 | 1.13 | -2.82 | -1.99 |
| 26 | 63 | 144 | 156.88 | -12.88 | -1.37 | -2.98 | -3.36 |
| 27 | 29 | 130 | 124.61 | 5.39 | 0.57 | -2.86 | -2.22 |
| 28 | 25 | 125 | 120.81 | 4.19 | 0.45 | -2.87 | -2.28 |
| 29 | 69 | 175 | 162.58 | 12.42 | 1.32 | -2.81 | -1.93 |

Table 5 Descriptive statistics for the data set

| Variables | Age | SBP |
|---|---|---|
| Mean | 45.07 | 139.86 |
| Median | 45 | 140 |
| Mode | 39 | 144 |
| Standard Deviation | 15.56 | 17.5 |
| Range | 52 | 65 |
| Minimum | 17 | 110 |
| Maximum | 69 | 175 |
| Count | 29 | 29 |

Table 6 Summary output for the data set

| Regression Statistics | Value |
|---|---|
| Multiple R | 0.84 |
| R Square | 0.71 |
| Intercept | 97.08 |
| Slope (Age) | 0.95 |
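The case-deletion comparison summarised in Tables 3 and 6 can be sketched in Python as below. The sketch refits the straight line with and without observation 2 using the data from Table 1; the function name `fit_line` is hypothetical and the code is an illustration rather than the spreadsheet procedure used by the authors.

```python
# Age (x) and systolic blood pressure (y) for the 30 subjects in Table 1.
AGE = [39, 47, 45, 47, 65, 46, 67, 42, 67, 56, 64, 56, 59, 34, 42,
       48, 45, 17, 20, 19, 36, 50, 39, 21, 44, 53, 63, 29, 25, 69]
SBP = [144, 220, 138, 145, 162, 142, 170, 124, 158, 154, 162, 150, 140, 110, 128,
       130, 135, 114, 116, 124, 136, 142, 120, 120, 160, 158, 144, 130, 125, 175]


def fit_line(x, y):
    """Return (intercept, slope, R^2) of the least squares straight line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return intercept, slope, r_squared


# Fit on all 30 observations, then again after deleting observation 2 (index 1).
full = fit_line(AGE, SBP)
keep = [i for i in range(len(AGE)) if i != 1]
reduced = fit_line([AGE[i] for i in keep], [SBP[i] for i in keep])

for label, (b0, b1, r2) in (("n=30", full), ("n=29", reduced)):
    print(f"{label}: intercept={b0:.2f}, slope={b1:.2f}, R^2={r2:.2f}")
```

Rounded to two decimals, the two fits agree with the intercept, slope and R Square entries in Tables 3 and 6. The slope changes little, while R² rises because the single large residual no longer inflates the unexplained variation.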

CONCLUSION
In this paper, the detection of outliers in the simple linear regression model has been discussed. A new approach for
detecting outliers without the use of predicted values has been proposed. It is quite useful and detects the same
outliers as the residual and standardized residual methods. Hence, we suggest that in the simple linear
regression model, the difference method can be used for detecting outliers. Also, by removing the influential point, it
is found that the model adequacy increases (from R2=0.43 to R2=0.71).
ACKNOWLEDGEMENT
The first author expresses his gratitude to the UGC for awarding the Basic Science Research Fellowship
(BSRF), which provided financial support to carry out this work. The second author acknowledges the UGC for providing
financial support to carry out this work under the scheme UGC SAP (DRS-1).
REFERENCES
[1] Montgomery, Douglas C., Elizabeth A. Peck, and G. Geoffrey Vining. Introduction to Linear Regression Analysis.
John Wiley & Sons, NY, USA. 2015.
[2] Leroy, Annick M., and Peter J. Rousseeuw. “Robust regression and outlier detection.” Wiley Series in Probability
and Mathematical Statistics, John Wiley & Sons, NY, USA. (1987).
[3] Bipin Gogoi and Mintu Kr. Das. Usage of graphical displays to detect outlying observations in linear regression.
Indian Journal of Applied Research 5.5 (2015): 19-24.
[4] Framstad, Erik, Steinar Engen, and Nils Chr. “Regression analysis, residual analysis and missing variables in
regression models.” Oikos (1985): 319-323.
[5] Barnett, Vic, and Toby Lewis. Outliers in Statistical Data. Vol. 3. No. 1. New York: Wiley, 1994.
[6] Stefansky, Wilhelmine. “Rejecting outliers by maximum normed residual.” The Annals of Mathematical
Statistics 42.1 (1971): 35-45.
[7] Wisnowski, James W., Douglas C. Montgomery, and James R. Simpson. “A comparative analysis of multiple
outlier detection procedures in the linear regression model.” Computational Statistics & Data Analysis 36.3
(2001): 351-382.
[8] Rahman, SMA Khaleelur, M. Mohamed Sathik, and K. Senthamarai Kannan. “Multiple linear regression models
in outlier detection.” International Journal of Research in Computer Science 2.2 (2012): 23.
[9] Florida State University, Department of Scientific Computing. Available from:
http://people.sc.fsu.edu/~jburkardt/datasets/regression/x03.txt.
[10] Pinto, Elisabete. “Blood pressure and ageing.” Postgraduate Medical Journal 83.976 (2007): 109-114.
[11] Rockwood, Michael RH, and Susan E. Howlett. “Blood pressure in relation to age and frailty.” Canadian
Geriatrics Journal: CGJ 14.1 (2011): 2.
[12] Rajarathinam, A., and B. Vinoth. “Outlier detection in simple linear regression models and robust regression–A
case study on wheat production data.” Statistics 3.2(2014).
[13] Ampanthong, Pimpan, and Prachoom Suwattee. “A comparative study of outlier detection procedures in multiple
linear regressions.” Proceedings of the International Multi-Conference of Engineers and Computer Scientists.
Vol. 1. 2009.
