Data Management

Libeeth B. Guevarra
Department of Mathematics and Natural Sciences

Data Management 1
Review: Basic

Statistics is the art of learning from data. It is a branch of knowledge that deals with the collection, presentation, analysis, and interpretation of data that are subject to variability.

Statistics may be defined as a body of methods for making wise decisions in the face of uncertainty. (W.A. Wallis)

A person who understands the nature of statistics is equipped to see beyond short-term and individual perspectives. (Johnson & Mowry)

Areas of Statistics

Descriptive Statistics pertains to the methods dealing with the collection, organization, and analysis of a set of data without making conclusions, predictions, or inferences about a larger set.
Example
Presentation of the trend of mortality from COVID-19 in the Philippines from March to May 2020.

Inferential Statistics pertains to the methods dealing with making inferences, estimates, or predictions about a larger set of data (population) using the information gathered from a subset of this larger set (sample).
Example
With the precautionary measures set by the government, COVID-19 cases in the country will decrease in the next two months.

Basic Statistical Terms

Universe or physical population is the set of all individuals or entities under consideration or study.

Study: The researchers would like to determine the average age of COVID-19 patients.

Universe = COVID-19 patients

Variable is a characteristic or attribute of persons or objects that assumes different values or labels. It is something that we measure, control, or manipulate in research, and it may vary from unit to unit.
Example
degree program, year level, body temperature, daily income

If it can assume only one value, it is called a constant.

Classification of data:
Qualitative Data (categorical)
Example
marital status, socioeconomic status, religious sector, zip code, and military rank

Quantitative Data (either discrete or continuous)
Example
number of students in a classroom, weight and height of a respondent, and monthly income of managers

Statistical Population is the collection of all cases in which the researcher is interested in a statistical study.
The numerical measures that describe it are called parameters.

Sample is a portion or subset of the population from which the information is gathered.
The numerical measures that describe it are called statistics.

Some of the statistical measures and their symbols are presented in the table.

Descriptive Measure                Parameter    Statistic
Mean                               µ            x̄
Standard Deviation                 σ            s
Variance                           σ²           s²
Pearson Correlation Coefficient    ρ            r
Number of Cases                    N            n

Levels of Measurement
Nominal level is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high).
Examples: gender, race, color, and savings account number.
Ordinal level involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless.
Examples: socioeconomic status of families, class standing (A to D), and teacher's evaluation (Excellent to Poor).

Interval level is like the ordinal level, with the additional property that the difference between any two data values is meaningful. However, there is no natural zero starting point (where none of the quantity is present).
Examples: temperature, score in an exam, and IQ.
Ratio level possesses the characteristics of the interval level, and there exists a true zero. Both differences and ratios are meaningful.
Examples: measures of time or space, height, weight, width, area, age, and monthly income.

Methods of Data Collection
1. Observation method
2. Experimental method
3. Use of existing studies
4. Registration method
5. Survey method

Organizing Data
1. Textual Method
2. Tabular Method
   Parts of a Statistical Table
   a. Table Heading includes the table number and the title of the table.
   b. Body is the main part of the table that contains the information or figures.
   c. Stubs or Classes are the classifications or categories describing the data.
   d. Caption is a designation or identification of the information contained in a column.
3. Graphical Method

Categorical Distribution
Twenty-five inductees were given a blood test to determine their blood type.

A   B   B   AB  O
B   AB  B   B   B
O   A   O   O   O
AB  AB  A   O   B
O   O   O   B   A

Table 1: Blood type of the 25 inductees

Class   Tally         Frequency   Percent
A       ||||          4           16
B       ||||| |||     8           32
O       ||||| ||||    9           36
AB      ||||          4           16

More people have type O blood than any other type.
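
The tally in Table 1 can be reproduced with a short script. This is only a sketch: the list `blood_types` is the 25 observations typed in row by row, and the variable names are illustrative.

```python
from collections import Counter

# Blood types of the 25 inductees, read row by row from the data above.
blood_types = [
    "A", "B", "B", "AB", "O",
    "B", "AB", "B", "B", "B",
    "O", "A", "O", "O", "O",
    "AB", "AB", "A", "O", "B",
    "O", "O", "O", "B", "A",
]

counts = Counter(blood_types)   # frequency of each class
total = len(blood_types)        # number of inductees

# Print class, frequency, and percent in the order used by Table 1.
for blood_type in ["A", "B", "O", "AB"]:
    frequency = counts[blood_type]
    percent = 100 * frequency / total
    print(f"{blood_type:<3} {frequency:>2} {percent:>5.0f}")
```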

Graphical
Pie Chart is used to visually depict qualitative data. It is a circle divided into sections according to the percentage of frequencies in each category of the distribution.

Bar Graph represents the data by using vertical or horizontal bars whose heights or lengths represent the frequencies of the data.

Time Series Graph shows data that have been collected at different points in time.

Line Graph is used to show a trend (an increase or decrease in quantitative data).

Pareto Chart is a type of chart that contains both a bar graph and a line graph, where individual values are represented in descending order by bars and the cumulative total is represented by the line.

Central Location

Measures of Central Tendency

Measures of central tendency indicate the center of a set of data arranged in increasing or decreasing order of magnitude.
There are three common measures of central tendency:
Mean
Median
Mode

The mean is the most commonly used measure of central location. It is the sum of all the values of the observations divided by the number of observations.
The sample mean, symbolized as x̄, is used to estimate the population mean µ.

For raw data:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad (1)$$
For data with weights $w_i$:
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i \cdot x_i}{\sum_{i=1}^{n} w_i} \qquad (2)$$
For data with frequencies $f_i$:
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i \cdot x_i}{n} \qquad (3)$$

The heights (in meters) of sampled mountains in the Philippines are given in the table below. What is the mean height of these mountains?
(http://www.pinoymountaineer.com)

Mountain             Height (meters)
Mt. Apo              2956
Mt. Dulang-Dulang    2938
Mt. Pulag            2922
Mt. Kalatungan       2860
Mt. Tabayoc          2842

$$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{n} = \frac{2956 + 2938 + \cdots + 2842}{5} = 2903.6$$
The mean is 2903.6 meters.

Example
Out of 100 numbers, 20 were 5's, 40 were 4's, 35 were 7's, and 5 were 3's. What is the mean of the data set?
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i \cdot x_i}{n} = \frac{20(5) + 40(4) + 35(7) + 5(3)}{100} = 5.2$$

The mean is 5.2.
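
As a check on the frequency-weighted mean, the same arithmetic can be written out in a few lines. This is a minimal sketch; the names `values` and `frequencies` are illustrative and simply hold the numbers from the example.

```python
# Distinct values and how many times each occurs (from the example).
values = [5, 4, 7, 3]
frequencies = [20, 40, 35, 5]

n = sum(frequencies)  # total number of observations
weighted_sum = sum(f * x for f, x in zip(frequencies, values))

# Mean for data with frequencies: sum of f_i * x_i divided by n.
mean = weighted_sum / n
print(f"n = {n}, sum of f*x = {weighted_sum}, mean = {mean}")
```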

Median of a data set is the middle or center observation when the data set is arranged in either increasing or decreasing order.

If $n$ is odd:
$$\tilde{x} = x_{\frac{n+1}{2}} \qquad (4)$$
If $n$ is even:
$$\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n+2}{2}}}{2} \qquad (5)$$
Example
1. Find the median of 9, 3, 44, 17, 15.
   Answer: The median is 15.
2. Find the median of 8, 3, 44, 17, 12, 6.
   Answer: The median is 10.
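
The two cases of the median (odd and even number of observations) can be sketched as a small function. The function name `median_of` is illustrative; Python lists are indexed from 0, so the positions in formulas (4) and (5) are shifted by one.

```python
def median_of(data):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(data)      # arrange in increasing order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the single middle observation, position (n + 1) / 2.
        return ordered[middle]
    # Even n: average of the observations at positions n / 2 and (n + 2) / 2.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of([9, 3, 44, 17, 15]))
print(median_of([8, 3, 44, 17, 12, 6]))
```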

Mode of a set of data is the value that occurs most frequently. It is the only measure of central location that is helpful for qualitative data. In some data sets the mode does not exist, and if it does, it may not be unique.
Example
Find the mode of each of the following sets of data:
1. A: 9, 3, 4, 17, 15, 3 (Answer: The mode is 3)
2. B: 9, 3, 4, 17, 15, 3, 9 (Answer: The modes are 3 and 9)
3. C: A+, AB, A, O, B, B+, A (Answer: The mode is A)
4. D: A+, AB, A, O, B, B+ (Answer: No mode)
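
The four cases above (one mode, two modes, a qualitative mode, no mode) can be sketched as follows. The function name `modes` is illustrative, and the rule "no mode when every value occurs exactly once" is an assumption chosen to match the answers in the example.

```python
from collections import Counter


def modes(data):
    """Return a list of all modes; an empty list means no mode."""
    counts = Counter(data)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Assumption: when every value occurs once, report no mode.
        return []
    return [value for value, count in counts.items() if count == highest]


print(modes([9, 3, 4, 17, 15, 3]))
print(modes([9, 3, 4, 17, 15, 3, 9]))
print(modes(["A+", "AB", "A", "O", "B", "B+", "A"]))
print(modes(["A+", "AB", "A", "O", "B", "B+"]))
```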

Quantiles, or fractiles, are a natural extension of the median concept in that they are values that divide a set of data into equal parts.
They are used to describe the standing or place occupied by a data value relative to the rest of the data.
Common Quantiles
1. Quartiles $Q_m$ divide the set of data into 4 equal parts.
2. Deciles $D_m$ divide the set of data into 10 equal parts.
3. Percentiles $P_m$ divide the set of data into 100 equal parts.

Percentile Ranking
The pth Percentile
A value x is called the pth percentile of a data set provided that p% of the data values are less than x.
$$\text{Percentile rank of } x = \frac{(\text{number of data values} < x) + 0.5}{\text{total number of data values}} \cdot 100$$

A teacher gives a 20-point test to 10 students. The scores are as follows: 10, 20, 3, 5, 6, 8, 18, 12, 15, and 2.
Find the percentile rank of a score of 12.
$$\text{Percentile rank of } 12 = \frac{6 + 0.5}{10} \cdot 100 = 65 \quad \text{(65th percentile)}$$
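
The percentile-rank formula translates directly into code. This is a small sketch; `percentile_rank` and `scores` are illustrative names, and the function follows the formula as given here (count of values strictly below x, plus 0.5).

```python
def percentile_rank(x, data):
    """Percentile rank of x: (values below x + 0.5) / total * 100."""
    below = sum(1 for value in data if value < x)
    return 100 * (below + 0.5) / len(data)


scores = [10, 20, 3, 5, 6, 8, 18, 12, 15, 2]
print(percentile_rank(12, scores))
```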

Quartile Ranking
The quartiles $Q_1$, $Q_2$, and $Q_3$ are the values that divide a set of data into 4 equal parts.
Example
A teacher gives a 20-point test to 10 students. The scores are as follows: 10, 20, 3, 5, 6, 8, 18, 12, 15, and 2.
Find the quartiles of the given scores.

Q2 = Median{2, 3, 5, 6, 8, 10, 12, 15, 18, 20} = 9
Q1 = Median{2, 3, 5, 6, 8} = 5
Q3 = Median{10, 12, 15, 18, 20} = 15
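
The median-of-halves approach used in this example can be sketched as below. The function name `quartiles` is illustrative. Leaving the middle value out of both halves when the number of observations is odd is an assumption, since the example only covers an even count; other quartile conventions exist and can give slightly different values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves.

    Needs at least two observations.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]     # values below the middle
    upper = ordered[-half:]    # values above the middle
    return median(lower), median(ordered), median(upper)


scores = [10, 20, 3, 5, 6, 8, 18, 12, 15, 2]
q1, q2, q3 = quartiles(scores)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```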

Dispersion

Suppose that a hospital's cardiology unit is evaluating two types of pacemaker batteries. The data below are the number of hours (in thousands) each battery lasted.
A: 46; 47; 46.8; 45.5; 46.7; 49.3; 45.3; 41.4
B: 47; 50; 41.3; 35.1; 40.9; 36.9; 50.8; 66
Should the cardiologist use battery A or battery B?

Things to consider:
the mean of A and of B
how far the scores are from each other
how far the scores are from their mean
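
The three considerations can be checked numerically. The sketch below uses Python's standard `statistics` module; the variable names are illustrative. Both batteries turn out to have the same mean (46 thousand hours), so the comparison rests on how spread out the lifetimes are.

```python
from statistics import mean, stdev

# Battery lifetimes in thousands of hours, copied from the data above.
battery_a = [46, 47, 46.8, 45.5, 46.7, 49.3, 45.3, 41.4]
battery_b = [47, 50, 41.3, 35.1, 40.9, 36.9, 50.8, 66]

for name, hours in [("A", battery_a), ("B", battery_b)]:
    spread = max(hours) - min(hours)   # range: highest minus lowest
    print(f"Battery {name}: mean = {mean(hours):.2f}, "
          f"range = {spread:.1f}, sample sd = {stdev(hours):.2f}")
```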

Measures of dispersion indicate the degree to which numerical data tend to spread about the mean. They are used to determine the extent of the scatter so that steps may be taken to control the existing variation, and they serve as a measure of the reliability of the average value.

General Classifications of Measures of Dispersion
1. Measures of Absolute Dispersion
2. Measures of Relative Dispersion

Measures of Absolute Dispersion
The measures of absolute dispersion are expressed in the units of the original observations.
Common Measures
Range is the difference between the highest score and the lowest score.
Example
The IQ scores of 5 Accountancy students are 108, 112, 130, 115, and 105. Find the range.

Range = 130 − 105 = 25

Variance is the average squared deviation of the observations from the mean.
$$\sigma^2 = \frac{\sum_{i=1}^{N} f_i (x_i - \mu)^2}{N} \qquad (6)$$
$$s^2 = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{n - 1} \qquad (7)$$
Standard Deviation is the positive square root of the variance.
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} f_i (x_i - \mu)^2}{N}} \qquad (8)$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{n - 1}} \qquad (9)$$

Example
A = 5, 5, 5, 5, 5, 5, 5, 5
B = 4, 4, 4, 5, 5, 5, 5, 6, 6, 6
C = 0, 0, 0, 0, 10, 10, 10, 10
D = 5, 7, 10, 11, 11, 15, 16, 20

Compute the range, variance, and standard deviation of each data set. You may use your calculator to compute the variance and standard deviation.
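
In place of a calculator, the exercise can also be worked with Python's standard `statistics` module. This sketch treats each data set as a sample, so it uses the n − 1 formulas (`variance`, `stdev`); if the sets are meant as whole populations, `pvariance` and `pstdev` would be the counterparts. The dictionary name `data_sets` is illustrative.

```python
from statistics import stdev, variance

data_sets = {
    "A": [5, 5, 5, 5, 5, 5, 5, 5],
    "B": [4, 4, 4, 5, 5, 5, 5, 6, 6, 6],
    "C": [0, 0, 0, 0, 10, 10, 10, 10],
    "D": [5, 7, 10, 11, 11, 15, 16, 20],
}

for name, values in data_sets.items():
    data_range = max(values) - min(values)   # highest minus lowest
    s2 = variance(values)                    # sample variance (divides by n - 1)
    s = stdev(values)                        # sample standard deviation
    print(f"{name}: range = {data_range}, variance = {s2:.2f}, sd = {s:.2f}")
```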

The Standard Normal Random Variable
A normal random variable x is standardized by expressing its value as the number of standard deviations σ it lies to the left or right of its mean µ. The standardized normal random variable z is defined as
$$z = \frac{x - \mu}{\sigma}$$

Example
The mean time to download a PDF file is 12 min with a standard deviation of 4 min. Belle's download time is 20 min. John's download time is 6 min. How does Belle's download time compare with John's?

$$z_{\text{Belle}} = \frac{x - \mu}{\sigma} = \frac{20 - 12}{4} = 2$$
$$z_{\text{John}} = \frac{x - \mu}{\sigma} = \frac{6 - 12}{4} = -1.5$$
Belle's time is 2 standard deviations above the mean, while John's time is 1.5 standard deviations below it.
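
The standardization z = (x − µ)/σ can be checked with a one-line function. This is a minimal sketch; `z_score` and the variable names are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


mean_time, sd_time = 12, 4          # minutes, from the example
z_belle = z_score(20, mean_time, sd_time)
z_john = z_score(6, mean_time, sd_time)
print(f"Belle: z = {z_belle}, John: z = {z_john}")
```

A positive z places a download time above the mean and a negative z below it, so the two values can be compared on the same scale.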

Frequency Distribution

The Frequency Distribution Table

Frequency Distribution refers to the tabular arrangement of data by non-overlapping classes or categories together with their corresponding class frequencies.
Example
Table 1. Occupational status of participants in the research

Occupation              Frequency
Nuns                    17
Nursery teachers        3
Television presenters   23
Students                20
Other                   17

Ordered Array is a listing of values from the smallest to the largest, or conversely.
Example
Consider the following completion times (in minutes) of 50 students doing an activity in the laboratory.

25 29 30 32 36 36 39 40 40 44
45 48 49 50 50 51 54 55 55 55
55 56 57 57 59 60 60 60 61 61
61 63 65 65 65 67 68 70 71 74
74 76 77 77 80 81 81 83 84 90

How to construct a frequency distribution
1. Select the number of class intervals or groupings (k).
   (Sturges' rule) k = smallest integer greater than or equal to 1 + 3.322 log(n), where n is the number of data values and log is the base-10 logarithm.
2. Compute the class width.
3. Determine the lower and upper limits of the intervals.
4. Determine the frequency of values falling within each class interval.

$k = 1 + 3.322\log(50) \approx 6.64$, rounded up to 7
$\text{class width} = \frac{90 - 25}{7} \approx 9.29$, rounded up to 10

Completion time (in minutes) of the 50 students

Class limits   Class boundaries   Tally                Frequency
25 - 34        24.5 - 34.5        ||||                 4
35 - 44        34.5 - 44.5        ||||| |              6
45 - 54        44.5 - 54.5        ||||| ||             7
55 - 64        54.5 - 64.5        ||||| ||||| |||||    15
65 - 74        64.5 - 74.5        ||||| ||||           9
75 - 84        74.5 - 84.5        ||||| |||            8
85 - 94        84.5 - 94.5        |                    1
Total                                                  50
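
The four construction steps can be sketched in code for the completion-time data. Two choices here are assumptions made to match the table: the first class starts at the smallest observation, and the class width is rounded up to a whole number. The variable names are illustrative.

```python
import math

times = [
    25, 29, 30, 32, 36, 36, 39, 40, 40, 44,
    45, 48, 49, 50, 50, 51, 54, 55, 55, 55,
    55, 56, 57, 57, 59, 60, 60, 60, 61, 61,
    61, 63, 65, 65, 65, 67, 68, 70, 71, 74,
    74, 76, 77, 77, 80, 81, 81, 83, 84, 90,
]

n = len(times)
# Step 1: Sturges' rule, rounded up to the next integer.
k = math.ceil(1 + 3.322 * math.log10(n))
# Step 2: class width = range / k, rounded up.
width = math.ceil((max(times) - min(times)) / k)

# Steps 3 and 4: class limits and the frequency in each class.
table = []
for i in range(k):
    low = min(times) + i * width
    high = low + width - 1          # upper class limit for whole-number data
    frequency = sum(1 for t in times if low <= t <= high)
    table.append((low, high, frequency))

for low, high, frequency in table:
    print(f"{low} - {high}: boundaries {low - 0.5} - {high + 0.5}, "
          f"frequency {frequency}")
```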

Graphical
A. Histogram
A histogram is a bar graph in which the horizontal scale represents classes of data values and the vertical scale represents frequencies. The heights of the bars correspond to the frequency values, and the bars are drawn adjacent to each other (without gaps).

B. Frequency Polygon
A frequency polygon uses line segments connected to points located directly above the class midpoint values.

Completion time (in minutes) of the 50 students

Class limits   Class marks   Frequency
25 - 34        29.5          4
35 - 44        39.5          6
45 - 54        49.5          7
55 - 64        59.5          15
65 - 74        69.5          9
75 - 84        79.5          8
85 - 94        89.5          1
Total                        50

Shape

Boxplot
A boxplot is also called a box-and-whisker plot. It is a graphical representation of a summary of five important values:
minimum
first quartile
median
third quartile
maximum

Steps in constructing a boxplot
1. Determine the five-number summary.
2. Draw a box with the ends of the box at the first and third quartiles.
3. Draw a vertical line inside the box at the location of the median.
4. Draw horizontal dashed lines (called whiskers) from the ends of the box to the minimum and maximum values in the data set.

A boxplot can also be used to detect outliers. Observations outside the outer fences are outliers. Those inside the inner fences are never outliers, and those between the inner and outer fences are suspected outliers.

To identify outliers, construct the fences:
IQR = Q3 − Q1
Inner fences: Q1 − 1.5 × IQR and Q3 + 1.5 × IQR
Outer fences: Q1 − 3 × IQR and Q3 + 3 × IQR

Example
B1: Construct a boxplot for the given data set:
Number of rooms occupied in a resort during a 10-day period

12 12 13 14 14
16 17 19 19 25
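
Before drawing the boxplot, the five-number summary and the fences for this data set can be computed as sketched below. The quartiles use the median-of-halves method, which is an assumption about the intended convention; the variable names are illustrative.

```python
from statistics import median

rooms = [12, 12, 13, 14, 14, 16, 17, 19, 19, 25]

ordered = sorted(rooms)
half = len(ordered) // 2
q1 = median(ordered[:half])      # median of the lower half
q2 = median(ordered)             # median of all the data
q3 = median(ordered[-half:])     # median of the upper half
five_number_summary = (ordered[0], q1, q2, q3, ordered[-1])

iqr = q3 - q1
inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer_fences = (q1 - 3 * iqr, q3 + 3 * iqr)

# Outliers lie beyond the outer fences; suspected outliers lie
# between the inner and outer fences.
outliers = [x for x in ordered if x < outer_fences[0] or x > outer_fences[1]]
suspected = [
    x for x in ordered
    if outer_fences[0] <= x < inner_fences[0]
    or inner_fences[1] < x <= outer_fences[1]
]

print("Five-number summary:", five_number_summary)
print("IQR:", iqr)
print("Inner fences:", inner_fences, "Outer fences:", outer_fences)
print("Outliers:", outliers, "Suspected outliers:", suspected)
```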

Measures of Skewness
Skewness measures the deviation from symmetry.
$$SK = \frac{3(\mu - \text{median})}{\sigma} \qquad (10)$$
$$SK = \frac{3(\bar{x} - \text{median})}{s} \qquad (11)$$

Example
The scores of the students in the Prelim Exam have a median of 18 and a mean of 16. What does this indicate about the shape of the distribution of the scores?
Mean < Median, hence SK is negative. The distribution is negatively skewed.

Normal Distribution

The normal (or Gaussian) distribution or curve is defined as follows:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2}}$$
where µ is any real number and σ > 0.
Denote the normal distribution with mean µ and variance σ² by N(µ, σ²).

Properties of a normal curve:
1. It is symmetrical about the mean.
2. The mean is equal to the median, which is also equal to the mode.
3. The tails or ends are asymptotic to the horizontal axis.
4. The total area under the normal curve is equal to 1 or 100%.
5. The normal curve area may be subdivided into at least three standard scores each to the left and to the right of the vertical axis.

In a normal distribution, approximately
1. 68% of the data lie within 1 standard deviation of the mean.
2. 95% of the data lie within 2 standard deviations of the mean.
3. 99.7% of the data lie within 3 standard deviations of the mean.

Because a normal distribution is symmetric about the mean, the area under the curve can be visualized in the given figure.

Example
ND1: A vegetable distributor knows that during the month of August, the weights of its tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb.
1. What percent of the tomatoes weigh less than 0.76 lb?
   0.5 + 0.3413 = 0.8413, or about 84%
2. In a shipment of 6000 tomatoes, how many tomatoes can be expected to weigh more than 0.31 lb?
   0.5 + 0.475 = 0.975; 0.975 × 6000 = 5,850 tomatoes
3. In a shipment of 4500 tomatoes, how many tomatoes can be expected to weigh from 0.31 lb to 0.91 lb?
   95% of 4500 = 4,275 tomatoes
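
The same three questions can be explored with `statistics.NormalDist` from the Python standard library. This is a sketch rather than a replacement for the worked answers: the answers above use the rounded 95% rule for 2 standard deviations, while the exact normal areas are slightly larger, so the computed counts may differ from them by a few tomatoes.

```python
from statistics import NormalDist

# Tomato weights: mean 0.61 lb, standard deviation 0.15 lb.
weights = NormalDist(mu=0.61, sigma=0.15)

p_less = weights.cdf(0.76)                         # P(weight < 0.76)
p_more = 1 - weights.cdf(0.31)                     # P(weight > 0.31)
p_between = weights.cdf(0.91) - weights.cdf(0.31)  # P(0.31 < weight < 0.91)

print(f"Less than 0.76 lb: {p_less:.4f}")
print(f"More than 0.31 lb: {p_more:.4f}, about {6000 * p_more:.0f} of 6000")
print(f"From 0.31 to 0.91 lb: {p_between:.4f}, about {4500 * p_between:.0f} of 4500")
```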

Standard Normal Distribution

The standard normal distribution is the normal distribution that has a mean of 0 and a standard deviation of 1.
Letting $z = \frac{x - \mu}{\sigma}$, we obtain the standard normal distribution
$$\phi(z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2} z^2}$$

All normally distributed variables can be transformed into the standard normally distributed variable using the z-score.
$$z_x = \frac{x - \mu}{\sigma}$$
$$z_x = \frac{x - \bar{x}}{s}$$

Areas, Percentages, and Probabilities
In the standard normal distribution, the area of the distribution from z = a to z = b represents
1. the percentage of z-values in the interval from a to b.
2. the probability that z lies in the interval from a to b.

ND2: Find the probabilities for each, using the standard normal distribution.
1. P(0 ≤ z ≤ 1.46) = 0.4279
2. P(−1.23 ≤ z ≤ 0) = 0.3907
3. P(z ≤ −1.17) = 0.1210
4. P(0.20 ≤ z ≤ 1.56) = 0.3613
5. P(z ≥ −1.43) = 0.9236
6. P(z ≥ 0.82) = 0.2061
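
These table look-ups can be cross-checked with the cumulative distribution function of the standard normal distribution. In the sketch below, the helper `area` is an illustrative name; an omitted bound stands for an unbounded side of the interval.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def area(a=None, b=None):
    """Area under the standard normal curve from a to b.

    Passing None for a bound leaves that side of the interval open.
    """
    lower = 0.0 if a is None else z.cdf(a)
    upper = 1.0 if b is None else z.cdf(b)
    return upper - lower


answers = [
    area(0, 1.46),      # P(0 <= z <= 1.46)
    area(-1.23, 0),     # P(-1.23 <= z <= 0)
    area(b=-1.17),      # P(z <= -1.17)
    area(0.20, 1.56),   # P(0.20 <= z <= 1.56)
    area(a=-1.43),      # P(z >= -1.43)
    area(a=0.82),       # P(z >= 0.82)
]
for number, probability in enumerate(answers, start=1):
    print(f"{number}. {probability:.4f}")
```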

ND3:
1. Find a z-score such that 10 percent of the area under the standard normal curve is above that score.
   Answer: z = 1.28
2. Find a z-score such that 24 percent of the area under the standard normal curve is below that score.
   Answer: z = −0.71
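
Finding a z-score from an area is the inverse of a table look-up. A minimal sketch with `NormalDist.inv_cdf` from the standard library follows; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# 10 percent of the area above the score means 90 percent below it.
z_upper_10 = z.inv_cdf(1 - 0.10)

# 24 percent of the area below the score.
z_lower_24 = z.inv_cdf(0.24)

print(f"10% above: z = {z_upper_10:.2f}")
print(f"24% below: z = {z_lower_24:.2f}")
```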

ND4: The diameter of a steel bearing is normally distributed with a mean of 12 cm and a standard deviation of 0.9 cm.
1. What proportion of bearings will have diameters exceeding 10.56 cm?
   P(z ≥ −1.6) = 0.9452
2. What is the probability that a bearing will have a diameter between 10.29 and 14 cm?
   P(−1.9 ≤ z ≤ 2.22) = 0.9581
3. If there are 1000 steel bearings, how many will have a diameter between 10.29 and 14 cm?
   95.81% of 1000 ≈ 958

Correlation and Regression

Correlation is a statistical method used to determine whether a relationship between variables exists.
Regression is a statistical method used to describe the nature of the relationship between variables, that is, positive or negative, linear or nonlinear.

A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.

A research hospital did a study on the relationship between stress and diastolic blood pressure. The results of the study are given in the table. Blood pressure values are measured in millimeters of mercury.

Stress test score (x)   Blood pressure (y)
55                      70
62                      85
58                      72
78                      85
92                      96
88                      90
75                      82
80                      85

The scatter plot shows that the relationship between stress test score and blood pressure is linear.

The correlation coefficient measures the strength and direction of a linear relationship between two variables.
The range of the correlation coefficient is from −1 to +1, and the closer it is to −1 or to +1, the stronger the correlation or relationship (negative or positive). A coefficient of 0 means no linear relationship.
Formula for the correlation coefficient r for variables x and y:
$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$$
where n is the number of data pairs.

Example
Compute the correlation coefficient for the data:

Stress test score (x)   Blood pressure (y)
55                      70
62                      85
58                      72
78                      85
92                      96
88                      90
75                      82
80                      85

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} = \frac{8(49628) - (588)(665)}{\sqrt{[8(44550) - (588)^2][8(55799) - (665)^2]}}$$

r = 0.9010
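
The sums that go into this computation can be reproduced with a short script. This is a sketch of the computational formula for r; the variable names are illustrative.

```python
import math

# Stress test scores (x) and blood pressure readings (y) from the table.
x = [55, 62, 58, 78, 92, 88, 75, 80]
y = [70, 85, 72, 85, 96, 90, 82, 85]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

print(f"sum x = {sum_x}, sum y = {sum_y}, sum xy = {sum_xy}")
print(f"sum x^2 = {sum_x2}, sum y^2 = {sum_y2}")
print(f"r = {r:.4f}")
```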

If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data's line of best fit.
This enables the researcher to see the trend and make predictions on the basis of the data.
The equation of the least-squares line for the ordered pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is the line
$$y - \bar{y} = m(x - \bar{x})$$

where:
x̄ = mean of variable x
ȳ = mean of variable y
m = slope of the line
$$m = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n(\bar{x})^2}$$

Example
Find the equation of the regression line for the data. (The data show a linear relationship, as seen in the scatter plot.)

Stress test score (x)   Blood pressure (y)
55                      70
62                      85
58                      72
78                      85
92                      96
88                      90
75                      82
80                      85

$$m = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n(\bar{x})^2} = \frac{49628 - 8(73.5)(83.125)}{44550 - 8(73.5)^2} = 0.56344$$
Substituting into $y - \bar{y} = m(x - \bar{x})$:
y − 83.125 = 0.56344(x − 73.5)
y = 0.56344x + 41.71, or, rounding the slope,
y = 0.56x + 41.71

Another formula for the regression line y = a + bx:
$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$$
$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$
where a is the y-intercept and b is the slope of the line.

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{(665)(44550) - (588)(49628)}{8(44550) - (588)^2} = 41.71$$
$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{8(49628) - (588)(665)}{8(44550) - (588)^2} = 0.56344$$

The regression equation is y = 41.71 + 0.56x.

To estimate the blood pressure when the stress test score is 100:
y = 41.71 + 0.56(100) = 97.71, or about 98.
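
The intercept, slope, and prediction can be reproduced from the raw data as sketched below. The function name `predict` is illustrative. Because the script keeps the unrounded coefficients, its prediction at a score of 100 comes out slightly above 97.71, though it rounds to the same whole number.

```python
# Stress test scores (x) and blood pressure readings (y) from the table.
x = [55, 62, 58, 78, 92, 88, 75, 80]
y = [70, 85, 72, 85, 96, 90, 82, 85]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Both coefficients share the same denominator.
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y-intercept
b = (n * sum_xy - sum_x * sum_y) / denominator        # slope


def predict(score):
    """Estimated blood pressure for a given stress test score."""
    return a + b * score


print(f"y = {a:.2f} + {b:.5f}x")
print(f"Estimate at x = 100: {predict(100):.2f}")
```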

The Coefficient of Determination is a measure of the variation of the dependent variable that is explained by the regression line and the independent variable. The symbol for the coefficient of determination is $r^2$. If r = 0.90, then r² = 0.81, which is equivalent to 81%. This result means that 81% of the variation in the dependent variable is accounted for by the variation in the independent variable. The rest of the variation, 0.19 or 19%, is unexplained.
