Chapter 5

Introduction to Statistics

5.1 Statistics
Statistics is the science which deals with methods of collecting, classifying,
presenting, comparing and interpreting numerical data collected to throw
light on any sphere of enquiry.

5.2 Frequency Distributions


Consider the marks (out of 50) obtained by 60 students, listed according to
their ID numbers:

38, 11, 40, 0, 26, 15, 5, 45, 7, 32, 2, 18, 42, 8, 31, 27, 4, 12, 35, 15, 0, 7, 28,
46, 9, 16, 29, 34, 10, 7, 5, 1, 17, 22, 35, 8, 36, 47, 11, 30, 19, 0, 16, 14, 14,
18, 41, 38, 2, 17, 42, 45, 48, 28, 7, 21, 8, 28, 5, 20.

In this form the data convey little useful information. Such data are called
raw data or ungrouped data. To bring out the salient features of the data, we
arrange them into classes or categories and determine the number of individuals
belonging to each class, called the class frequency. The resulting arrangement
is called a frequency distribution or frequency table. A frequency distribution
can be graphed as a histogram, or as a frequency polygon that connects the
midpoints of the tops of the histogram bars.

Class Notes on Applied Probability and Statistics ECEG-342

Table 5.1: Frequency Distribution

Marks      Frequency
 0 - 9        17
10 - 19       15
20 - 29        7
30 - 39        9
40 - 50       12
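
The tallying step described above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` and the inclusive class limits are assumptions made here, not part of the notes. Tallied from the list exactly as printed above, the counts come out as 18, 15, 9, 9 and 9, which differ from Table 5.1 in three classes, so either the list or the table may contain a transcription slip.

```python
# Marks (out of 50) of the 60 students, copied from the list above.
marks = [
    38, 11, 40, 0, 26, 15, 5, 45, 7, 32, 2, 18, 42, 8, 31, 27, 4, 12, 35, 15,
    0, 7, 28, 46, 9, 16, 29, 34, 10, 7, 5, 1, 17, 22, 35, 8, 36, 47, 11, 30,
    19, 0, 16, 14, 14, 18, 41, 38, 2, 17, 42, 45, 48, 28, 7, 21, 8, 28, 5, 20,
]

# Class limits of Table 5.1 as (lower, upper), both ends inclusive (assumed).
classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 50)]


def frequency_table(data, class_limits):
    """Return one class frequency per (lower, upper) pair."""
    return [sum(1 for v in data if lo <= v <= hi) for lo, hi in class_limits]


frequencies = frequency_table(marks, classes)

# Print the table with a crude text histogram (one '*' per student).
for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo:2d} - {hi:2d}  {f:3d}  {'*' * f}")
```

The row of asterisks is a rough stand-in for the histogram; joining the ends of the rows would give the frequency polygon.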

5.3 Measures of Central Tendency


Types of averages in common use:
1. Arithmetic average or mean
2. Median
3. Mode

1. Arithmetic Mean $\bar{x}$
If $x : x_1, x_2, \ldots, x_n$ then
\[
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i} x_i
\]

If the frequency distribution is given, i.e.,

$x : x_1, x_2, \ldots, x_n$
$f : f_1, f_2, \ldots, f_n$

then
\[
\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n}
        = \frac{\sum_i f_i x_i}{\sum_i f_i}
\]
Weighted Arithmetic Mean: If the individual data are not of equal
importance, we may attach to them weights $w_1, w_2, \ldots, w_n$ as measures
of their importance:
\[
\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}
          = \frac{\sum_i w_i x_i}{\sum_i w_i}
\]

Exercise 5.1: The mean of 200 items was 50. Later on it was discovered
that two items were misread as 92 and 8 instead of 192 and 88. Find out
the correct mean. (ans. 50.9)
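
The reasoning behind Exercise 5.1 is that the reported mean fixes the total, and the total can then be corrected item by item. A minimal sketch follows; the helper name `corrected_mean` is hypothetical and chosen only for this illustration.

```python
def corrected_mean(n, reported_mean, wrong_values, right_values):
    """Recover the mean after replacing misread items by their true values."""
    total = n * reported_mean                        # sum implied by the reported mean
    total += sum(right_values) - sum(wrong_values)   # undo the misreadings
    return total / n


# 200 items with mean 50; 92 and 8 were recorded instead of 192 and 88.
print(corrected_mean(200, 50, wrong_values=[92, 8], right_values=[192, 88]))  # 50.9
```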

Murad Ridwan,
Dep. of Electrical & Computer Engineering
AAiT, Addis Ababa University.
Aug 2010.

2. Median
It is the central value of the data when the values are arranged in
ascending or descending order of magnitude.
When the n data are arranged in order of magnitude as $x_1, x_2, \ldots, x_n$,
\[
\text{median} =
\begin{cases}
x_{(n+1)/2}, & \text{if } n \text{ is odd;} \\[4pt]
\dfrac{x_{n/2} + x_{n/2+1}}{2}, & \text{if } n \text{ is even.}
\end{cases}
\]

3. Mode
The mode is the value which occurs most frequently. It does not always
exist; in particular, there is no mode when all observations occur with
the same frequency.

Exercise 5.2: The nicotine contents for a random sample of 6 cigarettes
of a certain brand are found to be 2.3, 2.7, 2.5, 2.9, 3.1, and 1.9 milligrams.
Find the mean, median and mode. (ans. $\bar{x} = 2.57$; median = 2.6; mode
does not exist)
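
The three averages can be sketched directly from their definitions. The function names below are illustrative choices, and `modes` adopts the convention stated in the notes: it returns an empty list when every value occurs equally often.

```python
from collections import Counter


def mean(data):
    """Arithmetic mean: the sum of the values divided by their number."""
    return sum(data) / len(data)


def median(data):
    """Central value of the sorted data (average of the two central values if n is even)."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                      # x_{(n+1)/2} in 1-based indexing
    return (s[mid - 1] + s[mid]) / 2       # average of x_{n/2} and x_{n/2+1}


def modes(data):
    """Most frequent value(s); [] when all values occur with the same frequency."""
    counts = Counter(data)
    highest = max(counts.values())
    if all(c == highest for c in counts.values()):
        return []
    return [v for v, c in counts.items() if c == highest]


nicotine = [2.3, 2.7, 2.5, 2.9, 3.1, 1.9]   # data of Exercise 5.2
print(round(mean(nicotine), 2), round(median(nicotine), 2), modes(nicotine))
```

Returning a list also covers data with several equally frequent values, a case the notes do not discuss.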

5.4 Measures of Dispersion


Measures of dispersion in common use are range, variance and standard
deviation.

1. Range
The range of $x_1, x_2, \ldots, x_n$ is defined as $x_{(n)} - x_{(1)}$, where
$x_{(n)}$ and $x_{(1)}$ are, respectively, the largest and smallest values.

2. Variance
\[
S^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
\]

3. Standard Deviation: the positive square root of the variance, i.e., $S$.
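
A short sketch of the three measures follows. It uses the divisor n exactly as in the variance formula above (many texts use n - 1 for a sample instead); the function names and the small data set are illustrative and not taken from the notes.

```python
import math


def value_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)


def variance(data):
    """Mean squared deviation from the mean, with divisor n as in the notes."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / n


def standard_deviation(data):
    """Positive square root of the variance."""
    return math.sqrt(variance(data))


sample = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values, mean 5
print(value_range(sample), variance(sample), standard_deviation(sample))  # 7 4.0 2.0
```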

5.5 Regression and Correlation


Often, in practice, we encounter experiments in which we observe or measure
two quantities simultaneously. We may distinguish between two kinds of
experiments, as follows:

1. In correlation analysis both quantities are random variables and we
are interested in relations between them. For example, if X represents
the age of a used automobile and Y represents the retail book value of
the automobile, we should expect large values of X to correspond to
small values of Y and small values of X to correspond to large values
of Y. Correlation analysis attempts to measure the strength of such
relationships between two variables by means of a single number
called the correlation coefficient.

2. In regression analysis one of the two variables, call it x, can be
regarded as an ordinary variable, that is, it can be measured without
appreciable error. The other variable, Y, is a random variable. x is
called the independent or controlled variable, and one is interested in
the dependence of Y on x. A typical example is the dependence of
the blood pressure Y on the age x of a person or, as we shall now say,
the regression of Y on x.
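
The notes do not give a formula for the correlation coefficient. As a hedged illustration, the sketch below assumes the usual sample (Pearson) coefficient, r = S_xy / sqrt(S_xx S_yy), and applies it to the used-car ages and prices listed in Exercise 5.5; the function name is a hypothetical choice.

```python
import math


def correlation_coefficient(xs, ys):
    """Sample (Pearson) correlation coefficient of paired data (assumed definition)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


age = [1, 2, 2, 3, 5, 5]                         # years, from Exercise 5.5
price = [6350, 5695, 5750, 5395, 4985, 4895]     # dollars, from Exercise 5.5
r = correlation_coefficient(age, price)
print(f"r = {r:.3f}")   # negative: older cars tend to sell for less
```

A value of r near -1 or +1 indicates a strong linear relationship, and a value near 0 a weak one.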

5.5.1 Linear Regression


Assume in an experiment we first select n values $x_1, \ldots, x_n$ of x and
then observe Y at those values of x, so that we obtain a sample of the
form $(x_1, y_1), \ldots, (x_n, y_n)$. In regression analysis the mean $\mu$ of Y
is assumed to depend on x, i.e., $\mu = \mu(x)$. The curve of $\mu(x)$ is called
the regression curve of Y on x. We consider the simplest case, when $\mu(x)$
is a linear function, $\mu(x) = \alpha + \beta x$. Then we may want to plot the
sample values as n points in the xY-plane, fit a straight line through
them and use this line for estimating $\mu(x)$ for given values of x, so that
we know what values of Y we can expect if we choose certain
values of x.

A widely used mathematical model for fitting lines is the method of
least squares by Gauss. In this method, the straight line should be
fitted through the given points so that the sum of the squares of the
distances of those points from the straight line is minimum, where the
distance is measured in the vertical direction.

The vertical distance (in the y-direction) of a sample point $(x_i, y_i)$ from
a straight line $y = a + bx$ is $|y_i - a - b x_i|$. Hence the sum of the squares
of these distances is
\[
e = \sum_{i=1}^{n} (y_i - a - b x_i)^2
\]

In the method of least squares we choose a and b such that the estimation
error e is minimum. A necessary condition for e to be minimum is
\[
\frac{\partial e}{\partial a} = 0 \quad \text{and} \quad \frac{\partial e}{\partial b} = 0
\]
Theorem 1. The least-squares line (linear regression) approximating
the set of points $(x_1, y_1), \ldots, (x_n, y_n)$ has the equation $y = a + bx$,
where the constants a and b are given by
\[
b = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}
         {n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}
\]
\[
a = \frac{\sum_{i=1}^{n} y_i - b\sum_{i=1}^{n} x_i}{n}
\]
Exercise 5.3: Verify the above theorem.
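
One way to approach Exercise 5.3 is to write out the two necessary conditions for the error e and solve the resulting pair of linear equations. The outline below is a sketch of that route, with all sums running over i = 1, ..., n; it is offered as one possible derivation, not as the only one.

```latex
\begin{align*}
\frac{\partial e}{\partial a} &= -2\sum_{i=1}^{n} (y_i - a - b x_i) = 0
  \quad\Longrightarrow\quad n a + b\sum x_i = \sum y_i, \\
\frac{\partial e}{\partial b} &= -2\sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0
  \quad\Longrightarrow\quad a\sum x_i + b\sum x_i^2 = \sum x_i y_i.
\end{align*}
% The first equation gives a in terms of b:
\[
a = \frac{\sum y_i - b\sum x_i}{n}.
\]
% Substituting into the second equation and multiplying by n:
\[
\Bigl(\sum x_i\Bigr)\Bigl(\sum y_i\Bigr) - b\Bigl(\sum x_i\Bigr)^2 + n b\sum x_i^2 = n\sum x_i y_i
\quad\Longrightarrow\quad
b = \frac{n\sum x_i y_i - \bigl(\sum x_i\bigr)\bigl(\sum y_i\bigr)}
         {n\sum x_i^2 - \bigl(\sum x_i\bigr)^2}.
\]
```

The denominator is nonzero as long as the $x_i$ are not all equal.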

Exercise 5.4: A study was made on the amount of converted sugar in a
certain process at various temperatures. The data were coded and recorded
as follows:

Temperature, x    Converted Sugar, y
1.0                8.1
1.1                7.8
1.2                8.5
1.3                9.8
1.4                9.5
1.5                8.9
1.6                8.6
1.7               10.2
1.8                9.3
1.9                9.2
2.0               10.5

(a) Plot a scatter diagram.
(b) Estimate the linear regression line.
(c) Estimate the mean amount of converted sugar produced when the
coded temperature is 1.75.
(ans. (b) $\hat{y} = 6.4136 + 1.8091x$, (c) $\hat{y} = 9.580$.)
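
Parts (b) and (c) can be checked numerically with a direct transcription of the formulas of Theorem 1. The function name `least_squares_line` is an illustrative choice; the sketch uses plain sums rather than a library routine so that each term of the formulas stays visible.

```python
def least_squares_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x from Theorem 1."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


temperature = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
sugar = [8.1, 7.8, 8.5, 9.8, 9.5, 8.9, 8.6, 10.2, 9.3, 9.2, 10.5]

a, b = least_squares_line(temperature, sugar)
y_hat = a + b * 1.75   # estimated mean converted sugar at coded temperature 1.75
print(f"y = {a:.4f} + {b:.4f}x, estimate at x = 1.75: {y_hat:.3f}")
# y = 6.4136 + 1.8091x, estimate at x = 1.75: 9.580
```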

Exercise 5.5: The following data are the selling prices z of a certain make
and model of used cars w years old:

w years z dollars
1 6350
2 5695
2 5750
3 5395
5 4985
5 4895

(a) Plot a scatter diagram.
(b) Fit a nonlinear sample regression curve of the form $z = c\,d^{w}$.
Hint: Write
\[
\ln z = \ln c + (\ln d)\,w = a + b w,
\]
where $a = \ln c$ and $b = \ln d$, and then estimate a and b by the formulas
for linear regression using the sample points $(w_i, \ln z_i)$.
(c) Estimate the selling price of such a car when it is 4 years old.
(ans. (b) $z = 6461.392 \times 0.947^{w}$, (c) $z = 5197$)
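
The hint can be followed step by step: take logarithms of the prices, fit a straight line to the points (w, ln z), and transform the coefficients back. The sketch below does this with hypothetical helper names. On the data as printed it gives c of about 6511 and d of about 0.945, with an estimate near 5186 at w = 4. That is close to, but not identical with, the quoted answer; the gap may come from rounding or from how the data were transcribed.

```python
import math


def least_squares_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


def fit_exponential(ws, zs):
    """Fit z = c * d**w via a straight-line fit to the points (w, ln z)."""
    a, b = least_squares_line(ws, [math.log(z) for z in zs])
    return math.exp(a), math.exp(b)   # c = e^a, d = e^b


age = [1, 2, 2, 3, 5, 5]
price = [6350, 5695, 5750, 5395, 4985, 4895]

c, d = fit_exponential(age, price)
estimate = c * d ** 4   # predicted price of a 4-year-old car
print(f"z = {c:.3f} * {d:.4f}**w, estimate at w = 4: {estimate:.0f}")
```

Note that this minimises squared errors in ln z, not in z itself, which is what the hint prescribes.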

Exercise 5.6: Construct a least-squares straight line which approximates
the data given below using

(a) x as independent variable,
(b) x as dependent variable.

x   1  3  4  6  8  9  11  14
y   1  2  4  4  5  7   8   9
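
Both parts use the same formulas with the roles of the two variables exchanged. A minimal sketch, with an illustrative helper name:

```python
def least_squares_line(us, vs):
    """Return (a, b) of the least-squares line v = a + b*u (u independent)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    b = (n * suv - su * sv) / (n * suu - su ** 2)
    a = (sv - b * su) / n
    return a, b


x = [1, 3, 4, 6, 8, 9, 11, 14]
y = [1, 2, 4, 4, 5, 7, 8, 9]

a_yx, b_yx = least_squares_line(x, y)   # (a) x independent: y = a + b*x
a_xy, b_xy = least_squares_line(y, x)   # (b) x dependent:   x = a + b*y

print(f"(a) y = {a_yx:.3f} + {b_yx:.3f}x")   # (a) y = 0.545 + 0.636x
print(f"(b) x = {a_xy:.3f} + {b_xy:.3f}y")   # (b) x = -0.500 + 1.500y
```

The two lines are not the same line solved for the other variable: (a) minimises vertical distances and (b) horizontal ones, so they coincide only when the points are exactly collinear.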
