Statistics Part 1 and 2
Mean
The mean is the sum of all the values divided by the number of values:
$\bar{x} = \frac{\sum x}{n}$
$\sum x$ indicates that we must add all the values in the data set, and $n$ is the number of values.
Example 1
Calculate the mean for the following data:
• The median is the middle number in a sequence of data items.
• To find the median, organize each data item in ascending order.
• The value in the central position is the median. There are as many values above the median as below it.
Determine the median for the following data set:
2 9 5 12 10
Example 3
Determine the median for the following data set:
The median may be a better indicator of the most typical value if a data set has an outlier.
Outlier
An outlier is an extreme value that differs greatly from other values in a set of values.
Consider the following data set:
5 12 7 36 8 9 7
(a) Determine the mean and the median.
(b) Which value is an outlier?
(c) Which measure of central tendency is more representative of the data set?
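For the data set 5 12 7 36 8 9 7 in the example above, parts (a) to (c) can be checked with a short Python sketch. It uses only the standard `statistics` module. The variable names are illustrative, and treating the value farthest from the median as the outlier is a simplifying assumption, not a formal outlier test.

```python
import statistics

data = [5, 12, 7, 36, 8, 9, 7]

# (a) Mean and median of the full data set.
mean = statistics.mean(data)      # sum of the values divided by 7
median = statistics.median(data)  # middle value of the sorted data

# (b) Assumption: call the value farthest from the median the outlier.
outlier = max(data, key=lambda v: abs(v - median))

# (c) Recompute both measures without the outlier to see which one moves more.
trimmed = [v for v in data if v != outlier]
mean_shift = abs(mean - statistics.mean(trimmed))
median_shift = abs(median - statistics.median(trimmed))

print("mean:", mean, "median:", median, "outlier:", outlier)
print("shift in mean:", mean_shift, "shift in median:", median_shift)
```

Removing the extreme value moves the mean far more than the median, which is why the median is usually the more representative measure for data like this.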
If there are more than two modes, then the data set is
multimodal.
The ages of thirty smartphone users, all between 15 and 60 years old, were recorded.
(a) 1 3 4 5 6 7 8 8 9 9 10
(b) 2 3 4 5 5 5 6 7 7 8 9 10 10
Measures of dispersion
Measures of dispersion help us to determine how data is spread around the mean or median.
This enables one to establish whether the data is grouped closely or scattered more widely.
There are three measures of dispersion that we will consider: range, interquartile range and semi-interquartile range.
Range
The range of a data set is the difference between the maximum and minimum values in the set.
The range simply tells us how far apart the largest and smallest values in a data set are.
The range is very sensitive to outliers.
The range is a measure of dispersion, or measure of spread, as it tells you how spread out the data is.
The interquartile range (IQR)
The interquartile range is the difference between the upper and lower quartiles.
$\text{IQR} = Q_3 - Q_1$
It spans 50% of a data set and eliminates the influence of outliers because, in effect, the highest and lowest quarters are removed.
It is a good measure of the spread of the data either side of the median.
Semi-interquartile range
The interquartile range gives the range of the middle half of the data set.
The semi-interquartile range is half of the interquartile range.
$\text{Semi-interquartile range} = \frac{Q_3 - Q_1}{2}$
1 2 2 2 3 3 4 4 4 6 7 8 8 9 10 10 30
(b) Which measure of dispersion is more suitable for this data, the range or
interquartile range?
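As a rough check on question (b), the sketch below computes the range, interquartile range and semi-interquartile range for the data set 1 2 2 2 3 3 4 4 4 6 7 8 8 9 10 10 30. It assumes the "exclusive" quartile convention of Python's `statistics.quantiles`. Other quartile conventions can give slightly different values, so treat the quartiles as one reasonable choice and not as the only correct answer.

```python
import statistics

data = [1, 2, 2, 2, 3, 3, 4, 4, 4, 6, 7, 8, 8, 9, 10, 10, 30]

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# Assumption: quartiles follow the "exclusive" convention of
# statistics.quantiles; n=4 returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1        # IQR = Q3 - Q1
semi_iqr = iqr / 2   # half of the interquartile range

print("range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr, "semi-IQR:", semi_iqr)
```

Under this convention the range is 29, driven largely by the single value 30, while the IQR is 6. This illustrates why the IQR is the more suitable measure when a data set contains an outlier.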
The Five Number Summary
The five-number summary is a set of descriptive statistics that
provides information about a data set.
Determine the five-number summary and draw a box-and-whisker plot for the
following data:
1 2 2 2 3 3 4 4 4 6 7 8 8 9 10 10 10
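The five numbers are the minimum, the lower quartile, the median, the upper quartile and the maximum. The sketch below computes them for the data above. The helper name `five_number_summary` is illustrative, and it assumes that the quartiles are the medians of the lower and upper halves of the ordered data, leaving out the middle value when the number of values is odd.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    excluding the middle value when the number of values is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )

data = [1, 2, 2, 2, 3, 3, 4, 4, 4, 6, 7, 8, 8, 9, 10, 10, 10]
summary = five_number_summary(data)
print(summary)
```

In a box-and-whisker plot the box runs from the lower quartile to the upper quartile with a line at the median, and the whiskers extend to the minimum and maximum.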
Skewness
• Skewness is the tendency for the values to be more
frequently around the high or low ends of the 𝑥-axis.
Enter the data: 9 [=] 7 [=] 11 [=] 10 [=] 13 [=] 7 [=]
[AC]
3: σx [=]
So the standard deviation ≈ 2,141
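The calculator result can be cross-checked in Python. This sketch assumes that σx on the calculator is the population standard deviation (dividing by n), which corresponds to `statistics.pstdev`.

```python
import statistics

data = [9, 7, 11, 10, 13, 7]

mean = statistics.mean(data)

# Population standard deviation: square root of the mean squared deviation.
sigma = statistics.pstdev(data)

print("mean:", mean)
print("standard deviation:", round(sigma, 3))
```

Rounded to three decimal places the result agrees with the value 2,141 above (Python prints a decimal point instead of a decimal comma).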
Example 10
23 34 39 40 42 53 56 62 68 76
(e) Make conclusions about the spread of the data about the
mean by establishing how many of the data values lie within or
outside of the first standard deviation interval.
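One way to approach part (e) is sketched below: compute the mean and standard deviation, form the interval from one standard deviation below the mean to one above it, and count the values inside and outside. The population standard deviation is assumed here, and the variable names are illustrative.

```python
import statistics

data = [23, 34, 39, 40, 42, 53, 56, 62, 68, 76]

mean = statistics.mean(data)
sigma = statistics.pstdev(data)  # assumption: population standard deviation

# First standard deviation interval: (mean - sigma ; mean + sigma).
lower, upper = mean - sigma, mean + sigma

inside = [v for v in data if lower <= v <= upper]
outside = [v for v in data if v < lower or v > upper]

print("interval:", round(lower, 2), "to", round(upper, 2))
print("inside:", inside)
print("outside:", outside)
```

With these assumptions, seven of the ten values lie within one standard deviation of the mean and three lie outside it.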
PAST EXAM PRACTICE QUESTIONS
2019 Eastern Cape Exemplar Paper 2 Q 1
The data below shows the energy levels, in kilocalories per 100 g,
of 10 different snack foods.
440 520 480 560 615
550 620 680 545 490
1.1 Calculate the mean energy level of these snack foods. (2)
1.2 Calculate the standard deviation. (4)
1.3 The energy levels, in kilocalories per 100 g, of 10 different
breakfast cereals had a mean of 545,7 kilocalories and a standard
deviation of 28 kilocalories. Which of the two types of food show
greater variation in energy levels? What do you conclude? (2)
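A short sketch for checking answers to questions 1.1 to 1.3 follows. It assumes the question intends the population standard deviation. The cereal figure is taken from question 1.3, and the variable names are illustrative.

```python
import statistics

snacks = [440, 520, 480, 560, 615, 550, 620, 680, 545, 490]

# 1.1 Mean energy level of the snack foods.
snack_mean = statistics.mean(snacks)

# 1.2 Standard deviation (assumption: population standard deviation).
snack_sd = statistics.pstdev(snacks)

# 1.3 Compare with the standard deviation given for the breakfast cereals.
cereal_sd = 28
more_varied = "snack foods" if snack_sd > cereal_sd else "breakfast cereals"

print("mean:", snack_mean)
print("standard deviation:", round(snack_sd, 2))
print("greater variation:", more_varied)
```

A larger standard deviation means the energy levels are spread more widely about the mean, so the comparison in 1.3 comes down to which standard deviation is larger.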
2016 November Paper 2 Q 1
2022 Gauteng November Paper 2 Q 1
2018 November Paper 2 Q 1
2015 Eastern Cape Exemplar Paper 2 Q 2
2023 Gauteng November Paper 2 Q 2
STATISTICS PART 2
Histogram
A histogram is a display of statistical information that uses rectangles to show the frequency of data items in successive numerical intervals of equal size.
Histograms visualise how many times different events occurred.
In the most common form of histogram, the independent variable is plotted along the horizontal axis and the dependent variable is plotted along the vertical axis.
Histograms must have bars of equal widths.
For the above data set, the frequencies in each bin have been
tabulated along with the scores that contributed to the
frequency in each bin (see below).
Frequency Polygon
A frequency polygon can be used instead of a histogram for
illustrating grouped data. It is called a frequency polygon
because of its shape. One way of drawing a frequency polygon
is to:
a) Draw a histogram
b) Join the midpoints of the tops of the columns of the
histogram
c) Extend the line to the midpoint of the class interval below
the lowest value and to the midpoint of the class interval above
the highest value so that the line touches the horizontal axis on
both sides.
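Steps (b) and (c) above can be sketched in Python as a list of points to join. The class intervals and frequencies below are hypothetical, invented only to illustrate the construction, and the intervals are assumed to have equal width.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 3), (10, 20, 7), (20, 30, 5), (30, 40, 2)]

# Assumption: every class interval has the same width.
width = classes[0][1] - classes[0][0]

# Step (b): the midpoint of the top of each column is (class midpoint, frequency).
points = [((lo + hi) / 2, freq) for lo, hi, freq in classes]

# Step (c): extend to the midpoints of the empty classes on either side,
# where the frequency is zero, so the polygon touches the horizontal axis.
first_mid = points[0][0] - width
last_mid = points[-1][0] + width
polygon = [(first_mid, 0)] + points + [(last_mid, 0)]

print(polygon)
```

Joining the points in `polygon` in order with straight lines gives the frequency polygon for this hypothetical table.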
Ogives / Cumulative Frequency curves
An ogive or cumulative frequency curve is a graph that shows
the information in a cumulative frequency table.
The last count in an ogive is always the sum of all the counts
in the data set.
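The running totals behind an ogive can be sketched as follows. The frequency table here is hypothetical and is not the table from the slides. Plotting each cumulative frequency against the upper boundary of its class is a common convention, and it is assumed here.

```python
from itertools import accumulate

# Hypothetical frequency table: (upper class boundary, frequency).
table = [(10, 4), (20, 9), (30, 12), (40, 5)]

frequencies = [freq for _, freq in table]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Points of the ogive: (upper class boundary, cumulative frequency).
ogive_points = [(upper, total) for (upper, _), total in zip(table, cumulative)]

print(ogive_points)
print("last cumulative count:", cumulative[-1], "total:", sum(frequencies))
```

The last cumulative count equals the total of all the frequencies, as stated above.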
As an example, the following frequency table shows
the time (in minutes) taken by learners to travel to
school.