
Statistics Part 1 and 2

Stats notes and practice grade 10 and 11

Uploaded by

ledwabakarabo23

STATISTICS PART 1

Mean

The mean is the sum of all the values divided by the total number of values.

$$\bar{x} = \frac{\text{sum of all the values}}{\text{total number of values}}$$

$$\bar{x} = \frac{\sum x}{n}$$

$\sum x$ indicates that we must add all values in the data set.

$n$ is the number of data items.

Example 1

Calculate the mean for the following data:

1 3 3 3 3 4 5 9 9 9
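The mean formula can be checked with a short Python sketch on the Example 1 data. The function name `mean` is an illustrative choice and does not come from these notes.

```python
def mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)

# Data from Example 1
example_1 = [1, 3, 3, 3, 3, 4, 5, 9, 9, 9]

print(mean(example_1))  # 49 / 10 = 4.9
```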


Median

• The median is the middle number in a sequence of data items.
• To find the median, organize each data item in ascending order.
• The value in the central position is the median. There are as many values above the median as below it.
• If there is an odd number of data items, the median is one of the data items.
• If there is an even number of data items, the median is found by adding the two middle data items and dividing the sum by two.

Example 2

Determine the median for the following data set:

2 9 5 12 10

Example 3

Determine the median for the following data set:

3 9 10 12 15 18
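The rules for odd and even numbers of data items can be sketched in Python as follows. This is only an illustration using the data from Examples 2 and 3; the function name `median` is an assumption made for this sketch.

```python
def median(values):
    """Return the middle value of the data after sorting it."""
    ordered = sorted(values)          # ascending order first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of items: the median is one of the data items.
        return ordered[middle]
    # Even number of items: average the two middle items.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([2, 9, 5, 12, 10]))       # Example 2: 9
print(median([3, 9, 10, 12, 15, 18]))  # Example 3: 11.0
```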
Note:

The median may be a better indicator of the most typical value if a data set has an outlier.

Outlier

An outlier is an extreme value that differs greatly from other values in a set of values.

It is usually a value that is much greater or much less than all the other values in the data set.

An outlier has an influence on the mean and the range of the data set, but has little or no influence on the median or the lower or upper quartiles.

An outlier can affect the skewness of the data.

Example 4

Consider the following data set:

5 12 7 36 8 9 7

(a) Determine the mean and the median.
(b) Which value is an outlier?
(c) Which measure of central tendency is more representative of the data set?
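To see how an outlier pulls the mean more than the median, the sketch below compares both measures for the Example 4 data with and without the value 36. It uses Python's standard `statistics` module; treating 36 as the outlier is the assumption being illustrated.

```python
from statistics import mean, median

data = [5, 12, 7, 36, 8, 9, 7]                   # Example 4
without_outlier = [x for x in data if x != 36]   # drop the extreme value

# With the outlier: mean is 12, median is 8
print(mean(data), median(data))

# Without the outlier: mean is 8, median is 7.5
print(mean(without_outlier), median(without_outlier))
```

The mean drops from 12 to 8 when 36 is removed, while the median only moves from 8 to 7.5.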


Mode

The mode is the data item that occurs most frequently.

If there are two modes, then the data set is said to be bimodal.

If there are more than two modes, then the data set is multimodal.

All the data items in a set may be different. In this case it has no mode.

Remember that in a frequency table, the mode is the value, not the frequency.

Example 5

Calculate the mode for the following data:

1 2 2 2 2 3 5 6 6 7 8 8 8 8 9 10 12
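A minimal Python sketch of these rules, applied to the Example 5 data, could look like this. The function name `modes` is hypothetical; it returns an empty list when every item is different.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often (empty list if no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # all items are different, so there is no mode
    # Report the values, not their frequencies.
    return sorted(v for v, c in counts.items() if c == highest)

example_5 = [1, 2, 2, 2, 2, 3, 5, 6, 6, 7, 8, 8, 8, 8, 9, 10, 12]
print(modes(example_5))  # [2, 8], so this data set is bimodal
```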
Example 6

The ages of thirty people using smartphones between the ages of 15 and 60 were recorded.

(a) Calculate the mean for this data.


(b) Determine the median.
(c) Determine the mode.
Quartiles

• Quartiles are values that divide a sample of data into four equal parts.
• With them you can quickly evaluate a data set's spread and central tendency, which are important first steps in understanding your data.
• The three quartiles that divide a data set into four quarters are shown in the table below.

Quartile               Description
1st quartile ($Q_1$)   25% of the data are less than or equal to this value.
2nd quartile ($Q_2$)   The median. 50% of the data are less than or equal to this value.
3rd quartile ($Q_3$)   75% of the data are less than or equal to this value.
Example 7

Calculate the quartiles for the following sets of data:

(a) 1 3 4 5 6 7 8 8 9 9 10

(b) 2 3 4 5 5 5 6 7 7 8 9 10 10
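One common way to find the quartiles by hand is to take the median of the lower half and the median of the upper half of the ordered data, leaving the overall median out of both halves when the number of items is odd. The notes do not state which method they use, so the sketch below rests on that assumption; other conventions (and some calculators or software) can give slightly different values. The function names are illustrative.

```python
def median(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the median itself when forming the upper half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

print(quartiles([1, 3, 4, 5, 6, 7, 8, 8, 9, 9, 10]))         # (a): (4, 7, 9)
print(quartiles([2, 3, 4, 5, 5, 5, 6, 7, 7, 8, 9, 10, 10]))  # (b): (4.5, 6, 8.5)
```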
Measures of dispersion

Measures of dispersion help us to determine how data is spread around the mean or median.

This enables one to establish whether the data is grouped closely or scattered more widely.

There are three measures of dispersion that we will consider: range, interquartile range and semi-interquartile range.

Range

The range of a data set is the difference between the maximum and minimum values in the set.

$$\text{Range} = \text{Maximum value} - \text{Minimum value}$$

The range simply tells us how far apart the largest and smallest values in a data set are.

The range is very sensitive to outliers.

The range is a measure of dispersion, or measure of spread, as it tells you how spread out the data is.

The interquartile range (IQR)

The interquartile range is the difference between the upper and lower quartiles.

$$\text{IQR} = Q_3 - Q_1$$

It spans the middle 50% of a data set and eliminates the influence of outliers because, in effect, the highest and lowest quarters are removed.

It is a good measure of the spread of the data either side of the median.

Semi-interquartile range

The interquartile range gives the range of the middle half of the data set. The semi-interquartile range is half of the interquartile range.

$$\text{Semi-interquartile range} = \frac{Q_3 - Q_1}{2}$$

Example 8

Consider the following data:

1 2 2 2 3 3 4 4 4 6 7 8 8 9 10 10 30

(a) Determine the range, interquartile range and semi-interquartile range.

(b) Which measure of dispersion is more suitable for this data, the range or
interquartile range?
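The three measures of dispersion can be sketched in Python for the Example 8 data. The quartiles here are found with the median-of-halves method (overall median excluded from both halves for an odd count), which is an assumption, since the notes do not specify a method. The function names are illustrative.

```python
def median(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

def dispersion(values):
    """Return (range, interquartile range, semi-interquartile range)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + 1:] if n % 2 == 1 else ordered[half:])
    data_range = ordered[-1] - ordered[0]   # maximum minus minimum
    iqr = q3 - q1
    return data_range, iqr, iqr / 2

example_8 = [1, 2, 2, 2, 3, 3, 4, 4, 4, 6, 7, 8, 8, 9, 10, 10, 30]
print(dispersion(example_8))  # (29, 6.0, 3.0)
```

The single large value 30 stretches the range to 29, while the interquartile range of 6 is not affected by it.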
The Five Number Summary
The five-number summary is a set of descriptive statistics that provide information about a data set.

In order to write down the five-number summary it is important to put the numbers in the data set in ascending order.

Once the data set is in order, find the following information:

1) The minimum value in the data set
2) $Q_1$, the lower quartile
3) $Q_2$, the median
4) $Q_3$, the upper quartile
5) The maximum value in the data set
Box and whisker diagram
A graphical representation of the five-number summary is known as a box and whisker diagram or a box plot. To draw a box and whisker diagram:
1. Make sure the data set is arranged in ascending order.
2. Find the five-number summary.
3. Using a given scale or a number line, mark the minimum and maximum values.
4. Draw vertical lines at $Q_1$, $Q_2$ and $Q_3$ and then draw two horizontal lines to make the box.
5. From the middle of the left side of the box (at $Q_1$), draw a horizontal line to the minimum value; from the middle of the right side (at $Q_3$), draw a horizontal line to the maximum value.
These two horizontal lines are the whiskers. The width of the box illustrates the interquartile range.
Example 9

Determine the five-number summary and draw a box-and-whisker plot for the
following data:

1 2 2 2 3 3 4 4 4 6 7 8 8 9 10 10 10
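As a rough check on the five values needed for the box-and-whisker plot, the sketch below computes the five-number summary of the Example 9 data. It assumes the median-of-halves method for the quartiles (overall median left out of both halves when the count is odd); the function names are illustrative.

```python
def median(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + 1:] if n % 2 == 1 else ordered[half:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]

example_9 = [1, 2, 2, 2, 3, 3, 4, 4, 4, 6, 7, 8, 8, 9, 10, 10, 10]
print(five_number_summary(example_9))  # (1, 2.5, 4, 8.5, 10)
```

The box would run from 2.5 to 8.5 with the median line at 4, and the whiskers would reach 1 and 10.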
Skewness
• Skewness is the tendency for the values to be more frequently around the high or low ends of the $x$-axis.

• If the longer part of the box is to the right of (greater than) the median, the data is said to be skewed right.

• If the longer part is to the left of (less than) the median, the data is skewed left.

• For a normal distribution, the shape is symmetrical. Most of the data are clustered around the centre and the mean, median and mode have the same value.

Note
• if mean − median ≈ 0, then the distribution is symmetric
• if mean − median > 0, then the distribution is positively skewed
• if mean − median < 0, then the distribution is negatively skewed
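The mean − median rule can be sketched in Python as below. The notes do not say how close to zero counts as "approximately zero", so the `tolerance` parameter and its default value are purely hypothetical choices for this illustration, as is the function name.

```python
from statistics import mean, median

def describe_skew(values, tolerance=0.5):
    """Classify skewness from the sign of mean - median.

    `tolerance` is an assumed cut-off for "approximately zero".
    """
    difference = mean(values) - median(values)
    if abs(difference) <= tolerance:
        return "symmetric"
    if difference > 0:
        return "positively skewed"
    return "negatively skewed"

# Data from Example 8: mean is about 6.65 and the median is 4.
example_8 = [1, 2, 2, 2, 3, 3, 4, 4, 4, 6, 7, 8, 8, 9, 10, 10, 30]
print(describe_skew(example_8))        # positively skewed
print(describe_skew([1, 2, 3, 4, 5]))  # symmetric
```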
2013 June Gauteng Paper 2 Q 1.1.1-1.16

The times, in seconds, it took a group of Grade 11 learners to cover 400 m are as shown below.

76 60 79 82 81 50 48 92 98 73
52 80 82 76 78 91 76 59 68 84

1.1.1 Determine the median of the times. (2)
1.1.2 Determine the lower and upper quartiles of the data. (2)
1.1.3 Represent the above data using a box-and-whisker diagram. (3)
1.1.4 Use the box-and-whisker diagram to describe the distribution of the data. (1)
1.1.5 Determine the interquartile range of the data. (2)
Standard deviation and Variance
• The standard deviation ($\sigma$) and the variance ($\sigma^2$) are measures of dispersion.
• The standard deviation measures how concentrated the data are around the mean.
• The more concentrated the data is, the smaller the standard deviation.
• The standard deviation is the square root of the variance.
• The standard deviation is measured in the same units as the mean and the data, but the variance is not.
• The variance is measured in the square of the data units.
• Outliers (extremely low or extremely high numbers in the data set) affect the standard deviation.
• This is because the standard deviation is based on the distance from the mean and the mean is also affected by outliers.
• The standard deviation ($\sigma$) and the variance ($\sigma^2$) can be calculated using a CALCULATOR.
Calculating the standard deviation using a calculator
Given the following data set, use a scientific calculator to calculate the standard deviation:
9; 7; 11; 10; 13; 7
Using a CASIO fx-82ZA PLUS calculator, press the following keys:
• Get the calculator to STAT mode first
MODE [2: STAT][1: 1 − VAR]

9 = 7 = 11 = 10 = 13 = 7 (=)

[AC]

SHIFT: 1 STAT [4: VAR]

3: 𝜎𝑥 [=]
So the Standard Deviation ≈ 2,141
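The calculator result can be cross-checked with a short Python sketch. It assumes the population formulas, $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$ and $\sigma = \sqrt{\sigma^2}$, which is what the calculator's $\sigma x$ key reports; the function name is illustrative.

```python
from math import sqrt

def population_std_dev(values):
    """Square root of the mean squared distance from the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return sqrt(variance)

data = [9, 7, 11, 10, 13, 7]
print(round(population_std_dev(data), 3))  # 2.141
```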
Example 10

A chess team consisting of 10 players scored the following points during the year:

23 34 39 40 42 53 56 62 68 76

(a) Calculate the mean rounded off to one decimal place.

(b) Calculate the variance rounded off to two decimal places.

(c) Calculate the standard deviation rounded off to one decimal place.

(d) Determine the standard deviation intervals for the data.

(e) Make conclusions about the spread of the data about the mean by establishing how many of the data values lie within or outside of the first standard deviation interval.
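The calculations asked for in Example 10 can be sketched in Python as follows. The sketch assumes the population variance and standard deviation (dividing by $n$), and takes the first standard deviation interval to be $(\bar{x} - \sigma;\ \bar{x} + \sigma)$; the variable names are illustrative.

```python
from math import sqrt

points = [23, 34, 39, 40, 42, 53, 56, 62, 68, 76]

n = len(points)
mean = sum(points) / n
variance = sum((x - mean) ** 2 for x in points) / n   # population variance
std_dev = sqrt(variance)

# First standard deviation interval: one standard deviation either side of the mean
lower, upper = mean - std_dev, mean + std_dev
within = [x for x in points if lower <= x <= upper]

print(round(mean, 1))      # 49.3
print(round(variance, 2))  # 245.41
print(round(std_dev, 1))   # 15.7
print(len(within))         # 7 of the 10 values lie within one standard deviation
```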
PAST EXAM PRACTICE QUESTIONS
2019 Eastern Cape Exemplar Paper 2 Q 1

The data below shows the energy levels, in kilocalories per 100 g,
of 10 different snack foods.
440 520 480 560 615
550 620 680 545 490
1.1 Calculate the mean energy level of these snack foods. (2)
1.2 Calculate the standard deviation. (4)
1.3 The energy levels, in kilocalories per 100 g, of 10 different
breakfast cereals had a mean of 545,7 kilocalories and a standard
deviation of 28 kilocalories. Which of the two types of food show
greater variation in energy levels? What do you conclude? (2)
2016 November Paper 2 Q 1
2022 Gauteng November Paper 2 Q 1
2018 November Paper 2 Q 1
2015 Eastern Cape Exemplar Paper 2 Q 2
2023 Gauteng November Paper 2 Q 2
STATISTICS PART 2
Histogram
A histogram is a display of statistical information that uses rectangles to show the frequency of data items in successive numerical intervals of equal size.

In the most common form of histogram, the independent variable is plotted along the horizontal axis and the dependent variable is plotted along the vertical axis.

Histograms visualise how many times different events occurred. Histograms must have bars of equal widths.

Each rectangle in a histogram represents one event and the height of the rectangle is relative to the number of times that the event occurred.

To construct a histogram you first need to split the data into intervals, called bins. For example, if given different ages for men as shown below:

36 25 38 46 55 68 72 55 36 38
67 45 22 48 91 46 52 61 58 55

To draw a histogram, split the ages into bins, with each bin representing a 10-year period starting at 20 years.

Each bin contains the number of occurrences of scores in the data set that are contained within that bin.

For the above data set, the frequencies in each bin have been tabulated along with the scores that contributed to the frequency in each bin (see below).

Bin      Frequency   Scores Included in Bin
20-30    2           25, 22
30-40    4           36, 38, 36, 38
40-50    4           46, 45, 48, 46
50-60    5           55, 55, 52, 58, 55
60-70    3           68, 67, 61
70-80    1           72
80-90    0           -
90-100   1           91

The histogram would look like the one below:

[Figure: histogram of the ages, with the bins on the horizontal axis and the frequency on the vertical axis]
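The binning step can be sketched in Python using the ages given above. The sketch assumes that each bin includes its lower limit and excludes its upper limit (so an age of 30 would fall in the 30-40 bin); the notes do not state this, and none of the given ages lies on a boundary. The function name and parameters are illustrative.

```python
ages = [36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
        67, 45, 22, 48, 91, 46, 52, 61, 58, 55]

WIDTH = 10  # each bin covers a 10-year period

def bin_frequencies(values, start=20, stop=100, width=WIDTH):
    """Count how many values fall in each bin [lower, lower + width)."""
    counts = {lower: 0 for lower in range(start, stop, width)}
    for value in values:
        lower = start + ((value - start) // width) * width
        counts[lower] += 1
    return counts

for lower, frequency in bin_frequencies(ages).items():
    print(f"{lower}-{lower + WIDTH}: {frequency}")
```

The printed frequencies (2, 4, 4, 5, 3, 1, 0, 1) match the table, and the heights of the histogram bars follow directly from them.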
Frequency Polygon
A frequency polygon can be used instead of a histogram for
illustrating grouped data. It is called a frequency polygon
because of its shape. One way of drawing a frequency polygon
is to:
a) Draw a histogram
b) Join the midpoints of the top of the columns of the
histogram
c) Extend the line to the midpoint of the class interval below
the lowest value and to the midpoint of the class interval above
the highest value so that the line touches the horizontal axis on
both sides.
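Steps (b) and (c) amount to plotting each frequency at the midpoint of its class interval and adding one zero-frequency point on each side. A minimal sketch, reusing the age bins from the histogram section as sample data, is given below; the function name and the tuple layout are assumptions made for the illustration.

```python
# (lower limit, upper limit, frequency) for each class interval
bins = [(20, 30, 2), (30, 40, 4), (40, 50, 4), (50, 60, 5),
        (60, 70, 3), (70, 80, 1), (80, 90, 0), (90, 100, 1)]

def polygon_points(intervals):
    """Return the (midpoint, frequency) points of a frequency polygon."""
    width = intervals[0][1] - intervals[0][0]   # equal class widths assumed
    points = [((lo + hi) / 2, freq) for lo, hi, freq in intervals]
    first_mid, last_mid = points[0][0], points[-1][0]
    # Extend to the midpoints of the intervals just below and just above,
    # so that the polygon touches the horizontal axis on both sides.
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]

for midpoint, frequency in polygon_points(bins):
    print(midpoint, frequency)
```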
Ogives / Cumulative Frequency curves
An ogive or cumulative frequency curve is a graph that shows the information in a cumulative frequency table.

The graph is useful for estimating the median and interquartile range of the grouped data.

Ogives show the total number of times that a value or anything less than that value appears in the data set.

The first count in an ogive is always zero.

The last count in an ogive is always the sum of all the counts in the data set.

As an example, the following frequency table shows the time (in minutes) taken by learners to travel to school.

When completing the table and drawing an ogive to illustrate the information above, the table will look as below:

Always remember when drawing a cumulative frequency curve from a table of grouped data, the cumulative frequencies are plotted at the upper limit of the interval.
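To illustrate how the points of an ogive are built, the sketch below reuses the age frequencies from the histogram section (not the travel-time table, which is not reproduced here). Each cumulative frequency is paired with the upper limit of its interval, and a starting point with a count of zero is placed at the lowest limit. The variable names are illustrative.

```python
from itertools import accumulate

# (lower limit, upper limit, frequency) for each class interval
bins = [(20, 30, 2), (30, 40, 4), (40, 50, 4), (50, 60, 5),
        (60, 70, 3), (70, 80, 1), (80, 90, 0), (90, 100, 1)]

frequencies = [freq for _, _, freq in bins]
cumulative = list(accumulate(frequencies))   # running totals

# The first point has a count of zero at the lowest limit;
# every other point is plotted at the upper limit of its interval.
ogive_points = [(bins[0][0], 0)] + [
    (upper, total) for (_, upper, _), total in zip(bins, cumulative)
]

for x, total in ogive_points:
    print(x, total)
```

The last cumulative frequency, 20, equals the total number of ages in the data set.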
PAST EXAM PRACTICE QUESTIONS
2023 Gauteng November Paper 2 Q 1
2022 Eastern Cape November Paper 2 Q 2
2015 Eastern Cape Exemplar Paper 2 Q 1
2019 Eastern Cape Exemplar Paper 2 Q 2
2016 November Paper 2 Q 2
2019 November Paper 2 Q 1
