Statistics Learner Notes

The document provides a comprehensive overview of statistical concepts for Grade 10 to Grade 12, focusing on measures of central tendency, dispersion, and data distribution. It includes methods for calculating mean, median, mode, range, quartiles, and standard deviation, along with practical examples and exercises. Additionally, it covers graphical representations such as box and whisker diagrams and scatterplots to analyze data trends.

2 STATISTICS

2.1 Grade 10 Revision

Measures of central tendency (ungrouped data)

Mean ($\bar{x}$)
• The sum of all the values ($x$) of the data set divided by the number of values ($n$).
• $\bar{x} = \frac{\sum x}{n}$, where $\sum$ = sum of.

Median ($Q_2$)
• The middle value of an arranged data set.
• Divides data into two equal sets.
• The position of $Q_2 = \frac{1}{2}(n + 1)$. Arrange data in ascending order.

Mode
• The value of the data set with the highest frequency / most common value.
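The three measures for ungrouped data can be checked with a short Python sketch. The data set below is hypothetical and chosen only for illustration; the names `values`, `mean_value`, `median_position`, `median_value` and `modes` are not part of the notes.

```python
from statistics import median, multimode

# Hypothetical data set, chosen only for illustration.
values = [4, 7, 7, 9, 10, 12, 14]

ordered = sorted(values)          # the median needs arranged data
n = len(ordered)

mean_value = sum(ordered) / n     # sum of all values divided by n
median_position = (n + 1) / 2     # position of Q2, counted from 1
median_value = median(ordered)    # averages the two middle values if n is even
modes = multimode(ordered)        # list of the most common value(s)

print(mean_value, median_position, median_value, modes)
```

`multimode` returns a list because a data set can have more than one mode.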

Measures of central tendency (grouped data)


Calculating the estimated mean
1. Create additional columns: Midpoint of class interval and Frequency × Midpoint.
2. Calculate the Midpoint of class interval and Frequency × Midpoint.
   2.1 Midpoint of class interval = $\frac{\text{lower limit of class} + \text{upper limit of class}}{2}$
   2.2 Multiply the frequency column by the midpoint column for each class interval.
3. Calculate the sum of the Frequency column to find the $n$ value.
4. Calculate the sum of the Frequency × Midpoint column to find the $\sum x$ value.
5. Substitute steps 3 and 4 into $\bar{x} = \frac{\sum x}{n}$ to find the estimated mean.

Median
• Find the position of $Q_2$.
• Determine in which class interval the position of $Q_2$ falls.
• Use the class midpoint as the value of $Q_2$.

Mode
• The class interval with the highest frequency.
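The grouped-data steps can be sketched in Python as follows. The frequency table is hypothetical, and the variable names are illustrative only; the median class is located with the position $\frac{1}{2}(n + 1)$ used for ungrouped data.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [
    (0, 10, 3),
    (10, 20, 5),
    (20, 30, 2),
]

# Steps 1 and 2: midpoint and frequency x midpoint for each class.
rows = []
for lower, upper, freq in classes:
    midpoint = (lower + upper) / 2
    rows.append((midpoint, freq, freq * midpoint))

n = sum(freq for _, freq, _ in rows)              # step 3: sum of frequencies
sum_fx = sum(product for _, _, product in rows)   # step 4: sum of frequency x midpoint
estimated_mean = sum_fx / n                       # step 5

# Modal class: the interval with the highest frequency.
modal_class = max(classes, key=lambda c: c[2])[:2]

# Median class: the first interval whose running total reaches the position of Q2.
position = (n + 1) / 2
running_total = 0
for lower, upper, freq in classes:
    running_total += freq
    if position <= running_total:
        median_class = (lower, upper)
        break
median_estimate = sum(median_class) / 2           # class midpoint used as Q2

print(estimated_mean, modal_class, median_class, median_estimate)
```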

2024 BOT 12 Term 3 Tutor Material: MATHEMATICS 13


Measures of dispersion

Range
Largest value − smallest value

Interquartile range
$IQR = Q_3 - Q_1$

Semi-interquartile range
$= \frac{Q_3 - Q_1}{2}$

Quartiles
Divide an arranged data set into quarters.
• Lower quartile ($Q_1$)
• Median ($Q_2$)
• Upper quartile ($Q_3$)
The position of $Q_1 = \frac{1}{4}(n + 1)$.
The position of $Q_3 = \frac{3}{4}(n + 1)$.

Percentiles
• Divide an arranged data set into 100 equal parts.
• The position of the $k^{th}$ percentile = $\frac{k}{100}(n + 1)$.
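The position formulas can be turned into values with a small helper. This is a sketch under one stated assumption: when a position is not a whole number, the value is found by linear interpolation between the two neighbouring data values (the notes do not prescribe a rule for this case). The helper name `value_at_position` and the data set are hypothetical.

```python
def value_at_position(ordered, position):
    """Return the value at a 1-based position in arranged data.

    Assumption: a fractional position is resolved by linear
    interpolation between the two neighbouring values.
    Positions beyond the last value are not handled.
    """
    lower_index = int(position) - 1       # 0-based index of the value just below
    fraction = position - int(position)
    if fraction == 0:
        return ordered[lower_index]
    below, above = ordered[lower_index], ordered[lower_index + 1]
    return below + fraction * (above - below)

# Hypothetical data set with n = 11, so the quartile positions are whole numbers.
data = sorted([3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 20])
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)           # position 3
q3 = value_at_position(data, 3 * (n + 1) / 4)       # position 9
p90 = value_at_position(data, 90 * (n + 1) / 100)   # position 10.8

data_range = data[-1] - data[0]
iqr = q3 - q1
semi_iqr = iqr / 2

print(q1, q3, p90, data_range, iqr, semi_iqr)
```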

The five number summary
The five number summary consists of the following measures of dispersion:
• The minimum value of the data set
• $Q_1$ (25%)
• $Q_2$ (50%, median)
• $Q_3$ (75%)
• The maximum value of the data set

Outliers
A value that "lies outside" (is much smaller or larger than) most of the other values in a set of data.

Box and whisker diagram

The Box and Whisker diagram is a graphical representation of the five number summary.

[Figure: box and whisker diagram with the box and the two whiskers labelled, and Min, $Q_1$, $Q_2$, $Q_3$ and Max marked on the axis.]

Drawing tips

Diagram must always have a box and whiskers.


Clearly show five number summary on axis.

Important conclusions from box and whisker diagram.


• 25% of data lies between the minimum value and $Q_1$.
• 25% of data lies between $Q_1$ and $Q_2$.
• 25% of data lies between $Q_2$ and $Q_3$.
• 25% of data lies between $Q_3$ and the maximum value.
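The five number summary behind a box and whisker diagram can be computed with Python's standard library. This is a minimal sketch on a hypothetical data set; it relies on `statistics.quantiles` with `method="exclusive"`, which places the quartiles at positions $\frac{k}{4}(n + 1)$ and interpolates linearly when a position is not a whole number.

```python
from statistics import quantiles

# Hypothetical data set, chosen only for illustration.
data = [3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 20]

# The three cut points returned for n=4 are Q1, Q2 and Q3.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)
```

These five values are the ones to mark on the axis before drawing the box (from $Q_1$ to $Q_3$) and the whiskers.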



Distribution of data

Positive skew / Skewed to the right
• A high volume of data around the lower values of the data set.
• The higher values of the data set are more spread out.
• Mean > Median.

Symmetrical distribution
• A high volume of data situated around the mean.
• Data is symmetrically distributed around the middle.
• Mean = Median.

Negative skew / Skewed to the left
• A high volume of data around the higher values of the data set.
• The lower values of the data set are more spread out.
• Mean < Median.

Graphical representation of distribution
[Figure: three distribution curves (positive skew, symmetrical distribution, negative skew), each marked with the positions of the mean, median and mode.]

Important deductions
Skewness influences the mean. The more skewed the data, the less suitable the mean is as a measure of
central tendency. If the data is skewed to the left, the mean is pulled too low. If it is skewed to the right,
the mean is pulled too high. For skewed data the median is the better measure of central tendency,
because it is not dragged towards the extreme values.
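A quick numerical check of the mean and median relationships, using three hypothetical data sets made up for illustration:

```python
from statistics import mean, median

# Hypothetical data sets, chosen only to illustrate the three shapes.
skewed_right = [1, 2, 2, 3, 3, 3, 4, 20]        # one large value stretches the right tail
symmetrical = [1, 2, 3, 4, 5, 6, 7]
skewed_left = [1, 17, 18, 18, 19, 19, 19, 20]   # one small value stretches the left tail

for name, data in [
    ("skewed right", skewed_right),
    ("symmetrical", symmetrical),
    ("skewed left", skewed_left),
]:
    print(name, "mean =", mean(data), "median =", median(data))
```

In each skewed set a single extreme value moves the mean noticeably, while the median stays near the bulk of the data.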



Grade 10 Revision exercise

QUESTION 1
The lengths of 20 children are measured (in centimetres) and the results are recorded. The data collected
is shown in the table below.

127 128 129 130 131 133 134 134 135 136

137 138 139 140 141 142 142 143 144 145

1.1 Write down the median length. (1)


1.2 Determine:
1.2.1 The mean length. (2)
1.2.2 The range. (1)
1.2.3 The interquartile range. (3)

1.3 Sketch a Box and Whisker diagram to represent the data. (2)
[9]
QUESTION 2
The intelligence quotient score (IQ) of a Grade 10 class is summarised in the table below.

IQ INTERVAL FREQUENCY

90 ≤ x < 100 4
100 ≤ x < 110 8
110 ≤ x < 120 7
120 ≤ x < 130 5
130 ≤ x < 140 4
140 ≤ x < 150 2

2.1 Write the modal class of the data. (1)


2.2 Determine the interval in which the median lies. (2)
2.3 Estimate the mean IQ score of this class of learners. (3)
[6]



2.2 Grade 11 Revision
Cumulative frequency graph (ogive)
Cumulative frequency is the total of a frequency and all frequencies so far in a frequency distribution. It is
the “running total” of frequencies.

What must I be able to do?
• Use a cumulative frequency table to draw an ogive.
• Determine values by using the ogive.
• Use a given ogive and complete a cumulative frequency table from it.
• Use the ogive to determine percentiles and quartiles.

What is important when drawing an ogive?
1. It must have a heading.
2. Mark and label axes.
3. Plot points: (upper boundary of class ; cumulative frequency of class).
4. GROUND the ogive at (lower boundary of first interval ; 0).
When drawing the curve, you may not use a ruler. It is an S-shaped curve.
EXAMPLE: Frequency Table
Amount of goals    Frequency    Cumulative frequency    Coordinates
5 < x ≤ 10         6            6                       A (10 ; 6)
10 < x ≤ 15        8            14                      B (15 ; 14)
15 < x ≤ 20        13           27                      C (20 ; 27)
20 < x ≤ 25        9            36                      D (25 ; 36)
25 < x ≤ 30        7            43                      E (30 ; 43)
30 < x ≤ 35        5            48                      F (35 ; 48)
35 < x ≤ 40        2            50                      G (40 ; 50)
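The cumulative frequency column is a running total of the frequency column, so it can be rebuilt from the frequencies alone. A minimal Python sketch using the frequencies of the goals example (the variable names are illustrative):

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the goals example.
upper_boundaries = [10, 15, 20, 25, 30, 35, 40]
frequencies = [6, 8, 13, 9, 7, 5, 2]

cumulative = list(accumulate(frequencies))        # running total of the frequencies
points = list(zip(upper_boundaries, cumulative))  # (upper boundary ; cumulative frequency)
points.insert(0, (5, 0))                          # ground the ogive at (lower boundary ; 0)

print(cumulative)
print(points)
```

The last cumulative frequency must equal the total number of observations, which is a useful check on the table.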

How to find quartiles and percentiles graphically
1. Calculate the position (a value on the y-axis).
2. Find this position on the cumulative frequency axis.
3. Draw a dashed line towards the ogive.
4. Draw a dashed line from the ogive towards the $x$-axis.
5. Read the value off the $x$-axis.

[Figure: ogive titled "Goals scored in 2019". Cumulative frequency (0 to 60) is plotted against amount of goals (0 to 50); the curve starts at (5 ; 0) and passes through the points A to G.]
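Reading a value off the ogive can be imitated numerically. The sketch below makes two assumptions that are not in the notes: neighbouring points are joined by straight segments (a hand-drawn curve will give slightly different readings), and the positions used are $\frac{n}{2}$ and $\frac{3n}{4}$ on the cumulative frequency axis. The function name `read_from_ogive` is hypothetical.

```python
from itertools import accumulate

# Ogive points built from the goals example, grounded at (5 ; 0).
upper_boundaries = [10, 15, 20, 25, 30, 35, 40]
frequencies = [6, 8, 13, 9, 7, 5, 2]
points = [(5, 0)] + list(zip(upper_boundaries, accumulate(frequencies)))

def read_from_ogive(points, position):
    """Return the x-value where the ogive reaches the given cumulative frequency.

    Assumption: neighbouring points are joined by straight segments.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= position <= c1:
            # Move horizontally to the segment, then drop down to the x-axis.
            return x0 + (position - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("position lies outside the ogive")

n = points[-1][1]                                   # total number of observations
median_estimate = read_from_ogive(points, n / 2)
q3_estimate = read_from_ogive(points, 3 * n / 4)

print(round(median_estimate, 2), round(q3_estimate, 2))
```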
Standard deviation and Variance
Standard deviation ($\sigma$) is a quantity expressing by how much the members of a group differ from the
mean value for the group. The bigger the value of the standard deviation, the further the data lies
from the mean ($\bar{x}$). In that case the mean is not the best measure of central tendency, and the
median will be a better option. The smaller the value of the standard deviation, the closer the data
lies to the mean. In that case the mean is a reliable measure of central tendency and can be used
and trusted. The variance is the standard deviation squared ($\sigma^2$). The interval for data within $k$
standard deviations of the mean is ($\bar{x} - k\sigma$ ; $\bar{x} + k\sigma$).
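The standard deviation, the variance and the interval around the mean can be sketched in Python. The data set is hypothetical, and the sketch assumes $\sigma$ means the population standard deviation, which is what `statistics.pstdev` computes.

```python
from statistics import mean, pstdev

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = mean(data)
sigma = pstdev(data)       # population standard deviation (assumed meaning of sigma)
variance = sigma ** 2

k = 1                      # number of standard deviations from the mean
lower, upper = x_bar - k * sigma, x_bar + k * sigma

# Values that lie strictly inside the interval (x_bar - k*sigma ; x_bar + k*sigma).
within = [x for x in data if lower < x < upper]

print(x_bar, sigma, variance, (lower, upper), within)
```

Changing `k` to 2 widens the interval to two standard deviations on either side of the mean.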



How to calculate standard deviation with your Casio calculator.
1. Press MODE and select STAT.
2. Select 1 – VAR.
3. Enter values. After entering each individual value press “=” before entering the next value.
4. After entering all the individual values press AC.
5. To find $\sigma$, press SHIFT then STAT (at 1).
6. Then press Var to find $\sigma$.

Grade 11 Revision Exercise


QUESTION 1

The table below shows the weight (to the nearest kilogram) of each of the 27 participants in a weight loss
program.

56 68 69 71 71 72 82 84 85

88 89 90 92 93 94 96 97 99

102 103 127 128 134 135 137 144 156


1.1 Calculate the range of the data. (2)

1.2 Write down the mode of the data. (1)
1.3 Determine the median of the data. (1)
1.4 Determine the interquartile range of the data. (3)
1.5 Use the number line given in the ANSWERBOOK to draw a box-and-whisker diagram of the given data. (2)
1.6 Determine the standard deviation of the data. (2)
1.7 The person weighing 127 kg claims she weighs more than one standard deviation above the mean. Do you agree with this person? Use calculations to motivate your answer. (3)
[14]
ANSWERBOOK



QUESTION 2
The table shows the total weight (in grams) that each of the 27 participants lost on the weight
loss program over a 4-week period.

WEIGHT LOSS IN 4 WEEKS (IN GRAMS)    FREQUENCY
1 000 < x ≤ 1 500                    2
1 500 < x ≤ 2 000                    3
2 000 < x ≤ 2 500                    3
2 500 < x ≤ 3 000                    4
3 000 < x ≤ 3 500                    5
3 500 < x ≤ 4 000                    7
4 000 < x ≤ 4 500                    2
4 500 < x ≤ 5 000                    1

2.1 Sketch the ogive on the grid supplied. (4)


2.2 The weight loss program guarantees a weight loss of 800 g per week if the person doesn’t cheat and follows the program. Hence, determine the number of participants who had an average weight loss of 800 g or more per week over the 4 weeks. (2)
[8]
ANSWERBOOK



2.3 Grade 12
Scatterplots
A scatterplot is a graph used to determine whether there is a relationship between paired data. A scatter
plot diagram is a powerful tool for researchers to determine if there is any association between two
variables. The data on the scatterplot could follow the following trends: linear, quadratic or
exponential.
[Figure: three example scatterplots showing a linear, a quadratic and an exponential trend.]

Line of best fit
Refers to a line through a scatter plot of data points that best represents the trend of the data. The line of best fit is not that accurate, but it helps us to see if there is a trend in the data set.

How to draw a line of best fit:
Try to have the same number of data points above and below the line.



Regression line
The regression line is basically an accurate line of best fit. It uses the least squares method to calculate
the gradient and y-intercept of the regression line.
Standard form of regression line: $y = A + Bx$
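What the least squares method computes can be sketched in Python. The formulas used here are the standard least squares results, $B = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ and $A = \bar{y} - B\bar{x}$; they are not stated in the notes, and the data and the function name `least_squares` are hypothetical.

```python
def least_squares(xs, ys):
    """Return (A, B) for the regression line y = A + Bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx           # gradient
    a = y_bar - b * x_bar     # y-intercept
    return a, b

# Hypothetical paired data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)

print("A =", round(a, 4), "B =", round(b, 4))
# The regression line always passes through the mean point.
print("value of the line at the mean of x:", round(a + b * x_bar, 4), "mean of y:", y_bar)
```

Because $A = \bar{y} - B\bar{x}$, the line passes through the mean point ($\bar{x}$ ; $\bar{y}$), which is why the mean point can be used when drawing the line.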
How to determine the equation of the regression line with a Casio (fx-82ZA) calculator:
1. Press MODE and select STAT.
2. Select $A + BX$.
3. Enter the data points.
4. Column (X): press = after each data point.
5. Column (Y): press = after each data point.
6. After entering all the data points press AC.
7. Press SHIFT then press 1.
8. Press 5: Reg
9. Press 1 then =: to find $A$
10. Press SHIFT then press 1.
11. Press 5: Reg
12. Press 2 then =: to find $B$

How to draw the regression line:
If your scale starts at the origin (0 ; 0):
• Plot the $A$ value of the regression line, i.e. the y-intercept.
• Determine the mean point ($\bar{x}$ ; $\bar{y}$) and plot the mean point.
• Draw a line from point $A$ through the mean point.

How to draw a regression line if the scale of the graph does not start at (0 ; 0):
In order to draw the regression line, substitute any two x-values that lie between the minimum and maximum x-values into the equation of the regression line, plot the two points and then join them up.



How to determine the mean point using a Casio fx-82ZA calculator:
● Make sure your data is entered in your calculator.
● Press SHIFT then press 1.
● Press 4: Var
● Press 2 then =: $\bar{x}$
● Press SHIFT then press 1.
● Press 4: Var
● Press 5 then =: $\bar{y}$

Outliers
A value that "lies outside" (is much smaller or larger than) most of the other values in a set of data. Do not add an outlier to the dataset when drawing a line of best fit or calculating the regression line. The reason for this is that the outlier is not part of the trend and will influence the trend lines and hence any future predictions from these trend lines.
Correlation
The correlation coefficient is a statistical measure of the strength of the relationship between two
variables. It is denoted by $r$ and lies between −1 and 1. The closer the data points are to the
regression line, the stronger the relationship and the closer $r$ is to 1 or −1. The further the data
points are from the regression line, the weaker the relationship and the closer $r$ is to 0. If the
gradient of the regression line is positive, the data set has a positive correlation; if the gradient is
negative, the data has a negative correlation. If $r$ is greater than 0,9 we say there is a very strong
positive correlation. If $r$ is smaller than −0,9 we say there is a very strong negative correlation.
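The value of $r$ can be computed by hand or in code. The sketch below uses the standard Pearson formula $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$, which is not stated in the notes; the data sets and the function name `correlation` are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Return the correlation coefficient r of paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data sets, chosen only for illustration.
r_scattered = correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # points spread around the line
r_perfect_up = correlation([1, 2, 3], [2, 4, 6])              # all points on a rising line
r_perfect_down = correlation([1, 2, 3], [6, 4, 2])            # all points on a falling line

print(round(r_scattered, 4), round(r_perfect_up, 4), round(r_perfect_down, 4))
```

The sign of $r$ always matches the sign of the gradient of the regression line, since both share the numerator $\sum (x - \bar{x})(y - \bar{y})$.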

[Figure: five example scatterplots showing perfect positive linear association, very strong positive linear association, perfect negative association, very strong negative association, and no correlation.]

How to calculate the correlation coefficient using your calculator:
● Make sure your data is entered correctly in your calculator. See page 2.
● Press SHIFT then press 1.
● Press the number next to “Reg”.
● Press the number next to $r$ then press =.



Examination Questions

QUESTION 3
The table below shows the monthly income (in rands) of 6 different people and the amount (in rands)
that each person spends on the monthly repayment of a motor vehicle.

MONTHLY INCOME (IN RANDS)       9 000    13 500    15 000    16 500    17 000    20 000
MONTHLY REPAYMENT (IN RANDS)    2 000    3 000     3 500     5 200     5 500     6 000

3.1 Determine the equation of the least squares regression line for the data. (3)

3.2 If a person earns R14 000 per month, predict the monthly repayment that the person could make towards a motor vehicle. (2)
3.3 Determine the correlation coefficient between the monthly income and the monthly repayment of a motor vehicle. (1)

3.4 A person who earns R18 000 per month has to decide whether to spend R9 000 as a monthly
repayment of a motor vehicle, or not. If the above information is a true representation of the
population data, which of the following would the person most likely decide on:
A. Spend R9 000 per month because there is a very strong positive correlation between the amount earned and the monthly repayment.
B. NOT to spend R9 000 per month because there is a very weak positive correlation between the amount earned and the monthly repayment.
C. Spend R9 000 per month because the point (18 000 ; 9 000) lies very near to the least squares regression line.
D. NOT to spend R9 000 per month because the point (18 000 ; 9 000) lies very far from the least squares regression line. (2)



QUESTION 4
A survey was conducted among 100 people about the amount that they paid on a monthly
basis for their cellphone contracts. The person carrying out the survey calculated the
estimated mean to be R309 per month. Unfortunately, he lost some of the data thereafter.
The partial results of the survey are shown in the frequency table below:

AMOUNT PAID (IN RANDS) FREQUENCY

0 < x ≤ 100 7

100 < x ≤ 200 12

200 < x ≤ 300 a

300 < x ≤ 400 35

400 < x ≤ 500 b

500 < x ≤ 600 6

4.1 How many people paid R200 or less on their monthly cellphone contracts? (1)

4.2 Use the information above to show that a = 24 and b = 16. (5)

4.3 Write down the modal class for the data. (1)

4.4 On the grid provided in the ANSWER BOOK, draw an ogive (cumulative frequency graph) to represent the data. (4)

4.5 Determine how many people paid more than R420 per month for their cellphone contracts. (4)



QUESTION 5
5.1 The cumulative frequency graph (ogive) drawn below shows the total number of food items
ordered from a menu over a period of 1 hour.

5.1.1 Write down the total number of food items ordered from the menu during this hour. (1)
5.1.2 Write down the modal class of the data (1)
5.1.3 How long did it take to order the first 30 food items? (1)
5.1.4 How many food items were ordered in the last 15 minutes? (2)
5.1.5 Determine the 75th percentile for the data. (2)
5.1.6 Calculate the interquartile range of the data. (2)



5.2 Reggie works part-time as a waiter at a local restaurant. The amount of money (in
rands) he made in tips over a 15-day period is given below.
35 70 75 80 80

90 100 100 105 105

110 110 115 120 125

5.2.1 Calculate:
(a) The mean of the data (2)
(b) The standard deviation of the data (2)
5.2.2 Mary also works part-time as a waitress at the same restaurant. Over the same
15-day period Mary collected the same mean amount in tips as Reggie, but her
standard deviation was R14.
Using the available information, comment on the:
(a) Total amount in tips that they EACH collected over the 15-day period. (1)
(b) Variation that EACH of them received in daily tips over this period. (1)
[15]



QUESTION 6
A familiar question among professional tennis players is whether the speed of a tennis serve (in
km/h) depends on the height of a player (in metres). The heights of 21 tennis players and the
average speed of their serves were recorded during a tournament. The data is represented in the
scatter plot below. The least squares regression line is also drawn.

[Figure: SCATTER PLOT of average serve speed (in km/h), axis from 205 to 255, against height of a player (in metres), axis from 1,8 to 2,1, with the least squares regression line drawn.]

6.1 Write down the fastest average serve speed (in km/h) achieved in this tournament. (1)
6.2 Consider the following correlation coefficients:
A. r = 0,93 B. r = –0,42 C. r = 0,52
6.2.1 Which ONE of the given correlation coefficients best fits the plotted data? (1)
6.2.2 Use the scatter plot and least squares regression line to motivate your answer to QUESTION 6.2.1. (1)
6.3 What does the data suggest about the speed of a tennis serve (in km/h) and the height of a player (in metres)? (1)
6.4 The equation of the regression line is given as $\hat{y} = 27,07 + bx$. Explain why, in this context, the least squares regression line CANNOT intersect the y-axis at (0 ; 27,07). (1)



QUESTION 7
The table below shows the time (in seconds, rounded to ONE decimal place) taken by 12 athletes to
run the 100-metre sprint and the distance (in metres, rounded to ONE decimal place) of their best
long jump.

The scatter plot representing the data above is given below.

The equation of the least squares regression line is $\hat{y} = a + bx$.

7.1 Determine the values of $a$ and $b$. (3)
7.2 An athlete runs the 100-metre sprint in 11,7 seconds. Use $\hat{y} = a + bx$ to predict the distance of the best long jump of this athlete. (2)
7.3 Another athlete completes the 100-metre sprint in 12,3 seconds and the distance of his best long jump is 7,6 metres. If this is included in the data, will the gradient of the least squares regression line increase or decrease? Motivate your answer without any further calculations. (2)
[7]



QUESTION 8
In an experiment, a group of 23 girls were presented with a page containing 30 coloured rectangles.
They were asked to name the colours of the rectangles correctly as quickly as possible. The time, in
seconds, taken by each of the girls is given in the table below.

8.1 Calculate:
8.1.1 The mean of the data (2)
8.1.2 The interquartile range of the data (3)
8.2 The standard deviation of the times taken by the girls is 5,94. How many girls took longer than ONE standard deviation from the mean to name the colours? (2)
8.3 Draw a box and whisker diagram to represent the data on the number line provided in the ANSWER BOOK. (3)
8.4 The five-number summary of the times taken by a group of 23 boys in naming the colours of the rectangles correctly is (15 ; 21 ; 23,5 ; 26 ; 38).
8.4.1 Which of the two groups, girls or boys, had the lower median time to correctly name the colours of the rectangles? (1)
8.4.2 The first three learners who named the colours of all 30 rectangles correctly in the shortest time will receive a prize. How many boys will be among these three prize winners? Motivate your answer. (2)
[13]
