L1 Representations of Data

Graham is researching the effects of a high protein diet on glucose levels in adults aged 25-35. He decides to collect blood samples from 50 females and 50 males using a sampling technique. This technique allows Graham to generalize his findings to the overall population being studied. However, it provides only a snapshot of data and may miss important subgroups. Graham also plans to estimate the mean and standard deviation of the collected data.

Uploaded by

teeformee

Representations of Data

Date: 08/11/2023
1) Graham is researching the effects a high-protein diet has on the
glucose level of adults aged 25 to 35. He decides to collect blood
samples from 50 females and 50 males.

a) State the sampling technique Graham has used.


b) Give two advantages and one disadvantage of this sampling
technique.

2) Estimate the mean and standard deviation of the data above.


Outliers
An outlier is an extreme value which lies outside the overall pattern of the data.
There are different ways to calculate outliers, but a common definition of an outlier is any value that is:

Greater than $Q_3 + k(Q_3 - Q_1)$
OR
Less than $Q_1 - k(Q_3 - Q_1)$

Here $Q_1$ is the lower quartile, $Q_3$ is the upper quartile, and $Q_3 - Q_1$ is the interquartile range, which is multiplied by a constant $k$.
In the exam, you will be told what value of $k$ to use.
3A
Outliers
Example 1
The blood glucose level of 30 females is recorded. The results, in mmol/litre, are shown below:

1.7 2.2 2.3 2.3 2.5 2.7
3.1 3.2 3.6 3.7 3.7 3.7
3.8 3.8 3.8 3.8 3.9 3.9
3.9 4.0 4.0 4.0 4.0 4.4
4.5 4.6 4.7 4.8 5.0 5.1

An outlier is an observation that falls either $1.5 \times$ interquartile range above $Q_3$, or $1.5 \times$ interquartile range below $Q_1$ (so $k = 1.5$).
Find any outliers.

For $Q_1$: $\frac{n}{4} = \frac{30}{4} = 7.5$, so take the 8th value
$Q_1 = 3.2$
For $Q_3$: $\frac{3n}{4} = \frac{90}{4} = 22.5$, so take the 23rd value
$Q_3 = 4.0$

Greater than $Q_3 + k(Q_3 - Q_1) = 4.0 + 1.5(4.0 - 3.2) = 5.2$
No outliers greater than 5.2
Less than $Q_1 - k(Q_3 - Q_1) = 3.2 - 1.5(4.0 - 3.2) = 2$
One outlier less than 2 (the value 1.7)
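
The quartile rule and outlier boundaries from Example 1 can be sketched in Python. This is only an illustrative sketch, not part of the original slides: the function names `quartile` and `find_outliers` are hypothetical, and the whole-number case (taking the midpoint of two neighbouring values) is an assumed convention that Example 1 does not exercise.

```python
def quartile(sorted_values, numerator, denominator=4):
    """Value at position numerator * n / denominator, as in Example 1.

    A fractional position is rounded up (7.5 -> 8th value). For a whole
    position this sketch ASSUMES the midpoint of that value and the next.
    """
    whole, remainder = divmod(numerator * len(sorted_values), denominator)
    if remainder:
        return sorted_values[whole]  # 0-based index of the rounded-up position
    return (sorted_values[whole - 1] + sorted_values[whole]) / 2


def find_outliers(values, k=1.5):
    """Return (lower boundary, upper boundary, outliers) using Q1 - k*IQR and Q3 + k*IQR."""
    data = sorted(values)
    q1, q3 = quartile(data, 1), quartile(data, 3)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [x for x in data if x < lower or x > upper]


# Blood glucose levels (mmol/litre) of the 30 females
glucose = [
    1.7, 2.2, 2.3, 2.3, 2.5, 2.7,
    3.1, 3.2, 3.6, 3.7, 3.7, 3.7,
    3.8, 3.8, 3.8, 3.8, 3.9, 3.9,
    3.9, 4.0, 4.0, 4.0, 4.0, 4.4,
    4.5, 4.6, 4.7, 4.8, 5.0, 5.1,
]
lower, upper, outliers = find_outliers(glucose)
print(round(lower, 2), round(upper, 2), outliers)
```

The boundaries are rounded only for display, because floating-point arithmetic can leave them a tiny fraction away from 2 and 5.2.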
Outliers
Example 2
The lengths, in cm, of 12 giant African land snails are given below:

a) Calculate the mean and standard deviation, given that $\sum x = 252$ and $\sum x^2 = 5468$.

MEAN
$\bar{x} = \frac{\sum x}{n}$
Sub in values
$\bar{x} = \frac{252}{12}$
Calculate
$\bar{x} = 21$ cm

STANDARD DEVIATION
$\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$
Sub in values
$\sigma = \sqrt{\frac{5468}{12} - \left(\frac{252}{12}\right)^2}$
Calculate
$\sigma = 3.83$ cm

b) An outlier is an observation which lies $\pm 2$ standard deviations from the mean. Identify any outliers for this data.

Greater than $\bar{x} + 2\sigma = 21 + 2(3.83) = 28.66$
One outlier greater than 28.66
Less than $\bar{x} - 2\sigma = 21 - 2(3.83) = 13.34$
No outliers less than 13.34
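
The calculation in Example 2 can be checked with a short Python sketch. It is illustrative only: the function name `mean_and_sd` is hypothetical, and it works from the summary totals quoted in the example (sum of x = 252, sum of x squared = 5468, n = 12) rather than from the individual snail lengths.

```python
import math


def mean_and_sd(sum_x, sum_x2, n):
    """Mean and standard deviation from the totals sum(x) and sum(x^2)."""
    mean = sum_x / n
    variance = sum_x2 / n - mean ** 2  # mean of the squares minus square of the mean
    return mean, math.sqrt(variance)


mean, sd = mean_and_sd(252, 5468, 12)

# Outlier boundaries: 2 standard deviations either side of the mean
lower = mean - 2 * sd
upper = mean + 2 * sd

print(f"mean = {mean:.0f} cm, standard deviation = {sd:.2f} cm")
print(f"outliers lie below {lower:.2f} cm or above {upper:.2f} cm")
```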
Outliers
The process of finding and removing outliers/anomalies is known as ‘cleaning the data’.
Sometimes anomalies are legitimate data, or they could be the result of an experimental error.
Page 43
Exercise 3A
Box Plots
Box plots can be drawn summarizing the quartiles and the lowest/greatest included values, as well as any outliers.
They are especially useful for comparing data sets.
3B
Box Plots
[Figure: a labelled box plot on a scale from 10 to 80, marking an outlier, the smallest value, lower quartile, median, upper quartile and largest value, with each of the four sections labelled 25%]
• Any outliers are plotted as crosses outside the main plot
• Each ‘section’ contains 25% of the observations in the sample
Box Plots
In section A, we saw the following data for blood glucose levels in 30 females:

1.7 2.2 2.3 2.3 2.5 2.7
3.1 3.2 3.6 3.7 3.7 3.7
3.8 3.8 3.8 3.8 3.9 3.9
3.9 4.0 4.0 4.0 4.0 4.4
4.5 4.6 4.7 4.8 5.0 5.1

From that, we calculated the following:
$Q_1 = 3.2$, $Q_2 = 3.8$, $Q_3 = 4.0$
Outliers are less than 2 or greater than 5.2
Represent this on a box plot…

[Figure: box plot on an axis from 0 to 6, labelled "Blood glucose level (mmol/litre)"]
• Plot the smallest and largest values that are not outliers (2.2 and 5.1)
• Plot the 3 quartiles, and join to make the box shape
• Mark on any outliers…
3B
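
Choosing what to plot for the female data can be sketched in Python. This is an illustrative sketch only: the function name `box_plot_ends` is hypothetical, and the outlier boundaries of 2 and 5.2 are taken as given from the earlier calculation rather than recomputed.

```python
def box_plot_ends(values, lower_boundary, upper_boundary):
    """Split data into whisker ends (extreme non-outliers) and outliers."""
    data = sorted(values)
    inside = [x for x in data if lower_boundary <= x <= upper_boundary]
    outliers = [x for x in data if x < lower_boundary or x > upper_boundary]
    return inside[0], inside[-1], outliers


# Blood glucose levels (mmol/litre) of the 30 females
glucose = [
    1.7, 2.2, 2.3, 2.3, 2.5, 2.7,
    3.1, 3.2, 3.6, 3.7, 3.7, 3.7,
    3.8, 3.8, 3.8, 3.8, 3.9, 3.9,
    3.9, 4.0, 4.0, 4.0, 4.0, 4.4,
    4.5, 4.6, 4.7, 4.8, 5.0, 5.1,
]
whisker_low, whisker_high, crosses = box_plot_ends(glucose, 2, 5.2)
print(whisker_low, whisker_high, crosses)
```

The whiskers end at 2.2 and 5.1, and 1.7 is marked separately as a cross.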
Box Plots
Box plots can be used to represent key features of a data set
The blood glucose levels of 30 males are recorded. The results, in mmol/litre, are summarized below:

Lowest value = 1.4
Highest value = 5.2

An outlier falls either $1.5 \times$ interquartile range above $Q_3$, or $1.5 \times$ interquartile range below $Q_1$.
Given that there is only 1 outlier for males, plot this information on the same diagram as the females.

[Figure: box plots for Males and Females on a shared axis from 0 to 6, labelled "Blood glucose level (mmol/litre)"]
• By using the formulae in the question, outliers will be below 1.95 or above 6.35
• So there is one outlier, which is 1.4
• The lower and upper boundaries are plotted using the boundaries for the outliers (since we do not have the actual data values)
• Plot the rest as before…

3B
Box Plots
Box plots can be used to represent key features of a data set
Compare the blood glucose levels for males and females.

[Figure: box plots for Males and Females on a shared axis from 0 to 6, labelled "Blood glucose level (mmol/litre)"]
• When comparing data you must always compare a measure of location (average) and a measure of spread (a range)
• In this case it makes sense to use the median and interquartile range, as this is the information we have already…
• Ensure you compare using the context!

Based on this data, females have a lower median blood glucose level, and a smaller interquartile range. This means the levels are more consistent, and at a lower level than males.
3B
Page 45
Exercise 3B
Q4
Cumulative Frequency
You can use a cumulative frequency diagram to estimate the median and quartiles for data in groups

The data in the table shows the heights in metres, of 80 giraffes.

Frequency    Cumulative frequency (CF)
4            4
7            11
15           26
33           59
17           76
4            80

a) Draw a cumulative frequency diagram for the data
b) Using the cumulative frequency diagram, estimate the median and quartiles
c) Estimate the 90th percentile
d) Draw a box plot to represent the diagram

[Figure: cumulative frequency diagram, with Cumulative Frequency (0 to 80) plotted against Height (m) (4 to 6)]
• Start by plotting the lowest possible value at 0 (for the frequency)
• Then each CF value is plotted at the upper bound for its group
• For example, the 26 will be plotted at 5.2

The data is continuous, so use $\frac{n}{4}$, $\frac{n}{2}$ and $\frac{3n}{4}$ for the quartiles…
$Q_1 = 5.15$
$Q_2 = 5.3$
$Q_3 = 5.4$

The data is continuous, so use $\frac{90n}{100}$ for the 90th percentile
$\frac{90(80)}{100} = 72$nd value
$P_{90} = 5.55$

[Figure: box plot drawn on the same Height (m) axis from 4 to 6, using $Q_1 = 5.15$, $Q_2 = 5.3$ and $Q_3 = 5.4$]
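
The running totals and the positions to read off the diagram can be sketched in Python. This is a minimal sketch using only the frequencies from the table; the variable names are hypothetical, and the heights themselves still have to be read from the cumulative frequency diagram.

```python
from itertools import accumulate

# Frequencies for the six height groups of the 80 giraffes
frequencies = [4, 7, 15, 33, 17, 4]

# Each cumulative frequency is the running total so far
cumulative = list(accumulate(frequencies))
n = cumulative[-1]

# Positions on the cumulative frequency axis for continuous data
positions = {
    "Q1": n / 4,
    "Q2": n / 2,
    "Q3": 3 * n / 4,
    "P90": 90 * n / 100,
}

print(cumulative)
print(positions)
```

Each position is then located on the vertical axis and traced across to the curve and down to the Height axis.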
3C
Page 48
Exercise 3C
Q4
Histograms
A Histogram is often used to represent grouped continuous data
In a Histogram, the area of each bar is proportional to the frequency for each group.
• The following formula is usually used to find the area: $\text{Area} = k \times \text{frequency}$
• If $k = 1$ then the area is equal to the frequency (this is usually the case at GCSE)
• At A-level it could be that the areas are half the frequencies, or 1/3 of the frequencies etc…

When drawing a Histogram, you can calculate the frequency density by using the formula:
$\text{Frequency density} = \frac{\text{frequency}}{\text{class width}}$
3D
Histograms
A Histogram is often used to represent grouped continuous data
A random sample of 200 students was asked how long it took them to complete their homework the previous night. The time was recorded and summarised in the table below.

Time, t (mins)    Frequency    Frequency density
25 to 30          55           11
30 to 35          39           7.8
35 to 40          68           13.6
40 to 50          32           3.2
50 to 80          6            0.2

a) Draw a Histogram and frequency polygon for this data
b) Estimate how many students took between 36 and 45 minutes to complete their homework

[Figure: Histogram of Frequency Density (0 to 15) against Time (mins) (20 to 80), with a frequency polygon drawn over it]
• There should be no gaps between bars
• The widths will differ depending on the groups
• To draw a frequency polygon, join the middle of the top of each bar…
• Calculate the area between 36 and 45 (2 rectangles)

$4 \times 13.6 + 5 \times 3.2 = 70.4$
So approximately 70 students
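
The frequency density and area calculations for the homework data can be sketched in Python. This is an illustrative sketch with stated assumptions: the class boundaries (25, 30, 35, 40, 50, 80) are inferred from the frequencies, the frequency densities and the working for part b, and the function names are hypothetical.

```python
# (lower bound, upper bound, frequency); the bounds are ASSUMED, inferred
# from frequency / frequency density and from the working in part b
classes = [(25, 30, 55), (30, 35, 39), (35, 40, 68), (40, 50, 32), (50, 80, 6)]


def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)


def estimate_between(classes, start, end):
    """Estimate a frequency as the histogram area between start and end."""
    total = 0.0
    for lower, upper, frequency in classes:
        overlap = min(end, upper) - max(start, lower)
        if overlap > 0:
            total += overlap * frequency_density(lower, upper, frequency)
    return total


densities = [frequency_density(*c) for c in classes]
estimate = estimate_between(classes, 36, 45)
print([round(d, 1) for d in densities])
print(round(estimate, 1))
```

The estimate adds 4 minutes of the 13.6 bar to 5 minutes of the 3.2 bar, matching the two rectangles in part b.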
3D
Histograms
A Histogram is often used to represent grouped continuous data
A random sample of daily mean temperatures was taken from the large data set for Hurn in 2015. The temperatures were summarised in a grouped frequency table and represented by a Histogram.

a) Give a reason to support the use of a Histogram to represent this data
Since temperature is continuous, and the data is already in groups, a Histogram is appropriate

b) Write down the underlying feature associated with each of the bars in a Histogram
The area of each bar is proportional to the frequency

3D
Histograms
A Histogram is often used to represent grouped continuous data
On the Histogram, the rectangle representing the class was 3.2cm high and 2cm wide. The frequency for this class was 8.

c) Show that each day is represented by an area of $0.8 \text{ cm}^2$
$\text{Area} = 3.2 \text{ cm} \times 2 \text{ cm} = 6.4 \text{ cm}^2$
You could write the area and frequency as a ratio (or use the formula relating area to frequency)
$8 \text{ days} = 6.4 \text{ cm}^2$
Divide by 8
$1 \text{ day} = 0.8 \text{ cm}^2$

d) Given that the total area of the Histogram was $48 \text{ cm}^2$, find the total number of days in the sample.
$1 \text{ day} = 0.8 \text{ cm}^2$
Multiply by 60 (you can find this value using your calculator)
$60 \text{ days} = 48 \text{ cm}^2$
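
The ratio argument in parts c and d can be sketched in Python. This is an illustrative sketch only; the variable names are hypothetical, and `Fraction` is used so that the decimals 3.2 and 0.8 are handled exactly rather than as floating-point approximations.

```python
from fractions import Fraction

height_cm = Fraction("3.2")  # height of the rectangle
width_cm = Fraction(2)       # width of the rectangle
frequency_days = 8           # frequency for this class

# Part c: area that represents one day
area_per_day = height_cm * width_cm / frequency_days

# Part d: number of days represented by a total area of 48 cm^2
total_days = Fraction(48) / area_per_day

print(float(area_per_day), int(total_days))
```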

3D
Page 50
Exercise 3D
Q1 and Q2
Comparing Data
You need to be able to compare multiple data sets by using both measures of location and measures of spread
• You should usually use the mean and standard deviation, or the median and interquartile range
• If the data set contains extreme values, it is better to use the latter…

3E
Comparing Data
You need to be able to compare multiple data sets by using both measures of location and measures of spread
From the large data set, the daily mean temperature during August 2015 is recorded at Heathrow and Leeming.
For Heathrow, $\sum x = 562.0$ and $\sum x^2 = 10301.2$.

a) Calculate the mean and standard deviation for Heathrow

$\bar{x} = \frac{\sum x}{n}$
Sub in values (there are 31 days in August)
$\bar{x} = \frac{562.0}{31}$
Calculate
$\bar{x} = 18.1$ ℃

$\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$
Sub in values
$\sigma = \sqrt{\frac{10301.2}{31} - \left(\frac{562.0}{31}\right)^2}$
Calculate
$\sigma = 1.91$ ℃

3E
Comparing Data
You need to be able to compare multiple data sets by using both measures of location and measures of spread
From the large data set, the daily mean temperature during August 2015 is recorded at Heathrow and Leeming.

For Heathrow: $\bar{x} = 18.1$ ℃, $\sigma = 1.91$ ℃
For Leeming, the mean temperature was 15.6°C with a standard deviation of 2.01°C.

b) Compare the data for the two locations using the information given

The mean daily temperature in Leeming is lower than in Heathrow, but Leeming has a greater spread of temperatures.
(remember you must always compare a measure of location as well as a measure of spread)
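
The Heathrow figures and the comparison with Leeming can be sketched in Python. This is an illustrative sketch, not the method prescribed by the slides: the function name `mean_and_sd` is hypothetical, the Heathrow totals (562.0 and 10301.2 over 31 days) come from part a, and the Leeming values are the ones quoted in the question.

```python
import math


def mean_and_sd(sum_x, sum_x2, n):
    """Mean and standard deviation from the totals sum(x) and sum(x^2)."""
    mean = sum_x / n
    return mean, math.sqrt(sum_x2 / n - mean ** 2)


heathrow_mean, heathrow_sd = mean_and_sd(562.0, 10301.2, 31)
leeming_mean, leeming_sd = 15.6, 2.01  # given in the question

# Compare one measure of location and one measure of spread
location = "lower" if leeming_mean < heathrow_mean else "higher"
spread = "greater" if leeming_sd > heathrow_sd else "smaller"

print(f"Heathrow: mean = {heathrow_mean:.1f}, sd = {heathrow_sd:.2f}")
print(f"Leeming has a {location} mean and a {spread} spread than Heathrow")
```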

3E
Page 53
Exercise 3E
Q1, Q2 and Q3
HOMEWORK
