
Greenwood High School 2021 - 2022 Mathematics - Project 2: Aarav Batra Grade 9, B

The student conducted a survey of their classmates to collect data on height, weight, and family size. They collected data from 28 classmates and recorded it. They then arranged the raw data in ascending order and created frequency tables to calculate the mean, median, and mode. The mean was calculated by summing the product of each value and its frequency and dividing by the total number of values. The median was found by taking the mean of the 14th and 15th values when the data is arranged in order. Frequency tables were used to represent the data in a way that makes analyzing and comparing the values easier.


Greenwood High School

2021 – 2022
Mathematics – Project 2
Statistical Representation of Surveyed Data

~ Aarav Batra
Grade 9th, B.

Internal Examiner External Examiner


Acknowledgement

I sincerely wish to express my deep gratitude and heartfelt
thanks to my Maths teacher for her encouragement and for all the
facilities and time she provided for this project work. I would
also like to thank my principal, Mr. Aloysius d'mello, for giving
me the opportunity to conduct a survey in my class and use the
data obtained to create a project.

This project work has not only been done for grades; it has
also given me skills, as I did a lot of research. It taught me the
real meaning of hard work. I have gained a lot of knowledge and
have become more aware of mathematical concepts.

I would also like to thank my classmates, who helped me collect
data in this survey by providing input, and my parents for
giving their views on my project.
Index

Objectives
Problem Description
Introduction
Procedure
Observation
Conclusion
Further Study
Bibliography
Objectives

• To collect data from a survey
• To understand the steps needed to represent data in a
way that makes it easy to understand
• To examine the raw data, arrange it in ascending order,
tabulate it, and make graphs / statistical charts
• To use statistical tools to find the three centers / middle
points of the data, namely mean, median & mode, each of which
summarizes the entire data in a single value
• To represent the data in a way that makes it easy to see
all of it at once, instead of looking at each observation
separately, and that makes comparing the data easier.
Problem Description

It is not practical to view and analyze data one observation at a
time, especially when the amount of data is very large. Comparing
all observations in a data set with each other is both time
consuming and ineffective. Presenting each comparison without a
graph or chart also makes it hard to observe and understand.
In this project we will look at various statistical tools that
make understanding and representing large amounts of data much
easier. Using these tools, we can obtain the middle values of the
entire data, and we can also put the data into charts/graphs that
make comparing and analyzing it much easier.
Introduction
In this project I will collect data from my classmates regarding their height, weight and
family size by conducting a survey and will be showing the steps to arrange, tabulate
and graphically represent this raw data.

Now let’s look at some statistical tools I will be using further in this project:
o Arrangement: Arranging the raw data in ascending or descending order
makes it more organized.
o Tabulation: Tabulating all the data (i.e. inputting all values into a table) makes
performing tasks and calculations easier.
o Central Tendencies: In statistics, a single value can be used to represent an entire
set of data; such a value is known as a central tendency. Mean, median and mode are
central tendencies of data calculated by different methods. The range, listed
with them below, measures the spread of the data rather than its center. Here are
the methods to calculate all of these:

▪ Mean: The mean of a data set is also known as the average of all terms. To calculate
it, we add up the values of all the terms and then divide by the number of terms.
▪ Median: The median of a distribution is the middle term of the data arranged in
ascending order.

i. If the number of terms (n) is odd, then the central value is the
median. To find it, we divide the successor of the number of terms by two,
(n+1)/2, and the term at that place is the median.
ii. If the number of terms is even, then we divide it by 2 and find the
mean of the two terms at that place when counted from both ends.
Example:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
n = 10
10/2 = 5
Median = mean of the 5th terms counted from both ends
= (5+6)/2
= 11/2
= 5.5
Therefore, the median is 5.5
▪ Mode: The mode of the data is the observation (value) that has the highest frequency
(occurs the highest number of times). A rough value for the mode can be calculated
as the difference between thrice the median and twice the mean. This is known
as the empirical formula; however, it gives only an approximate value of the mode
and not the mode itself.
▪ Range: The range of a distribution with a discrete random variable is the
difference between the maximum value and the minimum value.

o Class distribution: Creating a class distribution or frequency table (i.e. a
table that shows “classes” or “intervals” of data entries with a count of the
number of entries in each class) reduces the number of entries.
o Graphical representation: Representing data in a graph or chart makes it easy to
look at and compare. We will use 3 types of charts: bar graphs (to compare all
fields of data), histograms (to compare classes of data) and frequency polygons
(to compare classes of data).
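
The definitions of mean, median, mode and range given above can be sketched in Python. This is only an illustrative sketch: the function names are chosen here and are not part of the project, and the mode function returns every value that shares the highest frequency, since a data set can have more than one mode.

```python
from collections import Counter


def mean(values):
    """Sum of all terms divided by the number of terms."""
    return sum(values) / len(values)


def median(values):
    """Middle term of the sorted data (mean of the two middle terms if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]          # the ((n + 1) / 2)th term
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values that occur the highest number of times."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


def value_range(values):
    """Difference between the maximum and the minimum value."""
    return max(values) - min(values)


def empirical_mode(values):
    """Rough estimate of the mode: 3 * median - 2 * mean."""
    return 3 * median(values) - 2 * mean(values)


example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(median(example))  # 5.5, as in the worked example
```

The example list is the one used in the median example, so the printed value can be checked against the hand calculation.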
Procedure
Now that we have understood the statistical tools that can be used to make data
clear and understandable, let's apply those tools to our data and see in practice how
they help.

Raw data
Through my survey I have collected the weight, height and family size (number of
members in a family) of 28 of my classmates, which are presented below in a table.
Name                  Weight (kg)   Height (cm)   Family Members
Aarav Batra           50            165           4
Aarnav Verma          60            175           3
Aditi Sathish         50            167           4
Aishwarya Garine      72            156           5
Aman Thimmaiah        55            181           5
Arjun Jagannathan     50            155           3
Arnav Gupta           56            163           4
Charvi M              41            157           7
Deepthi Menon         51            162           4
Dishita Bajaj         70            170           4
Harshitha Reddy       55            160           6
Jivika Dialani        46            162           3
Maayan Hazra          51            174           3
Mahi Rajne            60            165           3
Manik Bhatia          55            168           4
Mihir Halapeth        50            164           4
Palak Suri            68            164           3
Rishika Reddy         51            165           4
Saketh Shuntipadi     45            164           3
Shlok Rajiv           69            170           6
Shourya Sinha         65            170           8
Shruti Vijay Kumar    62            167           5
Shubhra Chatterjee    47            162           5
Sohan Jasti           51            175           4
Sohan Shanbhag        47            163           4
Sonit Saraf           51            175           3
Tanush Bhaumik        58            157           4
Trisha Shub           53            168           4
Thus, here are the observations collected:
• Height- 165, 175, 167, 156, 181, 155, 163, 157, 162, 170, 160, 162, 174, 165,
168, 164, 164, 165, 164, 170, 170, 167, 162, 175, 163, 175, 157, 168.
• Weight- 50, 60, 50, 72, 55, 50, 56, 41, 51, 70, 55, 46, 51, 60, 55, 50, 68, 51,
45, 69, 65, 62, 47, 51, 47, 51, 58, 53.
• Family Size- 4, 3, 4, 5, 5, 3, 4, 7, 4, 4, 6, 3, 3, 3, 4, 4, 3, 4, 3, 6, 8, 5, 5, 4, 4, 3,
4, 4.
Tabulation of data and calculation of central tendencies
Step 1- First let’s arrange all the raw data in ascending order.
• Height- 155, 156, 157, 157, 160, 162, 162, 162, 163, 163, 164, 164, 164, 165, 165,
165, 167, 167, 168, 168, 170, 170, 170, 174, 175, 175, 175, 181.
• Weight- 41, 45, 46, 47, 47, 50, 50, 50, 50, 51, 51, 51, 51, 51, 53, 55, 55, 55, 56, 58,
60, 60, 62, 65, 68, 69, 70, 72.
• Family Size- 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 8.
Step 2- Creating frequency tables to find central tendencies.

Height
Xi     Fi    Fi Xi    Cumulative
155    1     155      1
156    1     156      2
157    2     314      4
160    1     160      5
162    3     486      8
163    2     326      10
164    3     492      13
165    3     495      16
167    2     334      18
168    2     336      20
170    3     510      23
174    1     174      24
175    3     525      27
181    1     181      28
∑Fi = 28     ∑Fi Xi = 4644

Mean = ∑Fi Xi / ∑Fi
=> 4644/28 = 165.86.
Median = mean of the (no. of terms/2)th terms from both sides.
=> 28/2 = 14, so the mean of the 14th and 15th terms.
=> (165+165)/2 = 165.
Mode = 162, 164, 165 and 170 (each occurs 3 times, so the data has four modes).
Using the empirical formula, mode = 3(165) - 2(165.857) = 163.29
Range = 181-155 = 26
Weight
Xi     Fi    Fi Xi    Cumulative
41     1     41       1
45     1     45       2
46     1     46       3
47     2     94       5
50     4     200      9
51     5     255      14
53     1     53       15
55     3     165      18
56     1     56       19
58     1     58       20
60     2     120      22
62     1     62       23
65     1     65       24
68     1     68       25
69     1     69       26
70     1     70       27
72     1     72       28
∑Fi = 28     ∑Fi Xi = 1539

Mean = ∑Fi Xi / ∑Fi
=> 1539/28 = 54.96
Median = mean of the (no. of terms/2)th terms from both sides.
=> 28/2 = 14, so the mean of the 14th and 15th terms.
=> (51+53)/2 = 52.
Mode = 51.
Using the empirical formula, mode = 3(52) - 2(54.964) = 46.07.
Range = 72-41 = 31

Family Size
Xi     Fi    Fi Xi    Cumulative
3      8     24       8
4      12    48       20
5      4     20       24
6      2     12       26
7      1     7        27
8      1     8        28
∑Fi = 28     ∑Fi Xi = 119

Mean = ∑Fi Xi / ∑Fi
=> 119/28 = 4.25
Median = mean of the (no. of terms/2)th terms from both sides.
=> 28/2 = 14, so the mean of the 14th and 15th terms.
=> (4+4)/2 = 4.
Mode = 4.
Using the empirical formula, mode = 3(4) - 2(4.25) = 3.5.
Range = 8-3 = 5

Class Distribution
We can also create a class distribution or frequency table and divide the data entries
into intervals and classes.
Height
Frequency Table
Class      Count    Class Mark
150-155    0        152.5
155-160    4        157.5
160-165    9        162.5
165-170    7        167.5
170-175    4        172.5
175-180    3        177.5
180-185    1        182.5
185-190    0        187.5
*We can use the class mark to find mean, median, mode & range.

Xi       Fi    Fi Xi     Cumulative
152.5    0     0         0
157.5    4     630       4
162.5    9     1462.5    13
167.5    7     1172.5    20
172.5    4     690       24
177.5    3     532.5     27
182.5    1     182.5     28
187.5    0     0         28
∑Fi = 28       ∑Fi Xi = 4670

Mean = ∑Fi Xi / ∑Fi
=> 4670/28 = 166.79
Median = mean of the (no. of terms/2)th terms from both sides.
=> 28/2 = 14, so the mean of the 14th and 15th terms.
=> (167.5+167.5)/2 = 167.5.
Mode = 162.5.
Using the empirical formula, mode = 3(167.5) - 2(166.79) = 168.92
Range = 182.5-157.5 = 25

Weight
Frequency Table
Class    Count    Class Mark
30-40    0        35
40-50    5        45
50-60    15       55
60-70    6        65
70-80    2        75
80-90    0        85

Xi    Fi    Fi Xi    Cumulative
35    0     0        0
45    5     225      5
55    15    825      20
65    6     390      26
75    2     150      28
85    0     0        28
∑Fi = 28    ∑Fi Xi = 1590

Mean = ∑Fi Xi / ∑Fi
=> 1590/28 = 56.79
Median = mean of the (no. of terms/2)th terms from both sides.
=> 28/2 = 14, so the mean of the 14th and 15th terms.
=> (55+55)/2 = 55.
Mode = 55.
Using the empirical formula, mode = 3(55) - 2(56.79) = 51.42
Range = 75-45 = 30
Family Size
Frequency Table
Class    Count    Class Mark
2-3      0        2.5
3-4      8        3.5
4-5      12       4.5
5-6      4        5.5
6-7      2        6.5
7-8      1        7.5
8-9      1        8.5
9-10     0        9.5

Xi     Fi    Fi Xi    Cumulative
2.5    0     0        0
3.5    8     28       8
4.5    12    54       20
5.5    4     22       24
6.5    2     13       26
7.5    1     7.5      27
8.5    1     8.5      28
9.5    0     0        28
∑Fi = 28     ∑Fi Xi = 133

Mean = ∑Fi Xi / ∑Fi
=> 133/28 = 4.75
Median = mean of the (no. of terms/2)th terms from both sides.
=> 28/2 = 14, so the mean of the 14th and 15th terms.
=> (4.5+4.5)/2 = 4.5
Mode = 4.5
Using the empirical formula, mode = 3(4.5) - 2(4.75) = 4.
Range = 8.5-3.5 = 5
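
Grouping observations into classes and working from the class marks can be sketched as follows, using the surveyed weights. The sketch assumes each class includes its lower limit and excludes its upper limit (so 50 falls in 50-60 and 60 falls in 60-70), which matches the counts in the weight table. The function names are hypothetical.

```python
# Weights (kg) as recorded in the raw data table.
weights = [50, 60, 50, 72, 55, 50, 56, 41, 51, 70, 55, 46, 51, 60,
           55, 50, 68, 51, 45, 69, 65, 62, 47, 51, 47, 51, 58, 53]


def class_distribution(values, start, width, n_classes):
    """Rows of (lower limit, upper limit, class mark, count).

    Assumption: a value v belongs to the class with lower <= v < upper.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} lies outside the chosen classes")
        counts[index] += 1
    rows = []
    for i, count in enumerate(counts):
        lower = start + i * width
        rows.append((lower, lower + width, lower + width / 2, count))
    return rows


def grouped_mean(rows):
    """Mean estimated from class marks: sum(Fi * Xi) / sum(Fi)."""
    total_f = sum(count for _, _, _, count in rows)
    total_fx = sum(mark * count for _, _, mark, count in rows)
    return total_fx / total_f


weight_classes = class_distribution(weights, start=30, width=10, n_classes=6)
for lower, upper, mark, count in weight_classes:
    print(f"{lower}-{upper}  count={count}  class mark={mark}")
print(round(grouped_mean(weight_classes), 2))
```

Because every entry in a class is replaced by its class mark, the grouped mean is an estimate and differs from the mean of the ungrouped data.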

Representation of data in graphical manner

Now let’s represent all the data in graphs.
Bar graphs.
Height
Statistics
Lowest Observation: 155
Highest Observation: 181
Total Number of Observations: 28
Number of Distinct Observations: 14

Weight
Statistics
Lowest Observation: 41
Highest Observation: 72
Total Number of Observations: 28
Number of Distinct Observations: 17

Family Size
Statistics
Lowest Observation: 3
Highest Observation: 8
Total Number of Observations: 28
Number of Distinct Observations: 6
Histograms and frequency polygons.

*We add one class after the highest class and before the lowest class to make the graph

Height
Lowest Class Value: 150
Highest Class Value: 184
Number of Classes: 7
Class Width: 5

Weight
Lowest Class Value: 30
Highest Class Value: 99
Number of Classes: 7
Class Width: 10

Family Size
Lowest Class Value: 2
Highest Class Value: 9
Number of Classes: 8
Class Width: 1
Observation
From this survey I was exposed to the difficulty of
using large amounts of data and understood the
importance of representing data in a neat format.
I understood how finding the central tendencies of data
is helpful and how distributing data into classes
reduces the number of entries to work with.
I observed that using these statistical tools makes the
data much more usable and easier to understand.
Without these tools and methods, it would have been
very hard to make any use of raw and unorganized
data, which turns into a mass of jumbled numbers
as more data is added.
Conclusion
Hence, I infer that statistics is very important in daily
life.
Statistics is a set of mathematical methods used to
examine data. It keeps us up to date with what's going on
around the globe. Statistics is essential because we live in
an information age, and most of what we know is based on
numbers. It entails being aware of the need for
accurate data and understanding statistical principles.
Statistical expertise aids in the collection of data, the
application of reliable analyses, and the efficient
presentation of results. Statistics is an important part of
how we make scientific discoveries, make data-driven
decisions, and make forecasts.
Probably no government, organization, or research project
can run efficiently without statistics.
Though statistics is not as visible in our lives as
chemistry or arithmetic, it is very important because
it makes things simpler.
Further Study
Here are some advanced statistical calculations which can be used:
1. Finding mean using assumed mean method
The assumed mean technique is used in statistics to compute the arithmetic
mean of a set of data. This approach, rather than the direct method for computing
the mean, is advised when the values in the data are large. It works with
deviations from a chosen value, which keeps the numbers in the calculation small.
The Methodology
Step 1. Take the median of the distinct observations (counting each value once) as
the assumed mean (A). Any other central value can also be used.
Step 2. Find the deviation (di) of each observation xi using the formula di = xi – A.
Step 3. Calculate the product of fi and di (fi di).
Step 4. Find ∑ fi and ∑ fi di.
Step 5. Calculate the mean using the formula A + (∑ fi di / ∑ fi).
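
As a short check on the formula in Step 5, the following worked equation (written in LaTeX, using the same symbols xi, fi, di and A) shows why the assumed mean method gives the same value as the direct formula for any choice of A:

```latex
\begin{aligned}
d_i &= x_i - A \\
\sum f_i d_i &= \sum f_i x_i - A \sum f_i \\
A + \frac{\sum f_i d_i}{\sum f_i}
  &= A + \frac{\sum f_i x_i}{\sum f_i} - A
   = \frac{\sum f_i x_i}{\sum f_i}
\end{aligned}
```

The choice of A therefore affects only the size of the numbers in the working, not the result.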
Now let’s try using this method on our data
Height (A = 165)
xi     fi    di     fi di
155    1     -10    -10
156    1     -9     -9
157    2     -8     -16
160    1     -5     -5
162    3     -3     -9
163    2     -2     -4
164    3     -1     -3
165    3     0      0
167    2     2      4
168    2     3      6
170    3     5      15
174    1     9      9
175    3     10     30
181    1     16     16
∑ fi = 28    ∑ fi di = 24

Weight (A = 56)
xi    fi    di     fi di
41    1     -15    -15
45    1     -11    -11
46    1     -10    -10
47    2     -9     -18
50    4     -6     -24
51    5     -5     -25
53    1     -3     -3
55    3     -1     -3
56    1     0      0
58    1     2      2
60    2     4      8
62    1     6      6
65    1     9      9
68    1     12     12
69    1     13     13
70    1     14     14
72    1     16     16
∑ fi = 28    ∑ fi di = -29
For the heights of the students, the mean calculated with the assumed mean method is
165 + 24/28 = 165.86, which is exactly the mean determined above using
the longer technique. The same is true for the weights of the students, where the
method gives 56 + (-29)/28 = 54.96.
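
The five steps of the assumed mean method can be sketched in Python using the height and weight frequency tables. The function name and the (xi, fi) pair layout are choices made for this sketch, not part of the project.

```python
# Frequency tables as (xi, fi) pairs, taken from the tables in this project.
height_table = [(155, 1), (156, 1), (157, 2), (160, 1), (162, 3), (163, 2), (164, 3),
                (165, 3), (167, 2), (168, 2), (170, 3), (174, 1), (175, 3), (181, 1)]
weight_table = [(41, 1), (45, 1), (46, 1), (47, 2), (50, 4), (51, 5), (53, 1), (55, 3),
                (56, 1), (58, 1), (60, 2), (62, 1), (65, 1), (68, 1), (69, 1), (70, 1),
                (72, 1)]


def assumed_mean_method(table, assumed):
    """Return (sum of fi, sum of fi * di, mean), where di = xi - assumed."""
    total_f = sum(f for _, f in table)
    total_fd = sum(f * (x - assumed) for x, f in table)
    return total_f, total_fd, assumed + total_fd / total_f


for label, table, assumed in [("Height", height_table, 165),
                              ("Weight", weight_table, 56)]:
    total_f, total_fd, result = assumed_mean_method(table, assumed)
    print(label, total_f, total_fd, round(result, 2))
```

Trying a different assumed mean, such as 160 for the heights, changes the deviations but should leave the final mean unchanged.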
2. Finding mean using Step-Deviation Method
The step-deviation method is an even easier alternative to the assumed mean
method, which can be used when all observations are equally spaced (e.g.
2, 5, 8, 11, …).
The Methodology
Step 1. Take the median of the distinct observations (counting each value once) as
the assumed mean (A). Any other central value can also be used.
Step 2. Find the distance between any two consecutive observations by finding their
difference and mark it as h.
Step 3. Calculate ui using the formula (xi – A)/h.
Step 4. Calculate fi ui and then calculate ∑ fi and ∑ fi ui.
Step 5. Finally, calculate the mean by using the formula A + h × (∑ fi ui / ∑ fi).
Family Size (A = 5, h = 1)
xi    fi    ui    fi ui
3     8     -2    -16
4     12    -1    -12
5     4     0     0
6     2     1     2
7     1     2     2
8     1     3     3
∑ fi = 28   ∑ fi ui = -21

The mean using the Step-Deviation Method comes out to be 5 + 1 × (-21/28) = 4.25, which
is exactly the same mean we found earlier.

*Please Note – In all three scenarios the mean came out to be exactly the same as when calculated
earlier using the direct method. Both methods are rearrangements of the direct formula, so any
small difference in the value of the mean can only come from rounding during the calculation.
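
The step-deviation steps can be sketched in Python as below. The first call uses the family-size table with A = 5 and h = 1 as in the project. The second call, which applies the method to the height class marks with A = 167.5 and h = 5, is an extra illustration added here and does not appear in the project. The function name is hypothetical.

```python
# Frequency tables as (xi, fi) pairs.
family_table = [(3, 8), (4, 12), (5, 4), (6, 2), (7, 1), (8, 1)]
# Height class marks with their counts, from the class distribution table.
height_class_table = [(157.5, 4), (162.5, 9), (167.5, 7), (172.5, 4),
                      (177.5, 3), (182.5, 1)]


def step_deviation_mean(table, assumed, h):
    """Return (sum of fi * ui, mean), where ui = (xi - assumed) / h."""
    total_f = sum(f for _, f in table)
    total_fu = sum(f * (x - assumed) / h for x, f in table)
    return total_fu, assumed + h * total_fu / total_f


total_fu, family_mean = step_deviation_mean(family_table, assumed=5, h=1)
print(total_fu, family_mean)

# Extra illustration (not in the project): class marks are equally spaced with h = 5.
_, height_grouped_mean = step_deviation_mean(height_class_table, assumed=167.5, h=5)
print(round(height_grouped_mean, 2))
```

Dividing by h pays off mainly with class marks, where the deviations are multiples of the class width and the ui become small whole numbers.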
Bibliography
https://www.google.co.in/
https://quillbot.com/
https://www.socscistatistics.com/
https://statisticsbyjim.com/basics/importance-statistics/
https://www.ilovepdf.com/
https://www.iloveimg.com/
https://www.photopea.com/

Internal Examiner External Examiner
