Ch3 Numerically Summarizing Data

- The document discusses various statistical measures used to summarize quantitative data, including the mean, median, mode, range, standard deviation, variance, percentiles, quartiles, and interquartile range.
- It provides formulas and steps to calculate each measure and explains how to interpret the results. For example, the mean is the average value, the median splits the data in half, and the standard deviation indicates how spread out the data are around the mean.
- Guidance is given on choosing the appropriate statistical measure based on the characteristics of the data, such as whether it is resistant to outliers. The five-number summary and boxplot are also introduced as visual tools to summarize a dataset.

Uploaded by

Grissel HernandezLara


Business Statistics

Dr. Gómez

Numerically Summarizing Data
Measures of central tendency: Mean
• To compute the arithmetic mean of a set of data, the data must be quantitative.
• The arithmetic mean of a variable is computed by adding all the values of the variable in the data set and dividing by the number of observations.
• Computed over an entire population, it is the population arithmetic mean, $\mu$. The population mean is a parameter.
• The sample arithmetic mean, $\bar{x}$, is computed using sample data. The sample mean is a statistic.
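Formula to calculate the mean
• If $x_1, x_2, \ldots, x_N$ are the N observations of a variable from a population, then the population mean, $\mu$, is:
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x_i}{N}$$
• If $x_1, x_2, \ldots, x_n$ are the n observations of a variable from a sample, then the sample mean, $\bar{x}$, is:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$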
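As a minimal sketch of the formula above, the mean can be computed by summing the observations and dividing by their count. The function name `mean` and the exam scores are hypothetical, chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


# Hypothetical exam scores, used only for illustration.
scores = [82, 77, 90, 71, 80]
print(mean(scores))  # 80.0
```

The same arithmetic applies to a population or a sample; only the interpretation (parameter versus statistic) differs.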
Mean as a center of gravity
Measures of central tendency: Median
• The median of a variable is the value that lies in the middle of the data when arranged in ascending order. We use M to represent the median.
• Steps to find the median:
1. Arrange the data in ascending order.
2. Determine the number of observations, n.
3. Determine the observations in the middle of the data set.
Odd and even number of observations
• If the number of observations is odd, then the median is the data value exactly in the middle of the data set. That is, the median is the observation that lies in the $\frac{n+1}{2}$ position.
• If the number of observations is even, then the median is the mean of the two middle observations in the data set. That is, the median is the mean of the observations that lie in the $\frac{n}{2}$ position and the $\frac{n}{2} + 1$ position.
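The steps for the odd and even cases can be sketched in Python as follows. The function name `median` and the sample values are assumptions made for illustration; note that Python lists are indexed from 0, so the 1-based position (n + 1)/2 corresponds to index n // 2.

```python
def median(values):
    """Median following the slide's steps: sort, count, take the middle."""
    ordered = sorted(values)          # Step 1: ascending order
    n = len(ordered)                  # Step 2: number of observations
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the observation in 1-based position (n + 1) / 2
        return ordered[mid]
    # Even n: mean of the observations in 1-based positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))      # 3
print(median([4, 1, 3, 2]))   # 2.5
```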
Mean vs. median
Resistant data
A numerical summary of data is said to be resistant if extreme values (very large or very small values, or outliers) relative to the data do not affect its value substantially.

In this example, the extreme value 48 alters the mean, so we conclude that the median is resistant, while the mean is not resistant.

If the mean and the median are close in value, and the distribution is symmetric, we use the mean to describe the data.
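A small sketch can make resistance concrete. The data below are hypothetical: the value 48 echoes the extreme value mentioned in the slide, but the slide's actual data set is not reproduced here. The example uses `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical data sets that differ only in their largest value.
typical = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 48]

# The mean moves from 4 to 12.4 when the extreme value is introduced.
print(mean(typical), mean(with_outlier))

# The median stays at 4 in both cases, which is what "resistant" means.
print(median(typical), median(with_outlier))
```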
Mean or median?
Measures of central tendency: Mode
• The mode of a variable is the most frequent
observation of the variable that occurs in
the data set.
• To compute the mode, tally the number of
observations that occur for each data value.
• The data value that occurs most often is the
mode.
• A set of data can have no mode, one mode,
or more than one mode.
• If no observation occurs more than once,
we say the data have no mode.
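The tallying procedure above can be sketched with `collections.Counter`. The helper name `modes` is hypothetical; it returns a list so that it can represent no mode, one mode, or several modes.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value, or [] if no value repeats."""
    counts = Counter(values)              # tally occurrences of each value
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []                         # no observation occurs more than once
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3]))             # [2]
print(modes([1, 1, 2, 2, 3]))          # [1, 2]
print(modes([1, 2, 3]))                # []
print(modes(["red", "blue", "red"]))   # ['red']
```

Because the tally only counts occurrences, the same function works for nominal data such as colors.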
Bimodal distribution

• When the data set has two modes, we call the distribution bimodal.
• If the data set has more than two modes, the distribution is called multimodal.
• We cannot determine the value of the
mean or median of data that are
nominal.
• The only measure of central tendency
that can be determined for nominal data
is the mode.
When to use?
Measures of dispersion
Measures of dispersion are meant to describe how spread out data are.
In other words, they describe how far, on average, each observation is from the typical data value.
Measures of dispersion: Range

• To compute the range, the data must be quantitative.

• The range, R, of a variable is the difference between the largest and the smallest data value. That is, $R = \text{largest data value} - \text{smallest data value}$.

• The range is affected by extreme values, so the range is not resistant.
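A short sketch shows both the computation and why the range is not resistant. The function name `data_range` and the values are hypothetical.

```python
def data_range(values):
    """Range: largest data value minus smallest data value."""
    if not values:
        raise ValueError("range requires at least one observation")
    return max(values) - min(values)


# Hypothetical data; adding one extreme value changes the range drastically.
print(data_range([10, 12, 15, 11]))        # 5
print(data_range([10, 12, 15, 11, 100]))   # 90
```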

Measures of dispersion: Standard deviation

• Standard deviation is based on the deviation about the mean.
• For a population, the deviation about the mean for the ith observation is $x_i - \mu$.
• For a sample, the deviation about the mean for the ith observation is $x_i - \bar{x}$.
Understanding the standard deviation
• The farther an observation is from the mean, the larger the absolute value of the deviation.
• The sum of all deviations about the mean must equal zero. This condition is always true. That is, $\sum (x_i - \mu) = 0$ and $\sum (x_i - \bar{x}) = 0$.
• Because this sum is zero, we cannot use the average deviation about the mean as a measure of spread.
• We calculate the mean of the squared deviations because squaring a nonzero number always results in a positive number. This leads to the variance.
• Variance is difficult to interpret because its units are squared (such as dollars squared).
• We “undo” the squaring process by taking the square root of the variance, which gives the standard deviation.
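The reasoning above can be traced step by step on a small data set. The values are hypothetical and are treated as a complete population, so the squared deviations are averaged over all observations.

```python
# Hypothetical population of four observations.
data = [4, 8, 6, 2]

mean = sum(data) / len(data)                  # 5.0
deviations = [x - mean for x in data]         # [-1.0, 3.0, 1.0, -3.0]

# The deviations always cancel out, so their average says nothing about spread.
print(sum(deviations))                        # 0.0

# Squaring removes the signs; the mean of the squares is the variance.
squared = [d ** 2 for d in deviations]        # [1.0, 9.0, 1.0, 9.0]
variance = sum(squared) / len(data)
print(variance)                               # 5.0

# Taking the square root returns to the original units.
std_dev = variance ** 0.5
print(std_dev)
```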
Population standard deviation

• The population standard deviation is the square root of the sum of squared deviations about the population mean divided by the number of observations in the population, N.
• It is the square root of the mean of the squared deviations about the population mean.
• It is represented by the Greek letter sigma, $\sigma$.
Example: population standard deviation

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$
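The population formula can be sketched directly in Python. The function name `population_std` and the data are hypothetical; the standard library's `statistics.pstdev` computes the same quantity and is used here only as a cross-check.

```python
import math
from statistics import pstdev


def population_std(values):
    """Square root of the mean squared deviation about the population mean."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one observation is required")
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


# Hypothetical population: mean 5, squared deviations sum to 32, 32 / 8 = 4.
population = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std(population))   # 2.0
print(pstdev(population))           # cross-check with the standard library
```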
Sample standard deviation
• The sample standard deviation, s, of a variable is the square root of the sum of the squared deviations about the sample mean divided by $n - 1$, where n is the sample size:
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
• We call n - 1 the degrees of freedom because the first n - 1 observations have freedom to be whatever value they wish, but the nth value has no freedom.
• It must be whatever value forces the sum of the deviations about the mean to equal zero.
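The sketch below computes the sample standard deviation and illustrates the degrees-of-freedom argument: once the first n - 1 deviations are known, the last one is forced. The function name `sample_std` and the data are hypothetical.

```python
import math


def sample_std(values):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


# Hypothetical sample with mean 6.
sample = [3, 5, 7, 9]
x_bar = sum(sample) / len(sample)
deviations = [x - x_bar for x in sample]      # [-3.0, -1.0, 1.0, 3.0]

# The last deviation is determined by the others, because all must sum to zero.
forced_last = -sum(deviations[:-1])
print(forced_last == deviations[-1])          # True

print(sample_std(sample))                     # sqrt(20 / 3), about 2.58
```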
Interpretation of the standard deviation

The mean measures the center of the distribution, while the standard
deviation measures the spread of the distribution.

If we are comparing two populations, then the larger the standard deviation, the more dispersion the distribution has, provided that the variable of interest from the two populations has the same unit of measure.
Measures of dispersion: Variance
• The variance of a variable is the square of the standard deviation. The population variance is $\sigma^2$ and the sample variance is $s^2$.
• What if we divided by n instead of n - 1 to obtain the sample variance, as one might expect?
• The sample variance would consistently underestimate the population variance and would therefore be a biased estimator.
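The bias can be checked exhaustively on a tiny example rather than taken on faith. The sketch below assumes a hypothetical population of three values and enumerates every possible sample of size 2 drawn with replacement (an assumption made so that the draws are independent), then averages the sample variance computed with each divisor.

```python
from itertools import product

# Hypothetical population; its variance is 8/3.
population = [2, 4, 6]
mu = sum(population) / len(population)
pop_var = sum((x - mu) ** 2 for x in population) / len(population)


def average_sample_variance(divisor_offset, n=2):
    """Average, over all samples of size n, of SS / (n - divisor_offset)."""
    samples = list(product(population, repeat=n))   # sampling with replacement
    total = 0.0
    for sample in samples:
        x_bar = sum(sample) / n
        ss = sum((x - x_bar) ** 2 for x in sample)  # sum of squared deviations
        total += ss / (n - divisor_offset)
    return total / len(samples)


divide_by_n = average_sample_variance(0)            # divisor n
divide_by_n_minus_1 = average_sample_variance(1)    # divisor n - 1

print(pop_var, divide_by_n, divide_by_n_minus_1)
```

Under these assumptions, dividing by n averages to about 1.33, below the population variance of about 2.67, while dividing by n - 1 averages to the population variance itself.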
Determine and Interpret z-Scores

• The z-score measures the number of standard deviations an observation is above or below the mean.

• If a data value is larger than the mean, the z-score is positive.

• If a data value is smaller than the mean, the z-score is negative.
• If the data value equals the mean, the z-score is zero.
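The slide does not reproduce the formula, but the standard definition consistent with the bullets above is z = (x - mean) / standard deviation. A minimal sketch, with a hypothetical function name and hypothetical test scores:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above or below the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Hypothetical test with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))   # 1.5  (above the mean: positive)
print(z_score(60, 70, 10))   # -1.0 (below the mean: negative)
print(z_score(70, 70, 10))   # 0.0  (equal to the mean)
```

Because z-scores are unitless, they allow values from distributions with different means and standard deviations to be compared.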
Example: Comparing z-scores
Percentiles
• Percentiles divide a set of data that is written in ascending order into 100 parts; thus 99 percentiles can be determined.
• For example, P1 divides the bottom 1% of the observations from the top 99%, P2 divides the bottom 2% of the observations from the top 98%, and so on.
Example: interpret percentiles
Quartiles
• The most common percentiles are quartiles. Quartiles divide data sets into fourths, or four equal
parts.
1. Arrange the dataset in ascending order.
2. Determine the median, M, or second quartile Q2
3. Divide the data set into halves: the observations below (to the left of) M and the observations
above M.
4. The first quartile, Q1, is the median of the bottom half of the data and the third quartile, Q3, is the
median of the top half of the data.
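The four steps above can be sketched as follows. The function names and data are hypothetical. When n is odd, this sketch leaves the median itself out of both halves, which matches "the observations below M and the observations above M"; statistical software often uses other conventions (such as interpolation), so results may differ slightly.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)                 # Step 1: ascending order
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two observations are required")
    q2 = median_of_sorted(ordered)           # Step 2: median M = Q2
    half = n // 2
    lower = ordered[:half]                   # Step 3: observations below M
    upper = ordered[half + n % 2:]           #         observations above M
    # Step 4: medians of the bottom and top halves
    return median_of_sorted(lower), q2, median_of_sorted(upper)


print(quartiles([1, 3, 5, 7, 9, 11, 13]))   # (3, 7, 11)
print(quartiles([8, 2, 6, 4]))              # (3.0, 5.0, 7.0)
```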
Example: Quartiles
• The Highway Loss Data Institute
routinely collects data on collision
coverage claims.
• Collision coverage insures against
physical damage to an insured
individual’s vehicle.
• The data represent a random sample
of 18 collision coverage claims based
on data obtained from the Highway
Loss Data Institute for 2007 models.
Find and interpret the first, second,
and third quartiles for collision
coverage claims.
Interquartile range
• The interquartile range, IQR, is the range of the middle 50% of the observations in a data set. That is, the IQR is the difference between the third and first quartiles and is found using the formula $\text{IQR} = Q_3 - Q_1$.
• The interpretation of the interquartile range is similar to that of the range and standard deviation.
• The more spread a set of data has, the higher the interquartile range will be.
Which measure should I use?
Check for outliers
The five-number summary
• Summaries of data represent an exploration of the data. A famous statistician named John Tukey called this material exploratory data analysis.
• The five-number summary of a set of data consists of the smallest data value, Q1, the median, Q3, and the largest data value. We organize the five-number summary as follows: minimum, Q1, M, Q3, maximum.
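A five-number summary can be sketched by combining the minimum, the quartiles, and the maximum. The function name and data are hypothetical, and the quartiles are taken as medians of the lower and upper halves (excluding the median itself when n is odd), which is one common convention among several. `statistics.median` comes from Python's standard library.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two observations are required")
    half = n // 2
    q1 = median(ordered[:half])              # median of the bottom half
    q3 = median(ordered[half + n % 2:])      # median of the top half
    return ordered[0], q1, median(ordered), q3, ordered[-1]


# Hypothetical data, deliberately unsorted.
print(five_number_summary([13, 1, 7, 3, 11, 5, 9]))   # (1, 3, 7, 11, 13)
```

These five values are the ones a boxplot draws: the box spans Q1 to Q3 with a line at the median.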
Example: Five-number summary
Boxplot
The five-number summary can be used to create a boxplot.
