Statistics Refresher

Uploaded by

Abhishek Kumar

Importance of Statistics

1. Business:
○ Statistics helps in forecasting sales, analyzing trends, and improving decision-
making.
○ Example: A retail store uses sales data to predict demand during holiday
seasons.
2. Healthcare:
○ Analyzing patient outcomes and testing the efficacy of drugs rely on statistical
methods.
○ Example: A clinical trial uses statistical analysis to determine the effectiveness of
a new vaccine.
3. Social Sciences:
○ Statistics aids in studying behavior patterns and interpreting survey results.
○ Example: Analyzing voting patterns in elections to understand demographic
preferences.

Data Collection

1. Primary Data:
○ Collected directly for a specific purpose, often through surveys, interviews, or
experiments.
○ Example: A company surveys customers to gauge satisfaction with a new
product.
2. Secondary Data:
○ Obtained from existing sources like reports, databases, or government records.
○ Example: Using census data to analyze population growth trends for urban
planning.

Data Classification with Examples

1. Nominal Data

● Definition: Data that represents categories without inherent order.

● Example: Car Color (Red, Blue, Green).
● There is no ranking or logical order to the colors.
2. Ordinal Data

● Definition: Data with categories in a specific order, but the intervals between them are
not uniform.
● Example: Customer Rating (Poor, Average, Excellent).
● "Excellent" is higher than "Average," but the difference between "Poor" and "Average"
may not equal the difference between "Average" and "Excellent."

3. Interval Data

● Definition: Data with equal intervals between values but no true zero.
● Example: Temperature (°C) (15, 25, 30).
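● Equal steps mean the same anywhere on the scale: the 5°C difference between 25°C and
30°C is the same as between 15°C and 20°C. However, 0°C does not mean the absence of
temperature (no true zero), so ratios such as "30°C is twice as hot as 15°C" are not meaningful.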

4. Ratio Data

● Definition: Data with equal intervals and a true zero, allowing for meaningful
comparisons like ratios.
● Example: Weight (kg) (60, 70, 85).
● 0 kg means no weight, and comparisons like "Person 3 is about 1.2 times as heavy as
Person 2" (85 / 70 ≈ 1.21) are valid.

More Types Of Data

Discrete Data

● Definition: Discrete data consists of countable, distinct values, often integers. These
values represent quantities that cannot be subdivided meaningfully.
● Example from the Table:
○ Number of Pets: {0, 1, 2, 3}.
○ These are exact numbers representing how many pets a person owns. You can’t
have 1.5 pets.
● Key Features:
○ Finite or countable.
○ Typically whole numbers.
○ Example in Real Life: Number of books on a shelf, number of cars in a parking
lot.

Continuous Data

● Definition: Continuous data can take any value within a given range and often includes
decimals. It represents measurements that can be infinitely precise.
● Examples from the Table:
○ Height (cm): {160.5, 165.8, 172.3, 158.0}.
■ These are measurements, and any value is possible within the height
range (e.g., 160.51 cm or 160.52 cm).
○ Weight (kg): {70.2, 65.5, 68.9, 60.0}.
■ These are also measurements, and decimal precision is meaningful (e.g.,
68.91 kg or 68.92 kg).
● Key Features:
○ Infinite possible values within a range.
○ Measured rather than counted.
○ Example in Real Life: Distance, temperature, or speed.

Table of Statistical Measures

A concise table summary for the distribution of Nominal, Ordinal, Interval, and Ratio data
under Discrete and Continuous data types:

Measures of Central Tendency

1. Mean

● Definition: The average of the values in a dataset.

● Formula: Mean = (Sum of all values)/(Number of values)
● Example (Age): Mean Age = (25+30+35+40+45) / 5 = 175 / 5 = 35
● Excel Formula: =AVERAGE(B2:B6)
● Use Case: Commonly used to calculate average performance, salaries, or ages in a
group.

2. Median

● Definition: The middle value when the data is sorted.

● Example (Salary):
Sorted Salaries: 40000, 48000, 50000, 55000, 60000
Median = 50000 (middle value in the sorted list).
● Excel Formula: =MEDIAN(E2:E6)
● Use Case: Useful in skewed data where extreme values (outliers) could distort the
mean. For example, finding the typical household income in a neighborhood.

3. Mode

● Definition: The most frequently occurring value in the dataset.

● Example (Weight):
Weights: 2, 3, 1, 2, 2
Mode = 2 (appears 3 times in the Weight column).
● Excel Formula: =MODE(C2:C6)
● Use Case: Used in product preference studies (e.g., finding the most popular product
color or size).
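
The three measures above can be checked with Python's standard `statistics` module. This is a minimal sketch: the ages and salaries are the values from the examples, and the Weight column is assumed to hold the weights 2, 3, 1, 2, 2 used in the weighted-mean example.

```python
import statistics

# Example data from the text. The weights list is an assumption: it reuses
# the weights that appear in the weighted-mean example.
ages = [25, 30, 35, 40, 45]
salaries = [40000, 48000, 50000, 55000, 60000]
weights = [2, 3, 1, 2, 2]

mean_age = statistics.mean(ages)             # sum of values / number of values
median_salary = statistics.median(salaries)  # middle value after sorting
mode_weight = statistics.mode(weights)       # most frequently occurring value

print(mean_age, median_salary, mode_weight)
```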

4. Weighted Mean

● Definition: The mean where some values contribute more due to their assigned
weights.
● Formula: Weighted Mean=∑(Value×Weight)/∑(Weight)
● Example (Score with Weight):
Weighted Mean = [(85×2)+(90×3)+(80×1)+(95×2)+(88×2)] / (2+3+1+2+2) = 886 / 10 = 88.6
● Excel Formula: =SUMPRODUCT(B2:B6, C2:C6)/SUM(C2:C6)
● Use Case: Often used in academics (weighted average of grades), project evaluations,
and financial analysis.
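
As a quick check of the weighted-mean formula, the sketch below mirrors Excel's SUMPRODUCT/SUM approach in Python. The variable names are illustrative only.

```python
# Scores and weights from the weighted-mean example.
scores = [85, 90, 80, 95, 88]
weights = [2, 3, 1, 2, 2]

# Numerator: sum of value * weight (what SUMPRODUCT computes).
weighted_sum = sum(score * weight for score, weight in zip(scores, weights))

# Denominator: sum of the weights.
total_weight = sum(weights)

weighted_mean = weighted_sum / total_weight
print(weighted_sum, total_weight, weighted_mean)
```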

5. Trimmed Mean

● Definition: The mean after removing a certain percentage of the smallest and largest
values.
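● Example (Salary, trimming one value from each end, i.e. 20% per end):
Exclude the lowest and highest values: 40000, 60000
Remaining Salaries: 48000, 50000, 55000
Trimmed Mean = (48000+50000+55000) / 3 = 51000
● Excel Formula: =TRIMMEAN(E2:E6, 0.4)
(The second argument is the total fraction of data points to exclude, split evenly between
the two ends: 0.4 × 5 = 2 points, one from each end. With 0.1, no points would be removed
from five values.)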
● Use Case: Used in sports scoring (removing outliers like highest and lowest judges'
scores) or in robust data analysis.
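
A trimmed mean can be sketched in a few lines of Python. The helper below, `trimmed_mean`, is a hypothetical function written for this illustration; it takes the number of values to drop from each end rather than a percentage.

```python
def trimmed_mean(values, drop_each_end):
    """Mean after dropping `drop_each_end` values from both ends of the sorted data."""
    if drop_each_end < 0 or 2 * drop_each_end >= len(values):
        raise ValueError("drop_each_end must leave at least one value")
    ordered = sorted(values)
    kept = ordered[drop_each_end:len(ordered) - drop_each_end]
    return sum(kept) / len(kept)


salaries = [40000, 48000, 50000, 55000, 60000]

# Dropping the lowest and highest salary leaves 48000, 50000, 55000.
trimmed = trimmed_mean(salaries, 1)
untrimmed = trimmed_mean(salaries, 0)
print(trimmed, untrimmed)
```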

6. Weighted Median

● Definition: The median when weights are applied to the values.

● Example:
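○ Values sorted by Score in ascending order, with weights in parentheses: {80 (1), 85 (2),
88 (2), 90 (3), 95 (2)}
○ Cumulative weights: 1, 3, 5, 8, 10 (total weight = 10, half = 5).
○ The cumulative weight reaches exactly half at 88, so the weighted median falls between
88 and 90. Taking the upper value gives Weighted Median = 90; averaging the two gives 89.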
● Excel Formula: Weighted Median doesn’t have a direct Excel function but can be
computed manually
● Use Case: Used in income analysis to find the typical income, accounting for family size
or population density.
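
Since Excel has no built-in function for this, the manual computation can be sketched in Python. Conventions differ when the cumulative weight lands exactly on half of the total, so the hypothetical helper below returns both the lower and the upper candidate instead of choosing one.

```python
def weighted_median_bounds(values, weights):
    """Return (lower, upper) weighted medians for positive weights.

    lower: first value whose cumulative weight is >= half the total weight.
    upper: first value whose cumulative weight is >  half the total weight.
    The two differ only when the cumulative weight hits exactly half.
    """
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    lower = upper = None
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if lower is None and cumulative >= half:
            lower = value
        if cumulative > half:
            upper = value
            break
    return lower, upper


scores = [85, 90, 80, 95, 88]
weights = [2, 3, 1, 2, 2]

lower, upper = weighted_median_bounds(scores, weights)
print(lower, upper)
```

For the example data the cumulative weights are 1, 3, 5, 8, 10, so the lower candidate is 88 and the upper candidate is 90.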

Measures of Dispersion

Measures of dispersion indicate how spread out the data is. They include Range, Variance,
Standard Deviation, Percentiles, and Quartiles. Understanding outliers is also crucial in data
analysis.

1. Range

● Definition: The difference between the maximum and minimum values in a dataset.
● Manual Formula: Range = Maximum - Minimum
● Example: Data: {10, 15, 20, 25, 30}
Range = 30−10=20
● Excel Formula: =MAX(A1:A5) - MIN(A1:A5)
● Use Case:
○ Quick measure of variability in datasets like daily temperatures or stock prices.

2. Variance

● Definition: Measures the average squared deviation of each data point from the mean.
● Manual Formula: Variance=∑(Value−Mean)^2 / Number of Values
● Example: Data: {10, 15, 20, 25, 30}
Mean = 20
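Variance = [(10−20)^2 + (15−20)^2 + (20−20)^2 + (25−20)^2 + (30−20)^2] / 5
= (100 + 25 + 0 + 25 + 100) / 5 = 50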
● Excel Formula:
○ For a population: =VAR.P(A1:A5)
○ For a sample: =VAR.S(A1:A5)
● Use Case:
○ Variance is used in finance to measure investment risk.

3. Standard Deviation

● Definition: The square root of variance, representing the average distance of data
points from the mean.
● Manual Formula: Standard Deviation = √Variance
● Example: Data: {10, 15, 20, 25, 30}
Variance = 50
Standard Deviation = √50 ≈ 7.07
● Excel Formula:
○ For a population: =STDEV.P(A1:A5)
○ For a sample: =STDEV.S(A1:A5)
● Use Case:
○ Used in quality control and risk analysis to assess consistency.
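
The population and sample versions of variance and standard deviation give different results on the same data. The sketch below uses Python's standard `statistics` module, where `pvariance`/`pstdev` divide by n (like VAR.P and STDEV.P) and `variance`/`stdev` divide by n − 1 (like VAR.S and STDEV.S).

```python
import statistics

data = [10, 15, 20, 25, 30]

# Population versions: divide the sum of squared deviations by n.
pop_variance = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

# Sample versions: divide by n - 1 instead.
sample_variance = statistics.variance(data)
sample_std = statistics.stdev(data)

print(pop_variance, round(pop_std, 2))
print(sample_variance, round(sample_std, 2))
```

The manual example in the text (variance 50, standard deviation about 7.07) corresponds to the population versions.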

4. Percentiles

● Definition: The value below which a given percentage of data falls.

● Manual Formula:
Arrange the data in ascending order and find the position of the desired percentile:
Position = (percentile / 100) × (n + 1), where n = number of data points
● If the position is an integer, the percentile is the value at that position.
● If the position is a decimal, interpolate between the nearest two values:
❖ Pk = Value at Floor Position + (Fractional Part of Position) × (Difference Between Next
and Floor Values)
● Example: Data: {10, 15, 20, 25, 30}
Position = 0.25 × (5 + 1) = 1.5
25th Percentile (P25) = 10 + 0.5 × (15 − 10) = 12.5
● Excel Formula: =PERCENTILE.EXC(A1:A5, 0.25)
● Use Case:
○ Percentiles are commonly used in test scoring and income distribution.
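
The position-and-interpolation rule above can be sketched as a small Python function. `percentile_exclusive` is a hypothetical helper name; it follows the (n + 1) position formula from the text and rejects percentiles whose position falls outside the data.

```python
import math


def percentile_exclusive(data, percentile):
    """Percentile using Position = (percentile / 100) * (n + 1), 1-based."""
    values = sorted(data)
    n = len(values)
    position = (percentile / 100) * (n + 1)
    if position < 1 or position > n:
        raise ValueError("percentile is outside the range this method supports")
    floor_pos = math.floor(position)
    fraction = position - floor_pos
    if fraction == 0:
        # Integer position: take the value at that position directly.
        return values[floor_pos - 1]
    lower = values[floor_pos - 1]   # value at the floor position
    upper = values[floor_pos]       # next value
    return lower + fraction * (upper - lower)


data = [10, 15, 20, 25, 30]
p25 = percentile_exclusive(data, 25)
p50 = percentile_exclusive(data, 50)
p75 = percentile_exclusive(data, 75)
print(p25, p50, p75)
```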

5. Quartiles

● Definition: Quartiles are the values that divide the sorted data into four equal parts.

○ Q1 (25th Percentile): First quartile.
○ Q2 (50th Percentile): Median.
○ Q3 (75th Percentile): Third quartile.
● Manual Formula: Quartiles are calculated like percentiles at 25%, 50%, and 75%.

● Excel Formula:
=QUARTILE.EXC(A1:A5, 1) (Q1)
=QUARTILE.EXC(A1:A5, 2) (Median or Q2)
=QUARTILE.EXC(A1:A5, 3) (Q3)

● Use Case:
○ Identifying outliers by using the Interquartile Range (IQR).

6. Outliers

● Definition: Data points that are significantly different from others in the dataset.
● Technique for Identifying Outliers:
○ Compute IQR: IQR = Q3 − Q1
○ Lower Bound: Q1 − 1.5×IQR
○ Upper Bound: Q3 + 1.5×IQR
○ Example: Data: {10, 15, 20, 25, 30, 100}
Q1 = 13.75, Q3 = 47.5, IQR = 33.75
Lower Bound = 13.75 − 1.5×33.75 = −36.875
Upper Bound = 47.5 + 1.5×33.75 = 98.125
Outlier = 100 (greater than the Upper Bound)
○ Any data point below the Lower Bound or above the Upper Bound is
considered an outlier.
● Use Case:
○ Detecting outliers in sales, student test scores, or scientific experiments.
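
The IQR technique can be sketched with `statistics.quantiles` (available in Python 3.8 and later). Its `method="exclusive"` option uses the same (n + 1) position rule as the manual percentile formula, so it should reproduce the quartiles in the example.

```python
import statistics

data = [10, 15, 20, 25, 30, 100]

# Quartiles using the exclusive method, which matches Position = p * (n + 1).
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Anything outside [lower_bound, upper_bound] is flagged as an outlier.
outliers = [x for x in data if x < lower_bound or x > upper_bound]

print(q1, q3, iqr)
print(lower_bound, upper_bound, outliers)
```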

7. Coefficient of Variation

Definition:
● The Coefficient of Variation (CV) is a measure of relative variability, expressed as a
percentage of the mean. It helps in comparing the degree of variation between datasets
with different units or scales.

CV = (Standard deviation / Mean) x 100

Excel Formula:

● Mean: =AVERAGE(A1:A5)
● Standard Deviation: =STDEV(A1:A5)
● CV: =STDEV(A1:A5)/AVERAGE(A1:A5)*100

Significance:

● A higher CV indicates more variability relative to the mean.

● Used in fields like finance (risk analysis), biology (experimental variability), and
manufacturing (quality control).
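
The section gives no worked example for CV, so the sketch below assumes the dataset {10, 15, 20, 25, 30} used in the earlier examples. It uses the sample standard deviation, which is what Excel's STDEV computes.

```python
import statistics

# Assumed data: the same five values used in the range and variance examples.
data = [10, 15, 20, 25, 30]

mean = statistics.mean(data)
sample_std = statistics.stdev(data)   # divides by n - 1, like Excel's STDEV

# CV = (standard deviation / mean) * 100, expressed as a percentage.
cv_percent = sample_std / mean * 100

print(round(cv_percent, 2))
```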

8. Covariance

Definition:

Covariance measures the directional relationship between two variables:

● Positive covariance: Variables increase or decrease together.

● Negative covariance: One variable increases while the other decreases.

Formula:
Covariance = [∑( X−X mean)(Y −Y mean)]/n

Manual Calculation:

● Datasets:
X = {10, 20, 30, 40, 50}
Y = {15, 25, 35, 45, 55}

● Calculate the Means:

mean(X) = 30 , mean(Y) = 35

● Find Deviations:
X− mean(X) = {-20, -10, 0, 10, 20}
Y− mean(Y) = {-20, -10, 0, 10, 20}
● Multiply Deviations:
[X− mean(X)][Y− mean(Y)] = {400, 100, 0, 100, 400}

● Calculate Covariance:
Covariance= (400+100+0+100+400) / 5 = 200

Excel Formula:

● Use: =COVAR(range1, range2)

Significance:

● The sign of the covariance indicates the direction of the relationship. Its magnitude
depends on the units of X and Y, so it does not show the strength of the relationship.
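
The manual calculation can be reproduced with a short Python function. `covariance` here is a hypothetical helper written for this sketch; by default it divides by n as in the formula above, and the `sample` flag switches to n − 1.

```python
def covariance(xs, ys, sample=False):
    """Covariance of two equal-length sequences (divides by n unless sample=True)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of paired deviations from the means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


X = [10, 20, 30, 40, 50]
Y = [15, 25, 35, 45, 55]

population_cov = covariance(X, Y)
sample_cov = covariance(X, Y, sample=True)
print(population_cov, sample_cov)
```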

9. Correlation

Definition:

Correlation measures the strength and direction of the linear relationship between two
variables. It is standardized, ranging from -1 to 1:

● 1: Perfect positive correlation.

● -1: Perfect negative correlation.

Pearson Correlation

Formula:
r = Covariance(X, Y) / [SD(X) × SD(Y)]

Manual Calculation:

● Datasets:
X = {10, 20, 30, 40, 50}
Y = {15, 25, 35, 45, 55}

From the previous covariance example:

● Covariance(X, Y) = 200
● SD(X)=14.14, SD(Y) = 14.14
● Correlation = r = 200 / (14.14×14.14) = (200 / 200) = 1

This indicates a perfect positive correlation.

Excel Formula:

● Use: = CORREL(range1, range2)
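
A minimal Python sketch of the Pearson formula follows. `pearson_r` is a hypothetical helper; it works with sums of deviations directly, which is equivalent to Covariance / (SD × SD) because the 1/n factors cancel.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


X = [10, 20, 30, 40, 50]
Y = [15, 25, 35, 45, 55]

r_positive = pearson_r(X, Y)
r_negative = pearson_r(X, list(reversed(Y)))   # Y falls as X rises
print(r_positive, r_negative)
```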

Comparison of Covariance and Correlation

10. Simple Linear Regression in Excel

Simple Linear Regression is a statistical technique used to predict the value of a dependent
variable (Y) based on an independent variable (X). The relationship between X and Y is
modeled as a straight line:

Y= mX + c
Where:

● c : Intercept (value of Y when X=0).

● m : Slope (change in Y for a one-unit change in X).

Example

● Independent Variable (X): Study Hours.

● Dependent Variable (Y): Exam Scores.
Manual calculation of Slope and intercept
● slope (m) = ∑(X − X mean)(Y − Y mean) / ∑(X − X mean)^2
● intercept (c) = Y mean − m × (X mean)

Relationship of slope, covariance and correlation

m = r × [SD(Y) / SD(X)] = Covariance(X, Y) / Variance(X)

Use Excel Functions to Calculate Regression Coefficients

You can calculate the slope (m) and intercept (c) using these formulas:
● Slope (m): =SLOPE(known_y's, known_x's)

● Intercept (c): = INTERCEPT(known_y's, known_x's)

Make Predictions

Use the regression equation to predict Y for new values of X:

Y= mX + c

Example:

● If c = 45, m = 6, and X = 6, then Y = 6(6) + 45 = 81

Alternatively, use Excel’s function:

● =FORECAST.LINEAR(new_x, known_y's, known_x's)
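
The slope, intercept, and prediction steps can be sketched in Python. The text does not list the study-hours data, so the values below are hypothetical and were chosen so that the fit gives m = 6 and c = 45, matching the prediction example. `fit_line` is an illustrative helper, not a library function.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for Y = mX + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data (not from the source): study hours and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [51, 57, 63, 69, 75]

m, c = fit_line(hours, scores)

# Predict the score for 6 study hours, as FORECAST.LINEAR would.
predicted = m * 6 + c
print(m, c, predicted)
```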

Insights from Regression

1. Slope(m):
○ If positive, Y increases as X increases.
○ If negative, Y decreases as X increases.
2. Intercept(c):
○ The starting value of Y when X= 0
3. Predictions:
○ You can estimate Y for given values of X
