Data Types:: Basic Statistics

This document provides an overview of basic statistics concepts including data types, measures of central tendency and dispersion, probability distributions, hypothesis testing, and statistical techniques like simple linear regression, sampling, and graphical representations. Key points covered include defining random variables, expected value, the normal distribution and z-scores, confidence intervals, and the central limit theorem.

Uploaded by

maheshsakharpe

Basic Statistics

1) Data Types – Continuous, Discrete, Nominal, Ordinal, Interval, Ratio, Random Variable, Probability, Probability Distribution
2) First, second, third & fourth moment business decisions
3) Graphical representation – Bar plot, Histogram, Boxplot, Scatter diagram
4) Simple Linear Regression
5) Hypothesis Testing
SLIDE-13
Data types:
1) Continuous 2) Discrete
SLIDE-14
Data types: Preliminaries
Nominal: Merely labels; no further information can be gleaned.
Ex: “Coke” and “Pepsi”
Ordinal: Conveys preference (direction) only, not magnitude.
Ex: “I prefer coffee to tea”
Interval: Conveys relative magnitude information, in addition to preference.
Ex: “I rate Coke a 7 and Pepsi a 4 on a scale of 10.”
Ratio: Conveys information on an absolute scale.
Ex: “I paid Rs 11 for Coke and Rs 13 for Pepsi.”
SLIDE-15

Random Variable
A random variable describes the probabilities for an uncertain future numerical outcome of a random process.
It is a variable because it can take one of several possible values.
It is random because there is some chance associated with each possible value.
SLIDE-16
Poker cards example:
Suppose you have randomly picked a card from the card deck. What is the probability that this card will be:
- Bigger than 10?
- Equal to or bigger than 10?
- Smaller than 3?
- Greater than 4 and less than 8?
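The four questions above can be answered by counting favourable cards. The sketch below is only an illustration: the slide does not say how face cards are valued, so it assumes a standard 52-card deck with ranks numbered 1 to 13 (Ace = 1, Jack = 11, Queen = 12, King = 13). Under a different convention the numbers change.

```python
from fractions import Fraction

# Assumption (not stated in the slides): ranks are numbered 1..13 with
# Ace = 1, Jack = 11, Queen = 12, King = 13, and there are four suits.
RANKS = range(1, 14)
DECK = [rank for rank in RANKS for _suit in range(4)]  # 52 cards


def probability(event):
    """Probability that a randomly drawn card satisfies `event`."""
    favourable = sum(1 for card in DECK if event(card))
    return Fraction(favourable, len(DECK))


p_bigger_than_10 = probability(lambda c: c > 10)      # J, Q, K
p_at_least_10 = probability(lambda c: c >= 10)        # 10, J, Q, K
p_smaller_than_3 = probability(lambda c: c < 3)       # A, 2
p_between_4_and_8 = probability(lambda c: 4 < c < 8)  # 5, 6, 7

print(p_bigger_than_10, p_at_least_10, p_smaller_than_3, p_between_4_and_8)
```

Using `Fraction` keeps the answers exact (for example 12/52 reduces to 3/13) instead of rounding to decimals.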
SLIDE-17

What is the probability of a sale?

What is the probability of selling at least 3 TVs?
SLIDE-18
Sampling Funnel:
1) Population 2) Sampling frame 3) SRS 4) Sample
SLIDE-19
Measures of central tendency
First moment business decision:
Population mean or average: µ = (Σ xᵢ) / N
Sample mean or average: x̄ = (Σ xᵢ) / n
Median - Middle value of the sorted data
Mode - Most frequently occurring value in the data
SLIDE-20
Measures of Dispersion
Second moment business decision:
Range = Max − Min
Population variance: σ² = Σ (xᵢ − µ)² / N
Population standard deviation: σ = sqrt(Σ (xᵢ − µ)² / N)
Sample variance: s² = Σ (xᵢ − x̄)² / (n − 1)
Sample standard deviation: s = sqrt(Σ (xᵢ − x̄)² / (n − 1))
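As a rough illustration of the first and second moment measures, the sketch below computes them with Python's standard `statistics` module. The data values are made up for the example. Note the difference between the population versions (divide by N) and the sample versions (divide by n − 1).

```python
import statistics

# Hypothetical observations, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# First moment: measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Second moment: measures of dispersion
data_range = max(data) - min(data)
pop_var = statistics.pvariance(data)   # divides by N
pop_sd = statistics.pstdev(data)
samp_var = statistics.variance(data)   # divides by n - 1
samp_sd = statistics.stdev(data)

print(mean, median, mode, data_range)
print(pop_var, pop_sd, samp_var, samp_sd)
```

The sample variance is always slightly larger than the population variance on the same data, because it divides the same sum of squares by n − 1 instead of N.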
SLIDE-21

Expected Value
For a probability distribution, the mean of the distribution is known as the expected value.
The expected value intuitively refers to what one would find if they repeated the experiment an infinite number of times and took the average of all of the outcomes.
Mathematically, it is calculated as the probability-weighted average of the possible values.
The expected value of a discrete random variable X, denoted by µ, is:
µ = E[X] = Σ x·p(x)
The variance of a discrete random variable X, denoted by σ², is:
σ² = E[(X − µ)²] = Σ (x − µ)²·p(x)
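The two formulas can be checked numerically on a small discrete distribution. The sketch below uses a hypothetical probability table for the number of TVs sold in a day. The probabilities are invented for illustration and are not taken from the slides.

```python
# Hypothetical distribution: number of TVs sold per day -> probability.
# These values are assumptions made only for this example.
pmf = {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.25, 4: 0.15}

# A valid probability distribution must sum to 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-9

# Expected value: probability-weighted average of the outcomes.
expected_value = sum(x * p for x, p in pmf.items())

# Variance: probability-weighted average squared deviation from the mean.
variance = sum((x - expected_value) ** 2 * p for x, p in pmf.items())

# Probability of selling at least 3 TVs.
p_at_least_3 = sum(p for x, p in pmf.items() if x >= 3)

print(expected_value, variance, p_at_least_3)
```

With this table the expected value is 2.15 TVs per day, which is not itself a possible outcome. It describes the long-run average, not any single day.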
SLIDE-22
Graphical techniques:
1) Bar plot: each data point is drawn as a bar

SLIDE-23
Histogram: Represents the frequency distribution of the data, i.e., how many observations fall within each interval.

SLIDE-24
Third business moment: Skewness
Fourth business moment: Kurtosis
Skewness
• A measure of asymmetry in the distribution
• Mathematically it is given by: E[((X − µ)/σ)³]
• Negative skewness implies the mass of the distribution is concentrated on the right, with a longer tail on the left
Kurtosis
• A measure of the “peakedness” of the distribution
• Mathematically it is given by: E[((X − µ)/σ)⁴] − 3
• For symmetric distributions, negative kurtosis implies a wider peak and thinner tails
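Both measures are standardized moments, so one helper covers them. The sketch below is a minimal illustration on made-up data sets, using the population formulas above (dividing by N). Statistical packages often apply small-sample corrections, so their output can differ slightly.

```python
import math


def standardized_moment(data, k):
    """Average of ((x - mu) / sigma) ** k using population mu and sigma."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return sum(((x - mu) / sigma) ** k for x in data) / n


def skewness(data):
    return standardized_moment(data, 3)


def excess_kurtosis(data):
    # Subtracting 3 makes the normal distribution the zero reference point.
    return standardized_moment(data, 4) - 3


symmetric = [1, 2, 3, 4, 5]      # hypothetical, evenly spread
left_skewed = [1, 8, 9, 9, 10]   # hypothetical, long tail on the left

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(left_skewed))
```

The symmetric data set has zero skewness and negative excess kurtosis (a flat shape), while the second data set has negative skewness because its single low value stretches the left tail.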

SLIDE-25
Boxplot:

• Interquartile range (IQR): The middle half of a data set falls within the interquartile range, i.e., between the first and third quartiles.
• Box plot: This graph shows the distribution of data by dividing the data into four groups with the same number of data points in each group. The box contains the middle 50% of the data points and each of the two whiskers contains 25% of the data points. It displays two common measures of the variability or spread in a data set: the range and the IQR.
• Range: It is represented on a box plot by the distance between the smallest value and the largest value, including any outliers. If you ignore outliers, the range is illustrated by the distance between the opposite ends of the whiskers.
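The quantities a box plot displays can be computed directly. The sketch below uses made-up data and the standard library's `statistics.quantiles`. The 1.5 × IQR rule for flagging outliers is a common convention assumed here, not something stated in the slides, and quartile definitions vary slightly between tools.

```python
import statistics

# Hypothetical data with one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 30]

# Quartiles split the sorted data into four equally sized groups.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # width of the box

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

full_range = max(data) - min(data)  # includes outliers
inliers = [x for x in data if lower_fence <= x <= upper_fence]
whisker_range = max(inliers) - min(inliers)  # ignores outliers

print(q1, q2, q3, iqr, outliers, full_range, whisker_range)
```

Here the single outlier inflates the full range to 28, while the whisker-to-whisker range is only 10, which is why the IQR is usually the more robust measure of spread.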
SLIDE-26

Normal Distribution
The normal random variable takes values from −∞ to +∞.
The probability associated with any single value of a continuous random variable is always zero.
The area under the entire curve is always equal to 1.
SLIDE-27
Characterized by a bell-shaped curve.

Properties:
• 68.27% of the values lie within ±1σ from the mean
• 95.45% of the values lie within ±2σ from the mean
• 99.73% of the values lie within ±3σ from the mean
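These three percentages can be reproduced from the standard normal cumulative distribution function. A minimal sketch using `statistics.NormalDist` from the Python standard library (available from Python 3.8):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def coverage_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


coverage = {k: coverage_within(k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within ±{k}σ: {p:.2%}")
```

Because the result depends only on the number of standard deviations, the same percentages hold for any normal distribution, whatever its µ and σ.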

SLIDE-28
X~N(µ,σ)
Characterized by mean, µ, and standard deviation, σ

SLIDE-29
Z scores, Standard Normal Distribution:
• For every value x of the random variable X, we can calculate the Z score: Z = (x − µ)/σ
• Interpretation: how many standard deviations away is the value from the mean?

SLIDE-30
Calculating Probability from Z distribution
Suppose GMAT scores can be reasonably modelled using a normal distribution with µ = 711 and σ = 29.

What is P(X ≤ 680)?

Step 1: Calculate the Z score corresponding to 680
Z = (680 − 711)/29 = −1.07
Step 2: Calculate the probability using Z-tables
P(Z ≤ −1.07) ≈ 0.14

SLIDE-31
• What is P(697 ≤ X ≤ 740)?
• Step 1: Use P(x1 ≤ X ≤ x2) = P(X ≤ x2) − P(X ≤ x1)

• Step 2: Calculate P(X ≤ x2) and P(X ≤ x1) as before
P(X ≤ 740) = P(Z ≤ 1) ≈ 0.84; P(X ≤ 697) = P(Z ≤ −0.48) ≈ 0.31

• Step 3: Calculate P(697 ≤ X ≤ 740) ≈ 0.84 − 0.31 = 0.53
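The table lookups in the GMAT example can be cross-checked in code. The sketch below uses `statistics.NormalDist` with the µ = 711 and σ = 29 given above. It is a sanity check rather than a replacement for understanding the Z-table steps.

```python
from statistics import NormalDist

# GMAT scores modelled as normal with the parameters from the example.
gmat = NormalDist(mu=711, sigma=29)

# Z score for 680, i.e. how many standard deviations below the mean it is.
z_680 = (680 - gmat.mean) / gmat.stdev

# P(X <= 680), read directly from the cumulative distribution function.
p_below_680 = gmat.cdf(680)

# P(697 <= X <= 740) = P(X <= 740) - P(X <= 697)
p_697_to_740 = gmat.cdf(740) - gmat.cdf(697)

print(round(z_680, 2), round(p_below_680, 2), round(p_697_to_740, 2))
```

Working from the unrounded Z scores gives values that agree with the table-based answers to two decimal places.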

SLIDE-32
Normal Quantile plot (Q-Q plot):

Used to check whether the data is normally distributed.

If the points fall roughly on a straight line (it does not have to be perfectly straight), we say the data is approximately normally distributed.
If not, the data is not normally distributed.
X-axis -> Theoretical quantiles
Y-axis -> Sample quantiles

SLIDE-33
Sampling variation
- Sample mean varies from one sample to another.
- Sample mean can be (and most likely is) different from the
population mean.
- Sample mean is a random variable.
SLIDE-34
Central Limit Theorem
The distribution of the sample mean X̄
- will be normal when the distribution of data in the population is normal
- will be approximately normal even if the distribution of data in the population is not normal, provided the sample size is fairly large
Mean(X̄) = µ (the same as the population mean of the raw data)
Standard deviation(X̄) = σ/√n, where σ is the population standard deviation and n is the sample size
- This is referred to as the standard error of the mean.
The standard error of the mean measures the variability of the sample mean from sample to sample, whereas the standard deviation measures the variability of individual observations within a single sample.
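A small simulation can make the theorem concrete. The sketch below draws repeated samples from a clearly non-normal population and compares the spread of the sample means with σ/√n. The exponential population (mean 1, standard deviation 1), the sample size and the number of repetitions are all choices made for this illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 50              # size of each sample (assumed for the example)
num_samples = 5000  # number of repeated samples

# Population: exponential with rate 1, which is strongly right-skewed
# and has mean 1 and standard deviation 1.
population_mean = 1.0
population_sd = 1.0

sample_means = []
for _ in range(num_samples):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = population_sd / math.sqrt(n)

print(mean_of_means, sd_of_means, theoretical_se)
```

The average of the sample means should land close to the population mean, and their standard deviation close to σ/√n ≈ 0.141, even though individual observations are far from normal.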

SLIDE-35
Sample Size Calculation
A sample size of 30 is often considered large enough, but that may or may not be adequate.
More precise conditions:
- n > 10·(K3)², where K3 is the sample skewness, and
- n > 10·|K4|, where K4 is the sample kurtosis
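The two conditions are easy to turn into a check. The function name and the example skewness and kurtosis values below are hypothetical, and the absolute value on the kurtosis is an assumption so that negative kurtosis is handled sensibly.

```python
def sample_size_is_adequate(n, skewness, kurtosis):
    """Check the rule-of-thumb conditions n > 10*K3^2 and n > 10*|K4|."""
    return n > 10 * skewness ** 2 and n > 10 * abs(kurtosis)


# Hypothetical cases: mild skewness passes at n = 30, strong skewness does not.
mild = sample_size_is_adequate(n=30, skewness=1.0, kurtosis=2.0)
strong = sample_size_is_adequate(n=30, skewness=2.0, kurtosis=0.0)

print(mild, strong)
```

With a skewness of 2, the rule asks for more than 40 observations, so the usual "30 is enough" guideline would fall short.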

SLIDE-36

Confidence Interval
• What is the probability of tomorrow’s temperature being exactly 42 degrees?
• The probability is 0, because temperature is a continuous variable.
• Can it be between −50 °C and 100 °C?

SLIDE-37
Case Study: Confidence Interval
• A University with 100,000 alumni is thinking of offering a new
affinity credit card to its alumni.
• Profitability of the card depends on the average balance
maintained by the card holders.
• A Market research campaign is launched, in which about 140
alumni accept the card in a pilot launch.
• Average balance maintained by these is $1990 and the standard
deviation is $2833. Assume that the population standard
deviation is $2500 from previous launches.
• What can we say about the average balance that will be held after a full-fledged market launch?
SLIDE-38
Interval estimates of parameters
• Based on sample data
− The point estimate for mean balance = $1990
− Can we trust this estimate?
• What do you think will happen if we took another random sample
of 140 alumni?
• Because of this uncertainty, we prefer to provide the estimate as
an interval (range) and associate a level of confidence with it
• Interval Estimate = Point Estimate ± Margin of Error
SLIDE-39

Confidence Interval for the Population Mean

Start by choosing a confidence level (1 − α)·100% (e.g. 90%, 95%, 99%).
Then the confidence interval for the population mean is
x̄ ± Z_(1−α) · σ/√n, where Z_(1−α) satisfies P(−Z_(1−α) ≤ Z ≤ Z_(1−α)) = 1 − α
The margin of error depends on the underlying uncertainty, the confidence level and the sample size.
SLIDE-40
Calculate the Z value for 90%, 95% & 99% confidence levels
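Instead of reading a Z-table, the critical values can be obtained from the inverse of the standard normal cumulative distribution function. A minimal sketch, where the helper name `z_critical` is made up for this example:

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z with P(-z <= Z <= z) = confidence."""
    alpha = 1 - confidence
    # Half of alpha sits in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)


z_values = {level: z_critical(level) for level in (0.90, 0.95, 0.99)}

for level, z in z_values.items():
    print(f"{level:.0%}: z = {z:.3f}")
```

The values come out at roughly 1.645, 1.960 and 2.576, so a higher confidence level means a wider interval.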
SLIDE-41
Confidence Interval Calculation
• Based on the survey and past data:
  n = 140; σ = $2500; x̄ = $1990
  σ_x̄ = σ/√n = 2500/√140 = 211.29
• Construct a 95% confidence interval for the mean card balance and interpret it.
• Construct a 90% confidence interval for the mean card balance and interpret it.
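The two intervals asked for above can be worked out as point estimate ± margin of error. The sketch below uses the case-study figures (n = 140, σ = $2500, x̄ = $1990). The function name `z_interval` is just a label chosen for this example.

```python
import math
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sigma / math.sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


ci_95 = z_interval(sample_mean=1990, sigma=2500, n=140, confidence=0.95)
ci_90 = z_interval(sample_mean=1990, sigma=2500, n=140, confidence=0.90)

print([round(v) for v in ci_95])
print([round(v) for v in ci_90])
```

The 95% interval comes out at about [$1576, $2404] and the 90% interval at about [$1642, $2338]. Accepting a lower confidence level buys a narrower interval.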

SLIDE-42
Confidence Interval Interpretation
Consider the 95% confidence interval for the mean balance:
[$1576, $2404]
Does this mean that:
- The mean balance of the population lies in this range?
- The mean balance is in this range 95% of the time?
- 95% of the alumni have a balance in this range?
Interpretation 1: The mean of the population has a 95% chance of being in the range computed from a random sample
Interpretation 2: The mean of the population will be in the computed range for 95% of random samples
SLIDE-43
What if we don’t know Sigma?
• Suppose that the alumni of this university are very different and hence the population standard deviation from previous launches cannot be used.
We replace σ with our best guess (point estimate) s, which is the standard deviation of the sample.
Calculate: T = (x̄ − µ)/(s/√n)
• If the underlying population is normally distributed, T is a random variable distributed according to a t-distribution with n − 1 degrees of freedom, T_(n−1)
• Research has shown that the t-distribution is fairly robust to deviations of the population from the normal model
SLIDE-44
Student’s t-distribution

As n → ∞,
t_n → N(0, 1)
i.e., as the degrees of freedom increase, the t-distribution approaches the standard normal distribution.
SLIDE-45
Confidence Interval for mean with unknown Sigma
When σ is known, the interval is
x̄ ± Z_(1−α) · σ/√n, where Z_(1−α) satisfies P(−Z_(1−α) ≤ Z ≤ Z_(1−α)) = 1 − α
When σ is unknown, we use the t-distribution equation instead:
x̄ ± t_(1−α, n−1) · s/√n, where t_(1−α, n−1) satisfies P(−t_(1−α, n−1) ≤ T_(n−1) ≤ t_(1−α, n−1)) = 1 − α
SLIDE-46
Calculating t-value
• Construct a 95% confidence interval for the mean card balance and interpret it.
n = 140; s = $2833; x̄ = $1990
s_x̄ = s/√n = 2833/√140 = 239.43
Calculate t_(0.95, 139) ≈ 1.98
Then the 95% confidence interval for the balance is [$1516, $2464]
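The arithmetic of this t-interval can be checked as follows. The Python standard library has no t-distribution quantile function, so the sketch takes the critical value 1.98 from the slide as an input rather than computing it. The function name `t_interval` is a label chosen for this example.

```python
import math


def t_interval(sample_mean, s, n, t_crit):
    """Confidence interval for the mean using the sample standard deviation s.

    t_crit is the two-sided critical value for n - 1 degrees of freedom,
    supplied by the caller (for example from a t-table).
    """
    standard_error = s / math.sqrt(n)
    margin = t_crit * standard_error
    return sample_mean - margin, sample_mean + margin


# Figures from the case study; 1.98 is the slide's value for 139 degrees of freedom.
standard_error = 2833 / math.sqrt(140)
low, high = t_interval(sample_mean=1990, s=2833, n=140, t_crit=1.98)

print(round(standard_error, 2), round(low), round(high))
```

The interval is wider than the known-σ interval for two reasons: s is larger than the assumed σ here, and the t critical value is slightly larger than the corresponding Z value of 1.96.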
