
Data Science With Python

This document provides an overview of statistics concepts for data science, including basics, probability distributions, and advanced topics. The agenda covers the basics of statistics like data scales, variance, and standard deviation. It then discusses probability distributions like the normal, binomial, and Poisson distributions. Finally, it outlines advanced statistical concepts like sampling, inferential statistics, hypothesis testing using z-tests and t-tests. The goal is to equip learners with both basic and advanced statistics skills for data science applications in Python.

Uploaded by

Nivas Srini


Data Science with Python

Day 3 - Statistics for Data Science - Basic & Advanced

18008338228 +65 31586636 +1(973) 598-3969 44 203-808-4216 www.cognixia.com • [email protected]

Today’s Agenda

✓ Basics of Statistics
• Types of Random Variables - Based on Scale of Measurement
  o Nominal
  o Ordinal
  o Interval
  o Ratio
• Variance
• Standard Deviation

✓ Probability Distributions
• Normal Distribution
• Standard Normal Distribution and Z-Score
• Binomial Distribution
• Poisson Distribution


Basics of Statistics

Types of Random Variables - Based on Scale of Measurement

Property            NOMINAL         ORDINAL        INTERVAL                    RATIO
Order               No              Yes            Yes                         Yes
Comparison          No              Yes            Yes                         Yes
Calculation         No              No             Yes                         Yes
Regular interval    No              No             Yes                         Yes
Absolute zero       No              No             No                          Yes
Ratio meaningful    No              No             No (cannot calculate)       Yes (can calculate)
Example             Gender {M, F}   Size {S<M<L}   Temp. (0 °C = 32 °F), IQ    Height, distance
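As an illustration only (a sketch, not taken from the slides), the snippet below models a nominal and an ordinal variable with Python's standard `enum` module, and shows why a ratio of two Celsius temperatures is meaningless (interval scale) while the same ratio in kelvin is meaningful (ratio scale). The class names `Gender` and `Size` and the helper `celsius_to_kelvin` are hypothetical.

```python
from enum import Enum


class Gender(Enum):
    """Nominal: labels only; equality checks work, ordering does not."""
    M = "M"
    F = "F"


class Size(Enum):
    """Ordinal: S < M < L, but the gaps between levels carry no meaning."""
    S = 1
    M = 2
    L = 3

    def __lt__(self, other):
        # Order comparisons are only defined between Size members
        if not isinstance(other, Size):
            return NotImplemented
        return self.value < other.value


def celsius_to_kelvin(c: float) -> float:
    """Kelvin has an absolute zero, so it is a ratio scale."""
    return c + 273.15


# Nominal: asking which gender is "smaller" raises TypeError
try:
    _ = Gender.M < Gender.F
    nominal_is_ordered = True
except TypeError:
    nominal_is_ordered = False

# Ordinal: an order exists, so max() works
largest = max([Size.M, Size.L, Size.S])

# Interval: "20 °C is twice as hot as 10 °C" is NOT a valid statement
naive_ratio = 20 / 10
# Ratio: on the kelvin scale the ratio is meaningful (about 1.035)
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)
```

In pandas, an ordered `Categorical` plays the same role as `Size` here, and an unordered one the role of `Gender`.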


Basics of Statistics
Variance (σ²) and Standard Deviation (σ) – Data Spread
Variance (σ²) – the average squared deviation of the values from the mean: $\mathrm{Var}(X) = \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$
Standard Deviation (σ) – the square root of the variance: $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$

Data Point   X     X - µ   (X - µ)²
1            13    -8      64
2            12    -9      81
3            17    -4      16
4            18    -3       9
5            18    -3       9
6            21     0       0
7            26     5      25
8            30     9      81
9            29     8      64
10           28     7      49
Sum          212           398
Average (µ) ≈ 21 (exactly 21.2; the deviations above use the rounded value 21)

Variance = 398/10 = 39.8
SD = √39.8 ≈ 6.3
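A minimal sketch reproducing this slide's arithmetic in plain Python. The helper name `population_variance` is our own; `statistics.pvariance` and `statistics.pstdev` are the standard library's population (1/n) versions, used here only as a cross-check. With the exact mean 21.2 the variance is 39.76; with the rounded mean 21 it is the slide's 39.8. Both give SD ≈ 6.3.

```python
import math
import statistics

values = [13, 12, 17, 18, 18, 21, 26, 30, 29, 28]


def population_variance(xs, mu=None):
    """Var(X) = 1/n * sum((x - mu)^2); mu defaults to the exact mean."""
    if mu is None:
        mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


mean = statistics.fmean(values)                  # 21.2
var_exact = population_variance(values)          # about 39.76
var_slide = population_variance(values, mu=21)   # 39.8, using the rounded mean
sd_exact = math.sqrt(var_exact)                  # about 6.31

# Cross-check against the standard library's population (1/n) versions
assert math.isclose(var_exact, statistics.pvariance(values))
assert math.isclose(sd_exact, statistics.pstdev(values))
```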


Basics of Statistics

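Mean (µ), Variance (σ²) and Standard Deviation (σ) – Discrete Random Variable

For a discrete random variable X taking values x_i with probabilities p(x_i):

Mean (µ): $\mu = E[X] = \sum_i x_i \, p(x_i)$
Variance (σ²): $\sigma^2 = \sum_i (x_i - \mu)^2 \, p(x_i)$
Standard Deviation (σ): $\sigma = \sqrt{\sigma^2}$

https://en.wikipedia.org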
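A short sketch applying the discrete formulas to a hypothetical example (a fair six-sided die, not from the slides). `fractions.Fraction` keeps the mean and variance exact.

```python
import math
from fractions import Fraction

# Hypothetical example: a fair six-sided die, p(x) = 1/6 for x = 1..6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in pmf.items())               # E[X] = sum of x * p(x)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # sum of (x - mu)^2 * p(x)
sd = math.sqrt(var)                                   # sigma = sqrt(variance)

print(mu, var, round(sd, 4))  # 7/2 35/12 1.7078
```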


Basics of Statistics

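Mean (µ), Variance (σ²) and Standard Deviation (σ) – Continuous Random Variable

For a continuous random variable X with probability density function f(x):

Mean (µ): $\mu = E[X] = \int_{-\infty}^{\infty} x \, f(x)\,dx$
Variance (σ²): $\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \, f(x)\,dx$
Standard Deviation (σ): $\sigma = \sqrt{\sigma^2}$

https://en.wikipedia.org/wiki/Mode_(statistics)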
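The integrals can be approximated numerically. This sketch assumes a hypothetical density f(x) = 2x on [0, 1] (exact answers: µ = 2/3, σ² = 1/18) and uses a simple midpoint rule; `integrate` is our own helper, not a library function.

```python
import math


def pdf(x: float) -> float:
    """Hypothetical density f(x) = 2x on [0, 1] (it integrates to 1)."""
    return 2 * x if 0 <= x <= 1 else 0.0


def integrate(g, a: float, b: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


mu = integrate(lambda x: x * pdf(x), 0, 1)               # exact value: 2/3
var = integrate(lambda x: (x - mu) ** 2 * pdf(x), 0, 1)  # exact value: 1/18
sd = math.sqrt(var)
```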


Probability Distribution
Normal Distribution
The normal (Gaussian, or bell-shaped) distribution is a very common continuous probability distribution. Its curve is bell-shaped: it is symmetric and unimodal, with the highest density at and around the mean.
Ex. Age, Marks
Some Important properties :
• Mean = Median = Mode

• Area within 1 Std. Dev around the mean ~ 68.3 %

• Area within 2 Std. Dev around the mean ~ 95.4 %

• Area within 3 Std. Dev around the mean ~ 99.7 %

https://en.wikipedia.org/wiki/Normal_distribution
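The 68.3 / 95.4 / 99.7 % figures can be checked with `statistics.NormalDist` from the standard library. The marks distribution (mean 65, SD 10) is a hypothetical example used to show Mean = Median = Mode.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area within k standard deviations of the mean: P(-k <= Z <= k)
areas = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD: {area:.1%}")   # 68.3%, 95.4%, 99.7%

# Symmetry: mean, median and mode coincide (hypothetical marks distribution)
marks = NormalDist(mu=65, sigma=10)
assert marks.mean == marks.median == marks.mode == 65
```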


Probability Distribution
Standard Normal Distribution and z Score/ z Statistic
A special case of the normal distribution with mean = 0 and variance = 1 (so the standard deviation is also 1). The total area under the curve is 1, and areas under the curve represent probabilities.
Any normal distribution can be converted into the standard normal distribution by applying the following transformation:

$z = \frac{x - \mu}{\sigma}$ {This is called the z-score; it tells us how many standard deviations x lies from the mean}

https://www.mathsisfun.com/data/standard-normal-distribution.html
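A sketch of the z-score transformation, assuming a hypothetical marks distribution with mean 70 and SD 10. `NormalDist.zscore` (Python 3.9+) computes the same quantity, and the standard normal CDF then gives the probability.

```python
import math
from statistics import NormalDist

# Hypothetical marks distribution: mean 70, standard deviation 10
marks = NormalDist(mu=70, sigma=10)
x = 85

z = (x - marks.mean) / marks.stdev        # z = (x - mu) / sigma = 1.5
assert math.isclose(z, marks.zscore(x))   # zscore() requires Python 3.9+

# Probability of scoring at most 85, read from the standard normal curve
p_at_most = NormalDist().cdf(z)           # about 0.9332
```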


Probability Distribution

Binomial Distribution
Bi → 2, Nomial → Nominal → only 2 possible outcomes (success or failure).
When we repeat an experiment multiple times and are interested in the number of successes, the experiment is known as a binomial experiment, e.g., flipping a coin multiple times. The binomial distribution answers probability questions for any binomial experiment.
Probability of getting x successes in n trials using the binomial distribution:
$P(x) = \binom{n}{x} p^x (1 - p)^{n - x}$, where p = probability of success in one trial
Some Important properties :
▪ A fixed number of trials, n
▪ Only 2 possible, mutually exclusive outcomes per trial
▪ The probability of success stays the same throughout the experiment
▪ All trials are independent
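A minimal sketch of the binomial formula using `math.comb`; the helper `binomial_pmf` and the 10-coin-flip example are our own illustration.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Hypothetical example: 10 fair coin flips, probability of exactly 5 heads
p5 = binomial_pmf(5, n=10, p=0.5)   # 252 / 1024 = 0.24609375

# Sanity checks: probabilities sum to 1 and the mean equals n * p = 5
probs = [binomial_pmf(x, 10, 0.5) for x in range(11)]
total = sum(probs)
mean = sum(x * px for x, px in enumerate(probs))
```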


Probability Distribution
Poisson Distribution
Used to find the probability of a given number of occurrences of an event within a specified interval of time (or within some other fixed unit, such as a length, area, or volume).

Probability of x occurrences using the Poisson distribution:

$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$, where λ = mean (expected) number of occurrences in the interval

Some Important properties :
▪ All occurrences are independent
▪ The expected number of occurrences (λ) does not change over the interval
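A sketch of the Poisson formula with the standard `math` module. The help-desk rate of 3 calls per hour is a hypothetical example, and `poisson_pmf` is our own helper.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)


# Hypothetical example: a help desk receives on average 3 calls per hour
lam = 3
p_none = poisson_pmf(0, lam)                                  # e^-3, about 0.0498
p_two = poisson_pmf(2, lam)                                   # 4.5 * e^-3, about 0.2240
p_at_most_two = sum(poisson_pmf(x, lam) for x in range(3))   # about 0.4232
```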


“Qs & As”

Day 4 - Statistics for Data Science - Advanced


Today’s Agenda

✓ Inferential Statistics
• Sampling
• Inferential Statistics
• Sampling Distribution
• Central Limit Theorem
• Central Limit Theorem Exercise

✓ Hypothesis Testing
• Hypothesis and Hypothesis Testing
• One-Tailed/Two-Tailed Tests
• Type I and Type II Errors
• Hypothesis Testing Using the z-Test
• Hypothesis Testing Using the t-Test


Advanced Statistics

Sampling
Sampling means drawing random samples from the overall population. We sample in order to make judgements about the overall population, because it is often impossible or impractical to analyze the entire population, and a sufficiently large sample gives approximately the same results.

[Figure: several random samples drawn from one population]
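A sketch of simple random sampling with the standard `random` module. The population here is synthetic (100,000 hypothetical values with mean about 50 and SD about 10); the point is only that a 1,000-value sample gives a mean close to the population mean.

```python
import random
from statistics import fmean

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population: 100,000 values with mean ~50 and SD ~10
population = [random.gauss(50, 10) for _ in range(100_000)]

# Simple random sample of 1,000 values, drawn without replacement
sample = random.sample(population, k=1_000)

population_mean = fmean(population)
sample_mean = fmean(sample)  # close to population_mean, from 1% of the data
```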


Advanced Statistics

Inferential Statistics
With inferential statistics, we try to reach conclusions that extend beyond the immediate data alone. For instance, we use inferential statistics to infer from sample data what the overall population might think, or to make probabilistic judgments about the overall population. Using a single sample statistic to estimate a population parameter is known as point estimation.

Point Estimators → Population Parameters (Point Estimation)
• Sample mean (x̄) → population mean (µ)
• Sample standard deviation (s) → population standard deviation (σ)
[Figure: a sample drawn from the population]
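A sketch of point estimation, treating the ten values from the variance slide as a sample from some larger population (an assumption made only for illustration). `statistics.stdev` divides by n - 1 (Bessel's correction), which is the usual choice when s is used to estimate σ.

```python
import math
from statistics import mean, stdev

# Treat the ten values from the variance slide as a sample from a larger population
sample = [13, 12, 17, 18, 18, 21, 26, 30, 29, 28]

x_bar = mean(sample)  # point estimate of the population mean (mu)
s = stdev(sample)     # point estimate of sigma, with n - 1 in the denominator

# The same estimate written out explicitly
s_manual = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (len(sample) - 1))
```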


Advanced Statistics

Sampling Distribution
A sampling distribution is the distribution of a statistic (for example, the mean) over many random samples taken from the same population; we use it to make judgments about the overall population. Different samples taken from the same population can show different characteristics; this is known as sampling variability. The larger the sample size, the lower the variability.

Expected Value E(x̄) or Sampling Mean (µ_x̄) –

We take multiple samples from the overall population and analyze the distribution of their means to make decisions about the population. The mean of these sample means is known as the expected value (or sampling mean), and it can be used as an estimate of the overall population mean (µ).
E(x̄) = µ; as more samples are taken, the average of the observed sample means gets closer to µ.

Standard Error of the Mean (σ_x̄) –

The standard deviation of the sampling distribution of the mean is known as the standard error of the mean: σ_x̄ = σ/√n.
As the sample size n becomes very large, the standard error of the mean approaches 0.
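A simulation sketch of a sampling distribution, assuming a hypothetical population that is uniform on [0, 1] (so σ = 1/√12 ≈ 0.289). For each sample size n it draws 5,000 samples and compares the observed spread of the sample means with σ/√n; `sample_means` is our own helper.

```python
import random
import statistics

random.seed(7)
# Hypothetical population: uniform on [0, 1], so sigma = 1 / sqrt(12), about 0.2887
sigma = 1 / 12 ** 0.5


def sample_means(n, num_samples=5_000):
    """Means of num_samples random samples, each of size n."""
    return [statistics.fmean(random.random() for _ in range(n)) for _ in range(num_samples)]


results = {}
for n in (4, 16, 64):
    means = sample_means(n)
    results[n] = (
        statistics.fmean(means),  # mean of the sample means: close to 0.5
        statistics.stdev(means),  # observed standard error
        sigma / n ** 0.5,         # theoretical standard error sigma / sqrt(n)
    )
```

Each fourfold increase in n should roughly halve the standard error, matching σ/√n.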


Advanced Statistics

Central Limit Theorem (CLT)

"The sample mean will be approximately normally distributed for a sufficiently large sample size, regardless of the original distribution from which we are taking samples."
The sampling distribution of the mean has:
Mean = population mean (µ)
SD (standard error) = σ/√n {if σ is not known, then SD = s/√n, where s = sample SD}

Application of CLT
We can use standard normal distribution concepts for any non-normal population by taking samples, because according to the CLT the sample means will be approximately normally distributed for a large sample size.
From the CLT we know the sampling SD: σ_x̄ = σ/√n
From the standard normal distribution we know: z = (x - µ)/σ
So for the sampling distribution of the mean we can write z = (x̄ - µ)/(σ/√n), and we can now calculate probabilities using the standard normal distribution (SND) for any non-normal population.
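A simulation sketch of the CLT, assuming a hypothetical, clearly skewed population: exponential with mean 1 and SD 1. It standardizes 10,000 sample means (n = 50) with z = (x̄ - µ)/(σ/√n) and checks how many fall within ±1 and ±2, which should be close to the standard normal areas of about 68.3 % and 95.4 %.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)
# Hypothetical skewed population: exponential with mean 1 and SD 1
mu, sigma, n, reps = 1.0, 1.0, 50, 10_000
se = sigma / n ** 0.5

# z = (x_bar - mu) / (sigma / sqrt(n)) for many independent samples
z_scores = [
    (fmean(random.expovariate(1.0) for _ in range(n)) - mu) / se
    for _ in range(reps)
]

within_1 = sum(abs(z) <= 1 for z in z_scores) / reps
within_2 = sum(abs(z) <= 2 for z in z_scores) / reps
expected_1 = NormalDist().cdf(1) - NormalDist().cdf(-1)  # about 0.683
```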


Advanced Statistics

Exercise - Central Limit Theorem

A large freight elevator can transport a maximum of 9800 pounds. Suppose a load of cargo containing 49 boxes must be transported via the elevator. Experience
has shown that the weight of boxes of this type of cargo follows a distribution with mean µ = 205 pounds and standard deviation σ = 15 pounds. Based on this
information, what is the probability that all 49 boxes can be safely loaded onto the freight elevator and transported?
Solution – Given: µ = 205, σ = 15, n = 49; average total weight = 49 × 205 = 10045 > 9800.
We know nothing about whether the original distribution is normal, but from the CLT we know the sample mean (and hence the total of the 49 weights) will be approximately normally distributed.
We are interested in the weight of 49 boxes, not 1, so let's calculate µ and σ for the total weight of 49 boxes:

µ_total = n × µ = 10045, σ_total = σ × √n = 15 × 7 = 105. We need the probability that the total weight is <= 9800, so X = 9800.
Z = (X - µ_total) / σ_total
= (9800 - 10045) / 105 {equivalently, for the mean box weight: (9800/49 - 205) / (15/√49) = (200 - 205) / (15/7)}
= -245 / 105
Z ≈ -2.33
Using a z table, P(Z <= -2.33) ≈ 0.0099, so there is only about a 1% chance that all 49 boxes can be safely transported together.
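The exercise can be checked with `statistics.NormalDist`. The exact z is -7/3 ≈ -2.333, so the computed probability (about 0.0098) differs slightly from the z-table value for the rounded z = -2.33 (0.0099).

```python
from statistics import NormalDist

mu, sigma, n, capacity = 205, 15, 49, 9800

# Distribution of the total weight of n boxes (approximately normal by the CLT)
total_mean = n * mu            # 10045
total_sd = sigma * n ** 0.5    # 15 * 7 = 105

z = (capacity - total_mean) / total_sd  # -245 / 105, about -2.33
p_safe = NormalDist().cdf(z)            # about 0.0098

# Equivalent view via the sample mean: x_bar must be <= 9800 / 49 = 200
z_mean = (capacity / n - mu) / (sigma / n ** 0.5)  # same z
```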


Advanced Statistics

Hypothesis
A hypothesis (plural: hypotheses) is a proposed explanation for a phenomenon. In statistics, a hypothesis can be any claim about the data that we want to test (generally, to reject it or fail to reject it). We will mainly work with two types of hypotheses:

1. Null Hypothesis (H0) – the current assumption or theory, which is taken to be correct unless the data show otherwise
2. Alternative Hypothesis (H1) – the claim or theory that we want to prove

Ex. H0: when flipping a coin, the probability of getting heads is 0.5; H1: the probability of getting heads is less than 0.5


Advanced Statistics

Hypothesis Testing
Testing the null hypothesis (H0) against an alternative hypothesis (H1) using sample data. Steps involved in hypothesis testing using P values:

1. Define your null and alternative hypotheses, H0 & H1
2. Decide the type of test (one-tailed or two-tailed)
3. Choose the level of significance α, commonly 0.05 or 0.01
4. Compute the test statistic TS (z-test or t-test)
5. Find the P value
6. If P < α, reject the null hypothesis in favor of the alternative hypothesis; otherwise, fail to reject H0

• P value – the probability of getting the observed sample, or an even more extreme one, if the null hypothesis is true
• Significance level (α) – the borderline P value: results with P < α lead us to reject H0
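The six steps can be walked through on the coin example from the previous slide. This sketch assumes hypothetical data (40 heads in 100 flips) and, instead of a z- or t-test, uses an exact binomial test in which the test statistic is simply the number of heads; `binom_cdf` is our own helper.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Step 1: H0: p(heads) = 0.5, H1: p(heads) < 0.5
# Hypothetical data: 40 heads in 100 flips
n, heads = 100, 40
# Step 2: one-tailed (left-tailed) test, because H1 says "less than"
# Step 3: significance level
alpha = 0.05
# Steps 4-5: test statistic = number of heads; P value = P(X <= 40) under H0
p_value = binom_cdf(heads, n, 0.5)  # about 0.028
# Step 6: decision
reject_h0 = p_value < alpha
```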


Advanced Statistics

Type I Error or False Positive

Getting a positive result when the result should be negative in reality.

Rejecting the null hypothesis (H0) when H0 is true and should not be rejected. The probability of a Type I error is known as the alpha (α) risk.

Type II Error or False Negative

Getting a negative result when the result should be positive in reality.

Failing to reject the null hypothesis (H0) when H0 is false and should be rejected. The probability of a Type II error is known as the beta (β) risk.
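A simulation sketch of alpha and beta risk, under hypothetical settings (H0: µ = 100, σ = 15 known, n = 30, α = 0.05, two-sided z-test). When the data really come from µ = 100, every rejection is a Type I error, so the rejection rate should be close to α. When they come from µ = 105, every non-rejection is a Type II error (β should be roughly 0.55 here). The helpers `z_test_p_value` and `rejection_rate` are our own.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided one-sample z-test P value, with sigma assumed known."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mu, mu0=100, sigma=15, n=30, alpha=0.05, trials=10_000):
    """Fraction of simulated samples (true mean true_mu) in which H0: mu = mu0 is rejected."""
    rejections = 0
    for _ in range(trials):
        sample = [random.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / trials


random.seed(3)
# H0 true (true mean = 100): every rejection is a Type I error
alpha_risk = rejection_rate(true_mu=100)
# H0 false (true mean = 105): every failure to reject is a Type II error
beta_risk = 1 - rejection_rate(true_mu=105)
```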


Advanced Statistics

Z-Test for Hypothesis Testing

A z-test is used for hypothesis testing when we know the population standard deviation (σ) or the sample size is large (n >= 30). Steps for a one-sample z-test:

1. Define your null and alternative hypotheses, H0 & H1
2. Choose the level of significance α, commonly 0.05 or 0.01
3. Compute the test statistic TS = (x̄ - µ0)/(σ/√n), where µ0 is the mean under H0, or TS = (x̄ - µ0)/(s/√n) when σ is not known but n >= 30
4. Find the P value from a z table using TS; for a two-sided test, double the one-tailed P value
5. If P < α, reject the null hypothesis in favor of the alternative hypothesis; otherwise, fail to reject H0
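A sketch of the one-sample z-test steps using `statistics.NormalDist` in place of a printed z table. The function `one_sample_z_test` and the data (H0: µ = 50, n = 64, x̄ = 52, σ = 8) are hypothetical.

```python
from statistics import NormalDist


def one_sample_z_test(x_bar, mu0, sd, n, alternative="two-sided"):
    """Return (TS, P value) for a one-sample z-test.

    sd is sigma if known, otherwise the sample SD s (reasonable when n >= 30).
    alternative: "two-sided", "greater" (H1: mu > mu0) or "less" (H1: mu < mu0).
    """
    ts = (x_bar - mu0) / (sd / n ** 0.5)
    std_normal = NormalDist()
    if alternative == "greater":
        p = 1 - std_normal.cdf(ts)
    elif alternative == "less":
        p = std_normal.cdf(ts)
    else:  # two-sided: double the one-tail area
        p = 2 * (1 - std_normal.cdf(abs(ts)))
    return ts, p


# Hypothetical data: H0: mu = 50; sample of n = 64 with x_bar = 52; sigma = 8 known
ts, p = one_sample_z_test(x_bar=52, mu0=50, sd=8, n=64)
alpha = 0.05
reject_h0 = p < alpha  # TS = 2.0, P about 0.0455, so H0 is rejected at the 5% level
```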


Advanced Statistics

T-Test for Hypothesis Testing

A t-test is used for hypothesis testing when we do not know the population standard deviation (σ) and the sample size is small (n < 30). Steps for a one-sample t-test:

1. Define your null and alternative hypotheses, H0 & H1
2. Choose the level of significance α, commonly 0.05 or 0.01
3. Calculate the degrees of freedom, DF = n - 1
4. Compute the test statistic TS = (x̄ - µ0)/(s/√n), where µ0 is the mean under H0 and s is the sample SD
5. Find the P value (or a P range) from a t table using the calculated TS and DF
6. If P < α, reject the null hypothesis in favor of the alternative hypothesis; otherwise, fail to reject H0
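The Python standard library has no t distribution, so this sketch computes the two-sided P value by integrating Student's t density with Simpson's rule instead of reading a t table. `t_pdf`, `t_two_sided_p`, `one_sample_t_test` and the sample data are hypothetical. In practice, `scipy.stats.ttest_1samp` performs the same test.

```python
import math
from statistics import fmean, stdev


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def t_two_sided_p(t, df, steps=10_000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density over [0, |t|] (Simpson's rule)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    s = t_pdf(a, df) + t_pdf(b, df)
    s += sum((4 if i % 2 else 2) * t_pdf(a + i * h, df) for i in range(1, steps))
    return 1 - 2 * (s * h / 3)


def one_sample_t_test(sample, mu0):
    """Steps 3-5: degrees of freedom, test statistic and two-sided P value."""
    n = len(sample)
    df = n - 1
    ts = (fmean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return ts, df, t_two_sided_p(ts, df)


# Hypothetical sample of 8 measurements, H0: mu = 12
sample = [12.1, 11.8, 12.4, 12.0, 11.6, 12.3, 11.9, 12.2]
ts, df, p = one_sample_t_test(sample, mu0=12)
reject_h0 = p < 0.05
```

As a sanity check, the t-table critical value 2.262 for DF = 9 should give a two-sided P value of about 0.05.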


“Qs & As”
