Statistics
Weronika Przysiężna, Mikołaj Straszewicz, Piotr Wilkowski
26.03.2025
1 / 28
Population vs sample
A population is the whole group that is of interest to us (e.g., all trees in a
forest, all inhabitants of a country).
A sample is a subset of a population that is meant to be representative of
the whole set. The size of the sample is crucial for the accuracy and
reliability of further research (e.g., 50 trees from a forest, 1000 citizens of
a country).
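As a minimal sketch (with hypothetical tree-height data), drawing a simple random sample in Python might look like this:

```python
import random

random.seed(0)  # for reproducibility

# Hypothetical population: heights (in meters) of 10,000 trees in a forest
population = [random.uniform(20, 50) for _ in range(10_000)]

# A simple random sample of 50 trees, drawn without replacement
sample = random.sample(population, 50)
```

random.sample gives every unit the same chance of selection, which matches the randomness requirement for representative samples.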
Representativeness of the Group
Representativeness of a sample means that its features match the
characteristics of the population from which the sample was drawn.
For example, if we know that the mean height of trees in a forest is 35
meters, we can expect that the mean height of trees in the sample will
be approximately 35 meters.
Properties of representative samples:
randomness - every unit in the population has an equal probability
of being included in the sample,
quantity - larger sample sizes lead to more accurate and reliable
results.
Estimators
A parameter represents a numerical characteristic of a population, such
as the mean, median, variance, or standard deviation, and it remains
constant for that population unless the population itself changes.
An estimator is a function used to approximate the value of a
parameter from a sample. A good estimator is:
Consistent - increasing the sample size increases the probability of
the estimator being close to the population parameter,
Efficient - the estimator has the smallest possible error (variance),
Unbiased - the expected value of the estimator equals the true
parameter.
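A quick simulation (hypothetical standard-normal data) can illustrate the unbiasedness property: the sample variance that divides by n − 1 is unbiased, while dividing by n systematically underestimates the population variance.

```python
import random

random.seed(1)
population = [random.gauss(0, 1) for _ in range(100_000)]  # true variance ≈ 1

def var_biased(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)        # divides by n

def var_unbiased(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # divides by n - 1

# Average each estimator over many small samples of size 5
samples = [random.sample(population, 5) for _ in range(20_000)]
avg_biased = sum(var_biased(s) for s in samples) / len(samples)
avg_unbiased = sum(var_unbiased(s) for s in samples) / len(samples)
# avg_unbiased lands near the true value 1; avg_biased near (n-1)/n = 0.8
```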
Measurement error
A measurement error is the difference between an approximation and
the true value.
Standard error is a theoretical term that represents the approximation
error of a parameter if the research were repeated multiple times. In
other words, this value reflects the variation in a measurement due to
differences between samples.
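The "repeated research" idea can be simulated directly; the sketch below (hypothetical data, arbitrary parameters) draws many samples and measures how much their means vary:

```python
import random
import statistics

random.seed(2)
population = [random.gauss(100, 15) for _ in range(50_000)]

# Repeat the "study" 5,000 times with a sample of 30 each time
sample_means = [statistics.mean(random.sample(population, 30))
                for _ in range(5_000)]

# The spread of the sample means is the standard error of the mean;
# theory predicts about sigma / sqrt(n) = 15 / sqrt(30) ≈ 2.74
empirical_se = statistics.stdev(sample_means)
```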
Optimal sample size
The optimal sample size is often quoted as 30, since this number is
both easy to collect and usually large enough to ensure accuracy and
reliability.
However, the rule of 30 does not always apply, as it depends on the
specific characteristics of the population being studied.
Figure: Standard error and sample size relation
Histogram
An easy and graphical way to present research results is to show them in
a histogram. This method allows us to identify outcomes that
significantly deviate from others.
Figure: Exemplary histogram
Normal distribution
A normal distribution or Gaussian distribution is a type of
continuous probability distribution for a real-valued random variable.
The general form of its probability density function is
f(x) = (1 / (σ√(2π))) · e^(−(x−µ)² / (2σ²)), where µ is the mean and σ is the standard deviation. The
normal distribution is easy to analyze due to its mathematical simplicity
and predictable properties, making it an extremely practical tool in
statistics and data analysis.
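The density formula translates directly into code; this sketch evaluates it using only Python's standard math module:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of the normal distribution with mean mu and std dev sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# For the standard normal N(0, 1) the peak at the mean is
# 1 / sqrt(2*pi) ≈ 0.3989, and the curve is symmetric around the mean
peak = normal_pdf(0.0)
```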
The Central Limit Theorem
The Central Limit Theorem states: regardless of the shape of the
population distribution, if random and independent measurements are
taken from it, the distribution of sample means will approach a normal
distribution - and the more observations we collect, the closer it gets to
normality.
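A small simulation (with a uniform distribution chosen as a deliberately non-normal population) shows the theorem in action: the means of repeated samples cluster tightly and symmetrically around the population mean.

```python
import random
import statistics

random.seed(3)

# Population distribution: uniform on [0, 1] (flat, clearly not normal),
# with mean 0.5 and standard deviation sqrt(1/12) ≈ 0.2887
def sample_mean(n):
    return statistics.mean(random.random() for _ in range(n))

means = [sample_mean(40) for _ in range(3_000)]

# CLT: the sample means are approximately normal around 0.5,
# with spread about sigma / sqrt(40) ≈ 0.0456
center = statistics.mean(means)
spread = statistics.stdev(means)
```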
Figure: Sampling distribution and sample size
Interval estimation
Calculating probabilities for specific values of the standard normal distribution:
Standardizing the normal probability distribution makes it possible
to accurately estimate the level of "trust," known in statistics as
the confidence level (typically set at 95%).
Based on the chosen confidence level (e.g., 95%), a specific range
within the standard normal distribution is called the confidence
interval. In a normal distribution, 95% of the values fall within
±1.96 standard deviations from the mean.
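Using the ±1.96 rule, a 95% confidence interval for a mean can be sketched as follows (the measurement data here is hypothetical):

```python
import math
import statistics

data = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]  # hypothetical sample

mean = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(len(data))  # standard error of the mean

# 95% of the standard normal lies within ±1.96 standard deviations,
# so the interval is mean ± 1.96 standard errors
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se
```

For small samples, a t-distribution multiplier would normally replace 1.96; the z value is used here only to mirror the ±1.96 rule above.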
Null Hypothesis Significance Testing (NHST)
In this approach, verification is not about determining the probability
that the alternative hypothesis is true — the one claiming that an effect
exists. Instead, it is about rejecting the null hypothesis, which states
that there is no effect.
research hypotheses often assume very specific effects (directional
hypotheses),
statistical hypotheses (typically for two-tailed tests) only describe
general relationships (non-directional hypotheses).
Null and Alternative Hypotheses
Null Hypothesis (H0): No effect or difference.
Alternative Hypothesis (HA): There is an effect or difference.
H0: µ1 = µ2
HA: µ1 ≠ µ2
The Greek symbols used here (µ1, µ2) indicate that we are referring to
population means, not sample means.
According to the NHST approach, the null hypothesis takes precedence
over the alternative hypothesis, because it is the one that is actually tested.
Accepting (more precisely: having no grounds to reject) or
rejecting the null hypothesis is not evidence of the non-existence
or existence of a particular effect or relationship!
Type I and Type II Errors
Type I Error (α): Rejecting H0 when it is true (False Positive).
Type II Error (β): Failing to reject H0 when HA is true (False Negative).
Significance Level (α) and Confidence Level
Significance Level (α): Probability of a Type I error, commonly
set at 5% (sometimes 1% or 10%).
Confidence Level (1 − α): Percentage of confidence intervals
(e.g., 95% for a 95% level) that, estimated from an
infinite number of repetitions of a given study, contain the true
value of the estimated parameter.
Confidence Intervals
Confidence intervals (CI) are used to illustrate how reliable the estimator
obtained in the study is and are related to the standard error.
Power of the Test
it is the probability of avoiding a Type II error
the greater the test’s power, the better its ability to reject the null
hypothesis (if it is indeed false!)
the sample should be chosen to achieve a power of at least 80%
power = 1 − β
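Empirical power can be estimated by simulation: generate many studies in which the effect really exists and count how often H0 is rejected. The sketch below (arbitrary effect size and sample size, z-test approximation) illustrates this:

```python
import random
import statistics

random.seed(4)

def one_study(effect=0.5, n=30):
    """Simulate one two-group study where a true effect of size `effect` exists."""
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(effect, 1.0) for _ in range(n)]
    se = ((statistics.variance(a) + statistics.variance(b)) / n) ** 0.5
    z = (statistics.mean(b) - statistics.mean(a)) / se
    return abs(z) > 1.96  # two-tailed rejection at alpha = 0.05

# Power = fraction of simulated studies that (correctly) reject H0
power = sum(one_study() for _ in range(2_000)) / 2_000
# With these settings the power comes out well below the 80% target,
# signalling that a larger sample would be needed
```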
Test Statistic
calculated from sample data; used to decide if H0 should be rejected
when running a statistical test, we calculate the probability of
obtaining a specific value of the test statistic based on our sample
the smaller the probability (p-value) of obtaining your result
under H0 , the stronger the indication that your result is
significant and that the null hypothesis might be false
p-Value and Critical Value
If the probability of getting a test statistic at least as extreme as the one
calculated from the sample is less than 5% (p < 0.05), we reject the null
hypothesis.
p-value: probability of getting data as extreme as observed,
assuming H0 is true,
critical value: the point beyond which we consider the result
meaningful, not due to chance.
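For a z test statistic, the p-value can be computed with nothing but the standard library's error function; a minimal sketch:

```python
import math

def phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_p(z):
    """Two-tailed p-value: probability of a statistic at least as extreme as z."""
    return 2.0 * (1.0 - phi(abs(z)))

# A statistic exactly at the critical value 1.96 gives p ≈ 0.05;
# anything more extreme gives p < 0.05, so H0 is rejected
p_at_critical = two_sided_p(1.96)
```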
Example: If p = 0.03 and α = 0.05, we reject H0 .
Common mistakes
Common mistakes in reporting and interpreting statistical results:
confusing p-value with significance level α,
wrong interpretation of the p-value,
ignoring sample size effects,
misinterpreting non-significant results,
low-powered tests leading to overestimated effects,
publication bias / winner’s curse.
Statistical Hypothesis Testing Process
Mean (Average)
Definition: The mean, or average, is the sum of all data values divided
by the number of values.
Formula:
x̄ = (1/n) ∑ xᵢ,  summing over i = 1, …, n
Example:
For the data: 5, 7, 8, 10, 10
Mean = (5 + 7 + 8 + 10 + 10) / 5 = 8
Use: Useful for understanding the central tendency of a dataset.
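The same calculation with Python's standard statistics module, using the slide's data:

```python
import statistics

data = [5, 7, 8, 10, 10]
mean = statistics.mean(data)  # (5 + 7 + 8 + 10 + 10) / 5 = 8
```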
Median
Definition: The median is the middle value of an ordered dataset. If
there is an even number of values, it is the average of the two middle
ones.
Example:
Data: 3, 5, 7, 8, 10
Median = 7
Data: 3, 5, 7, 8
Median = (5 + 7) / 2 = 6
Use: Useful when data contains outliers.
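Both cases from the slide, reproduced with the standard statistics module:

```python
import statistics

odd_median = statistics.median([3, 5, 7, 8, 10])  # odd count: middle value, 7
even_median = statistics.median([3, 5, 7, 8])     # even count: (5 + 7) / 2 = 6.0
```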
Quantiles
Definition: Quantiles divide a dataset into equal-sized intervals.
Common quantiles include quartiles (4 parts), deciles (10 parts), and
percentiles (100 parts).
Example:
The first quartile (Q1) is the 25th percentile, the median (Q2) is the
50th percentile, and the third quartile (Q3) is the 75th percentile.
Use: Helps describe the distribution and spread of data.
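statistics.quantiles can compute the quartile cut points; note that the exact values depend on the interpolation method (the data below is illustrative):

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# n=4 asks for quartiles: the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4)
# Q2 is the 50th percentile and always agrees with the median
```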
Mode (Dominant Value)
Definition: The mode is the value that appears most frequently in a
dataset. A dataset can have more than one mode.
Example:
Data: 3, 4, 4, 5, 6, 6, 6, 7
Mode = 6
Use: Useful for categorical data or to identify common values.
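The standard library covers both cases: mode for a single most frequent value, and multimode when the dataset has more than one (the tied data here is illustrative):

```python
import statistics

data = [3, 4, 4, 5, 6, 6, 6, 7]
mode = statistics.mode(data)                    # 6 appears most often
modes = statistics.multimode([1, 1, 2, 2, 3])   # ties: returns [1, 2]
```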
Variance
Definition: Variance measures how far data values are spread out from
the mean.
Formula:
σ² = (1/n) ∑ (xᵢ − x̄)²,  summing over i = 1, …, n
Example:
If data = 2, 4, 4, 4, 5, 5, 7, 9 and mean = 5,
then variance = average of squared differences from 5
= (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8 = 4
Use: Important in probability and statistical modeling.
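The slide's example checked in code; pvariance divides by n (population variance), matching the formula above, while statistics.variance would divide by n − 1:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
var = statistics.pvariance(data)  # population variance, dividing by n: 4
```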
Standard Deviation
Definition: Standard deviation is the square root of the variance. It
indicates how much the values typically differ from the mean.
Formula:
σ = √( (1/n) ∑ (xᵢ − x̄)² ),  summing over i = 1, …, n
Use: Easier to interpret than variance because it has the same unit as
the data. Useful for comparing variability.
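Continuing the variance example, the population standard deviation comes from pstdev:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
sd = statistics.pstdev(data)  # sqrt of the population variance: sqrt(4) = 2.0
```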