ST Formula Sheet Midterm
Statistical approach

Statistical methods
§ Descriptive: Description of the properties of the sample data
§ Inferential: Using data from a sample to make forecasts about a larger group

Ways of obtaining data
§ Published source
§ Designed experiment
§ Survey
§ Observational study

Steps in a statistical study
1. Identify goals.
2. Draw a sample from a population.
3. Collect raw data and summarise.
4. Make inferences about the population.
5. Draw conclusions.

Sampling techniques
§ Random sampling: Selection from the population in such a way that every different sample of the same size has an equal chance of selection
§ Systematic sampling: Selection of every kth experimental unit from a list of all experimental units
§ Stratified sampling: Identification of subgroups, selection of a random sample within each subgroup, and putting them together
§ Cluster sampling: Division of a population into clusters and random selection of some of these clusters
§ Convenience sampling: Selection of experimental units that are convenient to reach

Types of data
§ Qualitative (Categorical): Description of attributes or counts
  – Nominal: No order (e.g. hair colour)
  – Ordinal: Order on a scale (e.g. ranking)
§ Quantitative (Numerical measures)
  – Discrete: Integers (e.g. number of people)
  – Continuous: Decimals (e.g. speed)

Identifier variable = Categorical variable with the special property that there is only one case in each category (e.g. ID number)
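The sampling techniques above can be sketched in Python; the population of 100 unit IDs, the stratum split, and the sample sizes are all hypothetical choices for illustration.

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 unit IDs
random.seed(42)  # fixed seed so the sketch is reproducible

# Random sampling: every sample of the same size is equally likely
random_sample = random.sample(population, 10)

# Systematic sampling: every kth unit from the list, after a random start
k = 10
start = random.randrange(k)
systematic_sample = population[start::k]

# Stratified sampling: a random sample within each subgroup, then combined
strata = {"A": population[:50], "B": population[50:]}  # hypothetical subgroups
stratified_sample = [u for group in strata.values()
                     for u in random.sample(group, 5)]

print(len(random_sample), len(systematic_sample), len(stratified_sample))  # → 10 10 10
```

Cluster sampling would instead pick whole subgroups at random and keep every unit inside them.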
Key measures

Range = Degree of dispersion = Difference between the largest and the smallest value
Range = xlargest – xsmallest

Variance = Degree of dispersion
s² = Σⁿᵢ₌₁ (xᵢ – x̄)² / (n – 1) = [(x₁ – x̄)² + (x₂ – x̄)² + … + (xₙ – x̄)²] / (n – 1)

Standard deviation = Degree of dispersion
s = √s²
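The three measures can be computed directly from the definitions; the data values below are hypothetical.

```python
import math

data = [16, 18, 18, 22, 26]  # hypothetical sample values

# Range: largest minus smallest value
data_range = max(data) - min(data)

# Sample variance: squared deviations from the mean, divided by n - 1
n = len(data)
mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

print(data_range, variance, std_dev)  # → 10 16.0 4.0
```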
Data display

Qualitative data: (displays shown as figures on the sheet)
Quantitative data: Stem-and-leaf plot, dot plot, absolute frequency histogram, relative frequency histogram

Stem-and-leaf plot example
Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
2 | 144677
3 | 028
4 | 1
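The stem-and-leaf plot above can be built mechanically: each value's leading digit is its stem and its last digit is its leaf. A minimal sketch using the sheet's example data:

```python
from collections import defaultdict

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]  # the sheet's example data

# Group each value's leaf (last digit) under its stem (tens digit)
stems = defaultdict(list)
for x in sorted(data):
    stems[x // 10].append(x % 10)

for stem in sorted(stems):
    print(stem, "".join(str(leaf) for leaf in stems[stem]))
# prints:
# 2 144677
# 3 028
# 4 1
```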
Examining a distribution

Elements to examine: mode, skewness, unusual features, variation

Unusual features
§ Outlier
§ Cluster
§ Gap
Whenever one of those is present, it is better to use the median instead of the mean.

Variation
Coefficient of variation: CV = s / x̄
Interpretation:
CV < 1 → Low variability
CV > 1 → High variability
The higher the CV, the greater the level of dispersion around the mean.

Relative standing
Z-score: z = (x – x̄) / s

Interpretation:
z > 0 → Data value is above the mean
z < 0 → Data value is below the mean
z is close to 0 → Data value is not unusual
|z| > 2 → Data value is unusual
|z| > 3 → Data value is very unusual

pth percentile = Number such that p% of the data fall below it

Box plot: each of the four sections (minimum to Q1, Q1 to median, median to Q3, Q3 to maximum) contains 25% of the data; the box spans the interquartile range (IQR = Q3 – Q1).

Empirical Rule (only normal (bell-shaped) distributions): about 68% of the data fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.
Chebyshev's Rule (any distribution): at least 1 – 1/k² of the data fall within k standard deviations of the mean (for k > 1).
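The coefficient of variation and the z-score can be computed together from one sample; the data values below are hypothetical.

```python
import math

data = [16, 18, 18, 22, 26]  # hypothetical sample values
n = len(data)
mean = sum(data) / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Coefficient of variation: CV = s / x̄
cv = s / mean

def z_score(x):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / s

print(cv)          # → 0.2  (CV < 1: low variability)
print(z_score(26)) # → 1.5  (above the mean, not unusual)
print(z_score(16)) # → -1.0 (below the mean)
```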
Scatterplots

Each case in the data set is assigned to a dot of the form (Xi, Yi), with X as the explanatory / predictor / independent variable.

Elements to describe a scatterplot: (shown graphically on the sheet)

Interpretation:
Interpolation: Using values within the domain
Extrapolation: Using values outside the domain

Simpson's Paradox
Statistical situation in which a trend or relationship that is observed between two variables within multiple groups disappears when the groups are combined according to a third variable (lurking variable)

Chi-squared statistic / Cramér's V
Difference between the observed counts and the counts that would be expected if there were no relationship between the variables at all
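The chi-squared idea (observed counts vs the counts expected under no relationship) can be sketched for a 2×2 contingency table; the counts below are hypothetical.

```python
import math

# Hypothetical 2x2 contingency table of counts
observed = [[30, 20],   # rows: e.g. group 1 / group 2
            [20, 30]]   # columns: e.g. outcome A / outcome B

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # count expected if there were no relationship between the variables
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

# Cramér's V rescales chi-squared to the interval [0, 1]
v = math.sqrt(chi2 / (total * min(len(observed) - 1, len(observed[0]) - 1)))

print(chi2, round(v, 3))  # → 4.0 0.2
```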
Probability

Fundamental conditions
1. For any event A: 0 ≤ P(A) ≤ 1
2. P(S) = 1 (with S representing the set of all possible outcomes)

Complement
The set of outcomes that are not in the event A is called the complement of A, denoted Aᶜ.

Disjoint
Events that have no outcomes in common are called disjoint or mutually exclusive.

Independent
If the outcome of one event does not influence the outcome of another event, those events are independent.
Independent if: P(A∩B) = P(A) · P(B)

Law of large numbers
As a random trial is repeated over and over again, the proportion of times that an event occurs gets closer and closer to a single value (empirical probability).
Empirical probability (in the long run): P(A) = (number of times A occurs) / (number of trials)

Calculation rules
Complement Rule: P(Aᶜ) = 1 – P(A)
Addition Rule (for disjoint events): P(A∪B) = P(A or B) = P(A) + P(B)
Multiplication Rule (for independent events): P(A∩B) = P(A and B) = P(A) · P(B)
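The law of large numbers can be watched in a simulation: the empirical probability of rolling a six converges towards 1/6 as the number of trials grows. The trial counts below are arbitrary.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def empirical_probability(trials):
    """Proportion of simulated die rolls that come up six."""
    hits = sum(1 for _ in range(trials) if random.randint(1, 6) == 6)
    return hits / trials

# The proportion drifts towards the true probability 1/6 ≈ 0.1667
for n in (100, 10_000, 1_000_000):
    print(n, empirical_probability(n))
```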
Conditional probability

P(B|A) is the probability of event B occurring, given that event A occurs.
P(B|A) = P(B given A) = P(A∩B) / P(A)

General Addition Rule: P(A∪B) = P(A) + P(B) – P(A∩B)
(In a Venn diagram, P(A) is the whole left circle and P(B) the whole right circle; the intersection is subtracted so that it is not counted twice. For disjoint events P(A∩B) = 0, which gives the simple Addition Rule.)

Sampling without replacement (a drawn individual does not return to the pool) is an instance of working with conditional probability. When dealing with a large population, sampling without replacement does not really matter. However, in a small population, probabilities need to be adjusted accordingly.

Tree diagrams: the probabilities of the branches leaving a node add up to 1. To calculate the probability of a final outcome, all probabilities of the branches leading towards that outcome are multiplied together.

Example:
Given: P(Cancer) = 0.05, P(Smoker) = 0.10, P(Smoker|Cancer) = 0.20
→ P(Cancer|Smoker) = P(Smoker|Cancer) · P(Cancer) / P(Smoker) = (0.20 · 0.05) / 0.10 = 0.1
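The worked example above can be reproduced step by step: first recover the joint probability with P(A∩B) = P(B|A) · P(A), then condition the other way round.

```python
# Given values from the sheet's example
p_cancer = 0.05
p_smoker = 0.10
p_smoker_given_cancer = 0.20

# P(Cancer ∩ Smoker) = P(Smoker|Cancer) · P(Cancer)
p_cancer_and_smoker = p_smoker_given_cancer * p_cancer

# P(Cancer|Smoker) = P(Cancer ∩ Smoker) / P(Smoker)
p_cancer_given_smoker = p_cancer_and_smoker / p_smoker

print(round(p_cancer_given_smoker, 4))  # → 0.1
```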
Binomial model

Bernoulli trials
Trials with only two possible outcomes (success and failure), where the probabilities are p for success and q = 1 – p for failure, and for which successive trials are independent
→ The binomial model examines the number of successful trials out of a total of n Bernoulli trials

Probability
If there are n Bernoulli trials given with a probability of success p, the probability of having k successful trials can be calculated like this:
P(X = k) = B(n, p, k) = (n choose k) · pᵏ · qⁿ⁻ᵏ with (n choose k) = n! / (k! · (n – k)!)
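The binomial probability formula translates directly into code; the example values (4 fair trials, 2 successes) are arbitrary.

```python
from math import comb

def binomial_pmf(n, p, k):
    """P(X = k) = (n choose k) · p^k · q^(n-k) for n Bernoulli trials."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# e.g. probability of exactly 2 successes in 4 fair trials (p = 0.5)
print(binomial_pmf(4, 0.5, 2))  # → 0.375
```

Summing the pmf over k = 0, …, n returns 1, since some number of successes must occur.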
Poisson distribution

Case of the binomial distribution with a large number of trials (n → ∞) and a small probability of success (p → 0)

If a random variable X follows a Poisson distribution, the probability of having x events per unit of measurement is given by:
P(X = x) = (λˣ · e⁻λ) / x!

E(X) = Var(X) = λ
σ = √λ
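The Poisson pmf and its mean property E(X) = λ can be checked numerically; the rate λ = 2 is an arbitrary example value.

```python
from math import exp, factorial

def poisson_pmf(lam, x):
    """P(X = x) = λ^x · e^(−λ) / x! for a Poisson random variable."""
    return lam**x * exp(-lam) / factorial(x)

# e.g. λ = 2 events per unit: P(X = 0) = e^−2 ≈ 0.1353
print(round(poisson_pmf(2, 0), 4))  # → 0.1353

# Numerical check of E(X) = λ over a truncated support
mean = sum(x * poisson_pmf(2, x) for x in range(50))
print(round(mean, 6))  # → 2.0
```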
summaryking.escp.b1 Midterm Formula Sheet Page 4