Formulas

The document appears to be a chapter containing formulas, R commands, and descriptions related to statistics and probability. It includes formulas and explanations for key concepts like mean, median, variance, standard deviation, probability density functions, cumulative distribution functions, and how to calculate these values and properties for discrete and continuous random variables. It also provides the R commands for calculating some of these common statistical measures.

Uploaded by

Samantha Marie Rebolledo

Chapter A

Appendix A

Collection of formulas and R commands

Contents

A Collection of formulas and R commands
A.1 Introduction, descriptive statistics, R and data visualization
A.2 Probability and Simulation
A.2.1 Distributions
A.3 Statistics for one and two samples
A.4 Simulation based statistics
A.5 Simple linear regression
A.6 Multiple linear regression
A.7 Inference for proportions
A.8 Comparing means of multiple groups - ANOVA

Glossaries

Acronyms

This appendix chapter holds a collection of formulas. All the relevant equations from definitions, methods and theorems are included, along with the associated R functions. They appear in the same order as in the book, except for the distributions, which are listed together.

A.1 Introduction, descriptive statistics, R and data visualization

Each entry gives the description, the formula and the R command.

1.4 Sample mean
The mean of a sample.
Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
R command: mean(x)

1.5 Sample median
The value that divides a sample in two halves with an equal number of observations in each.
Formula: $Q_2 = x_{\left(\frac{n+1}{2}\right)}$ for odd $n$; $Q_2 = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n+2}{2}\right)}}{2}$ for even $n$
R command: median(x)

1.7 Sample quantile
The value that divides a sample such that a proportion p of the observations are less than the value. The 0.5 quantile is the median.
Formula: $q_p = \frac{x_{(np)} + x_{(np+1)}}{2}$ for $pn$ integer; $q_p = x_{(\lceil np \rceil)}$ for $pn$ non-integer
R command: quantile(x, p, type=2)

1.8 Sample quartiles
The quartiles are the five quantiles dividing the sample in four parts, such that each part holds an equal number of observations.
Formula: $Q_0 = q_0$ = "minimum"; $Q_1 = q_{0.25}$ = "lower quartile"; $Q_2 = q_{0.5}$ = "median"; $Q_3 = q_{0.75}$ = "upper quartile"; $Q_4 = q_1$ = "maximum"
R command: quantile(x, probs, type=2) where probs=p

1.10 Sample variance
The sum of squared differences from the mean divided by n − 1.
Formula: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
R command: var(x)

1.11 Sample standard deviation
The square root of the sample variance.
Formula: $s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
R command: sd(x)

1.12 Sample coefficient of variation
The sample standard deviation seen relative to the sample mean.
Formula: $V = \frac{s}{\bar{x}}$
R command: sd(x)/mean(x)

1.15 Sample Inter Quartile Range
IQR: the middle 50% range of the data.
Formula: $IQR = Q_3 - Q_1$
R command: IQR(x, type=2)
1.18 Sample covariance
Measure of the linear strength of the relation between two samples.
Formula: $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
R command: cov(x,y)

1.19 Sample correlation
Measure of the linear strength of the relation between two samples, between -1 and 1.
Formula: $r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right) = \frac{s_{xy}}{s_x \cdot s_y}$
R command: cor(x,y)
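
As a rough illustration of how the formulas in A.1 line up with the listed R commands, the sketch below computes the summary statistics by hand. The vectors x and y are made-up values, not data from the book, and the variable names are chosen only for this example.

```r
# Hypothetical example data (assumed for illustration only)
x <- c(168, 161, 167, 179, 184, 166, 198, 187, 191, 179)
y <- c(65.5, 58.3, 68.1, 85.7, 80.5, 63.4, 102.6, 91.4, 86.7, 78.9)
n <- length(x)

x_bar <- sum(x) / n                                   # 1.4, same as mean(x)
s2    <- sum((x - x_bar)^2) / (n - 1)                 # 1.10, same as var(x)
s     <- sqrt(s2)                                     # 1.11, same as sd(x)
cv    <- s / x_bar                                    # 1.12
q     <- quantile(x, c(0, 0.25, 0.5, 0.75, 1), type = 2)  # 1.8, quartiles
iqr   <- q[["75%"]] - q[["25%"]]                      # 1.15, same as IQR(x, type=2)
s_xy  <- sum((x - x_bar) * (y - mean(y))) / (n - 1)   # 1.18, same as cov(x, y)
r     <- s_xy / (s * sd(y))                           # 1.19, same as cor(x, y)
```

The argument type=2 matters here: it selects the quantile definition of 1.7, which is not the default in quantile() or IQR().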
A.2 Probability and Simulation

Each entry gives the description, the formula and the R command (where one exists).

2.6 Probability density function (pdf) for a discrete variable
Fulfills two conditions, $f(x) \ge 0$ and $\sum_{\text{all } x} f(x) = 1$, and gives the probability of one x value.
Formula: $f(x) = P(X = x)$
R command: dnorm, dbinom, dhyper, dpois

2.9 Cumulated distribution function (cdf)
Gives the probability in a range of x values, where $P(a < X \le b) = F(b) - F(a)$.
Formula: $F(x) = P(X \le x)$
R command: pnorm, pbinom, phyper, ppois

2.13 Mean of a discrete random variable
Formula: $\mu = E(X) = \sum_{i=1}^{\infty} x_i f(x_i)$

2.16 Variance of a discrete random variable X
Formula: $\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2]$

2.32 Pdf of a continuous random variable
Is a non-negative function for all possible outcomes and has an area below the function of one.
Formula: $P(a < X \le b) = \int_a^b f(x)\,dx$

2.33 Cdf of a continuous random variable
Is non-decreasing, with $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.
Formula: $F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du$

2.34 Mean and variance for a continuous random variable X
Formula: $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
Formula: $\sigma^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$

2.54 Mean and variance of a linear function
The mean and variance of a linear function of a random variable X.
Formula: $E(aX + b) = a\,E(X) + b$
Formula: $V(aX + b) = a^2\,V(X)$

2.56 Mean and variance of a linear combination
The mean and variance of a linear combination of random variables. The variance formula requires the $X_i$ to be independent.
Formula: $E(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_n E(X_n)$
Formula: $V(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1^2 V(X_1) + a_2^2 V(X_2) + \cdots + a_n^2 V(X_n)$
2.58 Covariance
The covariance between two random variables X and Y.
Formula: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$
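
To make 2.13, 2.16 and 2.54 concrete, here is a minimal sketch for a fair six-sided die. The die and the constants a and b are assumptions made for this example, not part of the source.

```r
# Illustrative discrete random variable: a fair die (assumed example)
x <- 1:6
f <- rep(1 / 6, 6)                   # pdf: f(x) >= 0 and sum(f) == 1 (2.6)

mu     <- sum(x * f)                 # 2.13: mean
sigma2 <- sum((x - mu)^2 * f)        # 2.16: variance
F3     <- sum(f[x <= 3])             # 2.9: F(3) = P(X <= 3)

# 2.54: mean and variance of the linear function aX + b, computed directly
a <- 2; b <- 1
mu_lin     <- sum((a * x + b) * f)
sigma2_lin <- sum((a * x + b - mu_lin)^2 * f)
```

Computed directly, mu_lin and sigma2_lin agree with a * mu + b and a^2 * sigma2, as 2.54 states.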
A.2.1 Distributions

All the included distributions are listed here, together with some important theorems and definitions related specifically to each distribution.

Each entry gives the description, the formula and the R command (where one exists).

2.20 Binomial distribution
n is the number of independent draws and p is the probability of a success in each draw. The binomial pdf describes the probability of x successes.
Formula: $f(x; n, p) = P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$, where $\binom{n}{x} = \frac{n!}{x!(n - x)!}$
R command: dbinom(x, size, prob), pbinom(q, size, prob), qbinom(p, size, prob), rbinom(n, size, prob), where size=n, prob=p

2.21 Mean and variance of a binomial distributed random variable
Formula: $\mu = np$
Formula: $\sigma^2 = np(1 - p)$

2.24 Hypergeometric distribution
n is the number of draws without replacement, a is the number of successes and N is the population size.
Formula: $f(x; n, a, N) = P(X = x) = \frac{\binom{a}{x}\binom{N - a}{n - x}}{\binom{N}{n}}$, where $\binom{a}{b} = \frac{a!}{b!(a - b)!}$
R command: dhyper(x,m,n,k), phyper(q,m,n,k), qhyper(p,m,n,k), rhyper(nn,m,n,k), where m=a, n=N−a, k=n

2.25 Mean and variance of a hypergeometric distributed random variable
Formula: $\mu = n\frac{a}{N}$
Formula: $\sigma^2 = n\frac{a(N - a)}{N^2}\frac{N - n}{N - 1}$

2.27 Poisson distribution
λ is the rate (or intensity), i.e. the average number of events per interval. The Poisson pdf describes the probability of x events in an interval.
Formula: $f(x; \lambda) = \frac{\lambda^x}{x!} e^{-\lambda}$
R command: dpois(x,lambda), ppois(q,lambda), qpois(p,lambda), rpois(n,lambda), where lambda=λ

2.28 Mean and variance of a Poisson distributed random variable
Formula: $\mu = \lambda$
Formula: $\sigma^2 = \lambda$

2.35 Uniform distribution
α and β define the range of possible outcomes. A random variable following the uniform distribution has equal density at any value within a defined range.
Formula (pdf): $f(x; \alpha, \beta) = 0$ for $x < \alpha$; $\frac{1}{\beta - \alpha}$ for $x \in [\alpha, \beta]$; $0$ for $x > \beta$
Formula (cdf): $F(x; \alpha, \beta) = 0$ for $x < \alpha$; $\frac{x - \alpha}{\beta - \alpha}$ for $x \in [\alpha, \beta]$; $1$ for $x > \beta$
R command: dunif(x,min,max), punif(q,min,max), qunif(p,min,max), runif(n,min,max), where min=α, max=β
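
The d/p/q/r naming pattern above can be checked against the formulas with a short sketch. All parameter values below are arbitrary choices for illustration.

```r
# Binomial (2.20, 2.21): pdf by hand versus dbinom, then mean and variance
n <- 10; p <- 0.3
x <- 0:n
f_hand <- choose(n, x) * p^x * (1 - p)^(n - x)
f      <- dbinom(x, size = n, prob = p)
mu_bin <- sum(x * f)                      # should equal n * p
s2_bin <- sum((x - mu_bin)^2 * f)         # should equal n * p * (1 - p)

# Hypergeometric (2.24, 2.25): a successes in a population of N, nd draws
a <- 6; N <- 20; nd <- 5
xh <- 0:nd
fh <- dhyper(xh, m = a, n = N - a, k = nd)
mu_hyp <- sum(xh * fh)                    # should equal nd * a / N
s2_hyp <- sum((xh - mu_hyp)^2 * fh)

# Poisson (2.27): the cdf is the cumulated pdf
P_le_2 <- ppois(2, lambda = 4)
P_sum  <- sum(dpois(0:2, lambda = 4))

# Uniform (2.35): cdf below, inside and above the range [0, 1]
F_unif <- punif(c(-1, 0.25, 2), min = 0, max = 1)
```

Note that the second argument of dhyper is the number of failures in the population (N − a), not the population size.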
2.36 Mean and variance of a uniform distributed random variable X
Formula: $\mu = \frac{1}{2}(\alpha + \beta)$
Formula: $\sigma^2 = \frac{1}{12}(\beta - \alpha)^2$

2.37 Normal distribution
Often also called the Gaussian distribution.
Formula: $f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
R command: dnorm(x,mean,sd), pnorm(q,mean,sd), qnorm(p,mean,sd), rnorm(n,mean,sd), where mean=µ, sd=σ

2.38 Mean and variance of a normal distributed random variable
Formula: mean $\mu$, variance $\sigma^2$

2.43 Transformation of a normal distributed random variable X into a standardized normal random variable
Formula: $Z = \frac{X - \mu}{\sigma}$

2.46 Log-normal distribution
α is the mean and β² is the variance of the normal distribution obtained when taking the natural logarithm of X.
Formula: $f(x) = \frac{1}{x\sqrt{2\pi}\beta} e^{-\frac{(\ln x - \alpha)^2}{2\beta^2}}$
R command: dlnorm(x,meanlog,sdlog), plnorm(q,meanlog,sdlog), qlnorm(p,meanlog,sdlog), rlnorm(n,meanlog,sdlog), where meanlog=α, sdlog=β

2.47 Mean and variance of a log-normal distributed random variable
Formula: $\mu = e^{\alpha + \beta^2/2}$
Formula: $\sigma^2 = e^{2\alpha + \beta^2}\left(e^{\beta^2} - 1\right)$

2.48 Exponential distribution
λ is the mean rate of events.
Formula: $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$; $0$ for $x < 0$
R command: dexp(x,rate), pexp(q,rate), qexp(p,rate), rexp(n,rate), where rate=λ

2.49 Mean and variance of an exponential distributed random variable
Formula: $\mu = \frac{1}{\lambda}$
Formula: $\sigma^2 = \frac{1}{\lambda^2}$

2.78 χ²-distribution
$\Gamma\left(\frac{\nu}{2}\right)$ is the Γ-function and ν is the degrees of freedom.
Formula: $f(x) = \frac{1}{2^{\nu/2}\,\Gamma\left(\frac{\nu}{2}\right)} x^{\frac{\nu}{2} - 1} e^{-\frac{x}{2}}$; $x \ge 0$
R command: dchisq(x,df), pchisq(q,df), qchisq(p,df), rchisq(n,df), where df=ν
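
A small numerical check can help when reading the continuous distributions above. The sketch below, with arbitrarily chosen parameters, verifies the standardization in 2.43 and compares the mean formulas in 2.47 and 2.49 with numerical integration of the definition in 2.34.

```r
# 2.43: a normal probability equals the standard normal probability of Z
mu <- 10; sigma <- 2
p_direct <- pnorm(13, mean = mu, sd = sigma)
p_std    <- pnorm((13 - mu) / sigma)

# 2.47: log-normal mean, formula versus integrating x * f(x)
alpha <- 0.5; beta <- 0.4
m_ln_formula <- exp(alpha + beta^2 / 2)
m_ln_numeric <- integrate(function(x) x * dlnorm(x, meanlog = alpha, sdlog = beta),
                          lower = 0, upper = Inf)$value

# 2.49: exponential mean 1 / lambda, again by numerical integration
lambda <- 2
m_exp_numeric <- integrate(function(x) x * dexp(x, rate = lambda),
                           lower = 0, upper = Inf)$value
```

The integrals are approximations, so they match the closed-form values only up to the tolerance of integrate().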
2.81 Sample variance and the χ²-distribution
Given a sample of size n from the normal distributed random variables $X_i$ with variance σ², the sample variance S² (viewed as a random variable) can be transformed to follow the χ²-distribution with ν = n − 1 degrees of freedom.
Formula: $\chi^2 = \frac{(n - 1)S^2}{\sigma^2}$

2.83 Mean and variance of a χ² distributed random variable
Formula: $E(X) = \nu$
Formula: $V(X) = 2\nu$

2.86 t-distribution
ν is the degrees of freedom and Γ() is the Gamma function.
Formula: $f_T(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$
R command: dt(x,df), pt(q,df), qt(p,df), rt(n,df), where df=ν

2.87 Relation between normal random variables and χ²-distributed random variables
$Z \sim N(0, 1)$ and $Y \sim \chi^2(\nu)$.
Formula: $X = \frac{Z}{\sqrt{Y/\nu}} \sim t(\nu)$

2.89 The t-distribution of the standardized sample mean
For normal distributed random variables $X_1, \ldots, X_n$, the random variable T follows the t-distribution, where $\bar{X}$ is the sample mean, µ is the mean of X, n is the sample size and S is the sample standard deviation.
Formula: $T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1)$

2.93 Mean and variance of a t-distributed variable X
Formula: $\mu = 0$; $\nu > 1$
Formula: $\sigma^2 = \frac{\nu}{\nu - 2}$; $\nu > 2$

2.95 F-distribution
ν₁ and ν₂ are the degrees of freedom and B(·, ·) is the Beta function.
Formula: $f_F(x) = \frac{1}{B\left(\frac{\nu_1}{2}, \frac{\nu_2}{2}\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} x^{\frac{\nu_1}{2} - 1}\left(1 + \frac{\nu_1}{\nu_2}x\right)^{-\frac{\nu_1 + \nu_2}{2}}$
R command: df(x,df1,df2), pf(q,df1,df2), qf(p,df1,df2), rf(n,df1,df2), where df1=ν₁, df2=ν₂

2.96 The F-distribution as a ratio
The F-distribution appears as the ratio between two independent χ²-distributed random variables with $U \sim \chi^2(\nu_1)$ and $V \sim \chi^2(\nu_2)$.
Formula: $\frac{U/\nu_1}{V/\nu_2} \sim F(\nu_1, \nu_2)$

2.98 Ratio of two sample variances
$X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$, with means µ₁ and µ₂ and variances σ₁² and σ₂², are independent and sampled from normal distributions.
Formula: $\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1, n_2 - 1)$

2.101 Mean and variance of an F-distributed variable X
Formula: $\mu = \frac{\nu_2}{\nu_2 - 2}$; $\nu_2 > 2$
Formula: $\sigma^2 = \frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}$; $\nu_2 > 4$
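
The t and F formulas can be sanity-checked numerically in the same way. In this sketch the degrees of freedom are arbitrary illustrative values; the pdf in 2.86 is coded by hand and compared with dt, and the moments in 2.93 and 2.101 are compared with numerical integration.

```r
# 2.86: t pdf written out by hand, to be compared with dt()
nu <- 5
f_t_hand <- function(t) {
  gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2)) *
    (1 + t^2 / nu)^(-(nu + 1) / 2)
}

# 2.93: variance of t is nu / (nu - 2); the mean is 0, so V = E[T^2]
v_t_numeric <- integrate(function(t) t^2 * dt(t, df = nu),
                         lower = -Inf, upper = Inf)$value

# 2.101: mean of F is nu2 / (nu2 - 2)
nu1 <- 4; nu2 <- 10
m_F_numeric <- integrate(function(x) x * df(x, df1 = nu1, df2 = nu2),
                         lower = 0, upper = Inf)$value
```

Both moments exist only for large enough degrees of freedom (ν > 2 and ν₂ > 2 respectively), which is why the example avoids small values.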
A.3 Statistics for one and two samples

Each entry gives the description, the formula and the R command (where one exists).

3.3 The distribution of the mean of normal random variables
Formula: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right)$

3.5 The distribution of the σ-standardized mean of normal random variables
Formula: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1^2)$

3.5 The distribution of the S-standardized mean of normal random variables
Formula: $T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1)$

3.7 Standard Error of the mean
Formula: $SE_{\bar{x}} = \frac{s}{\sqrt{n}}$

3.9 The one-sample confidence interval for µ
Formula: $\bar{x} \pm t_{1-\alpha/2} \cdot \frac{s}{\sqrt{n}}$

3.14 Central Limit Theorem (CLT)
Formula: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

3.19 Confidence interval for the variance and standard deviation
Formula: $\sigma^2: \left[\frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}};\ \frac{(n - 1)s^2}{\chi^2_{\alpha/2}}\right]$
Formula: $\sigma: \left[\sqrt{\frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}};\ \sqrt{\frac{(n - 1)s^2}{\chi^2_{\alpha/2}}}\right]$

3.22 The p-value
The p-value is the probability of obtaining a test statistic that is at least as extreme as the test statistic that was actually observed. This probability is calculated under the assumption that the null hypothesis is true.
R command: 2*(1-pt(abs(tobs), n-1)) for the two-sided p-value, where 1-pt(x, n-1) is P(T > x)

3.23 The one-sample t-test statistic and p-value
Formula: $p\text{-value} = 2 \cdot P(T > |t_{obs}|)$
Formula: $t_{obs} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Formula: $H_0: \mu = \mu_0$

3.24 The hypothesis test
Rejected: p-value < α
Accepted: otherwise

3.29 Significant effect
An effect is significant if the p-value < α.

3.31 The critical values
The α/2- and 1 − α/2-quantiles of the t-distribution with n − 1 degrees of freedom.
Formula: $t_{\alpha/2}$ and $t_{1-\alpha/2}$

3.32 The one-sample hypothesis test by the critical value
Reject: $|t_{obs}| > t_{1-\alpha/2}$
Accept: otherwise
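
The one-sample formulas above fit together as in the following sketch, which computes the t-test and confidence intervals by hand and also calls R's t.test for comparison. The data vector and the null value mu0 are invented for the example.

```r
# Hypothetical sample and null hypothesis (assumed for illustration only)
x   <- c(168, 161, 167, 179, 184, 166, 198, 187, 191, 179)
mu0 <- 180
n   <- length(x)
alpha <- 0.05

se    <- sd(x) / sqrt(n)                                # 3.7: standard error
t_obs <- (mean(x) - mu0) / se                           # 3.23: test statistic
p_val <- 2 * (1 - pt(abs(t_obs), df = n - 1))           # 3.23: two-sided p-value
t_crit <- qt(1 - alpha / 2, df = n - 1)                 # 3.31: critical value
ci_mu  <- mean(x) + c(-1, 1) * t_crit * se              # 3.9: CI for mu
ci_var <- (n - 1) * var(x) /
  qchisq(c(1 - alpha / 2, alpha / 2), df = n - 1)       # 3.19: CI for sigma^2
ci_sd  <- sqrt(ci_var)                                  # 3.19: CI for sigma

reject_by_p    <- p_val < alpha                         # 3.24
reject_by_crit <- abs(t_obs) > t_crit                   # 3.32

res <- t.test(x, mu = mu0)                              # built-in equivalent
```

The two rejection rules always give the same decision, since both compare the same statistic with the same t-distribution.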
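3.33 Confidence interval for µ
Formula: $\bar{x} \pm t_{1-\alpha/2} \cdot \frac{s}{\sqrt{n}}$
Acceptance region/CI: $H_0: \mu = \mu_0$

3.36 The level α one-sample t-test
Test: $H_0: \mu = \mu_0$ and $H_1: \mu \ne \mu_0$ by $p\text{-value} = 2 \cdot P(T > |t_{obs}|)$
Reject: p-value < α or $|t_{obs}| > t_{1-\alpha/2}$
Accept: otherwise

3.63 The one-sample confidence interval (CI) sample size formula
Formula: $n = \left(\frac{z_{1-\alpha/2} \cdot \sigma}{ME}\right)^2$

3.65 The one-sample sample size formula
Formula: $n = \left(\sigma\,\frac{z_{1-\beta} + z_{1-\alpha/2}}{\mu_0 - \mu_1}\right)^2$

3.42 The Normal q-q plot with n > 10
Naive approach: $p_i = \frac{i}{n}$, $i = 1, \ldots, n$
Common approach: $p_i = \frac{i - 0.5}{n + 1}$, $i = 1, \ldots, n$

3.49 The (Welch) two-sample t-test statistic
Formula: $\delta = \mu_1 - \mu_2$
Formula: $H_0: \delta = \delta_0$
Formula: $t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$

3.50 The distribution of the (Welch) two-sample statistic
T approximately follows a t-distribution with ν degrees of freedom.
Formula: $T = \frac{(\bar{X}_1 - \bar{X}_2) - \delta_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$
Formula: $\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$

3.51 The level α two-sample t-test
Test: $H_0: \mu_1 - \mu_2 = \delta_0$ and $H_1: \mu_1 - \mu_2 \ne \delta_0$ by $p\text{-value} = 2 \cdot P(T > |t_{obs}|)$
Reject: p-value < α or $|t_{obs}| > t_{1-\alpha/2}$
Accept: otherwise

3.52 The pooled two-sample estimate of variance
Formula: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

3.53 The pooled two-sample t-test statistic
Formula: $\delta = \mu_1 - \mu_2$
Formula: $H_0: \delta = \delta_0$
Formula: $t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}$

3.54 The distribution of the pooled two-sample t-test statistic
T follows a t-distribution with $n_1 + n_2 - 2$ degrees of freedom.
Formula: $T = \frac{(\bar{X}_1 - \bar{X}_2) - \delta_0}{\sqrt{S_p^2/n_1 + S_p^2/n_2}}$

3.47 The two-sample confidence interval for µ₁ − µ₂
Formula: $\bar{x}_1 - \bar{x}_2 \pm t_{1-\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Formula: $\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$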
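
As a sketch of how the Welch statistic, its degrees of freedom and the sample size formula are used in practice, the code below works through them by hand and compares with t.test, which performs the Welch test by default. The two samples, sigma and ME are invented values for this example.

```r
# Hypothetical two samples (assumed for illustration only)
x1 <- c(7.53, 7.48, 8.08, 8.09, 10.15, 8.40, 10.88, 6.13, 7.90)
x2 <- c(9.21, 11.51, 12.79, 11.85, 9.97, 8.79, 9.69, 9.68, 9.19)
n1 <- length(x1); n2 <- length(x2)
delta0 <- 0

v1 <- var(x1) / n1                       # s1^2 / n1
v2 <- var(x2) / n2                       # s2^2 / n2
t_obs <- ((mean(x1) - mean(x2)) - delta0) / sqrt(v1 + v2)      # 3.49
nu    <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))     # 3.50
p_val <- 2 * (1 - pt(abs(t_obs), df = nu))                     # 3.51
ci    <- (mean(x1) - mean(x2)) +
  c(-1, 1) * qt(0.975, df = nu) * sqrt(v1 + v2)                # 3.47

res <- t.test(x1, x2)                    # Welch test is the default

# 3.63: sample size for a 95% CI with margin of error ME, assuming sigma
sigma <- 10; ME <- 2
n_ci <- ceiling((qnorm(0.975) * sigma / ME)^2)
```

For the pooled version in 3.52 to 3.54, t.test(x1, x2, var.equal = TRUE) would be the corresponding call.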
A.4 Simulation based statistics

Each entry gives the description and the formula or procedure.

4.3 The non-linear approximative error propagation rule
Formula: $\sigma^2_{f(X_1, \ldots, X_n)} = \sum_{i=1}^{n}\left(\frac{\partial f}{\partial x_i}\right)^2 \sigma_i^2$

4.4 Non-linear error propagation by simulation
1. Simulate k outcomes.
2. Calculate the standard deviation by $s^{\text{sim}}_{f(X_1, \ldots, X_n)} = \sqrt{\frac{1}{k - 1}\sum_{j=1}^{k}(f_j - \bar{f})^2}$

4.7 Confidence interval for any feature θ by parametric bootstrap
1. Simulate k samples.
2. Calculate the statistic $\hat{\theta}^*_i$ for each sample.
3. Calculate the CI: $\left[q^*_{100(\alpha/2)\%},\ q^*_{100(1-\alpha/2)\%}\right]$

4.10 Two-sample confidence interval for any feature comparison θ₁ − θ₂ by parametric bootstrap
1. Simulate k sets of 2 samples.
2. Calculate the statistic $\hat{\theta}^*_{xk} - \hat{\theta}^*_{yk}$ for each set.
3. Calculate the CI: $\left[q^*_{100(\alpha/2)\%},\ q^*_{100(1-\alpha/2)\%}\right]$
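
The two simulation recipes above might look as follows in R. This is only a sketch under stated assumptions: the function f(X1, X2) = X1 · X2, the means and standard deviations, the waiting-time data and the exponential model for the bootstrap are all invented for illustration.

```r
set.seed(1)                          # for reproducibility of the simulation

# 4.3 versus 4.4: error propagation for f(X1, X2) = X1 * X2
mu    <- c(2, 3)
sigma <- c(0.05, 0.08)
# Partial derivatives of x1 * x2 are x2 and x1, evaluated at the means
sd_rule <- sqrt((mu[2] * sigma[1])^2 + (mu[1] * sigma[2])^2)
k  <- 100000
x1 <- rnorm(k, mean = mu[1], sd = sigma[1])
x2 <- rnorm(k, mean = mu[2], sd = sigma[2])
sd_sim <- sd(x1 * x2)                # 4.4: sd of the k simulated outcomes

# 4.7: parametric bootstrap CI for the mean, assuming an exponential model
x <- c(32.6, 1.6, 42.1, 29.2, 53.4, 79.3, 2.3, 4.7, 13.6, 2.0)
k_boot   <- 10000
sim_mean <- replicate(k_boot, mean(rexp(length(x), rate = 1 / mean(x))))
ci <- quantile(sim_mean, c(0.025, 0.975))
```

For small standard deviations the approximate rule and the simulation agree closely; they drift apart as the non-linearity of f becomes more pronounced over the range of the inputs.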
A.5 Simple linear regression

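Each entry gives the description, the formula and the R command (where one exists).

5.4 Least square estimators
Formula: $\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(Y_i - \bar{Y})(x_i - \bar{x})}{S_{xx}}$
Formula: $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x}$
where $S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2$

5.8 Variance of estimators
Formula: $V[\hat{\beta}_0] = \frac{\sigma^2}{n} + \frac{\bar{x}^2\sigma^2}{S_{xx}}$
Formula: $V[\hat{\beta}_1] = \frac{\sigma^2}{S_{xx}}$
Formula: $\mathrm{Cov}[\hat{\beta}_0, \hat{\beta}_1] = -\frac{\bar{x}\sigma^2}{S_{xx}}$

5.12 Test statistics for $H_0: \beta_0 = 0$ and $H_0: \beta_1 = 0$
Formula: $T_{\beta_0} = \frac{\hat{\beta}_0 - \beta_{0,0}}{\hat{\sigma}_{\beta_0}}$
Formula: $T_{\beta_1} = \frac{\hat{\beta}_1 - \beta_{0,1}}{\hat{\sigma}_{\beta_1}}$

5.14 Level α t-tests for parameter
Test $H_{0,i}: \beta_i = \beta_{0,i}$ vs. $H_{1,i}: \beta_i \ne \beta_{0,i}$ with $p\text{-value} = 2 \cdot P(T > |t_{obs,\beta_i}|)$, where $t_{obs,\beta_i} = \frac{\hat{\beta}_i - \beta_{0,i}}{\hat{\sigma}_{\beta_i}}$.
If p-value < α then reject $H_0$, otherwise accept $H_0$.
R command: D <- data.frame(x=c(), y=c()); fit <- lm(y~x, data=D); summary(fit)

5.15 Parameter confidence intervals
Formula: $\hat{\beta}_0 \pm t_{1-\alpha/2}\,\hat{\sigma}_{\beta_0}$
Formula: $\hat{\beta}_1 \pm t_{1-\alpha/2}\,\hat{\sigma}_{\beta_1}$
R command: confint(fit,level=0.95)

5.18 Confidence and prediction interval
Confidence interval for the line: $\hat{\beta}_0 + \hat{\beta}_1 x_{new} \pm t_{1-\alpha/2}\,\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}}$
R command: predict(fit, newdata=data.frame(), interval="confidence", level=0.95)
Interval for a new point prediction: $\hat{\beta}_0 + \hat{\beta}_1 x_{new} \pm t_{1-\alpha/2}\,\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}}$
R command: predict(fit, newdata=data.frame(), interval="prediction", level=0.95)

5.23 The matrix formulation of the parameter estimators in the simple linear regression model
Formula: $\hat{\beta} = (X^T X)^{-1} X^T Y$
Formula: $V[\hat{\beta}] = \sigma^2 (X^T X)^{-1}$
Formula: $\hat{\sigma}^2 = \frac{RSS}{n - 2}$

5.25 Coefficient of determination R²
Formula: $r^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$

5.7 Model validation of assumptions
Check the normality assumption with a q-q plot of the residuals.
R command: qqnorm(fit$residuals); qqline(fit$residuals)
Check for systematic behavior by plotting the residuals $e_i$ as a function of the fitted values $\hat{y}_i$.
R command: plot(fit$fitted.values, fit$residuals)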
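
To connect the estimator formulas with the lm output, the following sketch computes the simple linear regression quantities by hand and fits the same model with lm. The data are hypothetical, and the standard errors use σ̂ in place of σ in 5.8.

```r
# Hypothetical data (assumed for illustration only)
x <- c(168, 161, 167, 179, 184, 166, 198, 187, 191, 179)
y <- c(65.5, 58.3, 68.1, 85.7, 80.5, 63.4, 102.6, 91.4, 86.7, 78.9)
n <- length(x)

Sxx <- sum((x - mean(x))^2)
b1  <- sum((y - mean(y)) * (x - mean(x))) / Sxx       # 5.4: slope
b0  <- mean(y) - b1 * mean(x)                         # 5.4: intercept

rss       <- sum((y - (b0 + b1 * x))^2)
sigma_hat <- sqrt(rss / (n - 2))                      # 5.23: residual sd
se_b0 <- sigma_hat * sqrt(1 / n + mean(x)^2 / Sxx)    # 5.8 with sigma estimated
se_b1 <- sigma_hat / sqrt(Sxx)
t_b1  <- b1 / se_b1                                   # 5.12 with beta_{0,1} = 0
p_b1  <- 2 * (1 - pt(abs(t_b1), df = n - 2))          # 5.14
r2    <- 1 - rss / sum((y - mean(y))^2)               # 5.25

D   <- data.frame(x = x, y = y)
fit <- lm(y ~ x, data = D)
tab <- summary(fit)$coefficients                      # estimates, SEs, t, p
```

The newdata argument of predict must be a data frame with a column named like the explanatory variable, for example data.frame(x = 180).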
A.6 Multiple linear regression

Each entry gives the description, the formula and the R command (where one exists).

6.2 Level α t-tests for parameter
Test $H_{0,i}: \beta_i = \beta_{0,i}$ vs. $H_{1,i}: \beta_i \ne \beta_{0,i}$ with $p\text{-value} = 2 \cdot P(T > |t_{obs,\beta_i}|)$, where $t_{obs,\beta_i} = \frac{\hat{\beta}_i - \beta_{0,i}}{\hat{\sigma}_{\beta_i}}$.
If p-value < α then reject $H_0$, otherwise accept $H_0$.
R command: D <- data.frame(x1=c(), x2=c(), y=c()); fit <- lm(y~x1+x2, data=D); summary(fit)

6.5 Parameter confidence intervals
Formula: $\hat{\beta}_i \pm t_{1-\alpha/2}\,\hat{\sigma}_{\beta_i}$
R command: confint(fit,level=0.95)

6.9 Confidence and prediction interval (in R)
Confidence interval for the line $\hat{\beta}_0 + \hat{\beta}_1 x_{1,new} + \cdots + \hat{\beta}_p x_{p,new}$
R command: predict(fit, newdata=data.frame(), interval="confidence", level=0.95)
Interval for a new point prediction $\hat{\beta}_0 + \hat{\beta}_1 x_{1,new} + \cdots + \hat{\beta}_p x_{p,new} + \varepsilon_{new}$
R command: predict(fit, newdata=data.frame(), interval="prediction", level=0.95)

6.17 The matrix formulation of the parameter estimators in the multiple linear regression model
Formula: $\hat{\beta} = (X^T X)^{-1} X^T Y$
Formula: $V[\hat{\beta}] = \sigma^2 (X^T X)^{-1}$
Formula: $\hat{\sigma}^2 = \frac{RSS}{n - (p + 1)}$

6.16 Model selection procedure
Backward selection: start with the full model and stepwise remove insignificant terms.
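
The matrix formulation in 6.17 can be reproduced directly with R's matrix operators. The sketch below uses made-up data with two explanatory variables and compares the result with lm; the estimated covariance matrix uses σ̂² in place of σ².

```r
# Hypothetical data with p = 2 explanatory variables (illustration only)
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 12.2, 15.0, 15.9)
n  <- length(y); p <- 2

X <- cbind(1, x1, x2)                         # design matrix with intercept
XtX_inv  <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% y            # 6.17: parameter estimates

rss        <- sum((y - X %*% beta_hat)^2)
sigma2_hat <- rss / (n - (p + 1))             # 6.17: residual variance
V_beta     <- sigma2_hat * XtX_inv            # estimated covariance matrix
se_beta    <- sqrt(diag(V_beta))              # standard errors used in 6.2, 6.5

D   <- data.frame(x1 = x1, x2 = x2, y = y)
fit <- lm(y ~ x1 + x2, data = D)
```

In practice lm uses a numerically more stable decomposition than the explicit inverse, so the formula is best read as a definition rather than a recommended computation.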
A.7 Inference for proportions

Each entry gives the description, the formula and the R command (where one exists).

7.3 Proportion estimate and confidence interval
Formula: $\hat{p} = \frac{x}{n}$
Formula: $\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
R command: prop.test(x=, n=, correct=FALSE)

7.10 Approximate proportion with Z
Formula: $Z = \frac{X - np_0}{\sqrt{np_0(1 - p_0)}} \sim N(0, 1)$

7.11 The level α one-sample proportion hypothesis test
Test: $H_0: p = p_0$ vs. $H_1: p \ne p_0$ by $p\text{-value} = 2 \cdot P(Z > |z_{obs}|)$, where $Z \sim N(0, 1^2)$.
If p-value < α then reject $H_0$, otherwise accept $H_0$.
R command: prop.test(x=, n=, correct=FALSE)

7.13 Sample size formula for the CI of a proportion
Guessed p (with prior knowledge): $n = p(1 - p)\left(\frac{z_{1-\alpha/2}}{ME}\right)^2$
Unknown p: $n = \frac{1}{4}\left(\frac{z_{1-\alpha/2}}{ME}\right)^2$

7.15 Difference of two proportions estimator $\hat{p}_1 - \hat{p}_2$ and confidence interval for the difference
Formula: $\hat{\sigma}_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
Formula: $(\hat{p}_1 - \hat{p}_2) \pm z_{1-\alpha/2} \cdot \hat{\sigma}_{\hat{p}_1 - \hat{p}_2}$

7.18 The level α two-sample proportions hypothesis test
Test: $H_0: p_1 = p_2$ vs. $H_1: p_1 \ne p_2$ by $p\text{-value} = 2 \cdot P(Z > |z_{obs}|)$, where $Z \sim N(0, 1^2)$.
If p-value < α then reject $H_0$, otherwise accept $H_0$.
R command: prop.test(x=, n=, correct=FALSE)

7.20 The multi-sample proportions χ²-test
Test: $H_0: p_1 = p_2 = \ldots = p_c = p$ by $\chi^2_{obs} = \sum_{i=1}^{2}\sum_{j=1}^{c}\frac{(o_{ij} - e_{ij})^2}{e_{ij}}$
R command: chisq.test(X, correct = FALSE)

7.22 The r × c frequency table χ²-test
Test: $H_0: p_{i1} = p_{i2} = \ldots = p_{ic} = p_i$ for all rows $i = 1, 2, \ldots, r$ by $\chi^2_{obs} = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(o_{ij} - e_{ij})^2}{e_{ij}}$
Reject if $\chi^2_{obs} > \chi^2_{1-\alpha}\left((r - 1)(c - 1)\right)$, otherwise accept.
R command: chisq.test(X, correct = FALSE)
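
For the one-sample proportion formulas, a by-hand calculation could look like the sketch below. The counts x and n, the null value p0 and the margin of error ME are assumed values for illustration. One caveat worth knowing: prop.test reports a score-based confidence interval, so its interval can differ slightly from the formula in 7.3, while its test statistic is the square of z_obs.

```r
# Hypothetical counts: x successes out of n (assumed for illustration only)
x <- 36; n <- 200
p0 <- 0.25

p_hat <- x / n                                             # 7.3: estimate
ci <- p_hat + c(-1, 1) * qnorm(0.975) *
  sqrt(p_hat * (1 - p_hat) / n)                            # 7.3: 95% CI

z_obs <- (x - n * p0) / sqrt(n * p0 * (1 - p0))            # 7.10
p_val <- 2 * (1 - pnorm(abs(z_obs)))                       # 7.11

res <- prop.test(x = x, n = n, p = p0, correct = FALSE)    # X-squared = z_obs^2

# 7.13: sample size for margin of error ME when p is unknown
ME <- 0.03
n_unknown <- ceiling(0.25 * (qnorm(0.975) / ME)^2)
```

Setting correct = FALSE switches off the continuity correction, which is what makes the p-value agree with the formula in 7.11.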
A.8 Comparing means of multiple groups - ANOVA

Each entry gives the description, the formula and the R command (where one exists).

8.2 One-way ANOVA variation decomposition
Formula: $\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y})^2}_{SST} = \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2}_{SSE} + \underbrace{\sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2}_{SS(Tr)}$

8.4 One-way within group variability
Formula: $MSE = \frac{SSE}{n - k} = \frac{(n_1 - 1)s_1^2 + \cdots + (n_k - 1)s_k^2}{n - k}$
Formula: $s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$

8.6 One-way test for difference in mean for k groups
Formula: $H_0: \alpha_i = 0$; $i = 1, 2, \ldots, k$
Formula: $F = \frac{SS(Tr)/(k - 1)}{SSE/(n - k)}$
F follows an F-distribution with k − 1 and n − k degrees of freedom.
R command: anova(lm(y~treatm))

8.9 Post hoc pairwise confidence intervals
Formula: $\bar{y}_i - \bar{y}_j \pm t_{1-\alpha/2}\sqrt{\frac{SSE}{n - k}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$
If all $M = k(k - 1)/2$ combinations are considered, then use $\alpha_{Bonferroni} = \alpha/M$.

8.10 Post hoc pairwise hypothesis tests
Test: $H_0: \mu_i = \mu_j$ vs. $H_1: \mu_i \ne \mu_j$ by $p\text{-value} = 2 \cdot P(T > |t_{obs}|)$, where $t_{obs} = \frac{\bar{y}_i - \bar{y}_j}{\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}$
Test $M = k(k - 1)/2$ times, but each time with $\alpha_{Bonferroni} = \alpha/M$.

8.13 Least Significant Difference (LSD) values
Formula: $LSD_\alpha = t_{1-\alpha/2}\sqrt{2 \cdot MSE/m}$

8.20 Two-way ANOVA variation decomposition
Formula: $\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{l}(y_{ij} - \hat{\mu})^2}_{SST} = \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{l}(y_{ij} - \hat{\alpha}_i - \hat{\beta}_j - \hat{\mu})^2}_{SSE} + \underbrace{l \cdot \sum_{i=1}^{k}\hat{\alpha}_i^2}_{SS(Tr)} + \underbrace{k \cdot \sum_{j=1}^{l}\hat{\beta}_j^2}_{SS(Bl)}$

8.22 Test for difference in means in two-way ANOVA grouped in treatments and in blocks
Formula: $H_{0,Tr}: \alpha_i = 0$, $i = 1, 2, \ldots, k$
Formula: $F_{Tr} = \frac{SS(Tr)/(k - 1)}{SSE/((k - 1)(l - 1))}$
Formula: $H_{0,Bl}: \beta_j = 0$, $j = 1, 2, \ldots, l$
Formula: $F_{Bl} = \frac{SS(Bl)/(l - 1)}{SSE/((k - 1)(l - 1))}$
R command: fit<-lm(y~treatm+block); anova(fit)

One-way ANOVA

Source of variation | Degrees of freedom | Sums of squares | Mean sum of squares | Test-statistic F | p-value
Treatment | k − 1 | SS(Tr) | MS(Tr) = SS(Tr)/(k − 1) | F_obs = MS(Tr)/MSE | P(F > F_obs)
Residual | n − k | SSE | MSE = SSE/(n − k) | |
Total | n − 1 | SST | | |

Two-way ANOVA

Source of variation | Degrees of freedom | Sums of squares | Mean sums of squares | Test-statistic F | p-value
Treatment | k − 1 | SS(Tr) | MS(Tr) = SS(Tr)/(k − 1) | F_Tr = MS(Tr)/MSE | P(F > F_Tr)
Block | l − 1 | SS(Bl) | MS(Bl) = SS(Bl)/(l − 1) | F_Bl = MS(Bl)/MSE | P(F > F_Bl)
Residual | (l − 1)(k − 1) | SSE | MSE = SSE/((k − 1)(l − 1)) | |
Total | n − 1 | SST | | |
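
The one-way decomposition and F-test can be traced step by step as in the sketch below, which then compares the result with anova(lm(...)). The observations and the three group labels are illustrative values, not data taken from this appendix.

```r
# Illustrative data: k = 3 groups with 4 observations each (assumed values)
y <- c(2.8, 3.6, 3.4, 2.3,
       5.5, 6.3, 6.1, 5.7,
       5.8, 8.3, 6.9, 6.1)
treatm <- factor(rep(c("A", "B", "C"), each = 4))
n <- length(y); k <- nlevels(treatm)

y_bar <- mean(y)
g_bar <- tapply(y, treatm, mean)                 # group means
n_i   <- tapply(y, treatm, length)               # group sizes

SST  <- sum((y - y_bar)^2)                       # 8.2: total variation
SSE  <- sum((y - g_bar[as.character(treatm)])^2) # 8.2: within groups
SSTr <- sum(n_i * (g_bar - y_bar)^2)             # 8.2: between groups
MSE  <- SSE / (n - k)                            # 8.4

F_obs <- (SSTr / (k - 1)) / MSE                  # 8.6
p_val <- 1 - pf(F_obs, df1 = k - 1, df2 = n - k)

tab <- anova(lm(y ~ treatm))                     # built-in ANOVA table

# 8.9: post hoc CI for groups A and B with Bonferroni-corrected alpha
M <- k * (k - 1) / 2
alpha_bonf <- 0.05 / M
ci_AB <- (g_bar[["A"]] - g_bar[["B"]]) +
  c(-1, 1) * qt(1 - alpha_bonf / 2, df = n - k) *
  sqrt(MSE * (1 / n_i[["A"]] + 1 / n_i[["B"]]))
```

The rows of the table returned by anova correspond to the Treatment and Residual rows of the one-way ANOVA table above.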

Glossaries

cumulated distribution function [Fordelingsfunktion] The cdf is the function which determines the probability of observing an outcome of a random variable below a given value.

Continuous random variable [Kontinuert stokastisk variabel] If an outcome of an experiment takes a continuous value, for example a distance, a temperature or a weight, then it is represented by a continuous random variable.

Correlation [Korrelation] The sample correlation coefficient is a summary statistic that can be calculated for two (related) sets of observations. It quantifies the (linear) strength of the relation between the two. See also: Covariance.

Covariance [Kovarians] The sample covariance coefficient is a summary statistic that can be calculated for two (related) sets of observations. It quantifies the (linear) strength of the relation between the two. See also: Correlation.

F-distribution [F-fordelingen] The F-distribution appears as the ratio between two independent χ²-distributed random variables.

Inter Quartile Range [Interkvartil bredde] The Inter Quartile Range (IQR) is the middle 50% range of the data.

Median [Median, stikprøvemedian] The median of a population or sample (note: the text makes no distinction between population median and sample median).

probability density function The pdf is the function which determines the probability of every possible outcome of a random variable.

Quantile [Fraktil, stikprøvefraktil] The quantiles of a population or sample (note: the text makes no distinction between population quantile and sample quantile).

Quartile [Fraktil, stikprøvefraktil] The quartiles of a population or sample (note: the text makes no distinction between population quartile and sample quartile).

Sample variance [Empirisk varians, stikprøvevarians]

Sample mean [Stikprøvegennemsnit] The average of a sample.

Standard deviation [Standard afvigelse]

Acronyms

ANOVA Analysis of Variance

cdf cumulated distribution function

CI confidence interval

CLT Central Limit Theorem

IQR Inter Quartile Range

LSD Least Significant Difference

pdf probability density function
