Permutation Tests
Alejandra Cabaña
An example
[Figure: density plots of the noseed and seed samples]
The data are asymmetric... should we trust the t-test?
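If $X_i \sim N(\mu_X, \sigma_X^2)$ and $Y_i \sim N(\mu_Y, \sigma_Y^2)$, consider the statistic
\[ T = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{S_X^2/n_X + S_Y^2/n_Y}} \]
where
\[ \bar{X} = \frac{1}{n_X}\sum_{i=1}^{n_X} X_i, \quad \bar{Y} = \frac{1}{n_Y}\sum_{i=1}^{n_Y} Y_i, \quad S_X^2 = \frac{1}{n_X - 1}\sum_{i=1}^{n_X} (X_i - \bar{X})^2 \]
and
\[ S_Y^2 = \frac{1}{n_Y - 1}\sum_{j=1}^{n_Y} (Y_j - \bar{Y})^2. \]
If $\sigma_X^2 = \sigma_Y^2$ then $T \sim t_{n_X + n_Y - 2}$, but otherwise, the distribution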
of T is unknown, and there are several classical approximations.
If the data are non-normal, then the test based on T and its
approximate distribution need not be accurate (Efron, 1977).
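As a minimal sketch of the statistic T above, the following R code computes it by hand under H0: µX = µY and compares it with the value reported by t.test(). The samples are simulated here purely for illustration (they are an assumption, not the cloud data), and the names T_manual and welch are hypothetical.

```r
# Hypothetical skewed samples, for illustration only (not the cloud data)
set.seed(1)
x <- rexp(26, rate = 1 / 2)
y <- rexp(26, rate = 1 / 5)

n_x <- length(x)
n_y <- length(y)

# T under H0 (mu_X - mu_Y = 0), with separate variance estimates as in the formula
T_manual <- (mean(x) - mean(y)) / sqrt(var(x) / n_x + var(y) / n_y)

# t.test() with var.equal = FALSE uses the same statistic and approximates
# its distribution with Welch's degrees of freedom
welch <- t.test(x, y, var.equal = FALSE)

c(manual = T_manual, t.test = unname(welch$statistic))
```

Welch's approximation is one of the classical approximations mentioned above; it still relies on normality of the data.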
x → non-seeded cloud data, y → seeded cloud data
[Figure: density plots of log(x) and log(y), with Q-Q plots of the sample quantiles below]
-Inf -0.261692
sample estimates:
mean of x mean of y
1.066551 2.078881
Pitman (1937/38) developed exact permutation methods, consistent with
the Neyman–Pearson approach, for the comparison of k samples and for
bivariate correlation.
Permutation tests are also known as conditional tests.
If the null hypothesis were true, the shuffled data sets should
all have the same distribution, and thus statistics based on the
new samples should look like the ones computed from the original
data; otherwise they should look different.
• The permutation distribution of a test statistic T is obtained
by repeatedly rearranging the observations.
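# x, y: the two samples being compared (defined earlier)
muestra <- c(x, y)                    # pooled sample
rep <- 999                            # number of random permutations
original <- mean(x) - mean(y)         # observed difference in means
distrib <- numeric(rep)
l <- length(x) + length(y)
for (i in 1:rep) {
  sam <- sample(muestra, l)           # shuffle the pooled sample
  newx <- sam[1:length(x)]            # first length(x) values play the role of x
  newy <- sam[(length(x) + 1):l]      # the remaining values play the role of y
  distrib[i] <- mean(newx) - mean(newy)
}
# one-sided (lower-tail) p-value; the observed value counts as one permutation
pval <- sum(c(original, distrib) <= original) / (rep + 1)
pval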
[1] 0.029
The t-test was not misleading! But this approach is more sound.
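The same loop works for other statistics, such as the difference in medians. The sketch below wraps it in a function; perm_test, its arguments and the simulated log-normal samples are hypothetical names and data introduced only for illustration.

```r
# Two-sample permutation test for stat(x) - stat(y), lower-tail p-value.
# perm_test is a hypothetical helper, not part of any package.
perm_test <- function(x, y, stat = mean, B = 999) {
  pooled <- c(x, y)
  n_x <- length(x)
  observed <- stat(x) - stat(y)
  perm <- replicate(B, {
    idx <- sample(length(pooled), n_x)       # positions relabelled as "x"
    stat(pooled[idx]) - stat(pooled[-idx])
  })
  list(observed = observed,
       perm = perm,
       p_lower = (1 + sum(perm <= observed)) / (B + 1))
}

# Simulated data for illustration only
set.seed(123)
x <- rlnorm(26, meanlog = 1)
y <- rlnorm(26, meanlog = 2)

res_mean <- perm_test(x, y, stat = mean)
res_median <- perm_test(x, y, stat = median)
c(mean = res_mean$p_lower, median = res_median$p_lower)
```

The p-value uses the same convention as the loop above: the observed value is counted as one of the B + 1 rearrangements.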
[Figure: histograms of the permutation distribution for differences in mean (distrib) and for differences in median (mediana)]
The permutation distribution
X1 , . . . , Xn ∼ FX and Y1 , . . . , Ym ∼ FY
Z = {X1 , . . . , Xn , Y1 , . . . , Ym }
Davison and Hinkley (1997), Bootstrap Methods and their Application,
p. 159, CUP
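When FX = FY, every split of the pooled sample Z into groups of sizes n and m is equally likely, so for small samples the permutation distribution can be enumerated exactly rather than sampled. The following is a minimal sketch with made-up numbers (the data and variable names are assumptions for illustration).

```r
# Tiny made-up samples so that all splits can be listed
x <- c(1.2, 3.4, 2.2)
y <- c(4.1, 5.3, 3.9, 6.0)

z <- c(x, y)                       # pooled sample Z
n <- length(x)

# Each column of 'splits' holds the indices of Z assigned to the X group
splits <- combn(length(z), n)
perm_stats <- apply(splits, 2, function(idx) mean(z[idx]) - mean(z[-idx]))

observed <- mean(x) - mean(y)
# Exact lower-tail p-value: fraction of splits at least as extreme as observed
p_exact <- mean(perm_stats <= observed)

c(n_splits = ncol(splits), p_exact = p_exact)
```

With n = 3 and m = 4 there are choose(7, 3) = 35 splits; for larger samples the number of splits grows quickly, which is why random rearrangements are used in practice.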
Tests for H0 : F = G
From independent samples
X1 , . . . , Xn ∼ F Y1 , . . . , Ym ∼ G
[Figure: histogram of the permutation distribution of the test statistic (distribks)]
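A permutation test of H0: F = G needs a statistic that is sensitive to any difference between the two distributions. The sketch below assumes the two-sample Kolmogorov–Smirnov distance, a choice suggested only by the label distribks; the helper ks_stat and the simulated samples are hypothetical.

```r
# Two-sample Kolmogorov-Smirnov distance: largest gap between the two
# empirical distribution functions, evaluated at the pooled observations
ks_stat <- function(x, y) {
  z <- c(x, y)
  max(abs(ecdf(x)(z) - ecdf(y)(z)))
}

# Simulated data for illustration only
set.seed(42)
x <- rnorm(30)
y <- rexp(40)

observed <- ks_stat(x, y)
pooled <- c(x, y)
n <- length(x)

B <- 999
distribks <- replicate(B, {
  idx <- sample(length(pooled), n)           # random relabelling of the pooled data
  ks_stat(pooled[idx], pooled[-idx])
})

# Large distances are evidence against H0, so use the upper tail
p_value <- (1 + sum(distribks >= observed)) / (B + 1)
p_value
```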
A diver has agreed with the boat crew that, in case of danger, he
will transmit a binary message ...01010101010101...
When the diver does not transmit, the boat receives background
noise
. . . , yi , yi+1 , . . .
where the yi are independent Bernoulli r.v.’s with unknown parameter p.
On a particular day the boat receives the message
000101100101010100010101
Is this background noise or a cry for help?
A statistic for the null hypothesis
H0 : “ the message is noise”
is
T (y) = max(|y − I1 |, |y − I2 |)
where y is the vector that contains the message, and I1 and I2 are
sequences (of the same length as the signal) with alternating 01
in I1 and 10 in I2.
The total number of “ones”, S(y) = Σ yi, is a sufficient statistic
for p. So, conditional on S(y) = 10, the distribution of the vector
Y |{S(y) = 10}
does not depend on p: under H0, every arrangement of the 10 ones
among the 24 positions is equally likely.
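The conditional test for the diver's message can be sketched in R by shuffling the received bits, which keeps S(y) fixed. Two assumptions are made here: |y − I| is read as the number of positions where y and I differ, and large values of T (a message close to one of the alternating patterns) count as evidence against H0. The names Tstat, T_obs and T_perm are hypothetical.

```r
msg <- "000101100101010100010101"
y <- as.integer(strsplit(msg, "")[[1]])
n <- length(y)

I1 <- rep(c(0L, 1L), length.out = n)   # 0101...
I2 <- 1L - I1                          # 1010...

# Assumption: |y - I| counts the positions where y and I differ
Tstat <- function(v) max(sum(abs(v - I1)), sum(abs(v - I2)))
T_obs <- Tstat(y)

# Shuffling y keeps the number of ones fixed, i.e. conditions on S(y)
set.seed(1)
B <- 9999
T_perm <- replicate(B, Tstat(sample(y)))

p_value <- (1 + sum(T_perm >= T_obs)) / (B + 1)
c(S = sum(y), T_obs = T_obs, p_value = p_value)
```

Since I2 is the complement of I1, the two distances add up to the message length, so T is large exactly when the message is close to one of the two alternating patterns.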
(X1 , Y1 ), . . . , (Xn , Yn )
The null hypothesis is that X and Y are exchangeable within each pair.
• Once a suitable test statistic has been chosen, we randomly
rearrange the pairs:
• For each pair, we toss a fair coin: if it comes up heads we
leave the pair as it is; otherwise we swap X and Y.
• We compare the original value of the statistic with the
distribution obtained from the permutations.
See the shoes example in Permutaciones.R
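The coin-tossing scheme for paired data can be sketched as follows. Swapping X and Y within a pair only changes the sign of the difference X − Y, so the code flips signs at random. The data are simulated for illustration (this is not the shoes example from Permutaciones.R), and the mean difference is just one possible choice of statistic.

```r
# Simulated paired data, for illustration only
set.seed(7)
n <- 10
x <- rnorm(n, mean = 10, sd = 2)
y <- x + rnorm(n, mean = 0.5, sd = 0.5)

d <- x - y                  # within-pair differences
observed <- mean(d)

B <- 999
perm <- replicate(B, {
  # fair coin per pair: +1 keeps the pair, -1 swaps X and Y
  flip <- sample(c(-1, 1), n, replace = TRUE)
  mean(flip * d)
})

# Two-sided p-value based on the absolute mean difference
p_value <- (1 + sum(abs(perm) >= abs(observed))) / (B + 1)
p_value
```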
Correlation test
Suppose now that we have n pairs of observations
(X1 , Y1 ), . . . , (Xn , Yn )
We want to test the null hypothesis H0 that X and Y are
uncorrelated.
Under H0 , we can interchange the Xi among themselves and we would
obtain equivalent pairs.
To carry out a permutation test we choose a statistic: Pearson's
correlation coefficient ρ if we are looking for linear correlations,
or Kendall's τ, etc.
• We randomly permute the Y's, keeping the X's fixed.
• We compare the statistic observed in the original sample
with the distribution obtained from the permutations.
See the aire example in Permutaciones.R
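The two steps above can be sketched as follows, using Pearson's coefficient. The data are simulated for illustration (this is not the aire example from Permutaciones.R); passing method = "kendall" to cor() would give the version based on Kendall's τ.

```r
# Simulated pairs, for illustration only
set.seed(11)
n <- 40
x <- runif(n)
y <- 2 * x + rnorm(n)

observed <- cor(x, y)       # Pearson correlation in the original sample

B <- 999
# Permute the Y's while keeping the X's fixed
perm <- replicate(B, cor(x, sample(y)))

# Two-sided p-value: how often a shuffled correlation is at least as large
p_value <- (1 + sum(abs(perm) >= abs(observed))) / (B + 1)
c(observed = observed, p_value = p_value)
```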
Linear Models and Analysis of Variance
If the residuals in a linear model or an ANOVA are not normal,
we could be drawing wrong inferences about the model parameters
and their possible interactions.
One way to avoid these problems is to use permutation tests.
If the responses did not follow the model, we could permute them
and we would obtain the same results. This even allows us to carry
out the individual F-tests for each coefficient.
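A minimal sketch of this idea for the overall F-test of a linear model is given below. It permutes the responses and refits the model each time; the data, the model formula and the helper f_stat are assumptions for illustration. Tests for individual coefficients need more care (for example, permuting under a reduced model) and are not covered by this sketch.

```r
# Simulated data with skewed, non-normal errors (illustration only)
set.seed(2024)
n <- 50
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 0.8 * dat$x1 + rexp(n)

# Overall F statistic of the fitted model (hypothetical helper)
f_stat <- function(d) {
  unname(summary(lm(y ~ x1 + x2, data = d))$fstatistic["value"])
}

observed <- f_stat(dat)

B <- 499
perm <- replicate(B, {
  d <- dat
  d$y <- sample(d$y)        # permute the responses, keep the predictors
  f_stat(d)
})

# Large F values are evidence against "no relation between y and the predictors"
p_value <- (1 + sum(perm >= observed)) / (B + 1)
c(F_observed = observed, p_value = p_value)
```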