Auto-Calibration Tests for Discrete Finite Regression Functions

Mario V. Wüthrich (RiskLab, Department of Mathematics, ETH Zurich, [email protected])
(Version of August 12, 2024)
Abstract

Auto-calibration is an important property of regression functions for actuarial applications. Comparatively little is known about statistical testing of auto-calibration. Denuit et al. (2024) recently published a test whose asymptotic distribution is not fully explicit, so that its evaluation requires non-parametric Monte Carlo sampling. In a simpler set-up, we present three test statistics with fully known and interpretable asymptotic distributions.

Keywords. Auto-calibration, concentration curve, Lorenz curve, area between the curves.

1 Introduction

Recent actuarial and financial literature acknowledges the importance of the statistical concept of auto-calibration; see, e.g., Krüger–Ziegel [6], Denuit et al. [1] and Wüthrich [7]. Select an integrable response variable $Y$ and covariates $\boldsymbol{X}$ with support ${\cal X}$.

Definition 1.1

A measurable regression function $\pi:{\cal X}\to{\mathbb{R}}$ is auto-calibrated for $(Y,\boldsymbol{X})$ if

$$\pi(\boldsymbol{X})={\mathbb{E}}\left[\left.Y\right|\pi(\boldsymbol{X})\right],\qquad\text{${\mathbb{P}}$-a.s.}$$

In an actuarial pricing context this means that every price cohort $\pi(\boldsymbol{X})$ is on average self-financing for the claims $Y$, or in other words, there is no systematic cross-financing within a pricing scheme designed by the regression function $\pi$.

Surprisingly, there is no mature literature on testing for auto-calibration. Most proposals only consider binary responses, e.g., Gneiting–Resin [5] discuss a bootstrap test and Dimitriadis et al. [4] study calibration bands. Recently, Denuit et al. [2] presented an auto-calibration test that studies the difference between the concentration curve (CC) and the Lorenz curve (LC). This test also requires simulations because the asymptotic distribution of the test statistic is not sufficiently explicit. We take one step back here and present simpler test statistics with fully known and interpretable asymptotic distributions, albeit in a simpler set-up.

One needs three ingredients for an auto-calibration test. (a) A regression function $\pi:{\cal X}\to{\mathbb{R}}$. This regression function $\pi$ can be fully general, i.e., we do not require that it is close (in some metric) to the conditional mean ${\mathbb{E}}[Y|\boldsymbol{X}]$, nor do we specify whether $\pi$ has been estimated from past data ${\cal D}$ or whether it has been set by an expert. (b) A pair $(Y,\boldsymbol{X})$. For simplicity, we assume that the response $Y$ is positive and square integrable. The covariates $\boldsymbol{X}$ have support ${\cal X}$. (c) An i.i.d. sample ${\cal T}=(Y_{i},\boldsymbol{X}_{i})_{i\geq 1}$ for testing. This sample should have the same law as $(Y,\boldsymbol{X})$. These three ingredients (a)-(c) are sufficient for testing for auto-calibration of $\pi$ for $(Y,\boldsymbol{X})$; if $\pi$ has been estimated from past data ${\cal D}$, we generally assume that $(Y,\boldsymbol{X})$, ${\cal T}$ and ${\cal D}$ are independent, and all subsequent statements then need to be understood conditional on ${\cal D}$.

2 Tests for auto-calibration

Assume that the regression function $\pi:{\cal X}\to{\mathbb{R}}$ takes only finitely many (ordered) values $-\infty<\pi_{1}<\cdots<\pi_{K}<\infty$. This gives us a partition of the covariate space ${\cal X}$ with

$${\mathbb{P}}\left[\pi(\boldsymbol{X})=\pi_{k}\right]=p_{k}>0\qquad\text{ for all $1\leq k\leq K$.}\tag{2.1}$$

We assume probabilities $p_{k}>0$, otherwise the corresponding part of the covariate space can be dropped. In this finite partition case (2.1), auto-calibration of $\pi$ for $(Y,\boldsymbol{X})$ is equivalent to

$$\pi_{k}={\mathbb{E}}\left[\left.Y\right|\pi(\boldsymbol{X})=\pi_{k}\right]\qquad\text{ for all $1\leq k\leq K$.}$$

Using the tower property, auto-calibration of $\pi$ for $(Y,\boldsymbol{X})$ implies for all $1\leq k\leq K$

$$S^{(k)}:={\mathbb{E}}\left[\left(Y-\pi(\boldsymbol{X})\right)\mathds{1}_{\{\pi(\boldsymbol{X})=\pi_{k}\}}\right]={\mathbb{E}}\left[\left({\mathbb{E}}\left[\left.Y\right|\pi(\boldsymbol{X})\right]-\pi(\boldsymbol{X})\right)\mathds{1}_{\{\pi(\boldsymbol{X})=\pi_{k}\}}\right]=0.\tag{2.2}$$

This statement is essentially the same as Wüthrich [7, Proposition 4.1]. For a given i.i.d. sample ${\cal T}=(Y_{i},\boldsymbol{X}_{i})_{i=1}^{n}$, this motivates the statistics

$$S^{(k)}_{n}=\frac{1}{n}\sum_{i=1}^{n}\left(Y_{i}-\pi(\boldsymbol{X}_{i})\right)\mathds{1}_{\{\pi(\boldsymbol{X}_{i})=\pi_{k}\}}\qquad\text{ for $1\leq k\leq K$}.$$

Under auto-calibration of $\pi$ for $(Y,\boldsymbol{X})$, these empirical quantities $S^{(k)}_{n}$, $1\leq k\leq K$, converge to zero, ${\mathbb{P}}$-a.s., as $n\to\infty$, and we have the following central limit theorem.

Proposition 2.1

Under auto-calibration of $\pi$ for $(Y,\boldsymbol{X})$

$$\sqrt{n}\left(S^{(1)}_{n},\ldots,S^{(K)}_{n}\right)^{\top}~\Longrightarrow\quad{\cal N}\left(0,\,{\rm diag}\left(p_{1}\tau_{1}^{2},\ldots,p_{K}\tau_{K}^{2}\right)\right)\qquad\text{ as $n\to\infty$,}$$

with conditional variances $\tau_{k}^{2}={\rm Var}\left(\left.Y\right|\pi(\boldsymbol{X})=\pi_{k}\right)$ for $1\leq k\leq K$.

The proof of this proposition is standard and based on characteristic functions.
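As a minimal sketch of how the quantities in Proposition 2.1 could be evaluated on a test sample, the following Python snippet (standard library only) computes $S^{(k)}_{n}$ together with plug-in estimates of $p_{k}$ and $\tau_{k}^{2}$. The function name `cell_statistics` is hypothetical, and estimating $p_{k}$ and $\tau_{k}^{2}$ from the test sample itself (with the variance centered at $\pi_{k}$, as it would be under the null hypothesis) is an illustrative assumption, not a prescription of the paper. The sketch also assumes that predictions belonging to the same level are stored as identical floats.

```python
from collections import defaultdict


def cell_statistics(y, pred):
    """Return the sorted levels pi_k and, per level, (S_n^(k), p_hat_k, tau2_hat_k).

    y    : observed responses Y_i
    pred : predictions pi(X_i), taking finitely many distinct values
    """
    n = len(y)
    groups = defaultdict(list)
    for yi, pi in zip(y, pred):
        groups[pi].append(yi)  # exact matching of levels (assumption)

    levels = sorted(groups)
    stats = []
    for level in levels:
        ys = groups[level]
        m = len(ys)
        s_k = sum(yi - level for yi in ys) / n  # S_n^(k), normalized by n
        p_k = m / n                             # empirical probability of the cell
        # Assumption: variance centered at pi_k, which is the conditional
        # mean under the null hypothesis of auto-calibration.
        tau2_k = sum((yi - level) ** 2 for yi in ys) / m
        stats.append((s_k, p_k, tau2_k))
    return levels, stats
```

Centering the variance estimate at $\pi_{k}$ rather than at the cell mean is one of several reasonable choices; the two agree asymptotically under the null hypothesis.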

Test 1. Under the null hypothesis of $\pi$ being auto-calibrated for $(Y,\boldsymbol{X})$, (2.2) is a necessary condition for all $1\leq k\leq K$. We test this against the alternative that there exists a $1\leq k\leq K$ with $S^{(k)}\neq 0$. Under the null hypothesis, Proposition 2.1 gives us for $s>0$ and $n$ large

$${\mathbb{P}}\left[\max_{1\leq k\leq K}\sqrt{n}|S^{(k)}_{n}|\leq s\right]={\mathbb{P}}\left[\bigcap_{1\leq k\leq K}\{|\sqrt{n}S^{(k)}_{n}|\leq s\}\right]~\approx~\prod_{k=1}^{K}\left(2\Phi\left(\frac{s}{\sqrt{p_{k}}\tau_{k}}\right)-1\right).\tag{2.3}$$

Often, it is beneficial to test for the maximum of the normalized quantities $\sqrt{n}|S^{(k)}_{n}|/(\sqrt{p_{k}}\tau_{k})$, to have all terms on the same scale. This provides the asymptotic limit $(2\Phi(s)-1)^{K}$.

Denuit et al. [2, formula (2.4)] consider an aggregated version of $S^{(k)}$. Namely, auto-calibration of $\pi$ for $(Y,\boldsymbol{X})$ implies for all $1\leq k\leq K$

$$T^{(k)}:={\mathbb{E}}\left[\left(Y-\pi(\boldsymbol{X})\right)\mathds{1}_{\{\pi(\boldsymbol{X})\leq\pi_{k}\}}\right]=0.\tag{2.4}$$

For a given i.i.d. sample ${\cal T}=(Y_{i},\boldsymbol{X}_{i})_{i=1}^{n}$, this motivates the statistics

$$T^{(k)}_{n}=\frac{1}{n}\sum_{i=1}^{n}\left(Y_{i}-\pi(\boldsymbol{X}_{i})\right)\mathds{1}_{\{\pi(\boldsymbol{X}_{i})\leq\pi_{k}\}}=\sum_{j=1}^{k}S^{(j)}_{n}.$$

The following corollary is an immediate consequence of Proposition 2.1.

Corollary 2.2

Under auto-calibration of $\pi$ for $(Y,\boldsymbol{X})$

$$\sqrt{n}\left(T^{(1)}_{n},\ldots,T^{(K)}_{n}\right)^{\top}~\Longrightarrow\quad{\cal N}\left(0,\left(\sum\nolimits_{j=1}^{\min\{k,m\}}p_{j}\tau^{2}_{j}\right)_{1\leq k,m\leq K}\right)\qquad\text{ as $n\to\infty$.}$$

Thus, the aggregated statistics $(T^{(k)}_{n})_{k=1}^{K}$ can asymptotically be described by a random walk

$$Z_{k}=\sum_{j=1}^{k}\sqrt{p_{j}}\,\tau_{j}\,\varepsilon_{j},\tag{2.5}$$

with i.i.d. standard Gaussian innovations $\varepsilon_{j}\sim{\cal N}(0,1)$ for $1\leq j\leq K$.
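The link between Corollary 2.2 and the random walk (2.5) is that independent increments give ${\rm Cov}(Z_{k},Z_{m})=\sum_{j=1}^{\min\{k,m\}}p_{j}\tau_{j}^{2}$, which is exactly the limiting covariance matrix. A small Python sketch of this matrix, with the hypothetical helper name `limiting_covariance` and indices shifted to start at zero:

```python
from itertools import accumulate


def limiting_covariance(p, tau2):
    """Covariance matrix of Corollary 2.2 from cell probabilities and variances.

    Entry (k, m) equals the cumulative sum of p_j * tau_j^2 up to min(k, m),
    which is also Cov(Z_k, Z_m) for the random walk with independent
    increments sqrt(p_j) * tau_j * eps_j.
    """
    cumulative = list(accumulate(pk * t2 for pk, t2 in zip(p, tau2)))
    size = len(cumulative)
    return [[cumulative[min(k, m)] for m in range(size)] for k in range(size)]
```

For instance, with two cells of probability 0.5 and conditional variances 1 and 4, the diagonal entries are 0.5 and 2.5, and the off-diagonal entries are 0.5.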

Test 2. Under the null hypothesis of $\pi$ being auto-calibrated for $(Y,\boldsymbol{X})$, (2.4) is a necessary condition for all $1\leq k\leq K$. We test this against the alternative that there exists a $1\leq k\leq K$ with $T^{(k)}\neq 0$. Under the null hypothesis, Corollary 2.2 gives us for $s>0$ and $n$ large

$${\mathbb{P}}\left[\max_{1\leq k\leq K}\sqrt{n}|T^{(k)}_{n}|\leq s\right]~\approx~{\mathbb{P}}\left[\max_{1\leq k\leq K}|Z_{k}|\leq s\right].\tag{2.6}$$

Up to one point discussed below, the asymptotic approximation (2.6) gives an explicit description of the intractable limit in Denuit et al. [2, Proposition 3.1]. Namely, the asymptotic distribution of the test statistic in (2.6) corresponds to the maximum absolute value of the random walk (2.5), whose increments are fully determined by the probabilities $(p_{k})_{k=1}^{K}$, given in (2.1), and the conditional variances $(\tau_{k}^{2})_{k=1}^{K}$, given in Proposition 2.1. These two parameter sets can be determined from past data ${\cal D}$, being independent of the i.i.d. sample ${\cal T}$; see the discussion in Section 1. The rejection region is then obtained by (easy) random walk simulations involving only these two (estimated) parameter sets $(p_{k})_{k=1}^{K}$ and $(\tau_{k}^{2})_{k=1}^{K}$. This seems simpler than the non-parametric Monte Carlo method used in Denuit et al. [2, Section 3.1].

3 Testing for the area between the curves

The consideration of $T^{(k)}$ is motivated by the difference of the CC and the LC. Denote by $F^{-1}_{\pi(\boldsymbol{X})}$ the left-continuous generalized inverse of the distribution function $F_{\pi(\boldsymbol{X})}$ of $\pi(\boldsymbol{X})$. The difference between the CC and the LC at probability level $\alpha\in[0,1]$ is defined by

$$U(\alpha)={\mathbb{E}}\left[\left(\frac{Y}{{\mathbb{E}}[Y]}-\frac{\pi(\boldsymbol{X})}{{\mathbb{E}}[\pi(\boldsymbol{X})]}\right)\mathds{1}_{\{\pi(\boldsymbol{X})\leq F^{-1}_{\pi(\boldsymbol{X})}(\alpha)\}}\right].$$

For a regression function $\pi$ with discrete finite range (2.1), $U(\cdot)$ only takes $K+1$ different values in the cumulative probabilities $\alpha_{k}:=\sum_{j=1}^{k}p_{j}$, and we set $\alpha_{0}=0$. Namely, we have

$$U^{(k)}:=U(\alpha_{k})={\mathbb{E}}\left[\left(\frac{Y}{{\mathbb{E}}[Y]}-\frac{\pi(\boldsymbol{X})}{{\mathbb{E}}[\pi(\boldsymbol{X})]}\right)\mathds{1}_{\{\pi(\boldsymbol{X})\leq\pi_{k}\}}\right]\qquad\text{ for $1\leq k\leq K$.}\tag{3.1}$$

Under unbiasedness ${\mathbb{E}}[\pi(\boldsymbol{X})]={\mathbb{E}}[Y]$, we have

$$U^{(k)}=\frac{1}{{\mathbb{E}}[Y]}\,T^{(k)}=\frac{1}{{\mathbb{E}}[\pi(\boldsymbol{X})]}\,T^{(k)}.$$

These normalized differences $U^{(k)}$ motivate the study of $T^{(k)}$ under auto-calibration of $\pi$ for $(Y,\boldsymbol{X})$, which implies the above unbiasedness. Denuit et al. [2, Proposition 3.1] do not explore an auto-calibration test for $T^{(k)}$, but rather one for $U(\alpha)$. Unfortunately, the normalized quantities $U(\alpha)$ and $U^{(k)}$ are more involved. For a given i.i.d. sample ${\cal T}=(Y_{i},\boldsymbol{X}_{i})_{i=1}^{n}$, consider

$$U^{(k)}_{n}=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{Y_{i}}{\overline{y}}-\frac{\pi(\boldsymbol{X}_{i})}{\overline{\pi}}\right)\mathds{1}_{\{\pi(\boldsymbol{X}_{i})\leq\pi_{k}\}},$$

with $\overline{y}$ being the empirical mean of $(Y_{i})_{i=1}^{n}$ and $\overline{\pi}$ the empirical mean of $(\pi(\boldsymbol{X}_{i}))_{i=1}^{n}$. Dealing with $U^{(k)}_{n}$ instead of $T^{(k)}_{n}$ is more cumbersome because of these normalizations. These normalizations are mainly motivated by the fact that they imply that both the CC and the LC are calibrated to 1 for $\alpha\uparrow 1$. In statistical modeling, this then allows one to perform model selection by selecting the model that has the most convex CC, as a higher convexity implies better discrimination; see Wüthrich [7]. Similarly, in economics, a more convex LC indicates higher inequality in wealth distribution. However, for testing of auto-calibration this normalization does not seem justified, and we give preference to the simpler unscaled quantity $T^{(k)}_{n}$. Note that

$$\begin{aligned}
\sqrt{n}\,U^{(k)}_{n}&=\sqrt{n}\,\frac{1}{\overline{y}}\,T^{(k)}_{n}+\sqrt{n}\left(\frac{1}{\overline{y}}-\frac{1}{\overline{\pi}}\right)\frac{1}{n}\sum_{i=1}^{n}\pi(\boldsymbol{X}_{i})\mathds{1}_{\{\pi(\boldsymbol{X}_{i})\leq\pi_{k}\}}\\
&=\sqrt{n}\,\frac{1}{\overline{y}}\,T^{(k)}_{n}+\sqrt{n}\,\frac{1}{\overline{y}\,\overline{\pi}}\left(\overline{\pi}-\overline{y}\right)\frac{1}{n}\sum_{i=1}^{n}\pi(\boldsymbol{X}_{i})\mathds{1}_{\{\pi(\boldsymbol{X}_{i})\leq\pi_{k}\}}.
\end{aligned}\tag{3.2}$$

Corollary 2.2 and Slutsky’s theorem give weak convergence of the first term in (3.2) to $Z_{k}/{\mathbb{E}}[Y]$. For the second term in (3.2), one establishes weak convergence of $\sqrt{n}(\overline{\pi}-\overline{y})$, and the other terms are treated by Slutsky’s theorem. Finally, one needs to compute the covariance between the two terms in (3.2) to get the asymptotic variance of $\sqrt{n}U^{(k)}_{n}$. This is doable, but cumbersome. Therefore, we prefer to study the non-normalized quantities $T^{(k)}_{n}$.
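The decomposition (3.2) is a purely algebraic identity, which can be checked numerically on any sample. The following Python sketch does so without the common factor $\sqrt{n}$; the function names `u_n` and `u_n_decomposed` and the toy data in the usage note are illustrative assumptions.

```python
def u_n(y, pred, level):
    """Normalized statistic U_n^(k) for the threshold pi_k = level."""
    n = len(y)
    y_bar = sum(y) / n
    pi_bar = sum(pred) / n
    return sum(
        yi / y_bar - pi / pi_bar for yi, pi in zip(y, pred) if pi <= level
    ) / n


def u_n_decomposed(y, pred, level):
    """The two terms on the right-hand side of (3.2), without the factor sqrt(n)."""
    n = len(y)
    y_bar = sum(y) / n
    pi_bar = sum(pred) / n
    # T_n^(k): unscaled aggregated statistic.
    t_n = sum(yi - pi for yi, pi in zip(y, pred) if pi <= level) / n
    # Average of pi(X_i) over the observations with pi(X_i) <= pi_k.
    pi_part = sum(pi for pi in pred if pi <= level) / n
    first = t_n / y_bar
    second = (pi_bar - y_bar) / (y_bar * pi_bar) * pi_part
    return first, second
```

For example, with `y = [1.0, 3.0, 2.0, 6.0]` and `pred = [2.0, 2.0, 4.0, 5.0]`, the sum of the two returned terms agrees with `u_n(y, pred, 2.0)` up to floating-point rounding.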

Based on $U(\alpha)$, Denuit et al. [3, formula (4.4)] introduced the area between the curves (ABC) as a model selection criterion. The ABC is defined by

$${\rm ABC}=\int_{0}^{1}U(\alpha)\,d\alpha=\int_{0}^{1}\mathbb{E}\left[\left(\frac{Y}{\mathbb{E}[Y]}-\frac{\pi(\boldsymbol{X})}{\mathbb{E}[\pi(\boldsymbol{X})]}\right)\mathds{1}_{\{\pi(\boldsymbol{X})\leq F^{-1}_{\pi(\boldsymbol{X})}(\alpha)\}}\right]d\alpha.$$

Again, we prefer the unscaled version. Under the discrete finite regression function, we have

$${\rm ABC}^{\circ}:=\int_{0}^{1}\mathbb{E}\left[\left(Y-\pi(\boldsymbol{X})\right)\mathds{1}_{\{\pi(\boldsymbol{X})\leq F^{-1}_{\pi(\boldsymbol{X})}(\alpha)\}}\right]d\alpha=\sum_{k=1}^{K-1}p_{k+1}\,T^{(k)}.$$

For a given i.i.d. sample ${\cal T}=(Y_i,\boldsymbol{X}_i)_{i=1}^{n}$, this motivates the integrated random walk statistic

$$\widehat{\rm ABC}_{n}^{\circ}=\sum_{k=1}^{K-1}p_{k+1}\,T_{n}^{(k)}=\sum_{k=1}^{K-1}p_{k+1}\sum_{j=1}^{k}S_{n}^{(j)}=\sum_{k=1}^{K-1}\left(1-\alpha_{k}\right)S_{n}^{(k)}.$$
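The last equality above is a change in the order of summation. The following minimal sketch checks it numerically; it assumes that $\alpha_k = p_1+\ldots+p_k$ denotes the cumulative level probabilities, and the increment values `S` are hypothetical numbers chosen only for illustration.

```python
from itertools import accumulate

# Level probabilities p_1..p_K (the values of Table 1 in the example).
p = [0.10, 0.15, 0.25, 0.25, 0.15, 0.10]
# Hypothetical increments S_n^{(1)},...,S_n^{(K)}; any real numbers work.
S = [0.012, -0.020, 0.005, 0.031, -0.017, 0.008]

K = len(p)
alpha = list(accumulate(p))  # assumption: alpha_k = p_1 + ... + p_k
T = list(accumulate(S))      # random walk T_n^{(k)} = S_n^{(1)} + ... + S_n^{(k)}

# First form: sum_{k=1}^{K-1} p_{k+1} T_n^{(k)} (0-based: p[k + 1] * T[k]).
abc_walk = sum(p[k + 1] * T[k] for k in range(K - 1))
# Last form: sum_{k=1}^{K-1} (1 - alpha_k) S_n^{(k)}.
abc_incr = sum((1.0 - alpha[k]) * S[k] for k in range(K - 1))

print(abc_walk, abc_incr)  # the two values agree up to rounding error
```

The agreement relies on the probabilities summing to one, so that $\sum_{k=j}^{K-1}p_{k+1}=1-\alpha_j$.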

Under auto-calibration of $\pi$ for $(Y,\boldsymbol{X})$, the statistic $\widehat{\rm ABC}_{n}^{\circ}$ converges to zero, $\mathbb{P}$-a.s. Slightly modifying the terms, we propose the following weighted $L^{2}$-norm statistic of the increments

$$V^{2}_{n}:=\sum_{k=1}^{K}\left(1-\alpha_{k-1}\right)\left(S_{n}^{(k)}\right)^{2}, \tag{3.3}$$

thus, the random walk increments $S_{n}^{(k)}$ with different signs cannot compensate each other.

Corollary 3.1

Under auto-calibration of $\pi$ for $(Y,\boldsymbol{X})$,

$$n\,V^{2}_{n}\quad\Longrightarrow\quad\sum_{k=1}^{K}\left(1-\alpha_{k-1}\right)p_{k}\,\tau_{k}^{2}\,\chi_{k}^{2}\qquad\text{ as $n\to\infty$},$$

where $\chi_{k}^{2}$ are i.i.d. $\chi^{2}$-distributed random variables with one degree of freedom.

Test 3. Under the above assumptions, we can test for auto-calibration of $\pi$ for $(Y,\boldsymbol{X})$ by exploiting the limiting distribution of Corollary 3.1 numerically. As in Test 2, this limiting distribution only depends on the two parameter sets $(p_{k})_{k=1}^{K}$ and $(\tau_{k}^{2})_{k=1}^{K}$.

Dropping the weighting $1-\alpha_{k-1}$ in (3.3) and scaling the individual terms $(S_{n}^{(k)})^{2}$ by $p_{k}\tau_{k}^{2}$ gives a $\chi^{2}$-test with $K$ degrees of freedom.

4 Conclusions

This letter considers statistical testing of auto-calibration. In the simplified set-up of a discrete finite regression function, we provide three different test statistics that have fully known asymptotic distributions under auto-calibration, see (2.3), (2.6) and Corollary 3.1. These three test statistics consider random walk increments, a random walk and an integrated random walk. The three test statistics can be used for statistical testing of auto-calibration in our simpler set-up; Test 2 is a modified version of Denuit et al. [2, Proposition 3.1].

In this letter, we did not study the power of these tests. The power will depend on the kind of violation of auto-calibration; in fact, we believe that it is beneficial to normalize all random walk increments to unit variance in any of the three presented tests. Another open problem is to generalize these tests to arbitrary regression functions; this seems feasible for Tests 2 and 3.


References

  • [1] Denuit, M., Charpentier, A., Trufin, J. (2021). Autocalibration and Tweedie-dominance for insurance pricing in machine learning. Insurance: Mathematics and Economics 101/B, 485-497.
  • [2] Denuit, M., Huyghe, J., Trufin, J., Verdebout, T. (2024). Testing for auto-calibration with Lorenz and concentration curves. Insurance: Mathematics and Economics 117, 130-139.
  • [3] Denuit, M., Sznajder, D., Trufin, J. (2019). Model selection based on Lorenz and concentration curves, Gini indices and convex order. Insurance: Mathematics and Economics 89, 128-139.
  • [4] Dimitriadis, T., Dümbgen, L., Henzi, A., Puke, M., Ziegel, J. (2023). Honest calibration assessment for binary outcome predictions. Biometrika 110/3, 663-680.
  • [5] Gneiting, T., Resin, J. (2023). Regression diagnostics meets forecast evaluation: conditional calibration, reliability diagrams, and coefficient of determination. Electronic Journal of Statistics 17, 3226-3286.
  • [6] Krüger, F., Ziegel, J.F. (2021). Generic conditions for forecast dominance. Journal of Business & Economic Statistics 39/4, 972-983.
  • [7] Wüthrich, M.V. (2023). Model selection with Gini indices under auto-calibration. European Actuarial Journal 13/1, 469-477.

Supplementary


Proofs


Proof of Proposition 2.1. Set $\boldsymbol{S}_{n}=(S^{(1)}_{n},\ldots,S^{(K)}_{n})^{\top}$. For $\boldsymbol{r}\in\mathbb{R}^{K}$, consider the characteristic function

$$\begin{aligned}
\mathbb{E}\left[\exp\left\{i\sqrt{n}\,\boldsymbol{r}^{\top}\boldsymbol{S}_{n}\right\}\right]
&= \mathbb{E}\left[\exp\left\{\frac{i}{\sqrt{n}}\sum_{j=1}^{n}\sum_{k=1}^{K}r_{k}\left(Y_{j}-\pi(\boldsymbol{X}_{j})\right)\mathds{1}_{\{\pi(\boldsymbol{X}_{j})=\pi_{k}\}}\right\}\right]\\
&= \prod_{j=1}^{n}\sum_{k=1}^{K}\mathbb{E}\left[\exp\left\{\frac{i}{\sqrt{n}}r_{k}\left(Y_{j}-\pi(\boldsymbol{X}_{j})\right)\right\}\mathds{1}_{\{\pi(\boldsymbol{X}_{j})=\pi_{k}\}}\right]\\
&= \exp\left\{n\log\left(\sum_{k=1}^{K}p_{k}\,\mathbb{E}\left[\left.\exp\left\{\frac{i}{\sqrt{n}}r_{k}\left(Y-\pi(\boldsymbol{X})\right)\right\}\right|\pi(\boldsymbol{X})=\pi_{k}\right]\right)\right\}\\
&= \exp\left\{n\log\left(\sum_{k=1}^{K}p_{k}\left(1-\frac{r^{2}_{k}}{2n}\mathbb{E}\left[\left.\left(Y-\pi(\boldsymbol{X})\right)^{2}\right|\pi(\boldsymbol{X})=\pi_{k}\right]+o(n^{-1})\right)\right)\right\}\\
&= \prod_{k=1}^{K}\exp\left\{-r_{k}^{2}\,\frac{p_{k}\tau_{k}^{2}}{2}\right\}\exp\{o(1)\}\qquad\text{ as $n\to\infty$,}
\end{aligned}$$

where in the second last step we use auto-calibration of $\pi$ for $(Y,\boldsymbol{X})$. This completes the proof. $\Box$
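The second last step of the proof can be spelled out as follows. This is a sketch under the assumption that the conditional second moments are finite and that $\tau_k^2=\mathbb{E}[(Y-\pi(\boldsymbol{X}))^2\,|\,\pi(\boldsymbol{X})=\pi_k]$, which we take to be the definition used in Proposition 2.1. A second-order Taylor expansion of the exponential gives the first line, where the linear term vanishes by auto-calibration; the second line then uses $\sum_{k=1}^K p_k=1$ and $\log(1+x)=x+o(x)$ as $x\to 0$.

```latex
\begin{align*}
\mathbb{E}\left[\left.\exp\left\{\tfrac{i}{\sqrt{n}}\,r_k\left(Y-\pi(\boldsymbol{X})\right)\right\}\right|\pi(\boldsymbol{X})=\pi_k\right]
&= 1+\frac{i\,r_k}{\sqrt{n}}\,
\underbrace{\mathbb{E}\left[\left.Y-\pi(\boldsymbol{X})\right|\pi(\boldsymbol{X})=\pi_k\right]}_{=\,0\ \text{by auto-calibration}}
-\frac{r_k^2}{2n}\,\tau_k^2+o(n^{-1}),\\
n\log\left(\sum_{k=1}^{K}p_k\left(1-\frac{r_k^2\,\tau_k^2}{2n}+o(n^{-1})\right)\right)
&= n\log\left(1-\frac{1}{2n}\sum_{k=1}^{K}r_k^2\,p_k\,\tau_k^2+o(n^{-1})\right)
\;\longrightarrow\;-\frac{1}{2}\sum_{k=1}^{K}r_k^2\,p_k\,\tau_k^2 .
\end{align*}
```

The limit is the logarithm of the characteristic function of a centered Gaussian vector with independent components of variances $p_k\tau_k^2$.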


Example

We study a gamma distribution example with $K=6$ expected response levels $(\pi_{k})_{k=1}^{K}$. Table 1 shows the selected parameters. Firstly, we choose the probabilities $(p_{k})_{k=1}^{K}$ such that the boundary levels $\pi_{k}\in\{10,15\}$ receive the smallest probabilities, and the levels in the middle $\pi_{k}\in\{12,13\}$ get the highest probabilities. This is a quite common feature in real data. Secondly, the variance parameters $(\tau^{2}_{k})_{k=1}^{K}$ are increasing in the regression means $(\pi_{k})_{k=1}^{K}$. This is also a rather common feature; e.g., Poisson and gamma generalized linear models (GLMs) have this property.
Based on these parameters, we first simulate the regression level $\pi_{k}$ using the probabilities $(p_{k})_{k=1}^{K}$. Based on this level $\pi_{k}$, we then simulate the response $Y|_{\pi(\boldsymbol{X})=\pi_{k}}\sim\Gamma(\gamma_{k},c_{k})$ with shape parameter $\gamma_{k}=3\pi_{k}$ and scale parameter $c_{k}=3$ (entering as a rate, so that the conditional mean is $\gamma_{k}/c_{k}$ and the conditional variance is $\gamma_{k}/c_{k}^{2}$). This gives us conditional mean $\pi_{k}$ and conditional variance $\tau_{k}^{2}=\pi_{k}/3$, see Table 1. In particular, auto-calibration is fulfilled in this example because we simulate from the correct means.

| $k$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| $\pi_{k}$ | 10 | 11 | 12 | 13 | 14 | 15 |
| $p_{k}$ | 10/100 | 15/100 | 25/100 | 25/100 | 15/100 | 10/100 |
| $\tau^{2}_{k}$ | 10/3 | 11/3 | 12/3 | 13/3 | 14/3 | 15/3 |

Table 1: Chosen parameters for the gamma example with $K=6$.
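The two-stage simulation described above can be sketched with the Python standard library as follows. The function name `simulate_sample`, the seed and the sample size used for the check are illustrative choices, not taken from the text. Note that `random.gammavariate(alpha, beta)` takes the shape `alpha` and the scale `beta`, so mean $\pi_k$ and variance $\pi_k/3$ correspond to `alpha = 3 * pi_k` and `beta = 1/3`.

```python
import random
from statistics import fmean, variance

LEVELS = [10, 11, 12, 13, 14, 15]             # pi_k (Table 1)
PROBS = [0.10, 0.15, 0.25, 0.25, 0.15, 0.10]  # p_k (Table 1)

def simulate_sample(n, rng):
    """Return lists (y, pi_x) of an i.i.d. sample (Y_i, pi(X_i)) of size n."""
    # Stage 1: draw the regression level with probabilities p_k.
    pi_x = rng.choices(LEVELS, weights=PROBS, k=n)
    # Stage 2: gamma response with shape 3 * pi_k and scale 1/3,
    # i.e. conditional mean pi_k and conditional variance pi_k / 3.
    y = [rng.gammavariate(3.0 * m, 1.0 / 3.0) for m in pi_x]
    return y, pi_x

rng = random.Random(2024)  # arbitrary seed, only for reproducibility
y, pi_x = simulate_sample(20000, rng)

# Compare empirical conditional moments with (pi_k, pi_k / 3).
for level in LEVELS:
    ys = [yi for yi, m in zip(y, pi_x) if m == level]
    print(level, round(fmean(ys), 2), round(variance(ys), 2), round(level / 3, 2))
```

Because the responses are simulated from the correct conditional means, the empirical level means should be close to the levels themselves, which is the auto-calibration property in this example.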
Figure 1: Simulation of an i.i.d. sample $(Y_{i},\pi(\boldsymbol{X}_{i}))_{i=1}^{n}$ of sample size $n=1000$: (lhs) boxplot of the responses $(Y_{i})_{i=1}^{n}$ classified w.r.t. $\pi(\boldsymbol{X}_{i})=\pi_{k}$, (middle) lift plot showing the empirical level means $\overline{y}_{k}$ against their expectations $\pi_{k}$, and (rhs) statistics $S^{(k)}_{n}$ for $1\leq k\leq K$.

Based on the parameters given in Table 1, we simulate an i.i.d. sample $(Y_{i},\pi(\boldsymbol{X}_{i}))_{i=1}^{n}$ of sample size $n=1000$. Figure 1 (lhs) shows the resulting boxplot of the responses $(Y_{i})_{i=1}^{n}$ classified w.r.t. their conditional means $\pi(\boldsymbol{X}_{i})=\pi_{k}$. Recall that auto-calibration holds in this example. Figure 1 (middle) plots the empirical level means

$$\overline{y}_{k}=\frac{1}{\sum_{i=1}^{n}\mathds{1}_{\{\pi(\boldsymbol{X}_{i})=\pi_{k}\}}}\sum_{i=1}^{n}Y_{i}\,\mathds{1}_{\{\pi(\boldsymbol{X}_{i})=\pi_{k}\}},$$

against their (true) conditional expectations $\pi_{k}$; this plot is sometimes also called a lift plot. Under auto-calibration, the resulting scatter plot should lie close to the diagonal, and its deviations from the diagonal are described (asymptotically) by Proposition 2.1. Figure 1 (rhs) shows the resulting statistics $S^{(k)}_{n}$, for $1\leq k\leq K$. These are obtained from the lift plot by using a different normalization

$$S^{(k)}_{n}=\frac{1}{n}\sum_{i=1}^{n}\left(Y_{i}-\pi(\boldsymbol{X}_{i})\right)\mathds{1}_{\{\pi(\boldsymbol{X}_{i})=\pi_{k}\}}=\frac{\sum_{i=1}^{n}\mathds{1}_{\{\pi(\boldsymbol{X}_{i})=\pi_{k}\}}}{n}\left(\overline{y}_{k}-\pi_{k}\right),$$

the ratio on the right-hand side is an empirical estimate of $p_{k}$. The magnitude of fluctuations of these statistics $S^{(k)}_{n}$ around zero should be of order $\sqrt{p_{k}}\,\tau_{k}/\sqrt{n}$, see Proposition 2.1.
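The relation between the lift plot and the statistics $S^{(k)}_n$ can be checked on simulated data. The sketch below uses the parameters of Table 1 with an arbitrary seed; the variable names are illustrative. It computes $S^{(k)}_n$ directly from its definition and via the empirical level frequency times $(\overline{y}_k-\pi_k)$, and prints the order of fluctuation $\sqrt{p_k}\,\tau_k/\sqrt{n}$ next to them.

```python
import random

rng = random.Random(7)  # arbitrary seed
levels = [10, 11, 12, 13, 14, 15]             # pi_k (Table 1)
probs = [0.10, 0.15, 0.25, 0.25, 0.15, 0.10]  # p_k (Table 1)
n = 1000

# Simulate (Y_i, pi(X_i)): gamma with shape 3 * pi_k and scale 1/3.
pi_x = rng.choices(levels, weights=probs, k=n)
y = [rng.gammavariate(3.0 * m, 1.0 / 3.0) for m in pi_x]

rows = []
for k, level in enumerate(levels, start=1):
    group = [yi for yi, m in zip(y, pi_x) if m == level]
    p_hat = len(group) / n                         # empirical estimate of p_k
    y_bar = sum(group) / len(group)                # empirical level mean
    s_direct = sum(yi - level for yi in group) / n  # definition of S_n^{(k)}
    s_lift = p_hat * (y_bar - level)               # lift-plot form
    order = (probs[k - 1] * (level / 3.0) / n) ** 0.5  # sqrt(p_k) tau_k / sqrt(n)
    rows.append((k, s_direct, s_lift, order))
    print(k, round(s_direct, 4), round(s_lift, 4), round(order, 4))
```

The two computed versions of $S^{(k)}_n$ coincide up to rounding error, and their absolute values should typically be of the same order as the last printed column.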

We repeat this simulation of an i.i.d. sample $(Y_{i},\pi(\boldsymbol{X}_{i}))_{i=1}^{n}$ 10,000 times to study the empirical distribution of the statistics $\sqrt{n}\,\boldsymbol{S}_{n}=\sqrt{n}(S^{(1)}_{n},\ldots,S^{(K)}_{n})^{\top}$. For large sample sizes $n$, this empirical distribution should approximately look like the Gaussian limiting distribution given in Proposition 2.1. Our simulation has an empirical mean $\widehat{\mathbb{E}}[\sqrt{n}\,\boldsymbol{S}_{n}]$ of magnitude $10^{-2}$, thus, close to zero. The empirical covariance matrix reads as

$$\widehat{\rm Cov}(\sqrt{n}\,\boldsymbol{S}_{n})=\begin{pmatrix}
0.34 & 0.00 & 0.00 & -0.01 & 0.00 & 0.00\\
0.00 & 0.55 & 0.01 & 0.01 & 0.00 & 0.01\\
0.00 & 0.01 & 1.01 & -0.01 & 0.00 & -0.01\\
-0.01 & 0.01 & -0.01 & 1.10 & -0.01 & 0.01\\
0.00 & 0.00 & 0.00 & -0.01 & 0.69 & 0.00\\
0.00 & 0.01 & -0.01 & 0.01 & 0.00 & 0.50
\end{pmatrix}.$$

The off-diagonals are close to zero and the diagonal is close to the true parameters

$$\left(p_{1}\tau_{1}^{2},\ldots,p_{6}\tau_{6}^{2}\right)=\left(0.33,\,0.55,\,1.00,\,1.08,\,0.70,\,0.50\right),$$

see the asymptotic covariance matrix in Proposition 2.1. This confirms the limiting parameters in the weak convergence result of Proposition 2.1.
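A reduced version of this repeated simulation can be sketched as follows. To keep the pure-Python run time short, it uses far fewer repetitions and a smaller sample size than the study above, so the empirical covariance matrix is noisier than the one reported; the function name, seed, `reps` and `n` are illustrative choices.

```python
import random

levels = [10, 11, 12, 13, 14, 15]             # pi_k (Table 1)
probs = [0.10, 0.15, 0.25, 0.25, 0.15, 0.10]  # p_k (Table 1)
K = len(levels)

def scaled_increments(n, rng):
    """Return sqrt(n) * (S_n^{(1)}, ..., S_n^{(K)}) for one simulated sample."""
    sums = [0.0] * K
    for k in rng.choices(range(K), weights=probs, k=n):
        y = rng.gammavariate(3.0 * levels[k], 1.0 / 3.0)  # mean pi_k, var pi_k/3
        sums[k] += y - levels[k]
    return [s / n ** 0.5 for s in sums]  # sqrt(n) * (1/n) * sum = sum / sqrt(n)

rng = random.Random(123)  # arbitrary seed
reps, n = 400, 500        # much smaller than the 10,000 repetitions in the text
draws = [scaled_increments(n, rng) for _ in range(reps)]

means = [sum(d[k] for d in draws) / reps for k in range(K)]
cov = [[sum((d[a] - means[a]) * (d[b] - means[b]) for d in draws) / (reps - 1)
        for b in range(K)] for a in range(K)]
target = [p * m / 3.0 for p, m in zip(probs, levels)]  # p_k * tau_k^2

for a in range(K):
    print([round(c, 2) for c in cov[a]], "target diagonal:", round(target[a], 2))
```

With more repetitions the diagonal of `cov` should approach `target` and the off-diagonal entries should shrink towards zero.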

Figure 2: (lhs) Empirical densities of $\sqrt{n}\,S^{(k)}_{n}/(\sqrt{p_{k}}\,\tau_{k})$, for $1\leq k\leq K$, compared to the standard Gaussian density, and (rhs) empirical densities of the random walk $\sqrt{n}\,T^{(k)}_{n}$, for $1\leq k\leq K$, compared to the Gaussian random walk densities of $(Z_{k})_{k=1}^{K}$, see (2.5).

Figure 2 (lhs) shows the empirical densities of $\sqrt{n}S^{(k)}_n/(\sqrt{p_k}\tau_k)$, for $1\leq k\leq K$ and sample size $n=1000$. They are benchmarked against the standard Gaussian density shown in black. The empirical densities align quite well with this benchmark, supporting the statement of Proposition 2.1. This justifies using the asymptotic approximation (2.3) for the auto-calibration Test 1 in this example. Since the components in the maximum in (2.3) may live on different scales, we also use an alternative test statistic that evaluates the normalized quantities

\[
{\mathbb{P}}\left[\max_{1\leq k\leq K}\sqrt{n}\left|\frac{S^{(k)}_n}{\sqrt{p_k}\tau_k}\right|\leq s\right]={\mathbb{P}}\left[\bigcap_{1\leq k\leq K}\left\{\sqrt{n}\left|\frac{S^{(k)}_n}{\sqrt{p_k}\tau_k}\right|\leq s\right\}\right]~\approx~\prod_{k=1}^{K}\left(2\Phi(s)-1\right). \tag{S.1}
\]

This then directly relates to the (normalized) graphs in Figure 2 (lhs).
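As a hedged illustration of (S.1), the following Python sketch evaluates the normalized maximum statistic, its asymptotic p-value and the corresponding critical value. The function names and their arguments are hypothetical choices made for this sketch; only the product formula $(2\Phi(s)-1)^K$ is taken from (S.1), and the parameters $p_k$ and $\tau_k$ are assumed to be known.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian; Phi is STD_NORMAL.cdf


def normalized_max(s_n, p, tau, n):
    """max_k sqrt(n) |S_n^(k)| / (sqrt(p_k) tau_k), the statistic in (S.1)."""
    return max(
        (n ** 0.5) * abs(s) / ((pk ** 0.5) * tk)
        for s, pk, tk in zip(s_n, p, tau)
    )


def p_value_s1(stat, num_levels):
    """Asymptotic p-value 1 - (2 Phi(s) - 1)^K under auto-calibration."""
    return 1.0 - (2.0 * STD_NORMAL.cdf(stat) - 1.0) ** num_levels


def critical_value_s1(alpha, num_levels):
    """Solve (2 Phi(s) - 1)^K = 1 - alpha for s in closed form."""
    per_level = (1.0 - alpha) ** (1.0 / num_levels)
    return STD_NORMAL.inv_cdf((1.0 + per_level) / 2.0)


# K = 6 levels and a 5% significance level, as in the example of the text
crit = critical_value_s1(0.05, 6)
```

Because the right-hand side of (S.1) does not depend on the model parameters, the critical value only depends on $K$ and on the significance level.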

Next, we turn our attention to the second test, involving the random walk consideration (2.5). In this case we get the random walk type empirical covariance matrix

\[
\widehat{\rm Cov}(\sqrt{n}\,\boldsymbol{T}_n)=\begin{pmatrix}
0.34 & 0.34 & 0.34 & 0.33 & 0.33 & 0.34\\
0.34 & 0.89 & 0.90 & 0.89 & 0.90 & 0.91\\
0.34 & 0.90 & 1.91 & 1.90 & 1.91 & 1.92\\
0.33 & 0.89 & 1.90 & 2.99 & 2.99 & 3.01\\
0.33 & 0.90 & 1.91 & 2.99 & 3.68 & 3.70\\
0.34 & 0.91 & 1.92 & 3.01 & 3.70 & 4.21
\end{pmatrix}.
\]

Since we work with a small sample size of $n=1000$, there is still some noise involved, so the above empirical covariance matrix is not a perfect random walk covariance matrix. The random walk covariance matrix of Corollary 2.2 has diagonal entries $(0.33, 0.88, 1.88, 2.97, 3.67, 4.17)$. Figure 2 (rhs) plots the empirical densities of $\sqrt{n}T^{(k)}_n$, for $1\leq k\leq K$, and these are benchmarked against the Gaussian random walk densities (2.5) of $Z_k$, $1\leq k\leq K$. Again we see a rather good alignment, supporting the asymptotic approximation (2.6) for auto-calibration Test 2. Clearly, the last random walk components $\sqrt{n}T^{(K)}_n$ and $Z_K$, respectively, have the biggest variance, which implies that they will frequently determine the test statistic, see (2.6). Naturally, one could also reverse the index $k$ by studying the mirrored quantity, see also Wüthrich (2023) for mirroring,

\[
\widetilde{T}^{(k)}={\mathbb{E}}\left[\left(Y-\pi(\boldsymbol{X})\right)\mathds{1}_{\{\pi(\boldsymbol{X})\geq\pi_k\}}\right]=0, \tag{S.2}
\]

and its empirical counterpart

\[
\widetilde{T}^{(k)}_n=\frac{1}{n}\sum_{i=1}^{n}\left(Y_i-\pi(\boldsymbol{X}_i)\right)\mathds{1}_{\{\pi(\boldsymbol{X}_i)\geq\pi_k\}}=\sum_{j=k}^{K}S^{(j)}_n.
\]

If the terms $p_k\tau_k^2$ are increasing in $1\leq k\leq K$, this latter option may give a test with better power, because the increments of the mirrored random walk then have a decreasing standard deviation.
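The relation between the increments, the forward random walk and its mirrored version can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the increment variances are the rounded values quoted in the text, and the covariance formula uses that the limiting increments are independent (diagonal covariance in Proposition 2.1). Because of the rounded inputs, the cumulative sums may differ from the quoted diagonal entries in the last digit.

```python
from itertools import accumulate

# Increment variances p_k * tau_k^2 as quoted (rounded) in the text
increment_var = [0.33, 0.55, 1.00, 1.08, 0.70, 0.50]


def partial_sums(increments):
    """T^(k) = sum_{j <= k} S^(j), the forward random walk."""
    return list(accumulate(increments))


def mirrored_partial_sums(increments):
    """T~^(k) = sum_{j >= k} S^(j), the walk started from the top level."""
    return list(accumulate(reversed(increments)))[::-1]


def random_walk_cov(variances):
    """Cov(Z_k, Z_l) = sum of the increment variances up to min(k, l)."""
    cum = partial_sums(variances)
    size = len(cum)
    return [[cum[min(k, l)] for l in range(size)] for k in range(size)]


# Applied to variances, the same partial sums give the walk variances
forward_var = partial_sums(increment_var)             # Var(Z_k), k = 1..K
mirrored_var = mirrored_partial_sums(increment_var)   # mirrored walk variances
cov = random_walk_cov(increment_var)                  # full K x K covariance
```

The same two helpers can be applied to the empirical increments $S_n^{(k)}$ to obtain $T_n^{(k)}$ and $\widetilde{T}_n^{(k)}$; both walks end in the same total sum.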

Figure 3: Empirical density of $nV^2_n$ compared to the sum of independent scaled $\chi^2$-distributions given in Corollary 3.1.

Finally, Figure 3 illustrates the asymptotic result of Corollary 3.1 for a sample size of $n=1000$. The test statistic $nV_n^2$ does not consider the maximum of the increments, $\max_{1\leq k\leq K}\sqrt{n}|S_n^{(k)}|$, but rather a weighted $L^2$-norm of all increments. The weighted $L^2$-norm studied in (3.3) has been motivated by the ABC. However, in general, it is not clear why this weighting should be justified. Alternatively, we could also consider an unweighted test statistic

\[
n\,\widetilde{V}^2_n=n\,\sum_{k=1}^{K}(S_n^{(k)})^2\quad\Longrightarrow\quad\sum_{k=1}^{K}p_k\,\tau_k^2\,\chi_k^2\qquad\text{ as $n\to\infty$}, \tag{S.3}
\]

assuming $\pi$ is auto-calibrated for $(Y,\boldsymbol{X})$. In the same spirit, we could simply consider a $\chi^2$-test

\[
n\,\sum_{k=1}^{K}\frac{(S_n^{(k)})^2}{p_k\,\tau^2_k}\quad\Longrightarrow\quad\chi_K^2\qquad\text{ as $n\to\infty$}, \tag{S.4}
\]

where the right-hand side is a $\chi^2$-distributed random variable with $K$ degrees of freedom. This is the same scaling as in (S.1); however, we do not consider maxima of increments, but rather aggregated squares of the normalized random walk increments.
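The two aggregated statistics (S.3) and (S.4) can be computed as in the following Python sketch. The function names are hypothetical, and $p_k$ and $\tau_k^2$ are assumed to be known. To stay within the standard library, the $\chi^2$ survival function is implemented only for an even number of degrees of freedom, where it has a closed form; this covers $K=6$ but is a simplifying assumption of the sketch, not a restriction of the test.

```python
import math


def chi2_statistics(s_n, p, tau2, n):
    """Return the unweighted statistic (S.3) and the normalized one (S.4)."""
    unweighted = n * sum(s * s for s in s_n)
    normalized = n * sum(s * s / (pk * t2) for s, pk, t2 in zip(s_n, p, tau2))
    return unweighted, normalized


def chi2_sf_even(x, dof):
    """P[chi^2_dof > x] for even dof, via the closed-form Poisson sum."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form requires an even, positive dof")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(dof // 2):
        total += term            # adds (x/2)^j / j!
        term *= half / (j + 1)
    return math.exp(-half) * total
```

The asymptotic p-value of the $\chi^2$-test (S.4) is then `chi2_sf_even(normalized, K)` for even $K$; for the unweighted statistic (S.3) the limit depends on $p_k\tau_k^2$, so no such closed form is used here.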


Summarizing, we have seen seven different test statistics that we will exploit numerically:

  • (1a) From Test 1, we can study the maximum of the increments, see (2.3).

  • (1b) A differently scaled version of Test 1 is given in (S.1).

  • (2a) From Test 2, we can study the maximum of a random walk, see (2.6).

  • (2b) An index-reversed version of Test 2 is given in (S.2).

  • (3a) From Test 3, we get a weighted $L^2$-norm of the random walk increments, see (3.3).

  • (3b) An unweighted alternative of Test 3 is given in (S.3).

  • (3c) Finally, we have the $\chi^2$-test given by (S.4).

Because we have a discrete regression function $\pi$ taking finitely many values, we obtain a natural partition of the covariate space, ${\cal X}=\bigcup_{k=1}^{K}{\cal X}_k$, and of the range of the regression function, $(\pi_k)_{k=1}^{K}$. For continuous regression functions $\pi$, one can discretize the range of $\pi$ and then perform a $\chi^2$-test for auto-calibration. In the Bernoulli case this has been proposed by Hosmer–Lemeshow (1980), where the discretization is done with the help of the (empirical) quantiles of $\pi$. Our proposal is a generalization to arbitrary responses, and we present test statistics that are different (and differently aggregated and normalized) from the classical $\chi^2$-test in the Bernoulli case.
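For a continuous regression function, the discretization by empirical quantiles mentioned above could look as follows. This is a minimal Python sketch with hypothetical function names; it is a plain quantile grouping of the predictions and is not meant to reproduce the Hosmer–Lemeshow procedure in detail. The per-bin increments use the normalization $1/n$ implied by the empirical random walk above.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bins(predictions, num_bins):
    """Assign each prediction to one of num_bins empirical-quantile groups."""
    cuts = quantiles(predictions, n=num_bins)  # num_bins - 1 cut points
    return [bisect_right(cuts, value) for value in predictions]


def bin_increments(y, predictions, num_bins):
    """Per-bin S_n^(k) = (1/n) * sum of (Y_i - pi(X_i)) over bin k."""
    n = len(y)
    sums = [0.0] * num_bins
    bins = quantile_bins(predictions, num_bins)
    for y_i, pred_i, k in zip(y, predictions, bins):
        sums[k] += (y_i - pred_i) / n
    return sums
```

The resulting increments can then be passed to any of the test statistics listed above, with the bin probabilities and conditional variances estimated per bin.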


Next, we aim at comparing the resulting powers of the seven tests in a simulation analysis. We therefore contaminate the above model. We simulate responses

\[
Y^{\delta}=Y+\delta,\quad\text{ with }\quad Y|_{\pi(\boldsymbol{X})=\pi_k}\sim\Gamma(\gamma_k,c_k). \tag{S.5}
\]

Thus, we introduce a global bias by shifting the means $\pi_k\mapsto\pi_k+\delta$ by a non-negative constant $\delta\geq 0$. This is a global shift as it affects all levels $\pi_k$, $1\leq k\leq K$, equally.

                Test 1a   Test 1b   Test 2a   Test 2b   Test 3a   Test 3b   Test 3c
95% quantiles    2.3456    2.6310    4.2060    4.2263    5.4066    9.1198   12.5916
Table 2: Quantiles of the different tests for significance level 5%.

Table 2 gives the quantiles for significance level 5% for the different tests. The quantiles of Tests 1b and 3c are directly available in standard software, the quantile of Test 1a can be found by a root search algorithm, and quantiles of Tests 2a, 2b, 3a and 3b were computed empirically by a (simple) Monte Carlo simulation.
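The root search and the Monte Carlo step can be sketched in Python as follows. Several points are assumptions of this sketch rather than statements of the paper: the product form used for (2.3) is inferred from (S.1) and the diagonal limiting covariance of Proposition 2.1, the limits of Tests 2a and 2b are taken to be the maximum absolute value of the forward and mirrored Gaussian random walks, and the increment variances are the rounded values quoted in the text. The function names, the bisection bounds, the number of draws and the seed are arbitrary choices.

```python
import random
from itertools import accumulate
from statistics import NormalDist

PHI = NormalDist().cdf
# Rounded increment variances p_k * tau_k^2 quoted earlier in the text
VARS = [0.33, 0.55, 1.00, 1.08, 0.70, 0.50]
SDS = [v ** 0.5 for v in VARS]


def cdf_max_increment(s):
    """Assumed form of (2.3): prod_k (2 Phi(s / (sqrt(p_k) tau_k)) - 1)."""
    prob = 1.0
    for sd in SDS:
        prob *= 2.0 * PHI(s / sd) - 1.0
    return prob


def bisect_quantile(cdf, level, lo=0.0, hi=50.0, tol=1e-10):
    """Root search for cdf(s) = level; cdf must be increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def monte_carlo_quantiles(level=0.95, draws=50_000, seed=1):
    """Empirical quantiles of the assumed limits of Tests 2a, 2b and 3b."""
    rng = random.Random(seed)
    max_fwd, max_rev, sum_sq = [], [], []
    for _ in range(draws):
        z = [rng.gauss(0.0, sd) for sd in SDS]  # independent Gaussian increments
        max_fwd.append(max(abs(t) for t in accumulate(z)))
        max_rev.append(max(abs(t) for t in accumulate(reversed(z))))
        sum_sq.append(sum(x * x for x in z))
    idx = int(level * draws)
    return tuple(sorted(v)[idx] for v in (max_fwd, max_rev, sum_sq))


q_1a = bisect_quantile(cdf_max_increment, 0.95)
q_2a, q_2b, q_3b = monte_carlo_quantiles()
```

Under these assumptions the results should land close to the corresponding entries of Table 2, up to Monte Carlo noise and the rounding of the input variances.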

Figure 4: Powers of the seven tests: (top-lhs) global contamination (S.5), (top-rhs) local contamination (S.6) of the lowest level $\pi_1$, (bottom-lhs) local contamination (S.6) of level $\pi_4$, (bottom-rhs) local contamination (S.6) of the highest level $\pi_6$.

We simulate 10,000 times (with different seeds) i.i.d. samples $(Y^{\delta}_i,\boldsymbol{X}_i)_{i=1}^{n}$, $n=1000$, for a grid of contaminations $\delta\in\{0,1/20,2/20,\ldots,1\}$, see (S.5). This gives us the seven test statistics for every simulation $1\leq t\leq 10{,}000$ and for every contamination level $\delta\in\{0,1/20,2/20,\ldots,1\}$. In the uncontaminated case $\delta=0$, roughly 5% of the 10,000 simulations should be above the quantiles of Table 2. This then verifies that the asymptotic results for the tests apply, i.e., that $n=1000$ is a sufficiently large sample size for these tests.

For contaminations $\delta>0$, significantly more simulations should be above the quantiles of Table 2, and the more samples lie above the corresponding quantile, the bigger the power of the test. Figure 4 (top-lhs) shows the results. We see that all curves start at the significance level of 5% for $\delta=0$. Then, they increase to 1 for increasing contamination $\delta\uparrow 1$. The fastest increase is achieved by Tests 2a-2b (maximum of random walk), followed by Tests 3a-3c (squared sum of random walk increments), and the slowest increase is achieved by Tests 1a-1b (maximum of random walk increments). From this we conclude that the random walk tests (2.6) and (S.2) have the biggest power in case of a global shift, and they should be preferred to find global shifts. Intuitively this is clear: each random walk increment $S_n^{(k)}$ is shifted by $\delta$ times the relative frequency of level $\pi_k$ (roughly $\delta p_k$), and in the random walk these shifts are aggregated across all increments. Thus, the last random walk component $T_n^{(K)}$ collects the shifts of all $K$ increments, which add up to the full shift $\delta$. This is why Tests 2a-2b are the most sensitive ones to global shifts. In our example the order of aggregation is not very relevant, and Tests 2a-2b have almost equal power.

Global shifts are one potential cause of a violation of auto-calibration, but the violation can also occur only on individual levels $\pi_k$, or on different levels with different signs. To test for this local failure of the auto-calibration property, we only contaminate individual levels of the regression function. Fix $\ell\in\{1,\ldots,K\}$, and consider the local contamination

\[
Y^{\delta,\ell}=Y+\delta\,\mathds{1}_{\{\pi(\boldsymbol{X})=\pi_{\ell}\}},\quad\text{ with }\quad Y|_{\pi(\boldsymbol{X})=\pi_k}\sim\Gamma(\gamma_k,c_k), \tag{S.6}
\]

which only contaminates the responses that have conditional expectation $\pi(\boldsymbol{X})=\pi_{\ell}$.
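The global contamination (S.5) and the local contamination (S.6) can be simulated along the following lines. This Python sketch is purely illustrative: the three-level model (probabilities, gamma shapes and rates) is hypothetical and is not the model of the paper, $\Gamma(\gamma_k,c_k)$ is assumed to be parametrized by shape $\gamma_k$ and rate $c_k$ so that $\pi_k=\gamma_k/c_k$ and $\tau_k^2=\gamma_k/c_k^2$, and the true $p_k$ and $\tau_k^2$ are plugged into the $\chi^2$-test (S.4). The number of repetitions is kept small for speed.

```python
import random

# Hypothetical model: level probabilities p_k, gamma shapes and rates.
# These are illustrative only and are NOT the parameters of the paper.
P = [0.2, 0.5, 0.3]
SHAPE = [1.0, 2.0, 3.0]
RATE = [1.0, 1.0, 1.0]
MEAN = [a / c for a, c in zip(SHAPE, RATE)]       # pi_k = gamma_k / c_k
VAR = [a / c ** 2 for a, c in zip(SHAPE, RATE)]   # tau_k^2
CRIT = 7.8147  # 95% quantile of chi^2 with K = 3 degrees of freedom


def simulate_increments(n, delta, level, rng):
    """Return (S_n^(1), ..., S_n^(K)) for one contaminated sample.

    level=None shifts every response by delta as in (S.5); level=l shifts
    only the responses with pi(X) = pi_l as in (S.6) (0-based index here).
    """
    sums = [0.0] * len(P)
    for k in rng.choices(range(len(P)), weights=P, k=n):
        y = rng.gammavariate(SHAPE[k], 1.0 / RATE[k])  # second arg is the scale
        if level is None or level == k:
            y += delta
        sums[k] += y - MEAN[k]
    return [s / n for s in sums]


def rejection_rate(delta, level=None, n=1000, reps=300, seed=0):
    """Share of simulations in which the chi^2-test (S.4) rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        s_n = simulate_increments(n, delta, level, rng)
        stat = n * sum(s * s / (p * v) for s, p, v in zip(s_n, P, VAR))
        rejections += stat > CRIT
    return rejections / reps
```

For instance, `rejection_rate(0.0)` estimates the empirical size of the test, while `rejection_rate(0.5)` and `rejection_rate(0.5, level=0)` estimate its power under a global and a local contamination in this hypothetical model.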

Based on this local contamination we repeat the above simulation experiment. Since violation of auto-calibration often happens at the boundary of the range of the regression function, we contaminate the model for the smallest and biggest conditional expectations $\pi_{\ell}$, $\ell\in\{1,6\}$. These are also the least frequent levels in our example. Additionally, we contaminate level $\pi_{\ell}$, $\ell=4$, which lies in the main body of the covariate distribution. The results are presented in Figure 4 (top-rhs and bottom). The picture now changes significantly compared to the global contamination. Tests 1b and 3c have the best behavior; both of these tests consider the normalized increments $S_n^{(k)}/(\sqrt{p_k}\tau_k)$. From this we conclude that one should first bring all random walk increments to the same scale. This is especially true if the violation of auto-calibration takes place at rare boundary levels, $\pi_1$ and $\pi_6$ in our case. For contaminated middle levels, $\pi_4$ in our case, Tests 1a-1b and 3a-3c are all almost equally good. On the other hand, one should not use the aggregated random walk versions of Tests 2a-2b, because through aggregation the impact of individual violations of auto-calibration gets diluted.
Another observation is that if the violation of auto-calibration happens on the biggest level $\pi_6$, it cannot be found by the ABC inspired test (3.3). This comes from the scaling $1-\alpha_{K-1}=p_K$, which often is a small number. Therefore, we cannot generally recommend Test 3a.

We summarize our findings of the simulation example as follows:

  • Global shifts can most effectively be found by the random walk Tests 2a-2b, but this requires that auto-calibration is violated in the same direction on the entire support of the regression function.

  • Local violation of auto-calibration, especially in the tails of the regression function, can most effectively be found by Tests 1b and 3c. Both tests consider scaled random walk increments (with unit variance), i.e., it seems beneficial that all random walk increments live on the same scale.

  • The ABC inspired Test 3a can generally not be recommended, because the ABC weighting seems to prefer the lower over the upper tail of the regression function, but there is no specific reason that justifies such a weighting; compare the magenta dotted lines in Figure 4 (top-rhs) and (bottom-rhs).

References

  • Hosmer, D.W., Lemeshow, S. (1980). Goodness of fit tests for the multiple logistic regression model. Communications in Statistics - Theory and Methods 9, 1043-1069.

  • Wüthrich, M.V. (2023). Model selection with Gini indices under auto-calibration. European Actuarial Journal 13/1, 469-477.