Beta Regression For Modelling Rates and Proportions: Silvia L. P. Ferrari and Francisco Cribari-Neto
Abstract. This paper proposes a regression model where the response is beta distributed, using
a parameterization of the beta law that is indexed by mean and dispersion parameters. The
proposed model is useful for situations where the variable of interest is continuous and restricted
to the interval (0, 1) and is related to other variables through a regression structure. The
regression parameters of the beta regression model are interpretable in terms of the mean of the
response and, when the logit link is used, of an odds ratio, unlike the parameters of a linear
regression that employs a transformed response. Estimation is performed by maximum likelihood.
We provide closed-form expressions for the score function, for Fisher’s information matrix and
its inverse. Hypothesis testing is performed using approximations obtained from the asymptotic
normality of the maximum likelihood estimator. Some diagnostic measures are introduced.
Finally, practical applications that employ real data are presented and discussed.
Introduction
Practitioners commonly use regression models to analyse data that are perceived
to be related to other variables. The linear regression model, in particular, is
commonly used in applications. It is not, however, appropriate for situations
where the response is restricted to the interval (0, 1) since it may yield fitted
values for the variable of interest that exceed its lower and upper bounds. A
possible solution is to transform the dependent variable so that it assumes values
on the real line, and then to model the mean of the transformed response as a
linear predictor based on a set of exogenous variables. This approach, however,
has drawbacks, one of them being the fact that the model parameters cannot be
easily interpreted in terms of the original response. Another shortcoming is that
measures of proportions typically display asymmetry, and hence inference based
on the normality assumption can be misleading. Our goal is to propose a
regression model that is tailored for situations where the dependent variable (y)
is measured continuously on the standard unit interval, i.e. $0 < y < 1$. The
proposed model is based on the assumption that the response is beta distributed.
The beta distribution, as is well known, is very flexible for modelling proportions
since its density can have quite different shapes depending on the values of the
two parameters that index the distribution. The beta density is given by
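$$\pi(y; p, q) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\, y^{p-1}(1-y)^{q-1}, \qquad 0 < y < 1, \tag{1}$$
where $p > 0$, $q > 0$ and $\Gamma(\cdot)$ is the gamma function. The mean and variance of y
are, respectively,
$$E(y) = \frac{p}{p+q} \tag{2}$$
and
$$\operatorname{var}(y) = \frac{pq}{(p+q)^2(p+q+1)}. \tag{3}$$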
The mode of the distribution exists when both p and q are greater than one:
$\operatorname{mode}(y) = (p-1)/(p+q-2)$. The uniform distribution is a particular case of
equation (1) when $p = q = 1$. Estimation of p and q by maximum likelihood and
the application of small sample bias adjustments to the maximum likelihood
estimators of these parameters are discussed by Cribari-Neto & Vasconcellos
(2002).
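As a quick numerical check of equations (1) to (3) and of the mode formula, the following sketch compares the closed-form expressions with `scipy.stats.beta`. It uses Python with NumPy and SciPy, which is our choice for illustration only (the computations reported in this paper were carried out in Ox); the function names are illustrative.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln


def beta_pdf(y, p, q):
    """Beta density of equation (1), evaluated on the log scale for numerical stability."""
    log_norm = gammaln(p + q) - gammaln(p) - gammaln(q)
    return np.exp(log_norm + (p - 1) * np.log(y) + (q - 1) * np.log1p(-y))


def beta_mean(p, q):
    return p / (p + q)                                 # equation (2)


def beta_var(p, q):
    return p * q / ((p + q) ** 2 * (p + q + 1))        # equation (3)


def beta_mode(p, q):
    if p <= 1 or q <= 1:
        raise ValueError("this mode formula requires p > 1 and q > 1")
    return (p - 1) / (p + q - 2)


p, q = 2.5, 4.0
y = np.linspace(0.05, 0.95, 7)
dist = stats.beta(p, q)
print(np.allclose(beta_pdf(y, p, q), dist.pdf(y)))
print(np.isclose(beta_mean(p, q), dist.mean()), np.isclose(beta_var(p, q), dist.var()))
print(beta_mode(p, q), beta_pdf(y, 1.0, 1.0))        # p = q = 1 gives the uniform density
```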
‘Beta distributions are very versatile and a variety of uncertainties can be
usefully modelled by them. This flexibility encourages its empirical use in a wide
range of applications’ (Johnson et al., 1995, p. 235). Several applications of the
beta distribution are discussed by Bury (1999) and by Johnson et al. (1995).
These applications, however, do not involve situations where the practitioner is
required to impose a regression structure for the variable of interest. Our interest
lies in situations where the behaviour of the response can be modelled as a
function of a set of exogenous variables. To that end, we shall propose a beta
regression model. We shall also discuss the estimation of the unknown parameters
by maximum likelihood and some diagnostic techniques. Large sample inference
is also considered. The modelling and inferential procedures we propose are
similar to those for generalized linear models (McCullagh & Nelder, 1989),
except that the distribution of the response is not a member of the exponential
family. An alternative to the model we propose is the simplex model in Jørgensen
(1997) which is defined by four parameters. Our model, on the other hand, is
defined by only two parameters, and is flexible enough to handle a wide range
of applications.
It is noteworthy that several empirical applications can be handled using the
proposed class of regression models. As a first illustration, consider the dataset
collected by Prater (1956). The dependent variable is the proportion of crude oil
converted to gasoline after distillation and fractionation, and the potential
covariates are: the crude oil gravity (degrees API), the vapour pressure of the
crude oil (lbf/in²), the crude oil 10% point ASTM (i.e. the temperature at which
10% of the crude oil has become vapour), and the temperature (°F) at which all
the gasoline is vaporized. The dataset contains 32 observations on the response
and on the independent variables. It has been noted (Daniel & Wood, 1971, Ch. 8)
that there are only ten sets of values of the first three explanatory variables that
correspond to ten different crudes and were subjected to experimentally
controlled distillation conditions. This dataset was analysed by Atkinson (1985),
who used the linear regression model and noted that there is ‘indication that the
error distribution is not quite symmetrical, giving rise to some unduly large and
small residuals’ (Atkinson, 1985, p. 60). He proceeded to transform the response
so that the transformed dependent variable assumed values on the real line, and
then used it in a linear regression analysis. Our approach will be different: we
shall analyse these data using the beta regression model proposed in the next
section.
The paper unfolds as follows. The next section presents the beta regression
model, and discusses maximum likelihood estimation and large sample inference.
Diagnostic measures are discussed in the third section. The fourth section
contains applications of the proposed regression model, including an analysis of
Prater’s gasoline data. Concluding remarks are given in the final section.
Technical details are presented in the appendices.
densities have ‘J shapes’ and two others have inverted ‘J shapes’. Although we
did not plot the uniform case, we note that when $\mu = 1/2$ and $\phi = 2$ the density
reduces to that of a standard uniform distribution. The beta density can also be
‘U shaped’ (skewed or not), and this situation is also not displayed in Figure 1.
Throughout the paper we shall assume that the response is constrained to the
standard unit interval (0, 1). The model we shall propose, however, is still useful
for situations where the response is restricted to the interval (a, b), where a and
b are known scalars, $a < b$. In this case, one would model $(y-a)/(b-a)$ instead
of modelling y directly.
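A minimal sketch of this rescaling, assuming the bounds a and b are known exactly and every observation lies strictly inside (a, b); the helper names are hypothetical. Because the map is affine, a fitted mean on (0, 1) is carried back to the original scale by the inverse map.

```python
import numpy as np


def to_unit_interval(y, a, b):
    """Rescale a response known to lie in (a, b) to (y - a) / (b - a), which lies in (0, 1)."""
    if not a < b:
        raise ValueError("the bounds must satisfy a < b")
    z = (np.asarray(y, dtype=float) - a) / (b - a)
    if np.any((z <= 0) | (z >= 1)):
        raise ValueError("all observations must lie strictly inside (a, b)")
    return z


def from_unit_interval(mu, a, b):
    """Map a fitted mean of (y - a)/(b - a) back to the scale of y: E(y) = a + (b - a) E(z)."""
    return a + (b - a) * np.asarray(mu, dtype=float)


scores = np.array([12.0, 47.5, 88.0])   # hypothetical responses bounded by 0 and 100
z = to_unit_interval(scores, 0.0, 100.0)
print(z, from_unit_interval(z, 0.0, 100.0))
```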
$$g(\mu_t) = \sum_{i=1}^{k} x_{ti}\beta_i = \eta_t \tag{5}$$
$$e^{c\beta_i} = \frac{\mu^\dagger/(1-\mu^\dagger)}{\mu/(1-\mu)},$$
that is, $\exp\{c\beta_i\}$ equals the odds ratio. Consider, for instance, Prater’s gasoline
example introduced in the previous section, and define the odds of converting
crude oil into gasoline as the number of units of crude oil, out of ten units, that
are, on average, converted into gasoline divided by the number of units that are
not converted. As an illustration, if, on average, 20% of the crude oil is
transformed into gasoline, then the odds of conversion equal 2/8. Suppose that
the temperature at which all the gasoline is vaporized increases by 50°F; then 50
times the regression parameter associated with this covariate can be interpreted
as the log of the ratio between the odds of converting crude oil into gasoline
under the new setting and the odds under the old setting, all other variables remaining
constant.
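The odds-ratio interpretation can be checked numerically. The sketch below assumes a logit link and uses made-up coefficients (they are not estimates from Prater’s data): increasing one covariate by c = 50 multiplies the odds μ/(1 − μ) by exp(cβ_i), whatever the baseline setting of the other covariates.

```python
import numpy as np


def inv_logit(eta):
    return 1 / (1 + np.exp(-eta))


def odds(mu):
    return mu / (1 - mu)


# Made-up coefficients (intercept, slope on the vaporization temperature); not fitted values
beta = np.array([-3.0, 0.01])
c = 50.0                                   # increase of 50 degrees F in the covariate
x_old = np.array([1.0, 200.0])
x_new = x_old + np.array([0.0, c])

odds_ratio = odds(inv_logit(x_new @ beta)) / odds(inv_logit(x_old @ beta))
print(odds_ratio, np.exp(c * beta[1]))     # equal up to floating-point rounding
print(odds(0.2))                           # odds for 20% conversion, i.e. 2/8
```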
$$l(\beta,\phi) = \sum_{t=1}^{n} l_t(\mu_t,\phi), \tag{6}$$
where
$$l_t(\mu_t,\phi) = \log\Gamma(\phi) - \log\Gamma(\mu_t\phi) - \log\Gamma((1-\mu_t)\phi) + (\mu_t\phi - 1)\log y_t + \{(1-\mu_t)\phi - 1\}\log(1-y_t), \tag{7}$$
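with $\mu_t$ defined so that equation (5) holds. Let $y_t^* = \log\{y_t/(1-y_t)\}$ and
$\mu_t^* = \psi(\mu_t\phi) - \psi((1-\mu_t)\phi)$, where $\psi(\cdot)$ is the digamma function. The score function, obtained by differentiating the
log-likelihood function with respect to the unknown parameters (see Appendix
A), is given by $(U_\beta(\beta,\phi)^\top, U_\phi(\beta,\phi))^\top$, where
$$U_\beta(\beta,\phi) = \phi X^\top T(y^* - \mu^*), \tag{8}$$
with X being an $n \times k$ matrix whose tth row is $x_t^\top$, $T = \operatorname{diag}\{1/g'(\mu_1), \ldots, 1/g'(\mu_n)\}$, $y^* = (y_1^*, \ldots, y_n^*)^\top$ and $\mu^* = (\mu_1^*, \ldots, \mu_n^*)^\top$, and
$$U_\phi(\beta,\phi) = \sum_{t=1}^{n}\left\{\mu_t(y_t^* - \mu_t^*) + \log(1-y_t) - \psi((1-\mu_t)\phi) + \psi(\phi)\right\}. \tag{9}$$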
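To make equations (7) to (9) concrete, here is a sketch in Python (NumPy/SciPy) that evaluates the log-likelihood and the analytic score and compares the score with central finite differences of the log-likelihood. It assumes a logit link, so that $1/g'(\mu_t) = \mu_t(1-\mu_t)$, and uses simulated data; none of this is taken from the authors’ Ox code.

```python
import numpy as np
from scipy.special import digamma, gammaln


def loglik(beta, phi, X, y):
    """Log-likelihood (6)-(7) under a logit link (an assumption; other links are possible)."""
    mu = 1 / (1 + np.exp(-(X @ beta)))
    return np.sum(gammaln(phi) - gammaln(mu * phi) - gammaln((1 - mu) * phi)
                  + (mu * phi - 1) * np.log(y) + ((1 - mu) * phi - 1) * np.log1p(-y))


def score(beta, phi, X, y):
    """Score vector (U_beta^T, U_phi)^T from equations (8) and (9)."""
    mu = 1 / (1 + np.exp(-(X @ beta)))
    y_star = np.log(y / (1 - y))
    mu_star = digamma(mu * phi) - digamma((1 - mu) * phi)
    t_diag = mu * (1 - mu)                              # 1/g'(mu_t) for the logit link
    u_beta = phi * X.T @ (t_diag * (y_star - mu_star))
    u_phi = np.sum(mu * (y_star - mu_star) + np.log1p(-y)
                   - digamma((1 - mu) * phi) + digamma(phi))
    return np.append(u_beta, u_phi)


# Simulated data, for illustration only
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mu_true = 1 / (1 + np.exp(-(X @ np.array([0.5, -0.8]))))
y = rng.beta(mu_true * 30.0, (1 - mu_true) * 30.0)

# Central finite differences of the log-likelihood at an arbitrary point (beta, phi)
theta = np.array([0.3, -0.5, 20.0])
f = lambda th: loglik(th[:-1], th[-1], X, y)
h = 1e-6
numeric = np.array([(f(theta + h * e) - f(theta - h * e)) / (2 * h) for e in np.eye(3)])
print(np.allclose(score(theta[:-1], theta[-1], X, y), numeric, rtol=1e-4, atol=1e-4))
```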
The next step is to obtain an expression for Fisher’s information matrix. The
notation can be described as follows. Let $W = \operatorname{diag}\{w_1, \ldots, w_n\}$, with
$$w_t = \phi\{\psi'(\mu_t\phi) + \psi'((1-\mu_t)\phi)\}\frac{1}{\{g'(\mu_t)\}^2},$$
$c = (c_1, \ldots, c_n)^\top$, with $c_t = \phi\{\psi'(\mu_t\phi)\mu_t - \psi'((1-\mu_t)\phi)(1-\mu_t)\}$, where $\psi'(\cdot)$ is the
trigamma function. Also, let $D = \operatorname{diag}\{d_1, \ldots, d_n\}$, with $d_t = \psi'(\mu_t\phi)\mu_t^2 + \psi'((1-\mu_t)\phi)(1-\mu_t)^2 - \psi'(\phi)$. It is shown in Appendix A that Fisher’s information
matrix is given by
$$K = K(\beta,\phi) = \begin{pmatrix} K_{\beta\beta} & K_{\beta\phi} \\ K_{\phi\beta} & K_{\phi\phi} \end{pmatrix}, \tag{10}$$
where $K_{\beta\beta} = \phi X^\top W X$, $K_{\beta\phi} = K_{\phi\beta}^\top = X^\top T c$ and $K_{\phi\phi} = \operatorname{tr}(D)$. Note that the parameters
β and φ are not orthogonal, in contrast to what is verified in the class of
generalized linear regression models (McCullagh & Nelder, 1989).
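A sketch of the Fisher information matrix (10), again assuming the logit link (so $1/g'(\mu_t) = \mu_t(1-\mu_t)$) and a simulated design matrix; the function name is hypothetical. The square roots of the diagonal of $K^{-1}$ give asymptotic standard errors for $(\beta, \phi)$.

```python
import numpy as np
from scipy.special import polygamma


def fisher_information(beta, phi, X):
    """K(beta, phi) of equation (10) under a logit link (assumed), as a (k+1) x (k+1) array."""
    mu = 1 / (1 + np.exp(-(X @ beta)))
    t_diag = mu * (1 - mu)                               # diagonal of T for the logit link
    a = polygamma(1, mu * phi)                           # psi'(mu_t phi)
    b = polygamma(1, (1 - mu) * phi)                     # psi'((1 - mu_t) phi)
    w = phi * (a + b) * t_diag ** 2                      # w_t
    c = phi * (a * mu - b * (1 - mu))                    # c_t
    d = a * mu ** 2 + b * (1 - mu) ** 2 - polygamma(1, phi)   # d_t
    k_bb = phi * X.T @ (w[:, None] * X)                  # K_beta,beta = phi X^T W X
    k_bp = X.T @ (t_diag * c)                            # K_beta,phi = X^T T c
    k_pp = d.sum()                                       # K_phi,phi = tr(D)
    return np.vstack([np.column_stack([k_bb, k_bp]), np.append(k_bp, k_pp)])


# Simulated design, for illustration only
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
K = fisher_information(np.array([0.5, -0.8]), 30.0, X)
std_errors = np.sqrt(np.diag(np.linalg.inv(K)))          # asymptotic standard errors
print(K)
print(std_errors)
```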
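Under the usual regularity conditions for maximum likelihood estimation,
when the sample size is large,
$$\begin{pmatrix}\hat\beta \\ \hat\phi\end{pmatrix} \sim N_{k+1}\left(\begin{pmatrix}\beta \\ \phi\end{pmatrix}, K^{-1}\right)$$
approximately, where $\hat\beta$ and $\hat\phi$ are the maximum likelihood estimators of β and φ. The inverse $K^{-1}$ can therefore be
used to obtain asymptotic standard errors for the maximum likelihood estimates.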
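Using standard expressions for the inverse of partitioned matrices (e.g. Rao,
1973, p. 33), we obtain
$$K^{-1} = K^{-1}(\beta,\phi) = \begin{pmatrix} K^{\beta\beta} & K^{\beta\phi} \\ K^{\phi\beta} & K^{\phi\phi} \end{pmatrix}, \tag{11}$$
where
$$K^{\beta\beta} = \frac{1}{\phi}(X^\top W X)^{-1}\left\{I_k + \frac{X^\top T c\, c^\top T^\top X (X^\top W X)^{-1}}{\gamma\phi}\right\},$$
with $\gamma = \operatorname{tr}(D) - \phi^{-1} c^\top T^\top X (X^\top W X)^{-1} X^\top T c$.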
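The partitioned-inverse formula (11) can be checked against a direct matrix inverse. In this sketch the weights $w_t$, the diagonal of T, $c_t$ and $d_t$ are arbitrary positive or random numbers (an assumption made only to test the algebra), not trigamma-based quantities from a fitted model. The off-diagonal block $K^{\beta\phi} = -(X^\top W X)^{-1}X^\top T c/(\gamma\phi)$ and $K^{\phi\phi} = 1/\gamma$ used below follow from the same partitioned-inverse identity.

```python
import numpy as np


def k_inverse_blocks(X, w, t_diag, c, d, phi):
    """Blocks (K^{bb}, K^{bphi}, K^{phiphi}) of K^{-1} via the partitioned-inverse formula."""
    A = np.linalg.inv(X.T @ (w[:, None] * X))            # (X^T W X)^{-1}
    v = X.T @ (t_diag * c)                                # X^T T c = K_beta,phi
    gamma = d.sum() - v @ A @ v / phi                     # Schur complement of K_beta,beta
    k = X.shape[1]
    K_bb = (A / phi) @ (np.eye(k) + np.outer(v, v) @ A / (gamma * phi))
    K_bphi = -A @ v / (gamma * phi)
    K_phiphi = 1.0 / gamma
    return K_bb, K_bphi, K_phiphi


# Arbitrary inputs, used only to check the algebra against a direct inverse
rng = np.random.default_rng(1)
n, k, phi = 30, 3, 15.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
w = rng.uniform(0.5, 2.0, n)
t_diag = rng.uniform(0.1, 0.25, n)
c = rng.normal(size=n)
d = rng.uniform(1.0, 2.0, n)
v = X.T @ (t_diag * c)
K = np.block([[phi * X.T @ (w[:, None] * X), v[:, None]],
              [v[None, :], np.array([[d.sum()]])]])
K_inv = np.linalg.inv(K)
bb, bphi, phiphi = k_inverse_blocks(X, w, t_diag, c, d, phi)
print(np.allclose(K_inv[:k, :k], bb), np.allclose(K_inv[:k, k], bphi), np.isclose(K_inv[k, k], phiphi))
```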
Diagnostic Measures
After the fit of the model, it is important to perform diagnostic analyses in order
to check the goodness-of-fit of the estimated model. We shall introduce a global
measure of explained variation and graphical tools for detecting departures from
the postulated model and influential observations.
At the outset, a global measure of explained variation can be obtained by
computing the pseudo $R^2$ ($R_p^2$), defined as the square of the sample correlation
coefficient between $\hat\eta$ and $g(y)$. Note that $0 \le R_p^2 \le 1$, and perfect agreement
between $\hat\eta$ and $g(y)$, and hence between $\hat\mu$ and y, yields $R_p^2 = 1$.
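A short sketch of the pseudo $R^2$, assuming a logit link for $g$ by default (any link can be passed in); the function name is hypothetical. With $y$ set exactly to the inverse-logit of the linear predictor, the agreement is perfect and $R_p^2 = 1$.

```python
import numpy as np


def pseudo_r2(eta_hat, y, link=lambda m: np.log(m / (1 - m))):
    """Squared sample correlation between eta_hat and g(y); the logit default is an assumption."""
    g_y = link(np.asarray(y, dtype=float))
    return np.corrcoef(eta_hat, g_y)[0, 1] ** 2


# Perfect agreement between eta_hat and g(y)
eta = np.linspace(-2.0, 2.0, 11)
print(pseudo_r2(eta, 1 / (1 + np.exp(-eta))))
```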
The discrepancy of a fit can be measured as twice the difference between
the maximum log-likelihood achievable (saturated model) and that achieved by
the model under investigation. Let $D(y; \mu, \phi) = \sum_{t=1}^{n} 2\{l_t(\tilde\mu_t, \phi) - l_t(\mu_t, \phi)\}$,
where $\tilde\mu_t$ is the value of $\mu_t$ that solves $\partial l_t/\partial\mu_t = 0$, i.e. $\phi(y_t^* - \mu_t^*) = 0$. When φ is
large, $\mu_t^* \approx \log\{\mu_t/(1-\mu_t)\}$, and it then follows that $\tilde\mu_t \approx y_t$; see Appendix C. For
known φ, this discrepancy measure is $D(y; \hat\mu, \phi)$, where $\hat\mu$ is the maximum
likelihood estimator of μ under the model being investigated. When φ is
unknown, an approximation to this quantity is $D(y; \hat\mu, \hat\phi)$; it can be named, as
usual, the deviance for the current model. Note that $D(y; \hat\mu, \hat\phi) = \sum_{t=1}^{n} (r_t^d)^2$, where
$$r_t^d = \operatorname{sign}(y_t - \hat\mu_t)\left\{2\left(l_t(\tilde\mu_t, \hat\phi) - l_t(\hat\mu_t, \hat\phi)\right)\right\}^{1/2}.$$
Note now that the tth observation contributes a quantity $(r_t^d)^2$ to the deviance,
and thus an observation with a large absolute value of $r_t^d$ can be viewed as
discrepant. We shall call $r_t^d$ the tth deviance residual.
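A sketch of the deviance residuals, assuming $\tilde\mu_t$ is computed by solving $\psi(\mu\phi) - \psi((1-\mu)\phi) = y_t^*$ with a bracketing root finder (`scipy.optimize.brentq`). The left-hand side is strictly increasing in μ (its derivative is $\phi\{\psi'(\mu\phi) + \psi'((1-\mu)\phi)\} > 0$), so the root is unique. The inputs below are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma, gammaln


def lt(mu, phi, y):
    """Contribution l_t(mu_t, phi) of equation (7)."""
    return (gammaln(phi) - gammaln(mu * phi) - gammaln((1 - mu) * phi)
            + (mu * phi - 1) * np.log(y) + ((1 - mu) * phi - 1) * np.log1p(-y))


def saturated_mu(y_t, phi, eps=1e-10):
    """mu_tilde_t solving psi(mu phi) - psi((1 - mu) phi) = log{y_t / (1 - y_t)}."""
    target = np.log(y_t / (1 - y_t))
    return brentq(lambda m: digamma(m * phi) - digamma((1 - m) * phi) - target, eps, 1 - eps)


def deviance_residuals(y, mu_hat, phi_hat):
    """r_t^d = sign(y_t - mu_hat_t) [2{l_t(mu_tilde_t, phi_hat) - l_t(mu_hat_t, phi_hat)}]^{1/2}."""
    y, mu_hat = np.asarray(y, dtype=float), np.asarray(mu_hat, dtype=float)
    mu_tilde = np.array([saturated_mu(y_t, phi_hat) for y_t in y])
    dev = 2 * (lt(mu_tilde, phi_hat, y) - lt(mu_hat, phi_hat, y))
    return np.sign(y - mu_hat) * np.sqrt(np.maximum(dev, 0.0))   # clip tiny negative rounding


# Hypothetical observations and fitted means, for illustration only
y = np.array([0.2, 0.5, 0.7])
mu_hat = np.array([0.3, 0.5, 0.6])
r_d = deviance_residuals(y, mu_hat, 50.0)
print(r_d, np.sum(r_d ** 2))        # the sum of squares is the deviance D(y; mu_hat, phi_hat)
print(saturated_mu(0.3, 1e4))       # close to 0.3 when phi is large
```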
It is also possible to define the standardized residuals:
$$r_t = \frac{y_t - \hat\mu_t}{\sqrt{\widehat{\operatorname{var}}(y_t)}},$$
where $\hat\mu_t = g^{-1}(x_t^\top\hat\beta)$ and $\widehat{\operatorname{var}}(y_t) = \hat\mu_t(1-\hat\mu_t)/(1+\hat\phi)$. A plot of these residuals
against the index of the observations (t) should show no detectable pattern. Also,
a detectable trend in the plot of $r_t$ against $\hat\eta_t$ could be suggestive of link function
misspecification.
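A minimal sketch of the standardized residuals, assuming a logit link to obtain $\hat\mu_t$ from $x_t^\top\hat\beta$; the function name and toy inputs are hypothetical.

```python
import numpy as np


def standardized_residuals(y, X, beta_hat, phi_hat):
    """r_t = (y_t - mu_hat_t) / sqrt(var_hat(y_t)), with mu_hat_t from a logit link (assumed)."""
    mu_hat = 1 / (1 + np.exp(-(X @ beta_hat)))
    var_hat = mu_hat * (1 - mu_hat) / (1 + phi_hat)
    return (y - mu_hat) / np.sqrt(var_hat)


# Intercept-only toy example: mu_hat = 0.5 and var_hat = 0.25 / 4, so the residuals are +1 and -1
r = standardized_residuals(np.array([0.75, 0.25]), np.ones((2, 1)), np.array([0.0]), 3.0)
print(r)
```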
Since the distribution of the residuals is not known, half-normal plots with
simulated envelopes are a helpful diagnostic tool (Atkinson, 1985, section 4.2;
Neter et al., 1996, section 14.6). The main idea is to enhance the usual half-
normal plot by adding a simulated envelope that can be used to decide whether
the observed residuals are consistent with the fitted model. Half-normal plots
with a simulated envelope can be produced as follows:
(i) fit the model and generate a simulated sample of n independent
observations using the fitted model as if it were the true model;
(ii) fit the model to the generated sample, and compute the ordered
absolute values of the residuals;
(iii) repeat steps (i) and (ii) k times;
(iv) consider the n sets of the k order statistics; for each set compute its
average, minimum and maximum values;
(v) plot these values and the ordered residuals of the original sample
against the half-normal scores $\Phi^{-1}((t + n - 1/8)/(2n + 1/2))$.
The minimum and maximum values of the k order statistics yield the envelope.
Atkinson (1985, p. 36) suggests using $k = 19$, so that the probability that a given
absolute residual will fall beyond the upper band provided by the envelope is
approximately equal to $1/20 = 0.05$. Observations corresponding to absolute
residuals outside the limits provided by the simulated envelope are worthy of
further investigation. Additionally, if a considerable proportion of points falls
outside the envelope, then one has evidence against the adequacy of the fitted
model.
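The envelope procedure in steps (i) to (v) can be sketched as follows. Assumptions, all ours: a logit link, maximum likelihood fitting by `scipy.optimize.minimize` (BFGS, with φ optimized on the log scale for convenience and starting values from a least-squares fit of $\log\{y/(1-y)\}$ on X), and, for brevity, absolute standardized residuals instead of the deviance residuals used in Figure 2. Function names and data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln
from scipy.stats import norm


def inv_logit(eta):
    return 1 / (1 + np.exp(-eta))


def fit_beta_reg(X, y):
    """ML fit under a logit link; phi is optimized as log(phi), a convenience choice."""
    def nll(theta):
        beta, phi = theta[:-1], np.exp(theta[-1])
        mu = inv_logit(X @ beta)
        return -np.sum(gammaln(phi) - gammaln(mu * phi) - gammaln((1 - mu) * phi)
                       + (mu * phi - 1) * np.log(y) + ((1 - mu) * phi - 1) * np.log1p(-y))
    start = np.append(np.linalg.lstsq(X, np.log(y / (1 - y)), rcond=None)[0], np.log(10.0))
    res = minimize(nll, start, method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])


def sorted_abs_resid(X, y, beta, phi):
    """Ordered absolute standardized residuals (used here in place of deviance residuals)."""
    mu = inv_logit(X @ beta)
    return np.sort(np.abs((y - mu) / np.sqrt(mu * (1 - mu) / (1 + phi))))


def half_normal_envelope(X, y, k=19, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    beta, phi = fit_beta_reg(X, y)                        # (i) fit the model
    mu = inv_logit(X @ beta)
    sims = np.empty((k, n))
    for j in range(k):                                    # (iii) repeat k times
        y_sim = rng.beta(mu * phi, (1 - mu) * phi)        # (i) simulate from the fitted model
        b_j, phi_j = fit_beta_reg(X, y_sim)               # (ii) refit and order the residuals
        sims[j] = sorted_abs_resid(X, y_sim, b_j, phi_j)
    t = np.arange(1, n + 1)
    scores = norm.ppf((t + n - 1 / 8) / (2 * n + 1 / 2))  # (v) half-normal scores
    return {"scores": scores,
            "observed": sorted_abs_resid(X, y, beta, phi),
            "lower": sims.min(axis=0),                    # (iv) envelope and average
            "mean": sims.mean(axis=0),
            "upper": sims.max(axis=0)}


# Simulated data, for illustration only
rng = np.random.default_rng(3)
n = 40
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
mu0 = inv_logit(X @ np.array([0.0, 1.2]))
y = rng.beta(mu0 * 40.0, (1 - mu0) * 40.0)
env = half_normal_envelope(X, y)
outside = np.sum((env["observed"] < env["lower"]) | (env["observed"] > env["upper"]))
print(outside, "of", n, "ordered residuals fall outside the envelope")
```

Plotting `observed`, `lower`, `mean` and `upper` against `scores` reproduces the kind of panel described for Figure 2.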
Next, we shall be concerned with the identification of influential observations
and residual analysis. In what follows we shall use the generalized leverage
proposed by Wei et al. (1998), which is defined as
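$$GL(\tilde\theta) = \frac{\partial\tilde y}{\partial y^\top},$$
where θ is an s-vector such that $E(y) = \mu(\theta)$ and $\tilde\theta$ is an estimator of θ, with
$\tilde y = \mu(\tilde\theta)$. Here, the (t, u) element of $GL(\tilde\theta)$, i.e. the generalized leverage of the
estimator $\tilde\theta$ at (t, u), is the instantaneous rate of change in the tth predicted value
with respect to the uth response value. As noted by the authors, the generalized
leverage is invariant under reparameterization, and observations with large $GL_{tu}$
are leverage points. Let $\hat\theta$ be the maximum likelihood estimator of θ, assumed to
exist and to be unique, and assume that the log-likelihood function has second-order
continuous derivatives with respect to θ and y. Wei et al. (1998) have
shown that the generalized leverage is obtained by evaluating
$$GL(\theta) = D_\theta\left(-\frac{\partial^2 l}{\partial\theta\,\partial\theta^\top}\right)^{-1}\frac{\partial^2 l}{\partial\theta\,\partial y^\top}$$
at $\hat\theta$, where $D_\theta = \partial\mu/\partial\theta^\top$. For the beta regression model with φ known (θ = β), $D_\beta = TX$ and $-\partial^2 l/\partial\beta\,\partial\beta^\top = \phi X^\top Q X$, where $Q = \operatorname{diag}\{q_1, \ldots, q_n\}$ with
$$q_t = \left[\phi\{\psi'(\mu_t\phi) + \psi'((1-\mu_t)\phi)\} + (y_t^* - \mu_t^*)\frac{g''(\mu_t)}{g'(\mu_t)}\right]\frac{1}{\{g'(\mu_t)\}^2}, \qquad t = 1, \ldots, n.$$
Additionally, it can be shown that $\partial^2 l/\partial\beta\,\partial y^\top = \phi X^\top T M$, where $M = \operatorname{diag}\{m_1, \ldots, m_n\}$ with $m_t = 1/\{y_t(1-y_t)\}$, $t = 1, \ldots, n$. Therefore, we obtain
$$GL(\beta) = TX(X^\top Q X)^{-1}X^\top T M. \tag{12}$$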
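$$\frac{\partial^2 l}{\partial\theta\,\partial y^\top} = \begin{pmatrix}\phi X^\top T M \\ b^\top\end{pmatrix},$$
where $b = (b_1, \ldots, b_n)^\top$ with $b_t = -(y_t - \mu_t)/\{y_t(1-y_t)\}$, $t = 1, \ldots, n$. It can now be
shown that
$$GL(\beta,\phi) = GL(\beta) + \frac{1}{\gamma\phi}\,TX(X^\top Q X)^{-1}X^\top T f\left(f^\top T X (X^\top Q X)^{-1} X^\top T M - b^\top\right),$$
where GL(β) is given in equation (12). When φ is large, $GL(\beta,\phi) \approx GL(\beta)$.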
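A sketch of equation (12) for the case where φ is treated as known, assuming a logit link, for which $g''(\mu)/g'(\mu) = (2\mu-1)/\{\mu(1-\mu)\}$. β is estimated by Newton–Raphson using $\phi X^\top Q X$ (minus the Hessian in β), and one column of GL(β) is compared with a finite-difference derivative of the fitted means with respect to one response value, which is exactly what the generalized leverage measures. Function names and data are hypothetical.

```python
import numpy as np
from scipy.special import digamma, polygamma


def _pieces(beta, phi, X, y):
    """mu_t, y*_t and mu*_t under a logit link."""
    mu = 1 / (1 + np.exp(-(X @ beta)))
    y_star = np.log(y / (1 - y))
    mu_star = digamma(mu * phi) - digamma((1 - mu) * phi)
    return mu, y_star, mu_star


def q_diag(beta, phi, X, y):
    """q_t for the logit link, where g''(mu)/g'(mu) = (2 mu - 1) / (mu (1 - mu))."""
    mu, y_star, mu_star = _pieces(beta, phi, X, y)
    tri = polygamma(1, mu * phi) + polygamma(1, (1 - mu) * phi)
    ratio = (2 * mu - 1) / (mu * (1 - mu))
    return (phi * tri + (y_star - mu_star) * ratio) * (mu * (1 - mu)) ** 2


def fit_beta_fixed_phi(X, y, phi, iters=50):
    """Newton-Raphson for beta with phi held fixed; phi X^T Q X is minus the Hessian."""
    beta = np.linalg.lstsq(X, np.log(y / (1 - y)), rcond=None)[0]
    for _ in range(iters):
        mu, y_star, mu_star = _pieces(beta, phi, X, y)
        score = phi * X.T @ (mu * (1 - mu) * (y_star - mu_star))
        info = phi * X.T @ (q_diag(beta, phi, X, y)[:, None] * X)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-12:
            break
    return beta


def generalized_leverage_beta(beta, phi, X, y):
    """GL(beta) = T X (X^T Q X)^{-1} X^T T M of equation (12), logit link."""
    mu = 1 / (1 + np.exp(-(X @ beta)))
    TX = (mu * (1 - mu))[:, None] * X            # T X, with T = diag{1/g'(mu_t)}
    Q = q_diag(beta, phi, X, y)
    m = 1 / (y * (1 - y))                        # diagonal of M
    return TX @ np.linalg.solve(X.T @ (Q[:, None] * X), TX.T) * m[None, :]


# Simulated data for illustration only; phi is treated as known
rng = np.random.default_rng(2)
n, phi = 40, 25.0
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
mu0 = 1 / (1 + np.exp(-(X @ np.array([0.2, 1.0]))))
y = rng.beta(mu0 * phi, (1 - mu0) * phi)
beta_hat = fit_beta_fixed_phi(X, y, phi)
GL = generalized_leverage_beta(beta_hat, phi, X, y)

# Finite-difference check: column u of GL(beta) should equal d mu_hat / d y_u
u, h = 5, 1e-6
fitted_mu = lambda yy: 1 / (1 + np.exp(-(X @ fit_beta_fixed_phi(X, yy, phi))))
e = np.zeros(n)
e[u] = h
fd = (fitted_mu(y + e) - fitted_mu(y - e)) / (2 * h)
print(np.allclose(GL[:, u], fd, atol=1e-6))
print(np.diag(GL)[:5])                           # generalized leverages GL_tt
```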
A measure of the influence of each observation on the regression parameter
estimates is Cook’s distance (Cook, 1977), given by $k^{-1}(\hat\beta - \hat\beta_{(t)})^\top X^\top W X(\hat\beta - \hat\beta_{(t)})$,
where $\hat\beta_{(t)}$ is the parameter estimate without the tth observation. It measures the
squared distance between $\hat\beta$ and $\hat\beta_{(t)}$. To avoid fitting the model $n + 1$ times, we
shall use the usual approximation to Cook’s distance given by
$$C_t = \frac{h_{tt}\, r_t^2}{k(1-h_{tt})^2}.$$
It combines leverage and residuals. It is common practice to plot $C_t$ against t.
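A sketch of the approximate Cook’s distance. The text does not define $h_{tt}$ here; as an assumption, the sketch takes $h_{tt}$ to be the tth diagonal element of $W^{1/2}X(X^\top W X)^{-1}X^\top W^{1/2}$, whose diagonal sums to k. The weights and residuals below are hypothetical.

```python
import numpy as np


def hat_diagonal(X, w):
    """h_tt taken (as an assumption) from W^{1/2} X (X^T W X)^{-1} X^T W^{1/2}."""
    sw = np.sqrt(w)[:, None] * X
    return np.einsum("ij,ji->i", sw, np.linalg.solve(sw.T @ sw, sw.T))


def cook_approx(h, r, k):
    """Approximate Cook's distance C_t = h_tt r_t^2 / (k (1 - h_tt)^2)."""
    return h * r ** 2 / (k * (1 - h) ** 2)


# Hypothetical weights and residuals, for illustration only
rng = np.random.default_rng(4)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w = rng.uniform(0.5, 2.0, n)
r = rng.normal(size=n)
h = hat_diagonal(X, w)
C = cook_approx(h, r, k)
print(h.sum(), C.argmax())          # trace equals k; index of the most influential point
```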
Finally, we note that other diagnostic measures can be considered, such as
local influence measures (Cook, 1986).
Applications
This section contains two applications of the beta regression model proposed in the
second section. All computations were carried out using the matrix programming
language Ox (Doornik, 2001). The computer code and dataset used in the first
application are available at http://www.de.ufpe.br/˜cribari/betareg_example.zip.
Figure 2. Six diagnostic plots for Prater’s gasoline data. The upper left panel plots the
standardized residuals against t, the upper right panel plots the deviance residuals versus t,
the middle left panel displays the half-normal plot of absolute deviance residuals with a
simulated envelope, the middle right panel plots standardized residuals against $\hat\eta_t$, the lower
left panel presents a plot of $C_t$ versus t, and the lower right panel plots the diagonal elements
of $GL(\hat\beta, \hat\phi)$ against $\hat\mu_t$.
that the point estimates of the βs were not significantly altered, but that the
estimate of the precision parameter jumped from 440.3 to 577.8; despite that,
however, the reduction in the asymptotic standard errors of the regression
parameter estimates was negligible.
The next application uses data on food expenditure, income, and number of
persons in each household from a random sample of 38 households in a large
US city; the source of the data is Griffiths et al. (1993, Table 15.4). The interest
lies in modelling the proportion of income spent on food (y) as a function of
the level of income ($x_2$) and the number of persons in the household ($x_3$). At
the outset, consider a linear regression of the response on the covariates. The
estimated regression displayed evidence of heteroskedasticity; the p-value for
Koenker’s (1981) homoskedasticity test was 0.0514. If we consider instead the
regression of log{ y/(1ñy)} on the two covariates, the evidence of heteroskedastic-
ity is attenuated, but the residuals become highly asymmetric to the left.
We shall now consider the beta regression model proposed in the second
section. As previously mentioned, this model naturally accommodates
non-constant variances and skewness. The model is specified as
$$g(\mu_t) = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3}.$$
The link function used was logit. The parameter estimates are given in Table 2.
The pseudo R2 of the estimated regression was 0.3878.
The values in Table 2 show that both covariates are statistically significant at
the usual nominal levels. We also note that there is a negative relationship
between the mean response (proportion of income spent on food) and the level
of income, and that there is a positive relationship between the mean response
and the number of persons in the household. Diagnostic plots similar to those
presented in Figure 2 were also produced but for brevity are not presented.
Concluding Remarks
This paper proposed a regression model tailored for responses that are measured
continuously on the standard unit interval, i.e. $y \in (0, 1)$, which is the situation that
practitioners encounter when modelling rates and proportions. The underlying
assumption is that the response follows a beta law. As is well known, the beta
distribution is very flexible for modelling data on the standard unit interval,
since the beta density can display quite different shapes depending on the values
of the parameters that index the distribution. We use a parameterization in
which a function of the mean of the dependent variable is given by a linear
predictor that is defined by regression parameters and explanatory variables. The
proposed parameterization also allows for a precision parameter. When the logit
link function is used to transform the mean response, the regression parameters
can be interpreted in terms of the odds ratio. Parameter estimation is performed
by maximum likelihood, and we provide closed-form expressions for the score
function, for Fisher’s information matrix and its inverse. Interval estimation
for different population quantities (such as regression parameters, precision
parameter, mean response, odds ratio) is discussed. Tests of hypotheses on the
regression parameters can be performed using asymptotic tests, and three tests
are presented: likelihood ratio, score and Wald. We also consider a set of
diagnostic techniques that can be employed to identify departures from the
postulated model and influential observations. These include a measure of the
degree of leverage of the different observations, and a half-normal plot of
residuals with envelopes obtained from a simulation scheme. Applications using
real data sets were presented and discussed.
Acknowledgements
The authors gratefully acknowledge partial financial support from CNPq and
FAPESP. The authors also thank Gilberto Paula and a referee for comments
and suggestions on an earlier draft.
References
Abramowitz, M. & Stegun, I. A. (1965) Handbook of Mathematical Functions with Formulas, Graphs and
Mathematical Tables (New York: Dover).
Atkinson, A. C. (1985) Plots, Transformations and Regression: An Introduction to Graphical Methods of
Diagnostic Regression Analysis (New York: Oxford University Press).
Bury, K. (1999) Statistical Distributions in Engineering (New York: Cambridge University Press).
Cook, R. D. (1977) Detection of influential observations in linear regression, Technometrics, 19, pp. 15–18.
Cook, R. D. (1986) Assessment of local influence (with discussion), Journal of the Royal Statistical Society
B, 48, pp. 133–169.
Cribari-Neto, F. & Vasconcellos, K. L. P. (2002) Nearly unbiased maximum likelihood estimation for the
beta distribution, Journal of Statistical Computation and Simulation, 72, pp. 107–118.
Daniel, C. & Wood, F. S. (1971) Fitting Equations to Data (New York: Wiley).
Doornik, J. A. (2001) Ox: an Object-oriented Matrix programming Language, 4th edn (London: Timberlake
Consultants and Oxford: http://www.nuff.ox.ac.uk/Users/Doornik/).
Griffiths, W. E., Hill, R. C. & Judge, G. G. (1993) Learning and Practicing Econometrics (New York:
Wiley).
Johnson, N. L., Kotz, S. & Balakrishnan, N. (1995) Continuous Univariate Distributions, vol. 2, 2nd edn
(New York: Wiley).
Jørgensen, B. (1997) Proper dispersion models (with discussion), Brazilian Journal of Probability and
Statistics, 11, pp. 89–140.
Koenker, R. (1981) A note on studentizing a test for heteroscedasticity, Journal of Econometrics, 17,
pp. 107–112.
McCullagh, P. & Nelder, J. A. (1989) Generalized Linear Models, 2nd edn (London: Chapman and Hall).
Neter, J., Kutner, M. H., Nachtsheim, C. J. & Wasserman, W. (1996) Applied Linear Statistical Models,
4th edn (Chicago, IL: Irwin).
Nocedal, J. & Wright, S. J. (1999) Numerical Optimization (New York: Springer-Verlag).
Prater, N. H. (1956) Estimate gasoline yields from crudes, Petroleum Refiner, 35, pp. 236–238.
Rao, C. R. (1973) Linear Statistical Inference and Its Applications, 2nd edn (New York: Wiley).
Wei, B.-C., Hu, Y.-Q. & Fung, W.-K. (1998) Generalized leverage and its applications, Scandinavian
Journal of Statistics, 25, pp. 25–37.
Appendix A
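In this appendix we obtain the score function and the Fisher information matrix
for (β, φ). The notation used here is defined in the second section. From equation
(6) we get, for $i = 1, \ldots, k$,
$$\frac{\partial l(\beta,\phi)}{\partial\beta_i} = \sum_{t=1}^{n}\frac{\partial l_t(\mu_t,\phi)}{\partial\mu_t}\frac{d\mu_t}{d\eta_t}\frac{\partial\eta_t}{\partial\beta_i}, \tag{A1}$$
with
$$\frac{\partial l_t(\mu_t,\phi)}{\partial\mu_t} = \phi\left\{\log\frac{y_t}{1-y_t} - [\psi(\mu_t\phi) - \psi((1-\mu_t)\phi)]\right\}, \tag{A2}$$
where ψ(·) is the digamma function, i.e. $\psi(z) = d\log\Gamma(z)/dz$ for $z > 0$. From
regularity conditions, it is known that the expected value of the derivative in
equation (A2) equals zero, so that $\mu_t^* = E(y_t^*)$, where $y_t^*$ and $\mu_t^*$ are defined in the
second section. Hence,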
$$\frac{\partial l(\beta,\phi)}{\partial\beta_i} = \phi\sum_{t=1}^{n}(y_t^* - \mu_t^*)\frac{1}{g'(\mu_t)}x_{ti}. \tag{A3}$$
We then arrive at the matrix expression for the score function for β given in
equation (8). Similarly, it can be shown that the score function for φ can be
written as in equation (9).
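From equation (A1), the second derivative of l(β, φ) with respect to the βs is
given by
$$\begin{aligned}\frac{\partial^2 l(\beta,\phi)}{\partial\beta_i\,\partial\beta_j} &= \sum_{t=1}^{n}\frac{\partial}{\partial\mu_t}\left(\frac{\partial l_t(\mu_t,\phi)}{\partial\mu_t}\frac{d\mu_t}{d\eta_t}\right)\frac{d\mu_t}{d\eta_t}\frac{\partial\eta_t}{\partial\beta_j}\,x_{ti} \\ &= \sum_{t=1}^{n}\left(\frac{\partial^2 l_t(\mu_t,\phi)}{\partial\mu_t^2}\frac{d\mu_t}{d\eta_t} + \frac{\partial l_t(\mu_t,\phi)}{\partial\mu_t}\frac{\partial}{\partial\mu_t}\frac{d\mu_t}{d\eta_t}\right)\frac{d\mu_t}{d\eta_t}\,x_{ti}x_{tj}.\end{aligned}$$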
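Since $E(\partial l_t(\mu_t,\phi)/\partial\mu_t) = 0$, we have
$$E\left(\frac{\partial^2 l(\beta,\phi)}{\partial\beta_i\,\partial\beta_j}\right) = \sum_{t=1}^{n}E\left(\frac{\partial^2 l_t(\mu_t,\phi)}{\partial\mu_t^2}\right)\left(\frac{d\mu_t}{d\eta_t}\right)^2 x_{ti}x_{tj}.$$
Now, from equation (A2) we have
$$\frac{\partial^2 l_t(\mu_t,\phi)}{\partial\mu_t^2} = -\phi^2\{\psi'(\mu_t\phi) + \psi'((1-\mu_t)\phi)\},$$
and hence
$$E\left(\frac{\partial^2 l(\beta,\phi)}{\partial\beta_i\,\partial\beta_j}\right) = -\phi\sum_{t=1}^{n} w_t\, x_{ti}x_{tj}.$$
In matrix form, we have that
$$E\left(\frac{\partial^2 l(\beta,\phi)}{\partial\beta\,\partial\beta^\top}\right) = -\phi X^\top W X.$$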
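From equation (A3), the second derivative of l(β, φ) with respect to $\beta_i$ and φ
can be written as
$$\frac{\partial^2 l(\beta,\phi)}{\partial\beta_i\,\partial\phi} = \sum_{t=1}^{n}\left[(y_t^* - \mu_t^*) - \phi\frac{\partial\mu_t^*}{\partial\phi}\right]\frac{1}{g'(\mu_t)}x_{ti}.$$
Since $E(y_t^*) = \mu_t^*$ and $\partial\mu_t^*/\partial\phi = \psi'(\mu_t\phi)\mu_t - \psi'((1-\mu_t)\phi)(1-\mu_t)$, we arrive at
$$E\left(\frac{\partial^2 l(\beta,\phi)}{\partial\beta_i\,\partial\phi}\right) = -\sum_{t=1}^{n} c_t\frac{1}{g'(\mu_t)}x_{ti}.$$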
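In matrix form,
$$E\left(\frac{\partial^2 l(\beta,\phi)}{\partial\beta\,\partial\phi}\right) = -X^\top T c,$$
and, similarly,
$$E\left(\frac{\partial^2 l(\beta,\phi)}{\partial\phi^2}\right) = -\operatorname{tr}(D).$$
It is now easy to obtain the Fisher information matrix for (β, φ) given in
equation (10).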
Appendix B
In this Appendix, we show how to perform large sample inference in the
beta regression model we propose. Consider, for instance, the test of the null
hypothesis $H_0: \boldsymbol\beta_1 = \boldsymbol\beta_1^{(0)}$ versus $H_1: \boldsymbol\beta_1 \neq \boldsymbol\beta_1^{(0)}$, where $\boldsymbol\beta_1 = (\beta_1, \ldots, \beta_m)^\top$ and
$\boldsymbol\beta_1^{(0)} = (\beta_1^{(0)}, \ldots, \beta_m^{(0)})^\top$, for $m < k$, and $\boldsymbol\beta_1^{(0)}$ given. The log-likelihood ratio statistic is
$$\omega_1 = 2\{l(\hat\beta,\hat\phi) - l(\tilde\beta,\tilde\phi)\},$$
where l(β, φ) is the log-likelihood function and $(\tilde\beta^\top, \tilde\phi)^\top$ is the restricted maximum
likelihood estimator of $(\beta^\top, \phi)^\top$ obtained by imposing the null hypothesis. Under
the usual regularity conditions and under $H_0$, $\omega_1 \xrightarrow{D} \chi^2_m$, so that a test can be
performed using approximate critical values from the asymptotic $\chi^2_m$ distribution.
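Once the unrestricted and restricted maximized log-likelihoods are available, the likelihood ratio test reduces to a single computation. The sketch below uses `scipy.stats.chi2.sf` for the asymptotic $\chi^2_m$ p-value; the two log-likelihood values are hypothetical, not results from this paper.

```python
from scipy.stats import chi2


def lr_test(loglik_unrestricted, loglik_restricted, m):
    """omega_1 = 2{l(beta_hat, phi_hat) - l(beta_tilde, phi_tilde)} and its chi^2_m p-value."""
    omega1 = 2.0 * (loglik_unrestricted - loglik_restricted)
    return omega1, chi2.sf(omega1, df=m)


# Hypothetical maximized log-likelihoods, for illustration only
omega1, p_value = lr_test(-10.0, -12.0, m=1)
print(omega1, round(p_value, 4))
```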
In order to describe the score test, let $U_{1\beta}$ denote the m-vector containing the
first m elements of the score function for β and let $K_{11}^{\beta\beta}$ be the $m \times m$ matrix
formed out of the first m rows and the first m columns of $K^{-1}$. It can be shown,
using equation (8), that $U_{1\beta} = \phi X_1^\top T(y^* - \mu^*)$, where X is partitioned as $[X_1\ X_2]$
following the partition of β. Rao’s score statistic can be written as
$$\omega_2 = \tilde U_{1\beta}^\top \tilde K_{11}^{\beta\beta}\tilde U_{1\beta},$$
where tildes indicate that the quantities are evaluated at the restricted maximum
likelihood estimator. Under the usual regularity conditions and under
$H_0$, $\omega_2 \xrightarrow{D} \chi^2_m$.
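Asymptotic inference can also be performed using Wald’s test. The test statistic
for the test of $H_0: \boldsymbol\beta_1 = \boldsymbol\beta_1^{(0)}$ is
$$\omega_3 = (\hat{\boldsymbol\beta}_1 - \boldsymbol\beta_1^{(0)})^\top(\hat K_{11}^{\beta\beta})^{-1}(\hat{\boldsymbol\beta}_1 - \boldsymbol\beta_1^{(0)}),$$
where $\hat K_{11}^{\beta\beta}$ equals $K_{11}^{\beta\beta}$ evaluated at the unrestricted maximum likelihood
estimator, and $\hat{\boldsymbol\beta}_1$ is the maximum likelihood estimator of $\boldsymbol\beta_1$. Under mild
regularity conditions and under $H_0$, $\omega_3 \xrightarrow{D} \chi^2_m$. In particular, for testing the
significance of the ith regression parameter ($\beta_i$), $i = 1, \ldots, k$, one can use the
signed square root of Wald’s statistic, i.e. $\hat\beta_i/\operatorname{se}(\hat\beta_i)$, where $\operatorname{se}(\hat\beta_i)$ is the asymptotic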
where $\hat\eta = x^\top\hat\beta$ and $\operatorname{se}(\hat\eta) = \sqrt{x^\top\widehat{\operatorname{cov}}(\hat\beta)\,x}$; here, $\widehat{\operatorname{cov}}(\hat\beta)$ is obtained from the inverse
of Fisher’s information matrix evaluated at the maximum likelihood estimates
by excluding the row and column of this matrix corresponding to the precision
parameter. The above interval is valid for strictly increasing link functions.
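A sketch of the Wald statistics and of an interval for the mean response. It assumes the interval has the usual form $[g^{-1}(\hat\eta - z_{1-\alpha/2}\operatorname{se}(\hat\eta)),\ g^{-1}(\hat\eta + z_{1-\alpha/2}\operatorname{se}(\hat\eta))]$ with a logit link, which is consistent with the remark that it requires a strictly increasing link. The estimates and covariance matrix (standing in for the β block of $K^{-1}$) are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def inv_logit(eta):
    return 1 / (1 + np.exp(-eta))


def wald_statistic(b1_hat, b1_null, cov_b1):
    """omega_3 = (b1_hat - b1_null)^T (K_11^{beta beta})^{-1} (b1_hat - b1_null)."""
    d = np.asarray(b1_hat, dtype=float) - np.asarray(b1_null, dtype=float)
    return d @ np.linalg.solve(cov_b1, d)


def wald_z(beta_hat, cov_beta):
    """Signed square roots beta_i_hat / se(beta_i_hat) and two-sided normal p-values."""
    z = beta_hat / np.sqrt(np.diag(cov_beta))
    return z, 2 * norm.sf(np.abs(z))


def mean_ci(x, beta_hat, cov_beta, alpha=0.05, inv_link=inv_logit):
    """Assumed interval [g^{-1}(eta_hat - z se), g^{-1}(eta_hat + z se)] for an increasing link."""
    eta_hat = x @ beta_hat
    se = np.sqrt(x @ cov_beta @ x)
    z = norm.ppf(1 - alpha / 2)
    return inv_link(eta_hat - z * se), inv_link(eta_hat + z * se)


# Hypothetical estimates and covariance, for illustration only
beta_hat = np.array([0.4, -1.1])
cov_beta = np.array([[0.04, -0.01], [-0.01, 0.09]])
x0 = np.array([1.0, 0.5])
z, p = wald_z(beta_hat, cov_beta)
lo, hi = mean_ci(x0, beta_hat, cov_beta)
print(z, p, (lo, hi))
```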
Appendix C
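Here we shall obtain approximations for $w_t$ and $\mu_t^*$, $t = 1, \ldots, n$, when $\mu_t\phi$ and
$(1-\mu_t)\phi$ are large. At the outset, note that (Abramowitz & Stegun, 1965, p. 259),
as $z \to \infty$,
$$\psi(z) = \log(z) - \frac{1}{2z} - \frac{1}{12z^2} + \frac{1}{120z^4} + \cdots \tag{C1}$$
$$\psi'(z) = \frac{1}{z} + \frac{1}{2z^2} + \frac{1}{6z^3} - \frac{1}{30z^5} + \cdots \tag{C2}$$
In what follows, we shall drop the subscript t (that indexes observations). When
μφ and (1 − μ)φ are large, it follows from equation (C2) that
$$w \approx \phi\left\{\frac{1}{\mu\phi} + \frac{1}{(1-\mu)\phi}\right\}\frac{1}{g'(\mu)^2} = \frac{1}{\mu(1-\mu)}\,\frac{1}{g'(\mu)^2},$$
and, from equation (C1),
$$\mu^* \approx \log(\mu\phi) - \log((1-\mu)\phi) = \log\frac{\mu}{1-\mu}.$$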
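The quality of these approximations can be checked numerically. The sketch below compares the truncated series (C1) and (C2) with SciPy’s `digamma` and `polygamma(1, ·)`, and shows that the relative error in w (with the common factor $1/g'(\mu)^2$ omitted, since it cancels) and the absolute error in $\mu^*$ shrink roughly like $1/\phi$. The value μ = 0.3 is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.special import digamma, polygamma


def psi_series(z):
    """Truncated expansion (C1) of the digamma function."""
    return np.log(z) - 1 / (2 * z) - 1 / (12 * z**2) + 1 / (120 * z**4)


def trigamma_series(z):
    """Truncated expansion (C2) of the trigamma function."""
    return 1 / z + 1 / (2 * z**2) + 1 / (6 * z**3) - 1 / (30 * z**5)


def approx_errors(mu, phi):
    """Relative error of w ~ 1/{mu(1-mu)} and absolute error of mu* ~ log{mu/(1-mu)}."""
    exact_w = phi * (polygamma(1, mu * phi) + polygamma(1, (1 - mu) * phi))
    exact_mu_star = digamma(mu * phi) - digamma((1 - mu) * phi)
    rel_w = exact_w * mu * (1 - mu) - 1
    abs_mu_star = abs(exact_mu_star - np.log(mu / (1 - mu)))
    return rel_w, abs_mu_star


print(abs(psi_series(10.0) - digamma(10.0)), abs(trigamma_series(10.0) - polygamma(1, 10.0)))
for phi in (10.0, 100.0, 1000.0):
    print(phi, approx_errors(0.3, phi))
```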