Fits the models of Williams (1982) to overdispersed proportions (M.S. Ridout & P.W. Goedhart).
Options
PRINT = string tokens | What to print if iterative estimation process converges successfully and whether to monitor the iterations (model, summary, accumulated, estimates, correlations, fittedvalues, monitoring); default * |
---|---|
CONSTANT = string token | How to treat constant (estimate, omit); default esti |
FACTORIAL = scalar | Limit for expansion of model terms; default 3 |
NOMESSAGE = string tokens | Which warning messages to suppress (dispersion, leverage, residual, aliasing, marginality); default * |
METHOD = string token | Which model to fit to take account of the extra variation (II, III); default II |
MODIFYMODEL = string token | Whether to leave the modified MODEL settings (WEIGHTS and DISPERSION) or whether to restore the original situation (yes, no); default no |
WEIGHTS = variate | To save estimated weights |
PHI = scalar | To save estimated overdispersion parameter |
MAXCYCLE = scalar | Maximum number of iterations; default 10 |
TOLERANCE = scalar | Convergence criterion; default 0.01 |
Parameter
TERMS = formula | Model terms to be fitted; if unset it is assumed that the model consists only of a constant term |
---|---|
Description
In binomial regression models, residual variability is often larger than would be expected if the data were indeed binomially distributed. This may be due to a few outliers or a poor choice of link function but often it simply indicates that the data are from a distribution more variable than the binomial. Such data are said to be “overdispersed” or to exhibit “extra-binomial variation”.
Williams (1982) discusses two possible models to extend the usual binomial model (Model I). Model II assumes that the true variance exceeds the binomial variance by a factor
V = 1 + (NBINOMIAL − 1) × φ    (0 ≤ φ ≤ 1)
If the overdispersion parameter φ were known, the data could be analysed using a binomial model with prior weights 1/V. Procedure EXTRABINOMIAL estimates φ so that the residual chi-square statistic from this weighted analysis is (approximately) equal to the residual degrees of freedom (Moore 1987). If the binomial totals are all equal, Model II is equivalent to setting the DISPERSION option of MODEL equal to the residual chi-square statistic divided by its degrees of freedom.
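The equal-totals case can be written out explicitly. The short derivation below is a sketch. It assumes every unit has the same binomial total n, and writes X² for the unweighted residual Pearson chi-square statistic and ν for the residual degrees of freedom. The symbols n, X² and ν are notation introduced here for illustration, not Genstat identifiers. Because the prior weights 1/V are then the same for every unit, they leave the fitted values unchanged and simply divide the chi-square statistic by V.

```latex
\begin{align*}
V &= 1 + (n - 1)\,\phi
   && \text{(identical for every unit)} \\
\frac{X^2}{V} &= \nu
   && \text{(weighted chi-square set equal to residual d.f.)} \\
V &= \frac{X^2}{\nu}
   && \text{(the usual dispersion estimate)} \\
\hat{\phi} &= \frac{X^2/\nu - 1}{n - 1}
\end{align*}
```

Under these assumptions no iteration is needed, which matches the non-iterative calculation described in the Method section.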
Alternatively, Model III assumes that the linear predictor varies about its expectation with a constant variance. Usually this variation is assumed to follow a normal distribution; if there is then a logit link, the error distribution will be a logistic normal. Extensions to Model III to have several normal distributions contributing to the variation on the linear predictor, similar to those that occur in stratified analysis of variance, form the basis of many methods suggested for analysing generalized linear mixed models. For Model III, there is generally no simple expression for the exact variance. But the delta method can be used to show that, approximately, the variance exceeds the binomial variance by a factor
V = 1 + (NBINOMIAL − 1) × φ × F² / (P × (1 − P))
where φ is the variance on the scale of the linear predictor, P is the fitted probability and F is the derivative of the inverse of the link function, evaluated at the fitted value of the linear predictor.
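As an illustration, the factor simplifies considerably for the logit link. The following is a worked special case, not a statement about how the procedure computes V internally. Here η denotes the linear predictor and n the binomial total of a unit; both symbols are introduced only for this example.

```latex
\begin{align*}
P &= \frac{1}{1 + e^{-\eta}}
   && \text{(inverse of the logit link)} \\
F &= \frac{dP}{d\eta} = P\,(1 - P) \\
V &\approx 1 + (n - 1)\,\phi\,\frac{F^2}{P\,(1 - P)}
   = 1 + (n - 1)\,\phi\,P\,(1 - P)
\end{align*}
```

So under Model III with a logit link the approximate inflation is largest for fitted probabilities near 0.5 and shrinks towards 1 as P approaches 0 or 1, whereas under Model II it depends only on the binomial total.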
Before using EXTRABINOMIAL a MODEL statement must be given, in the usual way, to define the y-variate, the binomial totals, the link and any offset. The error distribution must also of course be set to binomial but any settings of WEIGHTS or DISPERSION are ignored.
The form of EXTRABINOMIAL is similar in many ways to the FIT directive. There is a single parameter TERMS to define the model terms to be fitted, and the first four options, PRINT, CONSTANT, FACTORIAL, and NOMESSAGE, all have the same syntax and purpose as in FIT. The remaining options are specific to EXTRABINOMIAL.
The METHOD option selects which model to use (II or III); by default METHOD=II. Both models involve the estimation of the weight variate (1/V) required to fit the model using the standard Genstat facilities for generalized linear models. If option MODIFYMODEL=yes, EXTRABINOMIAL will leave the MODEL statement in its modified form (provided the iterative estimation of φ converges), with the WEIGHTS option set to these weights and the DISPERSION option set to 1, so that directives like DROP can be used to study the effects of individual terms in the model in the usual way. The TERMS directive will also be left set to the model specified by the TERMS parameter of EXTRABINOMIAL, and this model will be the one most recently fitted, so further output can be obtained using RDISPLAY.
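A minimal sketch of this workflow is given below. It assumes that the structures NGerm, NSeeds, Seed and RtExtrct have already been declared as in the Example section of this page; the particular choice of DROP and RDISPLAY settings is illustrative only.

```genstat
" Sketch only: NGerm, NSeeds, Seed and RtExtrct are assumed to be
  declared as in the Example section of this page "
MODEL [DISTRIBUTION=binomial; LINK=logit] NGerm; NBINOMIAL=NSeeds
" Fit Model II and keep the modified MODEL settings
  (estimated weights, dispersion fixed at 1) "
EXTRABINOMIAL [PRINT=estimates; METHOD=II; MODIFYMODEL=yes] Seed*RtExtrct
" Study the interaction by dropping it from the weighted fit "
DROP [PRINT=accumulated] Seed.RtExtrct
" Further output from the most recently fitted model "
RDISPLAY [PRINT=estimates]
```

With the default MODIFYMODEL=no the original MODEL settings are restored, so a subsequent DROP would not use the estimated weights.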
Options WEIGHTS and PHI allow the weights and the estimated value of φ, respectively, to be saved. The MAXCYCLE option specifies the maximum number of iterations in the estimation, and the TOLERANCE option defines the convergence criterion:
ABS(Chi-square − Residual d.f.) < TOLERANCE × Residual d.f.
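The saving and convergence options might be combined as in the sketch below. It again assumes the structures from the Example section; Wt is a hypothetical identifier chosen here for the saved weights, and the MAXCYCLE and TOLERANCE values are arbitrary illustrations rather than recommendations.

```genstat
" Sketch only: data structures as in the Example section;
  Wt is a hypothetical name for the saved weight variate "
MODEL [DISTRIBUTION=binomial; LINK=logit] NGerm; NBINOMIAL=NSeeds
" Fit Model III with a tighter convergence criterion and more
  iterations, saving the weights 1/V and the estimate of phi "
EXTRABINOMIAL [PRINT=summary,estimates; METHOD=III; WEIGHTS=Wt; PHI=Phi;\
   MAXCYCLE=20; TOLERANCE=0.001] Seed*RtExtrct
PRINT Phi
PRINT Wt
```

With TOLERANCE=0.001 the iterations stop once the residual chi-square statistic is within 0.1% of the residual degrees of freedom.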
Options: PRINT, CONSTANT, FACTORIAL, NOMESSAGE, METHOD, MODIFYMODEL, WEIGHTS, PHI, MAXCYCLE, TOLERANCE.
Parameter: TERMS.
Method
If the binomial totals are all equal, φ is determined (non-iteratively) from the residual chi-square statistic.
Otherwise, φ must be found iteratively and the method used (Williams, 1982) involves nested iterations. Each outer iteration (involving a model fit) requires an inner iteration (which uses only CALCULATE statements) to get the updated estimate of φ. The option MAXCYCLE controls the maximum number of outer iterations. The maximum number of inner iterations is fixed at 10.
Very precise convergence is not important in practice; the default setting of the TOLERANCE option (1%) should give a perfectly adequate estimate of φ, usually within 3 iterations.
Action with RESTRICT
Any of the following structures may be restricted: the Y variate; the NBINOMIAL variate; the WEIGHTS variate; the OFFSET variate; any variate or factor appearing in the model formula. Restrictions on different structures must be compatible. Restricted units are excluded from the analysis.
References
Moore, D.F. (1987). Modelling the extraneous variance in the presence of extra-binomial variation. Applied Statistics, 36, 8-14.
Williams, D.A. (1982). Extra-binomial variation in logistic linear models. Applied Statistics, 31, 144-148.
See also
Procedures: GLMM, HGANALYSE, RNEGBINOMIAL, R0INFLATED.
Commands for: Regression analysis.
Example
CAPTION  'EXTRABIN example',\
         !t('A 2 x 2 factorial experiment comparing germination',\
         'of two types of seed and two root extracts (Crowder, M.J.,',\
         '1978, Appl. Statist., 27, 34-37).'); STYLE=meta,plain
FACTOR   [LABELS=!T(O_75,O_73); VALUES=1,10(1,2)] Seed
FACTOR   [LABELS=!T(Bean,Cucumber); VALUES=5(1,2),2,5(1,2)] RtExtrct
VARIATE  NGerm,NSeeds ;\
         VALUES=!(10,23,23,26,17,5,53,55,32,46,10,8,10,8,23,0,3,22,15,32,3),\
                !(39,62,81,51,39,6,74,72,51,79,13,16,30,28,45,4,12,41,30,51,7)
MODEL    [DISTRIBUTION=binomial; LINK=logit] NGerm; NBINOMIAL=NSeeds
EXTRABIN [PRINT=estimates; PHI=Phi] Seed*RtExtrct
PRINT    Phi