Analyses data using a hierarchical or double hierarchical generalized linear model (R.W. Payne, Y. Lee, J.A. Nelder & M. Noh).
Options
| Option | Description |
|---|---|
| PRINT = string tokens | Controls printed output (model, fixedestimates, randomestimates, dispersionestimates, likelihoodstatistics, deviance, waldtests, fittedvalues, monitoring, dhgmonitoring); default mode, fixe, disp, devi, like, moni |
| LMETHOD = string token | Whether to use exact likelihood or extended quasi-likelihood to obtain the y-variate and weights for the dispersion model (exact, eql); default exac |
| SEMETHOD = string token | Method to use to calculate the standard errors of the dispersion estimates (approximate, profilelikelihood); default appr |
| DMETHOD = string token | Method to use for the adjusted profile likelihood when calculating the likelihood statistics (automatic, choleski, lrv); default auto |
| EMETHOD = string token | Extrapolation method to use (aitken, adjustedaitken); default aitk |
| MLAPLACEORDER = scalar | Order of Laplace approximation to use in the estimation of the mean model (0 or 1); default 0 |
| DLAPLACEORDER = scalar | Order of Laplace approximation to use in the estimation of the dispersion components (0, 1 or 2); default 0 |
| MAXCYCLE = scalars | Maximum number of iterations of the hierarchical generalized linear model fits, and maximum number of iterations in the fitting of the mean and dispersion models; default 99, 50 |
| EXIT = scalar | Exit status (0 for success, 1 for failure to converge) |
| TOLERANCE = scalar | Criterion for convergence; default 0.0005 |
| ETOLERANCE = scalar | Maximum size of ratio of the original to the new estimates allowed in Aitken extrapolation; default 7.5 |
| GROUPTERM = formula | Random term to use as groups when fitting the augmented mean model; default * (i.e. none) |
Parameters
| Parameter | Description |
|---|---|
| Y = variate | Response variate (only one may be given) |
| NBINOMIAL = variate or scalar | Total numbers for binomial data |
| RESIDUALS = variate | Saves the residuals |
| FITTEDVALUES = variate | Saves the fitted values |
| SAVE = pointer | Saves details of the analysis for use in subsequent HGDISPLAY, HGKEEP, HGPLOT or HGPREDICT statements |
Description
HGANALYSE is one of several procedures with the prefix HG, which provide tools for fitting the hierarchical and double hierarchical generalized linear models (HGLMs and DHGLMs) defined by Lee & Nelder (1996, 2001, 2006) and described by Lee, Nelder & Pawitan (2006). These models extend generalized linear models (GLMs) to include additional random terms in the linear predictor. They include generalized linear mixed models (GLMMs) as a special case but, unlike GLMMs, they do not require the additional terms to follow a Normal distribution with an identity link. For example, if the basic generalized linear model is a log-linear model (Poisson distribution and log link), a more appropriate assumption for the additional random terms might be a gamma distribution and a log link.
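As a sketch of that example in the notation of Lee & Nelder (1996), assuming a single random term indexed by i, the Poisson HGLM with log link and gamma random effects can be written as below. The symbols x, β, v, u and λ are introduced here for illustration only.

```latex
\begin{aligned}
y_{ij} \mid u_i &\sim \mathrm{Poisson}(\mu_{ij}), \\
\log \mu_{ij} &= x_{ij}^{\top}\beta + v_i, \qquad v_i = \log u_i, \\
u_i &\sim \mathrm{Gamma}, \qquad \mathrm{E}(u_i) = 1, \quad \mathrm{Var}(u_i) = \lambda .
\end{aligned}
```

Replacing the gamma assumption by v_i ~ N(0, λ) with v_i = u_i (Normal distribution, identity link) gives the corresponding Poisson GLMM.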
The analysis involves fitting an augmented generalized linear model to describe the mean of the distribution. This has units corresponding to the original data units, together with additional units for the effects of the random terms; see Lee & Nelder (1996). Then there are further GLMs to describe the dispersion for each random term (including the residual dispersion, phi); see Lee & Nelder (2001). In a DHGLM, some of these dispersion GLMs are themselves extended to become HGLMs by the inclusion of random terms; see Lee & Nelder (2006).
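The augmented mean model can be illustrated by a minimal Python sketch (not Genstat's implementation) that builds the augmented response and design matrix for the Poisson-gamma case. It assumes, as in the Poisson-gamma case of Lee & Nelder (1996), that each random effect contributes one extra unit whose quasi-response is ψ = E(u) = 1; the function `augment` and its arguments are hypothetical.

```python
def augment(y, X, Z, psi=1.0):
    """Build the augmented response and design matrix of an HGLM (sketch).

    y: list of n responses; X: n rows of p fixed-effect covariates;
    Z: n rows of q random-effect indicators. psi is the quasi-response for
    each random effect (assumed 1.0, i.e. E(u), for gamma u with v = log u).
    """
    n, p, q = len(y), len(X[0]), len(Z[0])
    y_aug = list(y) + [psi] * q
    # Original data units: the usual linear predictor X*beta + Z*v.
    design = [list(X[i]) + list(Z[i]) for i in range(n)]
    # One extra unit per random effect, whose linear predictor is just v_j.
    for j in range(q):
        design.append([0.0] * p + [1.0 if k == j else 0.0 for k in range(q)])
    # True for original data units, False for the added random-effect units.
    is_data = [True] * n + [False] * q
    return y_aug, design, is_data


# Six counts in three groups of two, with an intercept-only fixed model.
y = [1, 2, 0, 3, 4, 1]
X = [[1.0]] * 6
Z = [[1.0, 0.0, 0.0]] * 2 + [[0.0, 1.0, 0.0]] * 2 + [[0.0, 0.0, 1.0]] * 2
y_aug, design, is_data = augment(y, X, Z)
```

Under the same assumption, a single GLM fit can then estimate β and v together, treating the data units as Poisson with log link and the added units as gamma with log link and dispersion λ.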
Before calling HGANALYSE, the fixed and random terms in the HGLM must be defined by the HGFIXEDMODEL and HGRANDOMMODEL procedures, respectively. The HGDRANDOMMODEL procedure can then add random terms to a dispersion GLM, so that the model becomes a DHGLM.
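In the same illustrative notation, and assuming a log link for the residual dispersion φ with hypothetical design matrices G and W, the pieces defined by these procedures fit together roughly as follows:

```latex
\begin{aligned}
g(\mu) &= X\beta + Zv, \\
\log \phi &= G\gamma + Wb .
\end{aligned}
```

Here Xβ holds the fixed terms defined by HGFIXEDMODEL, Zv the random terms defined by HGRANDOMMODEL, and Wb the random terms that HGDRANDOMMODEL adds to the dispersion model; without Wb the model is an ordinary HGLM.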
The variate to be analysed must be supplied by the Y parameter and, if the y-values are binomial responses, the NBINOMIAL parameter should supply the corresponding total numbers. Residuals and fitted values can be saved using the RESIDUALS and FITTEDVALUES parameters, respectively. Note that only one y-variate can be analysed at once, so any additional variates are ignored (as occurs with the MODEL directive when generalized linear models are defined).
The SAVE parameter allows you to save a pointer containing full details of the analysis. This can then be used to generate further output from HGDISPLAY, HGKEEP, HGPLOT or HGPREDICT. The most recent save structure is kept automatically inside Genstat to use as a default for the SAVE options of HGDISPLAY, HGKEEP, HGPLOT and HGPREDICT. So you need to save the pointer explicitly only if you want to display output from more than one analysis at a time.
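The default-SAVE behaviour follows a common "most recent result" pattern. The minimal Python sketch below mimics it with the hypothetical functions `hg_analyse` and `hg_display`; they are illustrations only, not a Genstat interface, and the "analysis" is just a mean used as a placeholder.

```python
_last_save = None  # most recent analysis, kept automatically (hypothetical)


def hg_analyse(y):
    """Hypothetical stand-in for an analysis; returns a 'save structure'."""
    global _last_save
    save = {"y": list(y), "mean": sum(y) / len(y)}
    _last_save = save  # remember it as the default for later calls
    return save


def hg_display(save=None):
    """Use the save structure given explicitly, else the most recent one."""
    if save is None:
        save = _last_save
    if save is None:
        raise RuntimeError("no analysis has been run yet")
    return save["mean"]


first = hg_analyse([1.0, 2.0, 3.0])  # kept explicitly
hg_analyse([10.0, 20.0])             # becomes the default
```

Here `hg_display()` reports the second analysis, while `hg_display(first)` still reaches the first one, because its save structure was kept explicitly.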
The PRINT, SEMETHOD and DMETHOD options control printed output, almost exactly as in the HGDISPLAY procedure (which is called by HGANALYSE to produce the output). The only difference is that PRINT has additional settings: monitoring provides information about the fitting process of an ordinary HGLM, and dhgmonitoring provides information about the fitting of the HGLM for the dispersion model in a DHGLM.
The other options control various aspects of the fitting process. This alternates between fitting the augmented GLM for the mean, given the current estimates of the dispersion parameters, and fitting the models that estimate the dispersion parameters. Convergence is assessed by comparing the dispersion estimates from successive fits. The MAXCYCLE option can specify two scalars: the first sets a limit on the number of alternating fits (default 99), and the second limits the number of iterations in the estimation of the mean model and of the dispersion model (default 50). The TOLERANCE option defines the criterion for convergence of the alternating fits (default 0.0005). The EMETHOD option determines whether Aitken (default) or adjusted Aitken extrapolation is used in the estimation of the dispersion parameters; alternatively, you can set EMETHOD=* to use neither. The ETOLERANCE option sets an upper limit on the ratio of the changed value to the original value in the extrapolations (default 7.5). The GROUPTERM option allows you to specify a random term whose factor combinations should be used as a groups factor during the fitting of the augmented mean model (see the GROUPS option of the MODEL directive). This allows models with large numbers of random effects to be fitted much more efficiently. However, algorithmic complications mean that HGPREDICT can then form predictions only from the BLUP of a specific random effect of that term; it cannot form predictions at the expected value of the term. The EXIT option can be set to a scalar, which is set to 0 if the fitting succeeds or to 1 if it fails to converge.
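The alternating scheme, the convergence test, Aitken extrapolation with an ETOLERANCE-style safeguard, and the exit status can be sketched as follows for a single dispersion parameter. This is a minimal illustration, not the HGANALYSE algorithm: the relative-change convergence test, the reading of the ETOLERANCE ratio (an extrapolation is rejected if it changes the estimate by more than that factor in either direction), and the names `aitken`, `alternate` and `update` are all assumptions. The adjusted Aitken method is not shown.

```python
def aitken(x0, x1, x2, etolerance=7.5):
    """Aitken delta-squared extrapolation from three successive positive estimates.

    Falls back to x2 when the extrapolation is undefined, non-positive, or
    changes x2 by more than a factor of etolerance (assumed reading of ETOLERANCE).
    """
    denom = (x2 - x1) - (x1 - x0)
    if denom == 0.0:
        return x2
    x_new = x2 - (x2 - x1) ** 2 / denom
    if x_new <= 0.0:  # dispersion parameters must stay positive
        return x2
    if max(x_new / x2, x2 / x_new) > etolerance:
        return x2
    return x_new


def alternate(update, start, maxcycle=99, tolerance=0.0005,
              extrapolate=True, etolerance=7.5):
    """Iterate a dispersion update until its relative change is below tolerance.

    update(x) stands for one alternating cycle: refit the mean model given the
    dispersion estimate x, then refit the dispersion model. With extrapolation,
    each cycle calls update twice and extrapolates from the three values.
    Returns (estimate, exit), with exit 0 on convergence and 1 on failure.
    """
    x = start
    for _ in range(maxcycle):
        x1 = update(x)
        new = aitken(x, x1, update(x1), etolerance) if extrapolate else x1
        if abs(new - x) <= tolerance * abs(x):
            return new, 0
        x = new
    return x, 1


# A toy linear update with fixed point 2.0 stands in for a real refit.
step = lambda lam: 0.5 * lam + 1.0
estimate, status = alternate(step, 1.0)
```

For a linearly converging sequence such as this toy update, a single Aitken step lands on the fixed point, whereas the unextrapolated iteration needs about ten cycles to meet the same tolerance.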
By default, HGANALYSE uses exact likelihood to obtain the y-variate and weights for the dispersion model. This produces estimates with less bias than the previous method, extended quasi-likelihood (EQL). However, the LMETHOD option is provided to enable EQL estimates to be obtained if required. For some models, the DLAPLACEORDER option allows the order of Laplace approximation used in the estimation of the dispersion components to be increased from the standard (and default) value of 0 to either 1 or 2. This is appropriate for generalized linear mixed models with the binomial or Poisson distributions, where Laplace order 0 can lead to serious downward bias. The MLAPLACEORDER option similarly allows you to set the order of Laplace approximation used in the estimation of the mean model to 1 instead of 0.
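To see where the two likelihoods differ, the minimal sketch below compares, for one Poisson observation with dispersion 1, the exact log-likelihood with the standard extended quasi-likelihood form -½ log(2πφV(y)) - d(y, μ)/(2φ), where d is the unit deviance. It does not reproduce how HGANALYSE forms the dispersion y-variate and weights, and the function names are hypothetical. For the Poisson case, EQL amounts to replacing log y! in the exact likelihood by Stirling's approximation, so the two differ by roughly 1/(12y).

```python
import math


def poisson_loglik(y, mu):
    """Exact Poisson log-likelihood for one count y >= 0 with mean mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def poisson_eql(y, mu, phi=1.0):
    """Extended quasi-likelihood with Poisson variance function V(y) = y (needs y > 0)."""
    # Unit deviance for the Poisson distribution.
    d = 2.0 * (y * math.log(y / mu) - (y - mu))
    return -0.5 * math.log(2.0 * math.pi * phi * y) - d / (2.0 * phi)


for y in (1, 5, 20):
    gap = poisson_eql(y, y) - poisson_loglik(y, y)
    print(f"y={y:2d}  EQL - exact = {gap:.4f}")
```

The gaps are roughly 0.081, 0.017 and 0.004, so the approximation matters most when counts are small.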
Options: PRINT, LMETHOD, SEMETHOD, DMETHOD, EMETHOD, MLAPLACEORDER, DLAPLACEORDER, MAXCYCLE, EXIT, TOLERANCE, ETOLERANCE, GROUPTERM.
Parameters: Y, NBINOMIAL, RESIDUALS, FITTEDVALUES, SAVE.
Method
The model is fitted using the method of Lee & Nelder (2006).
Action with RESTRICT
Restrictions are not allowed.
References
Lee, Y. & Nelder, J.A. (1996). Hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society, Series B, 58, 619-678.
Lee, Y. & Nelder, J.A. (2001). Hierarchical generalized linear models: a synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika, 88, 987-1006.
Lee, Y. & Nelder, J.A. (2006). Double hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society, Series C (Applied Statistics), 55, 139-185.
Lee, Y., Nelder, J.A. & Pawitan, Y. (2006). Generalized Linear Models with Random Effects: Unified Analysis via H-likelihood. Chapman and Hall, Boca Raton.
See also
Procedures: GEE, GLMM, HGDISPLAY, HGDRANDOMMODEL, HGFIXEDMODEL, HGFTEST, HGGRAPH, HGKEEP, HGNONLINEAR, HGPLOT, HGPREDICT, HGRANDOMMODEL, HGRTEST, HGSTATUS, HGWALD.
Commands for: Regression analysis.
Example
CAPTION  'HGANALYSE example',!t(\
         'Breaking angles of cake baked from 3 recipes at 6 temperatures',\
         '(Cochran & Cox, 1957, Experimental Designs, page 300).',\
         'Data values are assumed to follow a GLM with a gamma distribution',\
         'and reciprocal link. The linear predictor contains additional',\
         'random variables, with inverse gamma distributions and reciprocal',\
         'link, for replicates and batches of cake mixture.');\
         STYLE=meta,plain
FACTOR   [NVALUES=270; LEVELS=3] Recipe
&        [LEVELS=15] Replicate
&        [LEVELS=!(175,185...225)] Temperature
GENERATE Recipe,Replicate,Temperature
VARIATE  [NVALUES=270] Angle
READ     Angle
42 46 47 39 53 42 47 29 35 47 57 45 32 32 37 43 45 45
26 32 35 24 39 26 28 30 31 37 41 47 24 22 22 29 35 26
26 23 25 27 33 35 24 33 23 32 31 34 24 27 28 33 34 23
24 33 27 31 30 33 33 39 33 28 33 30 28 31 27 39 35 43
29 28 31 29 37 33 24 40 29 40 40 31 26 28 32 25 37 33
39 46 51 49 55 42 35 46 47 39 52 61 34 30 42 35 42 35
25 26 28 46 37 37 31 30 29 35 40 36 24 29 29 29 24 35
22 25 26 26 29 36 26 23 24 31 27 37 27 26 32 28 32 33
21 24 24 27 37 30 20 27 33 31 28 33 23 28 31 34 31 29
32 35 30 27 35 30 23 25 22 19 21 35 21 21 28 26 27 20
46 44 45 46 48 63 43 43 43 46 47 58 33 24 40 37 41 38
38 41 38 30 36 35 21 25 31 35 33 23 24 33 30 30 37 35
20 21 31 24 30 33 24 23 21 24 21 35 24 18 21 26 28 28
26 28 27 27 35 35 28 25 26 25 38 28 24 30 28 35 33 28
28 29 43 28 33 37 19 22 27 25 25 35 21 28 25 25 31 25 :
FACPRODUCT !p(Replicate,Recipe); Batch
HGFIXEDMODEL [DISTRIBUTION=gamma; LINK=reciprocal] Recipe*Temperature
HGRANDOMMODEL [DISTRIBUTION=inversegamma; LINK=reciprocal] Replicate+Batch
HGANALYSE [P=#,WALD] Angle