Fits a variance-components model by residual (or restricted) maximum likelihood.
Options
PRINT = string tokens |
What output to present (model , components , effects , means , stratumvariances , monitoring , vcovariance , deviance , Waldtests , missingvalues , covariancemodels ); default mode , comp , Wald , cova |
---|---|
PTERMS = formula |
Terms (fixed or random) for which effects or means are to be printed; default * implies all the fixed terms |
PSE = string token |
Standard errors to be printed with tables of effects and means (differences , estimates , alldifferences , allestimates , none ); default diff |
WEIGHTS = variate |
Weights for the analysis; default * implies all weights 1 |
MVINCLUDE = string tokens |
Whether to include units with missing values in the explanatory factors and variates and/or the y-variates (explanatory , yvariate ); default * i.e. omit units with missing values in either explanatory factors or variates or y-variates |
SUBMODEL = formula |
Defines a submodel of the fixed model to be assessed against the full model (for METHOD=Fisher only) |
RECYCLE = string token |
Whether to reuse the results from the estimation when printing or assessing a submodel (yes, no ); default no |
RMETHOD = string token |
Which random terms to use when calculating RESIDUALS (final , all , notspline ); default fina |
METHOD = string token |
Indicates whether to use the standard Fisher-scoring algorithm or the new AI algorithm with sparse matrix methods (Fisher , AI ); default AI |
MAXCYCLE = scalar |
Limit on the number of iterations; default 30 |
TOLERANCES = variate |
Tolerances for matrix inversion; default * i.e. appropriate default values |
PARAMETERIZATION = string token |
Parameterization to use for the variance component estimation (gammas , sigmas ); default * i.e. use whichever is most appropriate for model |
CFORMAT = string token |
Whether printed output for covariance models gives the variance matrices or the parameters (variancematrices , parameters ); default vari |
FMETHOD = string token |
Controls whether and how to calculate F-statistics for fixed terms (automatic , none , algebraic , numerical ); default auto |
WORKSPACE = scalar |
Number of blocks of internal memory to be set up for use by the algorithm |
Parameters
Y = variates |
Variates to be analysed |
---|---|
RESIDUALS = variates |
Residuals from each analysis |
FITTEDVALUES = variates |
Fitted values from each analysis |
EXIT = scalar |
Exit status of the fit (0 if successful) |
SAVE = REML save structures |
Saves the details of each analysis for use in subsequent VDISPLAY and VKEEP directives |
Description
REML
estimates the treatment effects and variance components in a linear mixed model: that is, a linear model with both fixed and random effects. The model to be fitted is specified using the VCOMPONENTS
directive, covariance models for the random effects can be defined using the VSTRUCTURE
directive, and the VRESIDUAL
directive can define covariance models for the residual term or specify the residual term for the individual experiments in a meta-analysis. Further output can be produced following REML
using VDISPLAY
, and output can be saved in Genstat data structures using VKEEP
. REML
can be used in situations where you would normally use ANOVA
but have unbalanced or correlated data, or where you would normally use linear regression, but have more than one source of variation or correlation in the data.
REML
can be used to analyse data from a wide variety of applications. It can obtain information on sources and sizes of variability in data sets. This can be of interest where the relative size of different sources of variability must be assessed, for example to identify the least reliable stages in an industrial process, or to design more effective experiments. REML
also provides efficient estimates of treatment effects in unbalanced designs with more than one source of error. It can be used to provide estimates of treatment effects that combine information from all the strata of a partially balanced design, or to combine information over similar experiments conducted at different times or in different places. You can thus obtain estimates that make use of the information from all the experiments, as well as the separate estimates from each individual experiment. Examples from several different areas of application can be found in Robinson (1987). The facilities for estimation of covariance models allow estimates of treatment effects and standard errors to be obtained using an appropriate variance model and taking account of the correlation structure of the data.
The method of residual maximum likelihood (REML) was introduced by Patterson & Thompson (1971). It was developed in order to avoid the biased variance component estimates that are produced by ordinary maximum likelihood estimation: because maximum likelihood estimates of variance components take no account of the degrees of freedom used in estimating treatment effects, they have a downwards bias which increases with the number of fixed effects in the model. This in turn leads to under-estimates of standard errors for fixed effects, which may lead to incorrect inferences being drawn from the data. Estimates of variance parameters which take account of the degrees of freedom used in estimating fixed effects, like those generated by ANOVA
in balanced data sets, are more desirable.
Once a mixed model has been specified (using VCOMPONENTS
) and any covariance structures have been defined (using VSTRUCTURE
and VRESIDUAL
) you can fit the model to the data (the y-variates) using the REML
directive.
The Y
parameter lists the variates that are to be modelled. For example, given appropriate factor definitions, the following command sets up a model and analyses the data held in variate Yield
:
VCOMPONENTS [FIXED=Nitrogen*Variety] RANDOM=Block/Wplot/Splot
REML Yield; FITTED=Fit; RESIDUALS=Res
The FITTEDVALUES
and RESIDUALS
parameters allow you to store the fitted values and residuals from the fitted model – above they are stored in variates Fit
and Res
. The EXIT
parameter saves the “exit status” of each analysis. This is set to zero if it was completed successfully; for details of the other codes, see VKEEP
. The SAVE
parameter can be used to name the REML
save structure for use with later VKEEP
and VDISPLAY
directives.
The three options PRINT
, PTERMS
and PSE
all control the printed output. The PRINT
option selects the output to be displayed:
The default setting of PRINT=model,components,Wald,cova
, gives a description of the model and covariance models that have been fitted, plus estimates of the variance components and the Wald tests. By default if tables of means and effects are requested, tables for all terms in the fixed model are given together with a summary of the standard error of differences between effects/means. Options PTERMS
and PSE
can be used to change the terms or obtain different types of standard error. For example,
VCOMPONENTS [FIXED=Nitrogen*Variety] RANDOM=Block/Wplot/Splot
REML [PRINT=means; PTERMS=Nitrogen.Variety;\
PSE=allestimates] Yield
means that a Nitrogen
by Variety
table of predicted means will be produced with a standard error for each cell.
The FMETHOD
option controls whether to accompany the Wald tests for fixed effects with approximate F statistics and corresponding numbers of residual degrees of freedom. The computations, using the method devised by Kenward & Roger (1997), can be time consuming with large or complicated models. So, with the default setting FMETHOD=automatic
, Genstat assesses the model itself and decides automatically whether to do the computations and which method to use. The other settings allow you to control what to do yourself:
none |
no F statistics are produced; |
---|---|
algebraic |
F statistics are calculated using algebraic derivatives (which may involve large matrix calculations); |
numerical |
F statistics are calculated using numerical derivatives (which require an extra evaluation of the mixed model equations for every variance parameter). |
The CFORMAT
option controls the type of output produced for the estimated covariance models. The default setting, variancematrices
, produces the variance-covariance matrices for the components, whereas the setting parameters
prints their parameters.
The MVINCLUDE
option allows the inclusion of units with missing values. By default, units where there is a missing value in the y-variate or in any of the factors or variates in the model terms are excluded. The setting explanatory
allows units with missing values in factors or variates in the model to be included. For missing covariate values, this is equivalent to substituting the mean value. The setting yvariate
includes units with missing values in the y-variate. This can be useful to retain the balanced structure of the data for use with direct product covariance matrices (see VSTRUCTURE
), or to produce predictions of data values for given values of explanatory factors and/or variates.
The WEIGHTS
option can be used to specify a weight for each unit in the analysis. This is useful when it is suspected that the size of the random error varies between units. For example, if the random error for unit i is known to have variance viσ2, a weight variate should be used containing values wi=1/vi.
The RMETHOD
option controls the way in which residuals and fitted values are formed. For the default setting RMETHOD=final
, the fitted values are calculated from all the fixed and random effects. The residuals are the difference between the data and the fitted values and, in this case, are estimates of the *units*
random error and can be used to check the Normality and variance homogeneity assumptions for the random error. To get fitted values constructed from the fixed terms alone, omitting all random terms, the setting RMETHOD=all
must be used. The setting RMETHOD=notspline
means that the residuals will be formed from all the random effects, excluding spline terms.
Option SUBMODEL
is used to specify a submodel of the fixed model (but only applies when METHOD=Fisher
). This model will be fitted as well as the full fixed model, using a slightly modified version of the algorithm, and the difference in deviances between the full and submodel can be used as a likelihood-based test to assess the importance of the fixed terms dropped from the full model, as described by Welham & Thompson (1997). Once the full model has been fitted, the RECYCLE
option can be used to test a series of submodels of the fixed model. If option RECYCLE=yes
is set, then only the estimation for the submodel is performed. Information for the full fixed model is picked up from the corresponding save structure. When the RECYCLE
option is set, only the deviance
and model
settings of PRINT
can be used.
The METHOD
option specifies whether to use the AI (Average Information) algorithm (Gilmour et al. 1995) with sparse matrix methods to maximize the residual likelihood, or Fisher scoring with full matrix manipulation. By default, the sparse Average Information algorithm is used. The AI algorithm generally runs faster per iteration than Fisher scoring and uses much less workspace, but it may require slightly more iterations to reach convergence. When sparse matrix methods are used, standard errors of differences will not be available for random effects, although standard errors are available. Note that when METHOD=AI
, the SUBMODEL
and RECYCLE
options do not apply. The WORKSPACE
option (default 1) specifies the number of blocks of internal memory to be allocated for use by the estimation algorithm when METHOD=AI
.
Option MAXCYCLE
can be used to change the maximum number of iterations performed by the algorithm from the default of 30.
The TOLERANCES
option gives tolerances for three matrix inversions. The first two values are matrix inversion tolerances for the information matrix and the mixed model equations respectively and take the value 10-5 by default. The third value is used to detect zero frequency counts for factor combinations in the mixed model equations: 10-6 is used by default.
Options: PRINT
, PTERMS
, PSE
, WEIGHTS
, MVINCLUDE
, SUBMODEL
, RECYCLE
, RMETHOD
, METHOD
, MAXCYCLE
, TOLERANCES
, PARAMETERIZATION
, CFORMAT
, WORKSPACE
.
Parameters: Y
, RESIDUALS
, FITTEDVALUES
, EXIT
, SAVE
.
Action with RESTRICT
Any of the y-variates or any of the factors or variates in the fixed and random models (defined by VCOMPONENTS
) may be restricted to indicate that only a subset of the units is to be used in the analysis. However, if more than one of these vectors is restricted, all must be restricted to the same set of units. Any restrictions on the variates supplied to save residuals or fitted values are ignored.
References
Gilmour, A.R., Thompson, R. & Cullis, B. (1995). AIREML, an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics, 51, 1440-1450.
Kenward, M.G. & Roger, J.H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.
Patterson, H.D. & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58, 545-554.
Robinson, D.L. (1987). Estimation and use of variance components. The Statistician, 36, 3-14.
Welham, S.J. & Thompson, R. (1997). Likelihood ratio tests for fixed model terms using residual maximum likelihood. Journal of the Royal Statistical Society, Series B, 59, 701-714.
See also
Directives: VCOMPONENTS
, VSTRUCTURE
, VRESIDUAL
, VDISPLAY
, VPREDICT
, VKEEP
, VPEDIGREE
, VCYCLE
, VSTATUS
.
Procedures: VAIC
, VALLSUBSETS
, VBOOTSTRAP
, VCHECK
, VCRITICAL
, VFRESIDUALS
, VGRAPH
, VPLOT
, VDEFFECTS
, VDFIELDRESIDUALS
, VFIXEDTESTS
, VFLC
, VFPEDIGREE
, VFUNCTION
, VHERITABILITY
, VLSD
, VMETA
, VMCOMPARISON
, VPOWER
, VRACCUMULATE
, VRCHECK
, VRFIT
, VRMETAMODEL
, VSAMPLESIZE
, VSCREEN
, VSOM
, VSPREADSHEET
, VSURFACE
, VTCOMPARISONS
, VABLOCKDESIGN
, VAMETA
, VAROWCOLUMNDESIGN
, VASERIES
, VALINEBYTESTER
, VLINEBYTESTER
, FCONTRASTS
, FDIALLEL
, AOVANYHOW
.
Commands for: REML analysis of linear mixed models, Analysis of variance.
Example
" Example REML-1: Incomplete block analysis (from Cochran & Cox, Table 10.3, p. 406) " FILEREAD [NAME='%gendir%/examples/REML-1.DAT']\ Reps,Blocks,Treats,Yield; FGROUPS=3(yes),no VCOMPONENTS [FIXED=Treats] RANDOM=Reps/Blocks CALCULATE Log = LOG(Yield) REML [PRINT=components,effects,means,stratum,vcov,monitor; PSE=diff;\ PTERMS=Reps/Blocks; METHOD=Fisher; MAXCYCLE=2] Yield,Log; SAVE=Syield,Slog VAIC " Plot residuals for each analysis" VPLOT [GRAPHICS=high; SAVE=Syield] fitted VPLOT [GRAPHICS=high; SAVE=Slog] fitted