REML directive

Fits a variance-components model by residual (or restricted) maximum likelihood.

Options

`PRINT` = string tokens	What output to present (`model`, `components`, `effects`, `means`, `stratumvariances`, `monitoring`, `vcovariance`, `deviance`, `Waldtests`, `missingvalues`, `covariancemodels`); default `mode`, `comp`, `Wald`, `cova`
`PTERMS` = formula	Terms (fixed or random) for which effects or means are to be printed; default `*` implies all the fixed terms
`PSE` = string token	Standard errors to be printed with tables of effects and means (`differences`, `estimates`, `alldifferences`, `allestimates`, `none`); default `diff`
`WEIGHTS` = variate	Weights for the analysis; default `*` implies all weights 1
`MVINCLUDE` = string tokens	Whether to include units with missing values in the explanatory factors and variates and/or the y-variates (`explanatory`, `yvariate`); default `*` i.e. omit units with missing values in either explanatory factors or variates or y-variates
`SUBMODEL` = formula	Defines a submodel of the fixed model to be assessed against the full model (for `METHOD=Fisher` only)
`RECYCLE` = string token	Whether to reuse the results from the estimation when printing or assessing a submodel (`yes, no`); default `no`
`RMETHOD` = string token	Which random terms to use when calculating `RESIDUALS` (`final`, `all`, `notspline`); default `fina`
`METHOD` = string token	Indicates whether to use the standard Fisher-scoring algorithm or the new AI algorithm with sparse matrix methods (`Fisher`, `AI`); default `AI`
`MAXCYCLE` = scalar	Limit on the number of iterations; default 30
`TOLERANCES` = variate	Tolerances for matrix inversion; default `*` i.e. appropriate default values
`PARAMETERIZATION` = string token	Parameterization to use for the variance component estimation (`gammas`, `sigmas`); default `*` i.e. use whichever is most appropriate for model
`CFORMAT` = string token	Whether printed output for covariance models gives the variance matrices or the parameters (`variancematrices`, `parameters`); default `vari`
`FMETHOD` = string token	Controls whether and how to calculate F-statistics for fixed terms (`automatic`, `none`, `algebraic`, `numerical`); default `auto`
`WORKSPACE` = scalar	Number of blocks of internal memory to be set up for use by the algorithm

Parameters

`Y` = variates	Variates to be analysed
`RESIDUALS` = variates	Residuals from each analysis
`FITTEDVALUES` = variates	Fitted values from each analysis
`EXIT` = scalar	Exit status of the fit (0 if successful)
`SAVE` = REML save structures	Saves the details of each analysis for use in subsequent `VDISPLAY` and `VKEEP` directives

Description

REML estimates the treatment effects and variance components in a linear mixed model: that is, a linear model with both fixed and random effects. The model to be fitted is specified using the VCOMPONENTS directive, covariance models for the random effects can be defined using the VSTRUCTURE directive, and the VRESIDUAL directive can define covariance models for the residual term or specify the residual term for the individual experiments in a meta-analysis. Further output can be produced following REML using VDISPLAY, and output can be saved in Genstat data structures using VKEEP. REML can be used in situations where you would normally use ANOVA but have unbalanced or correlated data, or where you would normally use linear regression, but have more than one source of variation or correlation in the data.

REML can be used to analyse data from a wide variety of applications. It can obtain information on sources and sizes of variability in data sets. This can be of interest where the relative size of different sources of variability must be assessed, for example to identify the least reliable stages in an industrial process, or to design more effective experiments. REML also provides efficient estimates of treatment effects in unbalanced designs with more than one source of error. It can be used to provide estimates of treatment effects that combine information from all the strata of a partially balanced design, or to combine information over similar experiments conducted at different times or in different places. You can thus obtain estimates that make use of the information from all the experiments, as well as the separate estimates from each individual experiment. Examples from several different areas of application can be found in Robinson (1987). The facilities for estimation of covariance models allow estimates of treatment effects and standard errors to be obtained using an appropriate variance model and taking account of the correlation structure of the data.

The method of residual maximum likelihood (REML) was introduced by Patterson & Thompson (1971). It was developed in order to avoid the biased variance component estimates that are produced by ordinary maximum likelihood estimation: because maximum likelihood estimates of variance components take no account of the degrees of freedom used in estimating treatment effects, they have a downwards bias which increases with the number of fixed effects in the model. This in turn leads to under-estimates of standard errors for fixed effects, which may lead to incorrect inferences being drawn from the data. Estimates of variance parameters which take account of the degrees of freedom used in estimating fixed effects, like those generated by ANOVA in balanced data sets, are more desirable.

Once a mixed model has been specified (using VCOMPONENTS) and any covariance structures have been defined (using VSTRUCTURE and VRESIDUAL) you can fit the model to the data (the y-variates) using the REML directive.

The Y parameter lists the variates that are to be modelled. For example, given appropriate factor definitions, the following command sets up a model and analyses the data held in variate Yield:

VCOMPONENTS [FIXED=Nitrogen*Variety] RANDOM=Block/Wplot/Splot

REML Yield; FITTED=Fit; RESIDUALS=Res

The FITTEDVALUES and RESIDUALS parameters allow you to store the fitted values and residuals from the fitted model – above they are stored in variates Fit and Res. The EXIT parameter saves the “exit status” of each analysis. This is set to zero if it was completed successfully; for details of the other codes, see VKEEP. The SAVE parameter can be used to name the REML save structure for use with later VKEEP and VDISPLAY directives.

The three options PRINT, PTERMS and PSE all control the printed output. The PRINT option selects the output to be displayed:

`model`	description of model fitted
`components`	estimates of variance components and estimated parameters of covariance models
`effects`	estimates of parameters α and β, the fixed and random effects
`means`	predicted means for factor combinations
`stratumvariances`	approximate stratum variances from a decomposition of the information matrix for the variance components (available only when `METHOD=Fisher`)
`monitoring`	monitoring information at each iteration
`vcovariance`	variance-covariance matrix of the estimated components
`deviance`	deviance of the fitted model ( -2 × log-likelihood RL) plus deviance of submodel when fitted
`waldtests`	Wald tests for all fixed terms in model
`missingvalue`	estimates of missing values
`covariancemodels`	estimated covariance models

The default setting of PRINT=model,components,Wald,cova, gives a description of the model and covariance models that have been fitted, plus estimates of the variance components and the Wald tests. By default if tables of means and effects are requested, tables for all terms in the fixed model are given together with a summary of the standard error of differences between effects/means. Options PTERMS and PSE can be used to change the terms or obtain different types of standard error. For example,

VCOMPONENTS [FIXED=Nitrogen*Variety] RANDOM=Block/Wplot/Splot

REML [PRINT=means; PTERMS=Nitrogen.Variety;\

PSE=allestimates] Yield

means that a Nitrogen by Variety table of predicted means will be produced with a standard error for each cell.

The FMETHOD option controls whether to accompany the Wald tests for fixed effects with approximate F statistics and corresponding numbers of residual degrees of freedom. The computations, using the method devised by Kenward & Roger (1997), can be time consuming with large or complicated models. So, with the default setting FMETHOD=automatic, Genstat assesses the model itself and decides automatically whether to do the computations and which method to use. The other settings allow you to control what to do yourself:

`none`	no F statistics are produced;
`algebraic`	F statistics are calculated using algebraic derivatives (which may involve large matrix calculations);
`numerical`	F statistics are calculated using numerical derivatives (which require an extra evaluation of the mixed model equations for every variance parameter).

The CFORMAT option controls the type of output produced for the estimated covariance models. The default setting, variancematrices, produces the variance-covariance matrices for the components, whereas the setting parameters prints their parameters.

The MVINCLUDE option allows the inclusion of units with missing values. By default, units where there is a missing value in the y-variate or in any of the factors or variates in the model terms are excluded. The setting explanatory allows units with missing values in factors or variates in the model to be included. For missing covariate values, this is equivalent to substituting the mean value. The setting yvariate includes units with missing values in the y-variate. This can be useful to retain the balanced structure of the data for use with direct product covariance matrices (see VSTRUCTURE), or to produce predictions of data values for given values of explanatory factors and/or variates.

The WEIGHTS option can be used to specify a weight for each unit in the analysis. This is useful when it is suspected that the size of the random error varies between units. For example, if the random error for unit i is known to have variance v_iσ², a weight variate should be used containing values w_i=1/v_i.

The RMETHOD option controls the way in which residuals and fitted values are formed. For the default setting RMETHOD=final, the fitted values are calculated from all the fixed and random effects. The residuals are the difference between the data and the fitted values and, in this case, are estimates of the *units* random error and can be used to check the Normality and variance homogeneity assumptions for the random error. To get fitted values constructed from the fixed terms alone, omitting all random terms, the setting RMETHOD=all must be used. The setting RMETHOD=notspline means that the residuals will be formed from all the random effects, excluding spline terms.

Option SUBMODEL is used to specify a submodel of the fixed model (but only applies when METHOD=Fisher). This model will be fitted as well as the full fixed model, using a slightly modified version of the algorithm, and the difference in deviances between the full and submodel can be used as a likelihood-based test to assess the importance of the fixed terms dropped from the full model, as described by Welham & Thompson (1997). Once the full model has been fitted, the RECYCLE option can be used to test a series of submodels of the fixed model. If option RECYCLE=yes is set, then only the estimation for the submodel is performed. Information for the full fixed model is picked up from the corresponding save structure. When the RECYCLE option is set, only the deviance and model settings of PRINT can be used.

The METHOD option specifies whether to use the AI (Average Information) algorithm (Gilmour et al. 1995) with sparse matrix methods to maximize the residual likelihood, or Fisher scoring with full matrix manipulation. By default, the sparse Average Information algorithm is used. The AI algorithm generally runs faster per iteration than Fisher scoring and uses much less workspace, but it may require slightly more iterations to reach convergence. When sparse matrix methods are used, standard errors of differences will not be available for random effects, although standard errors are available. Note that when METHOD=AI, the SUBMODEL and RECYCLE options do not apply. The WORKSPACE option (default 1) specifies the number of blocks of internal memory to be allocated for use by the estimation algorithm when METHOD=AI.

Option MAXCYCLE can be used to change the maximum number of iterations performed by the algorithm from the default of 30.

The TOLERANCES option gives tolerances for three matrix inversions. The first two values are matrix inversion tolerances for the information matrix and the mixed model equations respectively and take the value 10^-5 by default. The third value is used to detect zero frequency counts for factor combinations in the mixed model equations: 10^-6 is used by default.

Options: PRINT, PTERMS, PSE, WEIGHTS, MVINCLUDE, SUBMODEL, RECYCLE, RMETHOD, METHOD, MAXCYCLE, TOLERANCES, PARAMETERIZATION, CFORMAT, WORKSPACE.

Parameters: Y, RESIDUALS, FITTEDVALUES, EXIT, SAVE.

Action with `RESTRICT`

Any of the y-variates or any of the factors or variates in the fixed and random models (defined by VCOMPONENTS) may be restricted to indicate that only a subset of the units is to be used in the analysis. However, if more than one of these vectors is restricted, all must be restricted to the same set of units. Any restrictions on the variates supplied to save residuals or fitted values are ignored.

References

Gilmour, A.R., Thompson, R. & Cullis, B. (1995). AIREML, an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics, 51, 1440-1450.

Kenward, M.G. & Roger, J.H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Patterson, H.D. & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58, 545-554.

Robinson, D.L. (1987). Estimation and use of variance components. The Statistician, 36, 3-14.

Welham, S.J. & Thompson, R. (1997). Likelihood ratio tests for fixed model terms using residual maximum likelihood. Journal of the Royal Statistical Society, Series B, 59, 701-714.

Example

" Example REML-1: Incomplete block analysis
  (from Cochran & Cox, Table 10.3, p. 406) "

FILEREAD [NAME='%gendir%/examples/REML-1.DAT']\ 
  Reps,Blocks,Treats,Yield; FGROUPS=3(yes),no

VCOMPONENTS [FIXED=Treats] RANDOM=Reps/Blocks
CALCULATE Log = LOG(Yield)
REML [PRINT=components,effects,means,stratum,vcov,monitor; PSE=diff;\ 
  PTERMS=Reps/Blocks; METHOD=Fisher; MAXCYCLE=2] Yield,Log; SAVE=Syield,Slog
VAIC

" Plot residuals for each analysis"
VPLOT [GRAPHICS=high; SAVE=Syield] fitted
VPLOT [GRAPHICS=high; SAVE=Slog] fitted

Updated on October 28, 2020

Tagged: Command Procedures OLD

Was this article helpful?

Yes No