1. Home
  2. REML directive

REML directive

Fits a variance-components model by residual (or restricted) maximum likelihood.

Options

PRINT = string tokens What output to present (model, components, effects, means, stratumvariances, monitoring, vcovariance, deviance, Waldtests, missingvalues, covariancemodels); default mode, comp, Wald, cova
PTERMS = formula Terms (fixed or random) for which effects or means are to be printed; default * implies all the fixed terms
PSE = string token Standard errors to be printed with tables of effects and means (differences, estimates, alldifferences, allestimates, none); default diff
WEIGHTS = variate Weights for the analysis; default * implies all weights 1
MVINCLUDE = string tokens Whether to include units with missing values in the explanatory factors and variates and/or the y-variates (explanatory, yvariate); default * i.e. omit units with missing values in either explanatory factors or variates or y-variates
SUBMODEL = formula Defines a submodel of the fixed model to be assessed against the full model (for METHOD=Fisher only)
RECYCLE = string token Whether to reuse the results from the estimation when printing or assessing a submodel (yes, no); default no
RMETHOD = string token Which random terms to use when calculating RESIDUALS (final, all, notspline); default fina
METHOD = string token Indicates whether to use the standard Fisher-scoring algorithm or the new AI algorithm with sparse matrix methods (Fisher, AI); default AI
MAXCYCLE = scalar Limit on the number of iterations; default 30
TOLERANCES = variate Tolerances for matrix inversion; default * i.e. appropriate default values
PARAMETERIZATION = string token Parameterization to use for the variance component estimation (gammas, sigmas); default * i.e. use whichever is most appropriate for model
CFORMAT = string token Whether printed output for covariance models gives the variance matrices or the parameters (variancematrices, parameters); default vari
FMETHOD = string token Controls whether and how to calculate F-statistics for fixed terms (automatic, none, algebraic, numerical); default auto
WORKSPACE = scalar Number of blocks of internal memory to be set up for use by the algorithm

Parameters

Y = variates Variates to be analysed
RESIDUALS = variates Residuals from each analysis
FITTEDVALUES = variates Fitted values from each analysis
EXIT = scalar Exit status of the fit (0 if successful)
SAVE = REML save structures Saves the details of each analysis for use in subsequent VDISPLAY and VKEEP directives

Description

REML estimates the treatment effects and variance components in a linear mixed model: that is, a linear model with both fixed and random effects. The model to be fitted is specified using the VCOMPONENTS directive, covariance models for the random effects can be defined using the VSTRUCTURE directive, and the VRESIDUAL directive can define covariance models for the residual term or specify the residual term for the individual experiments in a meta-analysis. Further output can be produced following REML using VDISPLAY, and output can be saved in Genstat data structures using VKEEP. REML can be used in situations where you would normally use ANOVA but have unbalanced or correlated data, or where you would normally use linear regression, but have more than one source of variation or correlation in the data.

REML can be used to analyse data from a wide variety of applications. It can obtain information on sources and sizes of variability in data sets. This can be of interest where the relative size of different sources of variability must be assessed, for example to identify the least reliable stages in an industrial process, or to design more effective experiments. REML also provides efficient estimates of treatment effects in unbalanced designs with more than one source of error. It can be used to provide estimates of treatment effects that combine information from all the strata of a partially balanced design, or to combine information over similar experiments conducted at different times or in different places. You can thus obtain estimates that make use of the information from all the experiments, as well as the separate estimates from each individual experiment. Examples from several different areas of application can be found in Robinson (1987). The facilities for estimation of covariance models allow estimates of treatment effects and standard errors to be obtained using an appropriate variance model and taking account of the correlation structure of the data.

The method of residual maximum likelihood (REML) was introduced by Patterson & Thompson (1971). It was developed in order to avoid the biased variance component estimates that are produced by ordinary maximum likelihood estimation: because maximum likelihood estimates of variance components take no account of the degrees of freedom used in estimating treatment effects, they have a downwards bias which increases with the number of fixed effects in the model. This in turn leads to under-estimates of standard errors for fixed effects, which may lead to incorrect inferences being drawn from the data. Estimates of variance parameters which take account of the degrees of freedom used in estimating fixed effects, like those generated by ANOVA in balanced data sets, are more desirable.

Once a mixed model has been specified (using VCOMPONENTS) and any covariance structures have been defined (using VSTRUCTURE and VRESIDUAL) you can fit the model to the data (the y-variates) using the REML directive.

The Y parameter lists the variates that are to be modelled. For example, given appropriate factor definitions, the following command sets up a model and analyses the data held in variate Yield:

VCOMPONENTS [FIXED=Nitrogen*Variety] RANDOM=Block/Wplot/Splot

REML Yield; FITTED=Fit; RESIDUALS=Res

The FITTEDVALUES and RESIDUALS parameters allow you to store the fitted values and residuals from the fitted model – above they are stored in variates Fit and Res. The EXIT parameter saves the “exit status” of each analysis. This is set to zero if it was completed successfully; for details of the other codes, see VKEEP. The SAVE parameter can be used to name the REML save structure for use with later VKEEP and VDISPLAY directives.

The three options PRINT, PTERMS and PSE all control the printed output. The PRINT option selects the output to be displayed:

    model description of model fitted
    components estimates of variance components and estimated parameters of covariance models
    effects estimates of parameters α and β, the fixed and random effects
    means predicted means for factor combinations
    stratumvariances approximate stratum variances from a decomposition of the information matrix for the variance components (available only when METHOD=Fisher)
    monitoring monitoring information at each iteration
    vcovariance variance-covariance matrix of the estimated components
    deviance deviance of the fitted model ( -2 × log-likelihood RL) plus deviance of submodel when fitted
    waldtests Wald tests for all fixed terms in model
    missingvalue estimates of missing values
    covariancemodels estimated covariance models

The default setting of PRINT=model,components,Wald,cova, gives a description of the model and covariance models that have been fitted, plus estimates of the variance components and the Wald tests. By default if tables of means and effects are requested, tables for all terms in the fixed model are given together with a summary of the standard error of differences between effects/means. Options PTERMS and PSE can be used to change the terms or obtain different types of standard error. For example,

VCOMPONENTS [FIXED=Nitrogen*Variety] RANDOM=Block/Wplot/Splot

REML [PRINT=means; PTERMS=Nitrogen.Variety;\

     PSE=allestimates] Yield

means that a Nitrogen by Variety table of predicted means will be produced with a standard error for each cell.

The FMETHOD option controls whether to accompany the Wald tests for fixed effects with approximate F statistics and corresponding numbers of residual degrees of freedom. The computations, using the method devised by Kenward & Roger (1997), can be time consuming with large or complicated models. So, with the default setting FMETHOD=automatic, Genstat assesses the model itself and decides automatically whether to do the computations and which method to use. The other settings allow you to control what to do yourself:

    none no F statistics are produced;
    algebraic F statistics are calculated using algebraic derivatives (which may involve large matrix calculations);
    numerical F statistics are calculated using numerical derivatives (which require an extra evaluation of the mixed model equations for every variance parameter).

The CFORMAT option controls the type of output produced for the estimated covariance models. The default setting, variancematrices, produces the variance-covariance matrices for the components, whereas the setting parameters prints their parameters.

The MVINCLUDE option allows the inclusion of units with missing values. By default, units where there is a missing value in the y-variate or in any of the factors or variates in the model terms are excluded. The setting explanatory allows units with missing values in factors or variates in the model to be included. For missing covariate values, this is equivalent to substituting the mean value. The setting yvariate includes units with missing values in the y-variate. This can be useful to retain the balanced structure of the data for use with direct product covariance matrices (see VSTRUCTURE), or to produce predictions of data values for given values of explanatory factors and/or variates.

The WEIGHTS option can be used to specify a weight for each unit in the analysis. This is useful when it is suspected that the size of the random error varies between units. For example, if the random error for unit i is known to have variance viσ2, a weight variate should be used containing values wi=1/vi.

The RMETHOD option controls the way in which residuals and fitted values are formed. For the default setting RMETHOD=final, the fitted values are calculated from all the fixed and random effects. The residuals are the difference between the data and the fitted values and, in this case, are estimates of the *units* random error and can be used to check the Normality and variance homogeneity assumptions for the random error. To get fitted values constructed from the fixed terms alone, omitting all random terms, the setting RMETHOD=all must be used. The setting RMETHOD=notspline means that the residuals will be formed from all the random effects, excluding spline terms.

Option SUBMODEL is used to specify a submodel of the fixed model (but only applies when METHOD=Fisher). This model will be fitted as well as the full fixed model, using a slightly modified version of the algorithm, and the difference in deviances between the full and submodel can be used as a likelihood-based test to assess the importance of the fixed terms dropped from the full model, as described by Welham & Thompson (1997). Once the full model has been fitted, the RECYCLE option can be used to test a series of submodels of the fixed model. If option RECYCLE=yes is set, then only the estimation for the submodel is performed. Information for the full fixed model is picked up from the corresponding save structure. When the RECYCLE option is set, only the deviance and model settings of PRINT can be used.

The METHOD option specifies whether to use the AI (Average Information) algorithm (Gilmour et al. 1995) with sparse matrix methods to maximize the residual likelihood, or Fisher scoring with full matrix manipulation. By default, the sparse Average Information algorithm is used. The AI algorithm generally runs faster per iteration than Fisher scoring and uses much less workspace, but it may require slightly more iterations to reach convergence. When sparse matrix methods are used, standard errors of differences will not be available for random effects, although standard errors are available. Note that when METHOD=AI, the SUBMODEL and RECYCLE options do not apply. The WORKSPACE option (default 1) specifies the number of blocks of internal memory to be allocated for use by the estimation algorithm when METHOD=AI.

Option MAXCYCLE can be used to change the maximum number of iterations performed by the algorithm from the default of 30.

The TOLERANCES option gives tolerances for three matrix inversions. The first two values are matrix inversion tolerances for the information matrix and the mixed model equations respectively and take the value 10-5 by default. The third value is used to detect zero frequency counts for factor combinations in the mixed model equations: 10-6 is used by default.

Options: PRINT, PTERMS, PSE, WEIGHTS, MVINCLUDE, SUBMODEL, RECYCLE, RMETHOD, METHOD, MAXCYCLE, TOLERANCES, PARAMETERIZATION, CFORMAT, WORKSPACE.

Parameters: Y, RESIDUALS, FITTEDVALUES, EXIT, SAVE.

Action with RESTRICT

Any of the y-variates or any of the factors or variates in the fixed and random models (defined by VCOMPONENTS) may be restricted to indicate that only a subset of the units is to be used in the analysis. However, if more than one of these vectors is restricted, all must be restricted to the same set of units. Any restrictions on the variates supplied to save residuals or fitted values are ignored.

References

Gilmour, A.R., Thompson, R. & Cullis, B. (1995). AIREML, an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics, 51, 1440-1450.

Kenward, M.G. & Roger, J.H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Patterson, H.D. & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58, 545-554.

Robinson, D.L. (1987). Estimation and use of variance components. The Statistician, 36, 3-14.

Welham, S.J. & Thompson, R. (1997). Likelihood ratio tests for fixed model terms using residual maximum likelihood. Journal of the Royal Statistical Society, Series B, 59, 701-714.

See also

Directives: VCOMPONENTS, VSTRUCTURE, VRESIDUAL, VDISPLAY, VPREDICT, VKEEP, VPEDIGREE, VCYCLE, VSTATUS.
Procedures: VAIC, VALLSUBSETS, VBOOTSTRAP, VCHECK, VCRITICAL,  VFRESIDUALS, VGRAPH, VPLOT, VDEFFECTS, VDFIELDRESIDUALS, VFIXEDTESTS, VFLC, VFPEDIGREE, VFUNCTION, VHERITABILITY, VLSD, VMETA, VMCOMPARISON, VPOWER, VRACCUMULATE, VRCHECK, VRFIT, VRMETAMODEL, VSAMPLESIZE, VSCREEN, VSOM, VSPREADSHEET, VSURFACEVTCOMPARISONS, VABLOCKDESIGN, VAMETA, VAROWCOLUMNDESIGN, VASERIES, VALINEBYTESTER, VLINEBYTESTER, FCONTRASTS, FDIALLEL, AOVANYHOW.

Commands for: REML analysis of linear mixed models, Analysis of variance.

Example

" Example REML-1: Incomplete block analysis
  (from Cochran & Cox, Table 10.3, p. 406) "

FILEREAD [NAME='%gendir%/examples/REML-1.DAT']\ 
  Reps,Blocks,Treats,Yield; FGROUPS=3(yes),no

VCOMPONENTS [FIXED=Treats] RANDOM=Reps/Blocks
CALCULATE Log = LOG(Yield)
REML [PRINT=components,effects,means,stratum,vcov,monitor; PSE=diff;\ 
  PTERMS=Reps/Blocks; METHOD=Fisher; MAXCYCLE=2] Yield,Log; SAVE=Syield,Slog
VAIC

" Plot residuals for each analysis"
VPLOT [GRAPHICS=high; SAVE=Syield] fitted
VPLOT [GRAPHICS=high; SAVE=Slog] fitted
Updated on October 28, 2020

Was this article helpful?