Analyses a simple `REML` variance components model for outliers using a variance shift outlier model (S.J. Welham, F.N. Gumedze & D.B. Baird).

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | Specifies the output to be produced (`fdr`, `outliers`); default `fdr`, `outl` |
| `VPRINT` = string tokens | Controls the output from the `REML` analysis of the baseline model (`model`, `components`, `effects`, `means`, `stratumvariances`, `monitoring`, `vcovariance`, `deviance`, `Waldtests`, `missingvalues`, `covariancemodels`); default `mode`, `comp`, `Wald`, `cova` |
| `PLOT` = string tokens | Controls which plots are produced (`indexplots`, `residual`); default `inde`, `resi` |
| `INDEXPLOT` = string tokens | Selects the index plots to produce (`omega`, `sigma2`, `tsquared`, `lrt`, `method`, `all`); default `meth` |
| `TERM` = formula | Random term to scan for outliers; default is the residual term |
| `METHOD` = string token | Method for calculating the statistics used to indicate an outlier (`full`, `partial`, `t`); default `t` |
| `THRMETHOD` = string token | Method for obtaining the threshold statistics (`approximate`, `bootstrap`); default `appr` for `METHOD=full` and `boot` otherwise |
| `NBOOT` = scalar | Number of bootstrap samples to take to form the threshold statistics; default `99` for `METHOD=full` and `499` otherwise |
| `FIXED` = formula | Fixed model terms |
| `RANDOM` = formula | Random model terms |
| `CONSTANT` = string token | How to treat the constant term (`estimate`, `omit`); default `esti` |
| `FACTORIAL` = scalar | Limit on the number of factors or covariates in each fixed term; default `3` |
| `VCONSTRAINTS` = string token | How to constrain the variance components and the residual variance (`none`, `positive`, `fixrelative`, `fixabsolute`); default `posi` |
| `INITIAL` = variate | Initial values for the variance components; default `1` |
| `SEED` = scalar | Seed for random number generation; default `0` continues an existing sequence or, if none, selects a seed automatically |
| `SAVEITEMS` = string tokens | Selects the items to save (`residuals`, `omega`, `sigma2`, `gamma`, `tsquared`, `lrt`, `fdr`, `approxthresholds`, `thresholdstats`, `outliers`, `method`, `all`); default `resi`, `omeg`, `sigm`, `meth`, `fdr`, `outl` |

### Parameters

| Parameter | Description |
|---|---|
| `Y` = variates | Response variates |
| `TITLE` = texts | Specifies the title or titles to use for the plots |
| `SAVE` = pointers | Saves information from the analysis of each y-variate |

### Description

`VSOM` uses a mixed-model analysis with a variance shift outlier model (VSOM) to search for potential outliers. By default, the VSOM is used to assess the residuals. However, you can set the `TERM` option to a random term in the analysis, to assess its effects: i.e. to see whether any of the groups of observations defined by the random term seem to be aberrant. The model defines an extra component of variation for each unit (an individual or a group), in turn, and estimates the extra variance associated with it. The `METHOD` option specifies how the extra variance is estimated, with the following settings.

| Setting | Meaning |
|---|---|
| `full` | refits the full model with the added variance term for each unit; this can be very time-consuming. |
| `partial` | approximates the change in likelihood by a partial likelihood, where the baseline model parameters are held fixed, and only the extra variance component for each unit is estimated; this is much faster than re-estimating the full model. |
| `t` | uses the squared t-statistics (i.e. squared standardized residuals) to approximate the change in likelihood (default); this is the fastest approach. |
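
As a hedged illustration of the trade-off between these settings, the sketch below runs a quick scan with the default `t` method and then repeats it with `partial`. It assumes that the Cambridge Filter data used in the Example section (`Nicotine`, `Sample`, `Laboratory`) have already been loaded; only options documented on this page are used.

```genstat
"Fast first scan: squared t-statistics (the default METHOD)"
VSOM [METHOD=t; FIXED=Sample; RANDOM=Laboratory] Nicotine
"Slower follow-up: partial likelihood estimated for each unit"
VSOM [METHOD=partial; FIXED=Sample; RANDOM=Laboratory] Nicotine
```

A reasonable workflow might be to screen with `t` and reserve `partial` or `full` for units that look suspicious, since the cost grows with the number of units that are refitted.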

To assess whether a unit is outside its expected distribution, thresholds are calculated at various levels of significance. The `THRMETHOD` option specifies the method to use:

| Setting | Meaning |
|---|---|
| `approximate` | uses the asymptotic distribution to calculate the thresholds; and |
| `bootstrap` | uses parametric bootstrap samples, with the variance components in the baseline model, to calculate the thresholds from the percentiles of the order statistics. |

Each bootstrap sample is formed by taking the sum of the fitted fixed effects from the baseline model, together with simulated effects for the random terms in the model. Each random effect is simulated by Normal random numbers, with a mean of zero and the variance that was estimated for that term in the baseline model. The `NBOOT` option defines how many bootstrap samples to take; the default is `99` for `METHOD=full`, and `499` otherwise. The `SEED` option specifies the seed for the random number generator, used by the `GRNORMAL` function to make the bootstrap samples. The default of zero continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, it initializes the seed automatically from the computer clock. If you repeat the analysis with the same (non-zero) seed, you will get the same random numbers, and hence the same results.
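
The reproducibility point can be sketched as follows. This is an illustrative pair of calls rather than a recommended setup: it assumes the Cambridge Filter data from the Example section are loaded, reuses the seed `7643` from that example, and picks `NBOOT=199` arbitrarily.

```genstat
"Bootstrap thresholds from 199 samples, with a fixed non-zero seed"
VSOM [METHOD=t; THRMETHOD=bootstrap; NBOOT=199; SEED=7643;\
     FIXED=Sample; RANDOM=Laboratory] Nicotine
"Repeating the call with the same seed should reproduce the thresholds"
VSOM [METHOD=t; THRMETHOD=bootstrap; NBOOT=199; SEED=7643;\
     FIXED=Sample; RANDOM=Laboratory] Nicotine
```

With `SEED=0` (the default) the second call would instead continue the random number sequence, so its bootstrap thresholds would generally differ from those of the first.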

The `FIXED` and `RANDOM` options specify the fixed and random terms to be fitted in the analysis and the `FACTORIAL` option sets a limit on the number of factors and variates allowed in each fixed term. If neither `FIXED` nor `RANDOM` is specified, their settings are taken from the most recent `VCOMPONENTS` command. Its `FACTORIAL` setting is also taken if `VCOMPONENTS` is providing the fixed model. A fault is given if neither a fixed nor a random model is supplied. Note that the analysis cannot handle covariance models (which would be specified by the `VSTRUCTURE` directive). The `VCONSTRAINTS` option specifies constraints on the variance components, using the same settings as the `CONSTRAINTS` parameter of `VCOMPONENTS`. The `CONSTANT` option allows you to omit the constant.
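
The two ways of supplying the model can be sketched as below. This assumes the Cambridge Filter data from the Example section are loaded; the choice of `VCONSTRAINTS=none` in the second form is only there to show where the constraint setting goes.

```genstat
"Define the model once with VCOMPONENTS ..."
VCOMPONENTS [FIXED=Sample] RANDOM=Laboratory
"... so that VSOM takes the fixed and random terms from it"
VSOM Nicotine
"Alternatively, give the model explicitly in the VSOM call;"
"VCONSTRAINTS=none leaves the variance components unconstrained"
VSOM [FIXED=Sample; RANDOM=Laboratory; VCONSTRAINTS=none] Nicotine
```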

Printed output is controlled by the `PRINT` option, with the following settings:

| Setting | Meaning |
|---|---|
| `outliers` | prints a summary of the potential outliers, as measured against the threshold statistics, at various levels of significance; and |
| `fdr` | prints the estimated false discovery rates for the potential outliers. |

The false discovery rates (FDR) are estimated from the distribution of p-values calculated with the *t*-statistics from the asymptotic model. This uses the `FDRMIXTURE` procedure, or else the `FDRBONFERRONI` procedure if that fails. The FDR estimates the probability that the outlier is generated by noise. If this is small, it is likely that the outlier is genuine. However, if it is larger than 0.5, there is more chance that it was generated by noise. The FDR probabilities do not allow for correlations between the estimates. So, if there are only 2-3 replicates of the fixed terms, the FDR values may be too small, and should be interpreted with caution.

The `VPRINT` option controls the output from the `REML` analysis of the baseline model (as specified by the `FIXED` and `RANDOM` options). This has the same settings and default as the `PRINT` option of `REML`.
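
A minimal sketch of how `PRINT` and `VPRINT` combine, assuming the Cambridge Filter data from the Example section are loaded: the call below asks for the outlier summary only, and restricts the baseline `REML` output to the variance components.

```genstat
"Outlier summary only; baseline REML output limited to variance components"
VSOM [PRINT=outliers; VPRINT=components; FIXED=Sample; RANDOM=Laboratory]\
     Nicotine
```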

Graphical output is controlled by the `PLOT` option, with the following settings.

| Setting | Meaning |
|---|---|
| `residual` | when `TERM` is set, the `DRESIDUALS` procedure is used to plot histograms and Normal plots of the specified random effects; when `TERM` is not set, `DRESIDUALS` is used to plot histograms and Normal plots of the residuals together with a plot of the residuals against the fitted values. |
| `indexplots` | plots the statistics, selected by the `INDEXPLOT` option, against their index (i.e. their position in the y-variate). |

For `residual` and `indexplots`, points are plotted in red if they are greater than their 5% bootstrap threshold, and in purple or green if greater than the 1% or 5% asymptotic thresholds respectively. The index plot also displays reference lines for the order statistics (OS 1, OS 2…) when `THRMETHOD=bootstrap`, or the 5%, 1%, 0.1% and 0.01% asymptotic thresholds when `THRMETHOD=approximate`.

The plots that are produced as components of the index plot can be controlled by the `INDEXPLOT` option, with the following settings:

| Setting | Statistic plotted |
|---|---|
| `omega` | variance shift as a ratio to the residual variance, |
| `sigma2` | estimated residual variance under VSOM, |
| `tsquared` | squared t-statistic, |
| `lrt` | likelihood ratio test, |
| `method` | the statistic associated with the setting of the `METHOD` option, i.e. `lrt` for `full` or `partial`, and `tsquared` for `t` (default), and |
| `all` | all the statistics. |
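
As an illustrative sketch, the call below requests index plots only, and limits them to two of the statistics listed above. It assumes the Slate Hall data from the Example section (`yield`, `variety`, `fieldrow`, `fieldcolumn`) are loaded.

```genstat
"Index plots of the variance-shift ratio and the squared t-statistics only"
VSOM [PLOT=indexplots; INDEXPLOT=omega,tsquared; FIXED=variety;\
     RANDOM=fieldrow*fieldcolumn] yield
```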

The `Y` parameter specifies the response variate. The `TITLE` parameter can supply a text, with either one or three values, to label the graphs. If the text has a single value, this is used to prefix the standard descriptions for the three graphs. If it has three values, these give (in full) the titles for the `comparison`, `indexplots` and `residual` plots, respectively.

The `SAVE` parameter can save a pointer containing variates, storing the statistics calculated for each group or individual. The labels of the pointer, and the corresponding statistics, are as follows:

| Label | Statistic |
|---|---|
| `'residuals'` | the standardized residuals, |
| `'omega'` | the variance shift as a ratio to the residual variance, |
| `'sigma2'` | the estimated residual variance under VSOM, |
| `'gamma'` | the estimated variance component for `TERM` under VSOM, |
| `'tsquared'` | the squared t-statistic, |
| `'LRT'` | the partial likelihood ratio test if `METHOD=partial` or the full likelihood ratio test otherwise, |
| `'method'` | the statistic associated with the setting of the `METHOD` option (`lrt` for `full` or `partial`, and `tsquared` for `t`), |
| `'FDR'` | the false discovery rate based on the t-statistics, |
| `'approxthresholds'` | the approximate thresholds used to indicate significant departures, |
| `'thresholdstats'` | the 95th percentiles of the order statistics from the bootstrap samples in decreasing order, and |
| `'outliers'` | the unit numbers of outliers above the thresholds. |

The `SAVEITEMS` option controls which of the above items are saved.
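
A hedged sketch of saving and inspecting results follows. It assumes the Slate Hall data from the Example section are loaded; `results` is just an illustrative pointer name, and the labels are written exactly as listed above (whether label matching is case-sensitive is not stated here, so the listed case is kept).

```genstat
"Save selected statistics in the pointer results"
VSOM [FIXED=variety; RANDOM=fieldrow*fieldcolumn;\
     SAVEITEMS=residuals,fdr,outliers] yield; SAVE=results
"Inspect the saved variates by their pointer labels"
PRINT results['residuals']
PRINT results['FDR']
PRINT results['outliers']
```

The items are printed separately because `'outliers'` holds unit numbers only for the flagged units, so it need not have the same length as the other variates.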

Options: `PRINT`, `VPRINT`, `PLOT`, `INDEXPLOT`, `TERM`, `METHOD`, `THRMETHOD`, `NBOOT`, `FIXED`, `RANDOM`, `CONSTANT`, `FACTORIAL`, `VCONSTRAINTS`, `INITIAL`, `SEED`, `SAVEITEMS`.

Parameters: `Y`, `TITLE`, `SAVE`.

### Method

`VSOM` uses the method of Gumedze *et al.* (2010).

### Action with `RESTRICT`

The `Y` parameter can be restricted. All output estimates will then be based only on the unrestricted units.
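
For illustration only, the sketch below restricts the y-variate before the analysis and removes the restriction afterwards. It assumes the Cambridge Filter data from the Example section are loaded, and that `Laboratory` has a level numbered 1; that level is a hypothetical choice made purely to show the pattern.

```genstat
"Exclude one laboratory (level 1 is a hypothetical choice) from the analysis"
RESTRICT Nicotine; CONDITION=Laboratory.NE.1
VSOM [FIXED=Sample; RANDOM=Laboratory] Nicotine
"Remove the restriction afterwards"
RESTRICT Nicotine
```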

### Reference

Gumedze, F.N., Welham, S.J., Gogel, B.J. & Thompson, R. (2010). A variance shift model for detection of outliers in the linear mixed model. *Computational Statistics and Data Analysis*, 54, 2128-2144.

### See also

Directives: `REML`, `VCOMPONENTS`, `VSTRUCTURE`.

Procedures: `VCHECK`, `VRCHECK`, `VPLOT`, `VDFIELDRESIDUALS`, `VFRESIDUALS`, `DRESIDUALS`, `FDRBONFERRONI`, `FDRMIXTURE`.

Commands for: REML analysis of linear mixed models.

### Example

    CAPTION 'VSOM examples',\
            !T('Cambridge Filter data (Wagner & Thaggard 1979):',\
            'Nicotine extracted from pads at 14 laboratories'); STYLE=meta,plain
    SPLOAD  [PRINT=*] '%EXAMPLES%/CambridgeFilterData.gsh'
    "Check residual term - individual samples for outliers"
    VSOM    [METHOD=t; FIXED=Sample; RANDOM=Laboratory; SEED=7643] Nicotine;\
            TITLE='Cambridge Filter data'
    "Check laboratory term for outliers"
    VSOM    [METHOD=full; FIXED=Sample; RANDOM=Laboratory; TERM=Laboratory]\
            Nicotine; TITLE='Cambridge filter data by laboratory'
    CAPTION 'Slate Hall spring wheat trial (Kempton & Fox 1997)'; STYLE=plain
    SPLOAD  [PRINT=*] '%DATA%/SlateHall.gsh'
    "Check residual term - individual plots for outliers"
    VSOM    [PRINT=; VPRINT=*; PLOT=#; INDEXPLOT=all; FIXED=variety;\
            RANDOM=fieldrow*fieldcolumn; METHOD=Partial; NBOOT=199;\
            SAVEITEMS=residuals,omega,fdr] yield; TITLE='Slate Hall'; SAVE=results
    "Test fieldcolumn effects for outliers"
    VSOM    [FIXED=variety; RANDOM=fieldrow*fieldcolumn; TERM=fieldcolumn]\
            yield; TITLE='Slate Hall by field column'