Gives robust identification of multiple outliers in 2-way tables (J.K.M. Brown).

### Options

| `PRINT` = string tokens | Printed output required (`graph`, `table`); default `grap`, `tabl` |
|---|---|
| `GRAPHICS` = string tokens | Type of graph required (`highresolution`, `lineprinter`); default `high` |
| `SORT` = string tokens | Sorting of printed output, in order of absolute value of median tetrad (`ascending`, `descending`, `none`); default `none` |

### Parameters

| `TABLE` = tables | Specifies the two-way table of data |
|---|---|
| `ROWS` = factors | Saves the factor classifying the table rows |
| `COLUMNS` = factors | Saves the factor classifying the table columns |
| `DATA` = variates | Saves the data values in the body of the table |
| `MEDIANTETRADS` = variates | Saves median tetrads for each cell in the table |
| `RANKS` = variates | Saves ranks of absolute values of median tetrads |
| `HALFNORMALSCORES` = variates | Saves half-Normal scores of absolute values of median tetrads |
| `TESTOUTLIERS` = scalars | Specifies the number of cells, with the highest absolute median tetrads, to be set to their predicted values before re-running the analysis |

### Description

In a table of data cross-classified by two factors, some cells may be outliers, in that they contain values substantially higher or lower than those expected from the means of the relevant rows and columns. Median tetrad analysis is a robust, single-step method of identifying several outliers in a two-way table (Bradu & Hawkins 1982).

A tetrad is calculated from four cells which lie at the corners of a rectangle in the body of the table. For instance, if the cell in row *i* and column *j* has a value $c_{ij}$, the tetrad involving that cell and the cell in row *p* and column *q* is defined as

$$t_{ij;pq} = c_{ij} - c_{iq} - c_{pj} + c_{pq}$$

Under an additive model, $c_{ij} = \mu + \alpha_i + \beta_j + e_{ij}$, the row and column effects cancel, leaving $t_{ij;pq} = e_{ij} - e_{iq} - e_{pj} + e_{pq}$. In a clean tetrad, none of the values $c_{iq}$, $c_{pj}$ or $c_{pq}$ are themselves outliers, so the tetrad is an estimate of the amount by which $c_{ij}$ deviates from its expected value. In a contaminated tetrad, one or more of $c_{iq}$, $c_{pj}$ or $c_{pq}$ are outliers, so a contaminated tetrad is not a reliable estimate of the deviation of $c_{ij}$ from its expectation.

`MEDIANTETRAD` calculates the median of all the tetrads involving each cell of the table (such that *i* ≠ *p* and *j* ≠ *q*, so that the four cells of each tetrad form a rectangle). These median tetrads are robust estimates of the deviations for each cell and therefore indicate which cells may contain outliers. The method is robust because the median will be a clean tetrad (and therefore a reliable estimate of the deviation) as long as fewer than half of the tetrads involving that cell are contaminated. Furthermore, this robustness allows several outliers to be detected reliably in a single step. Other methods of detecting outliers may detect only a single outlier, or may require several steps, one for each outlier.
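The definition of a median tetrad translates directly into a brute-force calculation. Below is a minimal Python sketch of it. It is illustrative only and is not Genstat's implementation. The table is a list of rows, and `None` marks a missing value. The helper name `median_tetrads` is hypothetical. As an assumption, any tetrad that touches a missing cell is simply skipped.

```python
from statistics import median


def median_tetrads(table):
    """Return the median tetrad for every cell of a two-way table.

    `table` is a list of rows; None marks a missing value.
    Hypothetical helper for illustration, not part of Genstat.
    """
    nrows, ncols = len(table), len(table[0])
    result = [[None] * ncols for _ in range(nrows)]
    for i in range(nrows):
        for j in range(ncols):
            if table[i][j] is None:
                continue  # a missing cell gets a missing median tetrad
            tetrads = []
            for p in range(nrows):
                for q in range(ncols):
                    if p == i or q == j:
                        continue  # proper tetrads only: i != p and j != q
                    others = (table[i][q], table[p][j], table[p][q])
                    if None in others:
                        continue  # assumption: drop tetrads touching a missing cell
                    tetrads.append(table[i][j] - table[i][q] - table[p][j] + table[p][q])
            result[i][j] = median(tetrads) if tetrads else None
    return result


# A purely additive 4 x 4 table (row effect + column effect) with one outlier.
table = [[r + c for c in (0, 10, 20, 30)] for r in (0, 1, 2, 3)]
table[1][2] += 100
for row in median_tetrads(table):
    print(row)
# The outlier cell (row 1, column 2) gets median tetrad 100; every other cell gets 0.
```

In this example, cells that share a row or column with the outlier have only 3 of their 9 tetrads contaminated. Their medians therefore stay at 0, which illustrates the single-step robustness described above.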

The options of `MEDIANTETRAD` control the output. `PRINT` has two settings. The `graph` setting produces a plot of half-Normal scores of the median tetrads against the absolute values of the median tetrads. In the half-Normal plot, inliers (values for cells which are not outliers, with low deviations) fall on a straight line passing through the origin. Outliers (with high deviations) fall at the upper end of this line and below the level of the line. A regression line, passing through the origin, of half-Normal scores against absolute values of median tetrads, is also plotted. The setting `table` prints the factors which classify the table, the data in the body of the table, the median tetrads, the ranks of the absolute values of the median tetrads and the half-Normal scores. The `GRAPHICS` option controls graphical output, as a high-resolution plot (the default setting) or as a line-printer plot. The `SORT` option controls whether the output provided by setting `PRINT=table` is sorted in ascending order (most extreme median tetrad last), descending order, or not at all.
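The quantities on the half-Normal plot can be sketched in Python as follows. Each absolute median tetrad is ranked, and the rank is converted to a half-Normal score. A least-squares line through the origin is then fitted to the scores. This is a sketch under assumptions. The scores use the common plotting position $\Phi^{-1}(0.5 + (r - 0.5)/2n)$, which is not necessarily the formula used by `APLOT`. Ties are broken by position rather than averaged. The function names are hypothetical.

```python
from statistics import NormalDist


def half_normal_scores(values):
    """Rank |values| (rank 1 = smallest) and return (ranks, scores).

    Assumption: scores use the plotting position 0.5 + (r - 0.5) / (2n);
    Genstat's APLOT may use a different formula. Ties are broken by position.
    """
    absvals = [abs(v) for v in values]
    n = len(absvals)
    order = sorted(range(n), key=lambda k: absvals[k])
    ranks = [0] * n
    for r, k in enumerate(order, start=1):
        ranks[k] = r
    inv_cdf = NormalDist().inv_cdf
    scores = [inv_cdf(0.5 + (r - 0.5) / (2 * n)) for r in ranks]
    return ranks, scores


def slope_through_origin(x, y):
    """Least-squares slope of y on x with no intercept: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Made-up median tetrads (missing values already removed); 12.0 mimics an outlier.
mt = [0.4, -0.1, 12.0, 0.3, -0.6, 0.2, 0.0, -0.3]
ranks, scores = half_normal_scores(mt)
abs_mt = [abs(v) for v in mt]
slope = slope_through_origin(abs_mt, scores)  # plotted line: score = slope * |median tetrad|
```

Plotting `scores` against `abs_mt` would place the made-up outlier at the upper end, far to the right of the line formed by the other points.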

The `TABLE` parameter specifies a table, classified by two factors, in which outliers are to be identified. The table may contain missing values, in which case the corresponding median tetrad is returned as a missing value. The `TABLE` parameter must be set, while the other parameters are optional. The next six parameters save output. `ROWS` and `COLUMNS` save the factors which classify the table, `DATA` saves the numerical body of the table, and `MEDIANTETRADS`, `RANKS` and `HALFNORMALSCORES` save the median tetrads, their ranks and half-Normal scores respectively.

When a table has few rows (or, similarly, few columns), a large outlier in the cell in row *i* and column *j* may cause other cells in column *j* to appear moderately outlying. This is bound to be a problem if the table has only two or three rows. In that case, 100% or at least 50%, respectively, of the tetrads involving the other cells in column *j* will be contaminated, so the median tetrads of those cells will also be contaminated. Missing values can cause the same problem in larger tables by reducing the proportion of clean tetrads.

The parameter `TESTOUTLIERS` can be used to examine the influence of suspected outliers on the deviations of other cells. When `TESTOUTLIERS` is set to a positive integer (*m*), the analysis is run twice. The first run uses the data supplied in `TABLE`. In the second run, the *m* cells with the highest absolute median tetrads are set to values estimated from the remainder of the data (i.e. the cells not suspected to be outliers). If these *m* values are indeed the only notable outliers, all the data will now be inliers, so the half-Normal plot of the median tetrads will be a close fit to a straight line passing through the origin. Note that, if `TESTOUTLIERS` is set, the output saved in the variates set by the `DATA`, `MEDIANTETRADS`, `RANKS` and `HALFNORMALSCORES` parameters will be from the second analysis, that of the modified table. If the option `GRAPHICS=highresolution` is set in combination with a non-zero value of `TESTOUTLIERS`, you may need to set the option “Multiple Windows” in the Windows version of Genstat Graphics in order to see the two graphs, before and after adjustment of the suspected outliers.
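The claim about tables with two or three rows can be checked by counting. The following Python sketch counts the fraction of proper tetrads for one cell that include a single outlier in the same column. The helper `contaminated_fraction` is hypothetical. It assumes a complete table, with no missing values, that contains exactly one outlier.

```python
def contaminated_fraction(nrows, ncols, outlier, cell):
    """Fraction of proper tetrads for `cell` that include the `outlier` cell.

    Cells are (row, column) pairs; assumes a complete table with one outlier.
    Hypothetical helper for illustration only.
    """
    a, b = cell
    total = contaminated = 0
    for p in range(nrows):
        for q in range(ncols):
            if p == a or q == b:
                continue  # proper tetrads only
            total += 1
            if outlier in ((a, q), (p, b), (p, q)):
                contaminated += 1
    return contaminated / total


# Outlier in row 0, column 2; inspect another cell in the same column.
for nrows in (2, 3, 7):
    print(nrows, contaminated_fraction(nrows, 5, outlier=(0, 2), cell=(1, 2)))
# 2 rows: 1.0, 3 rows: 0.5, 7 rows: 1/6
```

With seven rows only one tetrad in six is contaminated, so the median stays clean. With two rows every tetrad for the cell is contaminated, and with three rows half of them are.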

Options: `PRINT`, `GRAPHICS`, `SORT`.

Parameters: `TABLE`, `ROWS`, `COLUMNS`, `DATA`, `MEDIANTETRADS`, `RANKS`, `HALFNORMALSCORES`, `TESTOUTLIERS`.

### Method

All proper tetrads (those with *i* ≠ *p* and *j* ≠ *q*) are calculated for each cell, and their median is taken. The median tetrad for a cell with a missing value is set to a missing value. The absolute values of the median tetrads are then ranked and their half-Normal scores calculated, as described in the Procedure Library Manual for `APLOT`. If `TESTOUTLIERS` is set to an integer *m* > 0, the *m* cells with the highest absolute median tetrads are set to missing values. An analysis of variance (anova) is then carried out with treatment structure `ROWS` + `COLUMNS` (i.e. no interaction term is fitted), and the *m* cells with suspected outliers are given the appropriate fitted values saved from that anova.
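The `TESTOUTLIERS` adjustment can be sketched in Python as follows. This is an illustration under stated assumptions and is not Genstat's code. It does not run an anova. Instead, it fits the same additive rows + columns model by least squares over the non-missing cells, using alternating row and column means (backfitting), which should converge to the least-squares fit for a connected table. The helper names `additive_fit` and `replace_suspects` are hypothetical. Every row and every column is assumed to keep at least one non-missing value.

```python
def additive_fit(table, max_iter=10000, tol=1e-10):
    """Fit cell = row effect + column effect by least squares, ignoring None cells.

    Uses alternating row and column means (backfitting). Assumes every row and
    column has at least one non-missing value. Returns a full table of fitted values.
    """
    nrows, ncols = len(table), len(table[0])
    obs = [(i, j, table[i][j]) for i in range(nrows) for j in range(ncols)
           if table[i][j] is not None]
    row = [0.0] * nrows
    col = [0.0] * ncols
    for _ in range(max_iter):
        previous = row + col  # copy of the current effects
        for i in range(nrows):
            resid = [y - col[j] for (a, j, y) in obs if a == i]
            row[i] = sum(resid) / len(resid)
        for j in range(ncols):
            resid = [y - row[i] for (i, b, y) in obs if b == j]
            col[j] = sum(resid) / len(resid)
        if max(abs(u - v) for u, v in zip(previous, row + col)) < tol:
            break
    return [[row[i] + col[j] for j in range(ncols)] for i in range(nrows)]


def replace_suspects(table, medtet, m):
    """Replace the m cells with the largest |median tetrad| by additive fitted values."""
    ranked = sorted(
        ((abs(medtet[i][j]), i, j)
         for i in range(len(table)) for j in range(len(table[0]))
         if medtet[i][j] is not None),
        reverse=True,
    )
    suspects = [(i, j) for _, i, j in ranked[:m]]
    adjusted = [r[:] for r in table]
    for i, j in suspects:
        adjusted[i][j] = None  # treat suspected outliers as missing
    fitted = additive_fit(adjusted)
    for i, j in suspects:
        adjusted[i][j] = fitted[i][j]  # fill with the additive-model prediction
    return adjusted, suspects


# Additive 4 x 4 table with one outlier; its median tetrads are 100 there, 0 elsewhere.
table = [[r + c for c in (0, 10, 20, 30)] for r in (0, 1, 2, 3)]
table[1][2] += 100
medtet = [[0] * 4 for _ in range(4)]
medtet[1][2] = 100
adjusted, suspects = replace_suspects(table, medtet, m=1)
print(suspects, round(adjusted[1][2], 6))  # the suspect cell is restored to about 21
```

Cells that were already missing in the input stay missing, because only the suspect cells are filled in. This matches the behaviour described above, where only the *m* suspected outliers receive fitted values.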

### References

Bradu, D. & Hawkins, D.M. (1982). Location of multiple outliers in two-way tables, using tetrads. *Technometrics*, 24, 103-108.

### See also

Directive: `TABULATE`.

Procedures: `DRESIDUALS`, `RCHECK`.

### Example

```
CAPTION 'MEDIANTETRAD example',\
  !t('Data from Bradu & Hawkins 1982, Table 1. Prevalence rates of',\
  'men of various occupations with hearing levels 16 dB or more',\
  'above the audiometric zero at various frequencies. (There are',\
  '3 suspected outliers.)'); STYLE=meta,plain
FACTOR [NVALUES=49; LEVELS=7; LABELS=!t(Professionl,Farm,Clerical,\
  Craftsman,Operative,Service,Labourer)] Occupation
&      [LABEL=!t('500 Hz','1000 Hz','2000 Hz','3000 Hz',\
  '4000 Hz','6000 Hz','Nrml speech')] Frequency
GENERATE Frequency,Occupation
TABLE [CLASSIFICATION=Frequency,Occupation] HearTable; VALUES=!(\
   2.1, 6.8, 8.4, 1.4,14.6, 7.9, 4.8, 1.7, 8.1, 8.4, 1.4,12.0, 3.7,\
   4.5,14.4,14.8,27.0,30.9,36.5,36.4,31.4,57.4,62.4,37.4,63.3,65.5,\
  65.6,59.8,66.2,81.7,53.3,80.7,79.7,80.8,82.4,75.2,94.0,74.5,87.9,\
  93.3,87.8,80.5, 4.1,10.2,10.7, 5.5,18.1,11.4, 6.1)
MEDIANTETRAD [PRINT=graph,table; SORT=descending] HearTable; ROWS=Freq;\
  COLUMNS=Occup; DATA=Hearing; TEST=3
```