### 1. Highlights

● produced in 2006

● 2 new directives, 64 new procedures and 2 new functions

● exact and permutation tests for regression and generalized linear models, analysis of variance and t-tests (see `APERMTEST`, `RPERMTEST`, `AONEWAY` and `TTEST`)

● two-straight-line, broken-stick or split-line models (see `R2LINES`)

● linear functional relationship models (see `RLFUNCTIONAL`)

● complex surveys (see `SVCALIBRATE`, `SVREWEIGHT`, `SVTABULATE` and `SVWEIGHT`)

● tabulation of standard deviations (see `TABULATE`)

● tables of modes (see `TABMODE`)

● analysis of multitiered (multi-phased) experiments (see `AMTIER` and `AMTDISPLAY`)

● exploration, analysis and visualization of microarray data from either two-colour or Affymetrix slides (see `AFFYMETRIX`, `MA2CLUSTER`, `MABGCORRECT`, `MACALCULATE`, `MAEBAYES`, `MAESTIMATE`, `MAHISTOGRAM`, `MAPCLUSTER`, `MAPLOT`, `MARMA`, `MAROBUSTMEANS`, `MASCLUSTER`, `MASHADE`, `MAVDIFFERENCE`, `MAVOLCANO` and `MNORMALIZE`)

● designs for two-colour microarray experiments: loop and reference-level designs and balanced-incomplete-block designs for any number of treatments in blocks of size 2 (see `AGLOOP`, `AGREFERENCE`, `AGBIB` and `MADESIGN`)

● screening tests for unbalanced designs with several error terms (see `ASCREEN`)

● Kendall’s rank correlation coefficient τ (see `KTAU` and `PRKTAU`)

● Cochran’s *Q* test for differences between related samples (see `QCOCHRAN`)

● diversity statistics (see `ECDIVERSITY`)

● rank/abundance, *ABC* and *k*-dominance plots (see `ECABUNDANCEPLOT`)

● 10 additional similarity measures, including Dice, Canberra, Bray-Curtis and Minkowski (see `FSIMILARITY`)

● analysis of similarities, i.e. *ANOSIM* (see `ECANOSIM`)

● modelling of species abundance data (see `ECFIT`, `ECNICHE` and `ECRAREFACTION`)

● Lorenz curves, Gini and asymmetry coefficients to assess the evenness of distributions (see `LORENZ`)

● analysis of the clustering of events in space and time (see `DKSTPLOT`, `KSTHAT`, `KSTMCTEST`, `KSTSE` and `PTK3D`)

● new, more efficient implementation of hierarchical generalized linear models, extending the facilities to allow random effects in the dispersion models (i.e. double hierarchical generalized linear models), predictions and correlation structures for random terms (see `HGANALYSE`, `HGDRANDOMMODEL`, `HGPREDICT` and `HGRANDOMMODEL`)

● Bayesian computing using the Differential Evolution Markov Chain algorithm (see `DEMC`)

● median polishing of two-way data (see `MPOLISH`)

● quantile normalization (see `QNORMALIZE`)

● Tukey biweight algorithm (see `TUKEYBIWEIGHT`)

● basis functions for natural cubic splines and thin-plate splines (see `NCSPLINE` and `THINPLATE`)

● formation of all partitions of a set of objects (see `SETALLOCATIONS`)

● ability to include “typesetting” commands in textual strings to define Greek letters and mathematical symbols to appear in the output (see `PRINT`)

● ability to print variates, matrices and tables with different numbers of decimals in different cells (see `PRINT`)

● ability to use alternative labels for data structures instead of their identifiers in output (see `DUMMY`, `EXPRESSION`, `FACTOR`, `FORMULA`, `MATRIX`, `POINTER`, `SCALAR`, `SYMMETRICMATRIX`, `TABLE`, `TEXT` and `VARIATE`)

### 2. What’s new

**2.1 Directives**

`FAULT`

checks whether to issue a diagnostic, i.e. a fault, warning or message.

`SETALLOCATIONS`

runs through all ways of allocating a set of objects to subsets.

**2.2 Procedures**

`AFFYMETRIX`

estimates expression values for Affymetrix slides.

`AGLOOP`

generates loop designs e.g. for time-course microarray experiments.

`AGNATURALBLOCK`

forms 1- and 2-dimensional designs with blocks of natural size.

`AGREFERENCE`

generates reference-level designs e.g. for microarray experiments.

`AMTDISPLAY`

displays further output for multitiered designs analysed by `AMTIER`.

`AMTIER`

analyses a multitiered design by analysis of variance specified by up to 3 model formulae.

`APERMTEST`

does random permutation tests for analysis-of-variance tables.

`ASCREEN`

performs screening tests for designs with orthogonal block structure.

`A2DISPLAY`

provides further output following an analysis of variance by `A2WAY`.

`A2KEEP`

copies information from an `A2WAY` analysis into GenStat data structures.

`A2WAY`

performs analysis of variance of a balanced or unbalanced design with up to two treatment factors.

`DEMC`

performs Bayesian computing using the Differential Evolution Markov Chain algorithm.

`DKSTPLOT`

produces diagnostic plots for space-time clustering.

`DMADENSITY`

plots the empirical CDF or PDF (kernel smoothed) by groups.

`ECABUNDANCEPLOT`

produces rank/abundance, *ABC* and *k*-dominance plots.

`ECANOSIM`

performs an analysis of similarities (ANOSIM).

`ECDIVERSITY`

calculates measures of diversity with jackknife or bootstrap estimates.

`ECFIT`

fits models to species abundance data.

`ECNICHE`

generates relative abundance of species for niche-based models.

`ECRAREFACTION`

calculates individual or sample-based rarefaction.

`F2DRESIDUALVARIOGRAM`

calculates and plots a 2-dimensional variogram from a 2-dimensional array of residuals.

`HGDRANDOMMODEL`

defines the random model in a hierarchical generalized linear model for the dispersion model of a double hierarchical generalized linear model.

`HGPREDICT`

forms predictions from hierarchical or double hierarchical generalized linear model analysis.

`KCROSSVALIDATION`

computes cross validation statistics for punctual kriging.

`KSTHAT`

calculates an estimate of the K function in space, time and space-time.

`KSTMCTEST`

performs a Monte-Carlo test for space-time interaction.

`KSTSE`

calculates the standard error for the space-time K function.

`KTAU`

calculates Kendall’s rank correlation coefficient τ.

`LORENZ`

plots the Lorenz curve and calculates the Gini and asymmetry coefficients.

`MAANOVA`

does analysis of variance for a single-channel microarray design.

`MABGCORRECT`

performs background correction of Affymetrix slides.

`MACALCULATE`

corrects and transforms two-colour microarray differential expressions.

`MADESIGN`

assesses the efficiency of a two-colour microarray design.

`MAEBAYES`

modifies t-values by an empirical Bayes method.

`MAESTIMATE`

estimates treatment effects from a two-colour microarray design.

`MAHISTOGRAM`

plots histograms of microarray data.

`MAPCLUSTER`

clusters probes or genes with microarray data.

`MAPLOT`

produces two-dimensional plots of microarray data.

`MARMA`

calculates Affymetrix expression values.

`MAROBUSTMEANS`

does a robust means analysis for Affymetrix slides.

`MASCLUSTER`

clusters microarray slides.

`MASHADE`

produces shade plots to display spatial variation of microarray data.

`MAVDIFFERENCE`

applies the average difference algorithm to Affymetrix data.

`MAVOLCANO`

produces volcano plots of microarray data.

`MA2CLUSTER`

performs a two-way clustering of microarray data by probes (or genes) and slides.

`MNORMALIZE`

normalizes two-colour microarray data.

`MPOLISH`

performs a median polish of two-way data.

`NCSPLINE`

calculates natural cubic spline basis functions (for use e.g. in `REML`).

`PRKTAU`

calculates probabilities for Kendall’s rank correlation coefficient τ.

`PTK3D`

performs kernel smoothing of space-time data.

`QCOCHRAN`

performs Cochran’s *Q* test for differences between related samples.

`QFACTOR`

allows the user to decide whether to convert texts or variates to factors.

`QNORMALIZE`

performs quantile normalization.

`RLFUNCTIONAL`

fits a linear functional relationship model.

`RPERMTEST`

does random permutation tests for regression or generalized-linear-model analyses.

`RXGENSTAT`

submits a set of commands externally to R and reads the output.

`R2LINES`

fits two-straight-line (broken-stick) models to data.

`SVCALIBRATE`

performs generalized calibration of survey data.

`SVREWEIGHT`

modifies survey weights, adjusting other weights to ensure that their overall sum remains unchanged.

`SVTABULATE`

tabulates data from random surveys, including multistage surveys and surveys with unequal probabilities of selection.

`SVWEIGHT`

forms survey weights.

`TABMODE`

forms summary tables of modes of values.

`THINPLATE`

calculates the basis functions for thin-plate splines.

`TUKEYBIWEIGHT`

estimates means using the Tukey biweight algorithm.
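Several of the procedures above are named after well-known algorithms. As one illustration, the idea behind a median polish (see `MPOLISH`) can be sketched in Python. This is a generic textbook version written for these notes, not GenStat code and not a description of how `MPOLISH` is implemented; the function name `median_polish`, its arguments and the example table are all assumptions.

```python
from statistics import median

def median_polish(table, max_iter=10, tol=1e-9):
    """Split a two-way table into overall + row + column effects + residuals
    by alternately sweeping out row medians and column medians."""
    resid = [[float(v) for v in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Row sweep: move each row's median into its row effect.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        # Re-centre the column effects, moving their median into the overall term.
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]

        # Column sweep: move each column's median into its column effect.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        # Re-centre the row effects in the same way.
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]

        # Stop once a full sweep no longer changes anything.
        if max(map(abs, row_med + col_med)) < tol:
            break
    return overall, row_eff, col_eff, resid

# Hypothetical, exactly additive table: 10 + row effect + column effect.
table = [[7, 9, 11],
         [8, 10, 12],
         [9, 11, 13]]
overall, row_eff, col_eff, resid = median_polish(table)
```

At every stage the four components add back up to the original table, so the residuals show what the additive row-plus-column description fails to explain.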

The following spatial-analysis procedures, previously available only in a supplementary library, have also now been incorporated into the main library.

`FHAT`

calculates an estimate of the F nearest-neighbour distribution function.

`FZERO`

gives the F function expectation under complete spatial randomness.

`GHAT`

calculates an estimate of the G nearest-neighbour distribution function.

`GRCSR`

generates completely spatially random points in a polygon.

`KCSRENVELOPES`

simulates K function bounds under complete spatial randomness.

`KHAT`

calculates an estimate of the K function.

`KLABENVELOPES`

gives bounds for K function differences under random labelling.

`KSED`

calculates the standard error for K function differences under random labelling.

`KTORENVELOPES`

gives bounds for the bivariate K function under independence.

`K12HAT`

calculates an estimate of the bivariate K function.

`MSEKERNEL2D`

estimates the mean square error for a kernel smoothing.

`PTAREAPOLYGON`

calculates the area of a polygon.

`PTGRID`

generates a grid of points in a polygon.

`PTINTENSITY`

calculates the overall density for a spatial point pattern.

`PTKERNEL2D`

performs kernel smoothing of a spatial point pattern.

`PTSINPOLYGON`

returns points inside or outside a polygon.
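The polygon-based procedures above rest on a few standard geometric building blocks. The Python sketch below shows textbook versions of three of them: polygon area by the shoelace formula, a ray-casting point-in-polygon test, and completely spatially random points by rejection sampling. It is an illustration only; the function names and the example triangle are assumptions, and nothing here describes how `PTAREAPOLYGON`, `PTSINPOLYGON`, `GRCSR` or `PTINTENSITY` are actually implemented.

```python
import random

def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def point_in_polygon(x, y, vertices):
    """Ray casting: count the edges crossed by a ray running right from (x, y)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def csr_points(n_points, vertices, rng):
    """Rejection sampling: draw uniformly in the bounding box, keep points inside."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    points = []
    while len(points) < n_points:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, vertices):
            points.append((x, y))
    return points

# Hypothetical study region: a right-angled triangle with legs 4 and 3.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
area = polygon_area(triangle)
pts = csr_points(50, triangle, random.Random(1))
intensity = len(pts) / area  # points per unit area
```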

**2.3 Functions**

| `GRSAMPLE` | random sampling with replacement |
|---|---|
| `GRSELECT` | random sampling without replacement |
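The difference between the two functions is the usual one between sampling with and without replacement. The short Python sketch below illustrates that distinction with the standard `random` module; it is not GenStat syntax and does not describe how `GRSAMPLE` or `GRSELECT` are implemented, and the population of ten units is an assumption made for the example.

```python
import random

rng = random.Random(2006)  # fixed seed so the sketch is repeatable
units = list(range(1, 11))  # hypothetical population of 10 units

# With replacement: a unit can be drawn more than once,
# so the sample may be larger than the population.
with_replacement = rng.choices(units, k=15)

# Without replacement: each unit appears at most once,
# so the sample size cannot exceed the population size.
without_replacement = rng.sample(units, k=5)
```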

### 3. What’s changed

Most of the changes are completely compatible with Release 8, the previous release. There are a few commands, however, where new options or parameters have been inserted into the existing lists. These may cause problems in statements where option or parameter names have been omitted or abbreviated (see Section 1.7.1 of Part 1 of the *Guide to the GenStat Command Language* for details). To avoid any difficulty, the name of the option/parameter after the new option/parameter should be given explicitly, and not abbreviated to fewer than four characters.

Any command where changes in Release 9 may cause incompatibilities in existing programs is marked in Sections 3.1 and 3.2 by the symbol ^{†}. The full details are given in Section 3.4.
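To see why an inserted option can change the meaning of an abbreviated name, consider the toy Python model below. It assumes, purely for illustration, that an abbreviation is matched against the option names in list order and resolves to the first name that begins with it; the function `resolve` is hypothetical and the option lists are cut down to the `APLOT` names mentioned in the list of incompatibilities (the procedure's other options are omitted).

```python
def resolve(abbrev, option_names):
    """Return the first option name, in list order, that starts with abbrev.

    Assumed matching rule for this sketch only.
    """
    for name in option_names:
        if name.startswith(abbrev.upper()):
            return name
    raise KeyError(abbrev)

# Only the relative order of these names is taken from these notes.
release8 = ["GRAPHICS", "SAVE"]
release9 = ["STRATUM", "GRAPHICS", "TITLE", "SAVE"]

old = resolve("S", release8)      # picks SAVE
new = resolve("S", release9)      # now picks STRATUM instead
safe = resolve("SAVE", release9)  # a four-character name still picks SAVE
```

Under this assumed rule, a one-letter abbreviation that used to pick out `SAVE` would silently select the newly inserted `STRATUM`, whereas the name written out to four characters is unaffected.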

**3.1 Directives**

`CAPTION`

can now print “notes”.

`CLUSTER`

can save the criterion values, the group means and the group predictors (from maximal predictive classification).

`FIT`

and the other regression model-fitting directives (`ADD`, `DROP` and `SWITCH`) have an additional setting, `ignore`, of the `CONSTANT` option to omit the constant but ignore it when assessing marginality constraints.

`GET`

can now save the current settings for the printing of captions and typesetting in output or graphics.

`PROCEDURE`

has an additional setting of the `RESTORE` option to restore the setting for the printing of captions.

`SET`

can now control which captions are printed and whether typesetting takes place in output or graphics.

`TERMS`

has a new option `RIDGE` to supply a constant to add to the diagonal of the sums-of-squares-and-products matrix, to allow ridge methods to be used in regression and generalized linear models.

`DUMMY`, `EXPRESSION`, `FACTOR`, `FORMULA`, `MATRIX`, `POINTER`, `SCALAR`, `SYMMETRICMATRIX`, `TABLE`, `TEXT` and `VARIATE`

all have an extra option `IPRINT` to specify how these data structures will be identified in output. If `IPRINT` is not set, they will be identified in whatever way is usual for the section of output concerned. For example, the `PRINT` directive generally uses their identifiers (although this can be changed using the `IPRINT` option of `PRINT` itself), while the `ANOVA` directive will print the identifier and the extra text for each y-variate. If you set `IPRINT=extra`, however, the “extra text” (defined by the `EXTRA` parameter of the directive) for the data structure will be used instead of its identifier – thus allowing complete freedom in the way that it is labelled.

**3.2 Procedures**

`AGBIB`

can now construct balanced-incomplete-block designs for any number of treatments in blocks of size 2.

^{†}`APLOT`

can plot residuals from any stratum (error term), not just from the final residual term, and the user can supply an overall title (to use instead of the identifier of the y-variate).

^{†}`BOOTSTRAP`

can use a block formula to perform the randomizations used in a permutation test.

`DECIMALS`

has been extended to allow different numbers of decimals to be determined for each unit of a data structure.

`FIELLER`

now uses a t distribution when the dispersion parameter is not fixed.

^{†}`HGANALYSE`

can now analyse double hierarchical generalized linear models.

`GENPROCRUSTES`

can now scale each configuration prior to the analysis so that the trace of each matrix is one. It can also produce plots of the variable projections, and of the consensus with and without the individuals, and you can now control how many roots you wish to display or save.

`MANNWHITNEY`

can print the median difference between the samples with confidence limits.

`RMGLM`

can ignore the constant when assessing marginality constraints (see `FIT`), can include units with missing values in the explanatory factors and variates, and can save a regression save structure.

`TTEST`

can provide probabilities using permutation tests or (when feasible) exact tests.

^{†}`VPLOT`

allows the user to supply an overall title (to use instead of the identifier of the y-variate).

In addition, the nonparametric procedures now ignore missing values (instead of failing, they now print a warning message). The Library procedures have also been modified where possible to use the new ability to use different numbers of decimal places for different units of a variate, matrix or table, and to suppress irrelevant captions.

**3.3 Functions**

| `CHARACTERS` | now has a second argument to specify whether to return the raw length of the string (without checking for any typesetting commands) or the formatted length (taking account of typesetting commands); see `PRINT` for more information. |
|---|---|

**3.4 Incompatibilities**

| `APLOT` procedure | option `STRATUM` inserted before `GRAPHICS`, and option `TITLE` inserted before `SAVE`. |
|---|---|
| `GENPROCRUSTES` procedure | options `NROOTS`, `PLOT` and `NDROOTS` inserted before `TOLERANCE`. |
| `HGANALYSE` procedure | many new options; also `DECIMALS` option deleted, `LMETHOD` option replaced by `DMETHOD` option (`LMETHOD` now used to choose between exact likelihood or extended quasi likelihood), `LAPLACEORDER` option renamed `DLAPLACEORDER`, `NCYCLE` option replaced by `MAXCYCLE`. |
| `HGDISPLAY` procedure | `DECIMALS` option deleted, `LMETHOD` option replaced by `DMETHOD` option. |
| `HGKEEP` procedure | `DHGRANDOMTERM` parameter inserted before `RESIDUALS`. |
| `HGPLOT` procedure | `RMETHOD` option inserted before `INDEX`. |
| `VPLOT` procedure | option `TITLE` inserted before `SAVE`. |