### 1. Highlights

● produced in 2002

● 7 new directives, 50 new procedures and 40 new functions

● the limitation of no more than 31 factors or variates in analysis of variance and in model formulae in regression has been removed

● Boolean calculations on sets (`SETCALCULATE`, `SETRELATE`)

● operations on a new tree data structure (`BCONSTRUCT`, `BGRAPH`, `BPRINT`, `BPRUNE` and `BIDENTIFY`)

● classification trees (`BCLASSIFICATION`, `BCDISPLAY`, `BCIDENTIFY` and `BCVALUES`), identification keys (`BKEY`, `BKIDENTIFY` and `BPRINT`) and regression trees (`BREGRESSION`, `BRDISPLAY`, `BRPREDICT` and `BRVALUES`)

● hierarchical generalized linear models (`HGANALYSE`, `HGDISPLAY`, `HGFIXEDMODEL`, `HGKEEP`, `HGPLOT` and `HGRANDOMMODEL`)

● estimation of the aggregation parameter for the negative binomial distribution in a generalized linear model (`RNEGBINOMIAL`)

● Latin squares balanced for carry-over effects (`AFCARRYOVER`, `AGCROSSOVERLATIN`)

● stacking and unstacking of variates and factors (`STACK`, `UNSTACK`)

● plots of probability distributions (`DPROBABILITY`)

### 2. What’s new

**2.1 Directives**

`BCUT`

cuts a tree at a defined node, discarding nodes and information below it.

`BJOIN`

extends a tree by joining another tree to a terminal node.

`BGROW`

adds new branches to a node of a tree.

`SETCALCULATE`

performs Boolean set calculations on the contents of vectors.

`SETRELATE`

compares two sets of values in two data structures.

`SET2FORMULA`

forms a model formula using structures supplied in a pointer.

`TREE`

declares a tree, and initializes it to have a single node known as the root.
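The kind of Boolean set calculation and set comparison described for `SETCALCULATE` and `SETRELATE` can be illustrated outside GenStat. The following is a minimal Python sketch, not GenStat syntax; the variable names and the particular operations shown are assumptions chosen for illustration, and the directives' actual options are not reproduced here.

```python
# Illustrative Python only; this is not GenStat code.
a = [1, 2, 3, 4, 4]   # contents of a first vector (duplicates allowed)
b = [3, 4, 5]         # contents of a second vector

set_a, set_b = set(a), set(b)

# Boolean set calculations on the contents of the two vectors.
union = sorted(set_a | set_b)
intersection = sorted(set_a & set_b)
difference = sorted(set_a - set_b)
symmetric_difference = sorted(set_a ^ set_b)

# Comparisons between the two sets of values.
a_subset_of_b = set_a <= set_b
sets_are_disjoint = set_a.isdisjoint(set_b)
sets_are_equal = set_a == set_b
```

Duplicated values in a vector collapse to a single member once the vector is treated as a set.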

**2.2 Procedures**

`AFCARRYOVER`

forms factors to represent carry-over effects in cross-over trials.

`AFIELDRESIDUALS`

displays residuals in field layout.

`AGCROSSOVERLATIN`

generates Latin squares balanced for carry-over effects.

`ALLPAIRWISE`

performs a range of all pairwise multiple comparison tests.

`AMMI`

allows exploratory analysis of genotype × environment interactions.

`AUKEEP`

saves output from analysis of an unbalanced design (by `AUNBALANCED`).

`BCDISPLAY`

displays a classification tree.

`BCIDENTIFY`

identifies specimens using a classification tree.

`BCLASSIFICATION`

constructs a classification tree.

`BCONSTRUCT`

constructs a tree.

`BCVALUES`

forms values for nodes of a classification tree.

`BGRAPH`

plots a tree.

`BKDISPLAY`

displays an identification key.

`BKEY`

constructs an identification key.

`BKIDENTIFY`

identifies specimens using a key.

`BPRINT`

displays a tree.

`BPRUNE`

prunes a tree using minimal cost complexity.

`BRDISPLAY`

displays a regression tree.

`BREGRESSION`

constructs a regression tree.

`BRPREDICT`

makes predictions using a regression tree.

`BRVALUES`

forms values for nodes of a regression tree.

`DCOMPOSITIONAL`

plots 3-part compositional data within a barycentric triangle.

`DMASS`

plots discrete data such as mass spectra or discrete probability functions.

`DPROBABILITY`

creates a probability distribution plot of the values in a variate.

`FACDIVIDE`

represents a factor by factorial combinations of a set of factors.

`FBASICCONTRASTS`

breaks a model term down into its basic contrasts.

`FFRAME`

forms multiple windows in a plot-matrix for high-resolution graphics.

`FHADAMARDMATRIX`

forms Hadamard matrices.

`FITINDIVIDUALLY`

fits regression models one term at a time.

`FMFACTORS`

forms a pointer of factors representing a multiple-response.

`FPROJECTIONMATRIX`

forms a projection matrix for a set of model terms.

`GSTATISTIC`

calculates the gamma statistic of agreement for ordinal data.

`HGANALYSE`

analyses data using hierarchical generalized linear models.

`HGDISPLAY`

displays a hierarchical generalized linear model analysis.

`HGFIXEDMODEL`

defines the fixed model for a hierarchical generalized linear model.

`HGKEEP`

saves information from a hierarchical generalized linear model analysis.

`HGPLOT`

produces model-checking plots for a hierarchical generalized linear model analysis.

`HGRANDOMMODEL`

defines the random model for a hierarchical generalized linear model.

`JOIN`

joins or merges two sets of vectors together, based on classifying keys.

`KERNELDENSITY`

uses kernel density estimation to estimate a sample density.

`MTABULATE`

forms tables classified by multiple-response factors.

`MVFILL`

replaces missing values in a vector with the previous non-missing value.

`PRMANNWHITNEYU`

calculates probabilities for the Mann-Whitney U statistic.

`QLIST`

gets the user to select a response interactively from a list.

`REPPERIODOGRAM`

gives periodogram-based analyses for replicated time series.

`RNEGBINOMIAL`

fits a negative binomial GLM estimating the aggregation parameter.

`RSEARCH`

helps search through models for a regression or generalized linear model.

`STACK`

combines several data sets by “stacking” the corresponding vectors.

`UNSTACK`

splits vectors into individual vectors according to levels of a factor.

`XOEFFICIENCY`

calculates efficiency of estimating effects in cross-over designs.
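The idea behind `STACK` and `UNSTACK` in the list above can be sketched in a few lines. This is a Python illustration under stated assumptions, not GenStat code: the function names `stack` and `unstack`, the dictionary input, and the "source" labels are all hypothetical.

```python
# Illustrative Python only; names and data layout are assumptions.

def stack(columns):
    """Place several corresponding vectors end to end.

    Returns the stacked values together with a parallel list recording
    which original vector each value came from.
    """
    values, source = [], []
    for name, column in columns.items():
        values.extend(column)
        source.extend([name] * len(column))
    return values, source


def unstack(values, factor):
    """Split one vector into separate vectors, one per level of `factor`."""
    groups = {}
    for value, level in zip(values, factor):
        groups.setdefault(level, []).append(value)
    return groups


week1 = [5.1, 4.8]
week2 = [5.6, 5.0]
values, source = stack({"week1": week1, "week2": week2})
restored = unstack(values, source)
```

Unstacking by the recorded source recovers the original vectors, which is why the two operations are natural inverses.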

**2.3 Functions**

Summary functions

Function | Description |
---|---|
`KURTOSIS(x)` | Kurtosis of the non-missing values in `x`. |
`SD(x)` | Standard deviation of the non-missing values in `x`. |
`SEMEAN(x)` | Standard error of the mean of the non-missing values in `x`. |
`SKEWNESS(x)` | Skewness of the non-missing values in `x`. |
`PAREA(y;x)` | Area of a polygon with vertices specified by `y` and `x`. |
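To make "of the non-missing values" and the polygon-area calculation concrete, here is a small Python sketch (not GenStat code). It assumes missing values are represented by `None`, that the standard deviation uses the usual n − 1 divisor, and that the polygon area is computed with the shoelace formula; none of these details is stated in the table above, and the function names are hypothetical.

```python
import math


def non_missing(x):
    """Keep only the values that are not missing (None stands for missing)."""
    return [v for v in x if v is not None]


def sd(x):
    """Sample standard deviation (n - 1 divisor, an assumption) of non-missing values."""
    vals = non_missing(x)
    n = len(vals)
    mean = sum(vals) / n
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / (n - 1))


def semean(x):
    """Standard error of the mean of the non-missing values."""
    return sd(x) / math.sqrt(len(non_missing(x)))


def polygon_area(y, x):
    """Area of a polygon from vertex coordinates, via the shoelace formula."""
    n = len(x)
    twice_area = sum(
        x[i] * y[(i + 1) % n] - x[(i + 1) % n] * y[i] for i in range(n)
    )
    return abs(twice_area) / 2


data = [2.0, None, 4.0, 6.0]
square_x = [0.0, 1.0, 1.0, 0.0]
square_y = [0.0, 0.0, 1.0, 1.0]
```

The argument order of `polygon_area` follows the `PAREA(y;x)` signature, with the y-coordinates first.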

Transformations

Function | Description |
---|---|
`BETA(a;b;x)` | Beta function (incomplete if `x` set, otherwise complete). |
`COSH(x)` | Hyperbolic cosine of `x`. |
`FRACTION(x)` | Fractional part of `x`, i.e. `x-SIGN(x)*INTEGER(x)`. |
`RANK(x)` | Ranks of the values in `x`. |
`SIGN(x)` | Sign of `x` (-1, 0 or 1 for `x`<0, `x`==0 or `x`>0 respectively). |
`SINH(x)` | Hyperbolic sine of `x`. |
`TANH(x)` | Hyperbolic tangent of `x`. |

Matrix functions

Function | Description |
---|---|
`COLCENTRE(x)` | Centres the columns of matrix `x` by subtracting their means. |
`COLMEANS(x)` | Mean of the non-missing elements of each column of matrix `x`. |
`COLNOBSERVATIONS(x)` | Number of non-missing elements in each column of matrix `x`. |
`COLSUMS(x)` | Sum of the non-missing elements of each column of matrix `x`. |
`EVALUES(x)` | Eigenvalues of `x` (as a diagonal matrix). |
`EVECTORS(x)` | Eigenvectors of `x` (as a rectangular matrix). |
`GINVERSE(x)` | Moore-Penrose generalized inverse of `x`. |
`LSVECTORS(x)` | Matrix of vectors from the left-hand side of a singular-value decomposition of `x`. |
`MAT0` | Synonym of `MZERO`. |
`ROWCENTRE(x)` | Centres the rows of matrix `x` by subtracting their means. |
`ROWMEANS(x)` | Mean of the non-missing elements of each row of matrix `x`. |
`ROWNOBSERVATIONS(x)` | Number of non-missing elements in each row of matrix `x`. |
`ROWSUMS(x)` | Sum of the non-missing elements of each row of matrix `x`. |
`RSVECTORS(x)` | Matrix of vectors from the right-hand side of a singular-value decomposition of `x`. |
`SVALUES(x)` | Singular values of `x` (as a diagonal matrix). |
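The column and row summaries in this table differ only in the direction of the calculation. The Python sketch below (not GenStat code) shows that distinction on a small matrix stored as a list of rows, with `None` standing for a missing element. The function names are hypothetical, and leaving a missing element missing after centring is an assumption rather than documented behaviour.

```python
# Illustrative Python only; a matrix is a list of rows, None marks a missing value.

def col_means(matrix):
    """Mean of the non-missing elements of each column."""
    means = []
    for column in zip(*matrix):
        present = [v for v in column if v is not None]
        means.append(sum(present) / len(present))
    return means


def row_sums(matrix):
    """Sum of the non-missing elements of each row."""
    return [sum(v for v in row if v is not None) for row in matrix]


def col_centre(matrix):
    """Subtract each column's mean from its elements (missing stays missing)."""
    means = col_means(matrix)
    return [
        [None if v is None else v - m for v, m in zip(row, means)]
        for row in matrix
    ]


m = [
    [1.0, 10.0],
    [3.0, None],
    [5.0, 30.0],
]
```

Here the second column's mean is taken over its two non-missing elements only.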

Probability functions

Function | Description |
---|---|
`CLINVNORMAL(x;m;v)` | Cumulative lower probability for an inverse Normal (or inverse Gaussian) distribution with mean `m` and variance `v`. |
`CUINVNORMAL(x;m;v)` | Cumulative upper probability for an inverse Normal (or inverse Gaussian) distribution with mean `m` and variance `v`. |
`EDINVNORMAL(p;m;v)` | Equivalent deviate corresponding to cumulative lower probability `p` for an inverse Normal (or inverse Gaussian) distribution with mean `m` and variance `v`. |
`PRINVNORMAL(x;m;v)` | Probability density function for an inverse Normal (or inverse Gaussian) distribution with mean `m` and variance `v`. |
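For readers who want to check values independently, the four quantities above can be computed from the standard inverse Gaussian formulas. The Python sketch below (not GenStat code) converts mean `m` and variance `v` to the usual shape parameter λ = m³/v, uses the closed-form density and distribution function, and finds the equivalent deviate by bisection. The Python function names merely echo the GenStat ones, and the bisection approach is an assumption for illustration, not a description of how GenStat calculates the result.

```python
import math


def _phi(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pr_inv_normal(x, m, v):
    """Inverse Gaussian density at x > 0 for mean m and variance v."""
    lam = m ** 3 / v  # shape parameter implied by the mean and variance
    return math.sqrt(lam / (2 * math.pi * x ** 3)) * math.exp(
        -lam * (x - m) ** 2 / (2 * m ** 2 * x)
    )


def cl_inv_normal(x, m, v):
    """Cumulative lower probability P(X <= x)."""
    lam = m ** 3 / v
    root = math.sqrt(lam / x)
    return _phi(root * (x / m - 1)) + math.exp(2 * lam / m) * _phi(
        -root * (x / m + 1)
    )


def cu_inv_normal(x, m, v):
    """Cumulative upper probability P(X > x)."""
    return 1.0 - cl_inv_normal(x, m, v)


def ed_inv_normal(p, m, v):
    """Deviate x with P(X <= x) = p, for 0 < p < 1, found by bisection."""
    lo, hi = 1e-12, m
    while cl_inv_normal(hi, m, v) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cl_inv_normal(mid, m, v) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The term `exp(2 * lam / m)` can overflow when the variance is very small relative to the mean, so a production implementation would need a more careful formulation.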

Vector functions

Function | Description |
---|---|
`VKURTOSIS(p)` | Kurtosis of the non-missing values in each unit of the variates (or scalars) in pointer `p`. |
`VPOSITIONS(x;p)` | Gives the suffix of the first vector in the pointer `p` containing the value in each unit of the variate or text `x`. |
`VSD(p)` | Standard deviation of the non-missing values in each unit of the variates (or scalars) in pointer `p`. |
`VSEMEANS(p)` | Standard error of the mean of non-missing values in each unit of the variates (or scalars) in pointer `p`. |
`VSKEWNESS(p)` | Skewness of the non-missing values in each unit of the variates (or scalars) in pointer `p`. |

Table functions

Function | Description |
---|---|
`TKURTOSIS(t)` | Forms margins containing the kurtosis of the cells in table `t`. |
`TSD(t)` | Forms margins of between-cell standard deviations for table `t`. |
`TSEMEANS(t)` | Forms margins of standard errors for between-cell means of table `t`. |
`TSKEWNESS(t)` | Forms margins containing the skewness of the cells in table `t`. |

Tree functions

Function | Description |
---|---|
`BBELOW(t;n;m)` | provides a variate containing numbers of all the nodes below node `n` of tree `t`; if `m`=1 this gives only the terminal nodes below `n`, otherwise it includes internal nodes as well. |
`BBRANCHES(t;n)` | provides a variate containing the numbers of the branches taken on the path to node `n` in tree `t` (the result is of the same length as the results of the `BPATH` function, and includes a missing value as the final element, corresponding to `n` itself). |
`BDEPTH(t;x)` | calculates the depths of nodes `x` in tree `t`. |
`BMAXNODE(t)` | provides the maximum node number in tree `t`. |
`BNBRANCHES(t;x)` | provides the number of branches below nodes `x` in tree `t` (0 for any that are terminal nodes). |
`BNEXT(t;x;y)` | finds the numbers of the nodes on branches `y` from nodes `x` in tree `t` (returning a missing value for any terminal node). |
`BNNODES(t)` | provides the number of nodes in tree `t`. |
`BPATH(t;n)` | provides a variate containing the numbers of the nodes on the branch to node `n` in tree `t` (includes `n` itself as the final element). |
`BPREVIOUS(t;x)` | finds the numbers of the nodes immediately above nodes `x` in tree `t` (or a missing value if a node is the root of the tree). |
`BSCAN(t;x)` | finds the numbers of the nodes immediately after nodes `x` in tree `t` in a standard branch-by-branch order that visits each node once (or a missing value for the node that is the last one in the tree). |
`BTERMINAL(t;x)` | finds the next terminal nodes after nodes `x` in tree `t` (or a missing value for the node that is the last terminal node). |
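Several of these tree functions are simple traversals, which the following Python sketch illustrates on a five-node tree (this is not GenStat code). The tree representation (a dictionary from node number to its list of child nodes), the function names, the order in which nodes below a node are listed, and counting the root as depth 0 are all assumptions made for the example.

```python
# Illustrative Python only. Node 1 is the root; nodes 3, 4 and 5 are terminal.
tree = {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}


def previous(tree, n):
    """Node immediately above n, or None (missing) if n is the root."""
    for parent, children in tree.items():
        if n in children:
            return parent
    return None


def path(tree, n):
    """Nodes on the branch from the root to n, with n as the final element."""
    nodes = [n]
    parent = previous(tree, n)
    while parent is not None:
        nodes.insert(0, parent)
        parent = previous(tree, parent)
    return nodes


def depth(tree, n):
    """Number of branches between the root and n (root depth 0 is an assumption)."""
    return len(path(tree, n)) - 1


def n_branches(tree, n):
    """Number of branches below n (0 for a terminal node)."""
    return len(tree[n])


def below(tree, n, terminal_only=False):
    """All nodes below n, or only the terminal ones if terminal_only is set."""
    found = []
    for child in tree[n]:
        if not terminal_only or not tree[child]:
            found.append(child)
        found.extend(below(tree, child, terminal_only))
    return found
```

For example, `path(tree, 5)` walks upward from node 5 through `previous` until it reaches the root, mirroring the relationship between `BPATH` and `BPREVIOUS` described above.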

### 3. What’s changed

Most of the changes are compatible with Release 4.2, the previous release. There are a few commands, however, where new options or parameters have been inserted into the existing lists. These may cause problems in statements where option or parameter names have been omitted or abbreviated. To avoid any difficulty, the name of the option or parameter that follows the newly inserted one should be given explicitly, and not abbreviated to fewer than four characters.

Any command, where changes in Release 6 may cause incompatibilities in existing programs, is marked in Sections 3.1 and 3.2 by the symbol ^{†}. The full details are given in Section 3.4.
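Why an inserted option can break an existing statement is easiest to see with a toy model. The Python sketch below (not GenStat code) assumes a simplified rule in which a setting given without a name is assigned to the option that follows the previous setting in the defined order. The two option lists are fragments based on the `PREDICT` entry in Section 3.4, and the function and the statement are hypothetical.

```python
# Illustrative Python only; a simplified model of positional option matching.

def assign_settings(option_order, settings):
    """Match settings to options.

    `settings` is a list of (name, value) pairs; a name of None means the
    name was omitted, so the value goes to the next option in the list.
    """
    assigned = {}
    position = 0
    for name, value in settings:
        if name is not None:
            position = option_order.index(name)
        assigned[option_order[position]] = value
        position += 1
    return assigned


# Fragment of the option list before and after the new options were inserted.
old_order = ["SE", "VCOVARIANCE"]
new_order = ["SE", "SED", "LSD", "LSDLEVEL", "VCOVARIANCE"]

# A statement that names SE but omits the name of the following option.
statement = [("SE", "se"), (None, "vc")]

before = assign_settings(old_order, statement)
after = assign_settings(new_order, statement)
```

Under this model the unnamed setting moves from `VCOVARIANCE` to `SED` once the new options are inserted, which is the situation that naming the following option explicitly avoids.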

**3.1 Directives**

^{†}`AKEEP` directive has a new option `RMETHOD` to control the type of residual that is saved. It also has seven new parameters. `SEDMEANS` saves a symmetric matrix containing standard errors for comparisons between every pair of entries in the table of means. `VCMEANS` saves a symmetric matrix containing variances and covariances of means. `SECBMEANS` saves a table of standard errors for combined means, usable for calculating standard errors for differences between means in the table, at equal levels of the factors specified by the `EQMEANS` option. `VCCBMEANS` saves a symmetric matrix with variances and covariances of combined means. `SEDCBMEANS` saves a symmetric matrix with standard errors for comparisons between every pair of entries in the table of combined means. `DFMEANS` saves a symmetric matrix with degrees of freedom for comparisons between every pair of entries in the table of means. Finally, `RTERM` saves a formula defining the residual term corresponding to a treatment term. A further change is that, if the replications of a term are all equal, they can be saved in a scalar instead of a table by the `REPLICATION` parameter. Indeed, if the structure to save the replications has not yet been defined and the replications are equal, it will now be defined as a scalar rather than as a table.

`ASSIGN` has a new default of zero for the `NSUBSTITUTE` option, but the effect remains the same (i.e. no substitution).

`DELETE` has a new option `NSUBSTITUTE` for use when working with dummies. The default value, `*`, substitutes the dummy (and any dummy to which it points) as now, so the deleted structure is the structure to which the dummy (eventually) points. `NSUBSTITUTE` controls the number of times to substitute, as in `ASSIGN`, so for example setting `NSUBSTITUTE`=0 would delete the dummy itself.
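The effect of limiting the number of substitutions can be pictured as following a chain of pointers a fixed number of steps. The Python sketch below (not GenStat code) models dummies as a dictionary mapping each dummy to the structure it points to; the function name `resolve`, the dictionary, and the use of `None` to stand for the `*` setting are hypothetical.

```python
# Illustrative Python only; a toy model of dummy substitution.

def resolve(name, dummies, nsubstitute=None):
    """Follow dummy pointers starting from `name`.

    nsubstitute=None follows the chain to its end (like the `*` default);
    an integer limits the number of substitutions, so 0 returns `name` itself.
    """
    steps = 0
    while name in dummies and (nsubstitute is None or steps < nsubstitute):
        name = dummies[name]
        steps += 1
    return name


# d1 points to d2, which points to the data structure x.
dummies = {"d1": "d2", "d2": "x"}

target_default = resolve("d1", dummies)      # follow to the end of the chain
target_zero = resolve("d1", dummies, 0)      # no substitution
target_one = resolve("d1", dummies, 1)       # a single substitution
```

In this model the default targets `x`, zero substitutions target the dummy `d1` itself, and one substitution targets the intermediate dummy `d2`.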

`DUMP` option `PRINT` has a new setting, `space`, to provide information about the current use of workspace within the GenStat server.

`DUPLICATE` has a new option `REDEFINE` to allow the type of a data structure to be redefined if required for the duplication.

^{†}`FCLASSIFICATION` has a new default `*` for the `FACTORIAL` option, meaning no limitation on the number of factors and variates in the terms that are generated. It also has a new option `INCLUDEFUNCTIONS` to specify whether or not functions like `POL` or `SSPLINE` are to be included in the output formula or terms. Previously these were always omitted.

`GET` has three new options. `SEEDS` saves a pointer containing three variates that save the current seeds for GenStat’s three random-number generators (that is, for random number functions, for the `RANDOMIZE` directive, and for internal use by directives). `FIELDWIDTH` saves the current default fieldwidth for printing, and `SIGNIFICANTFIGURES` saves the current default precision. The options have also been added to the `SET` directive, to enable these items to be modified by users.

`GROUPS` has an option `CASE` which allows the case of letters in text to be ignored. It also has an option `LDIRECTION` which can be set to `'given'` to request that the levels and labels be left in the order in which they are encountered in the data rather than being sorted into ascending order (the default). Note: `GROUPS` also has a `PRINT` option which was accidentally omitted from the R4.2 manuals!
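The difference between sorted and as-encountered label ordering, and the effect of ignoring case, can be shown with a short Python sketch (not GenStat code). The function `form_groups`, its argument names, and the choice to store labels in lower case when case is ignored are assumptions made for illustration.

```python
# Illustrative Python only; a toy model of forming a factor from text.

def form_groups(values, ignore_case=False, ldirection="ascending"):
    """Return (labels, codes) for a list of text values.

    Labels are sorted when ldirection is "ascending" and kept in order of
    first appearance when it is "given". Codes number the labels from 1.
    """
    keys = [v.lower() if ignore_case else v for v in values]
    labels = []
    for key in keys:
        if key not in labels:
            labels.append(key)
    if ldirection == "ascending":
        labels.sort()
    codes = [labels.index(key) + 1 for key in keys]
    return labels, codes


crops = ["wheat", "Barley", "WHEAT", "barley"]

sorted_labels, sorted_codes = form_groups(crops, ignore_case=True)
given_labels, given_codes = form_groups(
    crops, ignore_case=True, ldirection="given"
)
```

With case ignored the four values collapse to two groups, and only the ordering of those groups differs between the two settings.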

^{†}`HELP` has been revised to present a more appropriate interface for each particular type of computer. On PCs running Windows, for example, it loads the contents screen of the Windows-based help to the command language.

^{†}`PEN` has two new parameters `CSYMBOLS` and `CLINE` to control the colour of symbols and lines drawn by the pen concerned, and the `FILLCOLOUR` parameter is renamed to `CFILL`.

`PRINT` parameter `JUSTIFICATION` has new settings `'centre'` or `'center'` to request that the output is centred within the specified field width.

`READ` has a new option `CASE` which allows the case of letters in texts to be ignored when these are converted to factors. It also has an option `LDIRECTION` which can be set to `'given'` to request that, when levels or labels are defined by `READ`, they are left in the order in which they are encountered in the data rather than being sorted into ascending order (the default).

^{†}`PREDICT` can save and print standard errors for differences between predictions and, for models with the Normal distribution, least significant differences of predictions. The `PRINT` option has three new settings: `'sed'`, `'lsd'` and `'vcovariance'`. `PREDICT` also has three new options `SED`, `LSD` and `LSDLEVEL` (inserted between `SE` and `VCOVARIANCE`): `SED` and `LSD` save matrices of standard errors of differences and least significant differences respectively, and `LSDLEVEL` sets the significance level (in %) to use in the calculation of the least significant differences.

`PROCEDURE` option `RESTORE` has two new settings: `'seeds'` restores the seeds for random number generation on exit from the procedure; and `'all'` has the same effect as listing all the available settings of `RESTORE`.

`RESUME` has a `CLOSE` option, which allows you to close the file afterwards.

`RKEEP` has three new parameters: `SUMMARY` and `ACCUMULATED` save the summary and accumulated analysis-of-variance (or deviance) tables respectively, and the `STATISTICS` parameter allows statistics to be saved for any current y-variate (rather than only the first, as with the existing `STATISTICS` option).

^{†}`TRY` can now provide a more succinct summary of the potential changes. This is requested by the new `'changes'` setting of the `PRINT` option, which is now its default.

^{†}`VRESIDUAL` directive has a new option `CONSTRAINT` which allows the residual variance to be fixed at its initial value.

`VRESIDUAL` and `VSTRUCTURE` directives have a new parameter `EQUALITYCONSTRAINTS` that can constrain parameters in the variance model to have equal values.

**3.2 Procedures**

^{†}`AONEWAY` has been rewritten to provide customized facilities for one-way analysis of variance. For example, if the treatments have unequal replication, a standard error is printed for each mean, rather than the summary for comparisons of means with minimum and maximum replication as given by `ANOVA`. Similarly, any missing values are excluded from the analysis by `AONEWAY`. In `ANOVA` they need to be included, to ensure balance in the more general situations that it covers, and are estimated as part of the analysis.

^{†}`APLOT` now provides index and absolute-residual plots, and the choice of line-printer or high-resolution graphics (default is high resolution). There are new options `INDEX` and `GRAPHICS`, and a new parameter `PEN`.

`AREPMEASURES` now customizes the `ANOVA` output to take account of the correction factor on the degrees of freedom. It also has new options `FPROBABILITY`, `PSE` and `LSDLEVEL` which operate as in `ANOVA`, and an option `EPSILON` to save the correction factor.

`DESCRIBE` procedure now calculates the standard error of the variance.

`DSCATTER` can now plot factors (as well as variates), and procedure `TRELLIS` can now plot medians.

`FACPRODUCT` has a new option `LMETHOD` to control whether levels are formed only for combinations of the factors that are present in the data, or for all the combinations.

`GLMM` has a new option `CADJUST` controlling centring of covariates, and a new parameter `ITERATIVEWEIGHTS` to save the iterative weights.

^{†}`MANNWHITNEY` now provides exact probabilities (using new procedure `PRMANNWHITNEYU`). The `NORMAL` option is now replaced by a `PROBABILITY` option (saving the probability rather than the Normal approximation).
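As background on what an exact probability for the Mann-Whitney U statistic involves, the null distribution for samples without ties can be obtained by counting arrangements with a standard recurrence. The Python sketch below illustrates that textbook approach; it is not GenStat code and makes no claim about the algorithm `PRMANNWHITNEYU` actually uses. The function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_arrangements(n1, n2, u):
    """Number of orderings of n1 + n2 untied observations giving U = u."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # Condition on which sample supplies the largest observation.
    return count_arrangements(n1 - 1, n2, u - n2) + count_arrangements(
        n1, n2 - 1, u
    )


def lower_tail_probability(n1, n2, u):
    """Exact P(U <= u) under the null hypothesis, assuming no ties."""
    total = comb(n1 + n2, n1)
    favourable = sum(count_arrangements(n1, n2, k) for k in range(u + 1))
    return favourable / total
```

For two samples of size 2 there are six equally likely arrangements, one of which gives U = 0, so `lower_tail_probability(2, 2, 0)` is 1/6. A Normal approximation replaces this counting for large samples.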

^{†}`PROBITANALYSIS` now provides a choice of methods, selected using the `FITMETHOD` option. When `FITMETHOD=generalizednonlinear`, the model is fitted as a generalized nonlinear model, using the `FIT` directive. The alternative setting, `'nonlinear'`, fits it as a nonlinear model using `FITNONLINEAR`. Apart from minor numerical differences, the two methods should generate the same results. Generalized nonlinear models allow a confidence region to be generated for lethal doses, and are therefore the default for all situations except Wadley’s problem. The nonlinear method is more accurate, and is thus used as the default for the more difficult situation presented by Wadley’s problem. There is a new option `LOGBASE` (between `LD` and `DISPERSION`) which can be used to specify the base of antilog transformation (if any) to be applied to the lethal doses, and there is a new `MAXCYCLE` option to control the maximum number of iterations for fitting the model.

^{†}`RCHECK` can now plot confidence envelopes around Normal and half-Normal plots. These are controlled by new options `ENVELOPE`, `PROBABILITY`, `NSIMULATIONS` and `SHADE`, which are inserted between `INDEX` and `RESIDUALS`.

`REPLICATION` now takes account of the detection probability (= one minus the type II error rate), and has an option `PRDETECTION` for specifying it.

`RPROPORTIONAL` and `RSURVIVAL` have a new setting `'loglikelihood'` for their `PRINT` options to print -2 times the log likelihood.

`TTEST` has revised headings and an extended summary, which now includes number of observations, mean, variance, standard deviation and standard error of mean.

^{†}`VPLOT` procedure can now produce composite plots like those from `DAPLOT` and `RCHECK`, as well as absolute-residual and index plots. There is a new parameter `PEN` and a new option `INDEX`.

**3.3 Functions**

In regression, the `POL` and `REG` functions have been extended to work on factors, and they can now be included in interactions. The meaning of the `REG` function has been clarified, so that now its contrasts are always orthogonalized for the main effects of the variate or factor (even if the matrix third argument is set). Unorthogonalized contrasts are now fitted using the `COMPARISON` function (previously available only in `ANOVA`), which has an identical syntax to `REG`.

The `GAMMA` function has been extended to provide the incomplete gamma function (by setting an optional second argument).

**3.4 Incompatibilities**

Command | Incompatibility |
---|---|
`AKEEP` directive | new option `RSAVE` inserted before `SAVE`; new parameters `SEDMEANS` and `VCMEANS` inserted between `SEMEANS` and `EFFECTS`; `DFMEANS` inserted between `DF` and `SS`; `RTERM` between `VARIANCE` and `CEFFICIENCY`; and `SECBMEANS`, `SEDCBMEANS` and `VCCBMEANS` between `CBMEANS` and `CBEFFECTS`. |
`AONEWAY` procedure | completely rewritten: `GROUPS` option must now be set to a factor; `HOMOGENEITY` option replaced by a `homogeneity` setting of the `PRINT` option; `EXPLAIN` option deleted. |
`APLOT` procedure | new options `INDEX` and `GRAPHICS` before `SAVE`; default is now to give high-resolution graphics. |
`FCLASSIFICATION` directive | default for the `FACTORIAL` option now 0. |
`HELP` directive | revised syntax, which may depend on the type of computer (for details type `HELP` alone on a line). |
`MANNWHITNEY` procedure | option `NORMAL` replaced by option `PROBABILITY` (saving the probability rather than the Normal approximation). |
`PEN` directive | `FILLCOLOUR` renamed to `CFILL`, and preceded by new parameters `CSYMBOLS` and `CLINE`. |
`PREDICT` directive | new options `SED`, `LSD` and `LSDLEVEL` inserted between `SE` and `VCOVARIANCE`. |
`PROBITANALYSIS` procedure | option `LOGBASE` inserted between `LD` and `DISPERSION`. |
`RCHECK` procedure | options `ENVELOPE`, `PROBABILITY`, `NSIMULATIONS` and `SHADE` inserted between `INDEX` and `RESIDUALS`. |
`REG` function | in regression and generalized linear models these contrasts are now always orthogonalized for the main effects of the variate or factor, even if the matrix third argument is set (unorthogonalized contrasts can be fitted instead using the `COMP` function). |
`TRY` directive | default for the `PRINT` option now `'changes'`. |
`VPLOT` procedure | option `INDEX` added before `GRAPHICS`. |
`VRESIDUAL` directive | new option `CONSTRAINT` between `VARIANCE` and `COORDINATES`. |

Also procedures `DAYCOUNT`, `GETDATA`, `SAVEDATA`, `INVNORMAL` and `EDINVNORMAL` are now obsolete. (You should use the date/time functions, Save-Session menus and/or `RECORD` and `RESUME`, and functions `CLINVNORMAL`, `CUINVNORMAL`, `EDINVNORMAL` and `PRINVNORMAL` instead.)