Stores results from a linear, generalized linear, generalized additive or nonlinear model.

### Options

| Option | Description |
|---|---|
| `EXPAND` = string token | Whether to put estimates in the order defined by the maximal model for linear or generalized linear models (`yes`, `no`); default `no` |
| `DISPERSION` = scalar | Dispersion parameter to be used as estimate for variability in s.e.s; default as set in the `MODEL` directive |
| `RMETHOD` = string token | Type of residuals to form if parameter `RESIDUALS` is set (`deviance`, `Pearson`, `simple`); default as set in `MODEL` |
| `DMETHOD` = string token | Basis of estimate of dispersion, if not fixed by `DISPERSION` option (`deviance`, `Pearson`); default as set in `MODEL` |
| `PROBABILITY` = scalar | Probability level for confidence limits; default 0.95 |
| `OMODEL` = pointer | Pointer to settings of options of the current `MODEL` statement, given unit labels corresponding to the option names of `MODEL` (starting with `'distribution'`) |
| `PMODEL` = pointer | Pointer to settings of parameters of the current `MODEL` statement, given unit labels corresponding to the parameter names of `MODEL` (starting with `'y'`); refers only to the first setting of `Y`, `FITTEDVALUES` and `RESIDUALS` |
| `STATISTICS` = variates | Saves all the statistics that could be displayed for the first `Y` variate by the `'summary'` setting of the `PRINT` option of the fitting directives `FIT`, `ADD` etc |
| `CIMETHOD` = string token | Method to use to calculate confidence intervals for nonlinear models (`exact`, `quadratic`); default `quadratic` |
| `IGNOREFAILURE` = string token | Whether to ignore failure to fit a generalized linear model (`yes`, `no`); default `no` |
| `MAXIMALMODEL` = formula structure | Saves the maximal model (as defined by `TERMS`) |
| `FITMODEL` = formula structure | Saves the currently-fitted model (including any contrast functions) |
| `FITCONSTANT` = scalar | Saves a scalar containing the value one if the constant is included in the fitted model, or zero otherwise |
| `FITTYPE` = scalar | Saves a scalar to indicate the type of model that has been fitted |
| `SAVE` = identifier | Specifies save structure of model; default `*`, i.e. that from the latest model fitted |

### Parameters

| Parameter | Description |
|---|---|
| `Y` = variates | Response variates for which results are to be saved; default is the list of response variates in the most recent `MODEL` statement |
| `RESIDUALS` = variates | Residuals for each `Y` variate, as specified by the `RMETHOD` option |
| `FITTEDVALUES` = variates | Fitted values for each `Y` variate |
| `LEVERAGES` = variate | Leverages of the units for each `Y` variate |
| `ESTIMATES` = variates | Estimates of parameters for each `Y` variate |
| `SE` = variates | Standard errors of the estimates |
| `INVERSE` = symmetric matrix | Inverse matrix from a linear or generalized linear model, inverse of second derivative matrix from a nonlinear model |
| `VCOVARIANCE` = symmetric matrix | Variance-covariance matrix of the estimates |
| `DEVIANCE` = scalars | Residual sum of squares or deviance |
| `DF` = scalar | Residual degrees of freedom |
| `TERMS` = pointer or formula structure | Fitted terms (excluding constant) |
| `ITERATIVEWEIGHTS` = variate | Iterative weights from a generalized linear model |
| `LINEARPREDICTOR` = variate | Linear predictor from a generalized linear model |
| `YADJUSTED` = variate | Adjusted response of a generalized linear model |
| `EXIT` = scalar | Exit status from a generalized linear or nonlinear model |
| `GRADIENTS` = pointer | Derivatives of fitted values with respect to parameters in a nonlinear model |
| `GRID` = variate | Grid of function or deviance values from a nonlinear model |
| `DESIGNMATRIX` = matrix | Design matrix whose columns are explanatory variates and dummy variates |
| `PEARSONCHISQUARE` = scalar | Pearson chi-square statistic from a generalized linear model |
| `STERMS` = pointer | Saves the identifiers of the variates that have been smoothed in the current model |
| `SCOMPONENTS` = pointer | Saves a pointer to variates holding the nonlinear components of the variates that have been smoothed |
| `NOBSERVATIONS` = scalar | Number of units used in regression, excluding missing data and zero weights and taking account of restrictions |
| `SEFITTEDVALUES` = variate | Saves standard errors of the fitted values |
| `SELINEARPREDICTOR` = variate | Saves standard errors of the linear predictor |
| `INFLATION` = variate | Saves the variance inflation factors of the parameter estimates |
| `UPPER` = variates | Saves upper confidence limits for the parameter estimates |
| `LOWER` = variates | Saves lower confidence limits for the parameter estimates |
| `MEANDEVIANCE` = scalars | Saves the residual mean deviance (or mean square) |
| `TDEVIANCE` = scalars | Saves the total deviance (or sum of squares) |
| `TDF` = scalars | Saves the total degrees of freedom (corrected for the mean or uncorrected as displayed by the fitting directives) |
| `TMEANDEVIANCE` = scalars | Saves the total mean deviance (or mean square) |
| `SUMMARY` = pointer | Saves the summary analysis-of-variance (or deviance) table as a pointer with a variate or text for each column (source, d.f. etc) |
| `ACCUMULATED` = pointer | Saves the accumulated analysis-of-variance (or deviance) table as a pointer with a variate or text for each column (source, d.f. etc) |
| `STATISTICS` = variates | Saves all the statistics that could be displayed for the `Y` variate by the `'summary'` setting of the `PRINT` option of the fitting directives `FIT`, `ADD` etc |

### Description

`RKEEP` allows you to copy information from a regression analysis (performed, for example, by a `FIT`, `FITCURVE` or `FITNONLINEAR` statement) into Genstat data structures. You do not need to declare the structures in advance; Genstat will declare them automatically to be of the correct type and length.

The `Y` parameter specifies the response variates for which the results are to be saved. Unusually for the first parameter of a directive, this has a default: if you leave it out, Genstat assumes that results are to be saved for all the response variates, as given in the previous `MODEL` statement.

The `RESIDUALS`, `FITTEDVALUES`, `LEVERAGES`, `SEFITTEDVALUES` and `SELINEARPREDICTOR` parameters allow you to save the standardized residuals, the fitted values, the leverages, the standard errors of the fitted values and the standard errors of the linear predictor. For example, `RESIDUALS=R` puts the residuals in a variate `R`. The `RMETHOD` option controls the type of residuals that are formed. You cannot save these values if you have set `RMETHOD=*` in the `MODEL` statement. The standard errors of fitted values are defined by:

s.e. = √(leverage × variance function × dispersion / weight)

where the variance function is calculated from the fitted value according to the setting of the `DISTRIBUTION` option of the current `MODEL` statement, and the dispersion is the fixed or estimated value of dispersion, as controlled by the `DISPERSION` and `DMETHOD` options of the `MODEL` and `RKEEP` directives.
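As a minimal sketch of saving these unit-level quantities in one statement, the lines below assume a hypothetical response variate `Yield` and explanatory variate `Dose` that have already been read in; the names of the saved variates (`Res`, `Fitted`, `Lev`, `SeFitted`) are likewise illustrative only.

```genstat
" Hypothetical data: response Yield, explanatory variate Dose. "
MODEL Yield
FIT Dose
" Save residuals, fitted values, leverages and s.e.s of fitted values;
  the variates are declared automatically by RKEEP. "
RKEEP RESIDUALS=Res; FITTEDVALUES=Fitted; LEVERAGES=Lev; SEFITTEDVALUES=SeFitted
PRINT Yield,Fitted,Res,Lev,SeFitted
```

Because `Y` is omitted, the results are saved for the response variate named in the preceding `MODEL` statement.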

The `ESTIMATES` and `SE` parameters save the parameter estimates and their standard errors; `RKEEP` puts them in variates, using the same order as in the display produced by the `PRINT` option of the directive used to fit the model. Alternatively, if you have used `TERMS` to define a maximal model, you can set option `EXPAND=yes` to reorder the estimates to their order in the maximal model (including missing values for the parameters not currently in the model). The variates saving these values are set up with labels; thus, you can refer to individual values in expressions using the labels as displayed when the estimates are printed. For example, to get the estimate of the constant into a scalar, you could put:

`RKEEP ESTIMATES=Esti`

`SCALAR Const`

`CALCULATE Const = Esti$['Constant']`

The `UPPER` and `LOWER` parameters allow you to save upper and lower confidence limits for the parameter estimates. The probability for the confidence interval is specified by the `PROBABILITY` option, with default 0.95. The `CIMETHOD` option controls the method used with nonlinear models. The default setting, `quadratic`, uses the same method as for other types of regression, basing the limits on a quadratic surface fitted to the likelihood surface around the optimum. These limits may be poor approximations if the likelihood surface is markedly asymmetric. The alternative setting, `exact`, calculates the limits directly from the likelihood surface.
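A short sketch of saving confidence limits alongside the estimates, assuming a model has just been fitted; the identifiers `Esti`, `Se`, `Lo` and `Up` and the 90% level are chosen purely for illustration.

```genstat
" Save estimates, s.e.s and 90% confidence limits from the current fit. "
RKEEP [PROBABILITY=0.90] ESTIMATES=Esti; SE=Se; LOWER=Lo; UPPER=Up
PRINT Esti,Se,Lo,Up
```

After `FITCURVE` or `FITNONLINEAR`, adding `CIMETHOD=exact` to the option list would request limits taken directly from the likelihood surface instead of the quadratic approximation.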

The `INFLATION` parameter allows the variance inflation factors of the parameters to be saved.

The `INVERSE` parameter allows you to save the inverse matrix as a symmetric matrix: that is, (*X*′*X*)⁻¹ where *X* is the design matrix. This matrix is the same for all response variates.

The `VCOVARIANCE` parameter saves the variance-covariance matrix of the estimates for each response variate: these are formed by multiplying the inverse matrix by the relevant variance estimate based on the estimated dispersion, or on the dispersion that you have supplied.

The `DEVIANCE` parameter allows you to save the residual sum of squares, or the *deviance* for distributions other than Normal. The `DF` parameter saves the residual degrees of freedom, and the `MEANDEVIANCE` parameter saves the residual mean deviance. The `TDEVIANCE` parameter saves the total deviance, the `TDF` parameter saves the total degrees of freedom (corrected for the mean or uncorrected as displayed by the fitting directives), and the `TMEANDEVIANCE` parameter saves the total mean deviance.
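The following sketch saves the residual summary quantities after a fit and recomputes the mean deviance by hand. It assumes that the residual mean deviance is the residual deviance divided by its degrees of freedom, so that `Check` should agree with `MeanDev`; all identifiers are hypothetical.

```genstat
" Save the residual deviance (or s.s.), its d.f. and the mean deviance. "
RKEEP DEVIANCE=Dev; DF=ResDf; MEANDEVIANCE=MeanDev
" Assumed relationship: mean deviance = deviance / residual d.f. "
CALCULATE Check = Dev / ResDf
PRINT Dev,ResDf,MeanDev,Check
```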

The `LINEARPREDICTOR` parameter allows you to save the linear predictor of a generalized linear model; the values of the linear predictor are the same as the fitted values if the link function is the identity function.

The `ITERATIVEWEIGHTS` parameter saves a variate containing the iterative weights used in the last cycle of the iteration for fitting a generalized linear model. The iterative weights do not contain any contribution from the weights that can be specified, whether or not the model is iterative, by the `WEIGHTS` option of the `MODEL` directive, and they are 1.0 for ordinary linear regression.

The `YADJUSTED` parameter saves the adjusted response variate used in the last cycle of the iteration for fitting a generalized linear model; with the identity link function this is the same as the response variate.

The Pearson chi-square statistic can be saved using the `PEARSONCHISQUARE` parameter of `RKEEP`. It is calculated as the sum of the squared Pearson residuals. This can be used as an alternative to the deviance for testing goodness of fit; see McCullagh & Nelder (1989).

The `EXIT` parameter of `RKEEP` provides a code that indicates the success or type of failure of an iterative fit. Codes 0-7 are relevant to standard curves and general nonlinear models, and codes 0 and 8-13 are for generalized linear models:

| Code | Meaning |
|---|---|
| 0 | Successful fitting |
| 1 | Limit on number of cycles has been reached without convergence |
| 2 | Parameter out of bounds |
| 3 | Likelihood appears constant |
| 4 | Failure to progress towards solution |
| 5 | Some standard errors are not available because the information matrix is nearly singular |
| 6 | Calculated likelihood may be incorrect because of missing fitted values |
| 7 | Curve is close to a limiting form |
| 8 | Data incompatible with model |
| 9 | Predicted mean or linear predictor out of range |
| 10 | Invalid calculation for calculated link or distribution |
| 11 | All units have been excluded from the analysis |
| 12 | Iterative process has diverged |
| 13 | Failure due to lack of space or data access |
| 14 | Function returned a missing value |

With a generalized linear model, unless you set option `IGNOREFAILURE=yes`, the `EXIT` code is the only information that you can save if the fit has been unsuccessful. Alternatively, with a nonlinear model or when `IGNOREFAILURE=yes`, `RKEEP` will save any information that may be available. (You may thus, for example, be able to discover more about the cause of the failure.)
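One way to use the exit code is to test it before trying to save anything else. The sketch below assumes a hypothetical Poisson regression of a variate `Count` on `Dose`; the identifiers `Ex`, `Esti` and `Se` are illustrative.

```genstat
" Hypothetical generalized linear model: Poisson counts with log link. "
MODEL [DISTRIBUTION=poisson; LINK=logarithm] Count
FIT Dose
" Save the exit code first: after a failed generalized linear model fit
  it is the only information available unless IGNOREFAILURE=yes. "
RKEEP EXIT=Ex
IF Ex.NE.0
  " Non-zero code: report it rather than saving further results. "
  PRINT Ex
ELSE
  RKEEP ESTIMATES=Esti; SE=Se
  PRINT Esti,Se
ENDIF
```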

The derivatives of the fitted values with respect to each parameter in a standard curve or general nonlinear model can be stored in variates using the `GRADIENTS` parameter. You can use these quantities to assess the relative influence of each observation on a parameter; you can also construct a measure of leverage by summing the gradients for all the parameters.

The `GRID` parameter can be used to store a grid of values of the deviance (or any general function) following `FITNONLINEAR`.

The `DESIGNMATRIX` parameter allows you to save the matrix *X*. The columns correspond to the parameters of the model, ordered as for the `ESTIMATES` parameter. For simple linear regression with a constant this has only two columns, the first containing ones and the second containing the values of the explanatory variate. Columns corresponding to aliased parameters are omitted, but you can use the corresponding option of `TERMS` to construct the full design matrix.

The `PEARSONCHISQUARE` parameter provides the Pearson chi-square statistic for dispersion, which is the same as the residual sum of squares for the Normal distribution, but differs from the deviance for other distributions.

The `STERMS` and `SCOMPONENTS` parameters are relevant to generalized additive models. The `STERMS` parameter can be used to store a pointer to those variates whose effects in the model are smoothed. The `SCOMPONENTS` parameter stores a pointer to variates, one for each smoothed variate in the same order as in `STERMS`, containing the fitted nonlinear component of each smoothed variate; this does not include the linear component or the constant term.

The `NOBSERVATIONS` parameter allows you to save the number of units used in the analysis, omitting units with missing values or excluded by restrictions. This will be the same as the total number of degrees of freedom plus one, except in a regression with no constant term and no explanatory factors when it will equal the total number of degrees of freedom.

The `SUMMARY` parameter can be used to save the summary analysis-of-variance (or deviance) table for each response variate. The summary table is saved as a pointer with a variate or text for each of its columns (source, d.f. etc). Similarly, the `ACCUMULATED` parameter can save the accumulated analysis-of-variance (or deviance) tables.

The `STATISTICS` parameter saves all the statistics that could be displayed for each response variate by the `'summary'` setting of the `PRINT` option of the fitting directives `FIT`, `ADD` etc. Alternatively, the `STATISTICS` option can be used to save the statistics for the first response variate specified by the `MODEL` statement.

The `DISPERSION` option allows you to define the value to be used for the dispersion parameter when calculating the standard errors. The `DMETHOD` option indicates how this should be calculated if `DISPERSION` is not set. By default the deviance is used, but you can set `DMETHOD=Pearson` to request the Pearson chi-square statistic to be used instead.

Options `OMODEL` and `PMODEL` allow you to save pointers containing information about the current model. The labels of the pointers can be specified in either lower or upper case, or any mixture. `OMODEL` can be set to a pointer to store information about each of the options set in the previous `MODEL` statement. For example, the statement

`RKEEP [OMODEL=Om]`

will allow you to refer to the current variate of weights (if one was set in the `WEIGHTS` option of `MODEL`) as `Om['weights']`. Whether or not a variate was set, the statement

`MODEL [WEIGHTS=Om['weights']] Newobs`

will allow a new analysis with the same weighting as the old.

The pointer `Om` has 16 values, with suffixes corresponding to the options of `MODEL` in the defined order. Similarly, the statement

`RKEEP [PMODEL=Pm]`

will set up a pointer storing the (eight) current parameter settings of the previous `MODEL` statement. However, if there was more than one response variate, the first value of the pointer will be the identifier of the first response variate only: the others are not stored. Similarly, only the fitted-values and residuals variates for the first response will be pointed at. For example, the identifier `Pm[1]` or `Pm['y']` can be used to refer to the current response variate after the `RKEEP` statement above.

The `MAXIMALMODEL` option saves the maximal model (as defined by `TERMS`). The `FITMODEL` option saves the model that has currently been fitted, including any contrast functions (i.e. `POL`, `REG`, `COMPARISON`, `SSPLINE` or `LOESS`). The `FITCONSTANT` option saves a scalar containing the value one if the constant is included in the fitted model, or zero otherwise. The `FITTYPE` option saves a scalar to indicate the type of model that has been fitted: 1 for an ordinary regression or generalized linear model (`FIT`), 2 for a generalized nonlinear model (`FIT` with the `CALCULATION` option set), 3 for a standard curve (`FITCURVE`) and 4 for a nonlinear model (`FITNONLINEAR`).

Options: `EXPAND`, `DISPERSION`, `RMETHOD`, `DMETHOD`, `PROBABILITY`, `OMODEL`, `PMODEL`, `STATISTICS`, `CIMETHOD`, `IGNOREFAILURE`, `MAXIMALMODEL`, `FITMODEL`, `FITCONSTANT`, `FITTYPE`, `SAVE`.

Parameters: `Y`, `RESIDUALS`, `FITTEDVALUES`, `LEVERAGES`, `ESTIMATES`, `SE`, `INVERSE`, `VCOVARIANCE`, `DEVIANCE`, `DF`, `TERMS`, `ITERATIVEWEIGHTS`, `LINEARPREDICTOR`, `YADJUSTED`, `EXIT`, `GRADIENTS`, `GRID`, `DESIGNMATRIX`, `PEARSONCHISQUARE`, `STERMS`, `SCOMPONENTS`, `NOBSERVATIONS`, `SEFITTEDVALUES`, `SELINEARPREDICTOR`, `INFLATION`, `UPPER`, `LOWER`, `MEANDEVIANCE`, `TDEVIANCE`, `TDF`, `TMEANDEVIANCE`, `SUMMARY`, `ACCUMULATED`, `STATISTICS`.

### Reference

McCullagh, P. & Nelder, J.A. (1989). *Generalized Linear Models* (second edition). Chapman and Hall, London.

### See also

Directives: `FIT`, `FITCURVE`, `FITNONLINEAR`, `RKESTIMATES`.

Commands for: Regression analysis.

### Example

" Example FIT-3: Comparing linear regressions between groups Experiments on cauliflowers in 1957 and 1958 provided data on the mean number of florets in the plant and the temperature during the growing season (expressed as accumulated temperature above 0 deg C." " The counts and temperatures are in a file called 'FIT-3.DAT'" FILEREAD [NAME='%gendir%/examples/FIT-3.DAT'] MnCount,AccTemp " The first 7 values are from 1957 and the rest from 1958; set up a factor to distinguish the two years." FACTOR [LEVELS=!(1957,1958); VALUES=7(1957,1958)] Year " Fit a linear regression model of the mean count of florets on accumulated temperature - first ignoring the division into two years." MODEL MnCount TERMS AccTemp*Year FIT AccTemp " Fit parallel regressions for the two years." ADD Year " Fit separate regressions for the two years." ADD AccTemp.Year " Display the accumulated summary: an analysis of parallelism." RDISPLAY [PRINT=accumulated] " Show the parallel models." DROP [PRINT=*] AccTemp.Year RGRAPH [GRAPHICS=high] " Extract the parameter estimates and s.e.s and display the common slope and its s.e." RKEEP ESTIMATES=Esti; SE=Se CALC Slope,SlopeSE = (Esti,Se)$[2] PRINT Slope,SlopeSE