Does logistic ridge regression (A.I. Glaser).

### Options

`PRINT` = string token | What output to print (`correlation`, `crossvalidation`, `ridge`, `scaledridge`, `standarderrors`); default `corr` |
---|---|
`PLOT` = string tokens | What graphs to plot (`correlation`, `ridgetrace`, `buildup`); default `*` i.e. none |
`LINK` = string token | Link function (`logit`, `probit`, `complementaryloglog`); default `logi` |
`DISPERSION` = scalar | Value of the dispersion parameter; default 1 |
`TERMS` = formula | Explanatory model |
`FACTORIAL` = scalar | Limit on number of factors/covariates in a model term; default 3 |
`LAMBDA` = variate or scalar | Values for the ridge parameter lambda |
`CROSSVALIDATION` = string token | Whether to use cross-validation to find an optimal value of lambda (`yes`, `no`); default `no` |
`NCROSSVALIDATIONGROUPS` = scalar | Number of groups for cross-validation; default 10 |
`CVMETHOD` = string token | Which method to use for cross-validation (`deviance`, `squarederror`, `countingerror`); default `devi` |
`SEED` = scalar | Seed for random numbers to use in cross-validation; default 0 |

### Parameters

`Y` = variates | Response variate |
---|---|
`NBINOMIAL` = scalars or variates | Number of binomial trials for each unit; default 1 |
`YVALIDATION` = variates | Response variate for validation |
`XVALIDATION` = pointers | Explanatory variables for validation |
`XDATA` = pointers | Pointer containing the original explanatory variables in the same order as in `XVALIDATION`; default takes the variables in the order in which they occur in `TERMS` |
`NVALIDATION` = variates or scalars | Number of binomial trials for the units of each `YVALIDATION` variate; default 1 |
`BESTLAMBDA` = scalars | Saves the optimal lambda value from cross-validation |
`CVSTATISTICS` = matrices | Saves the cross-validation statistics |
`RESIDUALS` = variates | Saves residuals when `LAMBDA` is a scalar |
`FITTEDVALUES` = variates | Saves fitted values when `LAMBDA` is a scalar |
`ESTIMATES` = variates | Saves parameter estimates when `LAMBDA` is a scalar |
`SE` = variates | Saves standard errors of the parameter estimates when `LAMBDA` is a scalar |
`DEVIANCE` = scalars | Saves the residual deviance when `LAMBDA` is a scalar |
`LINEARPREDICTOR` = variates | Saves the linear predictor when `LAMBDA` is a scalar |

### Description

Procedure `LRIDGE` fits a logistic ridge regression model based on penalized likelihood inference, as explained in the *Method* section. The response variate is specified by the `Y` parameter. The `NBINOMIAL` parameter defines the number of binomial trials for each unit, with a default of one. If `NBINOMIAL` is greater than one, `LRIDGE` forms a modified copy of the data set in which each of the original observations is expanded into its underlying individuals (i.e. to have binary responses of either one or zero).

The model to fit is defined by the `TERMS` option. The `FACTORIAL` option sets a limit on the number of variates and/or factors in the model terms generated from the `TERMS` model formula, as in the `FIT` directive. The `LINK` option defines the link function. This can be either logit (the default), probit or complementary-log-log. The `DISPERSION` option specifies the dispersion parameter in the usual way: by default the parameter is fixed at one, or you can set `DISPERSION=*` to use a dispersion parameter estimated from the residual deviance.
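As a minimal sketch of how these settings combine, the call below fits a ridge model to grouped binomial data with a probit link, main effects only, and an estimated dispersion parameter. The data values and the identifiers `dose`, `age`, `n` and `r` are hypothetical, invented purely for illustration. Only options and parameters documented above are used.

```genstat
" Hypothetical grouped binomial data: r responders out of n subjects
  at each combination of dose and age (values invented for illustration)."
VARIATE [VALUES=1,2,3,4,5,6] dose
VARIATE [VALUES=20,35,30,45,25,50] age
VARIATE [VALUES=10,10,10,10,10,10] n
VARIATE [VALUES=1,3,4,6,7,9] r
" Probit link, main effects only (FACTORIAL=1), dispersion estimated
  from the residual deviance, and a single ridge parameter."
LRIDGE [PRINT=ridge,standarderrors; LINK=probit; DISPERSION=*;\
       TERMS=dose+age; FACTORIAL=1; LAMBDA=0.05]\
       Y=r; NBINOMIAL=n
```

Because `NBINOMIAL` is greater than one here, the fit would be made to the expanded binary copy of the data described above.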

Printed output is controlled by the `PRINT` option, with settings:

`correlation` | prints the correlations between the explanatory variables in the `TERMS` formula, |
---|---|
`crossvalidation` | prints the cross-validation results, with optimal lambda value, |
`ridge` | prints the ridge coefficients on the original scale, |
`scaledridge` | prints the ridge coefficients for the standardized data, and |
`standarderrors` | includes standard errors with coefficients printed by the `ridge` or `scaledridge` settings. |

Graphical output is controlled by the `PLOT` option:

`ridgetrace` | plots the coefficient estimates against lambda, showing how they decrease as lambda increases, |
---|---|
`buildup` | plots coefficient values against the coefficients divided by their maximum values, showing the relative decrease as lambda increases, and |
`correlation` | uses the `DCORRELATION` procedure to produce a graphical representation of the correlation matrix for elements in `TERMS`. |

The `LAMBDA` option allows you to define the values to try for the ridge parameter lambda (see *Method*). By default `LRIDGE` takes a range of values between 0 and 1. If you have set `LAMBDA` to a single value, you can save results from the analysis using the `RESIDUALS`, `FITTEDVALUES`, `ESTIMATES`, `SE`, `DEVIANCE` and `LINEARPREDICTOR` parameters. Note that the residuals are simple residuals, rather than standardized residuals.

`LRIDGE` can use cross-validation to find an optimal value of lambda. The `YVALIDATION`, `XVALIDATION` and `NVALIDATION` parameters allow you to supply an independent data set for validation. The `YVALIDATION` parameter specifies the response variate, the `NVALIDATION` parameter specifies the corresponding numbers of binomial trials (default 1), and the `XVALIDATION` parameter supplies a pointer containing values for the explanatory variables. `LRIDGE` needs to match the validation explanatory variables with the original variables in `TERMS`. You can define the correspondence explicitly by setting the `XDATA` parameter to a pointer containing the original variables in the same order as the corresponding variables in the `XVALIDATION` pointer. If `XDATA` is not set, `LRIDGE` forms the original list using `CLASSIFICATION` from the `FCLASSIFICATION` directive. The order of the variables should be easy to predict for straightforward `TERMS` models, but it is safest to specify `XDATA` explicitly for complicated models.
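The following sketch shows one way the validation parameters might be set up with an explicit `XDATA` pointer. It assumes that the training variates `y`, `x1`, `x2` and the validation variates `yv`, `x1v`, `x2v` already exist. All of these identifiers, and `xorig`, `xval`, `optlambda` and `cvstats`, are hypothetical names chosen for illustration. It also assumes that `CROSSVALIDATION=yes` is required to request the search for the optimal lambda when validation data are supplied.

```genstat
" Pointers listing the original and validation explanatory variables
  in matching order (x1 with x1v, x2 with x2v)."
POINTER [VALUES=x1,x2] xorig
POINTER [VALUES=x1v,x2v] xval
" Candidate values for the ridge parameter."
VARIATE [VALUES=0, 0.01, 0.02...0.1, 0.2, 0.3...1] lambda
" Validate against the independent data set, saving the optimal
  lambda and the cross-validation statistics."
LRIDGE [PRINT=crossvalidation; LAMBDA=lambda; CROSSVALIDATION=yes;\
       TERMS=x1+x2]\
       Y=y; YVALIDATION=yv; XVALIDATION=xval; XDATA=xorig;\
       BESTLAMBDA=optlambda; CVSTATISTICS=cvstats
PRINT optlambda
```

Setting `XDATA` explicitly, as here, avoids relying on the default ordering of the variables derived from `TERMS`.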

If you do not have an independent data set, `LRIDGE` can do the validation by selecting subsets of the original data set. The `NCROSSVALIDATIONGROUPS` option defines the number of subsets (default 10). The data set (modified to contain binary responses, as explained above, if `NBINOMIAL` is greater than one) is divided into that number of roughly equal-sized subsets. The model is fitted to the data set with each of these parts removed, in turn, and the prediction error is calculated for the omitted subset based on that fit. The method for calculating the prediction error is specified by the `CVMETHOD` option:

`deviance` | uses the deviance function (defined as twice the difference between the maximum log-likelihood and that achieved under the validation data), |
---|---|
`squarederror` | takes the sum of the squared differences between the validation data and the expected values, and |
`countingerror` | counts the number of “wrong” predictions in the validation data, i.e. if the value of the validation data was 1 and the expected probability was less than 0.5, the prediction would be considered to be wrong. |
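For binary validation responses, the three `CVMETHOD` settings can be written out roughly as follows. This is a sketch rather than the procedure's exact code. The symbols are introduced here for illustration: y_i is the 0/1 validation response of unit i, and p̂_i is the probability predicted for that unit from the fit that omitted it. The deviance line uses the fact that the maximum log-likelihood for binary data is zero. How a predicted probability of exactly 0.5 is counted, and any scaling applied before the errors are averaged, are not specified here.

```latex
\begin{aligned}
\texttt{deviance:}\quad
  & -2\sum_{i}\bigl[\,y_i\log\hat{p}_i + (1-y_i)\log(1-\hat{p}_i)\,\bigr] \\
\texttt{squarederror:}\quad
  & \sum_{i}\,(y_i-\hat{p}_i)^2 \\
\texttt{countingerror:}\quad
  & \sum_{i}\,\mathbf{1}\bigl\{(y_i=1 \text{ and } \hat{p}_i<0.5)
    \ \text{or}\ (y_i=0 \text{ and } \hat{p}_i>0.5)\bigr\}
\end{aligned}
```

The deviance and squared-error measures reward well-calibrated probabilities, whereas the counting error depends only on which side of 0.5 each prediction falls.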

The calculation of the prediction error is repeated for every value of the `LAMBDA` option. The value that minimizes the mean prediction error is taken as the optimal lambda, and can be saved by the `BESTLAMBDA` parameter. (You could then use `LRIDGE` again, with `LAMBDA` set to that value, and use the parameters `RESIDUALS`, `FITTEDVALUES` etc. to save information from the optimal analysis.)

Options: `PRINT`, `PLOT`, `LINK`, `DISPERSION`, `TERMS`, `FACTORIAL`, `LAMBDA`, `CROSSVALIDATION`, `NCROSSVALIDATIONGROUPS`, `CVMETHOD`, `SEED`.

Parameters: `Y`, `NBINOMIAL`, `YVALIDATION`, `XVALIDATION`, `XDATA`, `NVALIDATION`, `BESTLAMBDA`, `CVSTATISTICS`, `RESIDUALS`, `FITTEDVALUES`, `ESTIMATES`, `SE`, `DEVIANCE`, `LINEARPREDICTOR`.

### Method

Logistic ridge regression is carried out as described by le Cessie & van Houwelingen (1992). The usual log-likelihood for logistic regression is extended to include a penalty on the sum of squares of the parameter estimates *β*, namely λ × ∑*β*². When the ridge parameter, lambda, is equal to zero, the parameter estimates will be the usual maximum-likelihood estimates, whereas as lambda tends to infinity all of the parameters tend towards zero. The penalty term is applied by setting the `RIDGE` option of the `TERMS` directive. The columns of the design matrix in `TERMS` are standardized. However, estimated coefficients are available for both the standardized and unstandardized data.
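As a sketch of the criterion being maximized, le Cessie & van Houwelingen (1992) write the penalized log-likelihood in terms of the Euclidean norm of the parameter vector, ‖β‖ = √(∑β_j²), with the penalty being λ times the square of that norm. The notation below is introduced here for illustration and is not taken from the procedure: y_i is the binary response, x_i is the vector of standardized explanatory values, and g is the link function. Whether the constant term is included in the penalty is not stated here.

```latex
\ell_{\lambda}(\beta) \;=\; \ell(\beta) \;-\; \lambda\,\lVert\beta\rVert^{2},
\qquad
\lVert\beta\rVert^{2} \;=\; \sum_{j}\beta_{j}^{2},
\]
\[
\ell(\beta) \;=\; \sum_{i}\bigl[\,y_i\log p_i + (1-y_i)\log(1-p_i)\,\bigr],
\qquad
p_i \;=\; g^{-1}\!\bigl(x_i^{\mathsf{T}}\beta\bigr).
```

With λ = 0 the penalty vanishes and the criterion is the ordinary log-likelihood. As λ grows the penalty dominates and the maximizing β shrinks towards zero, matching the limiting behaviour described above.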

### Action with `RESTRICT`

There must be no restrictions.

### Reference

le Cessie, S. & van Houwelingen, J.C. (1992). Ridge estimators in logistic regression. *Applied Statistics*, 41, 191-202.

### See also

Commands for: Regression analysis.

### Example

    CAPTION 'LRIDGE example'; STYLE=meta
    " Data showing presence/absence of frogs in the Snowy Mountain area
      of New South Wales, Australia. See Maindonald & Braun (2007),
      Data Analysis and Graphics Using R, 2nd Edition."
    SPLOAD  '%GENDIR%/Examples/LRID-1.gsh'
    POINTER [VALUES=No_of_breeding_sites,altitude,average_rain,mean_max_temp,\
            mean_min_temp,log_No_of_pools,log_distance] xvars
    " Try a range of LAMBDA values, and select best by cross-validation."
    VARIATE [VALUES=0, 0.001, 0.002...0.01, 0.02, 0.03...0.1, 0.2, 0.3...1,\
            2...5] lambda
    LRIDGE  [PRINT=correlation,SCAL,ST,ridge; PLOT=ridgetrace,buildup,correlation;\
            LAMBDA=lambda; CROSSVALIDATION=yes; SEED=237819; TERMS=xvars[]]\
            Y=Present; BEST=optlambda
    PRINT   optlambda
    LRIDGE  [PRINT=*; LAMBDA=optlambda; TERMS=xvars[]]\
            Y=Present; ESTIMATES=estimates; SE=se; FITTED=prob
    PRINT   estimates,se
    PRINT   Present,prob