Calculates design matrices to fit a penalized spline as a linear mixed model (S.J. Welham).

### Options

`KMETHOD` = string token |
Method for constructing the set of knots (`equal` , `quantile` , `given` ); default equa |
---|---|

`NSEGMENTS` = scalar |
Specifies the number of segments between boundaries; default `*` obtains a value automatically |

`INKNOTS` = variate |
Provides the set of knots when `KMETHOD=given` |

`DEGREE` = scalar |
Degree of polynomial used to form the underlying spline basis functions; default 1 |

`LOWER` = scalar |
Specifies the lower boundary when `KMETHOD=equal` ; default takes the minimum value in `X` |

`UPPER` = scalar |
Specifies the upper boundary when `KMETHOD=equal` ; default takes the maximum value in `X` |

`ORTHOGONALIZETO` = variate |
Variate to use to get an orthogonalized basis; default `*` i.e. orthogonalization with respect to `X` |

`SCALING` = scalar |
Scaling of the `XRANDOM` terms (`automatic` , `none` ); default `auto` |

### Parameters

`X` = variates |
The explanatory variate for which the spline values are required |
---|---|

`XFIXED` = matrices |
Saves the design matrix to define the fixed terms (excluding the constant) for fitting the penalized spline |

`XRANDOM` = matrices |
Saves the design matrix to define the random terms for fitting the penalized spline |

`KNOTS` = variates |
Saves the internal knots and boundaries used to form the basis for the spline |

`PX` = variates |
Specifies x-values at which predictions are required |

`PFIXED` = matrices |
Saves the design matrix for the fixed terms (excluding the constant) for the spline at the prediction points |

`PRANDOM` = matrices |
Saves the design matrix for the random terms for the spline at the prediction points |

### Description

This procedure generates the fixed and random terms required to fit a penalized spline (Ruppert, Wand & Carroll 2003) as a linear mixed model, using `REML`

estimation of the smoothing parameter. The explanatory variate values at which the spline is to be calculated are specified, in a variate, by the `X`

parameter.

The `KMETHOD`

option specifies how to choose the set of knots for the penalized spline, using settings:

`equal` |
splits the range of `X` into segments of equal length (default), |
---|---|

`quantiles` |
defines the set of knots in terms of equally-spaced quantiles of `X` , |

`given` |
indicates that the knots will be supplied, in a variate, by the `INKNOTS` option. |

The number of segments or quantiles for the `equal`

and `quantile`

settings is specified by the `NSEGMENTS`

option. If this is unset, the number is determined automatically as

min([*p*/4], 35) + 1

(Ruppert 2002) where *p* is the number of unique values of the variate `X`

and [*r*] denotes the integer part of the number *r*. The lower and upper boundaries for equal segments are specified by the `LOWER`

and `UPPER`

options, respectively, taking the minimum and maximum values of `X`

as their defaults. The set of knots used to form the spline basis can be saved using the `KNOTS`

parameter.

The `DEGREE`

option specifies the degree of polynomial that is used to form the underlying spline basis functions. The default, `DEGREE=1`

, gives a linear penalized spline.

The `ORTHOGONALIZETO`

option specifies a variate to use in orthogonalization. The set of random spline terms will then be orthogonal to the fixed terms when evaluated at the specified values. For most data sets, it is recommended to set `ORTHOGONALIZETO`

to the variate `X`

(the default). The random terms will then be orthogonal to the fixed terms, and fitted values corresponding to the fixed model will represent the whole of the polynomial trend in the fitted spline. For very large data sets, this calculation can be onerous and can be approximated by making the two bases orthogonal at the knots. No orthogonalization is carried out if `ORTHOGONALIZETO`

is set to a scalar value (e.g. `ORTHOGONALIZETO=0`

).

The penalized spline terms are saved as two matrices. The terms required to be fitted as fixed terms can be saved using the `XFIXED`

parameter. This matrix does not include the constant term as this is added by default as part of a mixed model. The terms to be fitted as random can be saved using the `XRANDOM`

parameter.

The random terms can be scaled so that, for a random spline matrix `Z`

,

`TRACE(Z *+ T(Z)) = NROWS(Z)`

This ensures that the average contribution of `Z`

to the variance of an observation is equal to one, and hence the overall contribution from the term is equal to the spline variance component. This removes possible computational instabilities, and improves interpretability of the spline variance component. This scaling is imposed by default, but can be avoided by setting option `SCALING=none`

.

The penalized spline terms required for prediction can be saved using the `PXFIXED`

and `PXRANDOM`

parameters. The `PX`

parameter defines the set of x-values at which the predictions are to be made.

Options: `KMETHOD`

, `NSEGMENTS`

, `INKNOTS`

, `DEGREE`

, `LOWER`

, `UPPER`

, `ORTHOGONALIZETO`

, `SCALING`

.

Parameters: `X`

, `XFIXED`

, `XRANDOM`

, `KNOTS`

, `PX`

, `PXFIXED`

, `PXRANDOM`

.

### Method

The penalized spline of degree *k* and *r* knots, evaluated on variate `X`

, minimizes the penalized sum of squares

(*y* – *X* τ – *Z* *u*)ʹ *R*^{-1} (*y* – *X* τ – *Z* *u*) + λ *u*ʹ *u*

where

*X* is a design matrix containing *k* basis functions *x*^{{0…k}}, with associated unknown parameters τ;

*Z* is a design matrix containing *r* truncated power basis (TPF) functions with associated unknown effects *u*.

The TPF function of degree *k* at knot *t _{j}* takes the form (

*X*–

*t*)

_{j}_{+}

^{k}, where

*x*_{+} = *x* for *x* > 0

= 0 otherwise.

This penalized sum of squares is reformulated as the estimating equations from a mixed model of the form

*y* = *X* τ + *Z* *u* + *e*

where

*u* is a set of *r* independently and identically distributed Normal random effects with variance σ_{s}^{2}*I*

*e* is a vector of residual errors with variance σ^{2}*R*.

Fitting this mixed model with known λ set equal to σ^{2}/σ_{s}^{2} produces estimates that minimize the penalized sum of squares. In addition, we can estimate the smoothing parameter using `REML`

via the variance component σ_{s}^{2}. This can be generalized straightforwardly to mixed models with additional fixed and random terms.

The implementation in this procedure allows the random design matrix to be orthogonalized with respect to the fixed design matrix at a given variate. For orthogonalization with respect to the variate `X`

, this is achieved by using random design matrix

*Z*^{*} = (I – *X*(*X*ʹ*X*)^{-1}*X*ʹ)Z

The entirety of the polynomial trend is then captured by the fixed model. Orthogonalization with respect to a variate `t`

is calculated as

*Z*^{*} = *Z* – *X*(*T*ʹ*T*)^{-1}*T*ʹ *P*(`t`

)

where *T* is a matrix holding `t`

^{{0…k}} and *P*(`t`

) is the appropriate TPF basis evaluated at `t`

.

When the random matrix is scaled so that trace(*Z*^{*}*Z*^{*}ʹ) is equal to the number of row of *Z*^{*},

the average contribution of the spline term to the variance of each unit (σ_{s}^{2} × diag(*Z*^{*}*Z*^{*}ʹ)) is equal to σ_{s}^{2}. This makes the spline variance component value directly comparable with the residual variance.

Note that the constant function is not included in the fixed design matrix generated by `PENSPLINE`

, as this term is added automatically to the linear mixed model by the default option setting, `CONSTANT=estimate`

, in the `VCOMPONENTS`

statement.

The design matrices for use in prediction are calculated by evaluating the same set of basis functions at the predict points.

### Action with `RESTRICT`

Input structures must not be restricted.

### References

Ruppert, D. (2002). Selecting the number of knots for penalised splines. *Computational & Graphical Statistics*, 11, 735-757.

Ruppert, D., Wand, M.P. & Carroll, R.J. (2003). *Semiparametric Regression*. Cambridge University Press, Cambridge.

### See also

Directive: `VCOMPONENTS`

.

Procedures: `SPLINE`

, `LSPLINE`

, `NCSPLINE`

, `PSPLINE`

, `RADIALSPLINE`

, `TENSORSPLINE`

.

Function: `SSPLINE`

.

Commands for: Calculations and manipulation, Regression analysis, REML analysis of linear mixed models.

### Example

CAPTION 'PENSPLINE examples'; STYLE=meta " generate data " CALCULATE [SEED=23826] x = GRUNIFORM(30; 0; 100) & true = 10 - 0.005*(x-27)**2 - 2*SIN(2*C('pi')*x/40) & [SEED=0] y = true + GRNORMAL(NVAL(x); 0; 1) " pens for plotting lines " PEN 2,3; METH=LINE; SYMBOL=0; LINE=1 " plot data " DGRAPH [TITLE='Observed data'] y; x " generate linear penalized spline basis using 10 segments and design matrices to predict at single-unit intervals from 0 to 100 " VARIATE [VALUES=0...100] xpred PENSPLINE [NSEGMENTS=10; LOWER=0; UPPER=100] x; XFIXED=X; XRANDOM=Z;\ KNOTS=knots; PX=xpred; PFIXED=PX; PRANDOM=PZ " plot random basis functions (held in columns of Z) " DGRAPH [TITLE='Random basis functions for penalized spline'; KEY=0]\ Z$[*;1...9]; x; PEN=2 " check set of knots used " PRINT knots " fit penalized spline as a linear mixed model " VCOMPONENTS [FIXED=X] RANDOM=Z; CONSTRAINT=positive REML y " save fitted values " VKEEP [FITTED=fv] " plot data with fitted values - curve not smooth due to gaps between observations and linear interpolation on plot " DGRAPH ['Data with interpolated linear penalized spline'; KEY=0]\ y,fv; x; PEN=1,2 " predict at 1-unit intervals (using PX and PZ generated earlier) " VPREDICT [PREDICT=Tp] X,Z; LEVELS=PX,PZ; PARALLEL=*,X " extract and plot predicted curve with data " VTABLE Tp; VARIATE=Vp FRAME [GRID=xy] 1 XAXIS 1; MARKS=knots DGRAPH ['Data with fitted linear penalized spline'; KEY=0]\ Vp,y; xpred,x; PEN=2,1 " for comparison, generate quadratic penalized spline " PENSPLINE [NSEGMENTS=10; LOWER=0; UPPER=100; DEGREE=2]\ x; XFIXED=Xq; XRANDOM=Zq; PX=xpred; PFIXED=PXq; PRANDOM=PZq " fit quadratic penalized spline as linear mixed model " VCOMPONENTS [FIXED=Xq] RANDOM=Zq; CONSTRAINT=positive REML y " predict at 1-unit intervals (using PX and PZ generated earlier) " VPREDICT [PREDICT=Tpq] Xq,Zq; LEVELS=PXq,PZq; PARALLEL=Xq " compare fitted curves: linear spline is much less smooth " VTABLE Tpq; VARIATE=Vpq DGRAPH ['Data with fitted quadratic penalized spline'; KEY=0]\ Vpq,y; xpred,x; PEN=2,1