Calculates design matrices to fit a P-spline as a linear mixed model (S.J. Welham).
Options
NSEGMENTS = scalar |
Specifies the number of segments between boundaries; default * obtains a value automatically |
---|---|
DEGREE = scalar |
Degree of polynomial used to form the underlying spline basis functions; default 3 |
DIFFORDER = scalar |
Differencing order for penalty; default 2 |
LOWER = scalar |
Specifies the lower boundary; default takes the minimum value in X |
UPPER = scalar |
Specifies the upper boundary; default takes the maximum value in X |
ORTHOGONALIZETO = variate |
Variate to use to get an orthogonalized basis; default * i.e. orthogonalization with respect to X |
SCALING = scalar |
Scaling of the XRANDOM terms; (automatic , none ); default auto |
Parameters
X = variates |
The explanatory variate for which the basis functions are required |
---|---|
XFIXED = matrices |
Saves the design matrix to define the fixed terms (excluding the constant) for fitting the P-spline |
XRANDOM = matrices |
Saves the design matrix to define the random terms for fitting the P-spline |
KNOTS = variates |
Saves the internal knots and boundaries used to form the basis functions |
PX = variates |
Specifies x-values at which predictions are required |
PFIXED = matrices |
Saves the design matrix for the fixed terms (excluding the constant) for the spline at the prediction points |
PRANDOM = matrices |
Saves the design matrix for the random terms for the spline at the prediction points |
Description
This procedure generates the fixed and random terms required to fit a P-spline (Eilers & Marx, 1995) as a linear mixed model, using REML
estimation of the smoothing parameter. The explanatory variate values at which the spline is to be calculated are specified in a variate using the X
parameter. The full range of the spline can be specified by the LOWER
and UPPER
options; by default the lower limit is equal to the minimum value of X
and the upper limit is equal to the maximum value. The region between these bounds is divided into a number of equal segments, specified by the NSEGMENTS
option. The boundaries of these segments form the set of knots used to form the spline basis functions, and can be saved as a variate using the KNOTS
parameter. If NSEGMENTS
is unset, the number of segments is determined automatically as
min([p/4], 35) + 1
(Ruppert 2002) where p is the number of unique values of the variate X
and [r] denotes the integer part of the number r.
The DEGREE
option specifies the degree of polynomial that is used to form the underlying spline basis functions. The default, DEGREE=3
, gives a cubic spline.
The ORTHOGONALIZETO
option specifies a variate to use in orthogonalization. The set of random spline terms will then be orthogonal to the fixed terms when evaluated at the specified values. For most data sets, it is recommended to set ORTHOGONALIZETO
to the variate X
(the default). The random terms will then be orthogonal to the fixed terms, and fitted values corresponding to the fixed model will represent the whole of the polynomial trend in the fitted spline. For very large data sets, this calculation can be onerous and can be approximated by making the two bases orthogonal at the knots. No orthogonalization is carried out if ORTHOGONALIZETO
is set to a scalar value (e.g. ORTHOGONALIZETO=0
).
The spline terms are saved as two matrices. The terms required to be fitted as fixed terms can be saved using the XFIXED
parameter. This matrix does not include the constant term as this is added by default as part of a mixed model. When DIFFORDER
is set to one, this is a null term and no matrix will be returned. The terms to be fitted as random can be saved using the XRANDOM
parameter.
The random terms can be scaled so that, for a random spline matrix Z
,
TRACE(Z *+ T(Z)) = NROWS(Z)
This ensures that the average contribution of Z
to the variance of an observation is equal to one, and hence the overall contribution from the term is equal to the spline variance component. This removes possible computational instabilities, and improves interpretability of the spline variance component. This scaling is imposed by default, but can be avoided by setting option SCALING=none
.
The spline terms required for prediction via VPREDICT
can be saved using the PXFIXED
and PXRANDOM
parameters. The PX
parameter defines the set of x-values at which the predictions are to be made.
Options: NSEGMENTS
, DEGREE
, DIFFORDER
, LOWER
, UPPER
, ORTHOGONALIZETO
, SCALING
.
Parameters: X
, XFIXED
, XRANDOM
, KNOTS
, PX
, PXFIXED
, PXRANDOM
.
Method
The P-spline of degree k with differencing order d and r knots, evaluated on variate X
, minimizes the penalized sum of squares
(y – B α)ʹ R-1 (y – B α) + λ αʹ Δdʹ Δd α
where
B is a matrix of b = r + k + 1 B-spline basis functions of degree k with r equally-spaced knots evaluated at the values in X
,
α is a vector of r+k+1 unknown spline coefficients,
λ is a smoothing parameter, and
Δd is a (b – d) × b differencing matrix of order d.
This penalized sum of squares is reformulated as the estimating equations from a mixed model of the form
y = X τ + Z u + e
where
X is a design matrix containing k basis functions x{0…d-1}, with associated unknown parameters τ
e is a vector of residual errors with variance σ2R, and
Z = B U S-1
where
Δdʹ = U S Vʹ
is the design matrix for a set of (b – d) independently and identically distributed Normal random effects u = S Uʹ α with variance σs2I.
Fitting this mixed model, with known λ set equal to σ2/σs2, produces estimates that minimize the penalized sum of squares. In addition, we can estimate the smoothing parameter using REML
via the variance component σs2. This can be generalized straightforwardly to mixed models with additional fixed and random terms.
The implementation in this procedure allows the random design matrix to be orthogonalized with respect to the fixed design matrix at a given variate. For orthogonalization with respect to the variate X
, this is achieved by using random design matrix
Z* = (I – X(XʹX)-1Xʹ)Z
The entirety of the polynomial trend is then captured by the fixed model. Orthogonalization with respect to a variate t
is calculated as
Z* = Z – X(TʹT)-1Tʹ B(t
) U S-1
where T is a matrix holding t{0…k}, and B(t
) is the appropriate B-spline basis evaluated at t
.
When the random matrix is scaled so that trace(Z*Z*ʹ) is equal to the number of rows of Z*,
the average contribution of the spline term to the variance of each unit (σs2 × diag(Z*Z*ʹ)) is equal to σs2. This makes the spline variance component value directly comparable with the residual variance.
Note that the constant function is not included in the fixed design matrix generated by PSPLINE
, as this term is added automatically to the linear mixed model by the default option setting, CONSTANT=estimate
, in the VCOMPONENTS
statement.
The design matrices for use in prediction are calculated by evaluating the same set of basis functions at the predict points specified by the PX
option.
Action with RESTRICT
The input structures must not be restricted.
References
Currie, I.D. & Durban, M., (2002). Flexible smoothing with P-splines: a unified approach. Statistical Modelling, 2, 333-349.
Eilers, P.H.C. & Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89-121.
Ruppert, D. (2002). Selecting the number of knots for penalised splines. Computational & Graphical Statistics, 11, 735-757.
See also
Directive: VCOMPONENTS
.
Procedures: LSPLINE
, NCSPLINE
, PENSPLINE
, RADIALSPLINE
, TENSORSPLINE
, SPLINE
.
Function: SSPLINE
.
Commands for: Calculations and manipulation, Regression analysis, REML analysis of linear mixed models.
Example
CAPTION 'PSPLINE examples'; STYLE=meta " generate data " CALCULATE [SEED=23826] x = GRUNIFORM(30;0;100) & true = 10 - 0.005*(x-27)**2 - 2*SIN(2*C('pi')*x/40) & [SEED=0] y = true + GRNORMAL(NVAL(x);0;1) " pens for plotting lines " PEN 2,3; METH=LINE; SYMBOL=0; LINE=1 " plot data " DGRAPH [TITLE='Observed data'] y; x " generate P-spline basis for cubic spline, with second-order differencing using 10 segments, and generate design matrices to predict at single-unit intervals from 0 to 100 " VARIATE [VALUES=0...100] xpred PSPLINE [NSEGMENTS=10; LOWER=0; UPPER=100] x; XFIXED=X; XRANDOM=Z;\ KNOTS=knots; PX=xpred; PFIXED=PX; PRANDOM=PZ " plot random basis functions (held in columns of Z) - plotted lines do linear interpolation between values of x " DGRAPH [TITLE='Random basis functions evaluated at x'; KEY=0]\ Z$[*;1...11]; x; PEN=2 " check set of knots used " PRINT knots " fit cubic P-spline with d=2 as a linear mixed model " VCOMPONENTS [FIXED=X] RANDOM=Z; CONSTRAINT=positive REML y " save fitted values " VKEEP [FITTED=fv] " plot data with fitted values - curve not smooth due to gaps between observations and linear interpolation on plot " DGRAPH ['Data with interpolated fitted values, 2nd order differencing';\ KEY=0] y,fv; x; PEN=1,2 " predict at 1-unit intervals (using PX and PZ generated earlier) " VPREDICT [PREDICT=Tp] X,Z; LEVELS=PX,PZ; PARALLEL=*,X " extract and plot predicted curve with data " VTABLE Tp; VARIATE=Vp DGRAPH ['Data with fitted cubic spline, 2nd order differencing'; KEY=0]\ Vp,y; xpred,x; PEN=2,1 " for comparison, generate linear Pspline with first-order differencing note: fixed design matrix is not required when differencing order = 1 " PSPLINE [NSEGMENTS=10; LOWER=0; UPPER=100; DEGREE=1; DIFFORDER=1]\ x; XRANDOM=Zl; PX=xpred; PRANDOM=PZl " fit linear P-spline with d=1 as linear mixed model " VCOMPONENTS RANDOM=Zl; CONSTRAINT=positive REML y " predict at 1-unit intervals (using PZl generated earlier) " VPREDICT [PREDICT=Tpl] Zl; LEVELS=PZl " compare fitted curves: linear spline with first order differencing is much less smooth " VTABLE Tpl; VARIATE=Vpl FRAME [GRID=xy] 1 XAXIS 1; MARKS=knots DGRAPH ['Data with fitted linear spline, 1st order differencing'; KEY=0]\ Vpl,y; xpred,x; PEN=2,1