Performs orthogonal partial least squares regression (V. M. Cave).
Options
PRINT = string tokens | Printed output required (data, xloadings, yloadings, ploadings, scores, leverages, xerrors, yerrors, scree, xpercent, ypercent, predictions, groups, estimates, fittedvalues, summary); default esti, xper, yper, scor, xloa, yloa, ploa, summ |
---|---|
PCPRINT = string tokens | Controls printed output from principal components analysis of the orthogonal X matrix (loadings, roots, scores, tests); default root |
PLOT = string token | What graphs to plot (pcplot); default * (i.e. none) |
NORTHOGONALROOTS = scalar | Number of orthogonal components to extract; default 1 |
NROOTS = scalar | Number of predictive (i.e. PLS) components to extract; default 1 |
STANDARDIZE = string tokens | Whether to standardize the Y, X and filtered X variables to zero mean and unit variance (Y, X, filteredX); default * (i.e. no standardizing) |
NGROUPS = scalar | Number of cross-validation groups used by PLS; default 1 (i.e. no cross-validation performed) |
SEED = scalar or factor | A scalar indicating the seed value used for dividing the data randomly into NGROUPS groups for cross-validation by PLS, or a factor indicating a specific set of groupings to use for cross-validation by PLS; default 0 |
LABELS = text | Sample labels for X and Y to use in output; default uses the integers 1…n, where n is the length of the variates in X and Y |
PLABELS = text | Labels for XPREDICTIONS; default uses P1, P2 etc. |
PCMETHOD = string tokens | Method used by PCP to perform principal components analysis on the orthogonal X matrix (ssp, correlation, vcovariance, variancecovariance); default * (i.e. principal components analysis not performed) |
WINDOW = scalar | Window to use for the graph (available only when NORTHOGONALROOTS = 1); default 3 |
Parameters
Y = pointers | Pointer to variates containing the dependent variable(s) for each analysis |
---|---|
X = pointers | Pointer to variates containing the independent variables for each analysis |
YLOADINGS = pointers | Pointer to variates containing the Y component loadings for the predictive (i.e. PLS) dimensions, extracted from the filtered X matrix |
XLOADINGS = pointers | Pointer to variates containing the component loading weights for the predictive dimensions, extracted from the filtered X matrix |
PLOADINGS = pointers | Pointer to variates containing the bilinear model loadings for the predictive dimensions, extracted from the filtered X matrix |
YSCORES = pointers | Pointer to variates containing the Y component scores for each predictive dimension extracted from the filtered X matrix |
XSCORES = pointers | Pointer to variates containing the component scores for each predictive dimension extracted from the filtered X matrix |
B = diagonal matrices | Saves the regression coefficients of YSCORES on XSCORES for the predictive dimensions extracted from the filtered X matrix |
YPREDICTIONS = pointers | Pointer to variates used to store predicted y-values for samples in the prediction set |
XPREDICTIONS = pointers | Pointer to variates containing data for the independent variables in the prediction set |
ESTIMATES = matrices | An nX+1 by nY matrix (where nX and nY are the numbers of variates contained in X and Y, respectively) to store the PLS regression coefficients |
FITTEDVALUES = pointers | Pointer to variates used to store the fitted values for the Y variates |
LEVERAGES = variates | Variate to store the leverage that each sample has on the PLS model |
PRESS = variates | Variate used to store the Predictive Residual Error Sum of Squares for each dimension in the PLS model; available only if cross-validation has been selected |
RSS = variates | Variate to save the residual sums of squares |
YRESIDUALS = pointers | Pointer to variates containing the residuals from the Y block after NROOTS predictive dimensions have been extracted, uncorrected for any scaling applied using STANDARDIZE |
XRESIDUALS = pointers | Pointer to variates containing the residuals from the X block after NROOTS predictive dimensions have been extracted, uncorrected for any scaling applied using STANDARDIZE |
PCSCORES = matrices | Matrix to save the principal component scores |
PCSAVE = pointers | Pointer to save structures from the principal components analysis (by PCP) of the orthogonal X matrix |
SAVE = pointers | Pointer to save structures from the orthogonal projection |
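To illustrate how an nX+1 by nY ESTIMATES matrix can be used, the Python sketch below computes predictions for one sample. It assumes, for illustration only, that the first row holds the constant (intercept) terms and the remaining nX rows hold the coefficients of the X variates in order, with one column per Y variate; the exact row order should be checked against the PLS documentation. The function name predict_from_estimates is hypothetical.

```python
def predict_from_estimates(estimates, x_row):
    """Predict every Y variate for one sample from an (nX+1) x nY matrix.

    Assumed layout (illustrative only): row 0 holds the constants and
    rows 1..nX hold the coefficients of the X variates, one column per Y.
    """
    n_x = len(estimates) - 1
    if len(x_row) != n_x:
        raise ValueError("x_row needs one value per X variate")
    n_y = len(estimates[0])
    return [
        estimates[0][j] + sum(estimates[i + 1][j] * x_row[i] for i in range(n_x))
        for j in range(n_y)
    ]


# Hypothetical coefficients for nX = 2 X variates and nY = 1 Y variate.
estimates = [[1.0], [0.5], [-2.0]]
print(predict_from_estimates(estimates, [4.0, 1.0]))  # [1.0]
```

The extra row is what makes the matrix nX+1 rather than nX rows long: it carries the constant term for each Y variate.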
Description
OPLS performs orthogonal partial least squares (O-PLS) regression.
Variation in X that is orthogonal (i.e. uncorrelated) to Y may disturb PLS modelling, complicating the model interpretation. O-PLS combines PLS with a pre-processing step that filters out systematic variation in X, orthogonal to Y, that disturbs the PLS model. To improve model interpretation, the variation explained by each regular PLS component is partitioned into two parts:
1) variation linearly related to Y (i.e. predictive) and
2) variation orthogonal to Y.
The resulting O-PLS model takes the form:
$$X = TP^{\mathsf{T}} + T_{\mathrm{ortho}}P_{\mathrm{ortho}}^{\mathsf{T}} + E$$
$$Y = TC^{\mathsf{T}} + F$$
where $T = XW$ and $T_{\mathrm{ortho}} = XW_{\mathrm{ortho}}$. The predictive variation in X is modelled by the matrices $T$, $W$ and $P$, whose columns contain the predictive component scores, loading weights and loadings, respectively. The orthogonal variation is modelled by the analogous matrices $T_{\mathrm{ortho}}$, $W_{\mathrm{ortho}}$ and $P_{\mathrm{ortho}}$, whose columns contain the orthogonal component scores, loading weights and loadings, respectively. The columns of matrix $C$ contain the Y-loadings, and $E$ and $F$ are the residual matrices.
The number of predictive components used to model the predictive variation is specified by the NROOTS option; default 1. The number of orthogonal components used to model the orthogonal variation is specified by the NORTHOGONALROOTS option; default 1. The OPLS procedure also enables the orthogonal variation to be explored further through principal components analysis.
In practice, the OPLS procedure removes Y-orthogonal variation from X to form a filtered X matrix ($X_{\mathrm{filtered}}$). A PLS model is then fitted to $X_{\mathrm{filtered}}$ using the PLS procedure.
The dependent and independent variates are supplied using the Y and X parameters, respectively, as pointers containing a variate for each dimension. The Y and X variates must not contain missing values. A pointer of variates containing new X data, for which predictions are desired, can be specified by the XPREDICTIONS parameter. Sample labels for X and XPREDICTIONS can be provided by using the LABELS and PLABELS options, respectively.
The STANDARDIZE option controls whether the Y, X and filtered X variables are standardized to zero mean and unit variance prior to analysis. The Y variables are standardized prior to orthogonal projection and PLS analysis, the X variables are standardized prior to orthogonal projection, and the filtered X variables are standardized prior to modelling by PLS. By default, none of these are standardized. Note, however, that all variables are automatically centred prior to the PLS analysis, even if no standardization is requested.
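The centring that is always applied, and the standardization requested by STANDARDIZE, can be sketched in Python as follows. This is an illustration only: it assumes that standardizing divides by the sample standard deviation (divisor n − 1), which may differ from the divisor used inside OPLS and PLS, and the function names are hypothetical.

```python
import statistics


def centre(values):
    """Subtract the mean (the centring applied before the PLS analysis)."""
    mean = statistics.mean(values)
    return [v - mean for v in values]


def standardize(values):
    """Scale to zero mean and unit variance.

    Assumption: the sample standard deviation (divisor n - 1) is used.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant variate")
    return [(v - mean) / sd for v in values]


print(centre([1.0, 2.0, 6.0]))       # [-2.0, -1.0, 3.0]
print(standardize([2.0, 4.0, 6.0]))  # [-1.0, 0.0, 1.0]
```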
The SAVE parameter can supply a pointer to store structures from the orthogonal projection. The labels of the pointer, and their corresponding information, are as follows:
w_ortho | orthogonal component loading weights |
---|---|
t_ortho | orthogonal component scores |
p_ortho | orthogonal loadings |
X_filtered | filtered X matrix, with the orthogonal variation removed |
X_ortho | matrix containing the orthogonal variation |
Xpred_filtered | filtered prediction X matrix, with the orthogonal variation removed |
Xpred_ortho | matrix containing the orthogonal variation of the prediction X matrix |
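For a single orthogonal component, the relations given by Trygg & Wold (2002) are $t_{\mathrm{ortho}} = Xw_{\mathrm{ortho}}$, $X_{\mathrm{ortho}} = t_{\mathrm{ortho}}p_{\mathrm{ortho}}^{\mathsf{T}}$ and $X_{\mathrm{filtered}} = X - X_{\mathrm{ortho}}$. The Python sketch below applies the same decomposition to new samples, giving the analogues of Xpred_ortho and Xpred_filtered. It is a minimal sketch assuming one orthogonal component and prediction data that have already been centred (and scaled, if requested) in the same way as X; with several orthogonal components the step would be repeated for each one in turn. The function name split_new_samples is hypothetical.

```python
def split_new_samples(x_pred, w_ortho, p_ortho):
    """Remove one orthogonal component from new samples.

    x_pred: rows of prediction data, assumed centred/scaled like the training X.
    Returns (xpred_filtered, xpred_ortho), whose sum reproduces x_pred.
    """
    # Orthogonal score of each new sample: t = x . w_ortho
    t_new = [sum(x * w for x, w in zip(row, w_ortho)) for row in x_pred]
    # Orthogonal part: t * p_ortho^T, one row per sample
    xpred_ortho = [[t * p for p in p_ortho] for t in t_new]
    # Filtered part: what remains once the orthogonal part is removed
    xpred_filtered = [
        [x - o for x, o in zip(row, o_row)] for row, o_row in zip(x_pred, xpred_ortho)
    ]
    return xpred_filtered, xpred_ortho


# Hypothetical saved weights and loadings for a two-variate X.
filtered, ortho = split_new_samples([[3.0, 4.0]], w_ortho=[0.6, 0.8], p_ortho=[1.0, 2.0])
print(ortho, filtered)
```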
The NGROUPS and SEED options control cross-validation by the PLS procedure. The parameters YLOADINGS, XLOADINGS, PLOADINGS, YSCORES, XSCORES, B, YPREDICTIONS, ESTIMATES, FITTEDVALUES, LEVERAGES, PRESS, RSS, YRESIDUALS and XRESIDUALS allow output from the PLS procedure (i.e. from modelling the predictive variation) to be saved.
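Cross-validation of this kind can be pictured with the Python sketch below: samples are shuffled with a seeded generator and dealt into NGROUPS groups, and PRESS accumulates the squared errors obtained when each group is predicted from a model fitted to the remaining groups. This is an illustration only. Python's random number generator will not reproduce GenStat's groupings for the same SEED, and a hypothetical mean-only model (fit_mean) stands in for the PLS fit; the function names are also hypothetical. When NGROUPS equals the number of samples, each group holds a single sample, which gives leave-one-out cross-validation.

```python
import random


def cv_groups(n, ngroups, seed):
    """Deal n shuffled samples into ngroups groups labelled 0..ngroups-1."""
    order = list(range(n))
    random.Random(seed).shuffle(order)  # Python's RNG, not GenStat's
    groups = [0] * n
    for position, sample in enumerate(order):
        groups[sample] = position % ngroups
    return groups


def press(x, y, groups, fit):
    """Predictive residual error sum of squares over all cross-validation groups."""
    if len(set(groups)) < 2:
        raise ValueError("cross-validation needs at least two groups")
    total = 0.0
    for g in sorted(set(groups)):
        train = [i for i, gi in enumerate(groups) if gi != g]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i, gi in enumerate(groups):
            if gi == g:  # predict the held-out group
                total += (y[i] - model(x[i])) ** 2
    return total


def fit_mean(xs, ys):
    """Hypothetical stand-in for the PLS fit: always predicts the training mean."""
    mean = sum(ys) / len(ys)
    return lambda _x: mean


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 4.0]
groups = cv_groups(len(y), ngroups=len(y), seed=38639)  # leave-one-out
print(press(x, y, groups, fit_mean))  # about 8.89 (= 80/9)
```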
Printed output is controlled by the PRINT option. Almost all of the settings are the same as those of the PLS procedure, and are used in exactly the same way. However, there is an additional setting, summary, which summarizes the percentage of variation in X explained by each orthogonal and predictive (i.e. PLS) component.
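One common way to express the percentage of variation in X explained by a component with scores $t$ and loadings $p$ is the sum of squares of its rank-one term $tp^{\mathsf{T}}$ as a percentage of the total sum of squares of the centred X matrix. The Python sketch below computes that quantity; it is an assumption, not confirmed by this page, that the summary setting uses exactly this definition. The function name percent_explained is hypothetical.

```python
def percent_explained(x_centred, t, p):
    """Percentage of the total sum of squares of centred X captured by t p^T."""
    total = sum(v * v for row in x_centred for v in row)
    captured = sum((ti * pj) ** 2 for ti in t for pj in p)
    return 100.0 * captured / total


# Hypothetical centred X whose two columns carry equal variation.
x = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(percent_explained(x, t=[1.0, -1.0, 0.0, 0.0], p=[1.0, 0.0]))  # 50.0
```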
You can set the PCMETHOD option to request a principal components analysis to decompose the matrix of orthogonal variation (see X_ortho above), and to specify the method to use. Its settings are the same as those of the METHOD option of the PCP directive. Printed output is controlled by the PCPRINT option, which operates exactly as the PRINT option of the PCP directive. The PCSAVE parameter can supply a pointer to store details from the analysis. You can set option PLOT = pcplot to produce a score plot; by default, no plot is produced. When NORTHOGONALROOTS = 1, the WINDOW option can be used to control the window to be used for the plot; default 3.
Options: PRINT, PCPRINT, PLOT, NORTHOGONALROOTS, NROOTS, STANDARDIZE, NGROUPS, SEED, LABELS, PLABELS, PCMETHOD, WINDOW.
Parameters: Y, X, YLOADINGS, XLOADINGS, PLOADINGS, YSCORES, XSCORES, B, YPREDICTIONS, XPREDICTIONS, ESTIMATES, FITTEDVALUES, LEVERAGES, PRESS, RSS, YRESIDUALS, XRESIDUALS, PCSCORES, PCSAVE, SAVE.
Method
OPLS uses the methodology of Trygg & Wold (2002), applying the algorithm described in Biagioni et al. (2011), to remove variation from X that is uncorrelated with Y. OPLS then calls the PLS procedure to fit a PLS model to the filtered (i.e. pre-treated) matrix, from which the orthogonal variation has been removed.
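The filtering step can be sketched for a single y variate by following the steps published by Trygg & Wold (2002). This is not the GenStat code, which follows the algorithm of Biagioni et al. (2011); it is a minimal pure-Python sketch assuming that X and y are already centred, and all function names are hypothetical. The predictive loading weights $w \propto X^{\mathsf{T}}y$ are computed once; each pass then takes the part of the predictive loadings $p$ that is orthogonal to $w$ as $w_{\mathrm{ortho}}$, and removes the component $t_{\mathrm{ortho}}p_{\mathrm{ortho}}^{\mathsf{T}}$ from X.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def times(X, v):
    """X v for X stored as a list of rows."""
    return [dot(row, v) for row in X]


def t_times(X, u):
    """X^T u for X stored as a list of rows."""
    return [sum(row[j] * ui for row, ui in zip(X, u)) for j in range(len(X[0]))]


def unit(v):
    norm = math.sqrt(dot(v, v))
    if norm < 1e-12:
        raise ValueError("no Y-orthogonal variation left to remove")
    return [vi / norm for vi in v]


def opls_filter(X, y, n_ortho=1):
    """Remove n_ortho Y-orthogonal components from centred X, for one centred y.

    Returns the filtered X and a list of (w_ortho, t_ortho, p_ortho) tuples.
    """
    E = [row[:] for row in X]
    w = unit(t_times(E, y))                      # predictive loading weights
    components = []
    for _ in range(n_ortho):
        t = times(E, w)                          # predictive scores
        tt = dot(t, t)
        p = [v / tt for v in t_times(E, t)]      # predictive loadings
        wp = dot(w, p)
        w_o = unit([pj - wp * wj for pj, wj in zip(p, w)])  # part of p orthogonal to w
        t_o = times(E, w_o)                      # orthogonal scores
        to_to = dot(t_o, t_o)
        p_o = [v / to_to for v in t_times(E, t_o)]  # orthogonal loadings
        E = [[e - ti * pj for e, pj in zip(row, p_o)] for row, ti in zip(E, t_o)]
        components.append((w_o, t_o, p_o))
    return E, components


X = [[1.0, 2.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -2.0]]  # columns already centred
y = [1.0, -1.0, 1.0, -1.0]
X_filtered, [(w_o, t_o, p_o)] = opls_filter(X, y)
print(abs(dot(t_o, y)) < 1e-9)  # True: orthogonal scores are uncorrelated with y
```

Because $w_{\mathrm{ortho}}$ is orthogonal to $w$, and $w$ is proportional to $X^{\mathsf{T}}y$, the orthogonal scores satisfy $y^{\mathsf{T}}t_{\mathrm{ortho}} = 0$, so removing them leaves $X^{\mathsf{T}}y$ unchanged. A PLS model would then be fitted to the returned filtered matrix.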
To perform the principal components analysis on the matrix of orthogonal variation, OPLS uses the PCP directive, taking the setting for its METHOD option from the PCMETHOD option, and the setting for its NROOTS option from the NORTHOGONALROOTS option. When there is only one root, the score plot, which can be requested by setting option PLOT = pcplot, is produced by the DOTHISTOGRAM procedure. When there are several roots, it is produced by the DMSCATTER procedure. If the XPREDICTIONS parameter is set, principal component scores for the samples in the prediction set are estimated as described by Trygg & Wold (2002), and plotted in red.
Action with RESTRICT
OPLS will work with restricted variates, fitting an O-PLS model to the subset of objects formed by the restriction. The subset can be defined by restricting any of the X or Y variates. However, if more than one variate is restricted, they must be restricted in the same way. Note that the unrestricted length of all of the data variates must be the same, and the number of samples in the restricted subset must be at least three. Any restrictions on a text supplied for the LABELS option, or on a factor for the SEED option, are ignored.
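The restriction rules above can be expressed as a short check. The Python sketch below is a hypothetical illustration of those rules, not GenStat's code: each restriction is represented as a list of booleans (True means the unit is included), and the function returns the indices of the units to analyse, or raises an error when a rule is broken.

```python
def restricted_units(lengths, masks):
    """Return the indices of the units to analyse under the restriction rules.

    lengths: unrestricted lengths of all the X and Y variates.
    masks: one list of booleans (True = unit included) per restricted variate.
    """
    if len(set(lengths)) != 1:
        raise ValueError("all data variates must have the same unrestricted length")
    n = lengths[0]
    distinct = {tuple(mask) for mask in masks}
    if len(distinct) > 1:
        raise ValueError("restricted variates must be restricted in the same way")
    keep = distinct.pop() if distinct else (True,) * n  # no restriction: use all units
    if len(keep) != n:
        raise ValueError("a restriction must cover every unit")
    units = [i for i, included in enumerate(keep) if included]
    if len(units) < 3:
        raise ValueError("the restricted subset must contain at least three samples")
    return units


print(restricted_units([6, 6, 6], [[True, False, True, True, False, True]]))  # [0, 2, 3, 5]
```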
When restricted data are supplied, and LABELS are also given, the appropriate subset of labels will appear in the output; if LABELS are not defined, then default labels reflecting the position in the restricted data are used.
No restrictions are allowed on the variates supplied by the XPREDICTIONS parameter, or on the text supplied by the PLABELS option.
References
Biagioni, D.J., Astling, D.P., Graf, P. & Davis, M.F. (2011). Orthogonal projections to latent structures solutions properties for chemometrics and systems biology. Journal of Chemometrics, 25, 514-525.
Trygg, J. & Wold, S. (2002). Orthogonal projections to latent structures (O-PLS). Journal of Chemometrics, 16, 119-128.
See also
Procedure: PLS.
Commands for: Multivariate and cluster analysis, Regression analysis.
Example
```
CAPTION 'OPLS example',!t('The data are 24 calibration samples used to',\
        'determine the protein content of wheat from spectroscopic readings',\
        'at six different wavelengths.'),\
        !t('Fearn, T. (1983), Applied Statistics, 32, 73-79.');\
        STYLE=meta,plain,plain
VARIATE [NVALUES=24] L[1...6],%Protein
READ L[1...6],%Protein
468 123 246 374 386 -11 9.23
458 112 236 368 383 -15 8.01
457 118 240 359 353 -16 10.95
450 115 236 352 340 -15 11.67
464 119 243 366 371 -16 10.41
499 147 273 404 433 5 9.51
463 119 242 370 377 -12 8.67
462 115 238 370 353 -13 7.75
488 134 258 393 377 -5 8.05
483 141 264 384 398 -2 11.39
463 120 243 367 378 -13 9.95
456 111 233 365 365 -15 8.25
512 161 288 415 443 12 10.57
518 167 293 421 450 19 10.23
552 197 324 448 467 32 11.87
497 146 271 407 451 11 8.09
592 229 360 484 524 51 12.55
501 150 274 406 407 11 8.38
483 137 260 385 374 -3 9.64
491 147 269 389 391 1 11.35
463 121 242 366 353 -13 9.70
507 159 285 410 445 13 10.75
474 132 255 376 383 -7 10.75
496 152 276 396 404 6 11.47 :
" Extract two orthogonal components before fitting a one dimensional PLS model
  to the standardized data with leave-one-out cross-validation. Principal
  components analysis is performed on the orthogonal variation."
OPLS [PRINT=summary,estimate,xpercent,ypercent,xloadings,yloadings,ploadings;\
     PCPRINT=loadings,roots,scores,tests; PLOT=pcplot; NORTHOGONALROOTS=2;\
     NROOTS=1; STANDARDIZE=X,Y; NGROUPS=24; SEED=38639; PCMETHOD=correlation]\
     Y=!p(%Protein); X=L
```