Estimates the parameter lambda of a single parameter transformation (D.M. Smith).

### Options

| Option | Description |
|---|---|
| `TRANSFORM` = string token | Type of transformation (`power`, `modulus`, `foldedpower`, `GuerreroJohnson`, `Aranda1`, `Aranda2`, `powerlogit`); default `powe` |
| `METHOD` = string tokens | Method of evaluating transformation parameter lambda (`Atkinson`, `Andrews`, `BoxCox`, `Robust`); default `boxc` |
| `K` = scalar | Cut-off value for robust method; default `*` |
| `LOWER` = scalar | Lower limit of range of lambda; default `*` |
| `UPPER` = scalar | Upper limit of range of lambda; default `*` |
| `STEPLENGTH` = scalar | Increment of lambda; default (`UPPER` – `LOWER`)/20 |
| `LAMBDA` = scalar | Single value of lambda; default `*` |
| `FVBOUND` = string token | Replace illegal fitted values by the corresponding boundary values (`no`, `yes`); default `no` |
| `GRAPHICS` = string token | What sort of graphics to use (`lineprinter`, `highresolution`); default `high` |
| `TERMS` = formula | Terms of model |

### Parameters

| Parameter | Description |
|---|---|
| `Y` = variates | Response variate |
| `NBINOMIAL` = variates | Denominator for a binomial variate |
| `SAVE` = pointers | Structures to save the output |

### Description

This procedure evaluates the “best” value of the transformation parameter (lambda) for a range of single-parameter transformations. It offers four methods of evaluation and seven families of transformations. If a range of values of lambda is input (using the `LOWER` and `UPPER` options), plots are produced of either an F statistic or a log likelihood on the y-axis against lambda on the x-axis. For the Atkinson and Andrews methods it is an F statistic, whereas for the Box-Cox and robust methods it is a log likelihood. The interval (of lambda) at which the plotted functions are evaluated can be controlled by the `STEPLENGTH` option. A list of methods is allowed, and the plots are all produced on the same screen to make comparison easy. By default these are high-resolution plots. Setting option `GRAPHICS=lineprinter` generates line-printer style (character) plots (one per page), and setting `GRAPHICS=*` suppresses the plots altogether. If a single value of lambda is input (using the `LAMBDA` option), no graphical display is produced.
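To make the default spacing concrete, the following Python sketch (an illustration outside Genstat, not the procedure's own code) lists the lambda values implied by `LOWER`, `UPPER` and `STEPLENGTH`. The function name `lambda_grid` is hypothetical, and it assumes that both end points of the range are evaluated; the procedure may construct its grid differently.

```python
import math


def lambda_grid(lower, upper, steplength=None):
    """Return equally spaced lambda values from lower towards upper.

    Assumption: the lower limit is always evaluated, and the upper limit
    is evaluated whenever the increment divides the range exactly.
    The default increment follows the documented rule (UPPER - LOWER)/20.
    """
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    if steplength is None:
        steplength = (upper - lower) / 20
    if steplength <= 0:
        raise ValueError("steplength must be positive")
    # Count whole steps; the small tolerance guards against rounding error.
    n_steps = math.floor((upper - lower) / steplength + 1e-9)
    return [lower + i * steplength for i in range(n_steps + 1)]


# Range taken from the example at the end of this page.
grid = lambda_grid(-1.5, 0.5)
print(len(grid), grid[0], round(grid[-1], 10))
```

Under these assumptions the default increment always gives 21 evaluation points, whatever the width of the range.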

The `Y` parameter must be set to specify the response variate, i.e. the variate being considered for transformation. For a binomial distribution the `NBINOMIAL` parameter must also be set. The terms in the fitted model are specified by the `TERMS` option, which may be set to a formula or left unset to fit a model involving only a constant term. For reasons of scale invariance, as described in Schlesselman (1971), a constant term must be included in the model.

The `TRANSFORM` option specifies which family of transformations is required. It can take one of seven values. The setting `power` represents the power transformation family (Box & Cox 1964); `modulus` represents the modulus transformation family (John & Draper 1980); `foldedpower` the folded-power transformation family (Atkinson 1985); `guerrerojohnson` the Guerrero-Johnson (1982) transformation family; `aranda1` and `aranda2` the two Aranda-Ordaz (1981) transformation families; and `powerlogit` the power-logit (otherwise known as skewed logit) transformation family (Stukel 1988).

The `METHOD` option specifies which methods of evaluating the transformation parameter (lambda) are required. It can be a list of one to four values. The four available methods are the added variable method of Atkinson (1982), the added variable method of Andrews (1971), the maximum likelihood method of Box & Cox (1964), and a robust method due to Carroll (1980). The robust method requires a scalar, supplied by the `K` option. This value is the standard normal deviate value (*z*) at which the distribution changes from a standard normal to an exponential.
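As an illustration of what the Box & Cox maximum likelihood method evaluates, the Python sketch below computes the classical profile log likelihood of the power family over a grid of lambda values. It is not the procedure's implementation: it assumes a model with only a constant term (the case where `TERMS` is unset), the function names and the data are hypothetical, and the values it returns differ from those of `YTRANSFORM` by an additive constant, which does not move the maximum.

```python
import math


def power_transform(y, lam):
    """Box & Cox power transformation of one positive value."""
    if y <= 0:
        raise ValueError("the power family needs positive responses")
    if abs(lam) < 1e-12:
        return math.log(y)          # limiting case as lambda tends to 0
    return (y ** lam - 1.0) / lam


def boxcox_profile_loglik(y, lam):
    """Classical profile log likelihood for a constant-only model.

    -(n/2) * log(RSS/n) + (lam - 1) * sum(log y), additive constants dropped.
    """
    n = len(y)
    z = [power_transform(v, lam) for v in y]
    mean_z = sum(z) / n
    rss = sum((v - mean_z) ** 2 for v in z)
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(rss / n) + jacobian


# Hypothetical data, symmetric on the log scale, so lambda = 0 should be best.
y = [math.exp(x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
grid = [i / 10 for i in range(-10, 11)]
best = max(grid, key=lambda lam: boxcox_profile_loglik(y, lam))
print(best)
```

Plotting `boxcox_profile_loglik` against the grid gives the kind of curve that the procedure draws for `METHOD=BoxCox`, up to the constant shift noted above.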

One problem with transforming data and then fitting models is that the fitted values (of the transformed data) can go out of the legal range. If the data are binomial, proportions of zero or one are replaced inside the procedure by 0.5/`NBINOMIAL` and 1 – 0.5/`NBINOMIAL` respectively. Conversely, when proportions are input directly in the `Y` variate, units with values less than or equal to zero or greater than or equal to one are ignored in the calculations. Option `FVBOUND` controls what happens in other circumstances when a fitted value goes outside the allowed range of the transformation. By default, no action is taken but, if `FVBOUND=yes`, illegal fitted values are replaced by the corresponding limiting values of the transformation.
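The two rules for proportions can be sketched in Python as follows. This is only an illustration of the rules as stated above, not code from the procedure, and the function names `binomial_proportion` and `usable_proportions` are hypothetical.

```python
def binomial_proportion(successes, nbinomial):
    """Proportion for a binomial unit, with 0 and 1 moved off the boundary.

    A proportion of zero becomes 0.5/nbinomial and a proportion of one
    becomes 1 - 0.5/nbinomial, following the rule described in the text.
    """
    if nbinomial <= 0:
        raise ValueError("nbinomial must be positive")
    if successes == 0:
        return 0.5 / nbinomial
    if successes == nbinomial:
        return 1.0 - 0.5 / nbinomial
    return successes / nbinomial


def usable_proportions(proportions):
    """Keep only directly supplied proportions strictly between 0 and 1."""
    return [p for p in proportions if 0.0 < p < 1.0]


print(binomial_proportion(0, 10), binomial_proportion(10, 10))
print(usable_proportions([0.0, 0.2, 1.0, 0.75]))
```

The difference matters in practice: with a denominator available the boundary units are kept with adjusted values, whereas bare proportions on the boundary are dropped.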

The values of the F statistics or log likelihoods can be saved, with the associated values of lambda, using the `SAVE` parameter. This returns a pointer containing four elements. The first three of these are texts specifying, respectively, the transformation family (`SAVE[1]`, one value), the value of `FVBOUND` (`SAVE[2]`, one value) and the methods used (`SAVE[3]`, one to four values). The fourth element (`SAVE[4]`) is a matrix of results with dimensions (number of values of lambda evaluated × number of methods plus one). Column 1 of this matrix contains the evaluated values of lambda, column 2 has the values (F statistics or log likelihoods) for the first method requested, and so on for the other methods. If the option `LAMBDA` is used, this matrix has only one row.

Full details of the methodology implemented are given by Smith (2002).

Options: `TRANSFORM`, `METHOD`, `K`, `LOWER`, `UPPER`, `STEPLENGTH`, `LAMBDA`, `FVBOUND`, `GRAPHICS`, `TERMS`.

Parameters: `Y`, `NBINOMIAL`, `SAVE`.

### Method

Much of the methodology implemented is based on that described and reviewed in Atkinson (1985) and Cook & Weisberg (1982). The four methods of evaluation are the added variable method of Atkinson (1982), the added variable method of Andrews (1971), the maximum likelihood method of Box & Cox (1964), and a robust method (based on maximum likelihood) due to Carroll (1980). The seven transformations are the power transformation of Box & Cox (1964), the modulus transformation of John & Draper (1980), the folded-power transformation (as expounded in Atkinson 1985), the Guerrero-Johnson (1982) transformation, the two transformations of Aranda-Ordaz (1981), and the power-logit (otherwise known as skewed logit) transformation of Stukel (1988). The log likelihood produced for the Box & Cox method differs from that given by Box & Cox (1964), as they omit the constant term *N*/2. `YTRANSFORM` includes this term for compatibility with Carroll’s robust method, which collapses to Box & Cox’s method as *K* becomes infinite.

### Action with `RESTRICT`

If the `Y` variate is restricted, the analysis will use only the units not excluded by the restriction.

### References

Andrews, D.F. (1971). A note on the selection of data transformations. *Biometrika*, 58, 249-54.

Aranda-Ordaz, F.J. (1981). On two families of transformation to additivity for binary response data. *Biometrika*, 68, 357-63.

Atkinson, A.C. (1982). Regression diagnostics, transformations and constructed variables (with discussion). *Journal of the Royal Statistical Society, Series B*, 44, 1-36.

Atkinson, A.C. (1985). *Plots, Transformations and Regression*. Oxford University Press, Oxford.

Box, G.E.P. & Cox, D.R. (1964). An analysis of transformations (with discussion). *Journal of the Royal Statistical Society, Series B*, 26, 211-46.

Carroll, R.J. (1980). A robust method for testing transformations to achieve approximate normality. *Journal of the Royal Statistical Society, Series B*, 42, 71-78.

Cook, R.D. & Weisberg, S. (1982). *Residuals and Influence in Regression*. Chapman & Hall, New York.

Guerrero, V.M. & Johnson, R.A. (1982). Use of Box-Cox transformation with binary response models. *Biometrika*, 69, 309-14.

John, J.A. & Draper, N.R. (1980). An alternative family of transformations. *Applied Statistics*, 29, 190-97.

Schlesselman, J. (1971). Power families: a note on the Box and Cox transformation. *Journal of the Royal Statistical Society, Series B*, 33, 307-311.

Smith, D.M. (2002). Computing single parameter transformations. *Communications in Statistics – Simulation and Computation*, 32, 605-618.

Stukel, T.A. (1988). Generalized logistic models. *Journal of the American Statistical Association*, 83, 426-31.

### See also

Directive: `CALCULATE`.

Procedure: `ABOXCOX`.

Commands for: Calculations and manipulation, Regression analysis.

### Example

    CAPTION 'YTRANSFORM example',!t('Data from Box & Cox',\
            '(1964), J.R. Statist. Soc. B, 26, 211-46. Y is survival',\
            'time (unit 10 hours) of animals, D is poison dose and T is',\
            'treatment. Note, the results for METHOD=BoxCox differ',\
            'from those in the paper by the constant value 24 (= n/2).');\
            STYLE=meta,plain
    FACTOR  [LEVELS=3; VALUES=16(1...3)] D
    &       [LEVELS=4; VALUES=(1...4)12] T
    VARIATE [VALUES=0.31,0.82,0.43,0.45,0.45,1.10,0.45,0.71,0.46,0.88,\
                    0.63,0.66,0.43,0.72,0.76,0.62,0.36,0.92,0.44,0.56,\
                    0.29,0.61,0.35,1.02,0.40,0.49,0.31,0.71,0.23,1.24,\
                    0.40,0.38,0.22,0.30,0.23,0.30,0.21,0.37,0.25,0.36,\
                    0.18,0.38,0.24,0.31,0.23,0.29,0.22,0.33] Y
    YTRANSFORM [TERMS=T+D; METHOD=Atkinson,Andrews; LAMBDA=1.0]\
               Y; SAVE=!P(Transform,Restriction,Methods,Results)
    FOR [NTIMES=1]
      PRINT [IPRINT=*] 'Transformation:',Transform; JUSTIFICATION=left
      &     [IPRINT=*] 'Restriction:',Restriction; JUSTIFICATION=left
      &     [IPRINT=*; SERIAL=yes; ORIENTATION=across]\
            Methods; FIELDWIDTH=13
      PRINT [IPRINT=*; RLPRINT=*; CLPRINT=*; SQUASH=yes] Results; FIELDWIDTH=13
    ENDFOR
    YTRANSFORM [TERMS=T+D; METHOD=BoxCox,robust; K=2; LOWER=-1.5; UPPER=0.5]\
               Y; SAVE=Save
    FOR [NTIMES=1]
      PRINT [IPRINT=*; STYLE=plain] 'Transformation:',Save[1]; JUSTIFICATION=le
      &     'Restriction:',Save[2]; JUSTIFICATION=left
      &     [SERIAL=yes; STYLE=form; ORIENTATION=across]\
            Save[3]; FIELDWIDTH=13; SKIP=0
      PRINT [IPRINT=*; RLPRINT=*; CLPRINT=*; SQUASH=yes] Save[4]; FIELDWIDTH=13
    ENDFOR