Normalizes two-colour microarray data (D.B. Baird).

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | What to print (`summary`, `slidesummary`, `monitoring`); default `summ`, `slid`, `moni` |
| `PLOT` = string tokens | What plots to produce (`pineffects`, `roweffects`, `columneffects`, `intensityeffects`, `rowxcoleffects`, `ma`, `standardizedma`, `spatialresiduals`); default `*` i.e. none |
| `METHOD` = string token | What type of model components to fit (`spline`, `loess`); default `spli` |
| `MODELTERMS` = string tokens | What model components to fit (`pins`, `rows`, `columns`, `intensity`, `pinxintensity`, `ar1`, `rowxcolumn`, `pinxrow`, `pinxcolumn`); default `pins`, `rows`, `colu`, `inte` |
| `DFINTENSITY` = scalar | Degrees of freedom for intensity cubic spline; default 24 |
| `DFROWXCOLUMN` = scalar | Degrees of freedom for row × column thin-plate spline; default 49 |
| `POORFLAGS` = text or variate | Levels of `FLAGS` that are poor quality spots |
| `BADFLAGS` = text or variate | Levels of `FLAGS` that are bad spots |
| `ARRANGEMENT` = string token | Whether to use trellis or single plots (`single`, `trellis`); default `trel` |
| `WINDOW` = scalar | Window number for the graphs; default 3 |
| `DEVICE` = scalar | Device number on which to plot the graphs |
| `GRAPHICSFILE` = text | What graphics filename template to use to save the graphs; default `*` |

### Parameters

| Parameter | Description |
|---|---|
| `LOGRATIOS` = variates or pointers | Log-ratios |
| `INTENSITIES` = variates or pointers | Spot intensities |
| `SLIDES` = factors or texts | Slides |
| `PINS` = factors | Pins |
| `SROWS` = factors | Rows across whole slide |
| `SCOLUMNS` = factors | Columns across whole slide |
| `PROWS` = factors | Rows within pins |
| `PCOLUMNS` = factors | Columns within pins |
| `FLAGS` = factors or pointers | Quality flags |
| `CLOGRATIOS` = variates or pointers | Save corrected log-ratios |
| `SLOGRATIOS` = variates or pointers | Save standardized log-ratios |
| `SDSMOOTH` = variates or pointers | Save smoothed deviations |
| `PINEFFECTS` = tables | Save estimated pin effects |
| `ROWEFFECTS` = tables | Save estimated row effects |
| `COLEFFECTS` = tables | Save estimated column effects |
| `INTEFFECTS` = variates or pointers | Save estimated intensity effects |
| `CLRED` = variates or pointers | Save corrected log2 red values |
| `CLGREEN` = variates or pointers | Save corrected log2 green values |
| `VAREXPLAINED` = variates | Save the variance explained by slide |

### Description

With large microarrays it is essential to identify sources of variation and correct for them, to allow for robust use of this technology. Through normalization procedures, such variations can be identified and removed to obtain data for follow-on research. The analysis of the microarrays is thus a two-step process: a within-slide analysis aimed at normalization and, if required, standardization; then a between-slide analysis to estimate the differences between targets (or treatments) and evaluate their consistency.

Various techniques have been suggested for normalization, including linear regression, ratio statistics, local smoothing and analysis of variance. The approach in `MNORMALIZE` is to model the variation associated with spatial and structural components and remove this as noise. Spatial components include the grid layout on the slide (rows × columns); structural components include the pins, print order and differential dye responses to binding and scanning. The model can be specified to fit the type of variation found in the particular series of slides. The usual statistical modelling approach is taken: all possible sources of noise are fitted jointly in one model, and the need for each term is assessed using the statistical significance of the reduction in the remaining unexplained variation. Model terms can be added or removed as required. The fitted model then indicates where useful modification of protocols and equipment would help minimize variation in future experiments.

The type of model to use is selected using the `METHOD` option, with settings:

| Setting | Description |
|---|---|
| `spline` | a mixed model including cubic smoothing splines, fitted with the `REML` directive; or |
| `loess` | regression with the `LOESS` smoothing function, fitted with the `FIT` directive. |

The terms to include in the models are selected by the `MODELTERMS` option, with settings:

| Setting | Description |
|---|---|
| `pins` | an effect for each pin on the slide; |
| `rows` | an effect for each row on the slide; |
| `columns` | an effect for each column on the slide; |
| `intensity` | a cubic smoothing spline or Loess curve for spot intensity, with degrees of freedom defined by the `DFINTENSITY` option (default 24); |
| `pinxintensity` | a different linear effect of intensity for each pin; |
| `ar1` | autoregressive model with order 1, separately in row and column directions (`REML` only); |
| `rowxcolumn` | a thin-plate spline (`REML` only) which fits a smooth surface with row and column interaction, with degrees of freedom defined by the `DFROWXCOLUMN` option (default 49); |
| `pinxrow` | pin-by-row interaction; and |
| `pinxcolumn` | pin-by-column interaction. |
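
As a hedged illustration of combining `METHOD` and `MODELTERMS`, the call below sketches a Loess fit that uses only terms not marked `REML` only in the settings above. The data identifiers (`logRatio`, `Intensity`, `Slide`, `Block`, `Slide_Row`, `Slide_Column`, `Row`, `Column`, `Flags`) and the flag values are borrowed from the Example section and are assumed to be already loaded; `DFINTENSITY=12` is an arbitrary illustrative value, not a recommendation.

```genstat
" Sketch only: Loess fit; ar1 and rowxcolumn are omitted because they are REML only "
MNORMALIZE [METHOD=loess; MODELTERMS=pins,rows,columns,intensity;\
  DFINTENSITY=12; POORFLAGS=!(-25,-50); BADFLAGS=!(-75,-100)]\
  LOGRATIOS=logRatio; INTENSITIES=Intensity; SLIDES=Slide; PINS=Block;\
  SROWS=Slide_Row; SCOLUMNS=Slide_Column; PROWS=Row; PCOLUMNS=Column;\
  FLAGS=Flags
```

Terms can then be added to or dropped from `MODELTERMS` and the call rerun, in line with the term-by-term assessment described earlier.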

The log-ratios and spot intensities are supplied by the `LOGRATIOS` and `INTENSITIES` parameters. If these are single variates, the `SLIDES` parameter should supply a factor to index the slides. Alternatively, you can supply pointers containing a variate for each slide; the `SLIDES` parameter may then be omitted, or it can supply a text giving a label for each slide.

The slide layout is specified by the parameters `PINS`, `SROWS`, `SCOLUMNS`, `PROWS` and `PCOLUMNS`. `PINS` provides a factor to index the pins. `SROWS` and `SCOLUMNS` provide factors to index the rows and columns within the whole slide. `PROWS` and `PCOLUMNS` provide factors to index the rows and columns within the pins. If `LOGRATIOS` is a pointer, the slide layout factors refer to a single slide, and all slides must have a common layout.
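
The pointer form of the input might be set up as in the following minimal sketch. All identifiers here are hypothetical: `lr1`...`lr3` and `in1`...`in3` stand for per-slide variates of log-ratios and intensities that already exist, and `pin`, `srow`, `scol`, `prow` and `pcol` stand for layout factors describing one slide, which all three slides are assumed to share.

```genstat
" Sketch only: every identifier below is hypothetical and must already exist "
" lr1...lr3, in1...in3: one variate per slide; layout factors cover ONE slide "
POINTER [VALUES=lr1,lr2,lr3] lrSlides
POINTER [VALUES=in1,in2,in3] inSlides
TEXT [VALUES='Slide A','Slide B','Slide C'] slideLabels
MNORMALIZE LOGRATIOS=lrSlides; INTENSITIES=inSlides; SLIDES=slideLabels;\
  PINS=pin; SROWS=srow; SCOLUMNS=scol; PROWS=prow; PCOLUMNS=pcol
```

Here `SLIDES` supplies a text of labels, one per slide; with pointer input it could also be left out.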

The `FLAGS` parameter supplies a factor giving a quality flag for each spot, which must match the type and length of the `LOGRATIOS` parameter. The `POORFLAGS` and `BADFLAGS` options can then each supply a text or variate, defining levels of `FLAGS` that indicate poor or bad quality spots. The poor spots are still used for model fitting, but are excluded from the output variates. The bad quality spots are excluded from any analysis.

The `CLOGRATIOS` parameter can supply a variate or pointer, to save the corrected log-ratios. Similarly, the `SLOGRATIOS` parameter can save the standardized log-ratios, and `SDSMOOTH` can save the smoothed deviations. The `PINEFFECTS`, `ROWEFFECTS` and `COLEFFECTS` parameters can save tables containing estimated pin, row and column effects, respectively. The `INTEFFECTS` parameter can save the estimated intensity effects. The `CLRED` and `CLGREEN` parameters can save the corrected log2 red and green values, respectively. If they have already been defined, the output structures specified by `CLOGRATIOS`, `SLOGRATIOS`, `SDSMOOTH`, `INTEFFECTS`, `CLRED` and `CLGREEN` must have the same type as the `LOGRATIOS` parameter (i.e. variates if `LOGRATIOS` is a variate, and pointers if `LOGRATIOS` is a pointer). Finally, the `VAREXPLAINED` parameter can save a variate with the variance explained by the fitted model on each slide.
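
A minimal sketch of saving results for the between-slide analysis follows. The input identifiers and flag values are those of the Example section and are assumed to be loaded; `cLogRatio`, `sLogRatio` and `varExpl` are hypothetical names for the output structures, assumed not to be defined beforehand. Because `LOGRATIOS` is a single variate here, the saved log-ratios should also be variates.

```genstat
" Sketch only: cLogRatio, sLogRatio and varExpl are hypothetical output names "
MNORMALIZE [PRINT=summary; POORFLAGS=!(-25,-50); BADFLAGS=!(-75,-100)]\
  LOGRATIOS=logRatio; INTENSITIES=Intensity; SLIDES=Slide; PINS=Block;\
  SROWS=Slide_Row; SCOLUMNS=Slide_Column; PROWS=Row; PCOLUMNS=Column;\
  FLAGS=Flags; CLOGRATIOS=cLogRatio; SLOGRATIOS=sLogRatio;\
  VAREXPLAINED=varExpl
" varExpl holds the variance explained by the fitted model on each slide "
PRINT varExpl
```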

The `PRINT` option controls printed output, and the `PLOT` option controls what graphs are produced. By default the plots for the slides are displayed in a trellis arrangement, but you can set option `ARRANGEMENT=single` to display them separately, in single plots. The `WINDOW` option specifies the window to use for the graphs (by default 3). You can use the `DEVICE` option to plot to a device other than the screen. The `GRAPHICSFILE` option then supplies a template for the file names.
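
The output controls could be combined as in the sketch below, which requests the MA and spatial residual plots as separate single plots. The input identifiers and flag values again come from the Example section; setting `PRINT=*` to suppress printed output is an assumption based on the `*` (none) convention that the options table shows for `PLOT`.

```genstat
" Sketch only: no printed output (assumed via PRINT=*), separate single plots "
MNORMALIZE [PRINT=*; PLOT=ma,standardizedma,spatialresiduals;\
  ARRANGEMENT=single; WINDOW=3; POORFLAGS=!(-25,-50);\
  BADFLAGS=!(-75,-100)] LOGRATIOS=logRatio; INTENSITIES=Intensity;\
  SLIDES=Slide; PINS=Block; SROWS=Slide_Row; SCOLUMNS=Slide_Column;\
  PROWS=Row; PCOLUMNS=Column; FLAGS=Flags
```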

Options: `PRINT`, `PLOT`, `METHOD`, `MODELTERMS`, `DFINTENSITY`, `DFROWXCOLUMN`, `POORFLAGS`, `BADFLAGS`, `ARRANGEMENT`, `WINDOW`, `DEVICE`, `GRAPHICSFILE`.

Parameters: `LOGRATIOS`, `INTENSITIES`, `SLIDES`, `PINS`, `SROWS`, `SCOLUMNS`, `PROWS`, `PCOLUMNS`, `FLAGS`, `CLOGRATIOS`, `SLOGRATIOS`, `SDSMOOTH`, `PINEFFECTS`, `ROWEFFECTS`, `COLEFFECTS`, `INTEFFECTS`, `CLRED`, `CLGREEN`, `VAREXPLAINED`.

### Action with `RESTRICT`

Any restrictions on `LOGRATIOS`, `INTENSITIES`, `SLIDES`, `PINS`, `SROWS`, `SCOLUMNS`, `PROWS`, `PCOLUMNS` or `FLAGS` are removed (and a warning is given).

### See also

Procedures: `DMADENSITY`, `FDRBONFERRONI`, `FDRMIXTURE`, `MACALCULATE`, `MAESTIMATE`, `MAHISTOGRAM`, `MAPCLUSTER`, `MAPLOT`, `MASCLUSTER`, `MASHADE`, `MAVOLCANO`, `MA2CLUSTER`.

Commands for: Microarray data.

### Example

    CAPTION 'MNORMALIZE example'; STYLE=meta
    ENQUIRE CHANNEL=-1; EXIST=check; NAME=\
      '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'
    IF check
      SPLOAD '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'
      " Normalize Microarray Data "
      MNORMALIZE [METHOD=spline; PRINT=summary,slidesummary,monitoring;\
        MODELTERMS=pins,rows,columns,intensity,rowxcolumn;\
        PLOT=pineffects,roweffects,columneffects,intensityeffects,\
        rowxcoleffects; ARRANGEMENT=trellis; POORFLAGS=!(-25,-50);\
        BADFLAGS=!(-75,-100); DFINTENSITY=24] LOGRATIOS=logRatio;\
        INTENSITIES=Intensity; SLIDES=Slide; PINS=Block;\
        SROWS=Slide_Row; SCOLUMNS=Slide_Column; PROWS=Row;\
        PCOLUMNS=Column; FLAGS=Flags
    ELSE
      CAPTION 'Microarray example datasets have not been installed.'
    ENDIF