Estimates expression values for Affymetrix slides (D.B. Baird).

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | What to print (`estimates`, `background`, `monitoring`); default `para` |
| `METHOD` = string token | Method for calculating probe expression values (`mas4`, `mas5`, `rma`, `rma2`); default `rma` |
| `BMETHOD` = string token | Method to use for background values (`mean`, `quantile`, `none`); default `mean` for `METHOD` settings `mas4` and `mas5`, but `none` for settings `rma` and `rma2` |
| `BWEIGHTING` = string token | Method for weighting background grids (`affymetrix`, `distance`); default `affy` |
| `TRANSFORMATION` = string token | How to transform the data (`log2`, `none`); default `log2` |
| `NMETHOD` = string token | Method for normalization, i.e. whether to use a mean, median or geometric mean for the averaged normalized distribution (`means`, `medians`, `geometricmeans`, `none`); default `mean` |
| `REPLACEDATA` = string token | Whether to replace the `DATA` variates with background-corrected intensities (`yes`, `no`); default `no` |
| `SPREADSHEET` = string token | What to save in a spreadsheet (`results`); default `*`, i.e. nothing |
| `MAXCYCLE` = scalar | Maximum number of iterations; default 50 |
| `TOLERANCE` = scalar | Tolerance for convergence; default 0.0001 |

### Parameters

| Parameter | Description |
|---|---|
| `DATA` = variates | Intensities to be analysed |
| `SLIDES` = factors | Identify the slides (or chips) |
| `PROBES` = factors | Identify the probes (or genes) within each slide |
| `ATOMS` = factors | Identify the PM/MM pairs within each probe |
| `PMMM` = factors | Distinguish between PM and MM values |
| `TYPEPROBES` = factors | Defines the probe-type corresponding to each intensity |
| `ROWS` = factors | Identifies rows within each slide (required only if background corrections are to be made) |
| `COLUMNS` = factors | Identifies columns within each slide (required only if background corrections are to be made) |
| `ESTIMATES` = variates | Saves the estimated expression values for each slide and probe combination |
| `SE` = variates | Saves approximate standard errors for the estimates |
| `IDSLIDES` = factors | Saves factors to identify the slides in the `ESTIMATES` variates |
| `IDPROBES` = factors | Saves factors to identify the probes in the `ESTIMATES` variates |

### Description

`AFFYMETRIX` estimates expression values over the perfect match (PM) and mismatch (MM) pairs for each probe on Affymetrix slides (or chips). On Affymetrix chips, each probe has 8-20 pairs of DNA sequences with a central base changed between the perfect match and mismatch sequences. The value for the probe level of expression is taken as an average over the pairs of perfect match (PM) and mismatch (MM) spots. The intensity values are obtained by reading in a series of Affymetrix CEL files, and the chip information from a CDF file.

The `METHOD` option selects the method to use to summarize over the PM and MM pairs, with settings:

| Setting | Meaning |
|---|---|
| `rma` | Robust Means Analysis model – the probe level model introduced by Irizarry et al. (2003), which uses only PM information and transforms the values based on a kernel density estimate of the PM distribution; |
| `rma2` | Robust Means Analysis 2 – an adaptation of the RMA algorithm which fits the kernel density to a truncated distribution of the PM values, with the truncation point based on an initial kernel density estimate; |
| `mas4` | Affymetrix Version 4 – the AvDiff algorithm introduced in the Affymetrix version 4 software; and |
| `mas5` | Affymetrix Version 5 – the Tukey biweight algorithm introduced in the Affymetrix version 5 software. |

In the Affymetrix MAS 4 and MAS 5 methods, the difference between the signals (PM – MM) is averaged using a robust averaging method. The MAS 4 algorithm uses the AvDiff algorithm, which discards the minimum and maximum differences and any differences more than 3 standard deviations from the mean. The MAS 5 algorithm uses the Tukey biweight algorithm, which reweights the values depending on how far they are from the median, and discards any that are more than 5 times the median absolute deviation away from it. The MAS 5 algorithm also replaces the MM value with a value known as an Ideal Mismatch (IM), which is always less than the PM value.
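The two robust averages can be illustrated with a short Python sketch. This is not the code used by `AFFYMETRIX`; it is a minimal illustration under stated assumptions. The function names are hypothetical, the AvDiff sketch assumes that the mean and standard deviation are calculated after the extremes have been removed, and the biweight sketch assumes the usual one-step form with a small constant `epsilon` to avoid division by zero.

```python
from statistics import mean, median, stdev


def avdiff(differences, n_sd=3.0):
    """Trimmed average of PM - MM differences, in the spirit of AvDiff."""
    ordered = sorted(differences)
    trimmed = ordered[1:-1]  # discard the minimum and maximum difference
    centre, spread = mean(trimmed), stdev(trimmed)
    # Assumption: the 3-SD rule is applied to the already trimmed values.
    kept = [d for d in trimmed if abs(d - centre) <= n_sd * spread]
    return mean(kept)


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight average of the values."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    weights = []
    for v in values:
        u = (v - centre) / (c * mad + epsilon)
        # Values further than c * MAD from the median get zero weight.
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

For example, in `tukey_biweight([1, 2, 3, 4, 100])` the value 100 lies well beyond 5 times the median absolute deviation from the median, so it receives zero weight and does not affect the average.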

The standard RMA algorithm would normally use the log₂ transformed PM values with no background correction, which then have a quantile normalization applied to them. The adjusted PM values then have a Normal function transformation applied to them with the values for the transformation being calculated from a kernel density estimate applied to the adjusted PM values. Finally the transformed PM values are summarized with a median polish of the slides by atom values for each probe. The log₂ transformation can be suppressed by setting option `TRANSFORMATION=none`.
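The median polish step can be sketched in Python as follows. This is an illustrative sketch, not the procedure's own implementation: the function name `median_polish` is hypothetical, the table is assumed to hold atoms in rows and slides in columns for a single probe, and the stopping rule (largest median removed in a cycle below the tolerance) is an assumption about how an iteration limit and tolerance might be used.

```python
from statistics import median


def median_polish(table, max_cycle=50, tolerance=1e-4):
    """Fit overall + row effect + column effect by repeated median sweeps."""
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_cycle):
        # Sweep the row medians out of the residuals.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep the column medians out of the residuals.
        col_med = [median(resid[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        # Assumed convergence rule: nothing substantial was removed.
        if max(abs(m) for m in row_med + col_med) < tolerance:
            break
    return overall, row_eff, col_eff, resid
```

With atoms in rows and slides in columns, the summarized value for slide `j` would be `overall + col_eff[j]`.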

The RMA model performs a background correction by fitting a two component model to the PM intensities:

*Observed intensity* = *Signal* + *Noise*

where *Signal* has an exponential distribution with parameter α (the reciprocal of the mean), and *Noise* has a Normal distribution with parameters μ (the mean) and σ (the standard deviation). The parameters α, μ and σ are estimated, and the expected value of the signal is then calculated, given the observed value of the intensity.
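As an illustration, the expected signal under the model exactly as written above can be calculated in closed form. With an exponential signal and unrestricted Normal noise, the signal given the observed intensity follows a Normal distribution with mean *a* = *o* − μ − σ²α and standard deviation σ, truncated at zero. The Python sketch below evaluates the mean of that truncated distribution. It is derived from the model statement only, so the published algorithm and the procedure itself may differ in detail. The function names are hypothetical, and the parameters are taken as given rather than estimated.

```python
import math


def normal_pdf(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard Normal distribution function (erfc form for the lower tail)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_signal(observed, alpha, mu, sigma):
    """E[Signal | Observed] for exponential signal plus Normal noise."""
    a = observed - mu - sigma ** 2 * alpha
    # Mean of a Normal(a, sigma^2) distribution truncated to signal >= 0.
    return a + sigma * normal_pdf(a / sigma) / normal_cdf(a / sigma)
```

The result is always positive, and for intensities far above the noise level it approaches *o* − μ − σ²α.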

For all algorithms, the lowest 2% of spots on each slide can be used to estimate a background correction for the intensities. The chip is divided into 16 zones in a 4 × 4 grid, and each spot has a weighted average of these 16 levels removed from it. The levels used are controlled by the `BMETHOD` option, with settings:

| Setting | Meaning |
|---|---|
| `means` | the means of the values below the 2% quantile are used as the background levels; |
| `quantiles` | the actual 2% quantiles are used as the background levels; and |
| `none` | if you want no background correction to be made. |

The `BWEIGHTING` option controls how the background levels are combined before removing them from each spot:

| Setting | Meaning |
|---|---|
| `affymetrix` | the weights are 1/(squared-distance + 100); and |
| `distance` | the weights are 1/min(squared-distance, 100), |

where *squared-distance* = (*distance from the spot to the zone centroid*)².
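The zone-based background correction can be sketched in Python as follows, for a single slide held as a list of rows. This is a rough illustration rather than the procedure's implementation. The function names are hypothetical, the 2% quantile is taken crudely from the sorted values within each zone, the zone centroid is taken as the middle of the zone's row and column indices, and only the `affymetrix` weighting is shown.

```python
from statistics import mean


def zone_levels(chip, method="means", fraction=0.02, grid=4):
    """Background level for each zone, keyed by the zone centroid."""
    n_rows, n_cols = len(chip), len(chip[0])
    levels = {}
    for zr in range(grid):
        for zc in range(grid):
            r0, r1 = zr * n_rows // grid, (zr + 1) * n_rows // grid
            c0, c1 = zc * n_cols // grid, (zc + 1) * n_cols // grid
            values = sorted(chip[r][c]
                            for r in range(r0, r1) for c in range(c0, c1))
            cutoff = values[int(fraction * len(values))]  # crude 2% quantile
            if method == "quantiles":
                level = cutoff
            else:  # "means": average of the values at or below the quantile
                level = mean([v for v in values if v <= cutoff])
            centroid = ((r0 + r1 - 1) / 2.0, (c0 + c1 - 1) / 2.0)
            levels[centroid] = level
    return levels


def background_at(row, col, levels, smooth=100.0):
    """Weighted average of zone levels, weights 1/(squared-distance + 100)."""
    weights = {centre: 1.0 / ((row - centre[0]) ** 2
                              + (col - centre[1]) ** 2 + smooth)
               for centre in levels}
    total = sum(weights.values())
    return sum(weights[centre] * levels[centre] for centre in levels) / total


def background_correct(chip, method="means"):
    """Remove the weighted background from every spot on the chip."""
    levels = zone_levels(chip, method)
    return [[chip[r][c] - background_at(r, c, levels)
             for c in range(len(chip[0]))]
            for r in range(len(chip))]
```

Calling `background_correct(chip, method="quantiles")` uses the quantiles themselves as the zone levels instead of the means of the values below them.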

The quantile normalization of the PM/MM values on each slide is controlled by the `NMETHOD` option. Its settings select the way in which the overall distribution is produced from the cumulative density functions on each slide:

| Setting | Meaning |
|---|---|
| `means` | takes the means; |
| `medians` | takes the medians; |
| `geometricmeans` | takes geometric means (i.e. the mean on the log scale, back-transformed to the natural scale); and |
| `none` | if you do not want any quantile normalization. |
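Quantile normalization with these choices of summary can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the slides are assumed to have the same number of values (a common layout), ties are broken simply by position, and the geometric mean requires positive intensities.

```python
import math
from statistics import mean, median


def geometric_mean(values):
    """Mean on the log scale, back-transformed to the natural scale."""
    return math.exp(mean([math.log(v) for v in values]))


SUMMARIES = {"means": mean, "medians": median,
             "geometricmeans": geometric_mean}


def quantile_normalize(slides, method="means"):
    """Give every slide the same distribution of values."""
    if method == "none":
        return [list(slide) for slide in slides]
    summarize = SUMMARIES[method]
    ordered = [sorted(slide) for slide in slides]
    # Target distribution: summary across slides at each rank.
    target = [summarize([slide[k] for slide in ordered])
              for k in range(len(ordered[0]))]
    normalized = []
    for slide in slides:
        order = sorted(range(len(slide)), key=lambda i: slide[i])
        result = [0.0] * len(slide)
        for rank, index in enumerate(order):
            result[index] = target[rank]  # replace value by target at its rank
        normalized.append(result)
    return normalized
```

After normalization every slide holds the same set of values, with each spot keeping its original rank within its slide.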

The intensity values are specified by the `DATA` parameter. If these are in a single variate, the `SLIDES` parameter should supply a factor to index the slides, and the `PROBES` parameter should supply a factor to index the probes (or genes). Alternatively, you can supply a pointer containing a variate for each slide. The `SLIDES` factor is then not required; if it is given, it should have just one entry for each slide, in the order of the variates in the pointer. The `PROBES` factor is then that for a single slide, and all slides must have a common layout.

The `ATOMS` parameter supplies a factor to identify the PM/MM pairs within each probe, and the `PMMM` parameter supplies a factor, with levels labelled `'PM'` and `'MM'`, to distinguish between PM and MM values. The `TYPEPROBES` parameter supplies a factor to specify the probe types. The types of probes that can occur on Affymetrix chips are: `'Expression'`, `'Genotyping'`, `'CustomSeq'`, `'Tag'`, `'Unknown'`, `'Checkerboard Negative'`, `'Checkerboard Positive'`, `'Hybridization Negative'`, `'Hybridization Positive'`, `'Text Negative'`, `'Text Positive'`, `'Central Negative'`, `'Central Positive'`, `'Gene Exp Negative'`, `'Gene Exp Positive'`, `'Cycle Fidelity Negative'`, `'Cycle Fidelity Positive'`, `'Central Cross Negative'`, `'Central Cross Positive'`, `'Cross Hyb Negative'` and `'Cross Hyb Positive'`.

The `ROWS` and `COLUMNS` parameters can supply factors to identify the rows and columns within each slide. These are required only if background corrections are to be made.

The `ESTIMATES` parameter must supply a variate to save the estimated expression value for each slide and probe combination. The `IDPROBES` and `IDSLIDES` parameters must supply factors to identify the probes and slides, respectively, in the `ESTIMATES` variate. You can also set option `SPREADSHEET=results` to save these in a Genstat spreadsheet. The `SE` parameter can supply a variate to save approximate standard errors and, if this is set, the standard errors are included in the spreadsheet.

Options: `PRINT`, `METHOD`, `BMETHOD`, `BWEIGHTING`, `TRANSFORMATION`, `NMETHOD`, `REPLACEDATA`, `SPREADSHEET`, `MAXCYCLE`, `TOLERANCE`.

Parameters: `DATA`, `SLIDES`, `PROBES`, `ATOMS`, `PMMM`, `TYPEPROBES`, `ROWS`, `COLUMNS`, `ESTIMATES`, `SE`, `IDSLIDES`, `IDPROBES`.

### References

Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U. & Speed, T.P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. *Biostatistics*, 4, Number 2, 249-264.

### See also

Procedures: `FDRBONFERRONI`, `FDRMIXTURE`, `MAANOVA`, `MABGCORRECT`, `MAEBAYES`, `MAREGRESSION`, `MARMA`, `MAROBUSTMEANS`, `MAVDIFFERENCE`, `MAVOLCANO`, `QNORMALIZE`.

Commands for: Microarray data.

### Example

```
CAPTION 'AFFYMETRIX example'; STYLE=meta
" Warning, this example takes 1GB of RAM to run! "
ENQUIRE CHANNEL=-1; EXIST=check; NAME=\
  '%GENDIR%/Data/Microarrays/Hyb-AllData.gwb'
IF check
  SPLOAD '%GENDIR%/Data/Microarrays/Hyb-AllData.gwb'
  " Estimate Expression Values from Affymetrix CEL data."
  AFFYMETRIX [PRINT=estimates,background,monitoring; METHOD=RMA;\
    BMETHOD=none; TRANSFORMATION=log2; NMETHOD=medians;\
    MAXCYCLE=10; TOLERANCE=0.0001; "SPREADSHEET=results"]\
    DATA=Intensity; SLIDES=Slide; PROBES=Probe; ATOMS=Atom;\
    PMMM=PM_MM; TYPEPROBES=Type; ROWS=ROW; COLUMNS=COL;\
    IDPROBES=ProbeID; IDSLIDES=SlideID; ESTIMATES=Expression; SE=SE
ELSE
  CAPTION 'Microarray example datasets have not been installed.'
ENDIF
```