AFFYMETRIX procedure

Estimates expression values for Affymetrix slides (D.B. Baird).

Options

`PRINT` = string tokens	What to print (`estimates`, `background`, `monitoring`); default `para`
`METHOD` = string token	Method for calculating probe expression values (`mas4`, `mas5`, `rma`, `rma2`); default `rma`
`BMETHOD` = string token	Method to use for background values (`mean`, `quantile`, `none`); default `mean` for `METHOD` settings `mas4` and `mas5`, but `none` for settings `rma` and `rma2`
`BWEIGHTING` = string token	Method for weighting background grids (`affymetrix`, `distance`); default `affymetrix`
`TRANSFORMATION` = string token	How to transform the data (`log2`, `none`); default `log2`
`NMETHOD` = string token	Method for normalization, i.e. whether to use a mean, median or geometric mean for the averaged normalized distribution (`means`, `medians`, `geometricmeans`, `none`); default `means`
`REPLACEDATA` = string token	Whether to replace the `DATA` variates with background corrected intensities (`yes`, `no`); default `no`
`SPREADSHEET` = string token	What to save in a spreadsheet (`results`); default `*` i.e. nothing
`MAXCYCLE` = scalar	Maximum number of iterations; default 50
`TOLERANCE` = scalar	Tolerance for convergence; default 0.0001

Parameters

`DATA` = variates	Intensities to be analysed
`SLIDES` = factors	Identify the slides (or chips)
`PROBES` = factors	Identify the probes (or genes) within each slide
`ATOMS` = factors	Identify the PM/MM pairs within each probe
`PMMM` = factors	Distinguish between PM and MM values
`TYPEPROBES` = factors	Defines the probe-type corresponding to each intensity
`ROWS` = factors	Identifies rows within each slide (required only if background corrections are to be made)
`COLUMNS` = factors	Identifies columns within each slide (required only if background corrections are to be made)
`ESTIMATES` = variates	Saves the estimated expression values for each slide and probe combination
`SE` = variates	Saves approximate standard errors for the estimates
`IDSLIDES` = factors	Saves factors to identify the slides in the `ESTIMATES` variates
`IDPROBES` = factors	Saves factors to identify the probes in the `ESTIMATES` variates

Description

AFFYMETRIX estimates expression values over the perfect match (PM) and mismatch (MM) pairs for each probe on Affymetrix slides (or chips). On Affymetrix chips, each probe has 8-20 pairs of DNA sequences with a central base changed between the perfect match and mismatch sequences. The value for the probe level of expression is taken as an average over the pairs of perfect match (PM) and mismatch (MM) spots. The intensity values are obtained by reading in a series of Affymetrix CEL files, and the chip information from a CDF file.

The METHOD option selects the method to use to summarize over the PM and MM pairs, with settings:

`rma`	Robust Multi-array Average – the probe level model introduced by Irizarry et al. (2003), which uses only PM information and transforms the values based on a kernel density estimate of the PM distribution;
`rma2`	Robust Multi-array Average 2 – an adaptation of the RMA algorithm which fits the kernel density to a truncated distribution of the PM values, with the truncation point based on an initial kernel density estimate;
`mas4`	Affymetrix Version 4 – the AvDiff algorithm introduced in the Affymetrix version 4 software; and
`mas5`	Affymetrix Version 5 – the Tukey biweight algorithm introduced in the Affymetrix version 5 software.

In the Affymetrix MAS 4 and MAS 5 methods, the differences between the signals (PM – MM) are averaged using a robust averaging method. The MAS 4 method uses the AvDiff algorithm, which discards the minimum and maximum differences, and any differences more than 3 standard deviations from the mean. The MAS 5 method uses the Tukey biweight algorithm, which reweights the values according to how far they are from the median, and discards any that are more than 5 times the median absolute deviation away. The MAS 5 method also replaces the MM value with a value known as an Ideal Mismatch (IM), which is always less than the PM value.
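As an illustration only, the two robust averages described above might be sketched in Python as follows. This is not Genstat code and not the procedure's implementation: the function names, the order of the AvDiff trimming steps, and the small constant `eps` that guards against a zero median absolute deviation are all assumptions.

```python
from statistics import mean, median, stdev


def avdiff(pm, mm):
    """Trimmed mean of the PM - MM differences (illustrative sketch).

    Assumption: the mean and standard deviation are computed after the
    minimum and maximum differences have been dropped.
    """
    diffs = sorted(p - m for p, m in zip(pm, mm))
    trimmed = diffs[1:-1]  # discard the minimum and maximum difference
    centre, spread = mean(trimmed), stdev(trimmed)
    kept = [d for d in trimmed if abs(d - centre) <= 3 * spread]
    return mean(kept)


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight average (illustrative sketch).

    Values further than c median absolute deviations from the median
    receive zero weight; closer values are down-weighted smoothly.
    """
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scaled = [(v - med) / (c * mad + eps) for v in values]
    weights = [(1 - u * u) ** 2 if abs(u) < 1 else 0.0 for u in scaled]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

In both functions a single extreme difference has little or no influence on the result, which is the point of using a robust average here.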

The standard RMA algorithm normally takes the log₂-transformed PM values, with no background correction (BMETHOD=none, the default for this method), and applies a quantile normalization to them. A Normal function transformation is then applied to the adjusted PM values, with the values for the transformation calculated from a kernel density estimate of the adjusted PM values. Finally, the transformed PM values are summarized by a median polish of the slide-by-atom table of values for each probe. The log₂ transformation can be suppressed by setting option TRANSFORMATION=none.
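The median polish step can be illustrated with a short Python sketch. It is a generic median polish under stated assumptions, not the procedure's own code: the function name, the argument names `max_cycle` and `tolerance`, and the stopping rule are hypothetical. The table holds the values for one probe, with one row per slide and one column per atom.

```python
from statistics import median


def median_polish(table, max_cycle=50, tolerance=1e-4):
    """Fit table[i][j] = overall + row[i] + col[j] + residual by medians."""
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_cycle):
        # Sweep the row medians out of the residuals.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [x - row_med[i] for x in resid[i]]
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Sweep the column medians out of the residuals.
        col_med = [median([resid[i][j] for i in range(n_rows)])
                   for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]
        # Stop once a full sweep no longer changes anything materially.
        if max(abs(m) for m in row_med + col_med) < tolerance:
            break
    return overall, row_eff, col_eff, resid
```

With slides as rows, the summary value for slide `i` in this sketch is `overall + row_eff[i]`.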

The RMA model performs a background correction by fitting a two component model to the PM intensities:

Observed intensity = Signal + Noise

where Signal has an exponential distribution with parameter α (the reciprocal of the mean), and Noise has a Normal distribution with parameters μ (the mean) and σ (the standard deviation). The parameters α, μ and σ are estimated first, and then the expected value of the signal, given the observed intensity, is calculated.
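Once α, μ and σ are available, the expected signal has a closed form. The Python sketch below uses the expression usually quoted for this exponential-plus-Normal model in the RMA literature; it is an illustration under that assumption, not the procedure's own code, and the function names are hypothetical. Estimating α, μ and σ is not shown.

```python
import math


def normal_pdf(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_signal(observed, alpha, mu, sigma):
    """E[Signal | Observed intensity] for exponential signal + Normal noise.

    Assumes observed > 0, so that the denominator below is positive.
    """
    a = observed - mu - sigma ** 2 * alpha
    b = sigma
    numerator = normal_pdf(a / b) - normal_pdf((observed - a) / b)
    denominator = normal_cdf(a / b) + normal_cdf((observed - a) / b) - 1.0
    return a + b * numerator / denominator
```

For a positive observed intensity the result lies strictly between zero and that intensity, so the corrected values stay positive and can still be log-transformed.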

For all algorithms, the lowest 2% of spots on each slide can be used to estimate a background correction for the intensities. The chip is divided into 16 zones in a 4 × 4 grid, and each spot has a weighted average of these 16 levels removed from it. The levels used are controlled by the BMETHOD option, with settings:

`mean`	the means of the values below the 2% quantile are used as the background levels;
`quantile`	the actual 2% quantiles are used as the background levels; and
`none`	if you want no background correction to be made.
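How a single zone's background level might be obtained under each setting can be sketched in Python. This is an illustration only: the function name, the `percent` argument and the simple definition of the 2% quantile (the largest value among the lowest 2% of spots) are assumptions rather than the procedure's own rules.

```python
from statistics import mean


def zone_level(intensities, bmethod="mean", percent=2):
    """Background level for one zone from its lowest `percent` of spots."""
    if bmethod == "none":
        return 0.0  # no background correction
    ordered = sorted(intensities)
    # Number of spots in the lowest `percent` of the zone (at least one).
    count = max(1, len(ordered) * percent // 100)
    lowest = ordered[:count]
    if bmethod == "mean":
        return mean(lowest)
    if bmethod == "quantile":
        return lowest[-1]
    raise ValueError("unknown bmethod: " + repr(bmethod))
```

Applying this to each of the 16 zones would give the 16 levels that are then combined for every spot.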

The BWEIGHTING option controls how the background levels are combined before removing them from each spot:

`affymetrix`	the weights are 1/(squared-distance + 100); and
`distance`	the weights are 1/min(squared-distance, 100),

where squared-distance = (distance from the spot to the zone centroid)².
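The weighted average for one spot can be sketched in Python for the `affymetrix` weighting. The function names and the representation of spots and centroids as (row, column) pairs are assumptions made for illustration; the `distance` setting would differ only in the weight function.

```python
def affymetrix_weight(spot, centroid):
    """Weight 1/(squared-distance + 100) between a spot and a zone centroid."""
    squared_distance = ((spot[0] - centroid[0]) ** 2
                        + (spot[1] - centroid[1]) ** 2)
    return 1.0 / (squared_distance + 100.0)


def spot_background(spot, centroids, levels):
    """Weighted average of the zone levels, as seen from one spot."""
    weights = [affymetrix_weight(spot, c) for c in centroids]
    total = sum(w * level for w, level in zip(weights, levels))
    return total / sum(weights)
```

Because the weights fall off with squared distance, the zones nearest to a spot dominate its background value, and the constant 100 keeps the weight finite when a spot lies exactly on a centroid.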

The quantile normalization of the PM/MM values on each slide is controlled by the NMETHOD option. Its settings select the way in which the overall distribution is produced from the cumulative density functions on each slide:

`means`	takes the means;
`medians`	takes the medians;
`geometricmeans`	takes geometric means (i.e. the mean on the log scale, back-transformed to the natural scale); and
`none`	if you do not want any quantile normalization.
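A generic quantile normalization with these four choices can be sketched in Python. It is an illustration, not the procedure's own code: the function name is hypothetical, all slides are assumed to have the same number of values, and tied values are simply ranked in their order of appearance.

```python
import math
from statistics import mean, median


def quantile_normalize(slides, nmethod="means"):
    """Give every slide the same distribution of values.

    slides: list of equal-length lists of positive intensities.
    """
    if nmethod == "none":
        return [list(s) for s in slides]
    combine = {
        "means": mean,
        "medians": median,
        # Mean on the log scale, back-transformed to the natural scale.
        "geometricmeans": lambda xs: math.exp(mean([math.log(x) for x in xs])),
    }[nmethod]
    # Reference distribution: combine the k-th smallest value of each slide.
    sorted_slides = [sorted(s) for s in slides]
    reference = [combine(list(vals)) for vals in zip(*sorted_slides)]
    normalized = []
    for s in slides:
        order = sorted(range(len(s)), key=lambda i: s[i])
        out = [0.0] * len(s)
        for rank, index in enumerate(order):
            out[index] = reference[rank]  # replace value by reference quantile
        normalized.append(out)
    return normalized
```

After normalization each slide contains exactly the same set of values, arranged according to the ranks of its original intensities.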

The intensity values are specified by the DATA parameter. If these are in a single variate, the SLIDES parameter should supply a factor to index the slides, and the PROBES parameter should supply a factor to index the probes (or genes). Alternatively, you can supply a pointer containing a variate for each slide. The SLIDES factor is then not required; if it is given, it should have just one entry for each slide, in the order of the variates in the pointer. The PROBES factor is then the one for a single slide, and all slides must have a common layout.

The ATOMS parameter supplies a factor to identify the PM/MM pairs within each probe, and the PMMM parameter supplies a factor, with levels labelled 'PM' and 'MM', to distinguish between PM and MM values. The TYPEPROBES parameter supplies a factor to specify the probe types. The types of probes that can occur on Affymetrix chips are: 'Expression', 'Genotyping', 'CustomSeq', 'Tag', 'Unknown', 'Checkerboard Negative', 'Checkerboard Positive', 'Hybridization Negative', 'Hybridization Positive', 'Text Negative', 'Text Positive', 'Central Negative', 'Central Positive', 'Gene Exp Negative', 'Gene Exp Positive', 'Cycle Fidelity Negative', 'Cycle Fidelity Positive', 'Central Cross Negative', 'Central Cross Positive', 'Cross Hyb Negative' and 'Cross Hyb Positive'.

The ROWS and COLUMNS parameters can supply factors to identify the rows and columns within each slide. These are required only if background corrections are to be made.

The ESTIMATES parameter must supply a variate to save the estimated expression value for each slide and probe combination. The IDPROBES and IDSLIDES parameters must supply factors to identify the probes and slides, respectively, in the ESTIMATES variate. You can also set parameter SPREADSHEET=results to save these in a Genstat spreadsheet. The SE parameter can supply a variate to save approximate standard errors and, if this is set, the standard errors are included in the spreadsheet.

Options: PRINT, METHOD, BMETHOD, BWEIGHTING, TRANSFORMATION, NMETHOD, REPLACEDATA, SPREADSHEET, MAXCYCLE, TOLERANCE.

Parameters: DATA, SLIDES, PROBES, ATOMS, PMMM, TYPEPROBES, ROWS, COLUMNS, ESTIMATES, SE, IDSLIDES, IDPROBES.

References

Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U. & Speed, T.P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4(2), 249-264.

Example

CAPTION      'AFFYMETRIX example'; STYLE=meta
" Warning, this example takes 1GB of RAM to run! "
ENQUIRE      CHANNEL=-1; EXIST=check; NAME=\
             '%GENDIR%/Data/Microarrays/Hyb-AllData.gwb'
IF check
  SPLOAD     '%GENDIR%/Data/Microarrays/Hyb-AllData.gwb'
  " Estimate Expression Values from Affymetrix CEL data."
  AFFYMETRIX [PRINT=estimates,background,monitoring; METHOD=RMA;\
             BMETHOD=none; TRANSFORMATION=log2; NMETHOD=medians;\
             MAXCYCLE=10; TOLERANCE=0.0001; "SPREADSHEET=results"]\
             DATA=Intensity; SLIDES=Slide; PROBES=Probe; ATOMS=Atom;\
             PMMM=PM_MM; TYPEPROBES=Type; ROWS=ROW; COLUMNS=COL;\
             IDPROBES=ProbeID; IDSLIDES=SlideID; ESTIMATES=Expression; SE=SE
ELSE
  CAPTION    'Microarray example datasets have not been installed.'
ENDIF

Updated on March 11, 2019
