Estimates expression values for Affymetrix slides (D.B. Baird).
Options
| PRINT = string tokens | What to print (estimates, background, monitoring); default para |
|---|---|
| METHOD = string token | Method for calculating probe expression values (mas4, mas5, rma, rma2); default rma |
| BMETHOD = string token | Method to use for background values (mean, quantile, none); default mean for METHOD settings mas4 and mas5, but none for settings rma and rma2 |
| BWEIGHTING = string token | Method for weighting background grids (affymetrix, distance); default affy |
| TRANSFORMATION = string token | How to transform the data (log2, none); default log2 |
| NMETHOD = string token | Method for normalization, i.e. whether to use a mean, median or geometric mean for the averaged normalized distribution (means, medians, geometricmeans, none); default mean |
| REPLACEDATA = string token | Whether to replace the DATA variates with background corrected intensities (yes, no); default no |
| SPREADSHEET = string token | What to save in a spreadsheet (results); default * i.e. nothing |
| MAXCYCLE = scalar | Maximum number of iterations; default 50 |
| TOLERANCE = scalar | Tolerance for convergence; default 0.0001 |
Parameters
| DATA = variates | Intensities to be analysed |
|---|---|
| SLIDES = factors | Identify the slides (or chips) |
| PROBES = factors | Identify the probes (or genes) within each slide |
| ATOMS = factors | Identify the PM/MM pairs within each probe |
| PMMM = factors | Distinguish between PM and MM values |
| TYPEPROBES = factors | Defines the probe-type corresponding to each intensity |
| ROWS = factors | Identifies rows within each slide (required only if background corrections are to be made) |
| COLUMNS = factors | Identifies columns within each slide (required only if background corrections are to be made) |
| ESTIMATES = variates | Saves the estimated expression values for each slide and probe combination |
| SE = variates | Saves approximate standard errors for the estimates |
| IDSLIDES = factors | Saves factors to identify the slides in the ESTIMATES variates |
| IDPROBES = factors | Saves factors to identify the probes in the ESTIMATES variates |
Description
AFFYMETRIX estimates expression values over the perfect match (PM) and mismatch (MM) pairs for each probe on Affymetrix slides (or chips). On Affymetrix chips, each probe has 8-20 pairs of DNA sequences, with a central base changed between the perfect match and mismatch sequences. The value for the probe level of expression is taken as an average over the pairs of perfect match (PM) and mismatch (MM) spots. The intensity values are obtained by reading in a series of Affymetrix CEL files, and the chip information from a CDF file.
The METHOD option selects the method to use to summarize over the PM and MM pairs, with settings:

| rma | Robust Multi-array Average (RMA) – the probe level model introduced by Irizarry et al. (2003), which uses only PM information and transforms the values based on a kernel density estimate of the PM distribution; |
|---|---|
| rma2 | Robust Multi-array Average 2 – an adaptation of the RMA algorithm which fits the kernel density to a truncated distribution of the PM values, with the truncation point based on an initial kernel density estimate; |
| mas4 | Affymetrix Version 4 – the AvDiff algorithm introduced in the Affymetrix version 4 software; and |
| mas5 | Affymetrix Version 5 – the Tukey biweight algorithm introduced in the Affymetrix version 5 software. |
In the Affymetrix MAS 4 and MAS 5 methods, the differences between the signals (PM – MM) are averaged using a robust averaging method. The MAS 4 method uses the AvDiff algorithm, which discards the minimum and maximum differences and any differences more than 3 standard deviations from the mean. The MAS 5 method uses the Tukey biweight algorithm, which reweights the values according to how far they are from the median, and discards any that are more than 5 times the median absolute deviation away from it. The MAS 5 method also replaces the MM value with a value known as the Ideal Mismatch (IM), which is always less than the PM value.
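The two robust averages described above can be sketched in Python. This is an illustration only, not Genstat code and not the procedure's actual implementation. The function names, the order of the trimming steps in `avdiff`, and the constants `c=5` and `eps=0.0001` in `tukey_biweight` are assumptions made for the sketch.

```python
from statistics import mean, median, stdev


def avdiff(differences, n_sd=3.0):
    """AvDiff-style average of the PM - MM differences for one probe.

    Needs at least four differences so that a standard deviation can be
    calculated after the minimum and maximum have been removed.
    """
    # Discard the single smallest and single largest difference.
    trimmed = sorted(differences)[1:-1]
    centre, spread = mean(trimmed), stdev(trimmed)
    # Discard differences more than n_sd standard deviations from the mean.
    kept = [d for d in trimmed if abs(d - centre) <= n_sd * spread]
    return mean(kept)


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight average of the values for one probe."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])  # median absolute deviation
    weights = []
    for v in values:
        # Distance from the median in units of c * MAD (eps avoids division by 0).
        u = (v - centre) / (c * mad + eps)
        # Values further than c * MAD from the median get zero weight.
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

In this sketch a gross outlier has no influence on the biweight average, because its weight is exactly zero, whereas values close to the median keep weights near one.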
The standard RMA algorithm would normally use the log2 transformed PM values with no background correction, which then have a quantile normalization applied to them. The adjusted PM values then have a Normal function transformation applied to them, with the values for the transformation being calculated from a kernel density estimate applied to the adjusted PM values. Finally, the transformed PM values are summarized with a median polish of the slides by atom values for each probe. The log2 transformation can be suppressed by setting option TRANSFORMATION=none.
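The median polish step can be illustrated with a short Python sketch. It is a generic Tukey median polish, not the procedure's own code: the function name and return layout are hypothetical, and using `max_cycle` and `tolerance` in the way the MAXCYCLE and TOLERANCE options suggest is an assumption. Rows stand for slides and columns for atoms within one probe.

```python
from statistics import median


def median_polish(table, max_cycle=50, tolerance=1e-4):
    """Fit table[i][j] = overall + row_eff[i] + col_eff[j] + resid[i][j]."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(row) for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_cycle):
        change = 0.0
        # Sweep the row medians out of the residuals into the row effects.
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row_eff[i] += m
            change += abs(m)
        # Move the median of the column effects into the overall term.
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep the column medians out of the residuals into the column effects.
        for j in range(n_cols):
            m = median([resid[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                resid[i][j] -= m
            col_eff[j] += m
            change += abs(m)
        # Move the median of the row effects into the overall term.
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        # Stop once a full cycle no longer changes the fit.
        if change < tolerance:
            break
    return overall, row_eff, col_eff, resid
```

Under this layout, the summarized value for slide `i` would be `overall + row_eff[i]`.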
The RMA model performs a background correction by fitting a two component model to the PM intensities:
Observed intensity = Signal + Noise
where Signal has an exponential distribution with parameter α (the reciprocal of the mean), and Noise has a Normal distribution with parameters μ (the mean) and σ (the standard deviation). The parameters α, μ and σ are estimated, and the expected value of the signal is then estimated given the observed value of the intensity.
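For given values of α, μ and σ, the expected signal has a closed form that is commonly quoted for this exponential-plus-Normal model. The Python sketch below evaluates that form. It is an illustration under stated assumptions: the function name is hypothetical, the parameters are taken as already estimated, and observed intensities are assumed to be positive. The text above does not say whether the procedure computes the expectation in exactly this way.

```python
import math


def _normal_pdf(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_signal(observed, alpha, mu, sigma):
    """E[Signal | Observed intensity] for Signal ~ Exponential(rate alpha)
    and Noise ~ Normal(mu, sigma); assumes observed > 0."""
    a = observed - mu - sigma ** 2 * alpha
    upper = (observed - a) / sigma
    numerator = _normal_pdf(a / sigma) - _normal_pdf(upper)
    denominator = _normal_cdf(a / sigma) + _normal_cdf(upper) - 1.0
    return a + sigma * numerator / denominator
```

For intensities well above the noise level the correction is close to subtracting μ + σ²α, while intensities near or below μ are mapped to small positive values rather than to negative ones.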
For all algorithms, the lowest 2% of spots on each slide can be used to estimate a background correction for the intensities. The chip is divided into 16 zones in a 4 × 4 grid, and each spot has a weighted average of these 16 levels removed from it. The levels used are controlled by the BMETHOD option, with settings:

| mean | the means of the values below the 2% quantile are used as the background levels; |
|---|---|
| quantile | the actual 2% quantiles are used as the background levels; and |
| none | if you want no background correction to be made. |
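The difference between the background settings can be seen in a small Python sketch for a single zone. It is illustrative only: the function name is hypothetical, and taking the 2% quantile as the largest value among the lowest 2% of spots is one of several possible quantile definitions, chosen here as an assumption.

```python
def background_level(intensities, method="mean", percent=2):
    """Background level for one zone, from the lowest `percent` % of its spots."""
    if method == "none":
        return 0.0  # no background correction
    ordered = sorted(intensities)
    # Number of spots in the lowest `percent` % (at least one spot).
    k = max(1, len(ordered) * percent // 100)
    lowest = ordered[:k]
    if method == "mean":
        return sum(lowest) / k  # mean of the values below the quantile
    if method == "quantile":
        return lowest[-1]  # the quantile itself
    raise ValueError("method must be 'mean', 'quantile' or 'none'")
```

Because the mean of the lowest values cannot exceed the largest of them, the mean setting in this sketch never gives a higher background level than the quantile setting.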
The BWEIGHTING option controls how the background levels are combined before removing them from each spot:

| affymetrix | the weights are 1/(squared-distance + 100); and |
|---|---|
| distance | the weights are 1/min(squared-distance, 100), |

where squared-distance = (distance from the spot to the zone centroid)².
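As an illustration of the affymetrix weighting, the Python sketch below combines the 16 zone levels into a background value for one spot. The function names are hypothetical, and placing each centroid at the middle of its zone, with positions measured in row and column units, is an assumption. The distance setting is not shown.

```python
def zone_centroids(n_rows, n_cols, grid=4):
    """Centres of the grid x grid zones on a chip with n_rows x n_cols spots."""
    return [((i + 0.5) * n_rows / grid, (j + 0.5) * n_cols / grid)
            for i in range(grid) for j in range(grid)]


def spot_background(row, col, centroids, levels, smooth=100.0):
    """Weighted average of the zone levels for the spot at (row, col)."""
    # Weight for each zone: 1 / (squared distance to its centroid + 100).
    weights = [1.0 / ((row - r) ** 2 + (col - c) ** 2 + smooth)
               for r, c in centroids]
    return sum(w * b for w, b in zip(weights, levels)) / sum(weights)
```

The constant 100 keeps the weight finite for a spot lying exactly on a centroid, so every zone contributes to every spot and the correction varies smoothly across zone boundaries.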
The quantile normalization of the PM/MM values on each slide is controlled by the NMETHOD option. Its settings select the way in which the overall distribution is produced from the cumulative distribution functions on each slide:

| means | takes the means; |
|---|---|
| medians | takes the medians; |
| geometricmeans | takes geometric means (i.e. the mean on the log scale, back-transformed to the natural scale); and |
| none | if you do not want any quantile normalization. |
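A generic quantile normalization with these choices of average can be sketched in Python as follows. This is not the procedure's implementation: the function name is hypothetical, every slide is assumed to have the same number of values, tied values are ranked in the order in which they occur, and the geometric mean assumes positive intensities.

```python
import math
from statistics import mean, median


def quantile_normalize(slides, method="means"):
    """Give every slide the same distribution, averaged rank by rank."""
    if method == "none":
        return [list(s) for s in slides]
    averagers = {
        "means": mean,
        "medians": median,
        # Mean on the log scale, back-transformed to the natural scale.
        "geometricmeans": lambda xs: math.exp(mean([math.log(x) for x in xs])),
    }
    average = averagers[method]
    sorted_slides = [sorted(s) for s in slides]
    # Reference distribution: the average across slides at each rank.
    reference = [average([s[k] for s in sorted_slides])
                 for k in range(len(slides[0]))]
    normalized = []
    for s in slides:
        order = sorted(range(len(s)), key=lambda i: s[i])
        out = [0.0] * len(s)
        # Replace each value by the reference value of the same rank.
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization every slide contains the same set of values, and only the order of those values, which follows the ranks of the original intensities, differs between slides.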
The intensity values are specified by the DATA parameter. If these are in a single variate, the SLIDES parameter should supply a factor to index the slides, and the PROBES parameter should supply a factor to index the probes (or genes). Alternatively, you can supply a pointer containing a variate for each slide. The SLIDES factor is then not required; if it is given, it should have just one entry for each slide, in the order of the variates in the pointer. The PROBES factor is then that for a single slide, and all slides must have a common layout.
The ATOMS parameter supplies a factor to identify the PM/MM pairs within each probe, and the PMMM parameter supplies a factor, with levels labelled 'PM' and 'MM', to distinguish between PM and MM values. The TYPEPROBES parameter supplies a factor to specify the probe types. The types of probes that can occur on Affymetrix chips are: 'Expression', 'Genotyping', 'CustomSeq', 'Tag', 'Unknown', 'Checkerboard Negative', 'Checkerboard Positive', 'Hybridization Negative', 'Hybridization Positive', 'Text Negative', 'Text Positive', 'Central Negative', 'Central Positive', 'Gene Exp Negative', 'Gene Exp Positive', 'Cycle Fidelity Negative', 'Cycle Fidelity Positive', 'Central Cross Negative', 'Central Cross Positive', 'Cross Hyb Negative' and 'Cross Hyb Positive'.
The ROWS and COLUMNS parameters can supply factors to identify the rows and columns within each slide. These are required only if background corrections are to be made.
The ESTIMATES parameter must supply a variate to save the estimated expression value for each slide and probe combination. The IDPROBES and IDSLIDES parameters must supply factors to identify the probes and slides, respectively, in the ESTIMATES variate. You can also set option SPREADSHEET=results to save these in a Genstat spreadsheet. The SE parameter can supply a variate to save approximate standard errors and, if this is set, the standard errors are included in the spreadsheet.
Options: PRINT, METHOD, BMETHOD, BWEIGHTING, TRANSFORMATION, NMETHOD, REPLACEDATA, SPREADSHEET, MAXCYCLE, TOLERANCE.
Parameters: DATA, SLIDES, PROBES, ATOMS, PMMM, TYPEPROBES, ROWS, COLUMNS, ESTIMATES, SE, IDSLIDES, IDPROBES.
References
Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U. & Speed, T.P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4(2), 249-264.
See also
Procedures: FDRBONFERRONI, FDRMIXTURE, MAANOVA, MABGCORRECT, MAEBAYES, MAREGRESSION, MARMA, MAROBUSTMEANS, MAVDIFFERENCE, MAVOLCANO, QNORMALIZE.
Commands for: Microarray data.
Example
CAPTION 'AFFYMETRIX example'; STYLE=meta
" Warning, this example takes 1GB of RAM to run! "
ENQUIRE CHANNEL=-1; EXIST=check; NAME=\
  '%GENDIR%/Data/Microarrays/Hyb-AllData.gwb'
IF check
  SPLOAD '%GENDIR%/Data/Microarrays/Hyb-AllData.gwb'
  " Estimate Expression Values from Affymetrix CEL data."
  AFFYMETRIX [PRINT=estimates,background,monitoring; METHOD=RMA;\
    BMETHOD=none; TRANSFORMATION=log2; NMETHOD=medians;\
    MAXCYCLE=10; TOLERANCE=0.0001; "SPREADSHEET=results"]\
    DATA=Intensity; SLIDES=Slide; PROBES=Probe; ATOMS=Atom;\
    PMMM=PM_MM; TYPEPROBES=Type; ROWS=ROW; COLUMNS=COL;\
    IDPROBES=ProbeID; IDSLIDES=SlideID; ESTIMATES=Expression; SE=SE
ELSE
  CAPTION 'Microarray example datasets have not been installed.'
ENDIF