Produces displays to assess genotype plus genotype-by-environment variation (A.I. Glaser).
Options
PRINT = string tokens | What to print (variation); default * i.e. nothing |
---|---|
DIMENSIONS = scalars | Which dimensions to display; default 1,2 |
PLOT = string token | Type of plot (scatter, ranking, compare, joint, centred); default scat |
METHOD = string token | Whether the names in LEV1 (and LEV2) are from the ENVIRONMENTS or GENOTYPES factor (environments, genotypes); default envi |
SCPLOT = string token | Features to add to a scatter plot (hull, sector, megaenvironment, vector, linear); default * i.e. none |
SCALING = string tokens | What scaling to use (genotype, environment, symmetric); default envi |
NORMALIZE = string token | Whether to scale the data using the within-environment standard deviation (yes, no); default no |
CULL = variate or text | Specifies environments at which to examine the performance of the genotypes in order to decide which genotypes to cull |
QUANTILE = scalar | Proportion at which to calculate the quantile for CULL; default 0.5 |
DIVISIONS = scalar | Number of parallel lines or concentric circles to use when ranking genotypes or environments; default 10 |
RANKINGLINES = string token | Whether the ranking lines drawn with PLOT settings ranking or joint are perpendicular to the biplot axis or projected onto the axis (perpendicular, projection); default perp |
GENREVERSE = string token | Whether to reverse the order of the genotype scores (yes, no); default no |
ENVREVERSE = string token | Whether to reverse the order of the environment scores (yes, no); default no |
WINDOW = scalar | Which graphical window to use; default 1 |
KEYWINDOW = scalar | Window number for the key (zero for no key); default 2 |
Parameters
DATA = variates or tables | Provides the data to be analysed |
---|---|
GENOTYPES = factors | Specifies the genotypes |
ENVIRONMENTS = factors | Specifies the environments |
LEV1 = texts or scalars | First environment (or genotype) to use with PLOT settings centred, compare, joint or ranking, or with scatter when SCPLOT=linear |
LEV2 = texts or scalars | Second environment (or genotype) to use with PLOT settings centred, compare or joint |
LABGENOTYPES = texts | Labels for genotypes |
LABENVIRONMENTS = texts | Labels for environments |
TITLE = texts | Titles for the plots; if this is unset, an appropriate title is formed automatically |
MEGAGROUPS = variates or texts | Specifies or saves the groupings to use for the plot produced by SCPLOT=megaenvironment |
Description
GGEBIPLOT provides a range of plots that are useful for assessing the performance of genotypes in different environments. The observed phenotypic variation (P) of genotypes across environments is made up of environment variation (E), genotype variation (G) and genotype-by-environment interaction (GE), i.e.
P = E + G + GE.
Usually E is the dominant source of variation, while G and GE are relatively small. However, E shifts all the genotypes in an environment by the same amount, so it does not change their relative performance. It is therefore usual to remove the environmental main effect E, and focus only on G and GE.
The data for GGEBIPLOT form a table of values, classified by genotype and environment factors, and specified by the DATA parameter. The genotype and environment factors are specified by the GENOTYPES and ENVIRONMENTS parameters. You can set DATA to the table itself. Alternatively, you can set it to a variate containing the raw data, and GGEBIPLOT will form the table as a table of means.
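When DATA is a variate of raw observations, GGEBIPLOT forms the genotype-by-environment table of means itself. As a rough illustration of that averaging step only, the Python sketch below averages the raw values within each genotype-environment cell. The function `table_of_means` and its list-based inputs are hypothetical stand-ins for the GenStat variate and factors, and the procedure itself may order levels or treat empty cells differently.

```python
from collections import defaultdict


def table_of_means(values, genotypes, environments):
    """Average raw observations into a genotype-by-environment table.

    Rows follow the order in which genotypes first appear and columns the
    order in which environments first appear; empty cells become None.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, g, e in zip(values, genotypes, environments):
        sums[(g, e)] += y
        counts[(g, e)] += 1
    gen_levels = list(dict.fromkeys(genotypes))
    env_levels = list(dict.fromkeys(environments))
    table = [
        [sums[(g, e)] / counts[(g, e)] if counts[(g, e)] else None
         for e in env_levels]
        for g in gen_levels
    ]
    return gen_levels, env_levels, table


# Two genotypes in two environments, with replicates in some cells.
gens, envs, means = table_of_means(
    [4.0, 6.0, 3.0, 7.0, 5.0],
    ["G1", "G1", "G1", "G2", "G2"],
    ["E1", "E1", "E2", "E1", "E2"],
)
print(means)  # [[5.0, 3.0], [7.0, 5.0]]
```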
GGEBIPLOT illustrates the genotype plus genotype-by-environment variation using scores from a principal components analysis, treating the table as a data matrix. The rows (or units) of the data matrix correspond to the genotypes, and the columns (or variates) correspond to the environments. The analysis works on the matrix of variances and covariances between environments. The environment means are automatically removed during the calculation of the variances and covariances, so the analysis examines only the genotype variation and the genotype-by-environment interaction. You can also scale the columns first, using the within-environment standard deviation, by setting option NORMALIZE=yes; the analysis is then effectively of the correlations between environments. Usually the scores are taken from the first two dimensions of the decomposition, but you can request others by setting the DIMENSIONS option. You can set option PRINT=variation to print the amount of variation explained by the chosen dimensions; by default, nothing is printed.
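The centring and optional scaling described above can be sketched in a few lines of Python. The names `centre_columns` and `percent_explained` are illustrative only. The use of the n - 1 divisor for the standard deviation, and the reading of PRINT=variation as each dimension's percentage of the total of the principal component roots, are assumptions rather than documented details of GGEBIPLOT.

```python
import math


def centre_columns(table, normalize=False):
    """Subtract each environment (column) mean; if normalize is True, also
    divide by the within-environment standard deviation (n - 1 divisor)."""
    n, p = len(table), len(table[0])
    out = [row[:] for row in table]
    for j in range(p):
        column = [row[j] for row in table]
        mean = sum(column) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
        for i in range(n):
            out[i][j] = column[i] - mean
            if normalize:
                out[i][j] /= sd
    return out


def percent_explained(roots, dims=(1, 2)):
    """Percentage of the total variation (sum of all roots) accounted for
    by each requested dimension; dimensions are numbered from 1."""
    total = sum(roots)
    return [100.0 * roots[d - 1] / total for d in dims]


data = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
print(centre_columns(data))                  # [[-2.0, -4.0], [0.0, 0.0], [2.0, 4.0]]
print(centre_columns(data, normalize=True))  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
print(percent_explained([6.0, 3.0, 1.0]))    # [60.0, 30.0]
```

With normalize=True every column has unit variance, which is why the subsequent analysis is effectively of the correlation matrix.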
GGEBIPLOT plots the scores in a range of different ways, together with biplot axes from the principal components analysis. Essentially these are standard principal-component biplots, but various additional information can be added to the plots, as suggested in the book GGE Biplot Analysis by Yan & Kang (2003), to help elucidate the genotype and environment relationships.
The PLOT option controls the plots that are displayed. The setting scatter plots the genotype and environment scores. The SCPLOT option allows further information to be included on the plot, with settings:
hull | to draw an enclosing convex hull around the genotype scores; |
---|---|
sector | to draw lines from the origin perpendicular to each side of the convex hull around the genotype scores, to divide the biplot into sectors; |
megaenvironment | to draw an ellipse round those environments which share the same sector; |
vector | to draw lines connecting the environment scores with the origin; |
linear | to draw the same lines as vector, together with a rug plot at the side showing the angles between the environments; the parameter LEV1 must then be set to the label (or level) of the environment to be used as the “base” environment. |
Note that hull, sector and megaenvironment can be used together, but vector and linear must be used individually. For single-trait data, the genotypes at the vertices of the convex hull (known as the vertex cultivars) are considered to be the best performers in the environments that fall in the same sector. The sector setting splits the plot into sectors: the genotypes in the same sector as a particular environment are expected to have higher yields in that environment. As a general rule, a vertex cultivar will be the highest-yielding genotype in all the environments with which it shares a sector. The megaenvironment setting draws an ellipse around the environments that share a sector; if the ellipse extends into another sector and sector lines are plotted, the ellipse lines become dashed where they cross into the other sector.
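The geometry behind hull and sector can be sketched as follows, assuming two-dimensional genotype and environment scores are already available. This is not GenStat's code: the hull uses Andrew's monotone-chain algorithm, each sector line is taken along the outward normal of a hull side, and the "winner" in an environment is the genotype with the largest inner product with the environment vector, which for a linear score is always attained at a hull vertex. All function names are hypothetical.

```python
import math


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for an anticlockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in anticlockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def sector_directions(hull):
    """Unit direction of the line from the origin perpendicular to each hull
    side (the outward normal, since the hull is anticlockwise)."""
    dirs = []
    for k in range(len(hull)):
        (ax, ay), (bx, by) = hull[k], hull[(k + 1) % len(hull)]
        dx, dy = bx - ax, by - ay
        length = math.hypot(dx, dy)
        dirs.append((dy / length, -dx / length))
    return dirs


def winner(env, gen_scores):
    """Genotype with the largest inner product with the environment vector."""
    return max(gen_scores,
               key=lambda g: gen_scores[g][0] * env[0] + gen_scores[g][1] * env[1])


scores = {"G1": (2, 0), "G2": (0, 2), "G3": (-2, 0), "G4": (0, -2), "G5": (0.5, 0.5)}
hull = convex_hull(scores.values())
print(hull)                        # [(-2, 0), (0, -2), (2, 0), (0, 2)]
print(winner((1.0, 0.1), scores))  # G1
```

Environments whose winner is the same vertex genotype fall in that genotype's sector; G5, inside the hull, never wins.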
The MEGAGROUPS parameter can be used to specify or save the groups used for the megaenvironment setting. To specify the groups, you can set MEGAGROUPS to a variate or text with the same length as the number of levels of the ENVIRONMENTS factor; its values indicate the group to which each environment belongs. Alternatively, if MEGAGROUPS is set to an undefined data structure, or one with no values, this will be defined as a variate containing the default group definitions.
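One plausible way of deriving default groups of the kind MEGAGROUPS can save is to give the same group number to environments that share a winning (vertex) genotype. The sketch below assumes that rule, numbers the groups in order of first appearance, and returns one value per environment, like the variate described above. The function `mega_groups` is hypothetical, and GenStat's own default grouping may be defined differently.

```python
def mega_groups(env_scores, gen_scores):
    """Group environments by the genotype with the largest inner product
    with each environment vector; groups are numbered 1, 2, ... in order
    of first appearance."""
    winners = [
        max(gen_scores, key=lambda g: gen_scores[g][0] * e[0] + gen_scores[g][1] * e[1])
        for e in env_scores
    ]
    numbering = {}
    groups = []
    for w in winners:
        if w not in numbering:
            numbering[w] = len(numbering) + 1
        groups.append(numbering[w])
    return groups, winners


gen_scores = {"G1": (2.0, 0.0), "G2": (0.0, 2.0), "G3": (-1.0, -1.0)}
env_scores = [(1.0, 0.2), (0.1, 1.0), (0.9, -0.1), (0.2, 0.8)]
print(mega_groups(env_scores, gen_scores))  # ([1, 2, 1, 2], ['G1', 'G2', 'G1', 'G2'])
```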
The PLOT setting ranking examines the performance of all the genotypes within a specific environment. Alternatively, you can set option METHOD=genotype to examine all the environments for a specific genotype. This draws a biplot axis through the specific environment (or genotype), together with ranking lines to show the best performing genotypes in that environment (or the environments in which that genotype performs best). By default the ranking lines are drawn perpendicular to the biplot axis, but you can set option RANKINGLINES=projection to project lines from the genotypes (or environments) onto the biplot axis instead. In the plot, the best performing genotypes (or environments) are those whose projections onto the biplot axis lie furthest along it in the direction of the chosen environment (or genotype). The required environment (or genotype) is specified by setting the parameter LEV1 to either its label or its level. If LEV1 is unset or is set to a missing value, an axis is drawn through the “average environment coordinate” (AEC), with the appropriate ranking lines. The AEC is represented by a circle on the plot.
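The ranking amounts to projecting each genotype score onto the axis through the origin and the chosen environment. The Python sketch below assumes two-dimensional scores and takes the AEC to be the mean of the environment scores when no environment is given; that follows the usual GGE biplot convention and is not a documented GGEBIPLOT detail. `rank_on_axis` is a hypothetical name.

```python
import math


def rank_on_axis(gen_scores, env=None, env_scores=None):
    """Rank genotypes by their signed projection onto the axis through the
    origin and env; if env is None, use the mean of env_scores (the AEC)."""
    if env is None:
        n = len(env_scores)
        env = (sum(e[0] for e in env_scores) / n, sum(e[1] for e in env_scores) / n)
    length = math.hypot(env[0], env[1])
    ux, uy = env[0] / length, env[1] / length
    proj = {g: x * ux + y * uy for g, (x, y) in gen_scores.items()}
    return sorted(proj, key=proj.get, reverse=True), proj


genotypes = {"A": (2.0, 1.0), "B": (1.0, -1.0), "C": (-1.0, 0.0)}
order, proj = rank_on_axis(genotypes, env=(1.0, 0.0))
print(order)  # ['A', 'B', 'C']
print(rank_on_axis(genotypes, env_scores=[(1.0, 1.0), (1.0, -1.0)])[0])  # ['A', 'B', 'C']
```

For METHOD=genotype the same function applies with the roles swapped: pass the environment scores as the points to rank and a genotype score as the axis.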
The PLOT setting compare compares the performance of the environments with a specific environment; alternatively, you can set option METHOD=genotype to compare the genotypes with a specific genotype. The specific environment (or genotype) is viewed as an “ideal” environment (or genotype), and concentric circles are plotted around it. The closer an environment (or genotype) is to the “ideal”, the more attributes they share. The required environment (or genotype) is specified by setting the parameter LEV1 to either its label or its level. If LEV1 is unset or is set to a missing value, GGEBIPLOT constructs an “ideal” environment (or genotype), and draws concentric circles around its point. The constructed “ideal” lies on the line that joins the origin to the AEC, at a distance from the origin equal to the distance from the origin to the environment (or genotype) with the greatest yield; only environments (or genotypes) with greater than average yield are considered. The “ideal” environment (or genotype) is represented by an arrow on the plot. In practice the “ideal” is unlikely to exist, but it can be used as a reference point. You can also see where the AEC lies in relation to the “ideal” environment (or genotype) by setting LEV2 to a missing value.
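The construction of the “ideal” point and the concentric circles can be sketched as below. The sketch assumes that the AEC and the score of the reference environment (or genotype) with the greatest yield have already been found, and it spaces the DIVISIONS circles equally out to the furthest point. That spacing rule and the names `ideal_point` and `compare_to` are assumptions made for illustration.

```python
import math


def ideal_point(aec, reference):
    """Point on the ray from the origin through the AEC, at the same
    distance from the origin as the reference score."""
    radius = math.hypot(*reference)
    length = math.hypot(*aec)
    return (radius * aec[0] / length, radius * aec[1] / length)


def compare_to(target, scores, divisions=10):
    """Distance of each score from the target point, plus the radii of
    `divisions` equally spaced circles reaching the furthest score."""
    dists = [math.hypot(x - target[0], y - target[1]) for x, y in scores]
    step = max(dists) / divisions
    return dists, [step * k for k in range(1, divisions + 1)]


ideal = ideal_point(aec=(1.0, 1.0), reference=(3.0, 4.0))
print(ideal)  # approximately (3.5355, 3.5355)
dists, radii = compare_to((0.0, 0.0), [(3.0, 4.0), (0.0, 1.0)], divisions=5)
print(dists, radii)  # [5.0, 1.0] [1.0, 2.0, 3.0, 4.0, 5.0]
```

The entries with the smallest distances fall inside the innermost circles and are the ones most similar to the target.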
The major difference between ranking and compare is that ranking shows the best performing genotypes (or environments) in a specific environment (or genotype) along a single dimension, whilst compare shows how the environments (or genotypes) relate to an “ideal” environment (or genotype) in two dimensions. The DIVISIONS option specifies the number of lines, or concentric circles, to use when ranking genotypes or environments with PLOT settings ranking or compare; the default is 10.
The PLOT setting joint compares two environments simultaneously; alternatively, you can set option METHOD=genotype to compare two genotypes. When comparing two environments, a line is drawn joining the environments. The midpoint of this line is found, and this acts as a virtual environment representing both. A biplot axis is plotted passing through this midpoint and the origin. Ranking lines are also drawn to the biplot axis, as with the PLOT setting ranking; the RANKINGLINES option again controls whether these are perpendicular to the axis or projected onto the axis. The genotypes that are furthest along the biplot axis (in the direction of the arrow) are considered to be the best performing genotypes in the two environments. Alternatively, when comparing two genotypes, a line is drawn joining the genotypes. An axis is then drawn through the origin perpendicular to this joining line. The environments on the same side of the axis as one of the chosen genotypes are those where that genotype is considered to perform better. In some circumstances both genotypes may lie on the same side of the axis; the genotype that is closer to the axis is then considered to perform better in the environments on the other side of the axis. The two environments (or genotypes) are specified by setting LEV1 and LEV2 to their levels or labels.
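Both versions of the joint plot reduce to simple vector arithmetic on the scores. The sketch below assumes two-dimensional scores. `joint_environments` ranks genotypes along the axis through the origin and the midpoint of two environments, and `joint_genotypes` assigns each environment to the genotype it is predicted to favour, using the sign of its inner product with the difference of the two genotype scores. Both function names are hypothetical.

```python
import math


def joint_environments(e1, e2, gen_scores):
    """Rank genotypes by their projection onto the axis through the origin
    and the midpoint of environments e1 and e2."""
    mid = ((e1[0] + e2[0]) / 2, (e1[1] + e2[1]) / 2)
    length = math.hypot(*mid)
    proj = {g: (x * mid[0] + y * mid[1]) / length for g, (x, y) in gen_scores.items()}
    return sorted(proj, key=proj.get, reverse=True)


def joint_genotypes(g1, g2, env_scores):
    """The axis is the line through the origin perpendicular to g1 - g2.
    An environment e with e . (g1 - g2) > 0 is predicted to favour g1,
    because its inner product with g1 then exceeds that with g2."""
    d = (g1[0] - g2[0], g1[1] - g2[1])
    result = {}
    for name, (x, y) in env_scores.items():
        s = x * d[0] + y * d[1]
        result[name] = "first" if s > 0 else "second" if s < 0 else "equal"
    return result


print(joint_environments((1.0, 0.0), (0.0, 1.0),
                         {"A": (1.0, 1.0), "B": (2.0, -1.0), "C": (-1.0, -1.0)}))
# ['A', 'B', 'C']
# Both genotypes lie on the same side of the axis; the one nearer the axis
# (the second) is favoured in the environment on the other side.
print(joint_genotypes((3.0, 0.0), (2.0, 0.0), {"E1": (1.0, 0.5), "E2": (-1.0, 0.0)}))
# {'E1': 'first', 'E2': 'second'}
```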
The PLOT setting centred produces a scatter plot of the environment-centred data, with the x- and y-axes representing two of the environments; in this case only the genotypes are plotted. Alternatively, you can set METHOD=genotype to produce a plot of the genotype-centred data, with the x- and y-axes representing two of the genotypes. The line y = x is also plotted. Genotypes (or environments) below this line perform relatively better in the environment (or genotype) represented by the x-axis, and genotypes (or environments) above this line perform relatively better in the environment (or genotype) represented by the y-axis. The two environments (or genotypes) are again specified by setting LEV1 and LEV2 to their levels or labels.
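For the centred plot, the coordinates are each genotype's values in the two chosen environments minus the environment means, and the line y = x separates the genotypes that do relatively better in one environment from those that do relatively better in the other. A minimal Python sketch follows; the function `centred_comparison` is hypothetical and takes the table as a list of rows (genotypes by environments).

```python
def centred_comparison(table, genotypes, jx, jy):
    """Environment-centred coordinates of each genotype for environments jx
    (x-axis) and jy (y-axis), and which side of y = x the point lies on."""
    n = len(table)
    mean_x = sum(row[jx] for row in table) / n
    mean_y = sum(row[jy] for row in table) / n
    result = {}
    for name, row in zip(genotypes, table):
        x, y = row[jx] - mean_x, row[jy] - mean_y
        if y < x:
            side = "better in x-environment"
        elif y > x:
            side = "better in y-environment"
        else:
            side = "on the line"
        result[name] = (x, y, side)
    return result


table = [[10.0, 20.0], [14.0, 22.0], [12.0, 30.0]]
for name, point in centred_comparison(table, ["A", "B", "C"], 0, 1).items():
    print(name, point)
# A (-2.0, -4.0, 'better in x-environment')
# B (2.0, -2.0, 'better in x-environment')
# C (0.0, 6.0, 'better in y-environment')
```

For METHOD=genotype the same calculation applies to the transposed table, centring on the genotype means instead.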
When there are a large number of genotypes, it may be helpful to cull some of them from the biplot. For example, you may want to remove genotypes that have performed badly in some of the environments. To do this, set CULL to a variate or a text containing the levels or labels of the environments that you want to consider. Then, by default, all genotypes with data values less than the median value at each chosen environment will be removed. Alternatively, you can specify some other quantile at which to cull by using the QUANTILE option. Note, however, that if you select more than one environment and the values at those environments are negatively correlated, there may be very few (or possibly no) genotypes left to plot.
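The culling rule can be sketched in Python as below. Two details are assumptions rather than documented behaviour: the quantile uses linear interpolation between order statistics (GenStat may define quantiles differently), and a genotype is kept only if it reaches the quantile in every chosen environment, which is the reading consistent with the warning about negatively correlated environments. `quantile` and `cull` are hypothetical names.

```python
import math


def quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    s = sorted(values)
    h = (len(s) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def cull(table, genotypes, env_columns, q=0.5):
    """Keep the genotypes whose value is at least the q-quantile in every
    chosen environment (column); the others are culled."""
    cut = {j: quantile([row[j] for row in table], q) for j in env_columns}
    return [name for name, row in zip(genotypes, table)
            if all(row[j] >= cut[j] for j in env_columns)]


table = [[1, 9], [2, 8], [3, 7], [4, 6], [5, 5]]
names = ["A", "B", "C", "D", "E"]
print(cull(table, names, [0]))     # ['C', 'D', 'E']
print(cull(table, names, [1]))     # ['A', 'B', 'C']
print(cull(table, names, [0, 1]))  # ['C']  (negatively correlated environments)
```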
The GENREVERSE and ENVREVERSE options can reverse the y-direction in the plots of the genotype and environment scores, respectively.
By default, the genotype and environment scores are labelled by the labels of the GENOTYPES and ENVIRONMENTS factors, if available, or otherwise by their levels. Alternatively, you can specify other labels using the LABGENOTYPES and LABENVIRONMENTS parameters.
Options: PRINT, DIMENSIONS, PLOT, METHOD, SCPLOT, SCALING, NORMALIZE, CULL, QUANTILE, DIVISIONS, RANKINGLINES, GENREVERSE, ENVREVERSE, WINDOW, KEYWINDOW.
Parameters: DATA, GENOTYPES, ENVIRONMENTS, LEV1, LEV2, LABGENOTYPES, LABENVIRONMENTS, TITLE, MEGAGROUPS.
Method
GGEBIPLOT performs a principal components analysis of the data variates (one for each environment); this automatically column-centres the data, thus removing the environmental effects. The genotype and environmental eigenvectors from the analysis are then multiplied by constants, which depend on the singular values, to give the genotype and environment scores. The constants are chosen by setting the SCALING option as follows:
genotype | genotype scores = λi × ith genotype eigenvector; environment scores = ith environmental eigenvector |
---|---|
environment | genotype scores = ith genotype eigenvector; environment scores = λi × ith environmental eigenvector |
symmetric | genotype scores = √λi × ith genotype eigenvector; environment scores = √λi × ith environmental eigenvector |
where {λi} are the singular values of the data, with the values of i set by DIMENSIONS.
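The three SCALING settings differ only in how each singular value is shared between the two sets of eigenvectors, and in every case the inner product of a genotype score with an environment score reproduces the corresponding (rank-reduced) data value. The sketch below shows this with a small, artificially constructed decomposition. It follows the usual GGE biplot convention (Yan & Kang, 2003), in which genotype-focused scaling attaches the singular values to the genotype eigenvectors. `biplot_scores` is a hypothetical function, and U and V are assumed to hold the genotype and environmental eigenvectors as rows, with one column per retained dimension.

```python
import math


def biplot_scores(U, V, sv, scaling="environment"):
    """Genotype and environment scores under the three SCALING settings."""
    powers = {"genotype": (1.0, 0.0), "environment": (0.0, 1.0), "symmetric": (0.5, 0.5)}
    gp, ep = powers[scaling]
    G = [[u[k] * sv[k] ** gp for k in range(len(sv))] for u in U]
    E = [[v[k] * sv[k] ** ep for k in range(len(sv))] for v in V]
    return G, E


# Orthonormal, column-centred genotype eigenvectors (3 genotypes, 2 dimensions),
# environmental eigenvectors for 2 environments, and singular values 3 and 1.
r2, r6 = math.sqrt(2), math.sqrt(6)
U = [[1 / r2, 1 / r6], [-1 / r2, 1 / r6], [0.0, -2 / r6]]
V = [[1.0, 0.0], [0.0, 1.0]]
sv = [3.0, 1.0]
X = [[sum(U[i][k] * sv[k] * V[j][k] for k in range(2)) for j in range(2)] for i in range(3)]

for scaling in ("genotype", "environment", "symmetric"):
    G, E = biplot_scores(U, V, sv, scaling)
    # G E' reproduces X whichever scaling is used.
    err = max(abs(sum(G[i][k] * E[j][k] for k in range(2)) - X[i][j])
              for i in range(3) for j in range(2))
    print(scaling, err < 1e-12)  # True for all three settings
```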
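The singular values are obtained by multiplying the roots from a principal components analysis by (n-1), where n is the number of genotypes, and taking the square root, i.e. λi = √((n-1) ri) for root ri. The eigenvectors for the genotypes are obtained by multiplying the scores from the principal components analysis by a diagonal matrix containing the reciprocals of the singular values. The environmental eigenvectors are calculated by multiplying the data by the generalized inverse of (the genotype eigenvectors multiplied by the singular values); because the genotype eigenvectors are orthonormal, this amounts to premultiplying the data by the transposed genotype eigenvectors and dividing by the singular values.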
The genotype-focused scaling is used to display the interrelationships among the genotypes. The environment-focused scaling, which is probably the one used most frequently, displays the interrelationships among the environments and has the following properties.
(1) The cosine of the angle between any two environments approximates their correlation.
(2) The lengths of the environment vectors are approximately proportional to their standard deviations.
(3) The inner product between two environments approximates their covariance.
The symmetric scaling method allows for comparisons of the relative variances between the genotypes and environments.
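The relationships between the principal components analysis and the singular value decomposition, and the properties of the environment-focused scaling, can be checked numerically. To keep the sketch self-contained it handles only two environments, where the 2 × 2 covariance matrix has a closed-form eigen-decomposition; with more environments a general eigen-solver would be needed. The code takes λi = √((n - 1) ri), the usual relationship between singular values and covariance roots. The function names are hypothetical, and the code illustrates the algebra rather than GenStat's implementation. With only two environments, two dimensions are a full-rank fit, so the approximate properties hold exactly.

```python
import math


def pca_two_environments(table):
    """Column-centre a genotypes-by-2-environments table and return the
    centred data, the PCA roots (covariance eigenvalues, largest first)
    and the PCA scores for each genotype."""
    n = len(table)
    means = [sum(row[j] for row in table) / n for j in (0, 1)]
    X = [[row[0] - means[0], row[1] - means[1]] for row in table]
    a = sum(x[0] * x[0] for x in X) / (n - 1)
    b = sum(x[0] * x[1] for x in X) / (n - 1)
    d = sum(x[1] * x[1] for x in X) / (n - 1)
    half_gap = math.sqrt(((a - d) / 2) ** 2 + b * b)
    roots = [(a + d) / 2 + half_gap, (a + d) / 2 - half_gap]
    if b != 0:
        vecs = [(r - d, b) for r in roots]  # eigenvectors of [[a, b], [b, d]]
    else:
        vecs = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
    loadings = [(v[0] / math.hypot(*v), v[1] / math.hypot(*v)) for v in vecs]
    scores = [[x[0] * l[0] + x[1] * l[1] for l in loadings] for x in X]
    return X, roots, scores


def svd_from_pca(X, roots, scores):
    """Singular values sqrt((n - 1) * root), genotype eigenvectors
    U = scores / singular value, environmental eigenvectors V = X' U / sv."""
    n, k = len(X), len(roots)
    sv = [math.sqrt((n - 1) * r) for r in roots]
    U = [[s[m] / sv[m] for m in range(k)] for s in scores]
    V = [[sum(U[i][m] * X[i][j] for i in range(n)) / sv[m] for m in range(k)]
         for j in range(len(X[0]))]
    return sv, U, V


X, roots, scores = pca_two_environments([[4.0, 6.0], [5.0, 5.0], [7.0, 9.0], [8.0, 12.0]])
sv, U, V = svd_from_pca(X, roots, scores)
E = [[V[j][m] * sv[m] for m in range(2)] for j in range(2)]  # environment-focused scores
n = len(X)
cov01 = sum(x[0] * x[1] for x in X) / (n - 1)
var0 = sum(x[0] ** 2 for x in X) / (n - 1)
var1 = sum(x[1] ** 2 for x in X) / (n - 1)
inner = E[0][0] * E[1][0] + E[0][1] * E[1][1]
len0, len1 = math.hypot(*E[0]), math.hypot(*E[1])
print(abs(inner - (n - 1) * cov01) < 1e-9)                                 # True: inner product = (n - 1) x covariance
print(abs(inner / (len0 * len1) - cov01 / math.sqrt(var0 * var1)) < 1e-9)  # True: cosine = correlation
```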
References
Yan, W. & Kang, M.S. (2003). GGE Biplot Analysis: a Graphical Tool for Breeders, Geneticists and Agronomists. CRC Press, Boca Raton.
Hunt, L.A. & Yan, W. (2002). Biplot analysis of diallel data. Crop Science, 42, 21-30.
See also
Procedures: AMMI, GESTABILITY, RFINLAYWILKINSON, DBIPLOT, CABIPLOT, CRBIPLOT, CRTRIPLOT.
Commands for: REML analysis of linear mixed models, Graphics.
Example
CAPTION 'GGEBIPLOT example',!t('Data from Hunt & Yan (2002)',\
  'Table 3. Tolerance to infection by Fusarium head',\
  'blight of seven winter wheat genotypes and their F1 hybrids'); STYLE=meta,plain
VARIATE [NVALUES = 49; VALUES =\
  27.5, 35.7, 46.4, 53.7, 33.3, 64.9, 43.3,\
  35.7, 37.5, 46.2, 40.8, 51.9, 45.6, 57.5,\
  46.4, 46.2, 38.7, 49.1, 50.4, 55.6, 69.4,\
  53.7, 40.8, 49.1, 51.2, 49.4, 48.1, 57.5,\
  33.3, 51.9, 50.4, 49.4, 42.5, 63.1, 68.9,\
  64.9, 45.6, 55.6, 48.1, 63.1, 60.0, 63.1,\
  43.3, 57.5, 69.4, 57.5, 68.9, 63.1, 43.7] Yield
FACTOR [NVALUES=49; LABELS=!T(a,b,c,d,e,f,g)] Env
FACTOR [NVALUES=49; LABELS=!T(A,B,C,D,E,F,G)] Genotype
GENERATE Env,Genotype
GGEBIPLOT Yield; GENOTYPES=Genotype; ENVIRONMENTS=Env