Generates descriptive statistics and diagnostic plots of molecular marker data (D.A. Murray, S.J. Welham, M. Malosetti, M.P. Boer, L.C.P. Keizer & J.T.N.M. Thissen).
Options
PRINT = string tokens |
What to print (summary, missingvalues, frequencies); default summ, miss, freq |
---|---|
PLOT = string tokens |
What to plot (missingvalues, frequencies, probabilities, genotypes, map); default miss, geno, map |
GEN%MISSING = scalar |
Threshold for printing genotypes with many missing values (i.e. genotypes with a higher percentage of missing values than the specified value); default 10 |
MK%MISSING = scalar |
Threshold for printing markers with many missing values (i.e. markers with a higher percentage of missing values than the specified value); default 10 |
MK%EXTREME = scalar |
Threshold for printing markers with rare alleles (i.e. alleles present with a lower percentage than the specified threshold); default 10 |
POPULATIONTYPE = string token |
Type of population (BC1, DH1, F2, RIL, BCxSy, CP, AMP); must be set |
NGENERATIONS = scalar |
Number of generations for a RIL population; default 6 |
NBACKCROSSES = scalar |
Number of backcrosses; must be set for a BCxSy population |
NSELFINGS = scalar |
Number of selfings; must be set for a BCxSy population |
DCHROMOSOMES = variate, text or scalar |
Specifies a subset of the linkage groups to be displayed |
PDIRECTION = string token |
How to sort the probabilities when PRINT=frequencies with BC1, DH1, F2, RIL and BCxSy populations (ascending, descending); default * i.e. no sorting |
Parameters
MKSCORES = pointers |
Genotype codes for each marker; must be set |
---|---|
CHROMOSOMES = factors |
Linkage groups for the markers; must be set |
POSITIONS = variates |
Positions within the linkage groups of markers; must be set |
MKNAMES = texts |
Marker names; must be set |
IDMGENOTYPES = texts |
Labels for genotypes corresponding to the marker scores |
PARENTS = pointers |
Parent information |
IDPARENTS = texts |
Labels to identify the parents |
GENCHECK = variates |
Logical variates containing the value one for genotypes with missing value problems, according to the setting of the GEN%MISSING option, and zero otherwise |
MKCHECK = variates |
Logical variates containing the value one for markers with missing or extreme value problems, as defined by the MK%MISSING and MK%EXTREME options, and zero otherwise |
SUMMARY = pointers |
Saves a summary of counts and probabilities for the chi-square tests for BC1, DH1, F2, RIL and BCxSy populations |
Description
QMKDIAGNOSTICS
generates descriptive statistics and diagnostic plots of molecular marker data. The marker scores must be supplied in a pointer, using the MKSCORES
parameter. The length of the MKSCORES
pointer must be equal to the number of markers, and each structure of the pointer must be a factor with labels. The population type must be specified by the POPULATIONTYPE
option. For a RIL
population, the number of generations is specified by the NGENERATIONS
option; default 6. For a BCxSy
population, the number of backcrosses and the number of selfings are supplied by the NBACKCROSSES
and NSELFINGS
options, respectively. The labels for the genotypes corresponding to the marker scores can be supplied by the IDMGENOTYPES
parameter.
The corresponding map information for the markers must be supplied by the CHROMOSOMES
and POSITIONS
parameters, and the labels of the markers must be supplied by the MKNAMES
parameter.
The parent information must be supplied using the PARENTS
parameter in a pointer to a set of texts. The first text in the pointer defines the alleles for parent 1, the second text defines the alleles for parent 2, and so on. The labels for the parents are supplied in a text using the IDPARENTS
parameter.
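As a sketch of how these settings fit together, the calls below assume that marker data for a RIL population and for a BCxSy population have already been read into structures of the required types. All structure names (mk_ril, chr_ril, mk_bc and so on) are hypothetical, and the numbers of generations, backcrosses and selfings are illustrative only.

```
" RIL population after 8 generations (structure names are hypothetical) "
QMKDIAGNOSTICS [POPULATIONTYPE=RIL; NGENERATIONS=8] MKSCORES=mk_ril;\
  CHROMOSOMES=chr_ril; POSITIONS=pos_ril; MKNAMES=names_ril;\
  PARENTS=par_ril; IDPARENTS=idpar_ril

" BCxSy population: 2 backcrosses followed by 3 selfings (illustrative values) "
QMKDIAGNOSTICS [POPULATIONTYPE=BCxSy; NBACKCROSSES=2; NSELFINGS=3]\
  MKSCORES=mk_bc; CHROMOSOMES=chr_bc; POSITIONS=pos_bc;\
  MKNAMES=names_bc; PARENTS=par_bc; IDPARENTS=idpar_bc
```

If NGENERATIONS is omitted for a RIL population, the default of 6 generations applies; NBACKCROSSES and NSELFINGS have no default and must both be set for BCxSy.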
The PRINT
option controls printed output, with settings:
summary |
to print the number of genotypes and markers, and summary statistics per chromosome, |
---|---|
missingvalues |
to print the genotypes with percentages of missing values greater than GEN%MISSING and the markers with percentages of missing values greater than MK%MISSING, |
frequencies |
to print the allele frequencies of all markers with allele frequencies greater than MK%EXTREME for an AMP population, or the frequencies of genotype codes for markers for BC1, DH1, F2, RIL and BCxSy populations. |
By default PRINT=summary,missingvalues,frequencies. If PRINT=frequencies or PLOT=probabilities, the output for BC1, DH1, F2, RIL and BCxSy populations includes the probabilities of the calculated chi-square tests of Mendelian segregation; the expected ratios are defined in the Method section. The summary table of genotypic code frequencies can be sorted into ascending or descending order of probabilities by setting the PDIRECTION option.
The PLOT
option controls graphical output, with settings:
missingvalues |
to produce a trellis plot of percentages of missing values against the map position for each linkage group and a plot of missing marker scores using the DQMKSCORES procedure, |
---|---|
frequencies |
to produce a trellis plot of the allele frequency percentages against the map position for each linkage group (for AMP populations only), |
probabilities |
to produce a trellis plot of the chi-square probabilities, plotted on a -log10 scale against the map position for each linkage group (for BC1, DH1, F2, RIL and BCxSy populations only), |
genotypes |
to plot all graphical genotypes, and |
map |
to plot the linkage map. |
By default PLOT=missingvalues,genotypes,map.
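A minimal sketch of selecting non-default output, assuming the F2 structures (m_scores2, m_chromo2 and so on) have been imported by QIMPORT as in the Example section. It requests only the frequency table, sorted by ascending chi-square probability, together with the plot of probabilities.

```
" Frequencies and segregation tests only, sorted so that the smallest "
" probabilities come first "
QMKDIAGNOSTICS [PRINT=frequencies; PLOT=probabilities;\
  POPULATIONTYPE=F2; PDIRECTION=ascending] m_scores2;\
  CHROMOSOMES=m_chromo2; POSITIONS=m_pos2; MKNAMES=m_names2;\
  PARENTS=parents2; IDPARENTS=idparents2
```

Sorting in ascending order places the markers with the strongest evidence of segregation distortion at the top of the table.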
The DCHROMOSOMES
option can be used to select a subset of the linkage groups to display. The setting can be either a variate or scalar to define a subset using the levels of the CHROMOSOMES
factor, or a text to define a subset using its labels.
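The two ways of setting DCHROMOSOMES might look as follows. This is a sketch that again assumes the F2 structures from the Example section; the second call additionally assumes that the m_chromo2 factor has labels '1' and '5', which may not be the case for a particular data set.

```
" Subset defined by levels of the CHROMOSOMES factor (unnamed variate) "
QMKDIAGNOSTICS [PLOT=map; POPULATIONTYPE=F2; DCHROMOSOMES=!(1,5)]\
  m_scores2; CHROMOSOMES=m_chromo2; POSITIONS=m_pos2;\
  MKNAMES=m_names2; PARENTS=parents2; IDPARENTS=idparents2

" Subset defined by labels of the CHROMOSOMES factor (unnamed text); "
" assumes that m_chromo2 has the labels '1' and '5' "
QMKDIAGNOSTICS [PLOT=map; POPULATIONTYPE=F2; DCHROMOSOMES=!T('1','5')]\
  m_scores2; CHROMOSOMES=m_chromo2; POSITIONS=m_pos2;\
  MKNAMES=m_names2; PARENTS=parents2; IDPARENTS=idparents2
```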
The GENCHECK
parameter can save a logical variate identifying the genotypes whose percentage of missing values exceeds (value one) or does not exceed (value zero) the threshold set by the GEN%MISSING
option. Similarly the MKCHECK
parameter can save a logical variate identifying the markers that have problems of missing or extreme values, according to the settings of the MK%MISSING
and MK%EXTREME
options.
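One possible use of the saved variates is to count and list the flagged items. The sketch below assumes the F2 structures from the Example section; the thresholds of 20 and 15 percent and the names gcheck2, mcheck2, nbadgen and nbadmk are illustrative choices, not values used by the procedure.

```
" Save the check variates with illustrative thresholds "
QMKDIAGNOSTICS [PRINT=missingvalues; PLOT=*; POPULATIONTYPE=F2;\
  GEN%MISSING=20; MK%MISSING=15] m_scores2;\
  CHROMOSOMES=m_chromo2; POSITIONS=m_pos2; MKNAMES=m_names2;\
  PARENTS=parents2; IDPARENTS=idparents2;\
  GENCHECK=gcheck2; MKCHECK=mcheck2

" Count the flagged genotypes and markers "
CALCULATE nbadgen = SUM(gcheck2)
CALCULATE nbadmk = SUM(mcheck2)
PRINT nbadgen,nbadmk

" List the names of the flagged markers, then remove the restriction "
RESTRICT m_names2; CONDITION=mcheck2.EQ.1
PRINT m_names2
RESTRICT m_names2
```

The final RESTRICT statement cancels the restriction, so that m_names2 can be passed to QMKDIAGNOSTICS again; the procedure does not accept restricted structures.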
The SUMMARY
parameter can save a pointer containing the structures that are printed when PRINT=frequencies
for F2
, BC1
, DH1
and RIL
populations. This contains the marker number, the marker name, the chromosome number, the position on the chromosome, percentage missing, the allele frequencies and the chi-square probability.
Options: PRINT, PLOT, GEN%MISSING, MK%MISSING, MK%EXTREME, POPULATIONTYPE, NGENERATIONS, NBACKCROSSES, NSELFINGS, DCHROMOSOMES, PDIRECTION.
Parameters: MKSCORES, CHROMOSOMES, POSITIONS, MKNAMES, IDMGENOTYPES, PARENTS, IDPARENTS, GENCHECK, MKCHECK, SUMMARY.
Method
For each marker, the segregation is evaluated against the expected frequencies using a chi-square test. The expected ratios are as follows:
Population | Alleles | Expected ratio |
---|---|---|
BC1 | 1/1 : 1/2 | 1 : 1 |
DH1 | 1/1 : 2/2 | 1 : 1 |
F2 | 1/1 : 1/2 : 2/2 | 1 : 2 : 1 |
F2 | 1/1 : 2/- | 1 : 3 |
F2 | 2/2 : 1/- | 1 : 3 |
RILn | 1/1 : 1/2 : 2/2 | 2^{n-1}-1 : 2 : 2^{n-1}-1 |
RILn | 1/1 : 2/- | 2^{n-1}-1 : 2^{n-1}+1 |
RILn | 2/2 : 1/- | 2^{n-1}-1 : 2^{n-1}+1 |
BCxSy | 1/1 : 1/2 : 2/2 | 2^{x+y+1}-2^{y}-1 : 2 : 2^{y}-1 |
BCxSy | 1/1 : 2/- | 2^{x+y+1}-2^{y}-1 : 2^{y}+1 |
BCxSy | 2/2 : 1/- | 2^{y}-1 : 2^{x+y+1}-2^{y}+1 |
where 1 is the allele for parent 1, 2 is the allele for parent 2, n is the number of RIL
generations, and x and y are the number of backcrosses and selfings, respectively, for a BCxSy
population.
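The test for a single marker can be sketched with ordinary Genstat calculations. The counts below are made up for illustration, and the choice of degrees of freedom (number of genotype classes minus one) is an assumption about how the procedure forms the test rather than something stated in this description.

```
" Hypothetical counts of 1/1, 1/2 and 2/2 for one codominant F2 marker "
VARIATE [VALUES=28,55,17] observed
" Expected ratio 1 : 2 : 1 for an F2 population "
VARIATE [VALUES=1,2,1] ratio
" Expected counts under Mendelian segregation "
CALCULATE expected = SUM(observed) * ratio / SUM(ratio)
" Pearson chi-square statistic "
CALCULATE chisq = SUM((observed - expected)**2 / expected)
" Upper-tail probability; degrees of freedom assumed to be classes minus one "
CALCULATE prob = CUCHISQUARE(chisq; NVALUES(observed) - 1)
" Scale used when PLOT=probabilities "
CALCULATE mlogp = -LOG10(prob)
PRINT chisq,prob,mlogp
```

With these counts the expected values are 25, 50 and 25, giving a chi-square statistic of 3.42 on 2 degrees of freedom and a probability of about 0.18, so this hypothetical marker would show no strong evidence of segregation distortion.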
Action with RESTRICT
Restrictions are not allowed.
See also
Procedures: DQMAP
, DQMKSCORES
, DQMQTLSCAN
, DQSQTLSCAN
, QMKRECODE
.
Commands for: Statistical genetics and QTL estimation, Graphics.
Example
CAPTION 'QMKDIAGNOSTICS example'; STYLE=meta
" SxM DH1 population "
QIMPORT [POPULATIONTYPE=DH1] '%GENDIR%/Examples/SxM_geno.txt';\
  MAPFILE='%GENDIR%/Examples/SxM_map.txt';\
  MKSCORES=m_scores1; CHROMOSOMES=m_chromo1; POSITIONS=m_pos1;\
  MKNAMES=m_names1; PARENTS=parents1; IDPARENTS=idparents1
QMKDIAGNOSTICS [POPULATIONTYPE=DH1] m_scores1;\
  CHROMOSOMES=m_chromo1; POSITIONS=m_pos1;\
  MKNAMES=m_names1; SUMMARY=summary1; PARENTS=parents1;\
  IDPARENTS=idparents1
" F2 population "
QIMPORT [POPULATIONTYPE=F2] '%GENDIR%/Examples/F2maize_geno.txt';\
  MAPFILE='%GENDIR%/Examples/F2maize_map.txt';\
  MKSCORES=m_scores2; CHROMOSOMES=m_chromo2; POSITIONS=m_pos2;\
  MKNAMES=m_names2; PARENTS=parents2; IDPARENTS=idparents2
QMKDIAGNOSTICS [POPULATIONTYPE=F2; DCHROMOSOMES=!(1,5); PDIRECTION=asce]\
  m_scores2; CHROMOSOMES=m_chromo2; POSITIONS=m_pos2;\
  MKNAMES=m_names2; SUMMARY=summary2; PARENTS=parents2;\
  IDPARENTS=idparents2
" CP population "
QIMPORT [POPULATIONTYPE=CP] '%GENDIR%/Examples/CPapple_geno.txt';\
  MAPFILE='%GENDIR%/Examples/CPapple_map.txt';\
  MKSCORES=m_scores3; CHROMOSOMES=m_chromo3; POSITIONS=m_pos3;\
  MKNAMES=m_names3; PARENTS=parents3; IDPARENTS=idparents3
QMKDIAGNOSTICS [POPULATIONTYPE=CP; PDIRECTION=asce]\
  m_scores3; CHROMOSOMES=m_chromo3; POSITIONS=m_pos3;\
  MKNAMES=m_names3; SUMMARY=summary3; PARENTS=parents3;\
  IDPARENTS=idparents3
" AMP population "
QIMPORT [POPULATIONTYPE=AMP] '%GENDIR%/Examples/LD_match_geno.txt';\
  MAPFILE='%GENDIR%/Examples/LD_match_map.txt';\
  MKSCORES=m_scores4; CHROMOSOMES=m_chromo4; POSITIONS=m_pos4;\
  MKNAMES=m_names4
QMKDIAGNOSTICS [POPULATIONTYPE=AMP; PDIRECTION=asce]\
  m_scores4; CHROMOSOMES=m_chromo4; POSITIONS=m_pos4;\
  MKNAMES=m_names4; SUMMARY=summary4