Matches different data structures to be used in QTL estimation (L.C.P. Keizer & J.T.N.M. Thissen).
Options
PRINT = string tokens | What to print (summary, details); default summ |
---|---|
GEN%MISSING = scalar | Percentage of missing values allowed for a genotype; default 50 |
MK%MISSING = scalar | Percentage of missing values allowed for a marker; default 50 |
MK%EXTREME = scalar | Extreme allele percentage allowed for a marker; default 5 |
GENSELECTION = variate | Logical variate containing the value one for the genotypes to retain and zero for those to remove (supersedes the options GEN%MISSING, MK%MISSING and MK%EXTREME) |
MKSELECTION = variate | Logical variate containing the value one for the markers to retain and zero for those to remove (supersedes the options GEN%MISSING, MK%MISSING and MK%EXTREME) |
POPULATIONTYPE = string token | Type of population (BC1, DH1, F2, RIL, BCxSy, CP, AMP); must be set |
OUTFILEPREFIX = text | Prefix for the output file names; default * i.e. files not saved |
Parameters
TRAITS = pointers or variates | Quantitative traits |
---|---|
GENOTYPES = factors | Genotype factors corresponding to the traits |
ENVIRONMENTS = factors | Environment factors corresponding to the traits |
MKSCORES = pointers | Marker scores; must be set |
CHROMOSOMES = factors | Chromosomes corresponding to the markers |
POSITIONS = variates | Positions on the chromosomes corresponding to the markers |
MKNAMES = texts | Names of the markers |
IDMGENOTYPES = texts | Labels for the genotypes corresponding to the markers |
PARENTS = pointers | Parent information |
IDPARENTS = texts | Labels used to identify the parents |
KMATRIX = symmetric matrices | Kinship matrices containing coefficients of coancestries |
SUBPOPULATIONS = factors | Groups of genotypes |
STRAITS = pointers or variates | Saves the sorted quantitative traits |
SGENOTYPES = factors | Saves the sorted genotype factors |
SENVIRONMENTS = factors | Saves the sorted environment factors |
SMKSCORES = pointers | Saves the sorted marker scores; must be set |
SCHROMOSOMES = factors | Saves the sorted chromosomes corresponding to the markers |
SPOSITIONS = variates | Saves the sorted positions on the chromosomes corresponding to the markers |
SMKNAMES = texts | Saves the sorted names of the markers |
SIDMGENOTYPES = texts | Saves the sorted labels for the genotypes |
SPARENTS = pointers | Saves the sorted parent information |
SIDPARENTS = texts | Saves the sorted labels used to identify the parents |
SKMATRIX = symmetric matrices | Saves the sorted kinship matrices |
SSUBPOPULATIONS = factors | Saves the sorted groups of genotypes |
Description
QMATCH matches the various data structures that can be used in QTL detection. These include molecular marker information of sets of genotypes, map information, phenotypic information, and also genetic relatedness information in the form of genotype groupings and kinship matrices. QMATCH can be used to align all these data for further analyses.
Molecular marker information is supplied by the MKSCORES, MKNAMES and IDMGENOTYPES parameters; MKSCORES must be set. The type of population from which the genotypes come must be specified using the POPULATIONTYPE option. If parental genotypes are known (designed crosses), the marker scores of the parents can be supplied by the PARENTS parameter, and their labels can be specified by the IDPARENTS parameter. Molecular map information is supplied by the CHROMOSOMES and POSITIONS parameters. Phenotypic data are specified by the TRAITS parameter, as a variate for a single trait, or as a pointer containing several variates for more than one trait. The GENOTYPES parameter supplies a factor defining the genotype of each trait observation, and the ENVIRONMENTS parameter can supply a factor defining the environment of each observation when the data are from a multi-environment trial. Genetic relatedness information, used in association mapping analyses, can be given as a kinship matrix using the KMATRIX parameter, or a grouping factor using the SUBPOPULATIONS parameter.
QMATCH matches the different data sets together, with respect to the same set of genotypes (MKSCORES and TRAITS), or the same set of markers (MKSCORES and the map structures). Genotypes and/or markers that are not common to the matched data sets are removed.
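The matching rule can be illustrated outside Genstat. The following is a minimal Python sketch of the idea only, not the procedure's implementation: the function name match_common and the genotype labels are hypothetical, and it assumes that the retained genotypes keep the order of the marker data.

```python
def match_common(marker_genotypes, trait_genotypes):
    """Split genotype labels into those common to both data sets and those dropped.

    Assumption: the common genotypes keep the order of the marker data.
    """
    marker_set = set(marker_genotypes)
    trait_set = set(trait_genotypes)
    common = [g for g in marker_genotypes if g in trait_set]
    only_in_markers = [g for g in marker_genotypes if g not in trait_set]
    only_in_traits = [g for g in trait_genotypes if g not in marker_set]
    return common, only_in_markers, only_in_traits


# Hypothetical labels: G2 has marker scores only, G4 has trait values only.
common, dropped_markers, dropped_traits = match_common(
    ["G1", "G2", "G3"], ["G3", "G1", "G4"]
)
print(common)           # genotypes kept in both data sets
print(dropped_markers)  # removed from the marker data
print(dropped_traits)   # removed from the trait data
```

The same intersection logic applies to markers when MKSCORES is matched against the map structures.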
In addition to subsetting the data, the procedure can remove genotypes and/or markers with too many missing values. The GEN%MISSING option sets a threshold on the percentage of missing values within each genotype (default 50); genotypes with more than that percentage of missing scores are excluded. Similarly, the MK%MISSING option sets a threshold on the percentage of missing values within each marker (default 50); markers with more than that percentage of missing scores are excluded. Markers can also be excluded using the MK%EXTREME option (default 5); a marker is then excluded if the percentage of one of its alleles is greater than the MK%EXTREME value.
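The missing-value thresholds can be made concrete with a small Python sketch. This is an illustration under stated assumptions, not the procedure's code: the names pct_missing and keep_flags are hypothetical, missing scores are represented by None, and both percentages are assumed to be computed on the unfiltered table (the text does not say whether genotypes or markers are screened first). The MK%EXTREME check is left out.

```python
def pct_missing(values):
    """Percentage of missing (None) entries in a sequence of scores."""
    return 100.0 * sum(v is None for v in values) / len(values)


def keep_flags(scores, gen_missing=50.0, mk_missing=50.0):
    """Return (genotype_flags, marker_flags), where 1 = retain and 0 = remove.

    scores holds one row per genotype and one column per marker.
    An item is removed only if it has MORE than the threshold percentage
    of missing scores, so an item exactly at the threshold is retained.
    """
    gen_flags = [int(pct_missing(row) <= gen_missing) for row in scores]
    columns = list(zip(*scores))  # transpose: one tuple per marker
    mk_flags = [int(pct_missing(col) <= mk_missing) for col in columns]
    return gen_flags, mk_flags


# Hypothetical scores: 3 genotypes (rows) by 4 markers (columns).
scores = [
    [1, 2, None, 1],        # 25% missing
    [None, None, None, 2],  # 75% missing
    [1, 1, None, 2],        # 25% missing
]
gen_flags, mk_flags = keep_flags(scores)
print(gen_flags, mk_flags)
```

With the default thresholds of 50, the second genotype and the third marker are flagged for removal.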
In some situations you may already know which markers or genotypes you want to remove. If so, you can set the GENSELECTION and MKSELECTION options. The setting of each option is a logical variate containing the value one for the genotypes or markers (respectively) to retain, and zero for those that are to be removed. If either of these options is set, no checks are carried out using the GEN%MISSING, MK%MISSING and MK%EXTREME options.
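The precedence of the selection variates over the threshold options can be sketched in Python as follows. The names resolve_flags and retain are hypothetical, and the sketch assumes that when only one of the two selections is given, every item on the other dimension is retained, since the threshold checks are skipped.

```python
def resolve_flags(threshold_gen, threshold_mk, gen_selection=None, mk_selection=None):
    """Choose the final 1/0 retain flags for genotypes and markers.

    threshold_gen and threshold_mk are the flags the percentage checks
    would give. If either selection is supplied, those checks are ignored.
    """
    if gen_selection is None and mk_selection is None:
        return list(threshold_gen), list(threshold_mk)
    # Assumption: an unset selection means "retain everything" on that dimension.
    gen = list(gen_selection) if gen_selection is not None else [1] * len(threshold_gen)
    mk = list(mk_selection) if mk_selection is not None else [1] * len(threshold_mk)
    return gen, mk


def retain(labels, flags):
    """Keep the labels whose flag equals one."""
    return [label for label, flag in zip(labels, flags) if flag == 1]


# Hypothetical data: the thresholds would drop G2 and mkA,
# but a genotype selection is supplied, so they are not applied.
genotypes = ["G1", "G2", "G3"]
markers = ["mkA", "mkB"]
gen_flags, mk_flags = resolve_flags([1, 0, 1], [0, 1], gen_selection=[1, 1, 0])
kept_genotypes = retain(genotypes, gen_flags)
kept_markers = retain(markers, mk_flags)
print(kept_genotypes, kept_markers)
```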
The modified data structures can be saved using the parameters beginning with the prefix S. The SMKSCORES parameter, which must be set, saves the marker scores. If only the MKSCORES and SMKSCORES parameters are specified, the SMKSCORES variates are sorted according to the labels of the MKSCORES pointer. If the MKNAMES and/or the IDMGENOTYPES parameters are also specified, sorting is then done according to their values. If the map structures (CHROMOSOMES and POSITIONS) are also set, the SMKSCORES variates are first sorted in ascending order according to the levels of the CHROMOSOMES factor, and then within each chromosome (linkage group) in ascending order of the POSITIONS. If the SMKNAMES, SCHROMOSOMES, SPOSITIONS, SPARENTS and SIDPARENTS parameters are set, their values are sorted in the same way. The structures corresponding to the traits (i.e. STRAITS, SGENOTYPES and SENVIRONMENTS) are sorted in the same way as the SIDMGENOTYPES text; if these structures contain values from more than one environment, the sorting according to the values of SIDMGENOTYPES is done within each environment. Finally, if the KMATRIX and/or the SUBPOPULATIONS parameters are set, their sorted values can be saved by the SKMATRIX and SSUBPOPULATIONS parameters, respectively.
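The two sort orders described above can be sketched in Python. This is a hedged illustration with hypothetical names (sort_markers, sort_trait_rows) and made-up data. Chromosome levels are represented by integers, and environments are assumed to be ordered by their labels, which the text does not specify.

```python
def sort_markers(names, chromosomes, positions):
    """Order markers by chromosome level, then by position within chromosome."""
    order = sorted(range(len(names)), key=lambda i: (chromosomes[i], positions[i]))
    return [names[i] for i in order], order


def sort_trait_rows(rows, genotype_order):
    """Order (environment, genotype, value) rows by genotype within each environment.

    genotype_order plays the role of the sorted genotype labels.
    Assumption: environments themselves are ordered by label.
    """
    rank = {genotype: i for i, genotype in enumerate(genotype_order)}
    return sorted(rows, key=lambda row: (row[0], rank[row[1]]))


# Hypothetical map: mkB and mkA lie on chromosome 1, mkC on chromosome 2.
sorted_names, order = sort_markers(["mkC", "mkA", "mkB"], [2, 1, 1], [10.0, 35.5, 4.2])

# Hypothetical trait records from two environments.
rows = [("E2", "G2", 5.1), ("E1", "G2", 4.0), ("E1", "G1", 3.2), ("E2", "G1", 6.3)]
sorted_rows = sort_trait_rows(rows, ["G1", "G2"])
print(sorted_names)
print(sorted_rows)
```

The index list returned by sort_markers can be reused to reorder any structure that runs parallel to the markers, which mirrors how the saved names, chromosomes and positions are sorted in the same way as the marker scores.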
The OUTFILEPREFIX option can be used to define the initial part of the names of the files in which the modified data are saved. The text supplied by the option should not contain an extension, as the extension is defined automatically for the different files. The saved marker scores are stored in a Flapjack file with '_geno.txt' added to OUTFILEPREFIX, the saved map structures in a Flapjack map file with '_map.txt' added, and the saved phenotypic structures in a Genstat spreadsheet file with '_pheno.gsh' added. The saved kinship matrix and the saved subpopulation structures are also stored in Genstat spreadsheet files, with '_kmat.gsh' and '_subpop.gsh' added, respectively.
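The naming scheme can be summarized with a short Python sketch. The function name output_files and the dictionary keys are hypothetical; only the suffixes come from the description above. Which files are actually written presumably depends on which structures are supplied.

```python
# Suffixes as listed in the description; the keys are illustrative labels.
SUFFIXES = {
    "marker scores": "_geno.txt",
    "map": "_map.txt",
    "phenotypes": "_pheno.gsh",
    "kinship matrix": "_kmat.gsh",
    "subpopulations": "_subpop.gsh",
}


def output_files(prefix):
    """Build the file name for each kind of saved structure from a prefix.

    The prefix should not carry an extension of its own.
    """
    return {kind: prefix + suffix for kind, suffix in SUFFIXES.items()}


files = output_files("LD_match")
for kind, name in files.items():
    print(f"{kind}: {name}")
```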
The PRINT option controls the printed output, with settings:
summary | for a general summary of the changes, and |
---|---|
details | for details of the omitted genotypes and markers, etc. |
Options: PRINT, GEN%MISSING, MK%MISSING, MK%EXTREME, GENSELECTION, MKSELECTION, POPULATIONTYPE, OUTFILEPREFIX.
Parameters: TRAITS, GENOTYPES, ENVIRONMENTS, MKSCORES, CHROMOSOMES, POSITIONS, MKNAMES, IDMGENOTYPES, PARENTS, IDPARENTS, KMATRIX, SUBPOPULATIONS, STRAITS, SGENOTYPES, SENVIRONMENTS, SMKSCORES, SCHROMOSOMES, SPOSITIONS, SMKNAMES, SIDMGENOTYPES, SPARENTS, SIDPARENTS, SKMATRIX, SSUBPOPULATIONS.
Action with RESTRICT
Restrictions are not allowed.
See also
Procedure: QMKDIAGNOSTICS.
Commands for: Statistical genetics and QTL estimation.
Example
CAPTION 'QMATCH example'; STYLE=meta
QIMPORT [POPULATION=AMP] '%GENDIR%/Examples/LD_example_geno.txt';\
        MAPFILE='%GENDIR%/Examples/LD_example_map.txt';\
        MKSCORES=mkscores; MKNAMES=mknames; CHROMOSOMES=mkchr;\
        POSITIONS=mkpos; IDMGENOTYPES=idmgeno
QMATCH  [POPULATION=AMP; OUTFILE='LD_match']\
        MKSCORES=mkscores; CHROMOSOMES=mkchr; POSITIONS=mkpos;\
        MKNAMES=mknames; IDMGENOTYPES=idmgeno; SMKSCORES=smkscores