Identifies specimens using a classification tree (R.W. Payne).
Options
| PRINT = string tokens | Controls printed output (identification, transcript); if PRINT is unset in an interactive run BCIDENTIFY will ask what you want to print; in a batch run the default is iden |
|---|---|
| TREE = tree | Specifies the tree |
| IDENTIFICATION = text | Saves the identification of each specimen |
| TERMINALNODES = pointer | Saves the numbers of the terminal nodes reached by each specimen |
| PROBABILITIES = matrix | Saves a specimen × group matrix giving the probability that the specimens belong to each group |
| MVINCLUDE = string token | Whether to provide identifications for specimens with missing or unavailable values of the x-variables (explanatory); default expl |
Parameters
| X = variates or factors | Explanatory variables |
|---|---|
| VALUES = scalars, variates or texts | Values to use for the explanatory variables; if these are unset for any variable, its existing values are used |
Description
BCIDENTIFY identifies specimens using a classification tree, as constructed by the BCLASSIFICATION procedure. The tree can be saved from BCLASSIFICATION (using the TREE option of BCLASSIFICATION), and then specified for BCIDENTIFY using its own TREE option. Alternatively, if you do not specify TREE when running interactively, BCIDENTIFY will ask you for the identifier of the tree.
The characteristics of the specimens can be specified in the variates or factors listed by the X parameter. These must have the same names (and levels) as those used originally to construct the tree. You can use the VALUES parameter to supply new values if those stored in any of the variates or factors are unsuitable.
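As a minimal sketch of the VALUES parameter, the call below supplies one scalar per x-variable instead of using the values stored in the factors. It assumes that `tree` and the factors `x1` to `x7` already exist, as in the Example at the end of this page; the particular values are hypothetical (they match the true representation of the digit 2 in that example), and treating a set of scalars as a single specimen is an assumption rather than documented behaviour.

```genstat
"Assumes tree and x1...x7 as in the Example section; values are illustrative only"
BCIDENTIFY [PRINT=identification; TREE=tree] x1,x2,x3,x4,x5,x6,x7;\
  VALUES=1,0,1,1,1,0,1
```

Any x-variable whose VALUES entry is left unset keeps its existing values.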
If you do not set X when running interactively, BCIDENTIFY will ask you to supply the relevant characteristics in turn, as required by the tree. Otherwise, if an x-variable in the tree is not specified in the X parameter list, its values are assumed to be unavailable (i.e. missing).
By default, when the x-variable required at a node in the tree is unavailable or contains a missing value, BCIDENTIFY will follow all the branches from that node and form a combined conclusion. If you would prefer the identification to be missing in that case, set option MVINCLUDE=*.
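The two behaviours can be compared with a sketch like the following, which again assumes `tree` and the factors `x1` to `x7` from the Example at the end of this page. Leaving `x7` out of the X list makes it unavailable, so the two calls differ only for specimens whose path through the tree actually needs `x7`.

```genstat
"x7 is omitted from the X list, so its values are treated as unavailable;"
"by default all branches are followed at any node that needs x7"
BCIDENTIFY [PRINT=identification; TREE=tree] x1,x2,x3,x4,x5,x6

"same specimens, but the identification is left missing where x7 is needed"
BCIDENTIFY [PRINT=identification; TREE=tree; MVINCLUDE=*] x1,x2,x3,x4,x5,x6
```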
The PRINT option controls printed output, with settings:
| identification | prints the identifications obtained using the tree; |
|---|---|
| transcript | prints the observed characteristics when supplied in response to questions in an interactive run. |
If you do not set PRINT in an interactive run, BCIDENTIFY will ask what you would like to print. In batch, the default is to print the identifications.
The IDENTIFICATION option allows you to save the identifications (in a text). The TERMINALNODES option allows you to save a pointer, with an element for each specimen, containing the numbers of the terminal nodes reached in the tree to provide its identification. Each element will be a scalar if the identification was derived from a single node, or a variate if it involved more than one (because several branches were taken as the result of a missing x-value). Finally, the PROBABILITIES option can save a specimen-by-group matrix giving the probability that the specimens belong to each group.
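A short sketch of saving all three results in one call is given below. It assumes `tree` and `x1` to `x7` as in the Example at the end of this page; the identifiers `ident`, `nodes` and `probs` are hypothetical names chosen for illustration.

```genstat
"Suppress printing and save the results instead (ident, nodes, probs are illustrative names)"
BCIDENTIFY [PRINT=*; TREE=tree; IDENTIFICATION=ident;\
  TERMINALNODES=nodes; PROBABILITIES=probs] x1,x2,x3,x4,x5,x6,x7

"text of identifications, one value per specimen"
PRINT ident
"specimen-by-group matrix of probabilities"
PRINT probs
"terminal node(s) reached by the first specimen: a scalar or a variate"
PRINT nodes[1]
```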
Options: PRINT, TREE, IDENTIFICATION, TERMINALNODES, PROBABILITIES, MVINCLUDE.
Parameters: X, VALUES.
Method
BCIDENTIFY uses BIDENTIFY to find the terminal nodes of the tree that correspond to the values of the explanatory variables.
Action with RESTRICT
Restrictions are ignored.
See also
Procedures: BCLASSIFICATION, BCDISPLAY, BCKEEP.
Commands for: Multivariate and cluster analysis.
Example
CAPTION 'BCIDENTIFY example',!t(\
  'Calculator digit recognition problem as in Breiman et al.',\
  '(1984, p.44). The assumption is that the digits of a calculator',\
  'are made up of 7 lines (as shown below), which may be missing for',\
  'any particular digit with probability 0.1:'); STYLE=meta,plain
SCALAR Chan
ENQUIRE Chan; FILETYPE=output; OUTSTYLE=Style
OUTPUT [STYLE=plain]
PRINT !t(' -1- ','| |','2 3','| |',' -4- ',\
  '| |','5 6','| |',' -7- '); FIELD=20
OUTPUT [STYLE=#Style]
VARIATE xdefn[1...7]
READ [PRINT=error] xdefn[1...7]
0 0 1 0 0 1 0
1 0 1 1 1 0 1
1 0 1 1 0 1 1
0 1 1 1 0 1 0
1 1 0 1 0 1 1
1 1 0 1 1 1 1
1 0 1 0 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 1
1 1 1 0 1 1 1 :
"generate a set of random observations"
SCALAR nsamples,seed; VALUE=50,876083
VARIATE [NVALUES=nsamples] light[1...7],truelight[1...7],error[1...7],rdigit
CALC rdigit = MOD( INTEGER( URAND(seed; nsamples) * 10); 10) + 1
& truelight[] = ELEMENTS(xdefn[]; rdigit)
GRANDOM [DISTRIBUTION=binomial; PROBABILITY=0.1; NVALUES=nsamples] error[1...7]
CALC light[] = MOD(truelight[] + error[]; 2)
FACTOR [LEVELS=!(0...9)] digit; VALUES=MOD(rdigit; 10); DECIMALS=0
FACTOR [LEVELS=!(0,1)] x1,x2,x3,x4,x5,x6,x7; VALUES=light[]; DECIMALS=0
"form the classification tree"
BCLASSIFICATION [PRINT=*; GROUPS=digit; TREE=tree]\
  x1,x2,x3,x4,x5,x6,x7
"prune the tree"
BPRUNE [PRINT=table] tree; NEWTREE=pruned
"use the 5th tree - renumber nodes"
BCUT [RENUMBER=yes] pruned[5]; NEWTREE=tree
"display the tree"
BCDISPLAY [PRINT=labelled] tree
PRINT 'Check identification of the true representations of the digits.'
FACTOR [LEVELS=!(0,1); NVALUES=10] x1,x2,x3,x4,x5,x6,x7; VALUES=xdefn[]
BCIDENTIFY [PRINT=*; TREE=tree; IDENTIFICATION=identification]\
  x1,x2,x3,x4,x5,x6,x7
TEXT [VALUES='Digit 1:','Digit 2:','Digit 3:','Digit 4:','Digit 5:',\
  'Digit 6:','Digit 7:','Digit 8:','Digit 9:','Digit 0:'] name
PRINT name,identification; FIELD=15