Forms values for nodes of a classification tree (R.W. Payne).

### Options

`GROUPS` = factor | Groupings of the observations in the data set
---|---
`TREE` = tree | Tree for which predictions and accuracy values are to be formed
`REPLACE` = string token | Whether to replace the values stored in the tree (`yes`, `no`); default `no`
`PREDICTION` = pointer | New predictions for the nodes of the tree
`ACCURACY` = pointer | New accuracy values for the nodes of the tree
`REPLICATION` = pointer | New replication tables for the nodes of the tree

### Parameter

`X` = factors or variates | Values of the factors or variates used in the tree for the new data set
---|---

### Description

When pruning a classification tree, it is best to use “accuracy” figures derived from a data set (or sets) other than the one used to construct the tree. `BCVALUES` calculates these figures, together with new predictions for the nodes of the tree.

The `TREE` option specifies the tree for which the values are to be formed. The `GROUPS` option specifies a factor defining the groupings of the observations in the new data set, and the `X` parameter supplies the values, for those observations, of the factors or variates that were used to construct the tree. You can set option `REPLACE=yes` to use the new values to replace those already stored in the tree. Alternatively, you can use the `PREDICTION` option to save the predictions in a pointer. This has an element for each node of the tree (with the same suffix as that node), pointing to a scalar that stores the prediction for the node. Similarly, the `ACCURACY` option saves the accuracies in a pointer to a set of scalars, and the `REPLICATION` option saves the replications of the groups at each node in a pointer to a set of tables classified by the `GROUPS` factor. You can use these later to replace the prediction, accuracy and replication values in the original tree by

`CALCULATE Tree[]['accuracy'] = ACCURACY[]`

`& Tree[]['prediction'] = PREDICTION[]`

`& Tree[]['replication'] = REPLICATION[]`

Alternatively, you may want to combine them first with other estimates, for example to form bootstrapped estimates.
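For the simpler case, where the new values should overwrite the stored ones directly, a minimal sketch of a `REPLACE=yes` call might look as follows. The identifiers `tree`, `digit` and `x1`...`x7` are assumptions borrowed from the example at the end of this page; substitute the tree, grouping factor and x-variables of your own analysis.

```
"Sketch only: tree, digit and x1...x7 are assumed to exist already,
 with digit and x1...x7 holding the new (independent) data set."
BCVALUES [GROUPS=digit; TREE=tree; REPLACE=yes] x1,x2,x3,x4,x5,x6,x7
```

With this form no pointers need to be saved, but the values originally stored in the tree are lost, so keep a copy of the tree if the original figures may be needed later.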

Options: `GROUPS`, `TREE`, `REPLACE`, `PREDICTION`, `ACCURACY`, `REPLICATION`.

Parameter: `X`.

### Method

`BCVALUES` uses the standard Genstat tree functions to obtain the necessary information about the tree.

### Action with `RESTRICT`

`BCVALUES` takes account of any restrictions on the `X` vectors or on `GROUPS`.

### See also

Procedures: `BCLASSIFICATION`, `BPRUNE`.

Commands for: Multivariate and cluster analysis.

### Example

```
CAPTION 'BCVALUES example',\
        !t('Calculator digit recognition problem as in Breiman et al.',\
        '(1984, p.44); for more details see the BCLASSIFICATION example.');\
        STYLE=meta,plain
VARIATE xdefn[1...7]
READ    [PRINT=error] xdefn[1...7]
0 0 1 0 0 1 0
1 0 1 1 1 0 1
1 0 1 1 0 1 1
0 1 1 1 0 1 0
1 1 0 1 0 1 1
1 1 0 1 1 1 1
1 0 1 0 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 1
1 1 1 0 1 1 1 :
"generate a set of random observations"
SCALAR  nsamples,seed; VALUE=50,876083
VARIATE [NVALUES=nsamples] light[1...7],truelight[1...7],error[1...7],rdigit
CALC    rdigit = MOD( INTEGER( URAND(seed; nsamples) * 10); 10) + 1
&       truelight[] = ELEMENTS(xdefn[]; rdigit)
GRANDOM [DISTRIBUTION=binomial; PROBABILITY=0.1; NVALUES=nsamples]error[1...7]
CALC    light[] = MOD(truelight[] + error[]; 2)
FACTOR  [LEVELS=!(0...9)] digit; VALUES=MOD(rdigit; 10); DECIMALS=0
FACTOR  [LEVELS=!(0,1)] x1,x2,x3,x4,x5,x6,x7; VALUES=light[]; DECIMALS=0
"check the data"
CAPTION 'Check the data: mean values of each light for each digit.'
FOR i=0...9
  RESTRICT light[]; digit==i
  PRINT    'Digit',i; FIELD=6; DECIMALS=0; JUST=left
  &        [ORIENT=across; SQUASH=yes] MEAN(light[]); FIELD=10
ENDFOR
"number of each digit in the data set"
RESTRICT light[]
TABULATE [CLASS=digit; PRINT=count]
CAPTION  'Mean error rate for each light.'
PRINT    [ORIENT=across] MEAN(error[]); DECIMALS=4
"form the classification tree"
BCLASSIFICATION [PRINT=labelled; GROUPS=digit; TREE=tree]\
        x1,x2,x3,x4,x5,x6,x7
CAPTION 'Prediction and accuracy values stored with the tree.'
FOR pred=tree[]['prediction']; acc=tree[]['accuracy']
  PRINT [SQUASH=yes] pred,acc; FIELD=25
ENDFOR
"generate another set of random observations"
SCALAR  nsamples,seed; VALUE=500,728342
VARIATE [NVALUES=nsamples] light[1...7],truelight[1...7],error[1...7],rdigit
CALC    rdigit = MOD( INTEGER( URAND(seed; nsamples) * 10); 10) + 1
&       truelight[] = ELEMENTS(xdefn[]; rdigit)
GRANDOM [DISTRIBUTION=binomial; PROBABILITY=0.1; NVALUES=nsamples]error[1...7]
CALC    light[] = MOD(truelight[] + error[]; 2)
FACTOR  [LEVELS=!(0...9)] digit; VALUES=MOD(rdigit; 10); DECIMALS=0
FACTOR  [LEVELS=!(0,1)] x1,x2,x3,x4,x5,x6,x7; VALUES=light[]; DECIMALS=0
"form new prediction and accuracy values"
POINTER [NVALUES=!t(y,x); VALUES=digit,!p(x1,x2,x3,x4,x5,x6,x7)] data
BCVALUES [GROUPS=digit; TREE=tree; PREDICTION=prediction; ACCURACY=accuracy]\
        x1,x2,x3,x4,x5,x6,x7
CAPTION 'New prediction and accuracy values (from another data set).'
FOR pred=prediction[]; acc=accuracy[]
  PRINT [SQUASH=yes] pred,acc; FIELD=15
ENDFOR
"prune the tree"
BPRUNE  [PRINT=table] tree; ACCURACY=accuracy; NEWTREE=pruned
"use the 5th tree - renumber nodes"
BCUT    [RENUMBER=yes] pruned[5]; NEWTREE=tree
"display the tree"
BCDISPLAY [PRINT=summary,labelled] tree
```