Obtains a starting classification for non-hierarchical clustering (S.A. Harding).

### No options

### Parameters

| `DATA` = pointers | Each pointer contains a set of variates giving the properties of the units to be grouped |
|---|---|
| `NGROUPS` = scalars | Indicates the number of groups required |
| `GROUPS` = factors | Stores the classifications formed |

### Description

In non-hierarchical classification an initial classification is required, and it is advantageous to have these classes as homogeneous as possible. This reduces the risk of converging to a local optimum, and also encourages faster convergence of the iterative transfer algorithm used by the `CLUSTER` directive.

The attributes of the units to be formed into groups are specified in a set of variates; these should be placed into a pointer for use as the setting for the `DATA` parameter. The number of groups required is specified by the `NGROUPS` parameter; this must not be greater than the number of units. The group allocations that are formed are stored in the factor indicated by the `GROUPS` parameter. This factor need not be declared in advance but will be formed by the procedure.

Options: none.

Parameters: `DATA`, `NGROUPS`, `GROUPS`.

### Method

When the number of groups is greater than the number of data variates plus one, `CLASSIFY` forms the groups according to the positions of the units in the first dimension of a principal coordinates analysis (`PCO`) of the `DATA` variates.

Otherwise it tries to find a suitable classification into the *k* groups by finding the *k* units that are furthest apart in *p*-dimensional space (where *p* is the number of variates). These are then used as nuclei for the classes, with each of the remaining units being allocated to the class with the nearest nucleus.

The units defining the nuclei are found by first finding the two units that are furthest apart. The third is the unit furthest from the line joining the first two. The fourth is the unit furthest from the plane containing the first three, and so on, until the *k*th unit is chosen as the one furthest from the (*k*-2)-dimensional space spanned by the (*k*-1) units already found.
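The nucleus search and allocation described above can be sketched in Python. This is an illustrative sketch only, not the procedure's Genstat source: it assumes Euclidean distance, breaks ties by taking the first unit found, and requires 2 ≤ *k* ≤ the number of units. The names `choose_nuclei` and `allocate` are hypothetical.

```python
import math


def _residual(point, origin, basis):
    """Part of (point - origin) orthogonal to every vector in an orthonormal basis."""
    r = [x - o for x, o in zip(point, origin)]
    for b in basis:
        coeff = sum(x * y for x, y in zip(r, b))
        r = [x - coeff * y for x, y in zip(r, b)]
    return r


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def choose_nuclei(units, k):
    """Return the indices of k nuclei (assumes 2 <= k <= len(units))."""
    n = len(units)
    # Step 1: the two units that are furthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: math.dist(units[ij[0]], units[ij[1]]),
    )
    nuclei = [first]
    origin = units[first]
    basis = []  # orthonormal basis of the flat through the nuclei chosen so far
    candidate = second
    while True:
        nuclei.append(candidate)
        # Extend the flat (line, plane, ...) with the direction of the new nucleus.
        r = _residual(units[candidate], origin, basis)
        length = _norm(r)
        if length > 1e-12:
            basis.append([x / length for x in r])
        if len(nuclei) == k:
            return nuclei
        # Next nucleus: the remaining unit furthest from the current flat.
        remaining = [i for i in range(n) if i not in nuclei]
        candidate = max(
            remaining, key=lambda i: _norm(_residual(units[i], origin, basis))
        )


def allocate(units, nuclei):
    """Assign each unit to the group (numbered from 1) of its nearest nucleus."""
    return [
        min(range(len(nuclei)), key=lambda g: math.dist(u, units[nuclei[g]])) + 1
        for u in units
    ]


# Made-up data: six units, two variates, three groups (k <= p + 1).
units = [(0, 0), (10, 0), (5, 4), (1, 0), (9, 1), (5, 3)]
nuclei = choose_nuclei(units, 3)
print(nuclei)                   # [0, 1, 2]
print(allocate(units, nuclei))  # [1, 2, 3, 1, 2, 3]
```

In this made-up data set the first two units are the furthest apart, the third is the unit furthest from the line joining them, and each remaining unit joins the group of its nearest nucleus.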

### Action with `RESTRICT`

The variates must not be restricted.

### See also

Directive: `CLUSTER`.

Commands for: Multivariate and cluster analysis.

### Example

    CAPTION  'CLASSIFY example',\
             !t('Ten units are described by four variables;',\
             'form an initial classification into four groups.');\
             STYLE=meta,plain
    VARIATE  [NVALUES=10] Variable[1...4]
    FACTOR   [LEVELS=4; NVALUES=10] InitCl
    READ     Variable[1...4]
    1.0 4.0 2.0 1.0
    4.0 6.0 1.0 3.0
    7.0 8.0 5.0 4.0
    3.0 4.0 2.0 1.0
    6.0 4.0 5.0 2.0
    9.0 4.0 1.0 2.0
    4.0 2.0 1.0 3.0
    4.0 5.0 2.0 7.0
    1.0 2.0 4.0 1.0
    5.0 1.0 2.0 0.0 :
    CLASSIFY Variable; NGROUPS=4; GROUPS=InitCl
    PRINT    Variable[1...4],InitCl