Declares a self-organizing map (R.W. Payne).

### No options

### Parameters

| `IDENTIFIER` = identifiers | Identifiers of the SOMs |
|---|---|
| `VARIABLENAMES` = texts | Names of variables corresponding to the weights of each SOM |
| `ROWS` = scalars or variates | Number of rows or row coordinates for the map |
| `COLUMNS` = scalars or variates | Number of columns or column coordinates for the map |
| `DMETHOD` = string tokens | Method for calculating the distances of data points from the nodes (`euclidean`, `cityblock`); default `eucl` |
| `WMETHOD` = string tokens | Method for calculating the contribution of a data point to each node when revising the weights (`gaussian`, `neighbour`); default `gaus` |

### Description

A self-organizing map is a two-dimensional grid of nodes, used to classify vectors of observations on p variables. Each node is characterized by a vector of p weights (one for each variable). `SOM` defines the Genstat data structures used to represent self-organizing maps. These are compound data structures similar, for example, to the LRV structure used to store latent roots and vectors (see the `LRV` directive).

Compound data structures are like Genstat pointers in that they point to a set of other structures. However, the set has a fixed size, and its elements must be of the correct types and must form a consistent set (in terms of their sizes and so on). You can refer to the elements of an SOM in exactly the same way as the elements of a pointer, but the suffixes and their labels are fixed. Unlike pointers, the labels are not case sensitive, so Genstat will recognize a label in upper-case letters, lower-case letters, or any mixture of the two.

The elements of an SOM are as follows:

| `[1]` or `['variablenames']` | text containing the names of the variables; |
|---|---|
| `[2]` or `['rows']` | factor giving the row position of each node; |
| `[3]` or `['columns']` | factor giving the column position of each node; |
| `[4]` or `['dmethod']` | text containing either `'EUCLIDEAN'` or `'CITYBLOCK'` indicating the method used to measure distance on the map; |
| `[5]` or `['wmethod']` | text containing either `'GAUSSIAN'` or `'NEIGHBOUR'` indicating the method used to adjust the weights at each iteration during their estimation; |
| `[6]` or `['weights']` | matrix of weights (variables × nodes); |
| `[7]` or `['summaries']` | pointer to store variates of summaries of variables at the nodes of the map; |
| `[8]` or `['smethods']` | text indicating the method used to summarize the variable in each variate of summaries; |
| `[9]` or `['svariablenames']` | text indicating the variable that was summarized in each variate of summaries. |
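
As a minimal sketch of how the elements listed above might be accessed, the following declares an SOM and prints its second element three ways: by suffix, by label, and by the same label in upper case. The identifier `Som` and the variable names are illustrative, taken from the example for this procedure; the default grid is assumed.

```genstat
" Sketch only: Som is an illustrative identifier. "
SOM   Som; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)
" Element 2 (row position of each node), referenced by suffix "
PRINT Som[2]
" The same element referenced by label; SOM labels are not case sensitive "
PRINT Som['rows']
PRINT Som['ROWS']
```

All three `PRINT` statements should refer to the same factor, since the suffixes and labels of an SOM are fixed.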

The `SOM` procedure defines the SOM, and forms its first five elements. The weights (element 6) can be estimated and stored in the SOM by the `SOMESTIMATE` procedure, and the summary information (elements 7-9) can then be formed and added by the `SOMDESCRIBE` procedure. Once this has been done, the `SOMPREDICT` procedure can be used to generate predicted values of the summary variables for new or hypothetical observations.

The identifier for the SOM is specified by the `IDENTIFIER` parameter. The names of variables corresponding to the weights are provided in a text specified by the `VARIABLENAMES` parameter. The row and column positions of the nodes are specified by the `ROWS` and `COLUMNS` parameters. These can be set to scalars, specifying the numbers of rows and columns in a rectangular grid. The row and column coordinates are then positive integers starting at one. Alternatively, you can define your own row and column coordinates (which then need not be in a rectangular grid), by setting `ROWS` and `COLUMNS` to variates. By default, `ROWS` is 5 and `COLUMNS` is 6. The distance and weighting methods are specified by the `DMETHOD` and `WMETHOD` parameters, respectively.
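
The two ways of setting `ROWS` and `COLUMNS` might look as follows. This is a sketch under stated assumptions: the identifiers `SomGrid` and `SomFree`, the grid sizes and the coordinate values are all illustrative, and the variates are assumed to supply one row and one column coordinate per node.

```genstat
" Sketch only: SomGrid, SomFree and all settings are illustrative. "
" Scalars: a rectangular grid of 4 rows by 8 columns, non-default methods "
SOM   SomGrid; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W);\
      ROWS=4; COLUMNS=8; DMETHOD=cityblock; WMETHOD=neighbour
" Variates: explicit coordinates, assumed to be one pair per node "
SOM   SomFree; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W);\
      ROWS=!(1,1,2,3,3); COLUMNS=!(1,3,2,1,3)
" Elements 4 and 5 hold the distance and weighting methods "
PRINT SomGrid[4,5]
```

In the second declaration the five nodes do not fill a rectangle, which is the case the variate form is intended to allow.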

Options: none.

Parameters: `IDENTIFIER`, `VARIABLENAMES`, `ROWS`, `COLUMNS`, `DMETHOD`, `WMETHOD`.

### Method

For further information, see Hastie, Tibshirani & Friedman (2001) Section 14.4.

### Reference

Hastie, T., Tibshirani, R., & Friedman, J. (2001). *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*. Springer-Verlag, New York.

### See also

Procedures: `SOMADJUST`, `SOMDESCRIBE`, `SOMESTIMATE`, `SOMIDENTIFY`, `SOMPREDICT`.

Commands for: Data mining.

### Example

    CAPTION 'SOM example'; STYLE=meta
    SOM     Som; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)
    PRINT   Som[1...5]