Does correspondence analysis, or reciprocal averaging (P.G.N. Digby & A.I. Glaser).

### Options

| `PRINT` = string tokens | Printed output from the analysis (`roots`, `rowscores`, `rowinertias`, `rowchisquare`, `rowmass`, `rowquality`, `colscores`, `colinertias`, `colchisquare`, `colmass`, `colquality`); default `*`, i.e. no output |
|---|---|
| `METHOD` = string token | Type of analysis required (`correspondence`, `digbycorrespondence`, `biplot`, `reciprocal`); default `corr` |
| `NROOTS` = scalar | Number of latent roots for printed output; default `*` requests them all to be printed |
| `%METHOD` = string token | How to represent proportions or percentages in quality statistics (`permills`, `percentages`, `proportions`); default `prop` |
| `NDIMENSIONS` = scalar | Number of dimensions for which quality statistics are required; default 2 |
| `ROWSUBSET` = scalars | Indexes of subset rows |
| `COLSUBSET` = scalars | Indexes of subset columns |
| `ROWPASSIVE` = scalars | Indexes of passive rows |
| `COLPASSIVE` = scalars | Indexes of passive columns |

### Parameters

| `DATA` = matrices or data matrices | Data to be analysed |
|---|---|
| `ROOTS` = diagonal matrices | Saves the squared singular values from each analysis |
| `ROWSCORES` = matrices | Saves the scores for the rows of the data matrix |
| `COLSCORES` = matrices | Saves the scores for the columns of the data matrix |
| `ROWINERTIAS` = matrices | Saves the inertias for the rows of the data matrix |
| `COLINERTIAS` = matrices | Saves the inertias for the columns of the data matrix |
| `ROWQUALITY` = matrices | Saves the quality statistics for rows of the data |
| `COLQUALITY` = matrices | Saves the quality statistics for columns of the data |
| `SAVE` = pointers | Saves details of the analysis for use by `CABIPLOT` |

### Description

Correspondence analysis is an ordination technique used to analyse two-way categorical data tables. Ordination techniques approximate relationships between variables in a reduced number of dimensions.

The type of analysis is specified by the `METHOD` option, with one of the following settings:

| `correspondence` | correspondence analysis (Greenacre 1984), |
|---|---|
| `digbycorrespondence` | an alternative implementation of correspondence analysis described by Digby & Kempton (1987), |
| `reciprocal` | reciprocal averaging (see Digby & Kempton 1987), or |
| `biplot` | a similar biplot-style analysis (again see Digby & Kempton 1987). |

The default setting is `correspondence`, and this must be retained if either the `ROWSUBSET` or the `COLSUBSET` option is set.

The data for the procedure are specified by the `DATA` parameter as either a matrix or a data matrix (i.e. a pointer to variates, all of the same length). The matrix must not contain any missing values; it is unchanged on exit from the procedure.

Printed output is controlled by the `PRINT` option, with settings:

| `roots` | to print the roots (together with the roots expressed as percentages and cumulative percentages), |
|---|---|
| `rowscores` | to print the scores for the rows of the data matrix, |
| `rowinertias` | to print the inertias for the rows of the data matrix, |
| `rowmass` | to print the row masses, |
| `rowchisquare` | to print the row chi-square distances, |
| `rowquality` | to print the quality statistics for the rows, |
| `colscores` | to print the scores for the columns of the data matrix, |
| `colinertias` | to print the inertias for the columns of the data matrix, |
| `colmass` | to print the column masses, |
| `colchisquare` | to print the column chi-square distances, and |
| `colquality` | to print the quality statistics for the columns. |

The `NROOTS` option controls the printed output of roots, scores and inertias. By default, results are printed for all the roots, but you can set `NROOTS` to a smaller number.

The quality settings produce tables with the following columns:

● the mass of the row (or column), in proportion to the total mass;

● the “quality” of the representation, i.e. how much of the inertia of a row (or column) is represented by the dimensions shown;

● the inertia of the row (or column) as a proportion of the total inertia for all rows (or columns);

● principal coordinates of the rows (or columns) in the specified dimension;

● the inertia of the row (or column) in the specified dimension, relative to the total inertia of that row (or column); hence the values for a row (or column), summed across the dimensions shown, equal its quality statistic;

● the proportion of inertia explained by a row (or column) in a dimension, compared to the total inertia in that dimension.
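The mass, chi-square distance and inertia-proportion columns follow from the standard definitions in Greenacre (1984). The Python sketch below is an illustration, not GenStat's own code: it uses the smoking data from the example at the end of this page, and takes the inertia of a row to be its mass times the squared chi-square distance of its profile from the average profile. The helper name `row_summaries` is hypothetical.

```python
# Hypothetical helper (not part of GenStat): row masses, chi-square
# distances, inertias and inertia proportions for a two-way table.
from math import sqrt

# Smoking data, Greenacre (2007) Table 9.1: rows SM, JM, SE, JE, Sy;
# columns None, Light, Medium, Heavy.
SMOKING = [
    [4, 2, 3, 2],
    [4, 3, 7, 4],
    [25, 10, 12, 4],
    [18, 24, 33, 13],
    [10, 6, 7, 2],
]


def row_summaries(table):
    n = sum(sum(row) for row in table)
    p = [[x / n for x in row] for row in table]   # correspondence matrix, sums to one
    r = [sum(row) for row in p]                    # row masses
    c = [sum(col) for col in zip(*p)]              # column masses = average row profile
    dist = []
    for i, row in enumerate(p):
        profile = [x / r[i] for x in row]          # profile of row i
        d2 = sum((profile[j] - c[j]) ** 2 / c[j] for j in range(len(c)))
        dist.append(sqrt(d2))                      # chi-square distance to the average
    inertia = [ri * d ** 2 for ri, d in zip(r, dist)]
    total = sum(inertia)
    share = [x / total for x in inertia]           # proportion of total inertia
    return r, dist, inertia, share


masses, dists, inertias, shares = row_summaries(SMOKING)
for name, m, s in zip(["SM", "JM", "SE", "JE", "Sy"], masses, shares):
    print(f"{name}: mass={m:.3f} inertia share={s:.3f}")
```

The inertias sum to the Pearson chi-square statistic divided by the grand total, so a row with a distinctive profile but a small mass can still contribute little.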

The representation of the columns of proportions is controlled by the `%METHOD` option; these can be printed as proportions (default), percentages or permills (i.e. tenths of a percent). The `NDIMENSIONS` option specifies the number of dimensions for which to print quality statistics; default 2.
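The `%METHOD` setting only changes the scale on which the proportion columns are displayed. A minimal Python sketch of that rescaling follows; the helper name `express` is hypothetical, and the prefix matching simply imitates the way GenStat accepts abbreviated settings such as `prop`.

```python
# Hypothetical helper: rescale proportions as %METHOD would display them.
FACTORS = {"proportions": 1, "percentages": 100, "permills": 1000}


def express(values, method="proportions"):
    # Accept unambiguous abbreviations such as "prop" or "perc".
    matches = [name for name in FACTORS if name.startswith(method)]
    if len(matches) != 1:
        raise ValueError(f"ambiguous or unknown setting: {method!r}")
    factor = FACTORS[matches[0]]
    return [v * factor for v in values]


print(express([0.25, 0.5], "perc"))   # [25.0, 50.0]
print(express([0.125], "perm"))       # [125.0]
```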

When carrying out correspondence analysis, there may be rows and/or columns (for example, outliers with low mass) that you would like to ignore during the calculation of the roots or inertia, so that they have no influence. Instead of removing these rows and/or columns from the data before running `CORANALYSIS`, you can list the indexes of the rows or columns to be ignored using the `ROWPASSIVE` and/or `COLPASSIVE` options. These passive rows and columns are still included in the table of quality statistics, where their relative contributions are shown and compared to the total for all the passive rows or columns.
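Because passive rows are separated from the data before it is scaled, they cannot change the masses or the average profile on which the active analysis is built. The Python sketch below illustrates only that separation step; the helper names are hypothetical, indexes are 1-based as in `ROWPASSIVE`, and it does not attempt to reproduce how GenStat then positions the passive rows.

```python
# Hypothetical helpers: separate passive rows (1-based indexes) before scaling.
def split_passive(table, passive):
    passive = set(passive)
    active = [row for i, row in enumerate(table, start=1) if i not in passive]
    held_out = [row for i, row in enumerate(table, start=1) if i in passive]
    return active, held_out


def column_masses(rows):
    n = sum(sum(row) for row in rows)
    return [sum(col) / n for col in zip(*rows)]


SMOKING = [[4, 2, 3, 2], [4, 3, 7, 4], [25, 10, 12, 4], [18, 24, 33, 13], [10, 6, 7, 2]]
active, passive_rows = split_passive(SMOKING, [1])   # treat SM as passive
print(passive_rows)            # [[4, 2, 3, 2]]
print(column_masses(active))   # average profile of the four active rows only
```

Changing the values in a passive row leaves the active column masses untouched, which is the point of declaring it passive.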

You may want to apply a correspondence analysis calculated from the whole data set to only a subset of the rows and/or columns, for example when some of them form groups with common traits. This can be done by setting the `ROWSUBSET` and/or `COLSUBSET` options to the indexes of the rows and/or columns in the subset of interest. If either of these options is set, the `METHOD` option must be set to `correspondence`. If `ROWPASSIVE` and `ROWSUBSET` (or `COLPASSIVE` and `COLSUBSET`) are both set, any indexes that occur in both are removed from `ROWSUBSET` (or `COLSUBSET`).
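The rule for overlapping indexes amounts to a set difference. A one-function Python sketch (the name `effective_subset` is hypothetical; keeping the subset's original order is a choice made here for readability):

```python
def effective_subset(subset, passive):
    """Drop from a ROWSUBSET/COLSUBSET list any index that is also passive."""
    passive = set(passive)
    return [i for i in subset if i not in passive]


print(effective_subset([1, 2, 3, 4], [2, 5]))   # [1, 3, 4]
```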

Results from the analysis can be saved using the parameters `ROOTS`, `ROWSCORES`, `COLSCORES`, `ROWINERTIAS`, `COLINERTIAS`, `ROWQUALITY` and `COLQUALITY`. The structures specified for these parameters need not be declared in advance. The `SAVE` parameter can save full details of the analysis for use by the `CABIPLOT` procedure.

Options: `PRINT`, `METHOD`, `NROOTS`, `%METHOD`, `NDIMENSIONS`, `ROWSUBSET`, `COLSUBSET`, `ROWPASSIVE`, `COLPASSIVE`.

Parameters: `DATA`, `ROOTS`, `ROWSCORES`, `COLSCORES`, `ROWINERTIAS`, `COLINERTIAS`, `ROWQUALITY`, `COLQUALITY`, `SAVE`.

### Method

Full details of correspondence analysis (i.e. `METHOD=correspondence`) are given by Greenacre (1984, 2007). The other methods are described by Digby & Kempton (1987).

The data matrix *X* is scaled to have sum one for the `METHOD` settings `correspondence` and `digbycorrespondence`. The matrices *U*, *S* and *V* are taken from the singular-value decomposition $Y = U S V'$ of

$$Y = R^{-1/2}\,(X - \mathbf{r}\,\mathbf{c}')\,C^{-1/2}, \qquad\text{i.e.}\qquad y_{ij} = \frac{x_{ij} - r_i\,c_j}{\sqrt{r_i\,c_j}},$$

for `METHOD=correspondence`, and

$$Y = R^{-1/2}\, X\, C^{-1/2}$$

for the other methods, where *R* and *C* are diagonal matrices of the row and column totals of the data matrix *X*, with diagonal elements $r_i$ and $c_j$ collected in the vectors $\mathbf{r}$ and $\mathbf{c}$. The scores for the rows and columns from `METHOD=correspondence` are

$$A = R^{-1/2}\, U \qquad\text{and}\qquad B = C^{-1/2}\, V.$$
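The decomposition and scores for `METHOD=correspondence` can be reproduced with NumPy for checking results by hand. This is a sketch, not GenStat's implementation; it assumes NumPy is available and uses the smoking data from the example. Note that `numpy.linalg.svd` returns *V* transposed, and that the last singular value is zero up to rounding because the residuals are centred.

```python
import numpy as np

# Smoking data, Greenacre (2007) Table 9.1 (rows SM, JM, SE, JE, Sy).
SMOKING = np.array([[4, 2, 3, 2], [4, 3, 7, 4], [25, 10, 12, 4],
                    [18, 24, 33, 13], [10, 6, 7, 2]], dtype=float)


def correspondence_scores(table):
    P = np.array(table, dtype=float)
    P = P / P.sum()                        # scale X to sum one
    r = P.sum(axis=1)                      # diagonal of R (row totals)
    c = P.sum(axis=0)                      # diagonal of C (column totals)
    rc = np.outer(r, c)
    Y = (P - rc) / np.sqrt(rc)             # standardized residuals
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U / np.sqrt(r)[:, None]            # A = R^(-1/2) U
    B = Vt.T / np.sqrt(c)[:, None]         # B = C^(-1/2) V
    return A, B, s


A, B, s = correspondence_scores(SMOKING)
roots = s ** 2                             # roots = squared singular values
print(np.round(roots, 5))
print(np.round(roots / roots.sum() * 100, 1))   # percentages of total inertia
```

The roots sum to the total inertia, so the percentages printed here correspond to those produced by `PRINT=roots`.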

The scores from `METHOD=digbycorrespondence` are similar, but are multiplied by *S*. This makes the row scores obtained here the same as the principal coordinates given with the quality statistics.

With the other two methods, *X* is not scaled to total one, and the scores are given by

$$A = R^{-1/2}\, U\, S^{m} \qquad\text{and}\qquad B = C^{-1/2}\, V\, S^{m},$$

where the parameter *m* is 0 for `METHOD=reciprocal` and 0.5 for `METHOD=biplot`. The inertia values for the rows and columns are given by

$$(R\,A\,A')\,S' \qquad\text{and}\qquad (C\,B\,B')\,S',$$

where $S' = S$ for `METHOD=correspondence`, and $S' = 1$ for the other methods; see Greenacre (1984) for further information.
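For `METHOD=reciprocal` and `biplot`, the scores come from the unscaled table with the singular values raised to the power *m*. The NumPy sketch below (hypothetical function name, same data as the example) illustrates only the score formulas, not the inertia calculation; it drops the trivial first dimension, whose singular value is one, as the procedure does.

```python
import numpy as np

SMOKING = np.array([[4, 2, 3, 2], [4, 3, 7, 4], [25, 10, 12, 4],
                    [18, 24, 33, 13], [10, 6, 7, 2]], dtype=float)


def digby_scores(table, m):
    """m = 0 for reciprocal averaging, m = 0.5 for the biplot-style analysis."""
    X = np.array(table, dtype=float)        # not scaled to total one
    r = X.sum(axis=1)                       # row totals
    c = X.sum(axis=0)                       # column totals
    Y = X / np.sqrt(np.outer(r, c))         # Y = R^(-1/2) X C^(-1/2)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Sm = s ** m                             # S^m, applied column by column
    A = (U / np.sqrt(r)[:, None]) * Sm      # A = R^(-1/2) U S^m
    B = (Vt.T / np.sqrt(c)[:, None]) * Sm   # B = C^(-1/2) V S^m
    # The first singular value is 1 (trivial solution); drop that dimension.
    return A[:, 1:], B[:, 1:], s[1:], s[0]


A0, B0, s0, first = digby_scores(SMOKING, 0.0)
A5, B5, s5, _ = digby_scores(SMOKING, 0.5)
print(first)                   # the trivial singular value
print(np.round(s0 ** 2, 5))    # roots of the non-trivial dimensions
```

With *m* = 0 the total-weighted sum of squares of each column of *A* is one; with *m* = 0.5 it equals the corresponding singular value.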

The roots are the squares of the singular values. Note that the first singular value is always one for methods other than `correspondence`; this corresponds to a trivial solution, given in the first column of *A* and *B* above, which is automatically removed from the results printed and saved by `CORANALYSIS`.
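The trivial solution can be seen directly. With *Y* = *R*^{-½} *X* *C*^{-½}, the vector u of square roots of the row totals and the vector v of square roots of the column totals satisfy *Y* v = u and *Y*′ u = v; both vectors have squared length equal to the grand total, so after normalisation they form a singular pair with value one, and *R*^{-½} u is a constant column of scores. A standard-library Python check of the two identities on the example data (function name hypothetical):

```python
from math import isclose, sqrt

SMOKING = [[4, 2, 3, 2], [4, 3, 7, 4], [25, 10, 12, 4], [18, 24, 33, 13], [10, 6, 7, 2]]


def trivial_pair(table):
    r = [sum(row) for row in table]                 # row totals
    c = [sum(col) for col in zip(*table)]           # column totals
    # Y = R^(-1/2) X C^(-1/2), element by element
    Y = [[x / sqrt(ri * cj) for x, cj in zip(row, c)] for row, ri in zip(table, r)]
    u = [sqrt(ri) for ri in r]
    v = [sqrt(cj) for cj in c]
    Yv = [sum(y * vj for y, vj in zip(row, v)) for row in Y]                   # Y v
    Ytu = [sum(Y[i][j] * u[i] for i in range(len(u))) for j in range(len(v))]   # Y' u
    return u, v, Yv, Ytu


u, v, Yv, Ytu = trivial_pair(SMOKING)
print(all(isclose(a, b) for a, b in zip(Yv, u)))    # True: Y v = u
print(all(isclose(a, b) for a, b in zip(Ytu, v)))   # True: Y' u = v
```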

Passive rows and/or columns are separated from the original data matrix before it is scaled, whereas subset rows and/or columns are separated from *Y* after this scaling.

For the quality statistics, the weighted sum-of-squares of the principal coordinates on the *i*th dimension equals the *i*th squared singular value. The row and column scores for `METHOD=digbycorrespondence` are equivalent to the principal coordinates. In contrast, the row and column scores for `METHOD=correspondence` or `reciprocal` are equivalent to standard coordinates, for which the weighted sum-of-squares on each dimension equals one.
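These normalisations can be checked numerically. The NumPy sketch below is an illustration, not GenStat output (the function name is hypothetical): it computes standard coordinates *A* and principal coordinates *F* = *A* *S* for the example data and compares their mass-weighted sums of squares with one and with the roots.

```python
import numpy as np


def coordinates(table):
    P = np.array(table, dtype=float)
    P = P / P.sum()
    r = P.sum(axis=1)                       # row masses
    c = P.sum(axis=0)                       # column masses
    rc = np.outer(r, c)
    U, s, Vt = np.linalg.svd((P - rc) / np.sqrt(rc), full_matrices=False)
    k = int((s > 1e-12).sum())              # drop the null dimension
    A = U[:, :k] / np.sqrt(r)[:, None]      # standard coordinates
    F = A * s[:k]                           # principal coordinates
    return r, A, F, s[:k]


SMOKING = [[4, 2, 3, 2], [4, 3, 7, 4], [25, 10, 12, 4], [18, 24, 33, 13], [10, 6, 7, 2]]
r, A, F, s = coordinates(SMOKING)
print(np.round((r[:, None] * A ** 2).sum(axis=0), 6))   # weighted SS of A
print(np.round((r[:, None] * F ** 2).sum(axis=0), 6))   # weighted SS of F
print(np.round(s ** 2, 6))                               # roots
```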

### References

Digby, P.G.N. & Kempton, R.A. (1987). *Multivariate Analysis of Ecological Communities*. Chapman & Hall, London.

Greenacre, M.J. (1984). *Theory and Applications of Correspondence Analysis*. Academic Press, London.

Greenacre, M. (2007). *Correspondence Analysis in Practice, second edition*. Chapman & Hall, London.

### See also

Procedures: `CABIPLOT`, `MCORANALYSIS`.

Commands for: Multivariate and cluster analysis.

### Example

```
CAPTION     'CORANALYSIS example',\
            'Data from Table 9.1 of Greenacre (2007)'; STYLE=meta,plain
TEXT        Staff,St; VALUES=!T(Sen_Mngr,Jun_Mngr,Sen_Empl,Jun_Empl,Secretry),\
            !T(SM, JM, SE, JE, Sy)
&           Smoke; VALUES=!T(None,Light,Medium,Heavy)
MATRIX      [ROWS=Staff; COLUMNS=Smoke] Smoking; VALUES=\
            !( 4, 2, 3, 2, 4, 3, 7, 4, 25, 10, 12, 4, 18, 24, 33, 13, 10, 6, 7, 2)
PRINT       Smoking; FIELDWIDTH=8; DECIMALS=0
CAPTION     'Use CORANALYSIS, printing all results, saving SCORES only.'
CORANALYSIS [PRINT=roots,rowscores,colscores,rowinertia,colinertia;\
            METHOD=correspondence] Smoking; SAVE=cora1
"Print rowmass"
PRINT       cora1['rowmass']
"Plot the scores in the 1st and 2nd dimensions. Rows are in principal coordinates
 and columns are in standard coordinates. Figure 9.2 of Greenacre (2007)."
CABIPLOT    [COLSCALING=standard] LROW=St
```