Performs factor analysis.

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | Printed output required (`communalities`, `loadings`, `coefficients`, `scores`, `residuals`, `cresiduals`, `vresiduals`, `tests`); default `*` i.e. no printing |
| `NDIMENSIONS` = scalar | Number of factors to fit; no default, must be specified |
| `METHOD` = string token | Whether to use correlations or variances and covariances (`correlation`, `vcovariance`, `variancecovariance`); default `vcov` |
| `MAXCYCLE` = scalar | Maximum number of iterations; default 50 |
| `TOLERANCE` = scalar | Minimum value to assume for the unique component $\psi_i^2$ of each observed variable; default $10^{-6}$ |

### Parameters

| Parameter | Description |
|---|---|
| `DATA` = pointers or matrices or symmetric matrices or SSPMs | Pointer of variates forming the data matrix, or matrix storing the variate values by columns, or symmetric matrix storing their variances and covariances, or SSPM giving their sums of squares and products |
| `NUNITS` = scalars | When `DATA` is set to a symmetric matrix of variances and covariances, `NUNITS` must specify the number of units from which they were calculated if tests are required |
| `LRV` = LRVs | Saves the loadings, latent roots and trace from each analysis |
| `SSPM` = SSPMs | Saves the SSPM formed from a `DATA` matrix or pointer |
| `COMMUNALITIES` = variates | Saves the communalities |
| `COEFFICIENTS` = matrices | Saves the factor score coefficients |
| `SCORES` = matrices or pointers | Saves the factor analysis scores |
| `RESIDUALS` = matrices or pointers | Saves residuals from the dimensions fitted in the analysis |
| `CRESIDUALS` = symmetric matrices | Saves the residual correlation or covariance matrix |
| `VRESIDUALS` = variates | Saves the residual variances |

### Description

Factor analysis aims to find a set of “latent” (or unobservable) variables $\{z_1, \ldots, z_q\}$ that account for the variances and covariances $S$ between a set of $p$ observed variables $\{x_1, \ldots, x_p\}$. In the terminology of factor analysis, the latent variables $\{z_i\}$ are known as *factors*. However, they are continuous variables, and thus are represented in Genstat by variate rather than by factor data structures. So, to avoid confusion, when we refer to the latent variables below, *factor* will be printed in italic font.

The data for a factor analysis consist of observed measurements on the variables $\{x_i\}$ made on a set of subjects. The assumption is that, for each subject, the values of the observed variables are related to the *factors* by a linear model

$$x = \mu + \Gamma z + \varepsilon$$

where $x$ is the vector of observed variables,

$z$ is the vector of *factors*,

$\mu$ is a vector of means for the observed variables,

$\Gamma$ is a matrix of *loadings* defining the relationship between observed and latent variables, and

$\varepsilon$ is a vector of residuals.

The elements of the residual vector $\varepsilon$ are assumed to have mean zero and to be uncorrelated, i.e. the dispersion matrix of $\varepsilon$ is assumed to be diagonal

$$\operatorname{cov}(\varepsilon) = \Psi = \operatorname{diag}(\psi_1^2, \ldots, \psi_p^2)$$

(They thus differ from the residuals formed in a principal components analysis, which will be correlated; see e.g. Krzanowski 1988, Section 16.2, for more details.) The *factors* themselves are assumed to have variance one and to be uncorrelated, i.e.

$$\operatorname{cov}(z) = I.$$

So the correlations between the observed variables $\{x_i\}$ arise only through their relations with the *factors*, and not because of any correlation between the residuals or between the *factors*.
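As a short worked step, the assumptions above can be combined to give the covariance structure implied by the model. This is a sketch that additionally assumes the *factors* and the residuals are uncorrelated with each other (the usual assumption, although it is not stated explicitly here), writes $\gamma_{ij}$ for the elements of $\Gamma$ and $q$ for the number of *factors* (notation introduced only for this illustration):

$$\operatorname{cov}(x) = \Gamma \, \operatorname{cov}(z) \, \Gamma^{T} + \operatorname{cov}(\varepsilon) = \Gamma \Gamma^{T} + \Psi$$

$$\operatorname{var}(x_i) = \sum_{j=1}^{q} \gamma_{ij}^2 + \psi_i^2, \qquad \operatorname{cov}(x_i, x_k) = \sum_{j=1}^{q} \gamma_{ij} \gamma_{kj} \quad (i \neq k)$$

Under these assumptions the loadings account for all of the covariance between different observed variables, while each variance splits into a part explained by the *factors* and the unique component $\psi_i^2$.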

The `DATA` parameter specifies the data for the factor analysis. You can supply either a pointer containing a set of variates, one for each observed variable $\{x_i\}$, or a matrix storing the observed variables by columns, or a symmetric matrix containing variances and covariances between the variables, or an SSPM structure (formed using `FSSPM` from the variates of observed measurements). When `DATA` specifies a symmetric matrix of variances and covariances, you must also set the `NUNITS` parameter to specify the number of units from which they were calculated if you want `FCA` to print tests.

The `METHOD` option has settings `vcovariance` (with synonym `variancecovariance`) and `correlation`, to control whether `FCA` forms a matrix of variances and covariances or a matrix of correlations for the analysis. The same *factors* will be obtained if you use a correlation matrix, but the loadings will be scaled so that their absolute values lie between zero and one.

The number of *factors*, $q$, to fit must be specified by the `NDIMENSIONS` option. Arising from the numbers of parameters in the model (see Krzanowski 1988, Section 16.2.2), this is subject to the constraint

$$(p - q)^2 \ge p + q.$$
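As a rough illustration of this constraint, the following sketch tabulates both sides for the six observed variables used in the Example section, so that $p = 6$. The identifiers `p`, `q`, `Lhs`, `Rhs` and `Allowed` are illustrative only; they are not part of `FCA`.

```genstat
" Sketch: which numbers of factors q satisfy (p - q)**2 >= p + q when p = 6 "
" The identifiers p, q, Lhs, Rhs and Allowed are illustrative only "
SCALAR p; VALUE=6
VARIATE [VALUES=1...5] q
CALCULATE Lhs = (p - q)**2
CALCULATE Rhs = p + q
" Allowed is 1 where the constraint holds and 0 where it fails "
CALCULATE Allowed = Lhs >= Rhs
PRINT q,Lhs,Rhs,Allowed; DECIMALS=0
```

For $p = 6$ the constraint holds for $q$ = 1, 2 and 3 (for example $3^2 = 9 \ge 9$ when $q = 3$) and fails for $q$ = 4 and 5, so the two *factors* fitted in the Example are within the limit.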

The `PRINT` option controls printed output, with settings:

| Setting | Output |
|---|---|
| `communalities` | the proportion of variation explained by the *factors* for each observed variable, $(\operatorname{var}(x_i) - \psi_i^2) / \operatorname{var}(x_i)$; |
| `loadings` | the matrix of factor loadings $\Gamma$; |
| `coefficients` | the factor score coefficients; |
| `scores` | the factor scores calculated from the model for each subject; |
| `residuals` | the vectors of residuals $\varepsilon$; |
| `cresiduals` | the residual correlation or covariance matrix, i.e. a symmetric matrix showing the amount of unexplained correlation or covariance between each pair of variables; |
| `vresiduals` | the residual variances; and |
| `tests` | a chi-square goodness of fit test for the model. |

By default nothing is printed. Note, however, that scores and residuals cannot be produced when `DATA` is set to a symmetric matrix of variances and covariances.

The communalities, factor coefficients, scores, residuals, residual correlations or covariances and residual variances can also be saved using the `COMMUNALITIES`, `COEFFICIENTS`, `SCORES`, `RESIDUALS`, `CRESIDUALS` and `VRESIDUALS` parameters, respectively. The `LRV` parameter allows an LRV structure to be saved, with the loadings in the `['vectors']` component, and the eigenvalues of the matrix $\Psi^{-1/2} S \Psi^{-1/2}$ in the `['roots']` component; the loadings are scaled eigenvectors of $\Psi^{-1/2} S \Psi^{-1/2}$. (Remember, $S$ is the matrix of variances and covariances of the observed variables $\{x_i\}$.) The `SSPM` parameter can save the SSPM structure constructed from a `DATA` pointer or matrix for the analysis. A particularly convenient instance is when you have supplied an SSPM structure as input but, for example, have set `METHOD=correlation`: the SSPM that is saved will then contain correlations instead of sums of squares and products.

Options: `PRINT`, `NDIMENSIONS`, `METHOD`, `MAXCYCLE`, `TOLERANCE`.

Parameters: `DATA`, `NUNITS`, `LRV`, `SSPM`, `COMMUNALITIES`, `COEFFICIENTS`, `SCORES`, `RESIDUALS`, `CRESIDUALS`, `VRESIDUALS`.
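The saving parameters might be used along the following lines. This is a minimal sketch, not a definitive recipe: it reuses the correlation matrix from the Example section, and the identifiers `Flrv`, `Comm`, `Coef`, `Cres` and `Vres` are hypothetical names chosen for illustration. It assumes that `FCA` declares the saved structures automatically.

```genstat
" Sketch only: Flrv, Comm, Coef, Cres and Vres are hypothetical identifiers "
TEXT [VALUES=Gaelic,English,History,Arithmetic,Algebra,Geometry] Subjects
SYMMETRICMATRIX [ROWS=Subjects; VALUES=\
  1.000,\
  0.439, 1.000,\
  0.410, 0.351, 1.000,\
  0.288, 0.354, 0.164, 1.000,\
  0.329, 0.320, 0.190, 0.595, 1.000,\
  0.248, 0.329, 0.181, 0.470, 0.464, 1.000] Correlation
" Fit two factors without printing, saving the results instead "
FCA [NDIMENSIONS=2] DATA=Correlation; NUNITS=220; LRV=Flrv;\
  COMMUNALITIES=Comm; COEFFICIENTS=Coef; CRESIDUALS=Cres; VRESIDUALS=Vres
" Loadings and latent roots are held as components of the LRV "
PRINT Flrv['vectors']
PRINT Flrv['roots']
" Communalities and residual variances have one value per observed variable "
PRINT Comm,Vres
```

`SCORES` and `RESIDUALS` are left unset in this sketch because the input is a symmetric matrix, for which scores and residuals cannot be produced.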

### Method

`FCA` estimates the parameters of the model by maximum likelihood, assuming multivariate Normality, using subroutines `G03CAF` and `G03CCF` from the NAG Library. The `MAXCYCLE` option sets a limit on the number of iterations (default 50). The `TOLERANCE` option specifies the minimum value to assume for the unique component $\psi_i^2$ of each observed variable, so that the communality is always less than one; the default is $10^{-6}$.
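A short worked step, offered as a sketch, shows why a lower bound on the unique component keeps the communality below one. It uses the definition of the communality given for the `communalities` setting of `PRINT`, writes $\tau$ for the `TOLERANCE` value (a symbol introduced here for illustration, not Genstat notation), and assumes $\operatorname{var}(x_i) > 0$:

$$\frac{\operatorname{var}(x_i) - \psi_i^2}{\operatorname{var}(x_i)} = 1 - \frac{\psi_i^2}{\operatorname{var}(x_i)} \le 1 - \frac{\tau}{\operatorname{var}(x_i)} < 1 \quad \text{whenever } \psi_i^2 \ge \tau > 0.$$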

### Action with `RESTRICT`

If any of the variates in a `DATA` pointer is restricted, only the defined subset of the units will be used in the analysis.

### References

Krzanowski, W.J. (1988). *Principles of Multivariate Analysis: a User’s Perspective*. Oxford University Press, Oxford.

### See also

Directives: `CVA`, `MDS`, `PCO`, `PCP`, `ROTATE`, `SSPM`.

Procedures: `LRVSCREE`, `DMST`, `PLS`, `RIDGE`.

Commands for: Multivariate and cluster analysis.

### Example

    " Example 2:6.11 "
    TEXT [VALUES=Gaelic,English,History,Arithmetic,Algebra,Geometry] Subjects
    SYMMETRICMATRIX [ROWS=Subjects; VALUES=\
      1.000,\
      0.439, 1.000,\
      0.410, 0.351, 1.000,\
      0.288, 0.354, 0.164, 1.000,\
      0.329, 0.320, 0.190, 0.595, 1.000,\
      0.248, 0.329, 0.181, 0.470, 0.464, 1.000] Correlation
    FCA [PRINT=communalities,loadings,cresiduals,tests; NDIMENSION=2]\
      Correlation; NUNITS=220