Forms correlations between variates, autocorrelations of variates, and lagged cross-correlations between variates.

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | What to print (`correlations`, `autocorrelations`, `partialcorrelations`, `crosscorrelations`); default `*` |
| `GRAPH` = string tokens | What to display with graphs (`autocorrelations`, `partialcorrelations`, `crosscorrelations`); default `*` |
| `MAXLAG` = scalar | Maximum lag for results; default `*`, i.e. value inferred from variates to save results |
| `CORRELATIONS` = symmetric matrix | Stores the correlations between the variates specified by the `SERIES` parameter |

### Parameters

| Parameter | Description |
|---|---|
| `SERIES` = variates | Variates from which to form correlations |
| `LAGGEDSERIES` = variates | Series to be lagged to form cross-correlations with first series |
| `AUTOCORRELATIONS` = variates | To save autocorrelations, or to provide them to form partial autocorrelations if `SERIES=*` |
| `PARTIALCORRELATIONS` = variates | To save partial autocorrelations |
| `CROSSCORRELATIONS` = variates | To save cross-correlations |
| `TEST` = scalars | To save test statistics |
| `VARIANCES` = variates | To save prediction error variances |
| `COEFFICIENTS` = variates or matrices | To save prediction coefficients: in a variate to keep only those for the maximum lag, or in a matrix to keep the coefficients for all lags up to the maximum |

### Description

The most straightforward use of the `CORRELATE` directive is to calculate correlation coefficients between a set of variates. For example, the following statement displays the correlations between the variates `Age`, `Height` and `Weight` as a lower-triangular matrix.

`CORRELATE [PRINT=correlations; CORRELATIONS=Corr] Age,Height,Weight`

The correlations are also saved in the symmetric matrix `Corr` using the `CORRELATIONS` option. Note that, if there are missing values, `CORRELATE` uses only those units where none of the variates is missing.
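The handling of missing values described above can be illustrated with a short Python sketch. It is not Genstat code: the function name `correlation_matrix`, the use of `None` for a missing value, and the data are all assumptions made for illustration. It drops every unit in which any variate is missing and then forms ordinary product-moment correlations from the remaining units.

```python
import math

def correlation_matrix(columns):
    """Correlations between equal-length columns, using only the units
    (rows) in which no column is missing (None marks a missing value)."""
    rows = [row for row in zip(*columns) if None not in row]
    n = len(rows)
    complete = list(zip(*rows))  # back to one sequence per variate
    means = [sum(col) / n for col in complete]
    dev = [[v - m for v in col] for col, m in zip(complete, means)]
    ss = [sum(d * d for d in col) for col in dev]
    p = len(columns)
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / math.sqrt(ss[i] * ss[j])
             for j in range(p)] for i in range(p)]

# Hypothetical data: the third unit is excluded because Height is missing.
age = [20, 25, 30, 35, 40]
height = [160.0, 165.0, None, 172.0, 180.0]
weight = [55.0, 61.0, 70.0, 68.0, 80.0]
corr = correlation_matrix([age, height, weight])
```

The result is the same as if the third unit had been deleted from all three variates before the calculation.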

`CORRELATE` can also be used to obtain autocorrelations of a time series, that is the correlations between values in the series lagged by particular time intervals. The set of autocorrelations for all possible lags is the *autocorrelation function*. You can derive the *partial autocorrelation function* from these. To look at the relationship between two series, you should use the *cross-correlation function* between one series and the other lagged by the various intervals. The sample autocorrelation function of a series can be displayed either as a table of numbers, or as a graph, called a *correlogram*. In either case, you must specify the maximum lag for which the autocorrelation is to be calculated, *m* say. You can do this either by setting the `MAXLAG` option to *m*, or by pre-defining the length of a variate to be *m*+1 and including it in the `AUTOCORRELATIONS` parameter to store the calculated values. Genstat includes the autocorrelation at lag 0 in the autocorrelation function; this is always unity. The formula used for the sample autocorrelation at lag *k* is

$$r_k = \left(1 - \frac{k}{n}\right) \times \frac{C_k}{C_0}$$

where

$$C_k = \frac{1}{n_k} \sum_{t=1}^{n-k} \bigl(y_t - \operatorname{mean}(y)\bigr)\bigl(y_{t+k} - \operatorname{mean}(y)\bigr)$$

The number $n_k$ is the number of terms included in the sum. The series can contain missing values, but the calculation excludes any product that involves any missing values at all. You can restrict a series, but the restricted set must consist of a contiguous set of units. Thus, you can look at the autocorrelation function derived from just the first section of a series, or from just the last section, or from a section in the middle; but you cannot use restriction to exclude a section from the middle of the series, or to exclude just individual observations.
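As an illustration of the autocorrelation formula, here is a minimal Python sketch. It is not Genstat's own code. The function name `sample_acf` and the use of `None` for a missing value are hypothetical, and taking *n* as the length of the series is an assumption, since the text does not say how *n* is counted when values are missing.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations r_0..r_max_lag, following
    r_k = (1 - k/n) * C_k / C_0, where C_k is the mean of the n_k lagged
    products that involve no missing value (None marks a missing value)."""
    present = [v for v in y if v is not None]
    mean = sum(present) / len(present)
    n = len(y)  # assumption: n is the number of units in the series

    def c(k):
        prods = [(y[t] - mean) * (y[t + k] - mean)
                 for t in range(n - k)
                 if y[t] is not None and y[t + k] is not None]
        return sum(prods) / len(prods)  # len(prods) plays the role of n_k

    c0 = c(0)
    return [(1 - k / n) * c(k) / c0 for k in range(max_lag + 1)]

# Hypothetical series; the result is approximately [1.0, 0.4, -0.1].
acf = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
```

With no missing values, $n_k = n - k$, so the factor $(1 - k/n)$ turns the mean of the lagged products into their sum divided by *n*.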

The `AUTOCORRELATIONS` parameter allows you to save the calculated autocorrelations. If you want to display a correlogram in a different form from the standard one produced by the `GRAPH` option, you must save the autocorrelations and plot them explicitly using either the `GRAPH` or `DGRAPH` directives. You will then need to define the variate of lags from 0 to *m*.

The `TEST` parameter of `CORRELATE` allows you to save a statistic that can be used to test the hypothesis that the true autocorrelation is zero for positive lags. It is defined as

$$S = n \sum_{k=1}^{m} r_k^{2}$$

Provided *n* (the number of data values) is large and *m* (the maximum lag) is much smaller than *n*, then under the null hypothesis, the statistic has a chi-square distribution with *m* degrees of freedom. Thus, a large value provides evidence of autocorrelation in a time series.
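The statistic is simple to compute from a saved autocorrelation function. The Python sketch below is illustrative only: the function name and the numbers are hypothetical, and the list is assumed to hold the lag-0 value first, as Genstat stores it.

```python
def autocorrelation_test_statistic(acf, n):
    """S = n * (sum of r_k squared for k = 1..m); acf[0] is the lag-0 value."""
    return n * sum(r * r for r in acf[1:])

# Hypothetical values: r_0, r_1, r_2 from a series of n = 100 observations.
acf = [1.0, 0.3, 0.2]
n = 100
s = autocorrelation_test_statistic(acf, n)  # 100 * (0.09 + 0.04), about 13

# With m = 2, S is compared with chi-square on 2 degrees of freedom,
# whose upper 5% point is about 5.99.
exceeds_5_percent_point = s > 5.99
```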

You can calculate autocorrelation functions for several series in one statement by specifying several variates with the `SERIES` parameter.

Genstat forms partial autocorrelations from an autocorrelation function. The value at lag *k* is defined as

$$\operatorname{corr}(y_t,\; y_{t-k} \mid y_{t-1},\; y_{t-2},\; \dots,\; y_{t-k+1})$$

representing the excess correlation between values separated by *k* timepoints that is not accounted for by the intermediate points; it is denoted by $\phi_{k,k}$ because it is also the value of the last in the set of coefficients in the autoregressive prediction equation:

$$y_t = c + \phi_{k,1}\, y_{t-1} + \dots + \phi_{k,k}\, y_{t-k} + e_{k,t}$$

Genstat calculates these coefficients recursively for *k*=1…*m* by

$$\phi_{k,k} = \frac{r_k - \phi_{k-1,1}\, r_{k-1} - \dots - \phi_{k-1,k-1}\, r_1}{v_{k-1}}$$

$$\phi_{k,j} = \phi_{k-1,j} - \phi_{k,k}\, \phi_{k-1,k-j}, \qquad j = 1 \dots k-1$$

$$v_k = v_{k-1}\left(1 - \phi_{k,k}^{2}\right)$$

It starts with $v_0 = 1$, the quantity $v_k$ being the *k*th order prediction error variance ratio

$$\operatorname{variance}(e_{k,t}) \,/\, \operatorname{variance}(y_t).$$

Partial correlations provide a valuable alternative way of displaying the autocorrelation structure of a series. You can display the partial autocorrelation function either as a table of numbers, or as a graph. Two methods are available for doing this. You can supply the series using the `SERIES` parameter, in which case the autocorrelations are formed first, automatically, and the partial autocorrelations are then derived from them. Alternatively, you can set `SERIES=*`, and provide the autocorrelations using the `AUTOCORRELATIONS` parameter. You can specify the maximum lag, either by setting the `MAXLAG` option, or by pre-defining the length of a variate specified for either the `AUTOCORRELATIONS` or the `PARTIALCORRELATIONS` parameter.
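The recursion given earlier for the coefficients and the prediction error variances can be sketched in Python. The sketch takes a set of autocorrelations as input, which corresponds to the `SERIES=*` route. It illustrates the formulas only and is not Genstat's implementation: the function name and return layout are hypothetical, and it does no checking of the input autocorrelations.

```python
def partial_autocorrelations(r):
    """Given autocorrelations r[0..m] with r[0] == 1, return three lists
    indexed by lag k: the partial autocorrelations phi_{k,k}, the variance
    ratios v_k, and the coefficient rows [1, phi_{k,1}, ..., phi_{k,k}]."""
    m = len(r) - 1
    pacf = [1.0]    # lag-0 value set to 1 by convention
    v = [1.0]       # v_0 = 1
    rows = [[1.0]]
    prev = []       # phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, m + 1):
        numerator = r[k] - sum(prev[j - 1] * r[k - j] for j in range(1, k))
        phi_kk = numerator / v[k - 1]
        # phi_{k,j} = phi_{k-1,j} - phi_{k,k} * phi_{k-1,k-j}, j = 1..k-1
        cur = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
        cur.append(phi_kk)
        v.append(v[k - 1] * (1 - phi_kk ** 2))
        pacf.append(phi_kk)
        rows.append([1.0] + cur)
        prev = cur
    return pacf, v, rows

# Hypothetical input: autocorrelations 0.5**k, as for a first-order
# autoregression, so the partial autocorrelations vanish beyond lag 1.
pacf, variances, coefficients = partial_autocorrelations([1.0, 0.5, 0.25, 0.125])
```

Here `coefficients[-1]` holds the coefficients for the maximum lag, while the full list of rows holds those for every order up to the maximum.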

You can save the partial autocorrelation function using the `PARTIALCORRELATIONS` parameter. You can set the `VARIANCES` and `COEFFICIENTS` parameters to variates to save the *prediction-error variances* $v_0 \dots v_m$, and the *prediction coefficients* $1, \phi_{m,1} \dots \phi_{m,m}$ for the maximum lag *m*. Genstat sets the first coefficient to 1, and also the first element of the partial autocorrelation sequence to 1: you should find this to be a useful convention for the lag 0 values. Alternatively, if the `COEFFICIENTS` parameter is set to a matrix structure, the rows of this matrix will be used to save the prediction coefficients for *all* the orders up to the maximum lag.

`CORRELATE` will print a warning if you include missing values in an autocorrelation function that you have supplied, or if for some other reason the autocorrelations are invalid. In particular, if a partial autocorrelation value is obtained outside the range (-1, 1), Genstat will truncate the sequence at the previous lag.

You can calculate cross-correlations between two series by specifying one series with the `SERIES` parameter and the other with the `LAGGEDSERIES` parameter. You must define the maximum lag, as for autocorrelations, and you can again plot or tabulate the resulting function. Missing values are allowed, as for autocorrelations. Genstat calculates the sample cross-correlation between the first series $x_t$ and the lagged series $y_t$ at lag *k* using:

$$r_k = \left(1 - \frac{k}{n}\right) \frac{C_k}{s_x\, s_y}$$

where

$$C_k = \frac{1}{n_k} \sum_{t=1}^{n-k} \bigl(x_t - \operatorname{mean}(x)\bigr)\bigl(y_{t+k} - \operatorname{mean}(y)\bigr)$$

The series $x_t$ and $y_t$ may be of different lengths. The summation includes all possible terms, but excludes any product containing missing values; the number $n_k$ is the number of terms included in the sum. The values mean(*x*) and mean(*y*) are the sample means, and $s_x$, $s_y$ are the sample standard deviations. The number *n* is the minimum of the number of values of *x* and of *y*, excluding missing values. You can restrict either series to a set of contiguous units: if both are restricted, their restrictions must match.
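The cross-correlation formula can be sketched in Python in the same spirit. This is not Genstat code. The function name `sample_ccf` and the use of `None` for a missing value are hypothetical, and the divisor used for the standard deviations (the number of non-missing values) is an assumption, chosen so that a series correlated with itself gives 1 at lag 0.

```python
import math

def sample_ccf(x, y, max_lag):
    """Cross-correlations r_0..r_max_lag between x and y lagged by k,
    following r_k = (1 - k/n) * C_k / (s_x * s_y)."""
    xs = [v for v in x if v is not None]
    ys = [v for v in y if v is not None]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    # Assumption: standard deviations use the number of non-missing values
    # as divisor.
    s_x = math.sqrt(sum((v - mean_x) ** 2 for v in xs) / len(xs))
    s_y = math.sqrt(sum((v - mean_y) ** 2 for v in ys) / len(ys))
    n = min(len(xs), len(ys))  # as defined in the text
    result = []
    for k in range(max_lag + 1):
        # All available products x_t * y_{t+k} that involve no missing value;
        # the series may differ in length.
        prods = [(x[t] - mean_x) * (y[t + k] - mean_y)
                 for t in range(min(len(x), len(y) - k))
                 if x[t] is not None and y[t + k] is not None]
        c_k = sum(prods) / len(prods)  # len(prods) plays the role of n_k
        result.append((1 - k / n) * c_k / (s_x * s_y))
    return result

# Hypothetical check: a series against itself reproduces its autocorrelations,
# approximately [1.0, 0.4, -0.1].
series = [1.0, 2.0, 3.0, 4.0, 5.0]
ccf = sample_ccf(series, series, max_lag=2)
```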

You can save the cross-correlation function using the `CROSSCORRELATIONS` parameter. You can also save a test statistic using the `TEST` parameter; this is used in the same way as the statistic *S* for autocorrelations, to test for lack of lagged cross-correlation in one direction of the relationship between two series. However, the test is valid only if each of the series has a zero autocorrelation function. Cross-correlations take precedence in the storage. Thus, if you request both autocorrelations and cross-correlations in a single `CORRELATE` statement, the stored test statistic will relate to the cross-correlations: that for the autocorrelations will not be stored.

Options: `PRINT`, `GRAPH`, `MAXLAG`, `CORRELATIONS`.

Parameters: `SERIES`, `LAGGEDSERIES`, `AUTOCORRELATIONS`, `PARTIALCORRELATIONS`, `CROSSCORRELATIONS`, `TEST`, `VARIANCES`, `COEFFICIENTS`.

### Action with `RESTRICT`

You can restrict the units involved in the calculation of the correlations by restricting either the `SERIES` variate, or the `LAGGEDSERIES` variate (if present). For the calculation of autocorrelations, partial correlations or cross-correlations, the restriction must define a contiguous set of units. If `SERIES` and `LAGGEDSERIES` are both restricted, they must be restricted in exactly the same way.

### See also

Procedures: `DCORRELATION`, `FCORRELATION`, `PRCORRELATION`, `PARTIALCORRELATIONS`, `FVCOVARIANCE`.

Functions: `CORRELATION`, `COVARIANCE`, `VARIANCE`.

Commands for: Basic and nonparametric statistics, Calculations and manipulation, Time series.

### Example

    " Example CORR-1: Calculate the acf and pacf of a series"
    FILEREAD [NAME='%gendir%/examples/CORR-1.DAT'] Y
    CALCULATE n = NVALUES(Y)
    VARIATE [VALUES=1...n] X1
    " Form the differenced and doubly differenced series"
    CALCULATE Dy,Ddy = DIFF(Y,Dy; 1)
    " Display the acf and pacf of the series"
    VARIATE [VAL=0...50] Lag
    UNIT Lag
    CORRELATE [MAX=50; GRAPH=a,p] Y
    " Print and save the acf and pacf of the diff and 2-diff series"
    CORRELATE [MAX=50; PRINT=a,p] SERIES=Dy,Ddy; AUTO=Acfdy,Acfddy;\
      PARTIAL=Pacfdy,Pacfddy