Saves and/or prints summary statistics for variates (R.C. Butler & D.A. Murray).

### Options

| `PRINT` = string token | Controls whether or not the summaries are printed (`summaries`); default `summ` |
|---|---|
| `SELECTION` = string tokens | Selects the statistics to be produced (`nval`, `nobs`, `nmv`, `mean`, `median`, `min`, `max`, `range`, `q1`, `q3`, `sd`, `sem`, `var`, `sevar`, `%cv`, `sum`, `ss`, `uss`, `skew`, `seskew`, `kurtosis`, `sekurtosis`, `all`); default `mean`, `min`, `max`, `nobs`, `nmv`, `medi`, `q1`, `q3` |
| `GROUPS` = factor | Allows groups to be defined, so that summaries are produced for each group in turn |

### Parameters

| `DATA` = variates | Data to summarize |
|---|---|
| `SUMMARIES` = variates or pointers | To save summaries for each `DATA` variate, in a variate if `GROUPS` is unset, or in a pointer to a set of variates (one for each group) if groups have been specified; will be redefined if necessary |

### Description

`DESCRIBE` calculates up to 22 different summary statistics for values stored in a variate. The statistics may be saved, or printed, or both. The statistics to be calculated are indicated by the `SELECTION` option; the available settings are:

| `nval` | number of values |
|---|---|
| `nobs` | number of non-missing values |
| `nmv` | number of missing values |
| `mean` | arithmetic mean |
| `median` | median |
| `min` | minimum |
| `max` | maximum |
| `range` | range (max-min) |
| `q1` | lower quartile |
| `q3` | upper quartile |
| `sd` | standard deviation |
| `sem` | standard error of mean |
| `var` | variance |
| `sevar` | standard error of variance |
| `%cv` | coefficient of variation |
| `sum` | total of values |
| `ss` | corrected sum of squares |
| `uss` | uncorrected sum of squares |
| `skew` | skewness (see Method) |
| `seskew` | standard error of skewness |
| `kurtosis` | kurtosis (see Method) |
| `sekurtosis` | standard error of kurtosis |
| `all` | all 22 summaries |

By default the mean, minimum, maximum, number of non-missing values, number of missing values, median and both quartiles are calculated.
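To make the distinctions between some of these settings concrete (`nval` against `nobs` and `nmv`, `ss` against `uss`, `sd` against `sem`), here is a minimal Python sketch. It is not the procedure's own code. The function name `basic_summaries`, the use of `None` for a missing value, the divisor *N*-1 for the variance, the definition of `%cv` as 100 × sd / mean, and the upper-case dictionary keys are all assumptions made for illustration. Quartiles and the median are left out because this page does not state which definition is used.

```python
import math

def basic_summaries(values):
    """Illustrative summaries for a list in which None marks a missing value."""
    present = [x for x in values if x is not None]
    nval = len(values)                 # all values, missing or not
    nobs = len(present)                # non-missing values only
    total = sum(present)
    mean = total / nobs
    uss = sum(x * x for x in present)             # uncorrected sum of squares
    ss = sum((x - mean) ** 2 for x in present)    # corrected (about the mean)
    var = ss / (nobs - 1)              # assumption: divisor N-1
    sd = math.sqrt(var)
    return {
        "NVAL": nval, "NOBS": nobs, "NMV": nval - nobs,
        "MEAN": mean, "MIN": min(present), "MAX": max(present),
        "RANGE": max(present) - min(present),
        "SD": sd, "SEM": sd / math.sqrt(nobs), "VAR": var,
        "%CV": 100 * sd / mean,        # assumption: expressed as a percentage
        "SUM": total, "SS": ss, "USS": uss,
    }

s = basic_summaries([2.0, 4.0, None, 6.0, 8.0])
print(s["NVAL"], s["NOBS"], s["NMV"], s["SS"], s["USS"])
```

The two sums of squares are linked by `uss = ss + nobs * mean**2`, which gives a quick consistency check on any saved set of summaries.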

Printing is controlled by the `PRINT` option. The statistics are printed by default, so to suppress printing you need to put `PRINT=*`.

The `GROUPS` option allows groups of observations to be defined. Summaries are then given for each group.

The `SUMMARIES` parameter allows the statistics to be saved in a variate, or a pointer to a set of variates if there are groups. These need not be declared in advance. The units of the variate(s) are labelled by the corresponding strings from the settings (in capital letters) of the `SELECTION` option, to simplify the subsequent access of any individual statistic. For example, the minimum value can be copied from a `SUMMARIES` variate `v` into a scalar `m` by

`CALCULATE m = v$['MIN']`
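As a loose analogy outside Genstat, the combination of `GROUPS` and labelled `SUMMARIES` behaves like a two-level lookup: one set of statistics per group, each statistic addressed by its upper-case label. The Python sketch below only illustrates that idea. The function `describe_by_group`, the data, and the group labels `a` and `b` are hypothetical, and only three statistics are implemented.

```python
def describe_by_group(data, groups, selection=("MIN", "MAX", "MEAN")):
    """Return {group label: {statistic label: value}} for the chosen statistics."""
    stats = {
        "MIN": min,
        "MAX": max,
        "MEAN": lambda xs: sum(xs) / len(xs),
    }
    # Collect the observations belonging to each group.
    by_group = {}
    for x, g in zip(data, groups):
        by_group.setdefault(g, []).append(x)
    # One labelled set of summaries per group.
    return {g: {name: stats[name](xs) for name in selection}
            for g, xs in by_group.items()}

data = [3.0, 7.0, 5.0, 10.0, 2.0, 6.0]       # hypothetical values
groups = ["a", "a", "a", "b", "b", "b"]      # hypothetical grouping factor
summaries = describe_by_group(data, groups)

# Counterpart of picking out one statistic by its label.
m = summaries["a"]["MIN"]
print(m)
```

Looking values up by label rather than by position means the lookup keeps working when the set of selected statistics changes.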

Options: `PRINT`, `SELECTION`, `GROUPS`.

Parameters: `DATA`, `SUMMARIES`.

### Method

The statistics are calculated in a variate which is then restricted to print only those that were required, and to obtain the unit numbers of those to be copied into the `SUMMARIES` variate.

SE Variance is calculated as

$$\sqrt{\frac{1}{N}\left[\frac{N\,(M_4 - 4M_1M_3 + 6M_1^2M_2 - 3M_1^4)}{N-1} - \left(\frac{N\,(M_2 - M_1^2)}{N-1}\right)^{2}\right]}$$

Skewness is calculated as

$$\frac{M_3 - 3M_1M_2 + 2M_1^3}{(M_2 - M_1^2)^{3/2}}$$

SE Skewness is calculated as

$$\sqrt{\frac{6N(N-1)}{(N-2)(N+1)(N+3)}}$$

Kurtosis is calculated as

$$\frac{M_4 - 4M_1M_3 + 6M_1^2M_2 - 3M_1^4}{(M_2 - M_1^2)^{2}} - 3$$

SE Kurtosis is calculated as

$$\sqrt{\frac{24N(N-1)^{2}}{(N-2)(N-3)(N+5)(N+3)}}$$

where $M_i = \sum x^{i} / N$ and $N$ = `NOBSERVATIONS(DATA)`.
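The formulas above can be checked numerically outside Genstat. The following Python sketch transcribes them term by term, using the raw moments *M*ᵢ = ∑*x*ⁱ/*N*. It is not the procedure's source: the function name `moment_statistics` and the dictionary keys are assumptions, and the input is taken to contain no missing values. The standard errors need *N* > 3 to avoid a zero divisor.

```python
import math

def moment_statistics(x):
    """Skewness, kurtosis and standard errors from raw moments M_i = sum(x**i)/N."""
    n = len(x)
    m1, m2, m3, m4 = (sum(v ** i for v in x) / n for i in (1, 2, 3, 4))

    # Central moments written in terms of the raw moments, as in the formulas.
    c2 = m2 - m1 * m1
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4

    return {
        "SKEW": c3 / c2 ** 1.5,
        "KURTOSIS": c4 / c2 ** 2 - 3,
        "SESKEW": math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))),
        "SEKURTOSIS": math.sqrt(
            24 * n * (n - 1) ** 2 / ((n - 2) * (n - 3) * (n + 5) * (n + 3))
        ),
        "SEVAR": math.sqrt(
            (n * c4 / (n - 1) - (n * c2 / (n - 1)) ** 2) / n
        ),
    }

result = moment_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
print(result)
```

For the symmetric sample 1, 2, 3, 4, 5 the skewness is zero and the kurtosis is 6.8/4 - 3 = -1.3. Both standard errors of shape depend only on *N*, not on the data values.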

### Action with `RESTRICT`

The statistics are calculated for the restricted set of units from each `DATA` variate. Any existing restrictions are not affected by the procedure.

### See also

Directive: `TABULATE`.

Procedures: `CDESCRIBE`, `PTDESCRIBE`, `TABMODE`.

Commands for: Basic and nonparametric statistics.

### Example

    CAPTION 'DESCRIBE example',\
            !t('1. The default statistics (mean, min, max, nobs, nmv, median, q1, q3)',\
            'are printed for a variate of 20 random numbers called data.');\
            STYLE=meta,plain
    VARIATE [NVALUES=20] data
    CALCULATE data = URAND(50697; 20)
    DESCRIBE data
    CAPTION !t('2. From a variate containing the weights of children and a',\
            'factor sex, use DESCRIBE to print median and quartiles of the',\
            'weights of the girls, and save them in a variate called save.')
    VARIATE [values=38.2,40.1,45.2,39.6,41.4,47.9,38.8,42.3,47.5,41.2] weight
    FACTOR [label=!t(girl,boy); values=5(1,2)] sex
    RESTRICT weight; CONDITION=( sex .IN. 'girl')
    DESCRIBE [SELECT=median,q1,q3] weight; SUMMARIES=save
    PRINT save
    RESTRICT weight
    CAPTION !t('3. Use DESCRIBE to print all the summary statistics',\
            'for girls and boys separately.')
    DESCRIBE [GROUP=sex; SELECTION=all] weight
    CAPTION !t('4. Use DESCRIBE to save (but not print) the skewness and',\
            'its standard error in a variate called skew, for',\
            '100 Normally distributed random numbers.')
    CALC normal = grnormal(100; 20; 10)
    HISTOGRAM normal
    DESCRIBE [PRINT=*; SELECT=skew,seskew] normal; SUMMARIES=skew
    PRINT [RLPRINT=*] !t('skewness','s.e. of skewness'),skew