Analyses stratified random surveys by expansion or ratio raising (S.D. Langton).

### Options

| `PRINT` = string token | Controls printed output (`summary`, `totals`, `means`, `influence`, `ratios`, `extra`); default `summ`, `tota`, `infl` |
|---|---|
| `PLOT` = string token | Controls which high-resolution graphs are plotted (`single`, `separate`); default `*` i.e. none |
| `XMISSING` = string token | Action if x-variable contains missing values (`estimate`, `fault`); default `esti` |
| `RESTRICTED` = string token | Action with restricted (or filtered) observations (`omit`, `add`); default `omit` |
| `STRATUMFACTOR` = factor | Stratification factor; default `*` i.e. unstratified |
| `NINFLUENCE` = scalar | Number of influential points to print; default 10 |
| `METHOD` = string token | Method for ratio analysis (`separate`, `combined`, `classicalcombined`); default `sepa` |
| `SAVESUMMARY` = string token | Whether to save just the overall summaries instead of those for each stratum (`yes`, `no`); default `no` |
| `COMBINEDSTRATUM` = scalar | Stratum for which the ratio should be set to the combined ratio estimate; default `*` |
| `ROWS` = scalars | Number of rows of plot-matrix; default `*` i.e. set automatically depending on number of levels of `STRATUMFACTOR` |
| `COLUMNS` = scalars | Number of columns of plot-matrix; default `*` i.e. set automatically depending on number of levels of `STRATUMFACTOR` |
| `NBOOT` = scalar | Number of bootstrap samples to use; default 0 |
| `SEED` = scalar | Seed for random number generator for bootstrap; default 0 |
| `CIPROBABILITY` = scalars | The probability level for the confidence intervals; default 0.95 |
| `CIMETHOD` = string token | Method for forming confidence intervals (`automatic`, `tdistribution`, `percentile`); default `auto` |
| `COMPACT` = string token | Whether to produce output in a compact (plaintext) format (`yes`, `no`); default `no` |

### Parameters

| `Y` = variates | Response data |
|---|---|
| `X` = variates | Base data; if unset expansion raising is used |
| `LABELS` = variates, factors or texts | Structure for labelling influential points |
| `NUNITS` = tables, scalars or variates | Numbers of units in each stratum in the population |
| `XTOTALS` = tables, scalars or variates | Population totals of the base data in each stratum |
| `TOTALS` = tables or scalars | Saves total estimates |
| `SETOTALS` = tables or scalars | Saves standard errors of estimates |
| `MEANS` = tables or scalars | Saves mean estimates |
| `SEMEANS` = tables or scalars | Saves standard errors of mean estimates |
| `RATIOS` = tables | Saves estimates of ratios |
| `FITTEDVALUES` = variates | Saves fitted values for the observations |
| `INFLUENCE` = variates | Saves influence statistics |
| `LTOTALS` = tables or scalars | Saves lower confidence limit for total |
| `UTOTALS` = tables or scalars | Saves upper confidence limit for total |
| `LMEANS` = tables or scalars | Saves lower confidence limit for mean |
| `UMEANS` = tables or scalars | Saves upper confidence limit for mean |
| `VARIANCES` = tables or scalars | Saves residual variances in each stratum |

### Description

`SVSTRATIFIED` analyses the results from a stratified random survey, either by expansion or ratio raising, and allows detection of outliers. The sample data are supplied, in a variate, using the `Y` parameter. Similarly, the base data are provided using the `X` parameter. The `LABELS` parameter can supply a variate, factor or text for labelling individual units in the output. If `X` is unset or missing, expansion raising is used (i.e. the usual stratified random sampling analysis), but within a stratum units must either all have base data or all lack it. (Note: *stratum* is used here in the survey sense, not as in the `ANOVA` directive: i.e. the units are assumed to be classified into groups, and each group is called a stratum.) If option `XMISSING` is set to `fault`, any missing base data will cause a fault.

The vectors `Y`, `X` and `LABELS` should usually have one row for each unit in the survey population, with unsampled or non-responding units having a missing value in the `Y` variate. However, if parameter `NUNITS` is set, the `Y` variate may contain only the response data; `NUNITS` then supplies the information about the number of units in each stratum in the full population. Similarly, if ratio estimation is required, `XTOTALS` should contain the population totals of `X` in each stratum.
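As a rough illustration of expansion raising when only the sample data and `NUNITS` are available, the following Python sketch (not Genstat code, and not the procedure's implementation) estimates each stratum total as the stratum size multiplied by the sample mean. The function name `expansion_totals` and the dictionary layout are assumptions made for the illustration; the numbers are the Orkney oats sample and stratum sizes from the Example section.

```python
# Sketch only: expansion raising for a stratified sample, assuming the
# estimate for a stratum is (N_h / n_h) * sum of the sampled responses.
def expansion_totals(samples, nunits):
    """Return {stratum: estimated total} from sampled y-values and stratum sizes."""
    totals = {}
    for stratum, ys in samples.items():
        raising_factor = nunits[stratum] / len(ys)  # N_h / n_h
        totals[stratum] = raising_factor * sum(ys)
    return totals

# Orkney oats sample: four farms in each of three strata.
samples = {1: [15, 20, 18, 18], 2: [23, 27, 25, 60], 3: [28, 128, 69, 72]}
nunits = {1: 12, 2: 12, 3: 11}  # population sizes, as supplied by NUNITS

totals = expansion_totals(samples, nunits)
grand_total = sum(totals.values())
print(totals)
print(grand_total)
```

The raising factors here are 3, 3 and 2.75, so each sampled farm stands for about three farms in the population.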

The `METHOD` option specifies which method of ratio estimation to use. The setting `separate` estimates a ratio for each stratum, whereas settings `combined` and `classicalcombined` assume a common ratio in all strata. The `classicalcombined` method follows the approach shown in most textbooks, where the estimate for a stratum is given by ∑`X` × *ratio*, where the summation is over all units in the stratum. This approach can produce illogical estimates in some situations (e.g. the estimate may be less than the sum of the responses), and so the `combined` method estimates only for the unobserved units and adds this to the sum of the observed responses in the stratum, i.e. ∑`Y` + ∑`X` × *ratio*, where the summation of `Y` is over sampled (or responding) units and the summation of `X` is over unsampled units. Option `COMBINEDSTRATUM` is used with the separate ratio method and allows the ratio in a particular stratum to be reset to the combined ratio value; this can be a useful technique for dealing with the extreme ratios sometimes produced when the sampling fraction in a stratum is very low.
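The difference between the two common-ratio estimates for a single stratum can be seen in a small Python sketch (an illustration of the two formulas above, not the procedure's code). The data and the common ratio of 0.5 are invented for the illustration, and the ratio is simply taken as given rather than estimated.

```python
# Sketch only: the two stratum estimates described for the common-ratio methods.
def classical_combined_estimate(x_all, ratio):
    """Sum of X over all units in the stratum, multiplied by the ratio."""
    return ratio * sum(x_all)

def combined_estimate(y_sampled, x_unsampled, ratio):
    """Observed responses plus the ratio prediction for the unsampled units."""
    return sum(y_sampled) + ratio * sum(x_unsampled)

# Hypothetical stratum of four units; the last two were sampled.
x_all = [10, 20, 30, 40]
y_sampled = [40, 50]      # responses for the units with x = 30 and x = 40
x_unsampled = [10, 20]
ratio = 0.5               # assumed common ratio, taken as given

classical = classical_combined_estimate(x_all, ratio)
combined = combined_estimate(y_sampled, x_unsampled, ratio)
print(classical, combined)
```

In this invented stratum the classical estimate (50) is below the sum of the observed responses (90), which is the kind of illogical result mentioned above; the `combined` form cannot fall below the observed sum when the ratio and the `X` values are non-negative.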

Printing is controlled via the `PRINT` option. The default settings are `summary`, `totals` and `influence`; these print a summary of the data, estimated totals and influence statistics, respectively. The setting `means` produces a table showing the estimated means, whilst `ratios` produces a low-resolution plot of the confidence limits for the ratio estimates; this can be useful when deciding whether a combined ratio estimate is to be used. The setting `extra` displays extra information relating to the analysis, including sums and means of the response data and raising factors (weights).

The `CIPROBABILITY` option sets the probability level used in calculation of confidence limits for means and totals. The `CIMETHOD` option controls how confidence limits are formed after bootstrapping: `percentile` uses simple percentiles of the bootstrapped distribution, whilst `tdistribution` calculates a standard error from the bootstrapped estimates and then uses the t-distribution to form intervals; the default of `automatic` uses the percentile method unless fewer than 400 bootstrap samples have been made.
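The `automatic` rule and a simple percentile interval can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure's implementation: the function names are hypothetical, and the percentile limits are taken as order statistics rounded outwards, which may differ from the exact percentile definition that Genstat uses.

```python
import math

# Sketch only: the documented rule for CIMETHOD=automatic after bootstrapping.
def resolve_cimethod(cimethod, nboot):
    """Return the method actually used for a given setting and number of samples."""
    if cimethod == "automatic":
        return "tdistribution" if nboot < 400 else "percentile"
    return cimethod

# Sketch only: percentile limits taken as order statistics (an assumed definition).
def percentile_interval(boot_estimates, ciprobability=0.95):
    """Return (lower, upper) limits from the sorted bootstrap estimates."""
    ordered = sorted(boot_estimates)
    tail = (1.0 - ciprobability) / 2.0
    last = len(ordered) - 1
    lower = ordered[math.floor(tail * last)]
    upper = ordered[math.ceil((1.0 - tail) * last)]
    return lower, upper

# Stand-in for 1000 bootstrap estimates, used only to show the indexing.
boot = list(range(1, 1001))
method = resolve_cimethod("automatic", len(boot))
limits = percentile_interval(boot)
print(method, limits)
```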

The `NINFLUENCE` option controls the number of points of high influence printed. The `COMPACT` option can be used to switch to a compact, plain-text style for the output, designed for printing concise summaries of an analysis. When `COMPACT=yes`, the information printed depends on the width of the first output channel, with more information being displayed when this can be done without splitting tables.

By default all standard errors and confidence limits are calculated using the conventional approximations. Alternatively, bootstrap methods may be used by setting the `NBOOT` option to the required number of bootstrap samples. In the case of ratio estimation, the samples are used to form bootstrap estimates of the ratio, which are then applied to the known population totals for `X`. Bootstrapping is carried out independently in each stratum, using the method described by Särndal *et al.* (1992, page 442); this involves creating a “pseudopopulation” containing *n* replicates of each observation, where *n* is the nearest integer to the expansion raising factor (inverse of inclusion probability) for the stratum. Bootstrap samples of the same size as the original sample are then taken from the pseudopopulation and used to compute the estimates. The `SEED` option specifies the seed to use in the random number generator used to construct the bootstrap samples. The default value of zero continues an existing sequence of random numbers or, if the generator has not yet been used in this run of Genstat, it initializes the generator automatically.
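The pseudopopulation idea can be sketched for a single stratum under expansion raising as follows. This Python fragment is an illustration, not the procedure's code: the function name is hypothetical, drawing the bootstrap samples without replacement is an assumption, and Python's `round` resolves exact halves to the nearest even integer, which may not match Genstat's rounding.

```python
import random

# Sketch only: pseudopopulation bootstrap of the expansion total in one stratum.
def bootstrap_stratum_totals(ys, n_units, nboot, rng):
    """Return nboot bootstrap estimates of the stratum total."""
    n_sample = len(ys)
    # Number of replicates: nearest integer to the raising factor N_h / n_h.
    replicates = round(n_units / n_sample)
    pseudopopulation = [y for y in ys for _ in range(replicates)]
    totals = []
    for _ in range(nboot):
        # Assumption: samples are drawn without replacement, like the survey itself.
        resample = rng.sample(pseudopopulation, n_sample)
        totals.append(n_units * sum(resample) / n_sample)
    return totals

# Third stratum of the Orkney oats example: 4 sampled farms out of 11.
rng = random.Random(12345)  # fixed seed so that the run is repeatable
boot_totals = bootstrap_stratum_totals([28, 128, 69, 72], 11, 200, rng)
print(min(boot_totals), max(boot_totals))
```

The spread of `boot_totals` is what the bootstrap standard error or percentile limits would be computed from.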

Graphical output is available by setting the `PLOT` option. The setting `single` produces a single plot of the response data against `X` or against the stratum number if `X` is unset. A fitted line is shown if one of the combined ratio methods is used. The `separate` setting produces one graph for each stratum, with up to six graphs on each screen. All graphs are plotted on the log scale.

Output can be saved using the parameters `TOTALS`, `SETOTALS`, `MEANS`, `SEMEANS`, `LTOTALS`, `UTOTALS`, `LMEANS` and `UMEANS`. These are generally set to a table classified by the stratification factor but, if option `SAVESUMMARY=yes`, then they save scalars containing only the grand total summed over all strata. Ratios can be saved in a table using the `RATIOS` parameter, whilst the residual variances in each stratum can be saved using `VARIANCES`; the latter are useful for working out optimal allocation strategies for future surveys. Fitted values and influence statistics may be saved using parameters `FITTEDVALUES` and `INFLUENCE`. The fitted values are the `X` value multiplied by the appropriate ratio for each unit or, where expansion raising is used, the mean `Y` value for the stratum.
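The two definitions of the fitted values can be written out in a short Python sketch. This is an illustration only: the function names and data are invented, missing responses are represented by `None`, and whether the procedure also returns fitted values for unsampled units is not stated above, so the sketch simply computes one value per unit.

```python
# Sketch only: fitted values under ratio raising (x multiplied by the stratum's ratio).
def fitted_ratio(x, strata, ratios):
    return [xi * ratios[s] for xi, s in zip(x, strata)]

# Sketch only: fitted values under expansion raising (stratum mean of observed y).
def fitted_expansion(y, strata):
    sums, counts = {}, {}
    for yi, s in zip(y, strata):
        if yi is not None:  # None marks an unsampled or non-responding unit
            sums[s] = sums.get(s, 0.0) + yi
            counts[s] = counts.get(s, 0) + 1
    return [sums[s] / counts[s] for s in strata]

# Invented data for illustration.
ratio_fit = fitted_ratio([10, 20, 30], [1, 1, 2], {1: 0.5, 2: 2.0})
expansion_fit = fitted_expansion([4, None, 9, 11], [1, 1, 2, 2])
print(ratio_fit, expansion_fit)
```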

Options: `PRINT`, `PLOT`, `XMISSING`, `RESTRICTED`, `STRATUMFACTOR`, `NINFLUENCE`, `METHOD`, `SAVESUMMARY`, `COMBINEDSTRATUM`, `ROWS`, `COLUMNS`, `NBOOT`, `SEED`, `CIPROBABILITY`, `CIMETHOD`, `COMPACT`.

Parameters: `Y`, `X`, `LABELS`, `NUNITS`, `XTOTALS`, `TOTALS`, `SETOTALS`, `MEANS`, `SEMEANS`, `RATIOS`, `FITTEDVALUES`, `INFLUENCE`, `LTOTALS`, `UTOTALS`, `LMEANS`, `UMEANS`, `VARIANCES`.

### Method

The methods used are described in most survey analysis textbooks; see, for example, Sampford (1962) or Lehtonen & Pahkinen (1994). Most calculations are carried out using Genstat table structures.

### Action with `RESTRICT`

The action with `RESTRICT` depends on the setting of the `RESTRICTED` option. By default restricted units are totally excluded from the analysis. If `RESTRICTED` is set to `add`, restricted observations are excluded from the ratio calculations but then added back into the total estimates; this is a technique for dealing with nonrepresentative outliers (see e.g. Lee, 1995), which are believed to be genuine observations but are not representative of the wider population.

### References

Lee, H. (1995). Outliers in Business Surveys. Chapter 26 of *Business Survey Methods* (ed. Cox, Binder, Chinnappa, Christianson, Colledge & Kott). Wiley, New York.

Lehtonen, R. & Pahkinen, E.J. (1994). *Practical Methods for Design and Analysis of Complex Surveys*. Wiley, New York.

Sampford, M.R. (1962). *An Introduction to Sampling Theory*. Oliver & Boyd, London.

Särndal, C.-E., Swensson, B. & Wretman, J. (1992). *Model Assisted Survey Sampling*. Springer-Verlag, New York.

### See also

Procedures: `SVBOOT`, `SVCALIBRATE`, `SVGLM`, `SVHOTDECK`, `SVREWEIGHT`, `SVSAMPLE`, `SVTABULATE`, `SVWEIGHT`.

Commands for: Survey analysis.

### Example

    CAPTION 'SVSTRATIFIED example',\
            'Orkney oats data (Sampford, Table 5.1, page 61).';\
            STYLE=meta,plain
    " Firstly stratified random sample, entered with sample data only, plus table with population size - see Table 6.1, page 73."
    VARIATE Oats
    READ Oats
    15 20 18 18 23 27 25 60 28 128 69 72 :
    FACTOR [LEVELS=3; VALUES=4(1,2,3)] Stratum
    TABLE [CLASS=Stratum; VALUES=12,12,11] N
    SVSTRATIFIED [PRINT=summary,totals; STRATUMFACTOR=Stratum] Oats; NUNITS=N
    " Secondly ratio analysis - data entered as one row for each farm in the population - see page 109."
    VARIATE Oats
    READ Farm
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 :
    READ Crops
    50 50 52 58 60 60 62 65 65 68 71 74 78 90 91 92 96 110 140 140 156 156 190 198 209 240 274 300 303 311 324 330 356 410 430 :
    READ Oats
    17 17 10 16 6 15 20 18 14 20 24 18 23 0 27 34 25 24 43 48 44 45 60 63 70 28 62 59 66 58 128 38 69 72 103 :
    " To form the sample of 5 farms used, replace the others with missing values."
    CALCULATE Oats=MVINSERT(Oats; Farm.NI.!(1,15,23,30,33))
    SVSTRATIFIED [PRINT=summary,totals,means] Oats; X=Crops