Bootstraps data from random surveys (S.D. Langton).

### Options

| `PRINT` = string token | Controls printed output (`summary`); default `*` i.e. none |
|---|---|
| `SEED` = scalar | Seed for random numbers; default 0 |
| `STRATUMFACTOR` = factor | Stratification factor |
| `SAMPLINGUNITS` = factor | Sampling units (default single-stage design) |
| `WEIGHTS` = variates | Weight variates (not required for simple bootstrap) |
| `METHOD` = string token | Method (`simple`, `sarndal`, `random`); default `simple` |
| `POPULATION` = pointers | Units in the population |
| `SAVEUNITS` = variate | Units in the bootstrapped sample |
| `BSTRATUMFACTOR` = factor | Bootstrapped stratification factor |
| `BSAMPLINGUNITS` = factor | Bootstrapped sampling units |

### Parameters

| `DATA` = variates or factors | Data to bootstrap |
|---|---|
| `BOOT` = variates or factors | Saves the bootstrapped structures corresponding to `DATA` |

### Description

`SVBOOT` forms a single bootstrap sample using data from a stratified one- or two-stage survey. It is designed to be used in a `FOR` loop, with a new sample being formed and analysed each time that the loop is executed. The `DATA` parameter supplies a list of structures to be bootstrapped, whilst `BOOT` contains the corresponding bootstrapped structures. Alternatively, the `SAVEUNITS` option can be used to save the units in the bootstrapped samples, allowing the bootstrapped structures to be formed by a `CALCULATE` statement. Options `STRATUMFACTOR` and `SAMPLINGUNITS` supply the stratification factor and the sampling units respectively, whilst survey weights are supplied by the `WEIGHTS` option.
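The loop pattern described above can be illustrated outside Genstat. The following is a minimal Python sketch, not the procedure's implementation: `bootstrap_units` and `bootstrap_se` are hypothetical names, the oats-to-crops ratio is an arbitrary choice of statistic, and the data are those of the Example section. The list of resampled positions plays the role of `SAVEUNITS`, and indexing the data by it plays the role of the `CALCULATE` statement.

```python
import random
import statistics

# Data from the Example section of this page (12 farms in 3 strata).
stratum = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
crops = [60, 62, 65, 74, 78, 91, 96, 190, 240, 324, 356, 410]
oats = [15, 20, 18, 18, 23, 27, 25, 60, 28, 128, 69, 72]


def bootstrap_units(stratum, rng):
    """Return resampled unit positions (0-based), drawn with replacement
    within each stratum; the result is in stratum order."""
    units = []
    for level in sorted(set(stratum)):
        members = [i for i, s in enumerate(stratum) if s == level]
        units.extend(rng.choice(members) for _ in members)
    return units


def bootstrap_se(n_boot=200, seed=1):
    """Bootstrap standard error of the oats/crops ratio (illustrative statistic)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):                      # one new sample per pass of the loop
        units = bootstrap_units(stratum, rng)    # role played by SAVEUNITS
        bcrops = [crops[i] for i in units]       # role played by CALCULATE
        boats = [oats[i] for i in units]
        ratios.append(sum(boats) / sum(bcrops))
    return statistics.stdev(ratios)


if __name__ == "__main__":
    print(bootstrap_se())
```

Saving the unit positions once and indexing every data structure by them keeps the bootstrapped structures aligned, which is the reason for offering `SAVEUNITS` as an alternative to `BOOT`.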

With `METHOD=simple`, sampling is with replacement within each stratum. Strictly, this is correct only for an infinite population, but it gives reasonable results as long as the sampling fraction is not very high. `METHOD=sarndal` uses the method described by Sarndal *et al*. (1992, page 442), as implemented by Grilli & Pratesi (2004), in which an artificial population is created, containing each element of the sample *w* times, where *w* is the survey weight (the inverse of the probability of inclusion), rounded to the nearest integer. Sampling is then carried out without replacement (rather than with replacement, as Sarndal *et al*. recommend). For two-stage sampling, `WEIGHTS` should be set to a list of two variates, the first giving the overall sampling weights and the second the weights at the first stage only (typically the inverse of the probability of selection of the primary sampling units).

The Sarndal approach works well as long as either the weights are integers, or they are large enough that the effect of rounding is negligible. For surveys with high sampling fractions, `METHOD=random` implements a variant on the Sarndal method in which the artificial population is formed by a random process, using resampling in proportion to the weights and ensuring that each observation is present at least once in the population. Care must be taken when using this method, as means, totals and other statistics will vary slightly between the different artificial populations. With this method it may sometimes be helpful to form repeated bootstrap samples from the same pseudo-population; this can be achieved by means of the `POPULATION` option.

Except in simple surveys with no restrictions, the number of units in each bootstrapped sample will not necessarily be the same as in the original survey, and so options `BSTRATUMFACTOR` and `BSAMPLINGUNITS` save new factors for use with the bootstrapped structures.

Options: `PRINT`, `SEED`, `STRATUMFACTOR`, `SAMPLINGUNITS`, `WEIGHTS`, `METHOD`, `POPULATION`, `SAVEUNITS`, `BSTRATUMFACTOR`, `BSAMPLINGUNITS`.

Parameters: `DATA`, `BOOT`.

### Method

a) simple, one-stage

For each stratum, a new variate is formed containing the unit numbers in that stratum, indexed by a grouping factor. The new bootstrap sample is then formed by selecting from these at random with replacement. Any weights that have been set are ignored. The new samples are in stratum order, rather than the order of the original dataset.

b) simple, two-stage

The method described above is applied twice, once to select primary sampling units at random from those in the stratum, and once to select secondary sampling units from those in the appropriate psu.
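As an illustration of the two-stage scheme, the following Python sketch resamples primary sampling units within each stratum and then secondary units within each selected PSU. It is a sketch under stated assumptions rather than the procedure's code: the function name `two_stage_bootstrap` is hypothetical, and relabelling the selected PSUs 1, 2, ... (so that a PSU drawn twice counts as two bootstrap PSUs) is an assumption about how new factors such as `BSAMPLINGUNITS` might be formed.

```python
import random
from collections import defaultdict


def two_stage_bootstrap(stratum, psu, rng):
    """Resample PSUs within strata, then units within each selected PSU.

    stratum[i] and psu[i] give the stratum and primary sampling unit of
    observation i. Returns (units, bstratum, bpsu), all in stratum order.
    """
    members = defaultdict(list)        # (stratum, psu) -> observation positions
    for i, key in enumerate(zip(stratum, psu)):
        members[key].append(i)

    units, bstratum, bpsu = [], [], []
    next_label = 0
    for level in sorted(set(stratum)):
        psus = sorted(p for (s, p) in members if s == level)
        # Stage 1: draw as many PSUs as were sampled, with replacement.
        for _ in psus:
            chosen = rng.choice(psus)
            next_label += 1            # assumed relabelling of bootstrap PSUs
            obs = members[(level, chosen)]
            # Stage 2: draw secondary units within the chosen PSU, with replacement.
            for _ in obs:
                units.append(rng.choice(obs))
                bstratum.append(level)
                bpsu.append(next_label)
    return units, bstratum, bpsu


if __name__ == "__main__":
    stratum = [1, 1, 1, 1, 1, 2, 2, 2, 2]
    psu = ["a", "a", "b", "b", "b", "c", "c", "d", "d"]
    print(two_stage_bootstrap(stratum, psu, random.Random(3)))
```

Because PSUs of different sizes can be drawn more or less often than in the original sample, the length of `units` varies from call to call, which is why new stratum and sampling-unit factors are needed alongside the bootstrapped data.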

c) Sarndal, one-stage

An artificial population is generated for each stratum, with each unit being replicated *w* times, where *w* is the appropriate weight, rounded to the nearest integer. Sampling is then carried out, without replacement, using the inverse of the weights as inclusion probabilities. For reasons of computational simplicity, the bootstrap sample sizes are not fixed, and may therefore differ slightly from those of the original sample.
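One way to read this description is as Poisson sampling from the replicated population: every copy of a unit is kept independently with probability 1/*w*, so no copy is used twice and the sample size is random. The Python sketch below follows that reading. It is an assumption about the mechanism, not a transcription of the procedure, and `sarndal_bootstrap` is a hypothetical name.

```python
import random


def sarndal_bootstrap(weights, rng):
    """One-stage sketch: replicate unit i round(w_i) times, then keep each
    copy independently with probability 1 / w_i (assumed Poisson sampling).

    Returns the positions of the selected units; the length varies by call.
    """
    units = []
    for i, w in enumerate(weights):
        if w < 1:
            raise ValueError("weights are inverse inclusion probabilities, so w >= 1")
        copies = int(w + 0.5)              # nearest integer, halves rounded up
        for _ in range(copies):
            if rng.random() < 1.0 / w:     # each copy can enter at most once
                units.append(i)
    return units


if __name__ == "__main__":
    weights = [3.0] * 4 + [2.75] * 4       # illustrative weights for 8 sampled units
    print(sarndal_bootstrap(weights, random.Random(5)))
```

Under this reading the expected bootstrap sample size is the sum of round(*w*)/*w* over the sampled units, which equals the original sample size when the weights are integers and is close to it when rounding has little effect.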

d) Sarndal, two-stage

The method described above is applied twice, once to select primary sampling units at random from those in the stratum, and once to select secondary sampling units from those in the appropriate psu.

e) Random

This method is designed as an alternative to the Sarndal method for use when the sampling fraction is very high, so that the rounded weights are all equal to one and the same sample is always generated. The pseudo-population is formed by including each of the sampled observations once, and then resampling with replacement from the sampled observations to generate the remaining *N* − *n* units (where *N* is the population size and *n* is the sample size in the stratum). The pseudo-population is then sampled without replacement, as in the Sarndal method. This method is currently implemented only for one-stage sampling with equal weights within each stratum.
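The construction of the pseudo-population for a single stratum can be sketched in Python as follows. The function names are hypothetical, and the second step assumes that "sampled without replacement" means keeping each pseudo-population member independently with probability *n*/*N*; the procedure itself may differ in detail.

```python
import random


def random_pseudo_population(sample_units, N, rng):
    """Pseudo-population of size N for one stratum: every sampled unit once,
    plus N - n further units resampled with replacement from the sample."""
    n = len(sample_units)
    if N < n:
        raise ValueError("population size N must be at least the sample size n")
    extra = [rng.choice(sample_units) for _ in range(N - n)]
    return list(sample_units) + extra


def sample_pseudo_population(population, n, rng):
    """Keep each member with probability n / N (assumed sampling step), so each
    member is used at most once and the sample size varies around n."""
    p = n / len(population)
    return [u for u in population if rng.random() < p]


if __name__ == "__main__":
    rng = random.Random(2)
    pop = random_pseudo_population([10, 11, 12, 13], N=12, rng=rng)
    # Repeated bootstrap samples drawn from the same pseudo-population.
    for _ in range(3):
        print(sample_pseudo_population(pop, 4, rng))
```

Building the pseudo-population once and drawing several samples from it, as in the usage lines, removes the variation between pseudo-populations from a comparison of bootstrap samples.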

### Action with `RESTRICT`

Restricted units are excluded from the bootstrapping process and do not occur in the resampled dataset. The restriction, if any, is taken from the first variate in the `DATA` list.

### References

Grilli, L. & Pratesi, M. (2004). Weighted estimation in multilevel ordinal and binary models in the presence of informative sampling designs. *Survey Methodology*, 30, 93-103.

Sarndal, C., Swensson, B. & Wretman, J. (1992). *Model Assisted Survey Sampling*. Springer-Verlag, New York.

### See also

Procedures: `BOOTSTRAP`, `SVCALIBRATE`, `SVGLM`, `SVHOTDECK`, `SVREWEIGHT`, `SVSAMPLE`, `SVSTRATIFIED`, `SVTABULATE`, `SVWEIGHT`.

Commands for: Survey analysis.

### Example

    CAPTION 'SVBOOT example',\
            'Data from Sampford, Table 5.1, page 61, using farms of Table 6.1.';\
            STYLE=meta,plain
    FACTOR  [LEVELS=3] stratum
    TABLE   [CLASS=stratum; VALUES=12,12,11] N
    READ    farm,stratum,crops,oats
     6 1  60  15    7 1  62  20    8 1  65  18   12 1  74  18
    13 2  78  23   15 2  91  27   17 2  96  25   23 2 190  60
    26 3 240  28   31 3 324 128   33 3 356  69   34 3 410  72 :
    SVBOOT  [STRATUM=stratum; BSTRATUMFACTOR=bstratum; SEED=9817904]\
            farm,crops,oats; BOOT=bfarm,bcrops,boats
    PRINT   bstratum,bfarm,bcrops,boats