Creates a separation plot for visualising the fit of a model with a dichotomous (i.e. binary) or polytomous (i.e. multi-categorical) outcome (V.M. Cave).

### Options

| Option | Description |
|---|---|
| `METHOD` = string token | Method used to plot the predicted probabilities (`rectangles`, `lines`, `rbands`, `lbands`); default `rect` |
| `PLOT` = string tokens | Information to be plotted on the graph (`key`, `traceline`, `expectednumber`); default `key`, `trac`, `expe` when `METHOD=rectangles` or `lines`, and `key` when `METHOD=rbands` or `lbands` |
| `SUCCESSLEVEL` = string token | Specifies which level corresponds to success when `GROUPS` supplies a factor with 2 levels (`first`, `second`); default `seco` |
| `LINEORDER` = string token | If `METHOD=lines`, whether the failures or successes are plotted first (`failurefirst`, `successfirst`); default `fail` |
| `NGROUPS` = scalar | Number of discrete bands used to group the predicted probabilities when `METHOD=rbands` or `lbands`; default 10 |
| `TIES` = string token | How tied data values in `PROBABILITIES` are handled when `METHOD=rectangles` or `lines` (`permute`, `same`); default `perm` |
| `SEED` = scalar | Seed for random number generator used to permute the tied data; default 0 |
| `COLOURS` = variate or text | The two colours used to plot the predicted probabilities |
| `THICKNESS` = scalar | Thickness of the line for plotting the predicted probabilities when `METHOD=lines` or `lbands`; default 1 |
| `BACKGROUND` = scalar or text | Colour of the background when `METHOD=lines` or `lbands`; default `ligh` |
| `BORDER` = string token | Whether to draw borders around the rectangles when `METHOD=rectangles` or `rbands` (`yes`, `no`); default `no` |
| `USEPENS` = string token | Whether to use the current pen definitions of pens 2 and 3 for plotting the `traceline` and `expectednumber`, respectively (`yes`, `no`); default `no` |
| `SAVE` = rsave or pointer | Regression or HGLM save structure to provide the data if `PROBABILITIES`, `GROUPS`, `NSUCCESSES` and `NBINOMIAL` are not specified |

### Parameters

| Parameter | Description |
|---|---|
| `PROBABILITIES` = variate or matrix | Variate containing probabilities of success for a binary outcome (i.e. for binary or binomial data), or matrix containing probabilities of membership in each group for a polytomous outcome |
| `GROUPS` = variate or factor | Actual outcome, when `NSUCCESSES` and `NBINOMIAL` are not supplied |
| `NSUCCESSES` = variate | Number of successes when `PROBABILITIES` supplies predicted probabilities from binomial data |
| `NBINOMIAL` = variate | Number of trials when `PROBABILITIES` supplies predicted probabilities from binomial data |
| `TITLE` = text | Title for the plot; default generates the title automatically |
| `XTITLE` = text | Title for the x-axis; default `*` i.e. none |

### Description

The `DSEPARATIONPLOT` procedure creates a separation plot, which is a graphical approach for assessing the fit of a model with a dichotomous (i.e. binary) or polytomous (i.e. multi-categorical) outcome. A separation plot provides a visualisation of a model’s ability to predict occurrences of the event of interest (i.e. successes) with high probability, and non-occurrences (i.e. failures) with low probability. The procedure can accommodate models for binary, binomial and polytomous data.

The predicted probabilities are supplied using the `PROBABILITIES` parameter. For models for binary or binomial data, the predicted probabilities of success are supplied in a variate. For models for polytomous data, the predicted probabilities of membership of each group are supplied in a matrix.

The actual outcome is defined using the `GROUPS` parameter for binary and polytomous data, and the `NSUCCESSES` and `NBINOMIAL` parameters for binomial data. For models for binary data, `GROUPS` must supply either a binary variate (i.e. a variate containing only zeros or ones) or a factor with two levels. If a binary variate is supplied, a value of one corresponds to success in relation to `PROBABILITIES`. Alternatively, if a factor is supplied, the default is that the second level corresponds to success. You can set option `SUCCESSLEVEL=first` to specify that the first level corresponds to success instead.
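As a minimal sketch of the two ways of supplying a binary outcome, the statements below use made-up values. The identifiers `prob`, `outcome`, `status` and `probfirst`, and the factor labels, are hypothetical and chosen only for illustration.

```genstat
" Illustrative data only: predicted probabilities of success "
VARIATE [VALUES=0.12,0.30,0.45,0.62,0.81,0.93] prob
" Outcome as a binary variate: 1 = success, 0 = failure "
VARIATE [VALUES=0,0,1,0,1,1] outcome
DSEPARATIONPLOT PROBABILITIES=prob; GROUPS=outcome
" Outcome as a two-level factor: by default the second level (yes) is success "
FACTOR [LABELS=!T(no,yes); VALUES=1,1,2,1,2,2] status
DSEPARATIONPLOT PROBABILITIES=prob; GROUPS=status
" If the probabilities refer to the first level instead, say so with SUCCESSLEVEL "
CALCULATE probfirst = 1 - prob
DSEPARATIONPLOT [SUCCESSLEVEL=first] PROBABILITIES=probfirst; GROUPS=status
```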

You can use the `SAVE` option to supply a save structure, from a regression or an HGLM analysis, to provide the data if the `PROBABILITIES`, `GROUPS`, `NSUCCESSES` and `NBINOMIAL` parameters are not specified. The analysis must involve either a generalized linear model with a binomial distribution or an HGLM with a binomial distribution for the mean model. If neither those parameters nor `SAVE` are specified, the data are taken from the most recent regression analysis.
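The following sketch shows one way the save-structure route might look. It assumes that the save structure is named using the `SAVE` option of `MODEL`; the data values and the identifiers `x`, `y` and `ysave` are hypothetical.

```genstat
" Illustrative data only "
VARIATE [VALUES=1,2,3,4,5,6,7,8] x
VARIATE [VALUES=0,0,1,0,0,1,1,1] y
" Name the regression save structure when defining the model "
MODEL [DISTRIBUTION=binomial; LINK=logit; SAVE=ysave] y; NBINOMIAL=1
FIT [PRINT=*] x
" Take the probabilities and outcomes from the named save structure "
DSEPARATIONPLOT [SAVE=ysave]
" With no parameters and no SAVE, the most recent regression analysis is used "
DSEPARATIONPLOT
```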

For models for polytomous data, `GROUPS` must supply a factor with the same number of levels as the columns in the matrix supplied by `PROBABILITIES`. The first level of the `GROUPS` factor then corresponds to the first column of the matrix, the second level to the second column, and so on (i.e. the predicted probabilities of membership of the group that corresponds to the *i*th level of the factor are in the *i*th column of the matrix supplied by `PROBABILITIES`).
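To make the correspondence between factor levels and matrix columns concrete, here is a small sketch with invented probabilities for six observations and three groups. The identifiers `pmat` and `grp` are hypothetical, and `METHOD=rbands` is chosen only because its description explicitly covers polytomous data.

```genstat
" Illustrative data only: each row holds the predicted probabilities "
" for one observation, and each row sums to 1 "
MATRIX [ROWS=6; COLUMNS=3; VALUES=0.7,0.2,0.1, 0.6,0.3,0.1, 0.2,0.5,0.3, \
        0.1,0.6,0.3, 0.2,0.2,0.6, 0.1,0.3,0.6] pmat
" Level i of the factor corresponds to column i of pmat "
FACTOR [LEVELS=3; VALUES=1,1,2,2,3,2] grp
DSEPARATIONPLOT [METHOD=rbands] PROBABILITIES=pmat; GROUPS=grp
```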

For models for binomial data, `NSUCCESSES` must supply a variate giving the number of successes, and `NBINOMIAL` must supply a variate giving the number of trials. The `GROUPS` parameter is then ignored.

The predicted probabilities can be plotted as rectangles, lines or in banded groups. This is specified using the `METHOD` option with the following settings.

| Setting | Description |
|---|---|
| `rectangles` | the predicted probabilities, ordered from smallest to largest, are plotted as rectangles that are coloured according to whether or not the observation corresponds to a success (i.e. an actual occurrence of the event of interest); this is the default. |
| `lines` | this is similar to `rectangles`, except that line segments are plotted instead of rectangles. |
| `rbands` | a separate graph is drawn for each actual outcome (i.e. success/failure for dichotomous data or each group for polytomous data) with the predicted probabilities of that outcome ordered from smallest to largest, and plotted as rectangles. The rectangles are coloured using a graduated band of colours formed by grouping the predicted probabilities into distinct bands. |
| `lbands` | this is similar to `rbands`, except that line segments are plotted instead of rectangles. |

The `COLOURS` option defines the colours that are used to plot the predicted probabilities. It must supply two colours, either in a variate (containing two numbers defining the colours using the RGB system) or in a text (containing the names of two of Genstat’s pre-defined colours; see `PEN` for details). When `METHOD=rectangles` or `lines`, the first colour corresponds to failures (i.e. non-occurrences of the event of interest) and the second to successes (i.e. occurrences of the event of interest); the defaults are a shade of pink (RGB value = 12917629) and a shade of green (RGB value = 5083681). When `METHOD=rbands` or `lbands`, the two colours define the start and end colour values used by `DCOLOURS` to form a linear band of graduated colours, with the first colour corresponding to the lowest probability band and the second to the highest probability band; the defaults are a pale shade of yellow (RGB value = 16777011) and a dark shade of red (RGB value = 15073280). The number of discrete bands (and therefore colours) used to group the predicted probabilities is specified using the `NGROUPS` option. By default the predicted probabilities are grouped into 10 distinct bands: [0,0.1), [0.1,0.2), [0.2,0.3), [0.3,0.4), [0.4,0.5), [0.5,0.6), [0.6,0.7), [0.7,0.8), [0.8,0.9), [0.9,1]. (Note: the highest probability band is always a closed interval. All other probability bands are right half-open intervals.)
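The sketch below illustrates both ways of supplying `COLOURS`, together with a by-hand version of the default 10-band grouping. The data and the identifiers `prob`, `outcome`, `cols`, `ramp` and `band` are hypothetical. The `CALCULATE` statements are only an illustration of the stated intervals, assuming the standard `INTEGER` function; they are not necessarily how the procedure forms its bands internally.

```genstat
" Illustrative data only "
VARIATE [VALUES=0.05,0.15,0.35,0.35,0.55,0.75,0.95,1] prob
VARIATE [VALUES=0,0,0,1,0,1,1,1] outcome
" Two named colours: the first for failures, the second for successes "
TEXT [VALUES=blue,red] cols
DSEPARATIONPLOT [METHOD=rectangles; COLOURS=cols] PROBABILITIES=prob; GROUPS=outcome
" Two RGB numbers (the documented banded defaults) as the start and end "
" of a ramp of 5 graduated colours "
VARIATE [VALUES=16777011,15073280] ramp
DSEPARATIONPLOT [METHOD=rbands; NGROUPS=5; COLOURS=ramp] PROBABILITIES=prob; GROUPS=outcome
" Default grouping by hand: bands 1 to 9 are [0,0.1) up to [0.8,0.9), "
" and band 10 is the closed interval [0.9,1] "
CALCULATE band = 1 + INTEGER(10*prob)
" A probability of exactly 1 would land in band 11, so move it into band 10 "
CALCULATE band = band - (band > 10)
PRINT prob,band
```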

With large data sets, the lines on a separation plot may overlap. The `THICKNESS` option can be used to modify the thickness of the lines plotted when `METHOD=lines` or `lbands`, by specifying a value by which the standard thickness is to be multiplied; default 1.

When `METHOD=lines`, the default is to plot the failures (i.e. non-occurrences) before the successes (i.e. occurrences of the event of interest). The success lines may then overlap and obscure the failure lines. Alternatively, you can set option `LINEORDER=successfirst` to plot the success lines first. The failures may then obscure the successes.

The `BACKGROUND` option specifies the background colour when `METHOD=lines` or `lbands`; default `lightgray`. Either a scalar (defining the colour using the RGB system) or a text (containing the name of a pre-defined colour; see `PEN` for details) may be supplied.

By default, borders are not drawn around the rectangles when `METHOD=rectangles` or `rbands`. However, you can include borders by setting option `BORDER=yes`. Their appearance can be modified by altering the settings of pen -7 (see `PEN` for details).
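The appearance options described above might be combined as in this sketch. The data and the identifiers `prob` and `outcome` are hypothetical, and the particular thickness and colours are arbitrary choices for illustration.

```genstat
" Illustrative data only "
VARIATE [VALUES=0.12,0.30,0.45,0.62,0.81,0.93] prob
VARIATE [VALUES=0,0,1,0,1,1] outcome
" Thinner lines, successes drawn first, on a white background "
DSEPARATIONPLOT [METHOD=lines; THICKNESS=0.5; LINEORDER=successfirst; \
                BACKGROUND='white'] PROBABILITIES=prob; GROUPS=outcome
" Rectangles with borders; pen -7 controls how the borders are drawn "
PEN NUMBER=-7; COLOUR='black'
DSEPARATIONPLOT [METHOD=rectangles; BORDER=yes] PROBABILITIES=prob; GROUPS=outcome
```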

With `METHOD=rectangles` or `lines`, the individual predicted probabilities are plotted in order from smallest to largest. The `TIES` option controls how tied probabilities are handled. The default, `TIES=permute`, randomly permutes the order in which the tied values are plotted, thereby breaking up any pre-existing patterns that may distort the appearance of the separation plot. Alternatively, `TIES=same` plots the tied values in the same order as they appear in `PROBABILITIES`.

The `SEED` option specifies the seed for the random-number generator, used by `RANDOMIZE`, to make the permutations when `TIES=permute`. The default of zero continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, it initializes the seed automatically. If you use the same (non-zero) seed more than once, the tied values will be permuted in the same way, and hence you will get the same separation plot.
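A short sketch of the tie-handling options, using invented data with several tied probabilities. The identifiers `prob` and `outcome` and the seed value are hypothetical.

```genstat
" Illustrative data only: three groups of tied probabilities "
VARIATE [VALUES=0.2,0.2,0.2,0.5,0.5,0.8,0.8,0.8] prob
VARIATE [VALUES=0,0,1,0,1,1,1,0] outcome
" A fixed non-zero seed makes the permutation of ties reproducible, "
" so these two statements should draw the same plot "
DSEPARATIONPLOT [TIES=permute; SEED=71245] PROBABILITIES=prob; GROUPS=outcome
DSEPARATIONPLOT [TIES=permute; SEED=71245] PROBABILITIES=prob; GROUPS=outcome
" Keep tied values in the order in which they appear in prob "
DSEPARATIONPLOT [TIES=same] PROBABILITIES=prob; GROUPS=outcome
```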

The `PLOT` option controls what additional information is plotted on the graph, with the following settings.

| Setting | Description |
|---|---|
| `key` | adds a key to the graph. |
| `traceline` | adds a line graph of the ordered predicted probabilities when `METHOD=rectangles` or `lines`. |
| `expectednumber` | adds a symbol (default star) denoting the expected number of successes when `METHOD=rectangles` or `lines`. This is calculated as the sum of the predicted probabilities for the occurrence of the event of interest (i.e. the sum of the predicted probabilities of success). |

By default, the `key` is plotted. Also, when `METHOD=rectangles` or `lines`, the `traceline` and the `expectednumber` are plotted by default. You can suppress any additional information by setting `PLOT=*`.

You can set option `USEPENS=yes` to use the settings of pens 2 and 3 for the line drawn by `PLOT=traceline` and for the symbol added by `PLOT=expectednumber`, respectively. You can thus modify their appearance by modifying the settings of these pens prior to using `DSEPARATIONPLOT`. (See `PEN` for details.)
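The sketch below shows how the expected number of successes can be checked by hand, and how pens 2 and 3 might be restyled before the plot is drawn. The data and the identifiers `prob`, `outcome` and `expected` are hypothetical, and the pen settings are arbitrary.

```genstat
" Illustrative data only "
VARIATE [VALUES=0.12,0.30,0.45,0.62,0.81,0.93] prob
VARIATE [VALUES=0,0,1,0,1,1] outcome
" Expected number of successes: the sum of the predicted probabilities "
" (3.23 for these values) "
CALCULATE expected = SUM(prob)
PRINT expected
" Restyle pen 2 (trace line) and pen 3 (expected-number symbol), "
" then ask the procedure to use them "
PEN 2; COLOUR='blue'; THICKNESS=2
PEN 3; COLOUR='black'; SIZE=1.5
DSEPARATIONPLOT [PLOT=traceline,expectednumber; USEPENS=yes] \
                PROBABILITIES=prob; GROUPS=outcome
" Suppress the key, trace line and expected-number symbol "
DSEPARATIONPLOT [PLOT=*] PROBABILITIES=prob; GROUPS=outcome
```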

The `TITLE` and `XTITLE` parameters can supply an overall title and an x-axis title for the separation plot, respectively. If no overall title is supplied, a suitable title is generated automatically. To omit the title, a blank string can be supplied, i.e. `TITLE=' '`. By default, the x-axis title is not displayed.

Options: `METHOD`, `PLOT`, `SUCCESSLEVEL`, `LINEORDER`, `NGROUPS`, `TIES`, `SEED`, `COLOURS`, `THICKNESS`, `BACKGROUND`, `BORDER`, `USEPENS`, `SAVE`.

Parameters: `PROBABILITIES`, `GROUPS`, `NSUCCESSES`, `NBINOMIAL`, `TITLE`, `XTITLE`.

### Method

`DSEPARATIONPLOT` uses the methods described by Greenhill *et al*. (2011).

### Action with `RESTRICT`

`DSEPARATIONPLOT` does not allow restrictions. A fault will result if any of `PROBABILITIES`, `GROUPS`, `NSUCCESSES` or `NBINOMIAL` is restricted.
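If only a subset of the units is of interest, one possible workaround is to copy that subset into new, unrestricted structures before plotting. The sketch below assumes the standard `SUBSET` procedure with its `CONDITION` option and `OLDVECTOR` and `NEWVECTOR` parameters. The data and the identifiers `prob`, `outcome`, `sex`, `prob1` and `outcome1` are hypothetical.

```genstat
" Illustrative data only "
VARIATE [VALUES=0.12,0.30,0.45,0.62,0.81,0.93] prob
VARIATE [VALUES=0,0,1,0,1,1] outcome
FACTOR [LEVELS=2; VALUES=1,2,1,2,1,2] sex
" Copy the units with sex equal to 1 into new structures, "
" rather than restricting prob and outcome "
SUBSET [CONDITION=sex.EQ.1] OLDVECTOR=prob,outcome; NEWVECTOR=prob1,outcome1
DSEPARATIONPLOT PROBABILITIES=prob1; GROUPS=outcome1
```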

### References

Greenhill, B., Ward, M.B. & Sacks, A. (2011). The separation plot: a visual method for evaluating the fit of binary models. *American Journal of Political Science* **55**, 990-1002.

### See also

Directive: `MODEL`

Commands for: Regression analysis.

### Example

    CAPTION 'DSEPARATION example'; STYLE=meta
    CAPTION 'Binary data','Goorin et al. (1987)',\
            !T('(The Guide to the Genstat Command Language, Part 2: Statistics',\
            'Example 3.5.2)'); \
            STYLE=major,plain,plain
    FACTOR  Li,Sex,Aop
    READ    Li,Sex,Aop,Free
    1 1 1 1  2 1 1 1  2 2 1 1  2 2 2 0
    1 1 1 1  2 1 1 1  2 2 1 0  2 2 2 0
    1 1 1 1  2 1 1 1  2 2 1 0  2 2 2 0
    1 1 2 1  2 1 2 1  2 2 1 0  2 2 2 0
    1 1 2 1  2 1 2 1  2 2 1 0  2 2 2 0
    1 2 1 1  2 1 2 1  2 2 2 1  2 2 2 0
    1 2 1 1  2 1 2 0  2 2 2 1  2 2 2 0
    1 2 1 1  2 1 2 0  2 2 2 1  2 2 2 0
    1 2 1 1  2 2 1 1  2 2 2 1  2 2 2 0
    1 2 2 1  2 2 1 1  2 2 2 1  2 2 2 0
    2 1 1 1  2 2 1 1  2 2 2 1
    2 1 1 1  2 2 1 1  2 2 2 0  :
    MODEL   [DISTRIBUTION=binomial; LINK=logit] Free; NBINOMIAL=1
    TERMS   [FACT=9] Sex+Aop+Li
    FIT     Sex+Aop+Li
    RKEEP   FITTEDVALUES=fittedFree
    DSEPARATIONPLOT [METHOD=rect] PROBABILITIES=fittedFree; GROUPS=Free
    DSEPARATIONPLOT [METHOD=lines] PROBABILITIES=fittedFree; GROUPS=Free
    DSEPARATIONPLOT [METHOD=lbands] PROBABILITIES=fittedFree; GROUPS=Free
    DSEPARATIONPLOT [METHOD=rbands] PROBABILITIES=fittedFree; GROUPS=Free

    CAPTION 'Binomial data','Finney (1971) analgesic drug data',\
            !T('(A Guide to Regression, Nonlinear and Generalized Linear Models in',\
            'Genstat, Section 3.4)'); \
            STYLE=major,plain,plain
    SPLOAD  [PRINT=*] '%gendir%/data/Drug.gsh'
    CALCULATE LogDose = LOG(Dose)
    MODEL   [DISTRIBUTION=binomial; LINK=probit; DISPERSION=1] R; NBINOMIAL=N
    TERMS   [FACT=9] LogDose*Drug
    FIT     [PRINT=*] LogDose*Drug
    RKEEP   FITTEDVALUES=fittedR
    CALCULATE estprob = fittedR/N
    DSEPARATIONPLOT [METHOD=rect] PROBABILITIES=estprob; NSUCCESSES=R; NBINOMIAL=N
    DSEPARATIONPLOT [METHOD=lines] PROBABILITIES=estprob; NSUCCESSES=R; NBINOMIAL=N
    DSEPARATIONPLOT [METHOD=lbands; THICKNESS=0.001] PROBABILITIES=estprob; \
                    NSUCCESSES=R; NBINOMIAL=N
    DSEPARATIONPLOT [METHOD=rbands] PROBABILITIES=estprob; NSUCCESSES=R; NBINOMIAL=N