Plots probability distributions, and estimates their parameters (D.B. Baird).

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | Controls whether to print estimated parameters of the distribution or test statistics (`parameters`, `tests`); default `para` |
| `DISTRIBUTION` = string token | Distribution for expected values against which to plot values (`normal`, `stdnormal`, `lognormal`, `exponential`, `gamma`, `weibull`, `beta`, `b2`, `pareto`, `chisquare`, `cauchy`, `logistic`, `ev1`, `ev2`, `ev3`, `gev`, `invnormal`, `t`, `f`, `uniform`, `stduniform`, `laplace`, `gpareto`, `ubetamix`, `ugammamix`, `loggamma`, `loglogistic`, `paralogistic`, `igamma`, `iweibull`, `burr`, `iburr`); default `norm` |
| `METHOD` = string token | Method used for the plot axes (`quantile`, `probability`, `stabilizedprobability`); default `quan` |
| `GRAPHICS` = string token | Type of graphics (`highresolution`, `lineprinter`); default `high` |
| `PLOT` = string tokens | Whether to plot differences from expectations or the 1-1 reference line (`differences`, `reference`); default `refe` |
| `CONSTANT` = string token | Whether to estimate the constant for the distribution (`estimate`, `omit`); default `omit` |
| `BANDS` = string token | What type of confidence bands to plot, if any (`simultaneous`, `pointwise`); default `simu` |
| `NSIMULATIONS` = scalar | Number of simulations for pointwise bands; default 100 |
| `ALPHA` = scalar | Acceptance limits for confidence bands; default 0.95 |
| `DF` = scalar | Number of degrees of freedom of chi-square or t distribution; default 1 |
| `DFNUMERATOR` = scalar | Numerator degrees of freedom of F distribution; default 1 |
| `DFDENOMINATOR` = scalar | Denominator degrees of freedom of F distribution; default 1 |
| `WINDOW` = scalar | Window to use for the plot; default 3 |
| `XMETHOD` = string token | Scaling of X / Expected Plot axes (`quantile`, `probability`, `stabilizedprobability`); if unset, takes the same setting as `METHOD` |
| `QMETHOD` = string token | Whether to standardize plotted score in expected quantiles (`standardized`, `unstandardized`); default `stan` |
| `TMETHOD` = string tokens | Specifies the method used to perform the goodness-of-fit tests (`likelihoodratio`, `traditional`); default `like` |
| `NTIMES` = scalar | Number of Monte-Carlo simulations to perform for likelihood-ratio tests; default 999 |
| `SEED` = scalar | Seed for random number generation for the likelihood-ratio tests; default 0 continues an existing sequence or, if none, selects a seed automatically |

### Parameters

| Parameter | Description |
|---|---|
| `DATA` = variates | Values to plot |
| `TITLE` = text | Title for the graph; default `*` generates an appropriate title automatically |
| `ESTIMATES` = variates | Saves the estimated parameters for the distribution |
| `SE` = variates | Saves standard errors for the estimated parameters |
| `LOWERTRUNCATION` = scalars | Lower truncation points for Loss distributions |
| `UPPERTRUNCATION` = scalars | Upper truncation points for Loss distributions |
| `DEVIANCE` = scalars | Saves the deviance for the fitted distribution |
| `PROBABILITIES` = variates | Saves the probabilities from the goodness-of-fit tests |

### Description

To assess how well empirical data approximate a particular theoretical distribution, `DPROBABILITY` plots the sorted values (order statistics, *X*_{i}) against the expected values of the order statistics *E*_{i} from the given distribution. However, the parameters of the distribution are usually not known, and these have to be estimated first to obtain the expected values.

If the distribution has a cumulative distribution function F(*x*), and the inverse of this function is G(*x*) (i.e. G(F(*x*)) = *x*), then the expected values of the order statistics are approximately G((*i*-0.5)/*n*), where *i* = 1…*n*, and *n* is the number of values in the sample. A plot of *X*_{i} versus *E*_{i} is known as a Quantile-Quantile (or Q-Q) plot. The data can also be plotted on the probability scale by plotting the cumulative probabilities of the data under the assumed distribution against their expected probabilities, i.e. F(*X*_{i}) versus (*i*-0.5)/*n*. This is known as a Probability-Probability (or P-P) plot.
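The Q-Q and P-P coordinates described above can be sketched numerically. The following is only an illustration in Python, not Genstat code and not the procedure's implementation: it assumes a normal distribution, uses simple moment estimates in place of the procedure's parameter estimation, and the function name `qq_pp_points` is hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_pp_points(data):
    """Return (Q-Q points, P-P points) for a normal distribution fitted to data."""
    n = len(data)
    x = sorted(data)                          # order statistics X_i
    # Assumption: moment estimates stand in for the procedure's estimates.
    fitted = NormalDist(mean(x), stdev(x))
    probs = [(i - 0.5) / n for i in range(1, n + 1)]   # expected probabilities
    expected = [fitted.inv_cdf(p) for p in probs]      # E_i = G((i-0.5)/n)
    observed_p = [fitted.cdf(v) for v in x]            # F(X_i)
    qq = list(zip(expected, x))            # Q-Q plot: E_i on x-axis, X_i on y-axis
    pp = list(zip(probs, observed_p))      # P-P plot: (i-0.5)/n versus F(X_i)
    return qq, pp

sample = [2.1, 0.4, 1.7, 3.3, 0.9, 1.2, 2.8, 1.5]
qq, pp = qq_pp_points(sample)
for (e, xi), (p, fp) in zip(qq, pp):
    print(f"E={e:6.3f}  X={xi:4.1f}   p={p:6.4f}  F(X)={fp:6.4f}")
```

If the sample really comes from the assumed distribution, both sets of points should lie close to the 1-1 line.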

A third type of plot, the stabilized probability (SP) plot, was introduced by Michael (1983). It rescales the probabilities using the transformation

*sp* = (2/π) × `ARCSIN`(`SQRT`(*p*))

so that the variance of the plotted points is approximately equal over the range of probability values. In the SP plot the scaled values *sp* are plotted rather than the unscaled *p* values. The `METHOD` option selects which scale is used in the graph (`quantile`, `probability` or `stabilizedprobability` for the Q-Q, P-P or SP plots respectively).
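As a small numerical illustration of the transformation above (written in Python rather than Genstat, with the hypothetical function name `stabilize`), the expected probabilities (*i*-0.5)/*n* can be mapped to the SP scale like this:

```python
import math

def stabilize(p):
    """Stabilized probability: sp = (2/pi) * arcsin(sqrt(p)), for 0 <= p <= 1."""
    return (2.0 / math.pi) * math.asin(math.sqrt(p))

n = 10
expected_p = [(i - 0.5) / n for i in range(1, n + 1)]   # P-P scale
expected_sp = [stabilize(p) for p in expected_p]        # SP scale

for p, sp in zip(expected_p, expected_sp):
    print(f"p={p:4.2f}  sp={sp:6.4f}")
```

The transformation maps the interval from 0 to 1 onto itself and stretches the regions near 0 and 1, where untransformed probabilities vary least.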

By default the x-value used in plotting Q, P or SP is the corresponding expected value of these statistics. Alternative x-values can be used by setting the `XMETHOD` option to `quantile`, `probability`, or `stabilizedprobability`. So, for example, a Q-P plot can be obtained with the option settings `METHOD=quantile` and `XMETHOD=probability`, or a P-Q plot with the settings `METHOD=probability` and `XMETHOD=quantile`.

The `QMETHOD` option allows the scaling of the expected quantiles plotted on the x-axis to be set. By default quantiles are standardized to have a mean of zero and variance of one (as in a normal score plot) but, if `QMETHOD=unstandardized`, the quantiles are scaled to the same mean and variance as the data.

The `DATA` parameter specifies the data values, in a variate. The `TITLE` parameter can specify a title for the graph. The `ESTIMATES` parameter can be used to save the values estimated for the parameters of the distribution, and the `SE` parameter can save their standard errors.

The distribution for the expected values against which to plot the data is specified by the `DISTRIBUTION` option. Some distributions (Log-Normal, Gamma, Weibull and Pareto) can have an extra parameter (*a*) estimated, so that *X* - *a* follows the specified distribution. Setting option `CONSTANT=estimate` estimates a value for *a*. Some of the distributions (Chi Square, T and F) cannot have their parameters estimated by the usual `DISTRIBUTION` directive, so the procedure provides three options (`DF`, `DFNUMERATOR`, `DFDENOMINATOR`) for specifying the parameters of these distributions. However, if you set one of these to a missing value (for example `DF=*`), the degrees of freedom are estimated along with the other parameters of the distribution.

Some distributions (`normal`, `loggamma`, `loglogistic`, `paralogistic`, `igamma`, `iweibull`, `burr`, `iburr`) can be estimated and plotted in a truncated form. The values in the distribution less than `LOWERTRUNCATION` and greater than `UPPERTRUNCATION` are removed (if either of these is set), and the distribution between these limits is rescaled to have an area of one. If only `LOWERTRUNCATION` is set, the distribution is left-truncated, and it is right-truncated if only `UPPERTRUNCATION` is set.
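The rescaling of a truncated distribution can be illustrated with a short Python sketch (not Genstat code, and not the procedure's implementation). It assumes a normal base distribution with known parameters; the helper names `truncated_cdf` and `truncated_quantile` are hypothetical. The truncated cumulative distribution function is (F(*x*) - F(*lower*)) / (F(*upper*) - F(*lower*)), and its inverse gives the expected order statistics.

```python
from statistics import NormalDist

def _limits(dist, lower, upper):
    """Cumulative probabilities at the truncation points (0 and 1 if unset)."""
    lo = 0.0 if lower is None else dist.cdf(lower)
    hi = 1.0 if upper is None else dist.cdf(upper)
    return lo, hi

def truncated_cdf(dist, x, lower=None, upper=None):
    """CDF of dist restricted to [lower, upper] and rescaled to total area one."""
    lo, hi = _limits(dist, lower, upper)
    if lower is not None and x <= lower:
        return 0.0
    if upper is not None and x >= upper:
        return 1.0
    return (dist.cdf(x) - lo) / (hi - lo)

def truncated_quantile(dist, p, lower=None, upper=None):
    """Inverse of truncated_cdf for 0 < p < 1."""
    lo, hi = _limits(dist, lower, upper)
    return dist.inv_cdf(lo + p * (hi - lo))

base = NormalDist(0.0, 1.0)
n = 5
# Expected order statistics of a sample of size 5 truncated to [-1, 2].
expected = [truncated_quantile(base, (i - 0.5) / n, lower=-1.0, upper=2.0)
            for i in range(1, n + 1)]
print([round(e, 3) for e in expected])
```

Leaving `lower` or `upper` unset in this sketch corresponds to right-truncation or left-truncation only.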

The `BANDS` option allows two forms of confidence bands to be displayed in the graph. `BANDS=pointwise` simulates `NSIMULATIONS` samples of the same size as the data from the theoretical distribution and, at each order statistic, plots the range of values that contains the proportion `ALPHA` of the simulated values. Thus a sample drawn from the assumed distribution has a probability of approximately `ALPHA` of lying within the limits at each point. However, the overall probability that a sample will lie completely within the confidence bands is less than `ALPHA`. The setting `BANDS=simultaneous` uses a statistic given by Michael (1983) for which the overall probability of the plotted data lying completely within the confidence bands is approximately the specified value of `ALPHA`, under the null hypothesis that the data are an independent and identically distributed (iid) random sample from the specified distribution. This form of confidence limits has the advantages that it is much faster to calculate, and that the probability of the data points falling outside the limits is approximately constant over the range of the data.
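The idea behind the pointwise bands can be sketched in Python (this is an illustration, not Genstat code). The sketch assumes a normal distribution with known parameters and uses a crude empirical quantile of the simulated order statistics; the function `pointwise_bands` and the exact quantile rule are assumptions and may differ from what the procedure does.

```python
import random
from statistics import NormalDist

def pointwise_bands(dist, n, nsimulations=100, alpha=0.95, seed=1):
    """Simulate sorted samples of size n and return (lower, upper) limits
    for each order statistic, each covering about alpha of the simulations."""
    rng = random.Random(seed)
    # Each row is one simulated sample, sorted into order statistics.
    sims = [sorted(rng.gauss(dist.mean, dist.stdev) for _ in range(n))
            for _ in range(nsimulations)]
    tail = (1.0 - alpha) / 2.0
    lo_idx = int(tail * (nsimulations - 1))   # crude empirical quantile index
    hi_idx = nsimulations - 1 - lo_idx
    lower, upper = [], []
    for i in range(n):
        column = sorted(row[i] for row in sims)   # i-th order statistic over sims
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper

lower, upper = pointwise_bands(NormalDist(1.0, 1.5), n=30,
                               nsimulations=400, alpha=0.99, seed=287987)
print(f"limits for the smallest value: {lower[0]:.3f} to {upper[0]:.3f}")
```

Because each limit is computed separately for each order statistic, the chance that all 30 points fall inside their limits at once is lower than `alpha`, which is the distinction the text draws between pointwise and simultaneous bands.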

When plotting the data against the expected values, setting option `PLOT=reference` adds the 1-1 line to the graph, so that departures from this can be more easily observed. The other `PLOT` setting, `differences`, plots the difference between the data and the expected values, so that departures can be observed more easily in a horizontal direction rather than on a 45 degree slant. Setting option `GRAPHICS=lineprinter` produces a character-based graph in the output window rather than in the high-resolution graphics window as usual. The `WINDOW` option can be used to specify which graphics window to use for a high-resolution graph.

The `PRINT` option controls the output that is printed. The `parameters` setting prints the fitted parameters of the specified distribution, and some sample statistics of the observed data. The `tests` setting provides output from three empirical distribution tests, namely the Anderson-Darling, Cramer-von Mises and Watson statistics. The method used to perform these tests is specified by the `TMETHOD` option, with settings `likelihoodratio` for the Zhang (2002) likelihood-ratio based method, and `traditional` for the traditional approach. The default is to use the likelihood-ratio based tests, which are generally more powerful. Monte-Carlo simulations are used to calculate the empirical probability values of the test statistics under the likelihood-ratio based method. The `NTIMES` option defines how many Monte-Carlo simulations are used; default 999. The `SEED` option specifies the seed for the random-number generator used during the Monte-Carlo simulations. The default of zero continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, the seed is initialized automatically. The test probabilities can be saved, in a variate, by the `PROBABILITIES` parameter.
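The way a Monte-Carlo probability value is obtained can be illustrated with a Python sketch (not Genstat code). Several things here are assumptions made purely for illustration: the traditional Anderson-Darling statistic is used as a stand-in for Zhang's likelihood-ratio statistics, the distribution is normal with parameters re-estimated for every sample, the probability is taken as (1 + number of simulated statistics at least as large as the observed one) / (`ntimes` + 1), and the function names are hypothetical. The procedure itself delegates the tests to `EDFTEST`, whose details may differ.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def anderson_darling(sample):
    """Traditional Anderson-Darling statistic for a normal fit to the sample."""
    x = sorted(sample)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))     # parameters estimated from sample
    u = [fitted.cdf(v) for v in x]
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n

def monte_carlo_pvalue(sample, ntimes=999, seed=1):
    """Empirical probability of a statistic at least as large as the observed one."""
    rng = random.Random(seed)
    n = len(sample)
    observed = anderson_darling(sample)
    exceed = 0
    for _ in range(ntimes):
        # The statistic is location-scale invariant once the parameters are
        # re-estimated, so simulating from the standard normal is sufficient.
        simulated = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if anderson_darling(simulated) >= observed:
            exceed += 1
    return (exceed + 1) / (ntimes + 1)

skewed = [math.exp(k / 3) for k in range(20)]   # strongly right-skewed data
p_value = monte_carlo_pvalue(skewed, ntimes=999, seed=287987)
print(f"Monte-Carlo probability: {p_value:.3f}")
```

With 999 simulations the smallest probability this sketch can report is 1/1000, so the number of simulations sets the resolution of the result.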

The distributions fitted in this procedure are described further in the books by Hogg & Klugman (1984) and Johnson, Kotz & Balakrishnan (1994, 1995).

Options: `PRINT`, `DISTRIBUTION`, `METHOD`, `GRAPHICS`, `PLOT`, `CONSTANT`, `BANDS`, `NSIMULATIONS`, `ALPHA`, `DF`, `DFNUMERATOR`, `DFDENOMINATOR`, `WINDOW`, `XMETHOD`, `QMETHOD`, `TMETHOD`, `NTIMES`, `SEED`.

Parameters: `DATA`, `TITLE`, `ESTIMATES`, `SE`, `LOWERTRUNCATION`, `UPPERTRUNCATION`, `DEVIANCE`, `PROBABILITIES`.

### Method

The parameters for the distribution are estimated using the `DISTRIBUTION` or `FITNONLINEAR` directives. The cumulative distribution probability values of the observed and expected values are calculated with the `CL` series of functions. The goodness-of-fit tests are performed by the `EDFTEST` procedure.

### Action with `RESTRICT`

If the `DATA` variate is restricted, the plots and tests will be calculated using only the units included by the restriction.

### References

Hogg, R. V. & Klugman, S. A. (1984). *Loss Distributions*. John Wiley & Sons, New York.

Johnson, N. L., Kotz, S. & Balakrishnan, N. (1994). *Continuous Univariate Distributions, Volume 1, 2nd edition*. John Wiley & Sons, New York.

Johnson, N. L., Kotz, S. & Balakrishnan, N. (1995). *Continuous Univariate Distributions, Volume 2, 2nd edition*. John Wiley & Sons, New York.

Michael, J. R. (1983). The stabilized probability plot. *Biometrika*, 70, 11-17.

Zhang, J. (2002). Powerful goodness-of-fit tests based on the likelihood ratio. *Journal of the Royal Statistical Society, Series B*, 64, 281-294.

### See also

Directive: `DISTRIBUTION`.

Procedures: `BBINOMIAL`, `EDFTEST`, `MAVOLCANO`.

Commands for: Graphics, Basic and nonparametric statistics.

### Example

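    CAPTION 'DPROBABILITY example'; STYLE=major
    CALCULATE [SEED=287987] N = GRNORMAL(100;1;2)
    "Q-Q plot (default METHOD) with parameter estimates and goodness-of-fit tests"
    DPROBABILITY [PRINT=parameters,tests; DISTRIBUTION=normal] N
    "P-P plot with 99% pointwise bands from 400 simulations"
    DPROBABILITY [PRINT=*; DISTRIBUTION=Normal; METHOD=probability;\
                 BANDS=pointwise; ALPHA=0.99; NSIMULATIONS=400] N
    "SP plot of differences from expectation with simultaneous bands"
    DPROBABILITY [PRINT=*; DISTRIBUTION=normal; METHOD=stabilized;\
                 BANDS=simultaneous; PLOT=difference] N
    CALCULATE C = GRCHISQUARE(1000;3)
    "P-P plot against chi-square with 3 df, tests printed, no bands"
    DPROBABILITY [PRINT=tests; DISTRIBUTION=chiSquare; DF=3; METHOD=probability;\
                 BANDS=*] C; TITLE='Chi Square 3 df P-P plot'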