Performs Bayesian computing using the Differential Evolution Markov Chain algorithm (W. van den Berg & R.W. Payne).

### Options

`PRINT` = string token |
What to print (`results` , `monitoring` , `scatterplot` , `histogram` ); default `resu` , `moni` , `scat` , `hist` |
---|---|

`CALCULATION` = expression |
Calculation(s) of logposterior, involving explanatory or pointer variate; if unset, this is calculated by the procedure specified by the `PROCEDURE` option |

`LOGPOSTERIOR` = scalar |
Identifier of scalar holding log-posterior within `CALCULATION` (must be set if `CALCULATION` is set) |

`MULTIPLE` = scalar |
Number of populations is number of parameters times `MULTIPLE` ; default 3 |

`UNIFORMLIMIT` = scalar |
Uniform random numbers are drawn from (`-UNIFORMLIMIT` , `UNIFORMLIMIT` ) and added to candidate parameter sets; default 0.00001 |

`DATA` = identifiers |
Data structures used in `CALCULATION` or by `PROCEDURE` |

`NGENERATIONS` = scalar |
Maximum number of iterations; default 1000 |

`STEP1` = scalar or variate |
Generations for which gamma is set to 1; default 0 |

`FRACTIONBURNIN` = scalar |
Fraction of iterations used for burn-in; default 0.5 |

`GRVARIANCE` = scalar or variate |
Variance to generate populations from initial values of the parameters; default 0.1 |

`PERCENTAGES` = variate |
Percentages for which quantiles has to be calculated; default !(2.5, 25, 50, 75, 97.5) |

`PROCEDURE` = identifier |
Identifier of procedure to calculate `LOGPOSTERIOR` if `CALCULATION` is unset; default `_DEMCLOGPOSTERIOR` |

`SEED` = scalar |
Seed for the random numbers; default 0 |

`NWINDOWS` = scalar |
Number of histograms and scatterplots per screen when plotting estimates and logposterior from all iterations |

`SDLOGPOSTERIOR` = scalar |
Saves the s.d. for `LOGPOSTERIOR` |

`QUANTILESLOGPOSTERIOR` = variate |
Saves quantiles for `LOGPOSTERIOR` |

`RHATLOGPOSTERIOR` = scalar |
Saves the convergence criterion for `LOGPOSTERIOR` |

`ALLLOGPOSTERIOR` = variate |
Saves the parameter estimates for `LOGPOSTERIOR` from all the iterations |

`IPOPULATIONS` = pointers |
Pointer to supply initial populations of the parameters and the corresponding log-posteriors |

`FPOPULATIONS` = pointers |
Pointer to save final populations of the parameters and the corresponding log-posteriors |

### Parameters

`PARAMETER` = scalars |
Parameters to estimate |
---|---|

`INITIAL` = scalars |
Initial values of the parameters; must be set unless `IPOPULATIONS` is set |

`SD` = scalars |
Standard errors of the estimates |

`QUANTILES` = variates |
Saves the quantiles for each parameter |

`RHAT` = scalars |
Convergence criteria |

`ALLESTIMATES` = variates |
Saves the parameter estimates from all the iterations |

### Description

`DEMC`

uses the *Differential Evolution Markov Chain* algorithm of Ter Braak (2006) to do Bayesian computations by Markov chain Monte Carlo. The logarithm of the posterior density for each set of parameters can be calculated either by a list of expressions supplied by the `CALCULATION`

option, or by a (user-defined) procedure whose name is specified by the `PROCEDURE`

option (with default name `_DEMCLOGPOSTERIOR`

). The names of the parameters and their initial values are specified by the `PARAMETER`

and `INITIAL`

parameters, respectively. Data structures containing information that is needed to calculate the log-posterior are supplied by the `DATA`

option. Also, if you are using the `CALCULATION`

option, you must define the identifier of the log-posterior (as used to store the results of the calculations) using the `LOGPOSTERIOR`

option.

The number of populations of parameters to be generated is defined as the number of parameters multiplied by the value supplied by the `MULTIPLE`

option (default 3). The Normal variance used to generate the initial population from the initial values is specified by the `GRVARIANCE`

option. You can set this to a scalar to use the same variance for each parameter, or to a variate to define different variances for the parameters; by default `GRVARIANCE=0.1`

. The fraction of the data used for burn-in is specified by the `FRACTIONBURNIN`

option (default 0.5).

The `NGENERATIONS`

option defines the number of generations to form from the populations, and the `FRACTIONBURNIN`

option defines the proportion of these that are for burn-in. (The distributions of the parameters are determined only from the generations that are produced after burn-in is complete.) The `SEED`

option defines a seed for the random numbers that are used within `DEMC`

. The default value 0 continues from the previous random-number generation or (if none) initializes the seed automatically. Options `UNIFORMLIMIT`

and `STEP1`

, which control how the new populations are formed, are explained in the Method section.

Once the generations are complete, the identifiers defined by `PARAMETER`

are defined as scalars containing the means of the parameters over the populations generated after burn-in. Standard deviations and convergence criteria for the parameters can be saved, in scalars, using the `SD`

and `RHAT`

parameters. If `RHAT`

is greater than 1.1, say, for any parameter, the number of generations should be increased. The `QUANTILES`

parameter allows to save a variate for each `PARAMETER`

, containing quantiles at percentages specified by the `PERCENTAGES`

option (by default 2.5, 25, 50, 75, 97.5). To study the parameter distributions in more detail, you can also use the `ALLESTIMATES`

parameter to save variates containing all the values generated after burn-in for each `PARAMETER`

. The `LOGPOSTERIOR`

, `SDLOGPOSTERIOR`

, `RHATLOGPOSTERIOR`

, `QUANTILESLOGPOSTERIOR`

and `ALLLOGPOSTERIOR`

allow the equivalent information to be saved for the log-posterior.

The final populations and corresponding log-posteriors can be saved, in a pointer, by the `FPOPULATIONS`

option. You can then restart `DEMC`

from the current position, and run some more generations, by using this pointer as the setting of the `IPOPULATIONS`

option. `FPOPULATIONS[1...N]`

have number of units equal to the number of parameters *d*, while `FPOPULATIONS[N1]`

has number of units equal to `N`

, where `N`

= `MULTIPLE`

× *d*. This can cause problems if you try to save `FPOPULATIONS[]`

using procedure `EXPORT`

.

Options: `PRINT`

, `CALCULATION`

, `LOGPOSTERIOR`

, `MULTIPLE`

, `UNIFORMLIMIT`

, `DATA`

, `NGENERATIONS`

, `STEP1`

, `FRACTIONBURNIN`

, `GRVARIANCE`

, `PERCENTAGES`

, `PROCEDURE`

, `SEED`

, `NWINDOWS`

, `SDLOGPOSTERIOR`

, `QUANTILESLOGPOSTERIOR`

, `RHATLOGPOSTERIOR`

, `ALLLOGPOSTERIOR`

, `IPOPULATIONS`

, `FPOPULATIONS`

.

Parameters: `PARAMETER`

, `INITIAL`

, `SD`

, `QUANTILES`

, `RHAT`

, `ALLESTIMATES`

.

### Method

DEMC uses the DE-MC algorithm of Ter Braak (2006) to perform Markov chain Monte Carlo (MCMC); see Congdon (2001, 2003), Gelman et al. (2004) or Lee (2003). The DE-MC algorithm combines the genetic algorithm called Differential Evolution (DE) with MCMC. The values of the `INITIAL`

parameter are used to generate *n* parameter sets, by generating *d* independent Normal deviates with means `INITIAL`

and variance `GRVARIANCE`

. Here, *d* is the number of parameters, and *n* is *d* multiplied by the value of the `MULTIPLE`

option.

For each parameter set *i* (*i*=1…*n*), the algorithm selects two other parameter sets at random, and calculates the differences between their parameter values, multiplied by a parameter γ and a random number taken from the uniform distribution on (`-UNIFORMLIMIT`

, `UNIFORMLIMIT`

); γ generally takes the value 2.38/√(2×*d*), but the `STEP1`

option allows you to define generations in which γ takes the value 1 (by default there are none). These differences are then added to the parameter values in set *i* to form a new candidate set of values. The candidate set replaces set *i* if its log-posterior likelihood is greater than the log-posterior likelihood of set *i* + the logarithm of a random number from the uniform distribution on (0,1); see Ter Braak 2006).

### References

Congdon, P. (2001). *Bayesian Statistical Modelling*. Wiley, Chichester, England

Congdon, P. (2003). *Applied Bayesian Modelling*. Wiley, Chichester, England.

Gelman, A., Carlin, J.B., Stern, H.S. & D.B. Rubin (2004). *Bayesian Data Analysis, 2nd Edition*. Chapman & Hall, London.

Lee, P.M. (2003). *Bayesian Statistics an Introduction, 3rd Edition*. Arnold, London.

Ter Braak, C.J.F. (2006) A Markov chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces. *Statistics & Computing*, 16, in press.

### See also

Procedure: `BGXGENSTAT`

.

Commands for: Bayesian methods.

### Example

CAPTION 'DEMC example',!t(\ 'Coagulation time data from Table 11.2 of Gelman, Carlin, Stern & Rubin',\ '(2004). Bayesian Data Analysis, 2nd Edition, p. 299.'); STYLE=meta,plain VARIATE [VALUES=62,60,63,59,63,67,71,64,65,66,68,66,\ 71,67,68,68,56,62,60,61,63,64,63,59] Coagulation_time FACTOR [LABELS=!t(A,B,C,D); VALUES=4(1),6(2,3),8(4)] Diet VARIATE [VALUES=4(0)] muvar VCOMPONENTS Diet REML [PRINT=model,components] Coagulation_time VKEEP [SIGMA2=sigma2reml] Diet; COMPONENT=compreml; MEAN=mean CALCULATE muin=MEAN(mean) EXPRESSION p[1...5]; VALUE=\ !E(muvar$[1...4] = mu1, mu2, mu3, mu4 ),\ !E(fit = NEWL(Diet; muvar)),\ !E(l1 = -12 * logs2 - 0.5 * SUM((Coagulation_time - fit)**2) / EXP(logs2)),\ !E(l2 = -2 * logtau2 - 0.5 * SUM((muvar - mu)**2) / EXP(logtau2)),\ !E(lposterior = l1 + l2 + logtau2 / 2 - 14 * LOG(2 * C('pi'))) DEMC [PRINT=results,monitoring,histogram;\ CALCULATION=p[]; LOGPOSTERIOR=lposterior;\ DATA=Coagulation_time,Diet; PERCENTAGES=!(25,50,75);\ NGENERATIONS=1000; SEED=349472; SDLOGPOSTERIOR=sdlposterior;\ RHATLOGPOSTERIOR=rhlposterior; QUANTILESLOGPOSTERIOR=qu[8]]\ mu1,mu2,mu3,mu4,mu,logs2,logtau2; INITIAL=#mean,muin,1,1