Fits frequency distributions to accumulated counts (R.C. Butler, M.E. O’Neill, P. Brain & H. Turner).
Options
PRINT = string tokens
Controls printed output (model, summary, estimates, correlations, fittedvalues, monitoring); default mode, summ, esti
DISTRIBUTION = string token
Which distribution to use (normal, logistic, complementaryloglog, acomplementaryloglog, inversenormal, weibull, exponential); default norm
TRANSFORMATION = string token
Whether to use log(TIME) if DISTRIBUTION = normal, logistic, complementaryloglog or acomplementaryloglog (log, none); default * uses log for those four distributions, and none when DISTRIBUTION = inversenormal, weibull or exponential
LAG = string token
Type of lag to add to TIME (none, positive, unconstrained); default none
ALLRESPOND = string token
If TOTUNITS is set, whether all units are constrained to respond (yes, no); default no
FORM = string token
Whether DATA are cumulated or differences (cumulated, differences); default cumu
LOSTUNITS = string token
Whether data are left-censored (yes, no); default no
SEPARATE = string token
Which parameters to estimate separately for each group (lag, b, m, propn, gamma); default *
POPSEPARATE = string token
Which parameters to estimate separately for populations in each group (b, m, lag); default *
PLOT = string token
Which graphs to draw (cumulative, density, trcumulative, trdensity); default cumu
MAXCYCLE = scalar
Maximum number of iterations for fitting, as in RCYCLE; default 30
Parameters
DATA = variates or pointers 
Specifies the counts, either accumulated over time or as differences (see option FORM)
TIME = variates or pointers 
Defines the time at which each count was recorded 
GROUPS = factors 
Factor indicating groups 
INITIAL = variates 
Initial values for all parameters 
IB = scalars or variates 
Initial values for b 
IM = scalars or variates 
Initial values for m 
ILAG = scalars or variates 
Initial values for lag 
IGAMMA = scalars or variates 
Initial values for gamma 
IPROPN = scalars or variates 
Initial values for proportions 
STEPLENGTHS = variates 
Steplengths for all parameters 
SB = scalars or variates 
Steplengths for b 
SM = scalars or variates 
Steplengths for m 
SLAG = scalars or variates 
Steplengths for lag 
SGAMMA = scalars or variates 
Steplengths for gamma 
SPROPN = scalars or variates 
Steplengths for proportions 
TOTUNITS = scalars or variates
Total number of units
NPOPULATION = scalars 
Number of populations (1, 2 or 3); default 1 
SAVE = pointers 
Saves the results 
Description
CUMDISTRIBUTION fits frequency distributions to a variate of counts recorded over time. The counts are specified by the DATA parameter, and the time (t) at which each count was recorded is supplied, in a variate, by the TIME parameter. Counts may be accumulated over time (option FORM=cumulated), or be the change in count from the previous time (FORM=differences). Neither the DATA nor the TIME variate may be restricted, and neither may contain missing values. The DATA values must all be non-negative integers.
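The relationship between the two settings of FORM can be illustrated outside Genstat. The following Python sketch is not part of the procedure; the function names are hypothetical, and it assumes that the first difference is simply the first cumulated count. It uses the Cum[1] values from the Example section.

```python
def to_differences(cumulated):
    """Convert cumulated counts to the change since the previous time.

    Assumption: the first difference is the first cumulated count itself.
    """
    if any(c < 0 or c != int(c) for c in cumulated):
        raise ValueError("counts must be non-negative integers")
    diffs = [cumulated[0]]
    for previous, current in zip(cumulated, cumulated[1:]):
        if current < previous:
            raise ValueError("cumulated counts must not decrease")
        diffs.append(current - previous)
    return diffs


def to_cumulated(differences):
    """Convert per-interval counts back to running totals."""
    total, out = 0, []
    for d in differences:
        total += d
        out.append(total)
    return out


cum = [0, 3, 17, 36, 57, 79, 85, 89]  # Cum[1] from the Example section
diffs = to_differences(cum)
```

Applying to_cumulated to diffs recovers the original cumulated counts, so either form carries the same information.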
The form of the cumulative distribution function is indicated by the DISTRIBUTION option, which has the following settings (z is a function of TIME as defined below).
DISTRIBUTION          Cumulative distribution function
normal                NORMAL(b × (z – m))
complementaryloglog   EXP(–EXP(–b × (z – m)))
acomplementaryloglog  1 – EXP(–EXP(b × (z – m)))
logistic              1 / (1 + EXP(–b × (z – m)))
inversenormal         NORMAL(SQRT(b/z) × (z/m – 1)) + EXP(2b/m) × (1 – NORMAL(SQRT(b/z) × (1 + z/m)))
weibull               1 – EXP(–(m × z)**b)
exponential           1 – EXP(–m × z)
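As a cross-check on the functions listed above, here is a minimal Python sketch that evaluates each one. It is an illustration, not the procedure's own code. It assumes that NORMAL denotes the standard normal cumulative probability, and it writes each function in its standard increasing form (rising from 0 to 1 as z increases when b and m are positive). The names normal_cdf and cdf are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative probability (assumed meaning of NORMAL)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cdf(distribution, z, b, m):
    """Cumulative proportion responding by transformed time z."""
    if distribution == "normal":
        return normal_cdf(b * (z - m))
    if distribution == "logistic":
        return 1.0 / (1.0 + math.exp(-b * (z - m)))
    if distribution == "complementaryloglog":
        return math.exp(-math.exp(-b * (z - m)))
    if distribution == "acomplementaryloglog":
        return 1.0 - math.exp(-math.exp(b * (z - m)))
    if distribution == "inversenormal":
        # Standard inverse Gaussian form; requires z > 0.
        root = math.sqrt(b / z)
        upper_tail = 1.0 - normal_cdf(root * (1.0 + z / m))
        return normal_cdf(root * (z / m - 1.0)) + math.exp(2.0 * b / m) * upper_tail
    if distribution == "weibull":
        return 1.0 - math.exp(-((m * z) ** b))
    if distribution == "exponential":
        return 1.0 - math.exp(-m * z)  # b is not used
    raise ValueError("unknown distribution: " + distribution)


# At z = m the normal and logistic curves reach one half.
half_normal = cdf("normal", 2.0, 1.5, 2.0)
half_logistic = cdf("logistic", 2.0, 1.5, 2.0)
```

Evaluating cdf over an increasing grid of z is a quick way to confirm that a chosen pair of b and m gives a sensible curve before supplying them as initial values.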
The parameters b and m are estimated, and relate to the distribution of transformed time z as follows.
DISTRIBUTION          Parameter b                                Parameter m
normal                1 / sd                                     mean, t50
logistic              2 × relative response rate at z=m          mean, t50
complementaryloglog   relative response rate at z=m              mode
acomplementaryloglog  (e – 1) × relative response rate at z=m    mode
inversenormal         (mean**3) / (sd**2)                        mean
weibull               shape                                      scale
exponential                                                      1/mean
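The multipliers 2, 1 and (e – 1) in the table can be checked numerically. The sketch below is an illustration only. It assumes that "relative response rate" means the density divided by the cumulative proportion, which is an interpretation and is not stated in this documentation, and it uses the increasing forms of the three distribution functions. All function names are hypothetical.

```python
import math


def logistic(z, b, m):
    return 1.0 / (1.0 + math.exp(-b * (z - m)))


def cloglog(z, b, m):
    return math.exp(-math.exp(-b * (z - m)))


def acloglog(z, b, m):
    return 1.0 - math.exp(-math.exp(b * (z - m)))


def relative_rate_at_m(cdf, b, m, h=1e-6):
    """Density divided by cumulative proportion at z = m.

    The density is approximated by a central difference of the CDF.
    """
    density = (cdf(m + h, b, m) - cdf(m - h, b, m)) / (2.0 * h)
    return density / cdf(m, b, m)


b, m = 0.8, 3.0
recovered_b = {
    "logistic": 2.0 * relative_rate_at_m(logistic, b, m),
    "complementaryloglog": relative_rate_at_m(cloglog, b, m),
    "acomplementaryloglog": (math.e - 1.0) * relative_rate_at_m(acloglog, b, m),
}
```

Under that interpretation, each entry of recovered_b agrees with b to within the accuracy of the numerical derivative.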
For some of the distributions, TIME may be logged by setting option TRANSFORMATION=log. A lag time before any units respond may be estimated by setting the option LAG=positive. You can set LAG=unconstrained to allow a negative lag to be estimated, which assumes that some units responded before TIME=0. These options give z using the following functions of TIME.
                               TRANSFORMATION=none   TRANSFORMATION=log
LAG=none                       z = TIME              z = LOG(TIME)
LAG=positive or unconstrained  z = TIME – LAG        z = LOG(TIME – LAG)
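The four cases reduce to one rule: subtract the lag (zero when LAG=none), then take logs if requested. A minimal Python sketch of that rule follows; the function name and argument names are hypothetical and it is not the procedure's own code.

```python
import math


def transformed_time(time, lag=0.0, transformation="none"):
    """Return z for one TIME value.

    lag=0.0 corresponds to LAG=none; a negative lag corresponds to
    what LAG=unconstrained may estimate.
    """
    shifted = time - lag
    if transformation == "none":
        return shifted
    if transformation == "log":
        if shifted <= 0:
            raise ValueError("TIME - LAG must be positive to take logs")
        return math.log(shifted)
    raise ValueError("transformation must be 'none' or 'log'")


z_plain = transformed_time(10.0)
z_lagged_log = transformed_time(math.e + 2.0, lag=2.0, transformation="log")
```

The check on shifted makes explicit that, with the log transformation, no response time can be at or before the lag.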
The available combinations of LAG and TRANSFORMATION for the various distributions are shown below.
DISTRIBUTION          TRANSFORMATION  Equivalent distribution       Possible settings for LAG
normal                none                                          none
                      log             lognormal                     none, positive, unconstrained
logistic              none                                          none
                      log             log-logistic                  none, positive, unconstrained
complementaryloglog   none            Gumbel, extreme value type 1  none
                      log             extreme value type 2          none, positive, unconstrained
acomplementaryloglog  none                                          none
                      log             Weibull                       none, positive, unconstrained
inversenormal         none                                          none, positive, unconstrained
weibull               none                                          none, positive, unconstrained
exponential           none                                          none, positive, unconstrained
TRANSFORMATION is set to log by default for the first four distributions, and to none for the last three.
If the total number of units is known, it can be supplied by setting the TOTUNITS parameter. By default, a parameter gamma, the proportion of TOTUNITS that can respond, will be estimated. If option ALLRESPOND is set to yes, then gamma is fixed at 1 (indicating that all units will respond). If some units were lost before counting began, the number of these can be estimated by setting option LOSTUNITS=yes.
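To make the role of gamma concrete, the following Python sketch computes an expected cumulative count. It assumes that the expected count at a given time is TOTUNITS × gamma × (cumulative proportion responding); that form is an assumption made for illustration and is not stated in this documentation. The function name is hypothetical.

```python
def expected_cumulative(totunits, gamma, proportion_responded):
    """Expected cumulative count under the assumed form TOTUNITS * gamma * F(z)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is a proportion and must lie in [0, 1]")
    if not 0.0 <= proportion_responded <= 1.0:
        raise ValueError("proportion_responded must lie in [0, 1]")
    return totunits * gamma * proportion_responded


# 400 units, three quarters able to respond, half of those responded so far.
partial = expected_cumulative(400, 0.75, 0.5)
# Fixing gamma at 1 mirrors ALLRESPOND=yes.
everyone = expected_cumulative(400, 1.0, 0.5)
```

In this sketch the two calls give 150 and 200 expected responses respectively, showing how fixing gamma at 1 raises the ceiling of the fitted curve to TOTUNITS.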
Data for several groups can be fitted together, either by setting DATA to a pointer of variates, or by setting the GROUPS parameter to a factor to identify the different groups. If DATA is set to a pointer, TIME can be set to one variate if all the DATA variates are the same length. Otherwise, it must be set to a pointer with a variate for each DATA variate. Parameters for the groups are constrained to be equal by default, but any of the parameters b, m, lag and gamma can be estimated separately between groups by setting the SEPARATE option.
The counts can be from a single population or from a mixture of up to 3 populations, as specified by the NPOPULATIONS parameter (default 1). Parameters b, m and lag can be estimated separately between the populations by setting the POPSEPARATE option. If this is set, the proportion (propn) of units in each population will also be estimated. If there are GROUPS in the data, then the proportions can be estimated separately for each group by setting SEPARATE=propn. NPOPULATIONS is the same for each group.
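A mixture of populations can be pictured as a weighted sum of the individual distribution functions. The Python sketch below illustrates this for logistic components. It assumes that the final population's proportion is one minus the sum of the others, which is consistent with there being one fewer proportion than populations; the weighted-sum form itself is an assumption for illustration. The function names are hypothetical.

```python
import math


def logistic_cdf(z, b, m):
    return 1.0 / (1.0 + math.exp(-b * (z - m)))


def mixture_cdf(z, components, proportions):
    """Weighted sum of logistic CDFs.

    components:  one (b, m) pair per population.
    proportions: one value per population except the last, whose
                 proportion is taken as the remainder (assumption).
    """
    if len(proportions) != len(components) - 1:
        raise ValueError("need one fewer proportion than populations")
    weights = list(proportions) + [1.0 - sum(proportions)]
    if min(weights) < 0.0:
        raise ValueError("proportions must be non-negative and sum to at most 1")
    return sum(w * logistic_cdf(z, b, m) for w, (b, m) in zip(weights, components))


# Two populations sharing b but with different m, 40% in the first.
two_pop = mixture_cdf(2.0, [(1.0, 1.0), (1.0, 3.0)], [0.4])
```

With well separated values of m the resulting curve has two steps, which is the pattern that suggests fitting more than one population.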
Initial parameter values are estimated within the procedure, but can be supplied separately using any of the parameters IB, IM, ILAG, IGAMMA and IPROPN, or in one list using the INITIAL parameter. If any parameter is to be estimated separately between GROUPS or populations, there must be one initial value for each parameter of that type to be estimated. For example, if there are two groups, and SEPARATE=m, then IM should be set to a variate of length 2. If INITIAL is set, its values will be used even if the other initial value parameters are set. The values in INITIAL must be in the order b, m, lag, gamma, propn, with enough values for the number of each being estimated. For propn, there must be one fewer value than NPOPULATIONS. For example, with 2 groups and 3 populations, with SEPARATE=b,m and POPSEPARATE=m, there will be 2 initial values for b, 6 for m and 2 for propn. Steplengths for the fitting process can be supplied similarly using STEPLENGTHS or SB, SM, SLAG, SGAMMA and SPROPN. MAXCYCLE controls the maximum number of iterations, as in the RCYCLE directive.
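The number of initial values needed can be tallied with a short helper. The Python sketch below is inferred from the worked example in the text (2 groups, 3 populations, SEPARATE=b,m and POPSEPARATE=m) and from ILAG in the Example section; the multiplicative rule is an assumption, and the function and argument names are hypothetical.

```python
def initial_value_counts(ngroups, npopulations, separate=(), popseparate=(),
                         lag=False, gamma=False):
    """Count initial values in the order b, m, lag, gamma, propn.

    Assumed rule: a parameter needs ngroups values if it is listed in
    SEPARATE, multiplied by npopulations if it is listed in POPSEPARATE.
    lag=True means a lag is being estimated; gamma=True means gamma is
    being estimated.
    """
    counts = {}
    for name in ["b", "m"] + (["lag"] if lag else []):
        per_group = ngroups if name in separate else 1
        per_population = npopulations if name in popseparate else 1
        counts[name] = per_group * per_population
    if gamma:
        counts["gamma"] = ngroups if "gamma" in separate else 1
    if npopulations > 1 and popseparate:
        per_group = ngroups if "propn" in separate else 1
        counts["propn"] = (npopulations - 1) * per_group
    return counts


# The worked example from the text.
text_example = initial_value_counts(2, 3, separate=("b", "m"), popseparate=("m",))
# Example 3a: two groups, two populations, everything separate.
example_3a = initial_value_counts(
    2, 2, separate=("b", "m", "lag", "gamma", "propn"),
    popseparate=("b", "m", "lag"), lag=True, gamma=True)
```

For the text's example this gives 2 values for b, 6 for m and 2 for propn, and for example 3a it gives 4 values for lag, matching the four values supplied by ILAG there.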
Output is controlled by the PRINT option, with settings as in FITNONLINEAR. Parameter estimates are indexed by groups and/or population numbers, with group labels first if both populations and groups are used. If PRINT=estimates, parameters calculated from the fitted parameters (mean, sd, t50) are also printed. Option PLOT determines the form of the graphical output:
cumulative    fitted curve and cumulated counts,
density       differenced fitted curve and counts,
trcumulative  trellis version of cumulative when there are GROUPS,
trdensity     trellis version of density when there are GROUPS.
Setting PLOT=* suppresses all graphs.
Some results can be saved using RKEEP (as with FIT). Further results can be saved by setting the SAVE parameter. This creates a pointer with three sections labelled by their contents. SAVE['Data'] points to the columns used in the fitting process:
ndata      the (differenced) counts,
ntime      times for each count,
groups     grouping factor,
fitted     fitted values,
cumdata    cumulated counts,
cumfitted  cumulated fitted values,
z          transformed time variate (as above).
SAVE['CalcParams'] contains the calculated parameters and their standard errors (Mean, Sd, T50, seMean, seSd, seT50). SAVE['Viable'] contains the estimated number of viable units (Nv) for each group and, if NPOPULATIONS>1, the number in each population (PopNv).
Options: PRINT, DISTRIBUTION, TRANSFORMATION, LAG, ALLRESPOND, FORM, LOSTUNITS, SEPARATE, POPSEPARATE, PLOT, MAXCYCLE.
Parameters: DATA, TIME, GROUPS, INITIAL, IB, IM, ILAG, IGAMMA, IPROPN, STEPLENGTHS, SB, SM, SLAG, SGAMMA, SPROPN, TOTUNITS, NPOPULATION, SAVE.
Method
This procedure extends the methods described by Brain & Butler (1988). If FORM=cumulated, the DATA vector is differenced. If DATA is set to a pointer, the DATA variates are stacked, and a factor is created to identify the groups. The resulting data variate is then used with FITNONLINEAR. The model to be fitted is set up in a pointer to expressions formed according to the settings of the various options and parameters.
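The stacking step can be sketched as follows. This Python illustration is not the procedure's code; the function name is hypothetical, and it assumes that differencing is done within each variate before stacking, with the first difference equal to the first count.

```python
def stack_groups(data_variates, time_variates, cumulated=True):
    """Stack several count variates into one, with a group index.

    time_variates may hold a single variate shared by every data variate,
    mirroring the rule for TIME when all DATA variates have equal length.
    """
    if len(time_variates) == 1:
        time_variates = time_variates * len(data_variates)
    counts, times, groups = [], [], []
    for g, (data, time) in enumerate(zip(data_variates, time_variates), start=1):
        if len(data) != len(time):
            raise ValueError("each DATA variate needs a TIME variate of equal length")
        previous = 0
        for count, t in zip(data, time):
            # Difference cumulated counts; pass differences through unchanged.
            counts.append(count - previous if cumulated else count)
            if cumulated:
                previous = count
            times.append(t)
            groups.append(g)
    return counts, times, groups


# First three rows of Cum[1] and Cum[2] from the Example section.
counts, times, groups = stack_groups([[0, 3, 17], [0, 1, 16]], [[0, 56, 64]])
```

The groups list plays the role of the factor that the procedure creates to identify which stacked values belong to which original variate.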
Action with RESTRICT
Because the calculations in the procedure involve differencing the counts, the TIME and DATA variates must not be restricted.
Reference
Brain, P. & Butler, R.C. (1988). Cumulative count data. Genstat Newsletter, 22, 38-47.
See also
Directive: DISTRIBUTION.
Procedure: RSURVIVAL.
Commands for: Repeated measurements, Survival analysis.
Example
CAPTION 'CUMDISTRIBUTION example',\
        !t('1) Data from Hunter, E.A., Glasbey, C.A., & Naylor, R.E.L.',\
        '(1984). J. Agric. Sci. 102, 207-213.'); STYLE=meta,plain
VARIATE Count,Time; VALUES=!(0,1,7,27,22,8,13,3,6,1,1,1,1),\
        !(49,55,62,72,79,86,96,103,120,127,144,151,168)
CUMDISTRIBUTION [PRINT=model,summary,estimates,fittedvalues;\
        FORM=differences; DISTRIBUTION=normal; TRANSFORMATION=log;\
        LAG=positive] DATA=Count;TIME=Time
CAPTION '2) Randomly generated data from three groups'
VARIATE [NVALUES=8] Time, Cum[1,2,3]
READ Time,Cum[]
  0   0   0   0
 56   3   1   3
 64  17  16  16
 72  36  48  34
 80  57  65  61
 88  79  80  77
 96  85  85  83
104  89  90  88 :
CUMDISTRIBUTION [DISTRIBUTION=inversenormal; SEPARATE=b,m,l]\
        DATA=Cum; TIME=Time
CAPTION '3) Example fitting subpopulations and groups'
VARIATE [NVALUES=15] Time; !(0,2,3,5,7,9,14,16,19,24,29,34,39,44,49)
& Counts[1]; !(0,0,0,38,73,27,41,16,88,37,23,6,1,1,1)
& Counts[2]; !(0,0,0,81,39,11,11,13,82,20,21,3,3,4,1)
CAPTION '3a) All parameters varying between groups and populations'
CUMDISTRIBUTION [SEPARATE=b,m,lag,gamma,propn; POPSEPARATE=b,m,lag;\
        LAG=positive; FORM=difference] DATA=Counts; TIME=Time;\
        NPOPULATION=2; TOTUNITS=400; ILAG=!((4,15)2)
CAPTION '3b) Only some parameters varying between groups or populations'
CUMDISTRIBUTION [SEPARATE=lag,gamma,propn; POPSEPARATE=m,lag;\
        LAG=positive; FORM=difference] DATA=Counts; TIME=Time;\
        NPOPULATION=2; TOTUNITS=400; ILAG=!((4,15)2)