Performs a Cate-Nelson graphical analysis of bivariate data (V.M. Cave).
Options
PRINT = string tokens 
Controls printed output (summary, quadrants, errorquadrants); default summ, quad

PLOT = string tokens 
What graphs to plot (catenelson, criticalvalues); default cate
DIRECTION = string token 
Direction of the association between the y and x values (ascending, descending); default asce i.e. a positive trend
YCRITICAL = scalar 
Prespecified critical value of y; default * i.e. the critical value of y is estimated

XCRITICAL = scalar 
Prespecified critical value of x; default * i.e. the critical value of x is estimated

TITLE = text 
Title for the Cate-Nelson plot; if unset, the title is generated automatically
YTITLE = text 
Y-axis title for the Cate-Nelson plot; if unset, the title is generated automatically
XTITLE = text 
X-axis title for the Cate-Nelson plot; if unset, the title is generated automatically
WINDOW = scalar 
Window to use for the graphs; default 3 
SAVE = identifier 
Specifies the save structure of the regression model holding the y-values, distribution, link function and weights; default * i.e. that from the last regression fitted
Parameters
X = variates
Supplies the x-values for each analysis

RESULTS = pointers 
Saves the critical value of x, the critical value of y and the quadrant allocations for each X variate 
Description
The RCATENELSON procedure performs a graphical analysis of bivariate data (x,y) as defined by Cate & Nelson (1971). It also extends their analysis to y-variates with non-Normal distributions.
Before using RCATENELSON, you need to give a MODEL statement defining the y-variate. The distribution of the y-variate, a link function and weights can also be defined with the MODEL statement. (Note, however, that multinomial distributions, user-defined distributions and link functions, and generalized least squares are not accommodated by RCATENELSON.) The variate containing the x-values is supplied using the X parameter.
The objective of the Cate-Nelson graphical analysis is to divide the data into two groups, based on the x-values, so that there is maximum statistical homogeneity within each group. The procedure finds the value of x that, in terms of predictive ability, best divides the data into two groups. This critical value of x is determined by iteratively dividing the data into two groups at each candidate critical x-value and selecting the one that minimizes the residual sum of squares, or the deviance for distributions other than the Normal. Alternatively, a prespecified critical value of x may be supplied, as a scalar, using the XCRITICAL option.
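The criterion evaluated at a single candidate split can be sketched as follows. This is an illustrative Python sketch, not the Genstat source of RCATENELSON: the function names are hypothetical, and it assumes that the two-group model fits a separate mean to each side of the split, that both groups are non-empty, and that any observation lying exactly on the candidate value is placed in the right-hand group. The Poisson case is shown only as one example of a non-Normal distribution.

```python
import math

def _split(x, y, x_crit):
    """Split y into the groups left and right of x_crit (ties go right: an assumption)."""
    left = [yi for xi, yi in zip(x, y) if xi < x_crit]
    right = [yi for xi, yi in zip(x, y) if xi >= x_crit]
    return left, right

def rss_two_groups(x, y, x_crit):
    """Residual sum of squares when each group is fitted by its own mean (Normal case)."""
    total = 0.0
    for group in _split(x, y, x_crit):
        mean = sum(group) / len(group)
        total += sum((yi - mean) ** 2 for yi in group)
    return total

def poisson_deviance_two_groups(x, y, x_crit):
    """Poisson deviance of the same two-group model; fitted values are the group means."""
    total = 0.0
    for group in _split(x, y, x_crit):
        mean = sum(group) / len(group)
        for yi in group:
            # y*log(y/mu) is taken as 0 when y is 0
            term = yi * math.log(yi / mean) if yi > 0 else 0.0
            total += 2.0 * (term - (yi - mean))
    return total
```

The smaller the value returned, the more homogeneous the two groups are at that candidate split.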
After determining the critical value of x, the procedure then finds the critical value of y. (For the Binomial distribution, y is defined as the proportion of successes.) The critical values of x and y split the scatter plot of y on x into four quadrants: two of these contain data that follow the predictive model, and two (known as the error quadrants) contain data that do not follow the model. The critical value of y is also determined iteratively, but here the critical value minimizes the number of observations that fall into the error quadrants, i.e. those that do not conform with the predictive model. Alternatively, a prespecified critical value of y may be supplied, as a scalar, using the YCRITICAL option.
The DIRECTION option specifies whether the association between the y and x values is ascending (i.e. following a positive trend; the default) or descending (i.e. following a negative trend). This determines the error quadrants. For an ascending trend (i.e. where y increases with increasing x), observations in the top left (I) and in the bottom right (III) quadrants do not conform with the predictive model. Therefore, for data with an ascending trend, the critical y-value minimizes the number of observations that fall into Quadrants I and III. Conversely, for a descending trend (i.e. where y decreases with increasing x), the error quadrants are the top right (II) and bottom left (IV).
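The quadrant numbering and the effect of DIRECTION can be illustrated with a short Python sketch. It is not taken from the procedure itself: the function names are hypothetical, and the treatment of points lying exactly on a critical value (counted here as "low" or "left") is an assumption.

```python
# Error quadrants for each direction, as described in the text above.
ERROR_QUADRANTS = {
    "ascending": {"I", "III"},    # top left and bottom right
    "descending": {"II", "IV"},   # top right and bottom left
}

def quadrant(x, y, x_crit, y_crit):
    """Return the quadrant label: I = top left, II = top right,
    III = bottom right, IV = bottom left."""
    top = y > y_crit      # assumption: a point on the line counts as bottom
    right = x > x_crit    # assumption: a point on the line counts as left
    if top:
        return "II" if right else "I"
    return "III" if right else "IV"

def is_error(x, y, x_crit, y_crit, direction="ascending"):
    """True if the point falls in an error quadrant for the given direction."""
    return quadrant(x, y, x_crit, y_crit) in ERROR_QUADRANTS[direction]
```

For example, with both critical values at 5, the point (1, 10) lies in Quadrant I, which is an error quadrant for an ascending trend but not for a descending one.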
When there is more than one candidate critical x-value, or more than one candidate critical y-value, results are generated for each possibility.
Printed output is controlled by the PRINT option, with the following settings.
summary 
prints a summary of the analysis, including the critical x-value, the critical y-value, the error rate (i.e. the percentage of observations falling into the two error quadrants) and the count and percentage of observations in each quadrant.
quadrants 
prints the allocation of data to each quadrant.
errorquadrants 
prints the data falling into the error quadrants. 
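The quantities reported by the summary setting can be derived from the quadrant allocations alone. The Python sketch below shows one way to do so; the function name and the layout of the returned values are hypothetical and do not reproduce the procedure's printed output.

```python
from collections import Counter

def summarize(quadrants, direction="ascending"):
    """Return ({quadrant: (count, percentage)}, error rate in %) from quadrant labels."""
    error = {"I", "III"} if direction == "ascending" else {"II", "IV"}
    n = len(quadrants)
    counts = Counter(quadrants)
    table = {q: (counts[q], 100.0 * counts[q] / n) for q in ("I", "II", "III", "IV")}
    # The error rate is the percentage of observations in the two error quadrants.
    error_rate = 100.0 * sum(counts[q] for q in error) / n
    return table, error_rate
```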
The PLOT option controls the graphical output, with these settings.
catenelson 
produces a Cate-Nelson plot. Here, a scatter plot of y on x is drawn, with a horizontal line superimposed through the critical value of y, and a vertical line superimposed through the critical value of x, splitting the data into four quadrants. Observations that fall into the error quadrants are drawn as red crosses, labelled by their unit number. Observations that follow the predictive model are drawn as black hollow circles.
criticalvalues 
produces a plot of the residual sum of squares (or deviance for non-Normal distributions) against the candidate critical values of x, and a plot of the number of observations falling into the error quadrants against the candidate critical values of y. If XCRITICAL is supplied, no diagnostic plot will be produced for the residual sum of squares or deviance. If YCRITICAL is supplied, no diagnostic plot will be produced for the error quadrants.
By default, the Cate-Nelson plot is produced.
The TITLE, YTITLE and XTITLE options can supply an overall title, a y-axis title and an x-axis title for the Cate-Nelson plot, respectively. If these are not supplied, suitable titles are generated automatically. To omit a title, a blank string can be supplied, e.g. XTITLE=' '
The WINDOW option defines the window to use for the plots; default 3.
Results can be saved using the RESULTS parameter. They are in a single pointer if there is only one critical x and critical y value. If there are several, they are in a pointer containing a pointer for each pair of critical x and critical y values. The first element of these pointers, indexed by 'Critical x-value', is a scalar storing the critical value of x. The second element, indexed by 'Critical y-value', is a scalar storing the critical value of y. The third element, indexed by 'Quadrant', stores the allocation of data to each quadrant, and is ordered by the unit number.
Options: PRINT, PLOT, DIRECTION, YCRITICAL, XCRITICAL, TITLE, YTITLE, XTITLE, WINDOW, SAVE.
Parameters: X, RESULTS.
Method
RCATENELSON uses the methods described in Cate & Nelson (1971) and Mangiafico (2013), but extended to accommodate y-variates with non-Normal distributions.
Candidate critical values of x are formed by ordering the unique values in X, and calculating the midpoint between each adjacent pair. Following Cate & Nelson (1971), the procedure ensures that at least two x-values fall to the left and to the right of each candidate value. The critical value of x minimizes the residual sum of squares, or deviance for non-Normal distributions, which is obtained using the MODEL and FIT directives.
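The search for the critical value of x in the Normal case can be sketched in Python as below. This is a minimal illustration rather than the procedure's Genstat code. The function names are hypothetical, the "at least two x-values" rule is assumed to count observations (not unique values) on each side, and ties in the criterion are all returned, since the procedure reports results for every tied candidate. The data at the end are the soil potassium (K) and relative yield values from the Example section.

```python
import math

def candidate_x_values(x, min_each_side=2):
    """Midpoints between adjacent unique x-values, keeping only those with
    at least min_each_side observations on each side."""
    unique = sorted(set(x))
    midpoints = [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    return [m for m in midpoints
            if sum(xi < m for xi in x) >= min_each_side
            and sum(xi > m for xi in x) >= min_each_side]

def _rss(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def critical_x(x, y):
    """Return (minimum residual SS, list of candidate x-values attaining it)."""
    scored = []
    for c in candidate_x_values(x):
        left = [yi for xi, yi in zip(x, y) if xi < c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        scored.append((_rss(left) + _rss(right), c))
    best = min(s for s, _ in scored)
    return best, [c for s, c in scored if math.isclose(s, best, abs_tol=1e-12)]

# Data from the Example section (relative cotton yield and soil potassium).
Yield = [53.5, 64.8, 63.0, 40.8, 79.5, 70.3, 63.0, 64.0, 94.0, 99.0, 66.5, 103.0,
         97.3, 85.3, 101.3, 97.0, 96.8, 98.0, 85.8, 92.3, 96.8, 88.3, 106.8, 97.5]
K = [26, 28, 30, 31, 34, 35, 40, 44, 49, 56, 68, 75, 77, 78, 78, 102, 118, 118,
     131, 133, 133, 152, 193, 211]
min_rss, x_candidates = critical_x(K, Yield)
```

Because the candidates are midpoints between distinct x-values, no observation lies exactly on a candidate, so the strict inequalities above allocate every observation to one of the two groups.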
Candidate critical values of y are formed by ordering the unique values in Y, and calculating the midpoint between each adjacent pair. (For the Binomial distribution, the proportion of successes is used.) The critical value of y minimizes the number of observations in the error quadrants.
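A corresponding Python sketch for the critical value of y is given below. As before, it is an illustration under stated assumptions and not the procedure's own code: the function name is hypothetical, the critical value of x is taken as already known, and for Binomial data the caller is assumed to pass the proportions of successes as y.

```python
def critical_y(x, y, x_crit, direction="ascending"):
    """Return (minimum error count, list of candidate y-values attaining it)."""
    unique = sorted(set(y))
    candidates = [(a + b) / 2 for a, b in zip(unique, unique[1:])]

    def errors(y_crit):
        count = 0
        for xi, yi in zip(x, y):
            right, top = xi > x_crit, yi > y_crit
            if direction == "ascending":
                # error quadrants I (top left) and III (bottom right)
                count += int(right != top)
            else:
                # error quadrants II (top right) and IV (bottom left)
                count += int(right == top)
        return count

    scored = [(errors(c), c) for c in candidates]
    best = min(s for s, _ in scored)
    return best, [c for s, c in scored if s == best]
```

When several candidates give the same minimum count, all of them are returned, mirroring the statement that results are generated for each possible critical y-value.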
Action with RESTRICT
RCATENELSON will work with restricted X variates, and restricted Y, NBINOMIAL and WEIGHTS settings of MODEL. However, if more than one is restricted, they must be restricted in the same way.
References
Cate, R.B. & Nelson, L.A. (1971). A simple statistical procedure for partitioning soil test correlation data into two classes. Soil Science Society of America Proceedings, 35, 658–660.
Mangiafico, S.S. (2013). Cate-Nelson analysis for bivariate data using R-project. Journal of Extension, 51, 5TOT1.
See also
Commands for: Graphics, Regression analysis.
Example
CAPTION 'RCATENELSON example',\
        !T('The data are relative yield of cotton (%) and the potassium',\
        'concentration of the soil (ppm).'),\
        !T('From de Freitas et al. (1966). Determination of potassium',\
        'deficient areas for cotton. Potash Review.'); \
        STYLE=meta,plain,plain
VARIATE [VALUES=53.5,64.8,63.0,40.8,79.5,70.3,63.0,64.0,94.0,99.0,66.5,\
        103.0,97.3,85.3,101.3,97.0,96.8,98.0,85.8,92.3,96.8,88.3,\
        106.8,97.5] Yield
VARIATE [VALUES=26,28,30,31,34,35,40,44,49,56,68,75,77,78,78,102,118,118,\
        131,133,133,152,193,211] K
MODEL   Y=Yield
RCATENELSON [PLOT=catenelson,criticalvalues; \
        YTITLE='Relative yield of cotton (%)';\
        XTITLE='Soil potassium concentration (ppm)'] X=K