Select menu: Stats | Distributions | Kernel Density Estimation
Use this to evaluate a Kernel density estimate for a selected variate. A Kernel Density estimate can be thought of as a smoothed form of a histogram.
- After you have imported your data, from the menu select
Stats | Distributions | Kernel Density Estimation. - Fill in the fields as required then click Run.
You can set additional Options then after running, you can save the results by clicking Store.
Kernel density estimation is a useful tool for exploring the unknown underlying distribution of a sample. The kernel method constructs an estimate fh(t) of the true density function by placing a kernel function K(t;x_{i},h) over each observation x_{i} in the sample. The kernel function K(t;x,h) is itself a density function with location parameter x and scale parameter h, also called bandwidth in this context. The density estimate is then given by
fh(t) = the sum of (K(t-x_{i})/h)/(nh) from i = 1...n
where n denotes the sample size. The choice of kernel function K is not very critical for the resulting estimate fh(t) and so a Gaussian kernel is used.
The following graph showing the sum of the normal kernels at 5 data points illustrates the ideas behind the kernel density estimation.
Bandwidth
The choice of bandwidth, h, is of crucial importance in kernel density estimation. A large value of h will give rise to an over smoothed density estimate, while a small value of h will produce a very ragged density with many spikes at the observations. It is recommended that a range of values of h be used, and the resulting kernel density estimates be examined, since this will highlight different features of the data.
For automatic use of kernel density estimation, estimation of the bandwidth h from the data is very helpful. The following automatic data driven estimates are available (n = the number of observations in the selected variate):
Sheather & Jones | The method of Sheather & Jones (1991). Jones, Marron & Sheather (1996) recommend this for general purposes |
Standard deviation | s_{1} = 1.06 * (standard deviation) * n**(-1/5) |
Interquartile range | s_{2} = 0.79 * (inter quartile range) * n**(-1/5) |
Min(Std Dev,IQ range) | s_{3} = 0.90 * minimum(standard deviation, interquartile range/1.34) * n**(-1/5) |
Given | You provide your own estimate for the bandwidth in the associated field along side the dropdown list |
The s_{1},s_{2} and s_{3} estimates of bandwidth are popular due to their simplicity and are optimal in some sense for data from a normal distribution.
Proportions for quantiles
Proportions at which to calculate quantiles of the kernel density estimate. This is either a comma or space separated list of numbers or may be the name of an existing variate.
Action buttons
Run | Process the Kernel Density Estimation on the selected data. |
Options | Opens the Kernel Density Estimation Options dialog to allow various options to be set which control the output and graph. |
Save | Open the Kernel Density Estimation Save Options dialog to specify save structures for the analysis. |
Cancel | Close the dialog without running any more analyses. |
Defaults | Reset all options to their default values. |
Action Icons
Pin | Controls whether to keep the dialog open when you click Run. When the pin is down the dialog will remain open, otherwise when the pin is up the dialog will close. | |
Restore | Restore names into edit fields and default settings. | |
Clear | Clear all fields and list boxes. | |
Help | Open the Help topic for this dialog. |
See also
- Kernel Density Estimation Options
- Kernel Density Estimation Save Options
- KERNELDENSITY procedure in command mode
- Probability Distribution Plot
- Fit Distribution
- Further details of distributions
- Empirical Distribution Tests menu
- DISTRIBUTION directive in command mode