Lists the data matrix in abbreviated form.
Options
GROUPS = factor | Defines groupings of the units; used to split the printed table at appropriate places and to label the groups; default * |
---|---|
UNITS = text or variate | Names for the rows (i.e. units) of the table; default * |
Parameters
DATA = variates or factors | The data variables |
---|---|
TEST = string tokens | Test type, defining how each variable is treated in the calculation of the similarity between units (simplematching, jaccard, russellrao, dice, antidice, sneathsokal, rogerstanimoto, cityblock, manhattan, ecological, euclidean, pythagorean, minkowski, divergence, canberra, braycurtis, soergel); the default * ignores that variable |
RANGE = scalars | Range of possible values of each variable; if omitted, the observed range is taken |
Description
HLIST lists the values of the data matrix in a condensed form. The units appear either in their original order or, more usefully, in the order determined by a cluster analysis (see HCLUSTER). This representation can be very helpful for revealing patterns in the data that are associated with clusters. It is also useful for an initial scan of the data to pick out interesting features of the variables.
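This page does not say how a cluster order is passed to HLIST. One plausible route is to rearrange the units into that order before they are listed. The minimal Python sketch below (not Genstat) shows such a rearrangement. It assumes the order is given as 1-based unit numbers, similar in spirit to the permutation that HCLUSTER can save. `reorder_units` and the sample values are hypothetical.

```python
def reorder_units(columns, unit_names, permutation):
    """Rearrange every variable, and the unit names, into permutation order.

    columns: dict mapping variable name -> list of values, one per unit
    unit_names: list of unit labels
    permutation: 1-based unit numbers in the order they should be listed
                 (hypothetical representation, assumed for this sketch)
    """
    n = len(unit_names)
    if sorted(permutation) != list(range(1, n + 1)):
        raise ValueError("permutation must list each unit number 1..n exactly once")
    order = [p - 1 for p in permutation]  # convert to 0-based indices
    new_columns = {name: [vals[i] for i in order] for name, vals in columns.items()}
    new_names = [unit_names[i] for i in order]
    return new_columns, new_names


# Made-up illustrative values, not taken from any data set.
cols, names = reorder_units({"x": [10, 20, 30]}, ["u1", "u2", "u3"], [3, 1, 2])
print(names)  # ['u3', 'u1', 'u2']
print(cols)   # {'x': [30, 10, 20]}
```

Every variable is permuted with the same index list, so the rows of the listing stay aligned with their unit labels.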
The DATA parameter specifies a list of variates or factors, all of which must be of the same length. The TEST parameter specifies a list of strings, one for each variate or factor in the DATA list, to define the “type” of each one. This is similar to the TEST parameter of FSIMILARITY, which determines how differences between the variate or factor values of the units contribute to the overall similarity between them. However, HLIST distinguishes only between two kinds of variable:
- qualitative variables: factors or variates with the settings simplematching to rogerstanimoto (simplematching, jaccard, russellrao, dice, antidice, sneathsokal and rogerstanimoto);
- quantitative variables: variates with any of the other settings.

The values of qualitative variables are printed directly. If the range of a quantitative variate is greater than 10, its printed values are scaled to lie between 0 and 10: the minimum value is subtracted, the result is divided by the range, and this is then multiplied by 10. If the range is less than 10, the values are printed unscaled. So a quantitative variate whose values are all less than 1 will appear as 0 throughout the abbreviated table. All values are printed with no decimal places, in a field width of 3.
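The classification and scaling rules above can be illustrated with a minimal Python sketch. This is not Genstat's code. It assumes the following:
- qualitative values are integer codes;
- dropping the decimal places means truncation toward zero, which matches the statement that values below 1 appear as 0;
- a range of exactly 10, which the text does not cover, is left unscaled.

The function names `is_qualitative`, `abbreviate` and `format_row` are hypothetical.

```python
# Tests that HLIST treats as qualitative: simplematching to rogerstanimoto.
QUALITATIVE_TESTS = {
    "simplematching", "jaccard", "russellrao", "dice",
    "antidice", "sneathsokal", "rogerstanimoto",
}


def is_qualitative(test):
    """True if the TEST setting marks the variable as qualitative."""
    return test.lower() in QUALITATIVE_TESTS


def abbreviate(values, test):
    """Convert one variable's values to the integers shown in the listing."""
    if is_qualitative(test):
        # Qualitative values are printed directly (assumed to be integer codes).
        return [int(v) for v in values]
    lo, hi = min(values), max(values)
    rng = hi - lo  # observed range; a RANGE setting is ignored in this sketch
    if rng > 10:
        # Subtract the minimum, divide by the range, then multiply by 10.
        values = [(v - lo) / rng * 10 for v in values]
    # No decimal places: truncation toward zero is an assumption here.
    return [int(v) for v in values]


def format_row(row):
    """Lay out abbreviated values in fields of width 3."""
    return "".join(f"{v:3d}" for v in row)


print(format_row(abbreviate([100, 150, 300], "euclidean")))  # prints "  0  2 10"
print(abbreviate([0.2, 0.9, 0.5], "cityblock"))               # [0, 0, 0]
```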
The RANGE parameter contains a list of scalars, one for each variable in the DATA list. This allows you to check that the values of each variable lie within the given range. The range is also used to standardize quantitative variates. You can therefore impose a standard range, for example when variates are measured on commensurate scales. You can omit the RANGE setting for all or any of the variables by giving a missing identifier or a scalar with a missing value; Genstat then uses the observed range.
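The page does not spell out how a single scalar range is checked, so this Python sketch makes assumptions, labelled in the comments:
- the check is that the observed spread (maximum minus minimum) does not exceed RANGE;
- scaling still starts from the observed minimum.

It shows how a shared RANGE puts two variates on the same scale, and how a missing value (represented here by `None` or NaN) falls back to the observed range. `choose_range` and `scale` are hypothetical helpers, and the data are made up.

```python
import math


def choose_range(values, given=None):
    """Return the range used for standardizing one quantitative variate.

    given: the RANGE scalar, or None/NaN standing in for a missing value.
    """
    observed = max(values) - min(values)
    if given is None or math.isnan(given):
        return observed  # RANGE omitted: fall back to the observed range
    if observed > given:
        # Assumed check: the spread of the data must fit inside RANGE.
        raise ValueError(f"observed range {observed} exceeds RANGE={given}")
    return given


def scale(values, rng):
    """Scale to 0..10 when the range exceeds 10, otherwise leave values unscaled."""
    lo = min(values)  # assumption: scaling starts from the observed minimum
    if rng > 10:
        return [int((v - lo) / rng * 10) for v in values]
    return [int(v) for v in values]


# Two variates on a commensurate scale, both given RANGE=40 (made-up values).
a = [0, 15, 30]
b = [5, 25, 45]
print(scale(a, choose_range(a, 40)))  # [0, 3, 7]
print(scale(b, choose_range(b, 40)))  # [0, 5, 10]
```

With the observed ranges instead (30 and 40), both variates would be stretched to span 0 to 10. The shared RANGE keeps them comparable.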
The UNITS option allows you to change the labelling of the units in the table; you can specify a text, a pointer or a variate.
You can use the GROUPS option to specify a factor that splits the units into groups. The table from HLIST is then divided into sections corresponding to the groups. If the factor has labels, these are used to annotate the sections; otherwise the group number is used.
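As a rough Python illustration of how the UNITS labels and the GROUPS factor shape the table, the sketch below builds the lines of a listing. It assumes a new section starts wherever the group changes between consecutive units, which is one reading of "split the printed table at appropriate places". It also assumes units are labelled in a 12-character column. `grouped_listing` and its heading text are hypothetical, not HLIST's actual output format.

```python
def grouped_listing(rows, unit_names, groups, level_labels=None):
    """Build listing lines, starting a new section wherever the group changes.

    rows: abbreviated integer values, one list per unit
    unit_names: row labels (the role played by the UNITS option)
    groups: group number of each unit (the role played by the GROUPS factor)
    level_labels: optional dict group number -> label; else the number is shown
    """
    lines = []
    previous = None
    for row, name, g in zip(rows, unit_names, groups):
        if g != previous:
            heading = level_labels[g] if level_labels else str(g)
            lines.append(f"Group {heading}")
            previous = g
        # Unit label in a 12-character column, then values in fields of width 3.
        lines.append(f"{name:<12}" + "".join(f"{v:3d}" for v in row))
    return lines


for line in grouped_listing([[1, 2], [3, 4], [5, 6]], ["u1", "u2", "u3"],
                            [1, 1, 2], {1: "small", 2: "large"}):
    print(line)
```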
Options: GROUPS, UNITS.
Parameters: DATA, TEST, RANGE.
Action with RESTRICT
You can restrict any of the DATA variates or factors so that only a subset of the units is listed. If more than one of them is restricted, they must all be restricted to the same set of units.
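The consistency rule for restrictions can be expressed as a small check. In this Python sketch each variable's restriction is represented as a set of retained unit numbers, or `None` when unrestricted. That representation and the name `common_restriction` are assumptions for illustration only.

```python
def common_restriction(restrictions):
    """Return the shared set of listed units, or None if nothing is restricted.

    restrictions: dict variable name -> set of retained unit numbers,
                  or None if that variable is unrestricted (assumed format)
    """
    restricted = {name: units for name, units in restrictions.items()
                  if units is not None}
    if not restricted:
        return None  # no restriction: every unit is listed
    first = next(iter(restricted.values()))
    for name, units in restricted.items():
        if units != first:
            raise ValueError(f"{name} is restricted to a different set of units")
    return first


print(common_restriction({"CC": {1, 2, 5}, "NCyl": None, "Tank": {1, 2, 5}}))
```

Unrestricted variables are skipped, so only the restricted ones have to agree.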
See also
Directives: HCLUSTER, HDISPLAY, HSUMMARIZE.
Commands for: Multivariate and cluster analysis.
Example
```
" Genstat example HCLU-1: Cluster analysis
  Data from 'Observers Book of Automobiles', 1986
  16 Italian cars and 10 measurements:
   1. engine capacity c.c.             CC
   2. number of cylinders              NCyl
   3. fuel tank litres                 Tank
   4. unladen weight kg                Wt
   5. length cm                        Length
   6. width cm                         Width
   7. height cm                        Ht
   8. wheelbase cm                     Wbase
   9. top speed kph                    TSpeed
  10. time to 100kph secs              StSt
  11. carburettor/inj/diesel 1/2/3     Carb
  12. front/rear wheel drive 1/2       Drive
"
TEXT [VALUES=Estate,'Arna1.5','Alfa2.5',Mondialqc,Testarossa,Croma,\
  Panda,Regatta,Regattad,Uno,X19,Contach,Delta,Thema,Y10,Spider] Cars
POINTER [VALUES=CC,NCyl,Tank,Wt,Length,Width,Ht,WBase,TSpeed,StSt,\
  Carb,Drive] Vars
" Read the data - measurements and carnames - from the file 'HCLU-1.DAT',
  and then display it."
OPEN '%gendir%/examples/HCLU-1.DAT'; CHANNEL=cardat
READ [CHANNEL=cardat] Vars[]
CLOSE cardat
" Treat the number of cylinders, data[2], differently to the continuous
  measurements."
HLIST [UNITS=Cars] \
  Vars[]; TEST=4(cityblock,euclidean),2(cityblock,simplematching)
" Form a hierarchical clustering of the cars, using the single linkage method."
SYMMETRIC [ROWS=Cars] CarSim
FSIMILARITY [SIMILARITY=CarSim]\
  Vars[]; TEST=4(cityblock,euclidean),2(cityblock,simplematching)
HCLUSTER [PRINT=amalgamations; METHOD=single] CarSim
" Use the average-linkage method."
HCLUSTER [PRINT=dendrogram; METHOD=average] CarSim;\
  AMALGAMATIONS=Am; PERMUTATION=Perm
" Display a high-resolution dendrogram."
DDENDROGRAM [ORDERING=given] DATA=Am; PERMUTATION=Perm; LABELS=Cars;\
  TITLE='Italian cars clustered by average linkage'
```