Joins or merges two sets of vectors together, based on the values of sets of classifying keys (C.F. Johnston & D.B. Baird).

### Options

`NINDEX` = scalar |
Number of index vectors in structures (up to 10); default 1 |
---|---|

`METHOD` = string token |
Type of join (`inner` , `left` , `right` , `full` ); default `full` |

`REPEATS` = string token |
How to handle repeats of matches (`combinations` , `single` ); default `sing` outputs one row per match |

`INCLUDE` = string token |
How to handle restrictions on the input vectors (`all` , `nonrestricted` ); default `all` uses all the data rows |

`SORT` = string token |
Whether `NEWVECTORS` should be sorted on the index vectors (`ascending` , `descending` , `unsorted` ); default `unsorted` keeps the same ordering as the input sets |

### Parameters

`LEFTVECTORS` = pointer |
Pointer to a list of vectors in left set (keys and variables) |
---|---|

`RIGHTVECTORS` = pointer |
Pointer to a list of vectors in right set (keys and variables) |

`NEWVECTORS` = pointer |
Pointer to a list of output vectors (keys and variables) |

### Description

This procedure can be used to produce a set of `NEWVECTORS`

, which is the result of joining (or merging) two sets according to index (or key) vectors in each set. `JOIN`

supports SQL style joins, as well as merges, as implemented in Genstat for Windows, SAS and SPSS.

The number of index vectors is given by the `NINDEX`

option (up to 10). Each of `LEFTVECTORS`

and `RIGHTVECTORS`

is a pointer to `NINDEX`

keys followed by any number of extra vectors. The `NEWVECTORS`

parameter is a pointer to `NINDEX`

keys followed by the total number of non-index vectors in the two input sets. The output order in `NEWVECTORS`

will be the combined keys from the left and right sets, then the non-index vectors from the left set, followed by the non-index vectors from the right set. You need not have declared the pointer already; it will be declared automatically if necessary. The vectors may be variates, factors or texts. Warnings are given if the types of index vectors in each set do not match, although a factor can be matched with a text. Attempting to match a text with a factor or variate will result in a fault.

The `METHOD`

option controls the type of join and determines which rows from each input set will be output. `METHOD=inner`

outputs only those rows where the keys from both sets match. `METHOD=left`

outputs all rows from the `LEFTVECTORS`

set and only those rows from `RIGHTVECTORS`

where the keys from both sets match. `METHOD=right`

outputs all rows from the `RIGHTVECTORS`

set and only those rows from `LEFTVECTORS`

where the keys from both sets match. `METHOD=full`

outputs all rows from both sets. Where keys do not match, missing values are inserted into the non-index vectors from the set without that key value.

The `REPEATS`

option determines what happens when both input sets have repeats of the same matching key values. `REPEATS=single`

outputs one row for each match, so that if there are `M`

repeats in `LEFTVECTORS`

and `N`

repeats in `RIGHTVECTORS`

, `MAX(M,N)`

rows will be output. This is the same behaviour as the merge statements of SAS and SPSS and the Merge Spreadsheets menu of Genstat *for Windows*. `REPEATS=combinations`

outputs all combinations of the repeats, giving `M*N`

rows. This is equivalent to an SQL join and may produce very large output sets.

The `INCLUDE=nonrestricted`

option allows the use of restrictions on the vectors in each input set to be used to subset the rows. The `SORT=unsorted`

option allows the resulting vectors to be returned in the original order of the input data set, or sorted on the key vectors in either ascending or descending direction.

**Note**: this procedure may take some time to complete for joins of large data sets. You should also ensure the data space is large enough for the resultant vectors, especially if using the option `REPEATS=combinations`

.

Options: `NINDEX`

, `METHOD`

, `REPEATS`

, `INCLUDE`

, `SORT`

.

Parameters: `LEFTVECTORS`

, `RIGHTVECTORS`

, `NEWVECTORS`

.

### Method

The `LEFTVECTORS`

and `RIGHTVECTORS`

are sorted by the index variables. If there are restrictions on the vectors in either input set, this is used to subset the input vectors if `INCLUDE=nonrestricted`

. For each of the rows of the two sets, the keys are compared and output rows are appended according to the `METHOD`

option. If repeats of the same matching key values in both sets occur and `REPEATS=combinations`

the procedure loops through all combinations of the matching rows. The resulting vectors are then sorted into the order specified in the `SORT`

option.

### Action with `RESTRICT`

Any of the input vectors may be restricted. If `INCLUDE=nonrestricted`

, only those rows which are not excluded by any restriction on vectors in each input set will be processed, otherwise the restrictions will be ignored.

### See also

Directive: `EQUATE`

.

Procedures: `APPEND`

, `STACK`

, `UNSTACK`

, `VEQUATE`

.

Commands for: Calculations and manipulation.

### Example

CAPTION 'JOIN example'; STYLE=meta VARIATE [NVALUES=7] K11,K12,V11 TEXT [NVALUES=7] T13 READ K11,K12,V11,V12,T13 1 1 1 6.2 'Red' 1 1 2 5.7 'Green' 1 1 3 4.5 'Blue' 1 2 4 7.3 'Red' 2 1 5 4.1 'Yellow' 3 2 6 5.1 'Blue' 3 3 7 1.9 'Black' : VARIATE [NVALUES=5] K21,K22,V21 READ K21,K22,V21 1 1 1 1 1 2 2 1 3 2 3 4 5 2 5 : PRINT K11,K12,V11,V12,T13; DECIMALS=3(0),1,0 & K21,K22,V21; DECIMALS=0 JOIN [NINDEX=2; METHOD=inner; REPEATS=single]\ LEFT=!p(K11,K12,V11,V12,T13); RIGHT=!p(K21,K22,V21);\ NEW=!p(K1,K2,V1,V2,T3,V4) PRINT K1,K2,V1,V2,T3,V4; FIELDWIDTH=8; DECIMALS=0,0,0,1,0,0 JOIN [NINDEX=2; METHOD=left; REPEATS=single]\ LEFT=!p(K11,K12,V11,V12,T13); RIGHT=!p(K21,K22,V21);\ NEW=!p(K1,K2,V1,V2,T3,V4) PRINT K1,K2,V1,V2,T3,V4; FIELDWIDTH=8; DECIMALS=0,0,0,1,0,0 JOIN [NINDEX=2; METHOD=right; REPEATS=single]\ LEFT=!p(K11,K12,V11,V12,T13); RIGHT=!p(K21,K22,V21);\ NEW=!p(K1,K2,V1,V2,T3,V4) PRINT K1,K2,V1,V2,T3,V4; FIELDWIDTH=8; DECIMALS=0,0,0,1,0,0 JOIN [NINDEX=2; METHOD=full; REPEATS=single; SORT=descending]\ LEFT=!p(K11,K12,V11,V12,T13); RIGHT=!p(K21,K22,V21);\ NEW=!p(K1,K2,V1,V2,T3,V4) PRINT K1,K2,V1,V2,T3,V4; FIELDWIDTH=8; DECIMALS=0,0,0,1,0,0 JOIN [NINDEX=2; METHOD=full; REPEATS=comb; SORT=ascending]\ LEFT=!p(K11,K12,V11,V12,T13); RIGHT=!p(K21,K22,V21);\ NEW=!p(K1,K2,V1,V2,T3,V4) PRINT K1,K2,V1,V2,T3,V4; FIELDWIDTH=8; DECIMALS=0,0,0,1,0,0 RESTRICT K11; K11 < 3 JOIN [NINDEX=2; METHOD=full; REPEATS=comb; INCLUDE=nonrestricted]\ LEFT=!p(K11,K12,V11,V12); RIGHT=!p(K21,K22,V21);\ NEW=!p(K1,K2,V1,V2,V3) PRINT K1,K2,V1,V2,V3; FIELDWIDTH=8; DECIMALS=0,0,0,1,0 RESTRICT K21; K21 < 3 JOIN [NINDEX=1; METHOD=full; REPEATS=single; INCLUDE=all] \ LEFT=!p(K11,V11,V12,T13); RIGHT=!p(K21,V21);\ NEW=!p(K1,V1,V2,T3,V4) PRINT K1,V1,V2,T3,V4; FIELDWIDTH=8; DECIMALS=0,0,1,0,0