Select menu: Data | Load | Data file
This dialog provides additional options for reading data from the Clipboard or from a CSV file. If the first row contains text values, these are used as column names in the Genstat spreadsheet; otherwise default column names are generated (this behaviour can be changed using the options described below).
- From the menu, select Data | Load, then select an option.
Column names ending in an exclamation mark (!) have this character removed, and the column is loaded as a factor. Likewise, columns whose names end with a dollar sign ($) or a hash (#) are loaded as texts or variates respectively. Characters in column names that are not valid in a Genstat identifier are converted to underscores (_).
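The naming rules above can be illustrated with a minimal Python sketch. This is not Genstat's own code: the function name `parse_column_name` is hypothetical, and it assumes that only letters, digits and underscores are valid identifier characters.

```python
import re

# Trailing type markers described above (factor, text, variate).
TYPE_MARKERS = {"!": "factor", "$": "text", "#": "variate"}


def parse_column_name(raw_name):
    """Return (name, type); type is None when the name has no marker."""
    col_type = None
    if raw_name and raw_name[-1] in TYPE_MARKERS:
        col_type = TYPE_MARKERS[raw_name[-1]]
        raw_name = raw_name[:-1]  # the marker itself is not part of the name
    # Assumption: letters, digits and "_" are valid identifier characters;
    # every other character is replaced by an underscore.
    name = re.sub(r"[^A-Za-z0-9_]", "_", raw_name)
    return name, col_type


print(parse_column_name("Block!"))         # ('Block', 'factor')
print(parse_column_name("Yield (t/ha)#"))  # ('Yield__t_ha_', 'variate')
```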
Sort factor levels
When selected, columns loaded as factors will have their levels (or labels for a text column) sorted into ascending order.
Suggest columns to be factors
Prompts you to convert columns to factors when they contain repeated values and have no more than N unique values, where N is set by the Suggest converting columns with <= N unique items option on the Spreadsheet Options | Conversions tab.
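As a rough illustration, the test behind this prompt might look like the Python sketch below. The function `suggest_factor` is hypothetical, `n` stands for the N set on the Conversions tab, and leaving missing cells out of the count is an assumption.

```python
def suggest_factor(values, n):
    """Return True if a column looks like a candidate factor."""
    # Assumption: missing cells (None) are left out of the count.
    present = [v for v in values if v is not None]
    distinct = set(present)
    has_repeats = len(distinct) < len(present)
    return has_repeats and len(distinct) <= n


print(suggest_factor([1, 2, 1, 2, 3], n=5))  # True: repeats, 3 unique values
print(suggest_factor([1, 2, 3], n=5))        # False: no repeated values
```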
Remove empty rows
Remove any rows from the spreadsheet that contain no data.
Remove empty columns
Remove any columns from the spreadsheet that contain no data.
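The two removal options can be sketched in Python on a grid of cell strings. The function `remove_empty` is hypothetical, and treating only blank cells as "no data" is an assumption (Genstat may also count missing-value markers).

```python
def is_empty(cell):
    # Assumption: only blank cells count as "no data".
    return cell.strip() == ""


def remove_empty(rows, drop_rows=True, drop_cols=True):
    """rows is a list of equal-length lists of cell strings."""
    if drop_rows:
        rows = [r for r in rows if not all(is_empty(c) for c in r)]
    if drop_cols and rows:
        keep = [j for j in range(len(rows[0]))
                if not all(is_empty(r[j]) for r in rows)]
        rows = [[r[j] for j in keep] for r in rows]
    return rows


grid = [["1", "", "a"],
        ["", "", ""],
        ["2", "", "b"]]
print(remove_empty(grid))  # [['1', 'a'], ['2', 'b']]
```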
Data contains variates & factors only
Any column containing text will be converted to either a variate or a factor. If fewer than 20% of the cells in the column contain labels, the column is made into a variate; otherwise it is made into a factor. When a column is read as a variate, common errors such as typing the letter O for the number 0 (or I for 1) are corrected.
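The 20% rule and the look-alike correction can be sketched in Python as follows. This is only an illustration of the behaviour described above: `classify_column` is hypothetical, the share of labels is assumed to be taken over non-blank cells, and cells that are still not numeric after correction are assumed to become missing.

```python
# Look-alike letters corrected when a column is read as a variate.
LOOKALIKES = str.maketrans({"O": "0", "I": "1"})


def is_number(cell):
    try:
        float(cell)
        return True
    except ValueError:
        return False


def classify_column(cells, label_limit=0.2):
    """Return ("variate", numbers) or ("factor", cells) for a text column."""
    present = [c.strip() for c in cells if c.strip()]  # blank cells ignored
    labels = [c for c in present if not is_number(c)]
    if present and len(labels) / len(present) < label_limit:
        fixed = [c.strip().translate(LOOKALIKES) for c in cells]
        # Assumption: cells that are still not numbers become missing (None).
        return "variate", [float(c) if is_number(c) else None for c in fixed]
    return "factor", cells


print(classify_column(["12", "1O", "7", "8", "9", "11"]))
# ('variate', [12.0, 10.0, 7.0, 8.0, 9.0, 11.0])
print(classify_column(["A", "B", "A", "C", "1"])[0])  # factor
```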
Column descriptions in row
The cells in the specified row number will be used for column descriptions (the EXTRA keyword for Genstat structures).
Ignore type markers (!#$) in column names
The type markers (! for a factor, # for a variate and $ for a text) are ignored, and the contents of each column decide its type (a variate if it contains only numbers, otherwise a text). If this option is not selected and a column name ends with one of the type markers, the column is read as the specified type.
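The sketch below shows the two ways of choosing a column type in Python, assuming that markers are honoured only when this option is cleared. The function `column_type` and its `ignore_markers` flag are hypothetical stand-ins for the check box.

```python
MARKERS = {"!": "factor", "#": "variate", "$": "text"}


def is_number(cell):
    try:
        float(cell)
        return True
    except ValueError:
        return False


def column_type(name, cells, ignore_markers):
    """Choose a column type from its name marker or from its contents."""
    if not ignore_markers and name and name[-1] in MARKERS:
        return MARKERS[name[-1]]
    present = [c for c in cells if c.strip()]  # blank cells ignored
    return "variate" if all(is_number(c) for c in present) else "text"


print(column_type("Dose#", ["low", "high"], ignore_markers=False))  # variate
print(column_type("Dose#", ["low", "high"], ignore_markers=True))   # text
```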
Read column names in first row
Specifies whether the column names are taken from the first row.
Yes if all labels | If every cell in the first row contains a label or is missing, the column names are taken from the first row. |
Yes | Use the cells in the first row for column names. If a cell contains a number, it is prefixed with “_” to make it a valid Genstat name. |
No | Do not use the first row for column names – instead generate default names. |
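The three settings can be sketched in Python as below. The function `first_row_names` and its mode strings are hypothetical; it returns `None` where Genstat would generate default names, since the default naming scheme is not described here, and it assumes that any numeric cell makes "Yes if all labels" fall back to default names.

```python
def is_number(cell):
    try:
        float(cell)
        return True
    except ValueError:
        return False


def first_row_names(first_row, mode):
    """Return column names from the first row, or None for default names.

    mode is one of "yes if all labels", "yes" or "no".
    """
    if mode == "no":
        return None
    if mode == "yes if all labels" and any(
            is_number(c) for c in first_row if c.strip()):
        return None  # assumption: any numeric cell means default names
    # A numeric heading gets a leading "_" so that it is a valid name.
    return ["_" + c if is_number(c) else c for c in first_row]


print(first_row_names(["Plot", "2019"], "yes if all labels"))  # None
print(first_row_names(["Plot", "2019"], "yes"))  # ['Plot', '_2019']
```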
Commas
This controls how to treat commas in the text on the Clipboard.
Leave | Leave the commas unchanged and interpret the text as normal. |
Change to decimal | Change the commas to decimal points. This may be needed for data in European numeric formatting, where a comma is used as the decimal separator. For example, 3,14 will be converted to 3.14. |
Remove | Remove the commas from the text. This may be needed if commas have been used as thousands separators, for example 21,000. |
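Applied to a single cell's text, the three settings amount to the simple string substitutions sketched below in Python. The function `apply_comma_mode` and its mode names are hypothetical, and the sketch assumes the setting is applied to each cell value on its own.

```python
def apply_comma_mode(text, mode):
    """Apply one of the Commas settings to a cell's text."""
    if mode == "leave":
        return text
    if mode == "change to decimal":
        return text.replace(",", ".")  # 3,14 -> 3.14
    if mode == "remove":
        return text.replace(",", "")   # 21,000 -> 21000
    raise ValueError(f"unknown mode: {mode!r}")


print(apply_comma_mode("3,14", "change to decimal"))  # 3.14
print(apply_comma_mode("21,000", "remove"))           # 21000
```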
Missing value text
Lets you supply text other than ‘*’ to represent missing values; for example, ‘NA’.
Check columns for date values
When selected, Genstat checks all text columns to see whether they contain data in date format, and prompts you to convert any such columns to dates. The default setting for this option can be set on the Spreadsheet Options | Conversions tab.
Date format
Opens the Date Format dialog, which specifies the default date format and base date for columns read in as dates because their names end with :D.