Reads data from an input file, an unformatted file or a text.
Options
PRINT = string tokens |
What to print (data, errors, summary); default erro, summ |
---|---|
CHANNEL = identifier |
Channel number of file, or text structure from which to read data; default current file |
SERIAL = string token |
Whether structures are in serial order, i.e. all values of the first structure, then all of the second, and so on (yes, no); default no, i.e. values in parallel |
SETNVALUES = string token |
Whether to set number of values of vectors from the number of values read (yes, no); default no causes the number of values to be set only for structures whose lengths are not defined already (e.g. by declaration or by UNITS) |
LAYOUT = string token |
How values are presented (separated, fixedfield); default sepa |
END = text |
What string terminates data (* means there is no terminator); default ':' |
SEQUENTIAL = scalar |
To store the number of units read (negative if terminator is met); default * |
ADD = string token |
Whether to add values to existing values (yes, no); default no (available only in serial read) |
MISSING = text |
What character represents missing values; default '*' |
SKIP = scalar |
Number of characters (LAYOUT=fixe) or values (LAYOUT=sepa) to be skipped between units (* means skip to next record); default 0 (available only in parallel read) |
BLANK = string token |
Interpretation of blank fields with LAYOUT=fixe (missing, zero, error); default miss |
JUSTIFIED = string tokens |
How values are to be assumed justified with LAYOUT=fixe (left, right); default righ |
ERRORS = scalar |
How many errors to allow in the data before reporting a fault rather than a warning; a negative setting, -n, causes reading of data to stop after the nth error; default 0 |
FORMAT = variate |
Allows a format to be specified for situations where the layout varies for different units; option SKIP and parameters FIELDWIDTH and SKIP are then ignored (in the variate: 0 switches to fixed format; 0.1, 0.2, 0.3 or 0.4 to free format with space, comma, colon or semi-colon respectively as separators; * skips to the beginning of the next line; in fixed format, a positive integer n indicates an item in a field width of n, -n skips n characters; in free format, n indicates n items, -n skips n items); default * |
QUIT = scalar |
Channel number of file to return to after a fatal error; default * i.e. current input file |
UNFORMATTED = string token |
Whether file is unformatted (yes, no); default no |
REWIND = string token |
Whether to rewind the file before reading (yes, no); default no |
SEPARATOR = text |
Text containing the (single) character to be used in free format; default ' ' |
SETLEVELS = string token |
Whether to define factor levels or labels (according to the setting of FREPRESENTATION) automatically from those that occur in the data (yes, no); default no causes them to be set only when they are not defined already |
TRUNCATE = string tokens |
Truncation of leading or trailing spaces of strings read in fixed format (leading, trailing); default * i.e. none |
CASE = string token |
Whether the case of letters (small and capital) should be regarded as significant or ignored when forming factor labels automatically (significant, ignored); default sign |
LDIRECTION = string token |
How to define the ordering of levels or labels when these are formed automatically (ascending, given); default asce |
Parameters
STRUCTURE = identifiers |
Structures into which to read the data |
---|---|
FIELDWIDTH = scalars |
Field width from which to read values of each structure (LAYOUT=fixe only) |
DECIMALS = scalars |
Number of decimal places for numerical data containing no decimal points |
SKIP = scalars |
Number of values (LAYOUT=sepa) or characters (LAYOUT=fixe) to skip before reading a value |
FREPRESENTATION = string tokens |
How factor values are represented (labels, levels, ordinals); default leve |
Description
Data values can be read into any Genstat data structure using the READ
directive. In its simplest form, you merely list the structure whose values are to be read: for example
READ Weight
The data values for Weight
are then assumed to come on the following line or lines. They are assumed to be in free format, separated one from another by one or more spaces or tabs or new lines, and to be terminated by a colon.
READ
has a PRINT
option with settings:
summary |
to print a summary of the data |
---|---|
data |
to print a copy of the input lines |
errors |
to print a detailed report on any errors in the data |
By default PRINT=summary,errors
.
The CHANNEL
option allows you to read data from another file; this must already have been opened (see the OPEN
directive). You can also read data from a Genstat text structure. Each line of input is then treated as if it had been read from a file. Note: you should use CHANNEL
if you want to use READ
in an IF
or CASE
structure, a FOR
loop or a procedure.
You can read values for more than one structure in a single READ
statement. The values can be taken either serially or in parallel. The default is to take the values in parallel: the first element of each structure is read, then the second element of each, until all the data are read. For example:
a1 b1 c1
a2 b2 c2
a3 b3 c3
a4 b4 c4 :
or
a1 b1 c1 a2
b2 c2
a3 b3 c3 a4 b4 c4 :
Here A
, B
and C
are in parallel, each with four values. The complete set of values for all three structures is given, followed by one terminating colon. The term parallel merely indicates the order in which READ
is to read the values: that is, the first element of each structure, then the second element of each, and so on. It is not necessary for the data to be laid out in neat columns, although this may make a data file easier to work with. Different types of structures can be read in parallel and they may have different kinds of values (numerical or text).
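As a small illustration of a parallel read that mixes a text with a variate, the following sketch reads three units. The identifiers Name and Age and the data values are made up for this example; Age is not declared, so it would be defined automatically as a variate.

```genstat
" Name must be declared as a text; otherwise READ would treat it as a variate. "
TEXT Name
READ Name,Age
Ann 31 Bob 45
Cy 27 :
PRINT Name,Age
```

The line breaks in the data are arbitrary: READ takes one value for Name, then one for Age, and repeats until it meets the terminating colon.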
Alternatively, you can set option SERIAL=yes
to read the structures in series. Then all the values of the first structure are read, followed by all the values for the second structure, and so on, until all the data structures have been read. For example
x1 x2 x3 :
y1 y2 :
z1 z2 z3 z4 z5 z6 :
Here all the values of X
are given first, followed by all the values for Y
, and then all the values for Z
. Unlike the parallel layout, each set of values must end with the terminating colon, so that READ
can tell when to move on to the next structure; this means that the structures can be of different lengths.
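A serial read of this kind might be written as follows. This is only a sketch with invented numerical values; X, Y and Z are not declared, so they would be set up as variates of lengths 3, 2 and 6 from the data.

```genstat
" Each structure has its own block of values and its own terminating colon. "
READ [SERIAL=yes] X,Y,Z
1 2 3 :
4 5 :
6 7 8 9 10 11 :
```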
When you are working interactively, Genstat produces a prompt indicating the name of the data structure and the unit number of the next value it expects to read. If Genstat knows how many values to expect, it will terminate the input automatically, without asking for the terminating colon, if the last value is at the end of a line. However, it is quite correct to include the colon at the end of that line of data if you want. If you type too many values by mistake you will get a warning message telling you that the extra data has been ignored.
If a structure whose values are to be read has not already been declared, Genstat will define it automatically as a variate. Likewise, if the length of a vector is undefined, this too will be set automatically. READ
first checks whether the vector is being read in parallel with other vectors whose lengths have been defined, then it looks to see if a default length has been defined for vectors using the UNITS
directive. If neither of these is available to define the length, it is set to the number of data values that are provided in the input. Lengths of vectors can also be redefined according to the number of data values that are read, by setting option SETNVALUES=yes
. The END
option allows you to define another string of characters to be used instead of a colon to mark the end of the data, or you can set END=*
to indicate that there is no terminating string.
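The sketch below combines the two options just described. The identifier W, the terminator string EOD and the data are assumptions made for illustration; going by the description of SETNVALUES, W should end up with three values rather than the five originally declared.

```genstat
VARIATE [NVALUES=5] W
" Redefine the length of W from the data, and end the data with EOD instead of a colon. "
READ [SETNVALUES=yes; END='EOD'] W
10.5 11.2 9.8 EOD
PRINT W
```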
The values of numerical structures (scalars, variates, matrices, symmetric and diagonal matrices and tables) can be entered in any of the standard forms: for example
1.20 -.2 3e1 -1.25E-2 27
are all valid.
Textual values (strings) in free format must be enclosed within single quotes if they contain any characters that have special meaning to READ
(space, tab, comma, colon, asterisk, backslash, single or double quote). The quotes can be omitted for other strings. For example:
TEXT [NVALUES=5] Country
READ Country
Australia Canada 'Great Britain' U.S.A. 'New Zealand' :
The rules for strings in READ
are thus slightly different to those for lists of strings, where quotes are required for any string that does not start with a letter or contains any character other than letters or digits. Thus Newcastle-on-Tyne
and 500Km
are both valid when read in as data, but not in a TEXT
declaration. Rules for strings in fixed format are described later.
The values of factors are usually represented by their levels. You can change this by setting the FREPRESENTATION
parameter. If you set it to labels
, READ
will accept as values the labels of the factor, using the same rules as for reading textual strings. The strings given as data values must match exactly the labels of the factor if they have been declared. The setting FREPRESENTATION=ordinals
causes READ
to expect an integer in the range 1 up to n, the number of levels declared for the factor. As FREPRESENTATION
is a parameter it can be set to a list of values which are cycled in parallel with the structures to be read. Thus, you are allowed to read several factors in one READ
statement, possibly using a different method for reading each one. The setting of this parameter is ignored for any structures that are not factors, but remember that the list will still be cycled in parallel with these other structures.
If you set option SETLEVELS=yes
, READ
will set up the factor levels or labels according to the values that it finds when reading the data. By default it distinguishes between capital and small letters when forming factor labels, but you can set option CASE=ignored
to ignore the case of letters. Also, by default the levels or labels are sorted into ascending order, but you can set option LDIRECTION=given
to leave them in the order in which they are found in the data file.
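For instance, a factor whose labels are not known in advance could be read as sketched here. The factor name Diet and its data are hypothetical; with LDIRECTION=given the labels should be formed in the order low, high, medium, that is, the order of first appearance in the data.

```genstat
" No levels or labels are declared, so READ is asked to form them from the data. "
FACTOR Diet
READ [SETLEVELS=yes; LDIRECTION=given] Diet; FREPRESENTATION=labels
low high low medium high :
PRINT Diet
```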
The values of pointers are identifiers, that is, names of other data structures. When reading a pointer only simple identifiers are allowed: suffixes cannot be used. For example, Winston
is allowed but Orwell[1984]
is not.
You cannot read formulae or expressions directly. The easiest workaround is to read the required value into a text, which can then be used in an appropriate declaration using either the macro-substitution symbols ##
or the EXECUTE
directive. You cannot read values into compound data structures; these should be formed using the appropriate directives or by reading their components individually.
By default, a missing value should be indicated by an asterisk (*
); this means that any data item that begins with *
is treated as missing. For example, any of the three strings
* *** *789
will be treated as missing. You can use the MISSING
option to change this to any other single character; for example, if you set MISSING='-'
then any negative numbers will be read as missing values.
In free format, values are usually separated by spaces or tabs. The SEPARATOR
option can be used to specify another character to use as a separator. For example you can use a comma:
READ [SEPARATOR=','] Weights
24.3, 25.6, 57.3, 43.8, 45.3,
46.5, 47.9, 97.0, 77.5, 64.3 :
You can use spaces and tabs in addition to the specified separator, so long as the separator is present between each pair of values (except at the end of line, when it may be omitted).
The SEPARATOR
, END
and MISSING
strings are all case-sensitive; for example, END=enddata
is different from END=EndData
. The missing-value and separator characters must be distinct and neither may be part of the END
string.
In free format, the SKIP
option can be used to skip values between complete units of data. For example, with a file in channel 2 containing five columns of data, the statement
READ [CHANNEL=2; SKIP=3] X,Y
would read X
and Y
from the first two columns and skip the final three: Genstat reads the first values of X
and Y
, then skips the next three values before reading the second value of X
, which means that READ
moves on to the next line of the file, and so on. You can also set SKIP=*
to skip directly to the next line of data; you could use this if there were varying numbers of additional columns in the file. By default, SKIP
is zero, so no values are skipped. The SKIP
parameter is interpreted in parallel with the structures whose values are to be read, and indicates how many values should be skipped before reading the value for the corresponding structure.
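As a sketch of the SKIP parameter, suppose the file on channel 2 has four columns and only the first and fourth are wanted (the layout of the file is an assumption made for this example):

```genstat
" X: skip nothing, read column 1.  Y: skip two values (columns 2 and 3), read column 4. "
READ [CHANNEL=2] X,Y; SKIP=0,2
```

If further columns of varying number followed the fourth, the option setting SKIP=* could be added inside the square brackets so that READ moves to the next line after each unit.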
In fixed format, data values are arranged in specific fields on each line of the file. Each field consists of a fixed number of characters. There is no need for separating spaces; the tab character is not permitted, nor are comments. So, depending on how the fields are defined, the sequence of digits 123456
could be interpreted for example as the single number 123456, or two numbers 123 and 456, or three numbers 123, 4 and 56. Data like this are usually produced by special-purpose programs or equipment; for example, automatic data recorders.
To read data in fixed format you set the LAYOUT
option to fixed
, and then specify the format to be used. If the values for a structure always occupy the same number of character positions, you can do this with the FIELDWIDTH
parameter. For example,
READ [CHANNEL=2; LAYOUT=fixed] Weight,Height; FIELDWIDTH=3,5
takes data from channel 2 in fixed format. The data are in parallel: that is, reading across lines of the file, values for Weight
and Height
appear alternately. The FIELDWIDTH
parameter is processed in parallel with the structures to be read, so each item of Weight
data takes up three characters, and each item of Height
data takes up five. If the fieldwidth for a structure is not constant, that is if different layouts are used for different units of the data, then you need to use the FORMAT
option, described later.
Suppose there are 80 characters per line in the file; each pair of Weight
and Height
values takes up 8, and so you have 10 pairs per line. The first line looks like:
Weight1Height1Weight2Height2 ... Weight10Height10
Suppose that the first two values for Weight were 1 and 200, and that the first two for Height were 10 and 1200. Then, using ⊔ to represent a space, the first four items on this line would be:
⊔⊔1⊔⊔⊔10200⊔1200
Genstat is able to identify the separate values 10 and 200 because it is reading a fixed number of characters for each structure.
Genstat input files have a nominal width, set by default to 80. This can be altered by an OPEN
statement to a different value if necessary. When reading in fixed format, each line of input is taken to be exactly this width; shorter lines are extended with spaces (blanks). It is important to make sure that you account for this when setting the options for READ
, otherwise you may read some values from these blank fields (the BLANK
option, described below, explains how the blank fields would be interpreted). In the example above, if the values for Height
occupied four characters instead of five there would be 11 pairs of values per line of 77 characters. Using the default settings, the final three characters on the first line would be read as the 12th value of Weight
, and READ
would then be out of step as the 12th value of Height
would be read in from the beginning of the next line. The simplest solution is to set the file width to 77 in the OPEN
statement, but you can also use the SKIP
option and parameter (see below) or the FORMAT
option to avoid this sort of problem.
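One way to handle the 77-character case with the FORMAT option is sketched below. The file name is hypothetical, and the format assumes exactly 11 Weight and Height pairs on every line, in fields of 3 and 4 characters.

```genstat
OPEN 'heights.dat'; CHANNEL=2; FILETYPE=input
" Read a field of 3 then a field of 4, eleven times, then skip to the start of the next line. "
READ [CHANNEL=2; LAYOUT=fixed; FORMAT=!((3,4)11,*)] Weight,Height
CLOSE 2
```

Because the format moves to the next line after the 77th character, the three padding blanks at the end of each line are never read as data.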
When you are using fixed format, the data terminator must begin within the first field to be read after the final data value: so you must ensure that you set the field widths and position the terminator appropriately. If you are using either the SKIP
option or parameter, you must take care not to skip accidentally over the terminator, as READ
will continue to take input – and probably generate many error messages.
Normally Genstat treats a blank field in fixed-format data as a missing value, and the only indication will be in the count of missing values in the printed summary. You can request warning messages for blank fields by setting the option BLANK=error
. Alternatively, you can cause blanks to be interpreted as zeroes, by setting BLANK=zero
.
Data in fixed format are normally taken to be right-justified: that is, their right-hand ends are flush with the right-hand end of the field; you can have either blanks or leading zeroes (for numbers) in the redundant spaces at the left of the field. You can change this default by setting the JUSTIFIED
option. For example the value 123 can appear in a field of width 5 as:
⊔⊔123 JUSTIFIED=right |
there may be leading blanks (the default), |
---|---|
123⊔⊔ JUSTIFIED=left |
there may be trailing blanks |
00123 JUSTIFIED=left,right |
there must be no blanks, or |
⊔123⊔ JUSTIFIED=* |
there may be leading or trailing blanks. |
In this way, JUSTIFIED
allows you to check the blanks in each field. If a data field contains any blanks that are not allowed by the current setting, an error will be reported. Note that when reading numerical data embedded blanks are never permitted. So a field containing, for example 1⊔2⊔3
, will always produce an error message.
As an example, we can read the values of five scalars using a fixed format with values left-justified in their fields by the following:
SCALAR V,W,X,Y,Z
READ [LAYOUT=fixed;JUSTIFIED=left] V,W,X,Y,Z;\
FIELDWIDTH=4,5,7,4,5
1.235.62⊔678.9⊔⊔3.7810.31:
This reads the values 1.23, 5.62, 678.9, 3.78 and 10.31 into V
, W
, X
, Y
and Z
respectively.
The general principles of the SKIP
option and parameter are discussed in the context of a free format read in the previous section. When reading in fixed format the same ideas apply, but the SKIP
settings now specify numbers of characters to be ignored, instead of numbers of values. Thus, you can obtain exactly the same effect as in the example above by putting
READ [LAYOUT=fixed] V,W,X,Y,Z; FIELDWIDTH=4,4,5,4,5;\
SKIP=0,0,1,2,0
Sometimes fixed format data can be further compressed by omitting the decimal point. The DECIMALS
parameter allows you to re-scale data automatically when it is read (in either fixed or free format).
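A minimal sketch of DECIMALS follows; the identifier Price and the data are invented. On the description of the parameter given above, the two five-character fields 01234 and 00050 would be expected to be stored as 12.34 and 0.50.

```genstat
" Two fields of width 5 with no decimal points; the colon starts the next field. "
READ [LAYOUT=fixed] Price; FIELDWIDTH=5; DECIMALS=2
0123400050:
PRINT Price
```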
When reading textual data in fixed format, the contents of each field are taken exactly as they appear in the input file. There is no need to enclose values in quotes; in fact if you do so, the quotes are treated as part of the data. For example,
TEXT [NVALUES=1] T1,T2,T3,T4
READ [LAYOUT=fixed; SKIP=*] T1,T2,T3,T4; FIELDWIDTH=7,3,4,8
'What's⊔it⊔all⊔about?':
gives text T1
the value 'What's
, text T2
the value ⊔it
, text T3
the value ⊔all
, and text T4
the value ⊔about?'
. Consequently, the only way to represent a missing string in fixed format is by a blank field, as ''
or *
would both be treated literally and stored as data values.
The TRUNCATE
option has settings leading
and trailing
, allowing you to remove initial or trailing spaces in strings that are read in fixed format. For example, if we set TRUNCATE=leading
above, T2
would just contain the two letters it
. By default no truncation takes place.
The rules for reading textual data in fixed format also affect the reading of factors. If you set FREPRESENTATION=labels
and do not request any truncation, the width of the field must equal the number of characters in the label, as for example no⊔
is not the same as no
.
The FORMAT
option allows you to use a variable format. By this we mean that the layout of the values may vary from unit to unit of the data, and may also vary within each unit. For example, suppose you have some meteorological data which was measured daily and that the file also contains some additional summary values at the end of each week. The first eleven lines are reproduced to illustrate the structure of the file:
Monday 5.5 -0.4 0.0 1.9 10.0
Tuesday -1.1 -2.1 0.0 0.0 34.0
Wednesday 0.6 -8.3 1.3 5.4 142.0
Thursday 6.8 -5.7 1.1 0.0 158.0
Friday 10.6 0.5 8.1 0.0 141.0
Saturday 10.7 6.4 8.3 0.0 152.0
Sunday 10.0 1.9 1.0 0.1 237.0
Summary week 1> 10.7 -8.3 4 19.8 7.4 10.0 124.8 237.0
Monday 9.9 2.5 0.0 4.4 229.0
Tuesday 11.4 2.1 8.5 0.3 237.0
Wednesday 11.9 6.3 18.7 0.0 520.0
Suppose the file contains data for 28 days. If you try to read a text and five variates of length 28 then the summaries found after the 7th, 14th, 21st and 28th days would cause an error in READ
. You need to read seven lines, skip one, read seven more, and so on. This can be done by setting the option FORMAT=!( (6)7,*,* )
. This means “read six values, do this seven times, skip to the next line, skip again, then return to the beginning of the format and repeat, until enough data has been read”. The format is made clear by using (6)7
which corresponds to the physical layout of the data, but 42
could have been specified instead, meaning read the next 42 values.
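Put together, the statements for this file might look like the sketch below. The channel number and the identifiers are assumptions: the meaning of the five numerical columns is not stated here, so the variates are simply called V1 to V5.

```genstat
TEXT [NVALUES=28] Day
VARIATE [NVALUES=28] V1,V2,V3,V4,V5
" Six values per line for seven lines, then skip over the weekly summary line. "
READ [CHANNEL=2; FORMAT=!((6)7,*,*)] Day,V1,V2,V3,V4,V5
```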
You can use FORMAT
when reading in either free format or fixed format, and can also switch between the two during the READ
. When you have set FORMAT
, Genstat ignores the SKIP
option and the FIELDWIDTH
and SKIP
parameters, and READ
is controlled entirely by the values of the FORMAT
. These values are not in parallel with the list of structures: they apply to data values in turn, recycling from the beginning when necessary. You set FORMAT
to a variate, which may be declared in advance or can be an unnamed structure as shown above. Each value of this variate is interpreted as follows (where n is a positive integer):
+n read n values (in free format) or one value from a field of n characters (in fixed format);
-n skip the next n values (in free format) or n characters (in fixed format)
* skip to the beginning of the next line
0.0 switch to fixed format
0.1 switch to free format using space as a separator
0.2 switch to free format using comma as a separator
0.3 switch to free format using colon as a separator
0.4 switch to free format using semicolon as a separator
0.5 switch to free format using the setting of the SEPARATOR
option
Using the FORMAT
variate READ
will start in either free format or fixed format, according to the setting of LAYOUT
(by default, LAYOUT=separated
; that is, free format). You can switch between these at any time by specifying a value in the range 0-0.5. Remember that if you use free format, spaces and tabs can also be used in addition to the specified separator, and you must use a separator that is distinct from the END
and MISSING
indicators.
You can read from unformatted files by setting option UNFORMATTED=yes
. The only options that are then relevant are CHANNEL
, REWIND
and SERIAL
. Details of how to create the unformatted files are given in the description of the PRINT
directive.
If you have more data to read than can be stored in the space available within Genstat, you can use the SEQUENTIAL
option of READ
to process the data in smaller batches. This works by reading in some of the data, partially processing it to form an intermediate result, and then overwriting the original data with a new batch that is used to update the intermediate results. This can be repeated until all the data has been read and the final summary is obtained. There are two directives that include facilities specifically designed to work with sequential data input: TABULATE
which forms tabular summaries, and FSSPM
which forms SSPM
data structures for use in linear regression. You can also use other directives, such as CALCULATE
, to process data sequentially, but you will have to program the sequential aspects yourself.
You should first declare the structures to be of some convenient size, such that you will not use up all the work space. You then use READ
as normal, but with the SEQUENTIAL
option set to the identifier of a scalar, which will be used to keep track of how the input is progressing. For example, to read in 10 variates of length 272500:
VARIATE [NVALUES=10000] X[1...10]
READ [CHANNEL=2; SEQUENTIAL=N] X[1...10]
The number of values declared for X[1...10]
defines the size of batch to read (10000 in this example). So, READ
will read the first 10000 units of data (100,000 values), and set N
to 10000 to indicate that is the number of units read. This should be followed by the statements to process the first batch of data, then the READ
can be repeated. Once again N
is set to 10000, indicating that another 10000 units have been read. This can be continued until READ
finds the data terminator, when it sets the sequential indicator to minus the number of values found in the last batch. If this is less than the declared size of the data structures they will be filled out with missing values. In the example given above, after the 28th READ
the variates will each contain 2500 values followed by 7500 missing values, and N
will be set to -2500, indicating that all the data has been read and that the final batch contains only 2500 values. Usually you will use the SEQUENTIAL
facility in conjunction with FSSPM
or TABULATE
which are designed to recognize the different settings of the scalar N
.
The SEQUENTIAL
option is best used within a FOR
loop. You should set the NTIMES
option to a value large enough to ensure that sufficient batches of data are read. The loop should contain the READ
statement and any other statements required to process the data. For example
VARIATE [NVALUES=10000] X[1...10]
SSPM [TERMS=X[]] S
FOR [NTIMES=9999]
READ [PRINT=*;CHANNEL=2;SEQUENTIAL=N] X[]
FSSPM [SEQUENTIAL=N] S
EXIT N.LE.0
ENDFOR
The EXIT
directive is used to jump out of the loop once all the data has been read and processed; this is safer than trying to program an exact number of iterations for the loop. The exit condition includes the case when N
is equal to zero, as this will arise when the batch size exactly divides the total number of units. In the above example, if there were 280000 units of data altogether, the 28th READ
would terminate with N
set to 10000. This is because READ
is unable to look ahead for the terminator, as there may be other statements in the loop, such as SKIP
, which affect how the file is read. The next READ
would immediately find the data terminator, so would exit with N
set to zero. This special case is treated appropriately by FSSPM
and TABULATE
, but you should remember to allow for it if you are programming the sequential processing explicitly.
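If you program the sequential processing yourself, the zero case can be handled with an extra test, as in this sketch, which accumulates a grand total of one variate. The identifiers Y and Total and the batch size are hypothetical, and the sketch assumes that the SUM function ignores the missing values that pad out the final batch.

```genstat
VARIATE [NVALUES=1000] Y
SCALAR Total; VALUE=0
FOR [NTIMES=9999]
  READ [PRINT=*; CHANNEL=2; SEQUENTIAL=N] Y
  " N equal to zero: the terminator was found straight away, so there is nothing to add. "
  EXIT N.EQ.0
  CALCULATE Total = Total + SUM(Y)
  " N negative: this was the final, partly filled batch. "
  EXIT N.LT.0
ENDFOR
PRINT Total
```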
You can use the SEQUENTIAL
option to read data from more than one input channel, perhaps when a large data set is split into two or more files, but you are not allowed to read data from the current input channel (that is, the channel containing the READ
statement). If you want to process several structures sequentially from the same file, you must read them in parallel. You must also be careful not to modify the value of the scalar, N
, within the loop when using sequential data input with FSSPM
or TABULATE
, as that could interfere with the sequential processing.
Another means of handling large amounts of data is provided by the ADD
option. This allows you to add values to those already stored in a structure, thus forming cumulative totals without having to store all the individual data values. You must set SERIAL=yes
with ADD=yes
; and it is allowed only for variates. For example:
VARIATE [NVALUES=6] A
READ [ADD=yes; SERIAL=yes] 3(A)
5 12 9 * * 9 :
8 1 3 * 2 10 :
3 4 0 * 11 * :
This starts by assigning the values 5, 12, 9, *, *, and 9 to A
. Then A
is read again, and its values become 13, 13, 12, *, 2, 19: with ADD=yes
(and only then) missing values are interpreted as zeroes when being added to non-missing values. Finally A
contains the values 16, 17, 12, *, 13, 19.
If you have used the UNITS
directive to specify a variate or text containing unit labels, READ
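will respect the order of these values when reading other structures in parallel with the units structure; in other words the data are re-ordered to match the order of the unit labels. If the units structure does not already have values, READ
will define the order of the units as the order in which it finds them in the data. This means that if you are reading several sets of data, each having a column for the unit number (or label), the first use of READ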
will define order of the units as the order in which it finds them in the data. This means that if you are reading several sets of data, each having a column for the unit number (or label), the first use of READ
will define the unit order and subsequent READ
statements will ensure that this order is maintained consistently in the remaining data. If a value is specified more than once when defining the units structure, READ
will only ever locate the first occurrence of that unit label. If a unit label is repeated in the data then only the final set of values corresponding to that unit will be stored; earlier occurrences are overwritten by subsequent ones. If you try to read a value that is not present in the units structure this is regarded as a fault. Also, if the units structure contains missing values it cannot be used to re-order the data and will instead be overwritten by the new values: a warning message is printed out to tell you if this occurs. If you use the option SETNVALUES=yes
when reading structures in parallel with the units vector, the other structures will all be set to the current unit length.
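The re-ordering can be pictured with a small sketch. The identifiers Plot and Yield and the data are invented; following the description above, Yield would be expected to be stored in the order of the unit labels a, b, c, that is, as 1.2, 2.4 and 3.1.

```genstat
TEXT [VALUES=a,b,c] Plot
UNITS Plot
" The data arrive in the order c, a, b but are matched against the labels in Plot. "
READ Plot,Yield
c 3.1 a 1.2 b 2.4 :
PRINT Plot,Yield
```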
When you are working interactively and typing data from the keyboard, READ
will halt immediately it finds an invalid value. You should type the correct value and then continue with the rest of the data. If you had typed several items of data then all those before the erroneous value will have been read and stored, but any remaining values will have been discarded, and so will need to be retyped. When you are reading data in batch, it is not possible to recover from errors in this way. Instead, READ
will continue processing the data, substituting missing values for any data that it cannot read, and printing out a message for every error that is found.
If errors occur when running in batch, a fault will be generated when READ
terminates, thus terminating the job. This is to avoid spurious output being produced from analyses based on incorrect data. You can override this by using the options ERRORS
and QUIT
. If you set ERRORS=n
, where n is a positive integer, then up to n errors are allowed in the data before READ
generates a fault. You might want to do this if you knew certain items of data were going to generate errors, but were prepared to accept them as missing values so that you could analyse the rest of the data. Obviously, you need to be very careful when doing this, as there may be other unexpected errors in the data. Usually you would have to try reading the data once without setting ERRORS
, so you could check all the messages, and find what value of n is appropriate. Then the READ
statement would have to be repeated, setting ERRORS
and REWIND
in order to read the data. For example, if missing values of a factor had been typed in as the letter X
, you would not want to define X
as an extra level of the factor, but if you set MISSING='X'
any numerical data that used *
for missing value could not be read either.
READ
produces a message for every data value that contains an error. This can be very useful, as you then have the opportunity to correct all the errors at once, before trying to read the data again. However, the error messages may not be due to errors in the data, but may be caused by an incorrectly specified READ
statement. For example, if you are reading many structures in parallel and specify texts and variates in the wrong order in the list of structures to be read, you will get an error message every time Genstat finds a piece of text rather than a number in the position specified for a variate. This is not likely to be a problem, unless you are reading large amounts of data, when you might end up with thousands of lines of needless error messages. A sensible precaution then is to request Genstat to abort the READ
if more than a specified number of errors occur. You can do this by setting ERRORS
to a negative integer, -n. This means that up to n errors are allowed in the data, but READ
will abort if any more occur, switching control to the channel specified by QUIT
(that is, starting or continuing to read Genstat statements from that channel). If you are working in batch a fault will be generated that inhibits execution of further statements, but interactively you have the opportunity to examine the data that have been read in so far, which may help identify any problems in the original READ
statement or declarations of your data.
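For example, a large parallel read could be protected as sketched here; the channel number, the identifiers and the limit of 25 are arbitrary choices for illustration.

```genstat
TEXT Name
" Abandon the read once more than 25 errors have been found. "
READ [CHANNEL=2; ERRORS=-25] Name,Age,Height
```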
Options: PRINT
, CHANNEL
, SERIAL
, SETNVALUES
, LAYOUT
, END
, SEQUENTIAL
, ADD
, MISSING
, SKIP
, BLANK
, JUSTIFIED
, ERRORS
, FORMAT
, QUIT
, UNFORMATTED
, REWIND
, SEPARATOR
, SETLEVELS
, TRUNCATE
, CASE
, LDIRECTION
.
Parameters: STRUCTURE
, FIELDWIDTH
, DECIMALS
, SKIP
, FREPRESENTATION
.
Action with RESTRICT
READ
ignores any restrictions.
See also
Directives: OPEN
, COPY
, RETRIEVE
, SKIP
, SPLOAD
.
Procedures: FILEREAD
, IMPORT
, DBIMPORT
, TX2VARIATE
.
Commands for: Input and output.
Example
" Example READ-1: Reading parallel free-format data"

" Open a data file on the second channel for input."
OPEN '%gendir%/examples/READ-1.DAT'; CHANNEL=2; FILETYPE=input

" Ignore the first three lines, and then read values of six variates,
  recorded in parallel (the default) in free format - allowing one
  error, (known to be harmless), which READ reports in a warning."
SKIP [CHANNEL=2] 3
READ [CHANNEL=2; PRINT=data,error; ERROR=1] QQ[1...6]
PRINT QQ[]

" If you continue reading from the same file attached to a channel,
  you can read further data recorded after the first end-of-data marker."
READ [CHANNEL=2; PRINT=data,error] ZZ
PRINT ZZ

" It is good practice to close files after you have finished with them."
CLOSE 2

" The data can be recorded in the file with the commands: in this case,
  it must follow the READ command, or the end of a FOR loop if READ is
  in a loop, or the invocation of a procedure if the READ command is in
  a procedure."
TEXT Text
READ [PRINT=data,errors] Numbers,Text
23 Apples 22 Pears 31 Oranges 10 Bananas 4 Peaches :
PRINT Numbers,Text