- History of CODEBOOK.BAS by Jim Groeneveld:
- -----------------------------------------------------------------------------
- NIPG TNO - - - - - <work> - - - - -|- <home> - - - - -| GROENEVELD@HDETNO51
- Postbus 124 | Wassenaarseweg 56 | Schoolweg 14 | JIM%RULTNO@HDETNO51
- 2300 AC Leiden | 2333 AL Leiden | 8071 BC Nunspeet | TNOSUR::GROENEVELD
- Nederland (NL) 071-178810 | 03412-60413 | RULTNO::JIM
- -----------------------------------------------------------------------------
-
- vs. 0.0a, 19 October 1988: initial version, called UNFORMAT.BAS.
- vs. 0.0b, 20 October 1988:
- - added question for (missing) value to replace entirely blank
- fields; this value may be alphanumeric; if it contains
- blanks, commas, etc., it should be surrounded by double
- quotes; originally this value was fixed to -1.
- - removed question for record length of input database file;
- record length is now determined by the program and
- only used to determine the end of a record.
- Because it is determined for each record separately,
- records may actually be of variable length, though each
- should be long enough to contain the columns to be read.
- Space is reserved for a record length of at most 1274.
- - improved feedback on screen while processing cases.
- - added question for (max.) number of variables per output file.
- vs. 0.0c, 26 October 1988:
- - added possibility to read comment lines NOT starting with a SPACE
- - improved error report on field widths not matching columns:
- added reporting (STATGRAPHICS) variable name.
- vs. 0.0d, 27 October 1988:
- - added check for database file name with numerical extension,
- which is reserved for output file names.
- - added default values for "missing" value and number of variables.
- - added check for existing output files and question whether to
- overwrite them or not.
- vs. 0.0e, 1 November 1988:
- - corrected possible misinterpretations while 'reading' past EOL
- and report of such occurrences.
- - added check for unequal record lengths and appropriate report.
- - added checks for illegal field widths and columns and report.
- vs. 0.0f, 8 November 1988:
- - added question for number of variables in order to reserve space
- up to a number of 32767.
- - added question for maximum record length in order to reserve
- space and check for exceeding of this maximum. This maximum
- may be any number up to 32767*255-1=8355584.
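The limit quoted above can be checked with a few lines of arithmetic. This is only a sketch in Python, not part of the BASIC program. The constant names are hypothetical, and reading the two factors as "at most 32767 pieces of at most 255 characters each" is an assumption; the notes give only the product.

```python
# Hypothetical names; only the two factors and the "-1" come from the notes.
MAX_INTEGER = 32767        # largest value of a 16-bit signed integer
MAX_STRING_LENGTH = 255    # assumed to be the maximum string (line) length

# Maximum record length as stated in the history.
max_record_length = MAX_INTEGER * MAX_STRING_LENGTH - 1
print(max_record_length)
```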
- vs. 0.0g, 9 November 1988:
- - added optional automatic adaptation of maximum record length
- to actual maximum record length up to 32767*255-1=8355584.
- The actual maximum record length (to determine the number
- of data lines per record) is determined from the columns
- to be read from the codebook file as well as during
- reading the actual records.
- - added optional automatic adaptation of maximum number of
- variables to the actual number of variables up to a
- maximum of 32767. This number is deduced from the codebook
- file and is increased by 10 during the run whenever
- necessary. This process, however, slows down execution
- significantly with more than 100 variables.
- - removed report of record length of first case.
- - added report of minimum and maximum record lengths read.
- vs. 0.1, 13 July 1989:
- - changed original program name UNFORMAT.BAS into CODEBOOK.BAS.
- - changed increment of 10 with auto-adapt to actual number of variables
- into 100 (may be varied by changing a constant in the program source).
- - corrected ability to use lengths and columns > 32767 up to 8355584 by
- changing the relevant integer variables into single precision
- variables. (Previously only values up to 32767 were actually possible.)
- - corrected ability to use specific counts > 32767, practically without limit, by
- changing the relevant integer variables into double precision
- variables. (Previously only values up to 32767 were actually possible.)
- - added optional removal of leading and trailing spaces from field values.
- - added choice between BLANK or COMMA delimited output file(s).
- - removed limit of 64 variables per output file (limit now is 32767).
- - changed default of 10 variables per output file into 58 (for STATGRAPHICS).
- vs. 0.2, 17 July 1989:
- - added optional check for (case sensitive) identical variable names.
- - added enclosing within single or double quotes of character values from
- character variables with a single or double quote in the first column of
- the corresponding description lines within the codebook file;
- for use with values containing characters like spaces, commas and quotes;
- embedded quotes are doubled, but may not always be readable as such by
- application programs; this is the user's own responsibility.
- With this feature all possible character values can now be converted.
- - some improved (more specific) error reports.
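The quoting rule introduced in vs. 0.2 can be illustrated with a minimal sketch. It is written in Python rather than BASIC and is not the program's own code. The function name `quote_value` is hypothetical, and the sketch assumes that only embedded quotes of the same kind as the enclosing quote are doubled, which the notes do not state explicitly.

```python
def quote_value(value, quote='"'):
    """Enclose a character value in quotes, doubling embedded quotes.

    Assumption: only quotes of the same kind as the enclosing quote
    character are doubled.
    """
    if quote not in ("'", '"'):
        raise ValueError("quote must be a single or a double quote")
    return quote + value.replace(quote, quote * 2) + quote


print(quote_value('say "hi", ok'))   # double quotes around the value
print(quote_value("it's", "'"))      # single quotes around the value
```

Whether a reading program turns the doubled quotes back into single ones depends on that program, as the note above warns.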
- vs. 0.3, 24 July 1989:
- - added check for number of output files. Because that number will be the
- extension of the output file, it may not exceed 999. It is calculated
- from the total number of variables in the codebook file and the user
- specified (maximum) number of variables per output file. If the number is
- larger than 999 a minimum number of variables per output file will be
- calculated and displayed.
- - added warnings for time consuming garbage collection and auto-adaptation.
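The check on the number of output files added in vs. 0.3 can be sketched as follows. This is a Python illustration under stated assumptions, not the BASIC source: the function names are hypothetical and rounding up (a last, partly filled file) is assumed. Only the limit of 999 comes from the notes.

```python
MAX_OUTPUT_FILES = 999  # the file number is the extension, so at most 999


def output_file_count(total_vars, vars_per_file):
    """Number of output files needed, rounding up for a partly filled last file."""
    return -(-total_vars // vars_per_file)  # ceiling division on integers


def min_vars_per_file(total_vars):
    """Smallest number of variables per file that needs at most 999 files."""
    return -(-total_vars // MAX_OUTPUT_FILES)


total = 32767                      # maximum number of variables
print(output_file_count(total, 58))
print(output_file_count(total, 32) > MAX_OUTPUT_FILES)
print(min_vars_per_file(total))
```

With the maximum of 32767 variables, 32 variables per file would need more than 999 files, so under these assumptions the program would report 33 as the minimum.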
- vs. 0.4, 25 July 1989:
- - added default responses for all possible prompts and changed some prompts.
- - removed prompt for maximum record length. Maximum record length now is set
- initially at a minimum value of 254 (MAX.LINE.INPUT.LENGTH-1) and is
- adapted to the actual necessary length automatically deduced from the
- codebook file. This length now only specifies the maximum column number
- to interpret. Input records may now be of an 'infinite' length. The
- remaining part of each record is processed, but not interpreted.
- Additionally some single precision variables have necessarily been changed
- into double precision variables.
- - changed increment for automatic adaptation to actual number of variables
- from 100 to the initial (negative, user specified) number of variables.
- - added padding with spaces of values from incomplete fields (reading past EOL),
- which are possibly replaced by the missing value(s).
- vs. 1.0, 26 July 1989:
- - added possibility of specifying a global missing value consisting of one
- or more spaces.
- - removed limit of 10 characters for variable names; the limit now is 255!
- - added an additional output file type: FIXED formatted (next to BLANK and
- COMMA delimited), in which all values, the missing value and the variable name
- of one variable have the same output field width (if necessary truncated
- from the left or right justified). All fields are contiguous. This offers
- the possibility to extract values of a limited set of variables from an
- original fixed formatted database file into another fixed formatted file.
- The quote specification (the first column in the codebook file) is ignored.
- - added a further output file type: Report, a special case of
- a Fixed formatted file, but with additional empty columns (1..9) between
- the fields. These 'empty' columns are used, where necessary, to fit in a variable
- name or missing value that is longer than the actual field width.
- Additionally added prompt for page length in lines, default 60.
- - made placement of a header line with variable names in the output files
- optional, default present with BLANK and COMMA delimited and Report output
- files and not present with FIXED formatted output files.
- - completely redesigned and rewritten algorithm for file name checking,
- which wasn't correct for subdirectory names; improved error report.
-
- Possible future features (if sufficiently needed):
- + inclusion of optional output of automatic CaseNumbers as the first variable
- of each output file.
- + addition of an optional second line (record) with missing values for
- each variable, though I don't know of any programs using this info.
- + specification of maximum output record length instead of number of variables
- per output file (as a negative value, default -640). For each output file the
- maximum number of variables that will fit within this length will be
- calculated from the maximum per variable of the actual record length, the
- delimiter length, the lengths of the variable name and missing value and the
- length of the extra spaces in Report output files. Requires many extra
- calculations or extra array space remembering either the maximum field width
- for each variable as described above or the number of variables in each of the
- max. 999 output files. (The maximum field width may then also be used to
- improve the automation of the generation of REPORT type output files.)
- + specification of delimiting character in REPORT type output files (space,|).
- + specification of number of extra delimiting spaces within Report type output
- files per variable in the codebook file (requires additional large array).
- + inclusion of optional page numbers, date and time per page of Report output.
- + counting the number (and calculating the fraction) of missing values
- (contiguous spaces) for each variable.
- + recoding facilities other than for only blank fields (would require many
- extra arrays that take up valuable memory space).
-