Simtel MSDOS 1992 June

Text File | 1989-07-26 | 9.6 KB | 152 lines

History of CODEBOOK.BAS by Jim Groeneveld:
-----------------------------------------------------------------------------
NIPG TNO       - - - <work> - - - -|- - <home> - - - -| GROENEVELD@HDETNO51
Postbus 124    | Wassenaarseweg 56 | Schoolweg 14     | JIM%RULTNO@HDETNO51
2300 AC Leiden | 2333 AL Leiden    | 8071 BC Nunspeet | TNOSUR::GROENEVELD
Nederland (NL)   071-178810        | 03412-60413      | RULTNO::JIM
-----------------------------------------------------------------------------

vs. 0.0a, 19 October 1988: initial version, called UNFORMAT.BAS.

vs. 0.0b, 20 October 1988:
- added question for (missing) value to replace entirely blank fields;
  this value may be alphanumeric; if it contains blanks, commas, etc. it
  should be surrounded by double quotes; originally this value was fixed
  to -1.
- removed question for record length of input database file; record length
  is now determined by the program and only used to determine the end of a
  record. Because it is determined for each record separately, records may
  actually be of variable length, though each should be long enough to read
  the specified columns from. Space is reserved for a record length of at
  most 1274.
- improved feedback on screen while processing cases.
- added question for (max.) number of variables per output file.

vs. 0.0c, 26 October 1988:
- added possibility to read comment lines NOT starting with a SPACE.
- improved error report on field widths not matching columns: added
  reporting (STATGRAPHICS) variable name.

vs. 0.0d, 27 October 1988:
- added check for database file name with numerical extension, which is
  reserved for output file names.
- added default values for "missing" value and number of variables.
- added check for existing output files and question whether to overwrite
  them or not.

vs. 0.0e, 1 November 1988:
- corrected possible misinterpretations while 'reading' past EOL and report
  of such occurrences.
- added check for unequal record lengths and appropriate report.
- added checks for illegal field widths and columns and report.

vs. 0.0f, 8 November 1988:
- added question for number of variables in order to reserve space up to a
  number of 32767.
- added question for maximum record length in order to reserve space and
  check for exceeding of this maximum. This maximum may be any number up to
  32767*255-1=8355584.

vs. 0.0g, 9 November 1988:
- added optional automatic adaptation of maximum record length to actual
  maximum record length up to 32767*255-1=8355584. The actual maximum
  record length (to determine the number of data lines per record) is
  determined from the columns to be read from the codebook file as well as
  during reading the actual records.
- added optional automatic adaptation of maximum number of variables to the
  actual number of variables up to a maximum of 32767. This number is
  deduced from the codebook file and is updated during the run every time
  it is necessary by increasing it by 10. This process, however, slows down
  execution significantly with more than 100 variables.
- removed report of record length of first case.
- added report of minimum and maximum record lengths read.

vs. 0.1, 13 July 1989:
- changed original program name UNFORMAT.BAS into CODEBOOK.BAS.
- changed increment of 10 with auto-adapt to actual number of variables
  into 100 (may be varied by changing a constant in the program source).
- corrected ability to use lengths and columns > 32767 up to 8355584 by
  changing certain appropriate integer variables into single precision
  variables. (Previously, only values up to 32767 were actually possible.)
- corrected ability to use specific counts > 32767 up to almost infinite by
  changing certain appropriate integer variables into double precision
  variables. (Previously, only values up to 32767 were actually possible.)
- added optional removing of leading and trailing spaces of field values.
- added choice between BLANK or COMMA delimited output file(s).
- removed limit of 64 variables per output file (limit now is 32767).
- changed default of 10 variables per output file into 58 (for
  STATGRAPHICS).

vs. 0.2, 17 July 1989:
- added optional check for (case sensitive) identical variable names.
- added enclosing within single or double quotes of character values from
  character variables with a single or double quote in the first column of
  the corresponding description lines within the codebook file; for use
  with values containing characters like spaces, commas and quotes;
  embedded quotes are doubled, but may not always be readable as such by
  application programs; this is the user's own concern. With this feature
  all possible character values may be converted now.
- some improved (more specific) error reports.

vs. 0.3, 24 July 1989:
- added check for number of output files. Because that number will be the
  extension of the output file, it may not exceed 999. It is calculated
  from the total number of variables in the codebook file and the user
  specified (maximum) number of variables per output file. If the number is
  larger than 999 a minimum number of variables per output file will be
  calculated and displayed.
- added warnings for time consuming garbage collection and auto-adaptation.

vs. 0.4, 25 July 1989:
- added default responses for all possible prompts and changed some prompts.
- removed prompt for maximum record length. Maximum record length now is
  set initially at a minimum value of 254 (MAX.LINE.INPUT.LENGTH-1) and is
  adapted to the actual necessary length automatically deduced from the
  codebook file. This length now only specifies the maximum column number
  to interpret. Input records may now be of an 'infinite' length. The
  remaining part of each record is processed, but not interpreted.
  Additionally some single precision variables have necessarily been
  changed into double precision variables.
- changed increment for automatic adaptation to actual number of variables
  from 100 to the initial (negative, user specified) number of variables.
- added padding with spaces of values from incomplete fields (reading past
  EOL), possibly being replaced by the missing value(s).

vs. 1.0, 26 July 1989:
- added possibility of specifying a global missing value consisting of one
  or more spaces.
- removed limit of 10 characters for variable names; limit now is 255!
- added an additional output file type: FIXED formatted (next to BLANK and
  COMMA delimited) in which all values, the missing value and variable name
  of one variable have the same output field width (if necessary truncated
  from the left or right justified). All fields are contiguous. This offers
  the possibility to extract values of a limited set of variables from an
  original fixed formatted database file into another fixed formatted file.
  The quote specification (the first column in the codebook file) is
  ignored.
- added another additional output file type: Report, a special case of a
  FIXED formatted file, but with additional empty columns (1..9) between
  the fields. These 'empty' columns are used, if necessary, to fit in a
  variable name or missing value that is longer than the actual field
  width. Additionally added prompt for page length in lines, default 60.
- made placement of a header line with variable names in the output files
  optional, default present with BLANK and COMMA delimited and Report
  output files and not present with FIXED formatted output files.
- completely redesigned and rewritten algorithm for file name checking,
  which wasn't correct for subdirectory names; improved error report.

Possible future features (if necessary enough):
+ inclusion of optional output of automatic CaseNumbers as the first
  variable of each output file.
+ addition of optional additional (second) line (record) with missing
  values for each variable, though I don't know of any programs using this
  info.
+ specification of maximum output record length instead of number of
  variables per output file (as a negative value, default -640). For each
  output file the maximum number of variables that will fit within this
  length will be calculated from the maximum per variable of the actual
  record length, the delimiter length, the lengths of the variable name and
  missing value and the length of the extra spaces in Report output files.
  Requires many extra calculations or extra array space remembering either
  the maximum field width for each variable as described above or the
  number of variables in each of the max. 999 output files. (The maximum
  field width may then also be used to improve the automation of the
  generation of REPORT type output files.)
+ specification of delimiting character in REPORT type output files
  (space,|).
+ specification of number of extra delimiting spaces within Report type
  output files per variable in the codebook file (requires additional large
  array).
+ inclusion of optional page numbers, date and time per page of Report
  output.
+ counting the number (and calculating the fraction) of missing values
  (contiguous spaces) for each variable.
+ recoding facilities other than for only blank fields (would require many
  extra arrays that take up valuable memory space).
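
Illustration of the vs. 0.3 check on the number of output files. This is
only a minimal sketch in Python, not taken from the BASIC source. The names
plan_output_files, ceil_div and MAX_OUTPUT_FILES are hypothetical, and it is
an assumption that the number of files is the total number of variables
divided by the number of variables per file, rounded up, and that the
suggested minimum is the smallest number of variables per file that keeps
the count at 999 or below.

```python
# Assumption: the output file extension is a number, so at most 999 files.
MAX_OUTPUT_FILES = 999


def ceil_div(a, b):
    """Integer division rounded up, without floating point."""
    return -(-a // b)


def plan_output_files(total_vars, vars_per_file):
    """Return (number of output files, minimum variables per file).

    The second value is the smallest number of variables per file for
    which the number of files does not exceed MAX_OUTPUT_FILES.
    """
    if total_vars < 1 or vars_per_file < 1:
        raise ValueError("both counts must be positive")
    n_files = ceil_div(total_vars, vars_per_file)
    min_vars_per_file = ceil_div(total_vars, MAX_OUTPUT_FILES)
    return n_files, min_vars_per_file


if __name__ == "__main__":
    for total, per_file in [(100, 58), (32767, 10)]:
        n_files, min_per_file = plan_output_files(total, per_file)
        if n_files > MAX_OUTPUT_FILES:
            print(total, "variables at", per_file, "per file need",
                  n_files, "files; use at least", min_per_file, "per file")
        else:
            print(total, "variables at", per_file, "per file need",
                  n_files, "files")
```

With the default of 58 variables per file, 100 variables fit in 2 files.
With 32767 variables and 10 per file the count exceeds 999, so a larger
number of variables per file would have to be suggested to the user.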