- 23rd December, 1990
-
- About TS2ST in General (Multiple regression analysis)
- ======================
-
- Add a question mark (?) to the program call for a brief description of a
- program.
-
- This package may be used and distributed freely for NON-COMMERCIAL,
- NON-INSTITUTIONAL, PRIVATE purposes, provided it is not changed in any way.
- ┌────────────────────────────────────────────────────────────────────────────┐
- │ For ANY other usage (such as use in a business enterprise or a university) │
- │ or the full scale version contact the author for a personal or a site │
- │ license. │
- └────────────────────────────────────────────────────────────────────────────┘
- Please do not distribute any part of this package separately. Uploading to
- BBSes is encouraged.
-
- The registered version is strictly for the registrant only. Identical
- programs must NOT be running on more than one computer at a time. Site
- licensed programs must not be run outside the licensed site.
-
- The programs are under development. Comments and contacts are solicited. If
- you have any questions, please do not hesitate to use electronic mail for
- communication.
- InterNet address: ts@chyde.uwasa.fi (preferred)
- Funet address: GADO::SALMI
- Bitnet address: SALMI@FINFUN
- FidoNet address: 2:515/1 (Micro Maniacs Opus, To: Timo Salmi)
-
- The author shall not be liable to the user for any direct, indirect or
- consequential loss arising from the use of, or inability to use, any program
- or file howsoever caused. No warranty is given that the programs will work
- under all circumstances.
-
- Timo Salmi
- Professor of Accounting and Business Finance
- School of Business Studies, University of Vaasa
- P.O. BOX 297, SF-65101 Vaasa, Finland
-
- CONTENTS:
-
- 1. Acknowledgements
- 2. General Description
- 3. Release Notes
- 4. The Statistics Set for Other Computers
-
-
- 1. ACKNOWLEDGEMENTS
-
- In developing and testing multiple regression programs for the VAX 11/750
- during 1983-86 I have had many useful discussions with my colleague Martti
- Luoma (Associate Professor of Statistics). This has directly benefited
- programming the current STATREGR program for PC compatibles.
- I have also had useful suggestions from Antti Kanto (Acting Associate
- Professor of Statistics) in developing STATREGR.
-
-
- 2. GENERAL DESCRIPTION
-
- STATREGR (Ver 1.8)
-
- STATistics: multiple REGRession analysis is part of the interactive
- statistical system by Timo Salmi. It is the second program in the set.
- The first program in the set is STATistical MEASures (STATMEAS in
- TS1STxx.ARC), which is intended for univariate analysis. The third program
- in the set is STATistics: TRANsformations (STATTRAN in TS3STxx.ARC), which
- can be used for transforming the observations, and, if necessary, also as an
- editor. The fourth program in the set is STATistics: Ranks and CORrelations
- (STATRCOR in TS4STxx.ARC). The fifth program in the set is STATistics: Least
- Absolute Deviation multiple Regression (STATLADR in TS5STxx.ARC).
-
- STATREGR includes a handy built-in help system, which can be invoked by
- typing ? at any interactive question. Because of this built-in help, and the
- interactive nature of the program's user interface, no long-winded
- instructions have been included. (Who reads instructions anyhow?)
-
- The program performs an ordinary least squares (OLS) multiple regression
- analysis, that is, it estimates the coefficients of
- Y = a + b(1)X(1) + ... + b(M)X(M)
- from a set of observations. Furthermore, it draws (or rather writes)
- low-resolution scatter diagrams of the data and of the regression analysis.
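-
- As an illustration of what such an estimation involves, the following is a
- minimal Python sketch, not STATREGR's own code. It forms the cross-product
- matrix of the observations (with a constant column for the intercept a) and
- solves the normal equations by Gauss-Jordan elimination. The function names
- solve and ols, and the sample data, are assumptions made for this example.
- Missing items (#) are not handled here.
-

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    # Work on an augmented copy so the caller's matrix is left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: pick the largest remaining entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("cross-product matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(xs, ys):
    """Return [a, b(1), ..., b(M)] for Y = a + b(1)X(1) + ... + b(M)X(M)."""
    rows = [[1.0] + list(x) for x in xs]  # constant column for the intercept
    k = len(rows[0])
    # Cross-product matrix X'X and vector X'Y of the normal equations.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from Y = 1 + 2 X(1) + 3 X(2).
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coefficients = ols(xs, ys)
```

- With exact data like this the estimates should reproduce the generating
- coefficients up to rounding error.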
-
- The data can either be given from the keyboard or taken from a file. If the
- input is to be taken from a file it must first be prepared with some editor,
- or some word processor which includes an option for preparing ordinary ASCII
- text. (STATTRAN can also be used for this purpose.)
- The data is given to the program in the following format:
-
- X1 X2 X3 !variable names (! denotes a comment)
- 3.56 6.32 -1.73
- 5.12 -4.21 9.18
- 14.2 5.11 0.31
- END !END is optional in a file
-
- A missing item in an observation is marked by a hash (#). E.g. if the first
- item of the second observation were missing, the observation should be
- written as # -4.21 9.18
- The items in an observation can be separated with blanks, as in the above,
- or with commas (,) e.g. 5.12,-4.21,9.18. The number of the intervening
- blanks is irrelevant, and can be customized for increased readability. Thus
- e.g. 5.12 -4.21 9.18 and 5.12    -4.21    9.18 are equivalent.
- A row can be continued using an ampersand (&). E.g. the variables could
- be given as
- X1 X2 &
- X3
- Alternatively, * or \ can be used instead of & as the continuation marker.
-
- Comments can be added to the input data. If ! appears on a line all text
- after ! will be considered as a comment.
-
- A header can be entered on each page if output is directed to a file.
- To accomplish this, start the very first line of the input file with a
- double exclamation mark (!!) and the rest of the line will be used as the
- header. Thus !! indicates a header, a single ! an ordinary comment.
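-
- The format rules above can be summarized in a short Python sketch. This is
- an illustrative reader written for this description, not the parser used by
- STATREGR. The function name parse_data is hypothetical, and details that the
- text does not specify (such as whether END is case sensitive) are
- assumptions.
-

```python
import re


def parse_data(text):
    """Parse text in the described format into (header, names, observations)."""
    lines = text.splitlines()
    header = None
    # A double exclamation mark on the very first line gives the page header.
    if lines and lines[0].startswith("!!"):
        header = lines[0][2:].strip()
        lines = lines[1:]

    rows, pending = [], ""
    for line in lines:
        line = line.split("!", 1)[0].strip()  # text after ! is a comment
        if not line:
            continue
        if line[-1] in "&*\\":  # &, * or \ continues the row on the next line
            pending += line[:-1] + " "
            continue
        rows.append(pending + line)
        pending = ""
    if pending.strip():
        rows.append(pending)

    names, observations = None, []
    for row in rows:
        # Items are separated by any number of blanks, or by commas.
        items = [t for t in re.split(r"[,\s]+", row) if t]
        if names is None:
            names = items  # the first row holds the variable names
        elif len(items) == 1 and items[0].upper() == "END":
            break  # END is optional in a file
        else:
            # A hash marks a missing item; it is stored as None here.
            observations.append([None if t == "#" else float(t) for t in items])
    return header, names, observations


sample = """!! Demo run
X1 X2 &
X3            ! variable names
3.56 6.32 -1.73
#,-4.21,9.18
14.2   5.11   0.31
END
"""
header, names, observations = parse_data(sample)
```

- Here the second observation has its first item missing, so it is read as
- [None, -4.21, 9.18].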
-
- The maximum number of variables is 25. The maximum number of observations
- is 400 (for each variable). The public domain version, however, sets the
- limits at 4 and 100 respectively.
-
-
- 3. RELEASE NOTES
-
- Version 1.1 of STATREGR includes some minor changes and corrections in the
- user interface.
-
- Version 1.2 of STATREGR introduces CGA high-resolution diagrams for the
- analysis. Unlike the low-resolution text-mode diagrams, the high-resolution
- diagrams will always be drawn on the screen. (In the former case, directing
- the output to a file is also possible).
- Further overflow checks have been added to prevent the program crashing
- because of bad data.
- Multiple regression estimates are based on inverting a cross-product matrix
- of the observations. If the explaining variables are very similar
- (multicollinearity), the matrix will be nearly singular, and the estimates are
- very unstable. Further problems of significance can arise if the values of the
- explaining variables are of a very different scale. To test the reliability of
- the estimation results the cross-product matrix is multiplied by its computed
- inverse, the result is compared with a unit matrix, and the sum of absolute
- deviations is reported as ABS DEVIATION FROM UNIT MATRIX. The smaller this
- figure, the lower the probability of computationally weak estimates. Although
- seldom reported, this problem is inherent in most (even the top commercial)
- statistics packages.
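-
- The check described above can be sketched in Python as follows. This is a
- minimal illustration under stated assumptions, not the program's own
- routine: the function names invert, abs_deviation_from_unit and
- cross_product are hypothetical, the inverse is computed by plain
- Gauss-Jordan elimination, and the two data sets are invented to contrast a
- well-behaved case with a nearly collinear one.
-

```python
def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    # Augment the matrix with a unit matrix of the same size.
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is exactly singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def abs_deviation_from_unit(a):
    """Sum of the absolute deviations of a * invert(a) from the unit matrix."""
    n = len(a)
    inv = invert(a)
    total = 0.0
    for i in range(n):
        for j in range(n):
            prod = sum(a[i][k] * inv[k][j] for k in range(n))
            total += abs(prod - (1.0 if i == j else 0.0))
    return total


def cross_product(xs):
    """Cross-product matrix of the observations, with a constant column."""
    rows = [[1.0] + list(x) for x in xs]
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]


# Two clearly different explaining variables.
well = cross_product([(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)])
# Two almost identical explaining variables (multicollinearity).
near = cross_product([(1, 1.000001), (2, 2.000002), (3, 2.999999),
                      (4, 4.000001), (5, 5.0)])

well_dev = abs_deviation_from_unit(well)
near_dev = abs_deviation_from_unit(near)
```

- The nearly collinear data should give a deviation many orders of magnitude
- larger than the well-behaved data, which is the warning sign the reported
- figure is meant to provide.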
- If the input file is not found, you have the choice of listing directories
- from within STATREGR. The directory routine has been rewritten for a more
- relaxed syntax. (For details see the information on DIRW in TSUTIL.INF in
- TSUTILxx.ARC version 1.8 or later.)
-
- In version 1.3 the program no longer crashes from an attempt to rewrite a
- write-protected file. Second, the user now has more control over the choice
- of the graphics driver in the high-resolution scatter diagrams. Apply ? at
- the question "USE CGA GRAPHICS DRIVER" for more information. Third, the
- t-values of the residuals have been included in the tableau giving the
- regressed values.
-
- In version 1.4 the program has been recompiled with some minor changes.
-
- Version 1.5: This version introduces input recall and line editing. The
- special keys Del, CursorLeft, CursorRight, CursorUp, Home, End, and Esc are
- functional for this purpose. PgUp is the recall key. Line editing uses insert
- mode.
- Disk access has been made faster (the program has its own cache).
- The directory routine has been updated.
- Read-only files can be read properly.
- The program size has been reduced by limiting the graphics to CGA, EGA, and
- VGA.
-
- Version 1.6: In compiling version 1.5 I made an error in setting the
- default heap size. This caused an out of memory condition, which now has been
- remedied. In line editing the Insert key has been made functional.
-
- Version 1.7: The regression line in the high resolution scatter diagram
- was incorrectly drawn for negative regression coefficients. My thanks are due
- to acting associate professor Roy Dahlstedt for pointing it out.
-
- Version 1.8: The line editing potential of the program has been improved.
- When the task is given from the keyboard, and continuation lines are used,
- the repeated input recall (CursorUp) gets each line in turn. Ctrl-C and
- break-key abort have been enhanced.
- A line can be continued using an &, or a * at the end. Now also the
- backslash \ is accepted. This was suggested by Tuomas Eerola.
- Some help texts within the program have been extended.
- Multiple regression estimates are based on inverting a cross-product matrix
- of the observations. If the explaining variables are very similar
- (multicollinearity), the matrix will be nearly singular, and the estimates are
- very unstable. Further problems of significance can arise if the values of the
- explaining variables are of a very different scale. To test the reliability
- of the estimation results the cross-product matrix is multiplied by its
- computed inverse, the result is compared with a unit matrix, and the square
- root of the sum of the squared deviations is reported as DEVIATION FROM UNIT
- MATRIX. (Earlier I used the sum of absolute deviations, but the norm, that is
- the square root of the sum of squared deviations, is theoretically better,
- since it can be considered the length of the deviation vector.) The smaller
- this figure, the lower the probability of computationally weak estimates.
- Although seldom reported, this problem is inherent in most (even the top
- commercial) statistics packages.
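-
- In symbols, the two measures can be written as below. This is one reading of
- the description above. The names C (the cross-product matrix), its computed
- inverse, and D with elements d_ij (the deviation from the unit matrix I) are
- notation introduced here and do not appear in the program.
-

```latex
D = C\,\widehat{C^{-1}} - I, \qquad
\text{ABS DEVIATION FROM UNIT MATRIX} = \sum_{i,j} \lvert d_{ij} \rvert, \qquad
\text{DEVIATION FROM UNIT MATRIX} = \sqrt{\sum_{i,j} d_{ij}^{2}}
```

- The second expression is the Euclidean length of D when its elements are
- strung out as a single vector.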
- Found and corrected an annoyingly elusive bug that caused problems if the
- last observation in the data had items with more than 10 digits.
-
- Version 1.9: Several improvements to the nuts and bolts of the user
- interface.
- The new usage of the call is
- PROGNAME [/h(elp)] [/iInputFileName] [/oOutputFileName] [/cColumnsPerRow]
- (the /c option, which regulates the width of the output, is for registered
- versions only). If you use the /i switch, it stuffs the InputFileName into
- the appropriate recall buffer. This means that when the program asks you for
- the input file name, you can invoke the input file name just by pressing the
- CursorUp key. (The same goes for the /o switch, respectively.) This is very
- convenient, if you use the program many times successively making small
- changes in your data in between. (This assumes, of course, that you have a
- command line editor like DOSEDIT or CED to recall previous MsDos commands.
- These common shareware programs can be obtained from any well-stocked BBS or
- FTP site.)
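-
- To make the switch syntax concrete, here is a small Python sketch that
- splits such arguments into their parts. It only illustrates the syntax shown
- in the usage line. The function name parse_switches and the file name
- MYDATA.DAT are hypothetical, and treating the switch letters as case
- insensitive is an assumption. In the program itself the /i and /o values are
- placed in the recall buffers rather than opened directly.
-

```python
def parse_switches(args):
    """Split /h, /iName, /oName and /cN switches into a dictionary."""
    options = {"help": False, "input": None, "output": None, "columns": None}
    for arg in args:
        if not arg.startswith("/") or len(arg) < 2:
            raise ValueError("unrecognized argument: " + arg)
        # The letter after the slash selects the switch; the rest is its value.
        key, value = arg[1].lower(), arg[2:]
        if key == "h":
            options["help"] = True       # /h or /help
        elif key == "i":
            options["input"] = value     # input file name
        elif key == "o":
            options["output"] = value    # output file name
        elif key == "c":
            options["columns"] = int(value)  # columns per row
        else:
            raise ValueError("unrecognized switch: " + arg)
    return options


opts = parse_switches(["/iMYDATA.DAT", "/oprn", "/c80"])
```

- Note that the value follows the switch letter immediately, with no blank in
- between.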
- The printer readiness test has been rewritten to be more general. The
- earlier test failed for some printers, because the codes the printers send
- when they are offline are not standardized.
- The "file exists, overwrite?" question is no longer asked when the output
- file is prn, in other words when the output is directed to the printer.
- The user now has a choice of a left margin from 0 to 20 blanks when output
- is directed to the printer.
- The user now has a choice between formfeed and four blank lines to start
- each new page of output.
- When an input file is not found, the user is given the choice of listing a
- directory. The directory routine has been rewritten.
- The file ready message now also includes the file size besides the name.
-
-
- 4. THE STATISTICS SET FOR OTHER COMPUTERS
-
- The statistical programs by Timo Salmi are also available for the Sinclair
- QL computer. Named STATPREP, the system is part of a Public Domain library for
- QUANTA members. The descriptions of the files in the Quanta Library are given
- in STATMEAS.INF contained in TS1STxx.ARC, i.e. the first part of the set of
- statistical programs.
-