- 23rd December, 1990
-
- About TS5ST in General (Least absolute deviation multiple regression)
- ======================
-
- Contents:
- 1. Introduction
- 2. General description of statladr
- 3. Standard errors and goodness of fit statistics
- 4. Release notes
-
-
- 1. INTRODUCTION
-
- Add a question mark (?) to the program call for a brief description of the
- program.
-
- This package may be used and distributed freely for NON-COMMERCIAL,
- NON-INSTITUTIONAL, PRIVATE purposes, provided it is not changed in any way.
- ┌────────────────────────────────────────────────────────────────────────────┐
- │ For ANY other usage (such as use in a business enterprise or a university) │
- │ or the full scale version contact the author for a personal or a site │
- │ license. │
- └────────────────────────────────────────────────────────────────────────────┘
- Please do not distribute any part of this package separately. Uploading to
- BBSes is encouraged.
-
- The registered version is strictly for the registrant only. Identical
- programs must NOT be running on more than one computer at a time. Site
- licensed programs must not be run outside the licensed site.
-
- The programs are under development. Comments and contacts are solicited. If
- you have any questions, please do not hesitate to use electronic mail for
- communication.
- InterNet address: ts@chyde.uwasa.fi (preferred)
- Funet address: GADO::SALMI
- Bitnet address: SALMI@FINFUN
- FidoNet address: 2:515/1 (Micro Maniacs Opus, To: Timo Salmi)
-
- The author shall not be liable to the user for any direct, indirect or
- consequential loss arising from the use of, or inability to use, any program
- or file howsoever caused. No warranty is given that the programs will work
- under all circumstances.
-
- Timo Salmi (in collaboration with Seppo Pynnönen)
- Professor of Accounting and Business Finance
- School of Business Studies, University of Vaasa
- P.O. BOX 297, SF-65101 Vaasa, Finland
-
-
- 2. GENERAL DESCRIPTION OF STATLADR (Ver. 1.1)
-
- STATistics: Least Absolute Deviation multiple REGRession analysis is part
- of the interactive statistical system by Timo Salmi. It is the fifth program
- in the set. The first program in the set is STATistical MEASures (STATMEAS in
- TS1STxx.ARC), which is intended for univariate analysis. The second program
- in the set is STATistics: multiple REGRession analysis (TS2STxx.ARC). The
- third program in the set is STATistics: TRANsformations (STATTRAN in
- TS3STxx.ARC), which can be used for transforming the observations, and, if
- necessary, also as an editor. The fourth program in the set is STATistics:
- Ranks and CORrelations (STATRCOR in TS4STxx.ARC).
-
- STATLADR includes a handy built-in help system, which can be invoked by
- typing ? at any interactive question. Because of this built-in help, and the
- interactive nature of the program's user interface, no long-winded
- instructions have been included. (Who reads instructions anyhow?)
-
- The program performs least absolute deviation (LAD) multiple regression
- analysis, that is, estimates the coefficients of
- Y = a + b(1)X(1) + ... + b(M)X(M)
- from a set of observations. Whereas in ordinary least squares estimation
- (OLS) the sum of squared deviations between the observations and the
- regression equation is minimized, in LAD estimation the sum of the absolute
- deviations between the observations and the regression equation is minimized.
- Least absolute deviation multiple regression is thus equivalent to the
- following linear goal programming problem:
-
-     Min Sum(j=1..n) (Pj + Nj)
-
- subject to
-
-     a + Sum(i=1..M) x(i,j)b(i) + Pj - Nj = y(j)     for j = 1, ..., n
-
-     Pj >= 0, Nj >= 0
-
- where x(i,j) are the explaining variables, y(j) is the dependent variable,
- and Pj and Nj are the positive and negative parts of the deviation of
- observation j, so that Pj + Nj is its absolute deviation.
-
-
- STATLADR finds the estimates of the intercept [a] and the regression
- coefficients [b(i)] by solving this linear goal programming problem.
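
The idea of minimizing the sum of absolute deviations can be illustrated with a small Python sketch for the one-variable case Y = a + bX. This is not STATLADR's algorithm. It is a brute-force stand-in that relies on a known property of LAD regression: some optimal line passes through at least two of the observations, so it is enough to try every pair. The function name and the data are hypothetical.

```python
from itertools import combinations


def lad_simple_regression(x, y):
    """Brute-force LAD fit of y = a + b*x (illustration only).

    Tries the line through every pair of observations and keeps the one
    with the smallest sum of absolute deviations.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        total = sum(abs(yk - (a + b * xk)) for xk, yk in zip(x, y))
        if best is None or total < best[0]:
            best = (total, a, b)
    if best is None:
        raise ValueError("need at least two observations with different x")
    total, a, b = best
    return a, b, total


# Hypothetical data: four points on y = 2x plus one gross outlier.
a, b, total = lad_simple_regression([1, 2, 3, 4, 5], [2, 4, 6, 8, 100])
```

With this data the fitted line stays on the four regular points and the outlier carries the whole deviation, which is the usual reason for preferring LAD over OLS when outliers are expected.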
- If the explaining variables are very similar (multicollinearity), problems
- tend to occur both in OLS and LAD regression estimation, and the estimates
- become very unstable. Further problems of significance can arise if the
- values of the explaining variables are of a very different scale. To test the
- reliability of the solution algorithm, two inaccuracy indexes are computed
- and displayed. These are called the NON-OPTIMALITY OF THE LP SOLUTION and
- INACCURACY OF THE LP SOLUTION. The nearer to zero these figures are, the
- lower the probability of computationally weak estimates. Although seldom
- reported, these problems are inherent in most (even the top commercial)
- statistics packages. For those in the know, the former index is the sum of
- positive coefficients in the optimal simplex tableau. Mathematically they
- are all non-positive, but round-offs may cause some of them to remain small
- positive numbers. The latter is based on recalculating the optimal simplex
- tableau from the inverse of the basis matrix, and calculating the deviation
- of each item in the recalculated optimal simplex tableau as compared with
- the original optimal simplex tableau. The inaccuracy indexes are calculated
- as a so-called norm, that is, the square root of the sum of the squared
- deviations. This measure is used because mathematically it represents the
- length of the deviation vector.
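
The two indexes can be sketched in Python as follows. This is only an illustration of the arithmetic described above, not the program's code: how STATLADR stores its tableau and which of its coefficients enter the first index are not stated here, so the sketch simply assumes a plain list of coefficients and two tableaus given as lists of rows. Both function names are hypothetical.

```python
import math


def non_optimality_index(coefficients):
    """Sum of the coefficients that came out positive.

    In exact arithmetic every coefficient would be non-positive, so any
    positive remainder is attributed to round-off.
    """
    return sum(c for c in coefficients if c > 0)


def inaccuracy_index(original, recalculated):
    """Euclidean norm of the item-by-item deviations between two tableaus."""
    return math.sqrt(sum((o - r) ** 2
                         for row_o, row_r in zip(original, recalculated)
                         for o, r in zip(row_o, row_r)))
```

For both functions a result of zero means that no round-off damage was detected.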
-
- Furthermore, STATLADR draws both low-resolution and high-resolution
- scatter diagrams of the data, and of the regression analysis results. The
- low-resolution scatter diagrams are drawn, or rather written, using ordinary
- ascii text, and they can thus be directed to a file. The high-resolution
- (graphics) scatter diagrams can only be displayed on the screen.
-
- The data can either be given from the keyboard or taken from a file. If the
- input is to be taken from a file it must first be prepared with some editor,
- or some word processor which includes an option for preparing ordinary ascii
- text. (Also STATTRAN can be used for this purpose.)
- The data is given to the program in the following format:
-
- X1 X2 X3 !variable names (! denotes a comment)
- 3.56 6.32 -1.73
- 5.12 -4.21 9.18
- 14.2 5.11 0.31
- END !END is optional in a file
-
- A missing item in an observation is marked by a hash (#). E.g. if the first
- item of the second observation were missing, the observation should be
- written as # -4.21 9.18
-
- The items in an observation can be separated with blanks, as in the above,
- or with commas (,) e.g. 5.12,-4.21,9.18. The number of the intervening
- blanks is irrelevant, and can be customized for increased readability. Thus
- e.g. 5.12 -4.21 9.18 and 5.12    -4.21    9.18 are equivalent.
- A row can be continued using an ampersand (&). E.g. the variables could
- be given as
- X1 X2 &
- X3
- Alternatively, * or \ can be used instead of & as the continuation marker.
-
- Comments can be added to the input data. If ! appears on a line all text
- after ! will be considered as a comment.
- A header can be entered on each page if output is directed to a file.
- To accomplish this start the very first line on the input file with a
- double exclamation mark (!!) and the rest of the line will be used as the
- header. Thus !! indicates a header, a single ! an ordinary comment.
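
For readers who want to prepare or check such files with their own tools, here is a minimal Python sketch of a reader for the format described above. It is not part of the package, and it makes some assumptions the text does not settle: the continuation marker is taken to be the last non-blank character before any comment, END is matched without regard to case, and a missing item (#) is returned as None. The function name is hypothetical.

```python
def parse_stat_input(text):
    """Parse the input format into (header, variable names, observations)."""
    header = None
    rows = []
    pending = []  # items collected so far for a row continued with & * or \
    for number, raw in enumerate(text.splitlines()):
        if number == 0 and raw.lstrip().startswith("!!"):
            header = raw.lstrip()[2:].strip()  # !! on the first line: header
            continue
        line = raw.split("!", 1)[0].strip()  # drop an ordinary ! comment
        if not line:
            continue
        if line.upper() == "END":
            break
        continued = line[-1] in "&*\\"
        if continued:
            line = line[:-1]
        pending.extend(line.replace(",", " ").split())  # blanks or commas
        if not continued:
            rows.append(pending)
            pending = []
    if pending:
        rows.append(pending)
    if not rows:
        raise ValueError("no variable names found")
    names = rows[0]
    data = [[None if item == "#" else float(item) for item in row]
            for row in rows[1:]]
    return header, names, data


sample = """!! Demo data
X1 X2 &
X3            !variable names
3.56 6.32 -1.73
# -4.21 9.18
14.2,5.11,0.31
END
"""
header, names, data = parse_stat_input(sample)
```

The sketch does not check that every observation has as many items as there are variable names, nor does it enforce the limits on variables and observations.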
-
- The maximum number of variables is 25. The maximum number of observations
- is 100 (for each variable). The public domain version, however, sets the
- limits at 4 and 50 respectively.
-
-
- 3. STANDARD ERRORS AND GOODNESS OF FIT STATISTICS
-
- This chapter describes the formulas of the new features that were added
- to statladr.exe in the updated version 1.1. This chapter has been written by
- Seppo Pynnönen.
-
- The standard errors of the estimates of the regression coefficients are
- calculated as
-
-     std(b(j)) = s * (X'X)^jj,
-
- where X is the n x (M+1) data matrix of x variables with a vector of ones in
- the first column, (X'X)^jj denotes the j:th diagonal element of the inverse
- of the X'X matrix, the prime (') stands for the transpose, and s is an
- estimate of the standard error of the residual terms of the regression
- model. (n stands for the number of observations, and M for the number of
- explanatory variables.) Here we have defined the standard error (s) of the
- residuals as
-
-     s = 1 / (2 f(m)),
-
- where
-
-     f(m) = 2d / (n (e(m+d) - e(m-d))),
-
- with d defined below, e(j) denoting the ordered residuals, and m the median
- point of the ordered residuals. The parameter d depends on the sample size.
- In the literature it is suggested that it should be kept small. Here we have
- adopted the following convention and defined d as
-
- d = max[1, n'/6],
-
- where n' = n-M-1 (i.e., the number of residuals which are not zero by
- definition due to the LP-solution).
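
The calculation of s can be sketched in Python as below. The text leaves several details open, so the sketch rests on assumptions that may differ from what statladr.exe actually does: all n residuals are ordered, the median point m is taken as position (n+1)/2 rounded down (counting from 1), and n'/6 is truncated to a whole number. The function name is hypothetical.

```python
def lad_residual_scale(residuals, n_explanatory):
    """Estimate s = 1 / (2 f(m)) from the residuals of a LAD fit."""
    e = sorted(residuals)              # ordered residuals e(1) <= ... <= e(n)
    n = len(e)
    n_prime = n - n_explanatory - 1    # n' = n - M - 1
    d = max(1, n_prime // 6)           # assumption: n'/6 truncated
    m = (n + 1) // 2                   # assumption: 1-based median position
    low, high = m - d, m + d
    if low < 1 or high > n:
        raise ValueError("too few residuals for this choice of d")
    spread = e[high - 1] - e[low - 1]  # e(m+d) - e(m-d)
    if spread == 0:
        raise ValueError("e(m+d) equals e(m-d); f(m) is undefined")
    f_m = 2 * d / (n * spread)
    return 1 / (2 * f_m)


# Hypothetical residuals from a fit with one explanatory variable (M = 1).
s = lad_residual_scale([-3, -2, -1, 0, 0, 1, 2, 3, 4], 1)
```

In this example n = 9, n' = 7, d = 1 and m = 5, so f(m) = 2/9 and s = 2.25.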
-
- The t-values are defined as b(j)/std(b(j)) (j = 0, 1, ..., M, with b(0)
- being the intercept term), where std(b(j)) is defined in the previous
- paragraphs.
-
- The LAD coefficient of determination is defined as
-
-     LAD COEFFICIENT OF DETERMINATION
-         = 1 - [Sum(i) |e(i)|] / [Sum(i) |y(i) - Md(y)|]
-
- (cf. the R-square in the OLS-regression), where Md(y) is the median of y.
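
As a small worked illustration, the LAD coefficient of determination can be computed in Python as follows. The function name and the numbers are hypothetical, and the residuals are assumed to come from an already fitted LAD regression.

```python
from statistics import median


def lad_coefficient_of_determination(y, residuals):
    """1 - sum|e(i)| / sum|y(i) - Md(y)|, the LAD counterpart of R-square."""
    md = median(y)
    total = sum(abs(value - md) for value in y)
    if total == 0:
        raise ValueError("all y values equal the median; ratio is undefined")
    return 1 - sum(abs(e) for e in residuals) / total


# Hypothetical values: Md(y) = 3, sum|y - Md(y)| = 11, sum|e| = 4.
r = lad_coefficient_of_determination([1, 2, 3, 4, 10], [0, 0.5, -0.5, 0, 3])
```

Here the result is 1 - 4/11, roughly 0.64. A value of 1 means a perfect fit, and a value of 0 means the regression does no better than the median of y alone.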
-
-
- 4. RELEASE NOTES
-
- Version 1.1: The most important improvements were described in the previous
- chapter.
- Furthermore, I have corrected a bug, which decreased the maximum capacity
- of the program by one observation.
- Some minor stylistic improvements have also been made.
-
- Version 1.2: Several improvements to the nuts and bolts of the user
- interface.
- The new usage of the call is
- PROGNAME [/h(elp)] [/iInputFileName] [/oOutputFileName] [/cColumnsPerRow]
- (the /c option, which regulates the width of the output, is for registered
- versions only). If you use the /i switch, it stuffs the InputFileName into
- the appropriate recall buffer. This means that when the program asks you for
- the input file name, you can invoke the input file name just by pressing the
- CursorUp key. (The same goes for the /o switch, respectively.) This is very
- convenient, if you use the program many times successively making small
- changes in your data in between. (This assumes, of course, that you have a
- command line editor like DOSEDIT or CED to recall previous MsDos commands.
- These common shareware programs can be obtained from any well-stocked BBS or
- FTP site.)
- The printer readiness test has been rewritten to be more general. The
- earlier test failed for some printers, because the codes the printers send
- when they are offline are not standardized.
- The "file exists, overwrite?" question is no longer asked when the output
- file is prn, in other words when the output is directed to the printer.
- The user now has a choice of a left margin from 0 to 20 blanks when output
- is directed to the printer.
- The user now has a choice between formfeed and four blank lines to start
- each new page of output.
- When an input file is not found, the user is given the choice of listing a
- directory. The directory routine has been rewritten.
- The file ready message now also includes the file size besides the name.
-