- NAME
- smooth - split linear smoothing
-
- SYNOPSIS
- smooth [file] [options]
-
- USAGE
- By default, SMOOTH reads pairs of numbers (x- and y-values)
- from the standard input (or the given file), fits a smooth
- curve to the points, and writes to the standard output points
- from the smooth curve.
-
- Two smoothing algorithms are available. By default, the curve
- is calculated using the "lowess" procedure developed by W. S.
- Cleveland (see below). This technique achieves robustness by
- decreasing the weights on data points that lie far from the
- fitted line. An alternative procedure due to Art Owen is also
- provided (with the -s switch). This technique smooths the data
- while preserving sharp discontinuities in slope or value.
-
- As with GRAPH, each pair of numbers may optionally be followed
- by a comment. If the comment is surrounded by quotes "...",
- the comment may contain spaces. The given points, and their
- comments if any, will be included in the output. The
- smoothing may optionally be restarted after each label, so
- that a family of curves may be processed together (see the -b
- switch).
-
- Input lines starting with ";" are copied to the beginning of
- the output file but are otherwise ignored. Blank lines are
- ignored.
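-
- A minimal Python sketch of a reader for the input format described
- above. The function name parse_points and the returned tuple layout
- are illustrative assumptions, not part of SMOOTH itself.
-
```python
def parse_points(lines):
    """Split input lines into ';' header lines and (x, y, comment) records."""
    headers, points = [], []
    for line in lines:
        text = line.strip()
        if not text:                  # blank lines are ignored
            continue
        if text.startswith(";"):      # copied to the start of the output
            headers.append(text)
            continue
        parts = text.split(None, 2)   # x, y, then the rest as a comment
        x, y = float(parts[0]), float(parts[1])
        comment = parts[2].strip().strip('"') if len(parts) > 2 else None
        points.append((x, y, comment))
    return headers, points
```
-
- For example, the input line 3.5 4 "peak value" yields the record
- (3.5, 4.0, "peak value").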
-
- OPTIONS
- Options can appear anywhere on the command line.
-
- -a [step [start]]  automatic abscissas
- -b                 break smooth after each label
- -c                 general curve
- -f <num>           for "lowess", the fraction of points to use for
-                    each fitted value (default .5)
- -n <num>           for "lowess", the number of points to use for
-                    each fitted value (default 50% of the points)
- -r                 print residuals rather than smoothed values
- -s                 split linear fit rather than "lowess"
- -xl                take logs of x values before smoothing
- -yl                take logs of y values before smoothing
- -zl                take logs of z values before smoothing
-                    (implies -3)
- -3                 3D case: x, y, and z given for each point
-
- If the -c switch is not used, the input points must be from a
- function - that is, the x values must be strictly increasing.
- The output points will also be from a function. (If the -b
- switch is used, this restriction applies only within each
- segment.)
-
- If the -c switch is used (indicating a general curve), the
- input points need not be from a function, but each point
- must be separated from the previous point by a nonzero
- distance. (If the -b switch is used, this restriction applies
- only within each segment.)
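-
- The two input restrictions can be sketched as simple checks. This is
- an illustration only: the function names are hypothetical, and the
- distance requirement is taken to mean that consecutive points must
- not coincide. With -b, each segment would be checked separately.
-
```python
import math

def is_function(xs):
    """True when the x values are strictly increasing (no -c switch)."""
    return all(a < b for a, b in zip(xs, xs[1:]))

def is_valid_curve(points):
    """True when no point coincides with its predecessor (-c switch)."""
    return all(math.dist(p, q) > 0 for p, q in zip(points, points[1:]))
```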
-
- The -f and -n switches set the number of data points used to
- calculate a given smoothed value. The larger the number, the
- smoother the resulting curve. It is not possible to specify
- the number in terms of a range of the independent variable
- (e.g. a "time constant"). Therefore, these methods are
- appropriate when the density of data points is approximately
- constant, or else the density is higher in the "interesting"
- (i.e. rapidly changing) part of the curve.
-
- The distinction between the -f and -n switches becomes useful
- only when there are several data sets. Suppose one had two
- data sets for the same range of the independent variable, and
- that one set had twice as many data points as the other. For
- equivalent treatment, one could smooth the two sets with the
- same value for the -f switch.
-
- On the other hand, suppose two sets of data have data points at
- the same density, but that one set covered twice the range of
- independent variable (and therefore had twice as many data
- points). For equivalent smoothing, one could use the -n switch
- with the same value in each case.
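-
- A small sketch of how the two switches translate into a window size.
- The helper window_size and its rounding and clamping rules are
- assumptions made for illustration; the program's own rounding is not
- documented here.
-
```python
def window_size(n_points, fraction=None, count=None):
    """Number of data points used for each fitted value."""
    if count is not None:        # -n style: absolute number of points
        k = count
    else:                        # -f style: fraction of the data set
        k = round((0.5 if fraction is None else fraction) * n_points)
    return max(2, min(n_points, k))

# Same range, different density: the same -f value scales with the set.
print(window_size(100, fraction=0.2), window_size(200, fraction=0.2))
# Same density, different range: the same -n value gives the same window.
print(window_size(100, count=30), window_size(200, count=30))
```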
-
- For general curves, the given x- and y- (and z-, if present) points
- are regarded as functions of the distance along a smoothed path. This
- doesn't work very well for split linear smoothing, since it tends to
- conceal abrupt changes in position. However, the split linear smooth
- is still able to preserve abrupt changes in the first derivative.
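-
- To illustrate the parameterization, the sketch below computes the
- cumulative distance along the straight-line path through the raw
- points. This is a simplification: the text above refers to distance
- along a smoothed path, and the function name is hypothetical. Each
- coordinate can then be smoothed as a function of that distance.
-
```python
import math

def path_distance(points):
    """Cumulative distance along the polyline through the given points."""
    s = [0.0]
    for p, q in zip(points, points[1:]):
        s.append(s[-1] + math.dist(p, q))   # works for 2D and 3D points
    return s

points = [(0, 0), (3, 4), (3, 10)]
s = path_distance(points)                   # [0.0, 5.0, 11.0]
xs = [p[0] for p in points]                 # x as a function of s
ys = [p[1] for p in points]                 # y as a function of s
```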
-
- METHODS
- Lowess by W. S. Cleveland, and split linear fit by A. Owen
-
- Lowess
-
- Robust locally weighted regression is a method for
- smoothing a scatterplot, (x[i], y[i]), i=1,...,n, in
- which the fitted value at x[k] is the value of a
- polynomial fit to the data using weighted least
- squares, where the weight for (x[i], y[i]) is large if
- x[i] is close to x[k]. Robustness is added by
- calculating residuals and repeating the procedure with
- reduced weights on points with large residuals.
-
- Reference:
- W. S. Cleveland, "Robust Locally Weighted Regression
- and Smoothing Scatterplots", Journal of the American
- Statistical Association, v. 74, no. 368, pp. 829-836 (Dec. 1979).
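-
- A compact Python sketch of robust locally weighted regression as
- described above. It assumes a local polynomial of degree 1, tricube
- neighbourhood weights and bisquare robustness weights; the function
- names and defaults are illustrative, and this is not the code used
- by SMOOTH.
-
```python
from statistics import median

def tricube(u):
    return (1 - abs(u) ** 3) ** 3 if abs(u) < 1 else 0.0

def bisquare(u):
    return (1 - u * u) ** 2 if abs(u) < 1 else 0.0

def weighted_line(xs, ys, ws, x0):
    """Value at x0 of the weighted least-squares line through (xs, ys)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def robust_local_linear(xs, ys, fraction=0.5, robust_passes=2):
    n = len(xs)
    q = max(2, min(n, round(fraction * n)))   # points per local fit
    robust = [1.0] * n                        # robustness weights
    fitted = list(ys)
    for _ in range(robust_passes + 1):
        for k in range(n):
            dists = [abs(x - xs[k]) for x in xs]
            h = sorted(dists)[q - 1] or 1.0   # distance to q-th nearest point
            ws = [tricube(d / h) * r for d, r in zip(dists, robust)]
            if sum(ws) > 0:
                fitted[k] = weighted_line(xs, ys, ws, xs[k])
        resid = [y - f for y, f in zip(ys, fitted)]
        s = median(abs(e) for e in resid)
        if s == 0:
            break
        # Points with large residuals get reduced weight on the next pass.
        robust = [bisquare(e / (6 * s)) for e in resid]
    return fitted
```
-
- The cost is quadratic in the number of points, since every fitted
- value examines every data point.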
-
-
- Split Linear Smoothing Algorithm
-
- Given:
- A list of window sizes, SizeList, and n pairs (x[i],y[i]) sorted on x,
-
- Returns:
- the split linear smooth of y on x.
-
-
- The general technique is due to Art Owen, who offers this
- discussion:
-
- "You should feel free to experiment with the
- algorithm, since it has some ad hoc parts. The
- essentials are: to use uncentered windows of
- varying sizes along with the central ones, to
- get zero weight on the worst fitting lines, and
- to make the weight attached to a particular
- line size and orientation vary smoothly as one
- traverses the data. We tried to find a simple
- way to meet all of these goals; the algorithm
- we settled on was the simplest that worked for
- us. ...
-
- "West and Chan et al. are useful for getting
- numerically stable updating formulae for the
- regressions."
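-
- A heavily simplified Python sketch of the idea, for illustration
- only. It fits lines in left-facing, centered and right-facing
- windows of each size and gives zero weight to the worst-fitting
- line. It does not smooth the weights along the data, so it is not
- the algorithm of McDonald and Owen nor the one used by SMOOTH. The
- names, the weighting rule and the default size list are assumptions.
-
```python
def fit_line(xs, ys):
    """Least-squares line; returns (predict function, mean squared residual)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    mse = sum((y - my - slope * (x - mx)) ** 2 for x, y in zip(xs, ys)) / n
    return (lambda x0: my + slope * (x0 - mx)), mse

def split_linear(xs, ys, size_list=(4, 8)):
    """Simplified split linear smooth; every size must be <= len(xs)."""
    n = len(xs)
    out = []
    for i in range(n):
        fits = []                                   # (prediction, mse) pairs
        for m in size_list:
            # windows of m points ending at, centered on, and starting at i
            for lo in (i - m + 1, i - m // 2, i):
                lo = max(0, min(lo, n - m))         # keep the window in range
                predict, mse = fit_line(xs[lo:lo + m], ys[lo:lo + m])
                fits.append((predict(xs[i]), mse))
        worst = max(mse for _, mse in fits)
        weights = [worst - mse for _, mse in fits]  # worst line gets zero
        total = sum(weights)
        if total > 0:
            out.append(sum(w * p for w, (p, _) in zip(weights, fits)) / total)
        else:                                       # all lines fit equally well
            out.append(sum(p for p, _ in fits) / len(fits))
    return out
```
-
- On a unit step, this sketch stays close to 0 just before the jump and
- close to 1 just after it, because the one-sided windows that do not
- straddle the jump fit best and receive most of the weight.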
-
- References:
-
- John Alan McDonald and Art B. Owen, "Smoothing with
- Split Linear Fits", LCS Technical Report No.
- 7, SLAC-PUB 3423, AD-A149032, Laboratory for
- Computational Statistics, Dept. of Statistics,
- Stanford University, July 1984.
-
-
- D. H. D. West, "Updating Mean and Variance
- Estimates: An Improved Method", Communications
- of the ACM, v. 22, no. 9, pp. 532-535 (1979).
-
- T. F. Chan, G. H. Golub, and R. J. LeVeque,
- "Algorithms for Computing the Sample Variance:
- Analysis and Recommendations", The American
- Statistician, v. 37, pp. 242-247 (1983).
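-
- For orientation, a sketch of the kind of updating formula treated in
- the West reference: a one-pass, numerically stable update of a
- weighted mean and variance. The function name is hypothetical, and a
- regression would need analogous updates for the cross products.
-
```python
def running_mean_variance(values, weights=None):
    """One-pass weighted mean and (population) variance."""
    sum_w = mean = s = 0.0
    for i, x in enumerate(values):
        w = 1.0 if weights is None else weights[i]
        new_sum_w = sum_w + w
        delta = x - mean
        r = delta * w / new_sum_w
        mean += r                   # updated mean
        s += sum_w * delta * r      # updated sum of squared deviations
        sum_w = new_sum_w
    return mean, s / sum_w

mean, var = running_mean_variance([2, 4, 4, 4, 5, 5, 7, 9])
```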
-
- IMPLEMENTATION
-
- The implementation of the split linear smoothing is based on
- pseudocode by Art Owen.
-
- The arrays take a lot of space. For n points, the number of
- doubles is approximately 38*n, plus 2*n for a general curve, plus
- 2*n for the 3D case. For 100 points and 8-byte doubles, this means
- at least 8*38*100 = 30400 bytes.
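-
- The estimate above can be written as a small helper. The function
- name and parameters are illustrative; the per-point counts are the
- ones quoted in the text.
-
```python
def approx_bytes(n_points, general_curve=False, three_d=False, double_size=8):
    """Approximate array storage for split linear smoothing."""
    doubles_per_point = 38
    if general_curve:
        doubles_per_point += 2
    if three_d:
        doubles_per_point += 2
    return double_size * doubles_per_point * n_points

print(approx_bytes(100))   # 30400
```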
-
- Execution time: the program will employ a numeric
- coprocessor if one is available, but will run correctly without
- it. Time for "lowess" is proportional to the square of the
- number of data points. 101 points took 151 seconds on a 7.5
- MHz V-20 with no 8087, but only 0.98 seconds on a 20 MHz 80386
- with an 80387. Time for split linear smoothing increases
- slightly faster than linearly in the number of data points.
-
- The updating formulas mentioned by Art Owen are not used in
- this program. The selection of window sizes (a geometric
- sequence) is my own. -JVZ
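-
- A sketch of what a geometric sequence of window sizes looks like.
- The starting size, upper limit and ratio used by the program are not
- stated here, so every value below is a hypothetical example.
-
```python
def geometric_sizes(smallest, largest, ratio=2.0):
    """Window sizes growing by a constant factor, up to 'largest'."""
    sizes, size = [], float(smallest)
    while size <= largest:
        sizes.append(int(round(size)))
        size *= ratio
    return sizes

print(geometric_sizes(4, 40))   # [4, 8, 16, 32]
```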
-
- EXAMPLES
- The file ROUGH contains data points from sin(x) with one abrupt
- phase reversal (creating a discontinuity) and some added noise.
- To see the effect of the two algorithms, try
-
- C>smooth rough -f .2 >rlow
- C>smooth rough -s >rsl
-
- Then display all three files with GRAPH...
-
- C>graph rough rlow rsl -m -32 10 20
-
- Note how the split linear smooth preserved the discontinuity,
- whereas "lowess" smoothed it out somewhat.
-
- The file SP contains points from a general curve...
-
- C>smooth sp -f .2 -c >splow
- C>smooth sp -s -c >spsl
- C>graph sp splow spsl -m 1 10 20
-
- This input file has a discontinuity in the first derivative
- which the split linear smooth was able to preserve.
-
- AUTHOR
- Copyright (c) 1987, 1991 by James R. Van Zandt
- (jrv@mbunix.mitre.org) 27 Spencer Dr., Nashua NH 03062,
- 603-888-2272. Resale forbidden, copying for personal use
- encouraged. Constructive comments welcome.
-
-