- Newsgroups: comp.ai.neural-nets
- Path: sparky!uunet!munnari.oz.au!spool.mu.edu!agate!usenet.ins.cwru.edu!magnus.acs.ohio-state.edu!cis.ohio-state.edu!news.sei.cmu.edu!bb3.andrew.cmu.edu!crabapple.srv.cs.cmu.edu!news
- From: sef@sef-pmax.slisp.cs.cmu.edu
- Subject: Re: Help, statistics and learning algorithms
- Message-ID: <C1IMz4.MtH.1@cs.cmu.edu>
- Sender: news@cs.cmu.edu (Usenet News System)
- Nntp-Posting-Host: sef-pmax.slisp.cs.cmu.edu
- Organization: School of Computer Science, Carnegie Mellon
- Date: Wed, 27 Jan 1993 14:01:46 GMT
- Lines: 72
-
-
- From: tstanley@lamar.ColoState.EDU (Thomas R Stanley)
-
- My question is, what are the statistical equivalents of the
- supervised learning methods commonly used in constructing neural nets?
- More precisely, are there statistical equivalents to the following (see,
- I did check the FAQ first :) learning methods:
-
- ...
-
- I found one reference that said BP was equivalent to least squares
- fitting. Is this true of the rest of the methods? If these methods are
- really just variations on classical statistical procedures, then (and here
- is where I show my ignorance) what do I have to gain by using neural nets?
- Why should I expect these procedures to perform better than a parametric
- method (let's assume for the sake of argument the assumptions of the
- method (e.g. normality) are met) where there exists maximum likelihood
- estimators (MLE's) for the parameters (i.e. weights at the processing
- element or node)? MLE's guarantee unbiasedness and efficiency (minimum
- variance), you can't get better than that can you?
-
- Very brief answer: If there were exact statistical equivalents to all these
- methods, then we wouldn't be wasting our time making up new neural net
- algorithms and architectures. Well, some of us might, as more
- biologically plausible or more easily parallelizable versions of existing
- algorithms, but the field would be much less exciting than it is now, since
- many of us think we're developing algorithms with new powers.
-
- The real difference is that these neural net models are able to work with a
- different (richer, generally more powerful) set of architectures than the
- statisticians have been able to use in the past. We are trying to find an
- optimal set of parameters within some model, but the model may involve
- cascaded nonlinearities (as in backprop or cascor) or recurrent information
- flow and time-dependencies (as in Boltzmann, recurrent cascor,
- Williams-Zipser, etc.). Furthermore, the number, type, and topology of the
- basis functions and the number of parameters in the model might themselves
- be variable and subject to learning, as in cascor. (A few statistical
- methods, such as MARS, also explore the space of model sizes dynamically.)
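-
- As a rough sketch (in Python/NumPy, purely illustrative; the sizes and names
- here are made up and don't correspond to any particular package), "cascaded
- nonlinearities" just means the output is a squashed function of a squashed
- function of the inputs:
-
-     import numpy as np
-
-     def sigmoid(x):
-         return 1.0 / (1.0 + np.exp(-x))
-
-     def forward(x, W1, W2):
-         h = sigmoid(W1 @ x)   # first nonlinear stage (hidden units)
-         y = sigmoid(W2 @ h)   # second stage cascaded on top of the first
-         return y
-
-     rng = np.random.default_rng(0)
-     W1 = rng.normal(scale=0.5, size=(3, 2))   # 2 inputs -> 3 hidden units
-     W2 = rng.normal(scale=0.5, size=(1, 3))   # 3 hidden units -> 1 output
-     print(forward(np.array([0.2, -0.7]), W1, W2))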
-
- Neural nets, then, allow you to fit these more complex models to the data.
- Usually the tuning of the parameters is done by some sort of procedure that
- computes or approximates the gradient of an error measure with respect to each
- weight, dE/dw, and uses it to reduce the error iteratively -- it's rare in
- these kinds of models to be able to solve for the optimal parameter set more
- directly. The iterative reduction might be simple gradient descent, or it might be
- some accelerated method like conjugate gradient or a pseudo-Newton method.
- The error function might be the sum-squared difference at the
- output, or something more complex.
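-
- To make that concrete (again just an illustrative sketch in Python/NumPy, with
- a made-up toy problem), here is dE/dw approximated by finite differences and
- used for plain gradient descent on a sum-squared error; real backprop gets the
- same gradient analytically, and you could swap in a conjugate gradient or
- pseudo-Newton step for the update:
-
-     import numpy as np
-
-     def sigmoid(x):
-         return 1.0 / (1.0 + np.exp(-x))
-
-     def sse(w, X, t):
-         # sum-squared difference between outputs and targets
-         return np.sum((sigmoid(X @ w) - t) ** 2)
-
-     def numeric_gradient(w, X, t, eps=1e-6):
-         g = np.zeros_like(w)
-         for i in range(w.size):
-             bumped = w.copy()
-             bumped[i] += eps
-             g[i] = (sse(bumped, X, t) - sse(w, X, t)) / eps   # approx dE/dw_i
-         return g
-
-     rng = np.random.default_rng(0)
-     X = rng.normal(size=(20, 3))                 # 20 cases, 3 inputs
-     t = (X[:, 0] + X[:, 1] > 0).astype(float)    # toy targets
-     w = np.zeros(3)
-     for step in range(200):
-         w -= 0.1 * numeric_gradient(w, X, t)     # simple gradient descent
-     print(sse(w, X, t))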
-
- So the parameter estimation methods used in neural nets will look familiar
- and perhaps rather primitive to a statistician. It is the richness of the
- available set of models that gives us some new power. Whether this
- richness buys you anything depends on the problem. If a plane drawn
- through a multi-D data space is what you need, traditional statistical
- methods win. You can often get some guarantee of optimality, but that's
- optimality within the model framework your statistical method handles, and
- with the number of parameters you have chosen. In a number of more complex
- situations, neural nets have been shown to give superior generalization.
- If you want to think of this as just a new, nonlinear branch of statistics,
- feel free.
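-
- For contrast (another illustrative Python/NumPy sketch with made-up data):
- when the model really is a plane, the least-squares optimum comes from a
- direct solve, no iterative descent needed -- which is the kind of guarantee
- the traditional framework buys you, within that model class:
-
-     import numpy as np
-
-     rng = np.random.default_rng(1)
-     X = rng.normal(size=(50, 4))                   # 50 cases, 4 inputs
-     true_w = np.array([1.0, -2.0, 0.5, 0.0])
-     y = X @ true_w + 0.1 * rng.normal(size=50)     # linear data plus noise
-
-     w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # closed-form least-squares fit
-     print(w_hat)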
-
- -- Scott
-
- ===========================================================================
- Scott E. Fahlman Internet: sef+@cs.cmu.edu
- Senior Research Scientist Phone: 412 268-2575
- School of Computer Science Fax: 412 681-5739
- Carnegie Mellon University Latitude: 40:26:33 N
- 5000 Forbes Avenue Longitude: 79:56:48 W
- Pittsburgh, PA 15213
- ===========================================================================
-
-