NetNews Usenet Archive 1993 #3

Text File | 1993-01-28 | 4.5 KB | 84 lines

Newsgroups: comp.ai.neural-nets
Path: sparky!uunet!munnari.oz.au!spool.mu.edu!agate!usenet.ins.cwru.edu!magnus.acs.ohio-state.edu!cis.ohio-state.edu!news.sei.cmu.edu!bb3.andrew.cmu.edu!crabapple.srv.cs.cmu.edu!news
From: sef@sef-pmax.slisp.cs.cmu.edu
Subject: Re: Help, statistics and learning algorithms
Message-ID: <C1IMz4.MtH.1@cs.cmu.edu>
Sender: news@cs.cmu.edu (Usenet News System)
Nntp-Posting-Host: sef-pmax.slisp.cs.cmu.edu
Organization: School of Computer Science, Carnegie Mellon
Date: Wed, 27 Jan 1993 14:01:46 GMT
Lines: 72

> From: tstanley@lamar.ColoState.EDU (Thomas R Stanley)
>
> My question is, what are the statistical equivalents of the supervised learning methods commonly used in constructing neural nets? More precisely, are there statistical equivalents to the following (see, I did check the FAQ first :) learning methods:
> ...
> I found one reference that said BP was equivalent to least squares fitting. Is this true of the rest of the methods?
>
> If these methods are really just variations on classical statistical procedures, then (and here is where I show my ignorance) what do I have to gain by using neural nets? Why should I expect these procedures to perform better than a parametric method (let's assume for the sake of argument the assumptions of the method (e.g. normality) are met) where there exists maximum likelihood estimators (MLE's) for the parameters (i.e. weights at the processing element or node)? MLE's guarantee unbiasedness and efficiency (minimum variance), you can't get better than that can you?

Very brief answer: If there were exact statistical equivalents to all these methods, then we wouldn't be wasting our time making up new neural net algorithms and architectures. Well, some of us might, in search of more biologically plausible or more easily parallelizable versions of existing algorithms, but the field would be much less exciting than it is now, since many of us think we're developing algorithms with new powers.

The real difference is that these neural net models are able to work with a different (richer, generally more powerful) set of architectures than the statisticians have been able to use in the past. We are trying to find an optimal set of parameters within some model, but the model may involve cascaded nonlinearities (as in backprop or cascor) or recurrent information flow and time-dependencies (as in Boltzmann, recurrent cascor, Williams-Zipser, etc.). Furthermore, the number, type, and topology of the basis functions and the number of parameters in the model might themselves be variable and subject to learning, as in cascor. (A few statistical methods, such as MARS, also explore the space of model sizes dynamically.)

Neural nets, then, allow you to fit these more complex models to the data. Usually the parameters are tuned by some sort of procedure that computes or approximates the gradient of an error measure with respect to each weight, dE/dw, and tries to reduce the error iteratively -- it's rare in these kinds of models to be able to solve for the optimal parameter set more directly. The iterative reduction might be simple gradient descent, or it might be some accelerated method like conjugate gradient or a pseudo-Newton method. The error function to be minimized might be the sum-squared difference at the output, or something more complex.

So the parameter estimation methods used in neural nets will look familiar and perhaps rather primitive to a statistician. It is the richness of the available set of models that gives us some new power.

Whether this richness buys you anything depends on the problem. If a plane drawn through a multi-D parameter space is what you need, traditional statistical methods win. You can often get some guarantee of optimality, but that's optimality within the model framework your statistical method handles, and with the number of parameters you have chosen.

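The parameter-estimation point can be made concrete with a minimal sketch in Python. Everything in it is an assumption chosen for illustration: the toy data, the function names, the learning rate, and the step counts. It fits a straight line two ways, by the closed-form least-squares solution and by iterative gradient descent on the sum-squared error E using dE/dw, and then runs the same kind of descent loop on a single tanh unit, y = v * tanh(w*x), which is a model the straight-line fit cannot express.

```python
import math

# Hypothetical toy data: 21 noise-free samples of a saturating curve.
xs = [i / 10.0 for i in range(-10, 11)]
ys = [math.tanh(3.0 * x) for x in xs]


def sum_squared_error(predict):
    """E = sum over the data of (prediction - target)^2."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))


def fit_line_closed_form():
    """Ordinary least squares for y = a*x + b, solved directly."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_line_gradient_descent(rate=0.02, steps=2000):
    """Same model and same E, but reduced iteratively via dE/da and dE/db."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        errors = [a * x + b - y for x, y in zip(xs, ys)]
        dE_da = sum(2.0 * e * x for e, x in zip(errors, xs))
        dE_db = sum(2.0 * e for e in errors)
        a -= rate * dE_da
        b -= rate * dE_db
    return a, b


def fit_tanh_unit_gradient_descent(rate=0.02, steps=5000):
    """One tanh unit without bias terms, y = v * tanh(w*x), fitted the same way."""
    v, w = 1.0, 1.0  # arbitrary starting point (an assumption)
    for _ in range(steps):
        hidden = [math.tanh(w * x) for x in xs]
        errors = [v * h - y for h, y in zip(hidden, ys)]
        dE_dv = sum(2.0 * e * h for e, h in zip(errors, hidden))
        # Chain rule through the nonlinearity: d tanh(u)/du = 1 - tanh(u)^2.
        dE_dw = sum(2.0 * e * v * (1.0 - h * h) * x
                    for e, h, x in zip(errors, hidden, xs))
        v -= rate * dE_dv
        w -= rate * dE_dw
    return v, w


a_ls, b_ls = fit_line_closed_form()
a_gd, b_gd = fit_line_gradient_descent()
v, w = fit_tanh_unit_gradient_descent()

sse_line = sum_squared_error(lambda x: a_ls * x + b_ls)
sse_unit = sum_squared_error(lambda x: v * math.tanh(w * x))

print("closed form  a, b:", a_ls, b_ls)
print("grad descent a, b:", a_gd, b_gd)
print("SSE, best line:   ", sse_line)
print("SSE, tanh unit:   ", sse_unit)
```

Under these assumptions the two line fits should agree to within rounding, which is the sense in which gradient descent on a linear model is just least squares. The tanh unit should reach a lower error than the best line, but only because the made-up data come from that same model family; on data that really are linear, the closed-form fit gets the same answer with no iteration at all.
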
In a number of more complex situations, neural nets have been shown to give superior generalization.

If you want to think of this as just a new, nonlinear branch of statistics, feel free.

-- Scott

===========================================================================
Scott E. Fahlman                        Internet:  sef+@cs.cmu.edu
Senior Research Scientist               Phone:     412 268-2575
School of Computer Science              Fax:       412 681-5739
Carnegie Mellon University              Latitude:  40:26:33 N
5000 Forbes Avenue                      Longitude: 79:56:48 W
Pittsburgh, PA 15213
===========================================================================