NetNews Usenet Archive 1992 #27




Text File | 1992-11-17 | 3.1 KB | 57 lines

Newsgroups: comp.ai
Path: sparky!uunet!charon.amdahl.com!pacbell.com!sgiblab!spool.mu.edu!news.nd.edu!mentor.cc.purdue.edu!pop.stat.purdue.edu!hrubin
From: hrubin@pop.stat.purdue.edu (Herman Rubin)
Subject: Re: Looking for Good Intro to Bayesian Classifiers
Message-ID: <BxvGts.FLr@mentor.cc.purdue.edu>
Sender: news@mentor.cc.purdue.edu (USENET News)
Organization: Purdue University Statistics Department
References: <19293@ucdavis.ucdavis.edu>
Distribution: usa
Date: Tue, 17 Nov 1992 18:03:27 GMT
Lines: 44

In article <19293@ucdavis.ucdavis.edu> f175003@wilma.ucdavis.edu writes:
>I am looking for a book or paper with a good introduction into using
>Bayesian Classifiers preferably with a few examples showing how to
>determine the probabilities. I understand the idea behind a Bayesian
>Classifier, but I have looked at a couple of examples of Bayesian
>Classifiers and they appeared to use maximum likelihood estimators to
>determine the values for the various probabilities. According to my
>statistics book, this method provides excellent estimates if the sample
>size is large, but in the cases I hope to be dealing with I will only
>have a few samples and want to get as much information from them as I can.

>There has also been some discussion here about whether
>there exist some values for the probabilities of the classes and the
>probabilities of the various attributes given the class, which tells
>you as much as possible from the given samples. The methods I have
>seen require a prior distribution and show you how to determine the
>posterior distribution after seeing each example. One then chooses a value
>from this distribution using their favorite function, such as mean, median,
>or mode, or equivalently by using some loss function. It seems that there
>is no value for the probabilities which will give you as much
>information as possible about the samples without choosing some
>type of loss function. Is this correct and is there some way to
>prove if it is or is not correct?

To do Bayesian statistics "correctly" requires that one sets up a prior
measure on the collection of states of nature and a loss function, and
then uses that procedure which minimizes the expected loss. Now it is a
physical impossibility to do this accurately. Fortunately, there are
robustness results which point out that certain approximations may do a
good job in certain situations.

Using maximum likelihood estimators to approximate parameters may or may
not be robust. With a sufficiently large sample size, one can always do
a better job. But the largest sample size is not any better than knowing
the distribution.

The prior measure and the loss function come from the user, and there is
no dictum which can be given as to how to set these up. Now there are
packages on the market which claim to do this for the user with little
input; they should be used only with the greatest of suspicion.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@snap.stat.purdue.edu (Internet, bitnet)
{purdue,pur-ee}!snap.stat!hrubin(UUCP)
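
The point that the estimate depends on both the prior and the loss function can be illustrated with a small numerical sketch. Nothing below comes from the post itself: the Beta(1, 1) prior, the sample of three successes and no failures, the grid approximation, and all function names are assumptions chosen only for illustration. The sketch picks the estimate that minimizes posterior expected loss and compares it with the maximum likelihood estimate.

```python
# Sketch only: Bayes estimates of a Bernoulli probability from a tiny sample.
# Assumed for illustration (not from the post): a Beta(a, b) prior, a grid
# approximation of the posterior, and the two loss functions below.

def posterior_on_grid(successes, failures, a=1.0, b=1.0, n=201):
    """Return (grid, weights) approximating the Beta(a+s, b+f) posterior."""
    grid = [(i + 0.5) / n for i in range(n)]  # midpoints avoid 0 and 1
    dens = [t ** (a + successes - 1) * (1 - t) ** (b + failures - 1)
            for t in grid]
    total = sum(dens)
    return grid, [d / total for d in dens]

def bayes_estimate(grid, weights, loss):
    """Pick the grid point that minimizes posterior expected loss."""
    def expected_loss(est):
        return sum(w * loss(est, t) for t, w in zip(grid, weights))
    return min(grid, key=expected_loss)

def squared_loss(est, t):
    return (est - t) ** 2   # minimized by the posterior mean

def absolute_loss(est, t):
    return abs(est - t)     # minimized by the posterior median

successes, failures = 3, 0
grid, weights = posterior_on_grid(successes, failures)

mle = successes / (successes + failures)
est_sq = bayes_estimate(grid, weights, squared_loss)
est_abs = bayes_estimate(grid, weights, absolute_loss)

print("maximum likelihood:", mle)
print("squared-error loss:", round(est_sq, 3))
print("absolute-error loss:", round(est_abs, 3))
```

With so few observations the three numbers differ noticeably, and changing the assumed prior parameters `a` and `b` moves the two Bayes estimates again, which is the sense in which the prior and loss must come from the user.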