NetNews Usenet Archive 1992 #31

Internet Message Format | 1992-12-22 | 4.4 KB

Xref: sparky comp.ai:4680 sci.math.stat:2661
Newsgroups: comp.ai,sci.math.stat
Path: sparky!uunet!statsci!almond
From: almond@statsci.com (Russell G. Almond)
Subject: Re: Learning from subjective data
In-Reply-To: bharat@cs.uiuc.edu's message of Thu, 17 Dec 1992 06:45:25 GMT
Message-ID: <ALMOND.92Dec21220007@bass.statsci.com>
Sender: usenet@statsci.com (Usenet News Account)
Organization: Statistical Sciences, Inc., Seattle, WA USA
References: <BzE5G3.Hoq@ux1.cso.uiuc.edu>
Date: Tue, 22 Dec 1992 06:00:07 GMT
Lines: 87

R. Bharat Rao (bharat@cs.uiuc.edu) writes:
> I was wondering if anyone knew of any work that has been done on
> learning from subjective data. For instance, you may have a data set
> of events with a number of independent attributes (x1...xn) and a
> single dependent attribute y. However, y is a subjective rating.
> For instance, the event could be a work of art and the x's could be
> various nominal/real-valued attributes of the painting. Then paintings
> in the datasets would be given a grade on (say) beauty (the "y"
> attribute) by a number of different "experts" (whose notions of
> good/bad/indifferent obviously vary wildly). Each painting would be
> rated only once by a randomly chosen expert (from an arbitrarily large
> pool of experts -- perhaps even a different expert for every
> painting).

This is generally a messy problem, and I don't know that there has
ever been a definitive answer. I would, however, try the Psych--Stat
literature, especially a graduate text intended for Psych graduate
students. They run into this problem very frequently and have some
standard methods for dealing with it. There is probably a local guru
in the Psych department who shows all the grad students how to do
their statistical analysis; that would be a good person to start
with.

Generally speaking, the problem in which one expert does all the
rating is much easier than the one in which many experts share it.
At least we have some hope that a single expert is self-consistent;
that is not likely to be true of a panel of experts. Achieving
agreement among experts is a difficult problem.

Some approaches I can think of off the top of my head:

1) Assume it's normal and the hell with it. This approach is better
than you might think at first if you have a moderately large data
set: the CLT generally gives you fairly quick convergence for data of
this sort. To be safe here, you need a couple of repeated ratings at
each setting of your independent variables.

2) Make a contingency table and fit a "log-linear" model. If you have
enough data that you can count the number of pieces with
characteristic vector X which got rating Y, then you can build a
contingency table. If your data are very sparse (lots of ones and
zeros in the cells), then you can't use this technique. On the other
hand, if it works, you can wind up with a "graphical model" not
unlike the undirected model used by Lauritzen and Spiegelhalter
[1988]. Log-linear models belong to the family usually referred to as
"Generalized Linear Models", which are supported by most of the
larger stat packages, including S/S-PLUS. In S, the rating variable Y
would be called an "ordered factor", and there are specialized
methods for including such variables in log-linear models.

3) Use rankings instead of ratings. There are a large number of
statistical techniques which use rank orderings instead of absolute
numbers.

4) Multinomial regression (logistic regression). Properly speaking,
your rating variable follows a multinomial distribution whose
parameter is the vector of probabilities of achieving each of the
ranks. This vector of probabilities depends on the indicators X. The
problem is that software to fit the multinomial directly is fairly
tough to come by, while software to fit the binomial (logistic
regression) is much easier to find. Thus you may need to break your
ranks up into several binary decisions (e.g., Y=1? Y=2? Y=3? Y=4?,
or Y>3? Y=4? Y=2? Y=1? for Y in {1,2,3,4,5}).
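For approach 1 ("assume it's normal"), here is a minimal sketch, not from the original post, of what that amounts to for a single attribute: treat the ratings as numeric responses and fit an ordinary least-squares line. The function name `fit_simple_ols` and the toy data are hypothetical; the data deliberately repeat each x value so there is more than one rating per setting.

```python
from __future__ import annotations

from statistics import mean


def fit_simple_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares fit of y = a + b*x, treating the ratings y as
    roughly normal numeric responses. Returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the attribute does not vary")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: attribute x and rating y, two ratings per x value.
xs = [1, 1, 2, 2, 3, 3]
ys = [2, 3, 3, 4, 4, 5]
intercept, slope = fit_simple_ols(xs, ys)
print(intercept, slope)  # prints 1.5 1.0
```

The repeated x values are what let you look at the spread of ratings at each setting, which is the replication the post asks for before leaning on the normal approximation.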
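Approach 2 can be illustrated with the simplest log-linear model, independence between one attribute and the rating, which has a closed-form fit, so no GLM package is needed. This is a hypothetical sketch rather than the S/S-PLUS route the post mentions: `independence_g2` cross-tabulates (attribute level, rating) pairs and returns the likelihood-ratio statistic G^2 with its degrees of freedom. Richer log-linear models (several attributes, interactions, ordered ratings) need iterative fitting.

```python
from __future__ import annotations

import math
from collections import Counter


def independence_g2(pairs: list[tuple[str, int]]) -> tuple[float, int]:
    """Cross-tabulate (attribute level, rating) pairs and fit the
    log-linear independence model log m_ij = u + a_i + b_j.

    Returns the likelihood-ratio statistic G^2 and its degrees of
    freedom; a large G^2 relative to df suggests the rating depends on
    the attribute.
    """
    cells = Counter(pairs)                 # observed counts n_ij
    rows = Counter(x for x, _ in pairs)    # row margins n_i+
    cols = Counter(y for _, y in pairs)    # column margins n_+j
    n = len(pairs)
    g2 = 0.0
    # Empty cells contribute 0 (0 * log 0 = 0), so only nonzero cells are summed.
    for (x, y), observed in cells.items():
        expected = rows[x] * cols[y] / n   # fitted count under independence
        g2 += 2.0 * observed * math.log(observed / expected)
    df = (len(rows) - 1) * (len(cols) - 1)
    return g2, df


# Hypothetical ratings for two kinds of paintings.
pairs = [("oil", 4), ("oil", 5), ("oil", 4),
         ("watercolor", 2), ("watercolor", 3), ("watercolor", 2)]
g2, df = independence_g2(pairs)
print(f"G^2 = {g2:.3f} on {df} df")  # G^2 = 8.318 on 3 df
```

This toy table has 8 cells but only 6 observations, and half the cells are empty: exactly the sparse situation the post warns about, where the chi-squared reference distribution for G^2 cannot be trusted.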
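For approach 3, one of the many rank-based techniques is Spearman's rank correlation: replace the attribute and the ratings by their ranks (tied values get the average rank) and compute an ordinary correlation on the ranks. A minimal sketch with hypothetical function names; it uses only the ordering of the ratings, not their numeric spacing.

```python
from __future__ import annotations

import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs: list[float], ys: list[float]) -> float:
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: an attribute and the ratings the pieces received.
print(spearman_rho([1, 2, 3, 4], [2, 2, 5, 4]))
```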
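The first split suggested in approach 4 (ask Y=1?, then Y=2? among the remaining pieces, and so on) is sometimes called a continuation-ratio model. The sketch below builds those binary questions and gives each its own logistic regression on a single attribute x. The fitting uses plain gradient ascent on the log-likelihood purely for brevity (a stat package would typically use iteratively reweighted least squares). All names and data are hypothetical, and the sketch assumes the data are not perfectly separable, otherwise the slope just keeps growing with more steps.

```python
from __future__ import annotations

import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sequential_questions(y: int, levels: list[int]) -> list[tuple[int, int]]:
    """Turn rating y into the binary questions 'Y = level?' asked in
    order, stopping at the first 'yes'. The last level needs no question."""
    questions = []
    for level in levels[:-1]:
        answer = 1 if y == level else 0
        questions.append((level, answer))
        if answer:
            break
    return questions


def fit_logistic(xs: list[float], ts: list[int],
                 lr: float = 0.5, steps: int = 5000) -> tuple[float, float]:
    """Fit P(t = 1 | x) = sigmoid(a + b*x) by gradient ascent."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        ga = gb = 0.0
        for x, t in zip(xs, ts):
            residual = t - sigmoid(a + b * x)
            ga += residual
            gb += residual * x
        a += lr * ga / n
        b += lr * gb / n
    return a, b


def fit_sequential_model(data: list[tuple[float, int]],
                         levels: list[int]) -> dict[int, tuple[float, float]]:
    """One logistic regression per question, each using only the pieces
    that actually reach that question."""
    per_level = {level: ([], []) for level in levels[:-1]}
    for x, y in data:
        for level, answer in sequential_questions(y, levels):
            per_level[level][0].append(x)
            per_level[level][1].append(answer)
    return {level: fit_logistic(xs, ts)
            for level, (xs, ts) in per_level.items() if xs}


print(sequential_questions(3, [1, 2, 3, 4, 5]))  # [(1, 0), (2, 0), (3, 1)]

# Hypothetical (attribute, rating) data on a 3-point scale.
data = [(0, 1), (0, 1), (0, 2), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 3)]
model = fit_sequential_model(data, [1, 2, 3])
```

To score a new piece, multiply out the conditional probabilities: with p1 the fitted probability for "Y=1?" and p2 for "Y=2?", P(Y=1) = p1, P(Y=2) = (1 - p1) p2, and the last level takes whatever probability is left.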
You might also try looking under "rating" in a statistical literature
database such as Current Index to Statistics.

Russell Almond
Statistical Sciences, Inc.          U. Washington
1700 Westlake Ave., N Suite 500     Statistics, GN-22
Seattle, WA 98109                   Seattle, WA 98195
(206) 283-8802
almond@statsci.com                  almond@stat.washington.edu

...From the brow of Russell Almond
Gone was every trace of reason,
As the fog from off the water,
As pollution from the freeway.
    --From the "Song of Russell Almond" part of the Tree of Cliques cycle.