- Xref: sparky comp.ai:4680 sci.math.stat:2661
- Newsgroups: comp.ai,sci.math.stat
- Path: sparky!uunet!statsci!almond
- From: almond@statsci.com (Russell G. Almond)
- Subject: Re: Learning from subjective data
- In-Reply-To: bharat@cs.uiuc.edu's message of Thu, 17 Dec 1992 06:45:25 GMT
- Message-ID: <ALMOND.92Dec21220007@bass.statsci.com>
- Sender: usenet@statsci.com (Usenet News Account)
- Organization: Statistical Sciences, Inc., Seattle, WA USA
- References: <BzE5G3.Hoq@ux1.cso.uiuc.edu>
- Date: Tue, 22 Dec 1992 06:00:07 GMT
- Lines: 87
-
-
- R. Bharat Rao (bharat@cs.uiuc.edu) writes:
- > I was wondering if anyone knew of any work that has been done on
- > learning from subjective data. For instance, you may have a data set
- > of events with a number of independent attributes (x1...xn) and a
- > single dependent attribute y. However, y is a subjective rating.
-
- > For instance, the event could be a work of art and the x's could be
- > various nominal/real-valued attributes of the painting. Then paintings
- > in the datasets would be given a grade on (say) beauty (the "y"
- > attribute) by a number of different "experts" (whose notions of
- > good/bad/indifferent obviously vary wildly). Each painting would be
- > rated only once by a randomly chosen expert (from an arbitrarily large
- > pool of experts -- perhaps even a different expert for every
- > painting).
-
- This is generally a messy problem and I don't know that there has ever
- been a definitive answer. I would, however, try the Psych--Stat
- literature, especially a graduate text intended for Psych majors.
- They run into this problem very frequently and have some standard
- methods for dealing with it. There is probably a local guru in the
- Psych department who shows all the grad students how to do their
- statistical analysis; that would be a good person to start with.
-
- Generally speaking, the problem with one expert doing the rating is
- much easier than with many experts doing the rating. At least we have
- some hope that a single expert is self-consistent; that is not likely
- to be true with multiple experts. Achieving agreement among experts
- is a difficult problem.
-
- Some approaches I can think of off the top of my head:
-
- 1) Assume it's normal and the hell with it.
- This approach is better than you might think at first if you have a
- moderately large data set. The Central Limit Theorem (CLT) generally
- gives you fairly quick convergence for data of this sort. To be safe
- here, you need a couple of repeated observations of the dependent
- variable for each setting of your independent variables.
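-
- As a rough illustration of approach 1, here is a minimal Python sketch
- that treats the rating as an ordinary number and fits a least-squares
- line. The data, the single attribute x, and the helper name fit_line
- are all made up for the example; a real analysis would use a stat
- package and all of the attributes.

```python
# Approach 1 sketch: treat the ordinal rating as a plain number and fit
# ordinary least squares. All data below are invented for illustration.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical attribute x with repeated values, so that each setting of x
# has several ratings (each possibly from a different expert).
xs = [1, 1, 1, 2, 2, 2, 3, 3, 3]
ys = [1, 2, 2, 2, 3, 4, 4, 4, 5]   # ratings on a 1..5 scale

intercept, slope = fit_line(xs, ys)
print("rating ~ %.3f + %.3f * x" % (intercept, slope))
```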
-
- 2) Make a contingency table and fit a "log-linear" model.
- If you have enough data that you can count the number of pieces with
- characteristic vector X which got rating Y, then you can build a
- contingency table. If your data is very sparse (lots of ones and
- zeros) then you can't use this technique. On the other hand, if it
- works, you can wind up with a "graphical model" not like the
- undirected model used by Lauritzen and Spiegelhalter [1988]. Usually
- this is referred to as a "Generalized Linear Model", and it is
- supported by most of the larger Stat packages including S/S-PLUS. In
- S, the rating variable Y would be called an "ordered factor" and there
- are specialized methods for including such factors in log-linear models.
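-
- To make the contingency-table idea concrete, here is a small Python
- sketch. It counts pieces by (X, Y) and fits only the simplest
- log-linear model, independence of X and Y, whose fitted counts have
- the closed form row total * column total / n. The categories and
- counts are invented, and richer models need an iterative fit from a
- stat package.

```python
import math
from collections import Counter

# Invented observations: (characteristic X, rating Y), one per painting.
observations = (
    [("oil", "bad")] * 2 + [("oil", "good")] * 6
    + [("watercolor", "bad")] * 6 + [("watercolor", "good")] * 2
)

# The contingency table: how many pieces with characteristic X got rating Y.
table = Counter(observations)
row_totals = Counter(x for x, _ in observations)
col_totals = Counter(y for _, y in observations)
n = len(observations)

# Fitted counts under the independence log-linear model.
expected = {
    (x, y): row_totals[x] * col_totals[y] / n
    for x in row_totals
    for y in col_totals
}

# Likelihood-ratio statistic G^2 comparing observed and fitted counts.
# A large value suggests the rating does depend on the characteristic.
g2 = 2.0 * sum(
    count * math.log(count / expected[cell])
    for cell, count in table.items()
)

print(dict(table))
print("G^2 = %.3f" % g2)
```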
-
- 3) Use rankings instead of ratings.
- There are a large number of statistical techniques which use rank
- orderings instead of absolute numbers.
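-
- One example of such a technique is Spearman's rank correlation, which
- is just the ordinary correlation computed on ranks. The Python sketch
- below assumes a single numeric attribute and invented ratings; tied
- values share the average of their ranks. The helper names are
- hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(a, b):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Invented data: one attribute per painting and its rating.
attribute = [3.1, 1.2, 4.8, 2.0, 5.5]
rating = [3, 1, 4, 2, 4]

rho = spearman(attribute, rating)
print("Spearman rho = %.4f" % rho)
```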
-
- 4) Multinomial regression (Logistic regression).
- Properly speaking, your rating variable follows a multinomial
- distribution whose parameter is the vector of probabilities for
- achieving each of the ranks. This vector of probabilities
- depends on the indicators X. The problem is that the software to fit
- the multinomial directly is fairly tough to come by. Software to fit
- the binomial (logistic regression) is much easier to find. Thus you
- may need to break your ranks up into several binary decisions: (e.g.,
- Y=1? Y=2? Y=3? Y=4?, or Y>3? Y=4? Y=2? Y=1? for Y in {1,2,3,4,5}).
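-
- A minimal Python sketch of one reading of the first split scheme
- (Y=1? Y=2? Y=3? Y=4?): recode the rating into one 0/1 variable per
- question, then fit an ordinary logistic regression to each. The data,
- the single attribute x, and the helper names are assumptions made for
- the example, and the plain gradient-ascent fit stands in for the
- fitting routine a stat package would provide.

```python
import math

def binary_targets(ratings, levels=(1, 2, 3, 4)):
    """Map each level k to the 0/1 answers to the question 'Y = k?'."""
    return {k: [1 if y == k else 0 for y in ratings] for k in levels}

def fit_logistic(xs, zs, steps=5000, rate=0.1):
    """Fit P(z=1 | x) = 1 / (1 + exp(-(a + b*x))) by gradient ascent
    on the average log-likelihood. Returns (a, b)."""
    a = 0.0
    b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for x, z in zip(xs, zs):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += z - p
            grad_b += (z - p) * x
        a += rate * grad_a / n
        b += rate * grad_b / n
    return a, b

# Invented data: attribute x and a rating Y in {1,2,3,4,5} per painting.
xs = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
ratings = [1, 1, 2, 1, 2, 2, 1, 2, 3, 2, 3, 4]

targets = binary_targets(ratings)

# Fit the first binary decision, "Y = 1?"; the others work the same way.
a, b = fit_logistic(xs, targets[1])
print("logit P(Y=1) ~ %.3f + %.3f * x" % (a, b))
```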
-
- You might also try looking under "rating" in a statistical literature
- database such as Current Index to Statistics.
-
-
- Russell Almond
- Statistical Sciences, Inc. U. Washington
- 1700 Westlake Ave., N Suite 500 Statistics, GN-22
- Seattle, WA 98109 Seattle, WA 98195
- (206) 283-8802
- almond@statsci.com almond@stat.washington.edu
-
- ...From the brow of Russell Almond
- Gone was every trace of reason,
- As the fog from off the water,
- As pollution from the freeway.
-
- --From the "Song of Russell Almond" part of the Tree of
- Cliques cycle.
-
-
-
-
-