NetNews Usenet Archive 1992 #31

Internet Message Format | 1993-01-01 | 3.6 KB

Path: sparky!uunet!olivea!spool.mu.edu!yale.edu!jvnc.net!rutgers!concert!duke!news.duke.edu!duke.edu!feh
From: feh@duke.edu (Frank Harrell)
Newsgroups: bit.listserv.stat-l
Subject: Re: qualitative principal components
Summary: Discussion of qualitative principal components etc.
Keywords: scaling
Message-ID: <8299@news.duke.edu>
Date: 2 Jan 93 01:33:03 GMT
References: <8289@news.duke.edu> <C06AFp.8oI@prd.co.uk>
Sender: news@news.duke.edu
Lines: 59
Nntp-Posting-Host: biostat.mc.duke.edu

Steve Blinkhorn has a good idea in opening discussion on this and related scaling methods. Let me kick it off with an example. For this example, SAS's PROC PRINQUAL would not converge, but a simple approach resulted in excellent data reduction (to 1 d.f. for regression) and a powerful predictor.

In a 4000-patient dataset, I have data on 12 physiologic variables (heart rate, blood pressure, etc.). Many of these variables have a normal range, so their relationship with patient risk is very non-monotonic. There are also several binary and ordinal variables I wish to consider.

The first principal component (PC 1) of all the physiologic measures had a chi-square of 450 in predicting time until death using a Cox model. I fit a least squares regression for each physiologic variable, predicting PC 1 from that variable. So as not to assume a shape for the regression, I used restricted cubic spline functions with 5 knots. The predicted values of PC 1 yielded a first approximation to the transformation for each variable. Once all were transformed, these new transformed variables were used to derive a new PC 1. This new PC 1 had a chi-square of 600 for predicting death. More importantly, the derived transformations looked amazingly like the variables' risk relationships that I would have seen had I peeked at the dependent variable. I iterated this procedure 4 times, getting the first PC to have a chi-square of about 700, and getting better transformations.
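The iteration described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the analysis in this post: the function names (rcs_basis, first_pc, iterate_pc1) are made up for the example, the knots are placed at fixed quantiles of each variable (the post does not say where the 5 knots were put), and NumPy stands in for whatever software was actually used.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis: x plus k-2 nonlinear terms,
    constrained to be linear beyond the two outer knots."""
    t = np.asarray(knots, dtype=float)
    k = len(t)
    scale = (t[-1] - t[0]) ** 2   # keeps the columns on a comparable scale
    d = t[k - 1] - t[k - 2]

    def pos3(u):
        return np.maximum(u, 0.0) ** 3

    cols = [x]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / d
                + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / d)
        cols.append(term / scale)
    return np.column_stack(cols)

def first_pc(Z):
    """Scores on the first principal component of the correlation matrix."""
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[0]              # the sign of a PC is arbitrary

def iterate_pc1(X, n_iter=4, n_knots=5):
    """Alternate between deriving PC 1 and re-estimating each variable's
    transformation as the spline fit of PC 1 on that variable."""
    X = np.asarray(X, dtype=float)
    pc1 = first_pc(X)                        # PC 1 of the raw variables
    q = np.linspace(0.05, 0.95, n_knots)     # assumed knot quantiles
    T = X.copy()
    for _ in range(n_iter):
        T = np.empty_like(X)
        for j in range(X.shape[1]):
            x = X[:, j]
            B = np.column_stack([np.ones_like(x),
                                 rcs_basis(x, np.quantile(x, q))])
            beta, *_ = np.linalg.lstsq(B, pc1, rcond=None)
            T[:, j] = B @ beta               # transformed variable
        pc1 = first_pc(T)                    # new PC 1 from the transforms
    return pc1, T
```

Note that Y (survival time) never enters the sketch; any check of the resulting PC 1 against the outcome, such as the Cox model chi-squares quoted above, would be a separate step.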
[We normally would not be looking at these chi-squares, since we want to do data reduction without examining Y, but it's informative to do so for this example.]

If I considered polytomous variables, I would estimate transformations by getting cell means of PC 1. For ordinal predictors, I would estimate their re-scalings by isotonic regression on PC 1.

This dataset is ideal for such data reduction because, even though many transformations are U-shaped, the very first PC 1 derived captured important patient severity-of-illness information that could be used to scale the variables.

I would be interested to know whether this procedure has already been described in the literature, and whether anyone has a better method. I would also be interested in readers' reactions to including dummy variables when deriving PC 1, using the correlation matrix to derive the PCs. The theory isn't there, but it seems to work well most of the time.

SAS PRINQUAL's approach, I think, is more general, because it tries to predict each variable from the best linear combination of transformations of all the others. But I think the PRINQUAL procedure is more prone to non-convergence and to convergence to silly scores. In one example, age was transformed sensibly except that the lowest age in the dataset (one year lower than the next lowest) was transformed to a value way off the scale of the rest of the ages. This MAY have been because I was using non-linear-tail-restricted cubic splines (that is, ordinary cubic splines without the linear tail restriction), and ordinary cubic splines have some tail difficulties.

Comments to stat-l welcomed.

-- 
----------------------------------------------------------------------------
Frank E Harrell Jr                    feh@biostat.mc.duke.edu
Associate Professor of Biostatistics
Division of Biometry
Duke University Medical Center
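The cell-mean and isotonic rescalings mentioned in the post for polytomous and ordinal variables could look something like the sketch below. It is an illustration only: the function names are hypothetical, the isotonic fit uses the standard pool-adjacent-violators algorithm weighted by cell counts, and it assumes a nondecreasing relationship with PC 1 (negate PC 1 first if the relationship runs the other way).

```python
from collections import Counter, defaultdict

def cell_mean_scores(categories, pc1):
    """Score each category by the mean of PC 1 over the observations in it."""
    sums = defaultdict(float)
    counts = Counter(categories)
    for c, v in zip(categories, pc1):
        sums[c] += v
    return {c: sums[c] / counts[c] for c in counts}

def isotonic_scores(levels, pc1):
    """Nondecreasing rescaling of ordered levels: isotonic regression of
    PC 1 on the level, by pooling adjacent violators of the ordering."""
    means = cell_mean_scores(levels, pc1)
    counts = Counter(levels)
    blocks = []   # each block holds [weighted mean, weight, member levels]
    for lev in sorted(means):
        blocks.append([means[lev], counts[lev], [lev]])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, l2 = blocks.pop()
            m1, w1, l1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, l1 + l2])
    return {lev: m for m, _, members in blocks for lev in members}
```

Each function returns a dictionary mapping a level to its new score, so a variable would be rescaled by looking up each observation's level before PC 1 is re-derived.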