- Path: sparky!uunet!olivea!spool.mu.edu!yale.edu!jvnc.net!rutgers!concert!duke!news.duke.edu!duke.edu!feh
- From: feh@duke.edu (Frank Harrell)
- Newsgroups: bit.listserv.stat-l
- Subject: Re: qualitative principal components
- Summary: Discussion of qualitative principal components etc.
- Keywords: scaling
- Message-ID: <8299@news.duke.edu>
- Date: 2 Jan 93 01:33:03 GMT
- References: <8289@news.duke.edu> <C06AFp.8oI@prd.co.uk>
- Sender: news@news.duke.edu
- Lines: 59
- Nntp-Posting-Host: biostat.mc.duke.edu
-
-
- Steve Blinkhorn has a good idea in opening discussion on this and
- related scaling methods. Let me kick it off with an example.
-
- For this example, SAS's PROC PRINQUAL would not converge, but a simple
- approach resulted in excellent data reduction (to 1 d.f. for regression)
- and a powerful predictor. In a 4000-patient dataset, I have data on
- 12 physiologic variables (heart rate, blood pressure, etc.). Many of
- these variables have a normal range, so their relationship with
- patient risk is strongly non-monotonic (risk rises on both sides of the
- normal range). There are also several binary and ordinal variables I
- wish to consider.
-
- The first principal component (PC 1) of all the physiologic measures
- had a chi-square of 450 in predicting time until death using a Cox model.
- I then fit a least squares regression for each physiologic variable,
- predicting PC 1. To avoid assuming a shape for each regression, I used
- restricted cubic spline functions with 5 knots. The predicted values of
- PC 1 from these regressions gave a first approximation to the
- transformation of each variable. Once all variables were transformed,
- the transformed variables were used to derive a new PC 1. This new PC 1
- had a chi-square of 600 for predicting death. More importantly, the
- derived transformations looked remarkably like the variables' relationships
- with risk when I peeked at the dependent variable. I iterated this
- procedure 4 times, raising the chi-square of the first PC to about 700
- and further improving the transformations. [We normally would not look
- at these chi-squares, since the goal is to do data reduction without
- examining Y, but it is informative to do so for this example.]
-
- If I included polytomous (unordered categorical) variables, I would
- estimate their transformations from the cell means of PC 1 within each
- category. For ordinal predictors, I would estimate their rescalings by
- isotonic regression of PC 1 on the ordinal levels.
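A minimal sketch of these two scoring rules, assuming NumPy and an already computed PC 1 vector; the function names and the toy data are hypothetical. Cell-mean scoring maps each category to the mean of PC 1 in that category. For the ordinal case the sketch uses the pool-adjacent-violators algorithm, a standard way to compute a least-squares isotonic (non-decreasing) fit; applying it to the per-level means of PC 1, weighted by the level counts, gives the same fit as isotonic regression on the individual observations. It assumes PC 1 increases with the ordinal level; if it decreases, negate PC 1 first.

```python
import numpy as np


def cell_mean_scores(category, pc1):
    """Score each level of a polytomous variable by the mean of PC 1 within that level."""
    category = np.asarray(category)
    means = {lev: pc1[category == lev].mean() for lev in np.unique(category)}
    return np.array([means[c] for c in category]), means


def isotonic_fit(y, w=None):
    """Pool-adjacent-violators: weighted least-squares non-decreasing fit to y (in order)."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    return np.concatenate([np.full(n, m) for m, _, n in blocks])


def ordinal_scores(level, pc1):
    """Rescale an ordinal variable by a monotone fit of PC 1 on its levels."""
    level = np.asarray(level)
    levels = np.unique(level)  # sorted, so this is the ordinal order
    means = np.array([pc1[level == v].mean() for v in levels])
    counts = np.array([(level == v).sum() for v in levels])
    lookup = dict(zip(levels, isotonic_fit(means, counts)))
    return np.array([lookup[v] for v in level]), lookup


# Toy example: PC 1 scores for six patients
pc1 = np.array([0.0, 2.0, 5.0, 3.0, 1.0, 3.0])
cat_scores, _ = cell_mean_scores(["x", "y", "x", "z", "y", "z"], pc1)
ord_scores, _ = ordinal_scores([1, 1, 2, 2, 3, 3], pc1)
print(cat_scores)  # [2.5 1.5 2.5 3.  1.5 3. ]
print(ord_scores)  # [1. 1. 3. 3. 3. 3.]
```

In the ordinal example the level means are 1, 4 and 2; levels 2 and 3 violate monotonicity, so they are pooled to a common score of 3.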
-
- This dataset is ideal for such data reduction because, even though
- many of the transformations are U-shaped, the very first PC 1 (from the
- untransformed variables) captured important information about patients'
- severity of illness that could be used to scale the variables.
-
- I would be interested to know whether this procedure has already been
- described in the literature, and whether anyone has a better method. I
- would also be interested in readers' reactions to including dummy
- variables when deriving PC 1 from the correlation matrix. The theory
- isn't there, but it seems to work well most of the time.
- SAS PRINQUAL's approach is, I think, more general, because it tries to
- predict each variable from the best linear combination of
- transformations of all the others. But I think that approach is more
- prone to non-convergence and to convergence to silly scores. In one
- example, age was transformed sensibly except that the lowest age in the
- dataset (one year lower than the next lowest) was transformed to a value
- far off the scale of the rest of the ages. This MAY have been because I
- was using cubic splines that are not constrained to be linear in the
- tails, and ordinary cubic splines have some tail difficulties.
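The tail issue can be made concrete. Restricted cubic splines are constrained to be linear below the first knot and above the last, whereas an ordinary cubic spline can keep bending there, which may let an isolated extreme value such as the lowest age be mapped far from the rest. The sketch below, assuming NumPy and hypothetical knots for age, checks this by taking second differences of each basis on evenly spaced points outside the knots (zero means every column is linear there). The basis functions are illustrative assumptions, not a reconstruction of the PRINQUAL fit described above.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis: x plus k-2 cubic terms, linear beyond the outer knots."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)

    def pos3(u):
        return np.maximum(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2
    cols = [x]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
                + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
        cols.append(term / scale)
    return np.column_stack(cols)


def cubic_spline_basis(x, knots):
    """Ordinary (unrestricted) cubic spline in truncated-power form."""
    x = np.asarray(x, dtype=float)
    cols = [x, x ** 2, x ** 3] + [np.maximum(x - t, 0.0) ** 3 for t in knots]
    return np.column_stack(cols)


def max_curvature(B):
    """Largest absolute second difference over all basis columns."""
    return np.abs(np.diff(B, n=2, axis=0)).max()


knots = np.array([20.0, 35.0, 50.0, 65.0, 80.0])  # hypothetical knots for age in years
below = np.arange(0.0, 20.0)    # ages below the lowest knot, step 1
above = np.arange(81.0, 101.0)  # ages above the highest knot, step 1

for name, basis in [("restricted", rcs_basis), ("ordinary", cubic_spline_basis)]:
    print(name, max_curvature(basis(below, knots)), max_curvature(basis(above, knots)))
```

Any fitted transformation is a linear combination of the basis columns, so with the restricted basis it is forced to be a straight line in both tails, while the ordinary basis leaves it free to curve there.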
-
- Comments to stat-l welcomed.
-
- --
- ----------------------------------------------------------------------------
- Frank E Harrell Jr feh@biostat.mc.duke.edu
- Associate Professor of Biostatistics
- Division of Biometry Duke University Medical Center
-