NetNews Usenet Archive 1992 #31

Newsgroups: comp.lang.pascal
Path: sparky!uunet!zaphod.mps.ohio-state.edu!cs.utexas.edu!torn!news.ccs.queensu.ca!slip206.telnet1.QueensU.CA!dmurdoch
From: dmurdoch@mast.queensu.ca (Duncan Murdoch)
Subject: Re: Correlation Coefficient
Message-ID: <dmurdoch.270.725775983@mast.queensu.ca>
Lines: 38
Sender: news@knot.ccs.queensu.ca (Netnews control)
Organization: Queen's University
References: <92365.223423BOYDJ@QUCDN.QueensU.CA>
Date: Thu, 31 Dec 1992 04:26:24 GMT

In article <92365.223423BOYDJ@QUCDN.QueensU.CA> Jeff Boyd <BOYDJ@QUCDN.QueensU.CA> writes:
> Let
>   n be the number of data pairs
>   Sx = sum of {x}
>   Sy = sum of {y}
>   Sxx = sum of {x^2}
>   Syy = sum of {y^2}
>   Sxy = sum of {xy}
>
> Then
>                    n(Sxy) - (Sx)(Sy)
> r = ---------------------------------------------
>     sqrt[n(Sxx)-(Sx)(Sx)] * sqrt[n(Syy)-(Sy)(Sy)]
>
> This is the formula of choice for simple computations.

That's a fast way to calculate the correlation coefficient, but it's probably not the best, because it can be very inaccurate. It's badly subject to rounding error in the three subtractions, because you're possibly subtracting huge numbers from other nearly identical huge numbers.

A more stable and accurate way to calculate it is using two passes through the data. On the first pass, calculate Xbar = Sx/n and Ybar = Sy/n. Then calculate the "corrected" sums of squares

  CSxx = sum of {(x-Xbar)^2}
  CSyy = sum of {(y-Ybar)^2}
  CSxy = sum of {(x-Xbar)(y-Ybar)}

and finally

  r = CSxy / sqrt(CSxx * CSyy)

If two passes are impossible, there are ways to calculate the corrected sums of squares in a single pass. I think they share the stability of this calculation, but I can't remember the formulas right now.

Duncan Murdoch
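To make the rounding problem concrete, here is a small sketch (in Python rather than Pascal, for brevity) that computes r both ways. The function names naive_r and two_pass_r and the test data are hypothetical, chosen for illustration only. The data are y = 2x, so r is exactly 1, and adding a large constant to every value should not change r at all.

```python
import math
from typing import Sequence


def naive_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """One-pass formula from the quoted article, built from raw sums."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy
    dx = n * sxx - sx * sx  # difference of two nearly equal huge numbers
    dy = n * syy - sy * sy
    if dx <= 0 or dy <= 0:
        # Rounding can push these to zero or below even when the exact values are positive.
        return float("nan")
    return num / (math.sqrt(dx) * math.sqrt(dy))


def two_pass_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Two-pass method: means first, then corrected sums of squares."""
    n = len(xs)
    xbar = sum(xs) / n  # first pass
    ybar = sum(ys) / n
    csxx = csyy = csxy = 0.0
    for x, y in zip(xs, ys):  # second pass over deviations from the means
        dx = x - xbar
        dy = y - ybar
        csxx += dx * dx
        csyy += dy * dy
        csxy += dx * dy
    return csxy / math.sqrt(csxx * csyy)


base_x = [1.0, 2.0, 3.0, 4.0, 5.0]
base_y = [2.0, 4.0, 6.0, 8.0, 10.0]  # y = 2x, so r = 1 exactly
offset = 1e9  # shifting by a constant leaves r unchanged
big_x = [offset + v for v in base_x]
big_y = [offset + v for v in base_y]

print(naive_r(base_x, base_y), two_pass_r(base_x, base_y))
print(naive_r(big_x, big_y), two_pass_r(big_x, big_y))
```

On the unshifted data both versions agree. On the shifted data the naive version must recover n(Sxx) - (Sx)(Sx) = 50 from two numbers near 2.5e19, where adjacent double-precision values are 4096 apart, so its result is essentially noise (or nan). The two-pass version works with deviations of size at most 4 and still returns 1.0.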
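The post does not give the single-pass formulas. One well-known option, which is our addition and not necessarily what the author had in mind, is Welford's updating method extended to the cross-product term: keep running means and update the corrected sums from each new point's deviations. The class name RunningCorrelation is hypothetical. This is a sketch, not a definitive implementation.

```python
import math


class RunningCorrelation:
    """Single-pass corrected sums via Welford-style updates (our assumption, not from the post)."""

    def __init__(self) -> None:
        self.n = 0
        self.xbar = 0.0
        self.ybar = 0.0
        self.csxx = 0.0  # running sum of (x - Xbar)^2
        self.csyy = 0.0  # running sum of (y - Ybar)^2
        self.csxy = 0.0  # running sum of (x - Xbar)(y - Ybar)

    def add(self, x: float, y: float) -> None:
        self.n += 1
        dx = x - self.xbar  # deviation from the OLD means
        dy = y - self.ybar
        self.xbar += dx / self.n  # update the means
        self.ybar += dy / self.n
        # Old deviation times new deviation gives the exact increment.
        self.csxx += dx * (x - self.xbar)
        self.csyy += dy * (y - self.ybar)
        self.csxy += dx * (y - self.ybar)

    def r(self) -> float:
        return self.csxy / math.sqrt(self.csxx * self.csyy)


rc = RunningCorrelation()
for v in [1.0, 2.0, 3.0, 4.0, 5.0]:
    rc.add(1e9 + v, 1e9 + 2.0 * v)  # shifted data with y = 2x
print(rc.r())
```

Each update works only with deviations from the current means, so it avoids the raw sums of squares that make the one-pass formula fragile; on the shifted example above it returns 1.0.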