- Newsgroups: comp.lang.pascal
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!cs.utexas.edu!torn!news.ccs.queensu.ca!slip206.telnet1.QueensU.CA!dmurdoch
- From: dmurdoch@mast.queensu.ca (Duncan Murdoch)
- Subject: Re: Correlation Coefficient
- Message-ID: <dmurdoch.270.725775983@mast.queensu.ca>
- Lines: 38
- Sender: news@knot.ccs.queensu.ca (Netnews control)
- Organization: Queen's University
- References: <92365.223423BOYDJ@QUCDN.QueensU.CA>
- Date: Thu, 31 Dec 1992 04:26:24 GMT
-
- In article <92365.223423BOYDJ@QUCDN.QueensU.CA> Jeff Boyd <BOYDJ@QUCDN.QueensU.CA> writes:
- >Let
- > n be the number of data pairs
- > Sx = sum of {x}
- > Sy = sum of {y}
- > Sxx = sum of {x^2}
- > Syy = sum of {y^2}
- > Sxy = sum of {xy}
- >
- >Then
- > r = [n(Sxy) - (Sx)(Sy)] / {sqrt[n(Sxx)-(Sx)(Sx)] * sqrt[n(Syy)-(Sy)(Sy)]}
- >
- >This is the formula of choice for simple computations.
-
- That's a fast way to calculate the correlation coefficient, but it's
- probably not the best, because it can be very inaccurate. Each of the
- three subtractions is badly subject to rounding error: when the data
- have a mean that is large compared to their spread, you're subtracting
- one huge number from another, nearly identical, huge number, and most
- of the significant digits cancel.
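The cancellation can be seen in a small sketch. It is written in Python rather than Pascal only because Python integers are exact, which gives a reference value to compare against; the function name `spread_term` and the sample data are made up for illustration. It evaluates just the term n(Sxx)-(Sx)(Sx), once in floating point and once exactly.

```python
def spread_term(xs):
    """Return n*Sxx - Sx*Sx computed in floating point and exactly.

    xs must be a list of integers so the exact value can be formed.
    """
    n = len(xs)
    # Floating-point version, accumulated the way the one-line formula does.
    sx = 0.0
    sxx = 0.0
    for x in xs:
        sx += float(x)
        sxx += float(x) * float(x)
    approx = n * sxx - sx * sx
    # Exact version: Python integers never round.
    exact = n * sum(x * x for x in xs) - sum(xs) ** 2
    return approx, exact


# Four points with a large common offset (illustrative data).
data = [100_000_000 + i for i in range(4)]
approx, exact = spread_term(data)
print(approx, exact)
```

The exact value is 20. In IEEE double precision both products are near 1.6e17, where adjacent representable numbers are 32 apart, so the floating-point difference cannot come out as 20. The same data without the offset (0, 1, 2, 3) gives 20 in both versions.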
-
- A more stable and accurate way to calculate it is using two passes
- through the data. On the first pass, calculate Xbar = Sx/n and Ybar = Sy/n.
- Then calculate the "corrected" sums of squares
-
- CSxx = sum of {(x-Xbar)^2}
- CSyy = sum of {(y-Ybar)^2}
- CSxy = sum of {(x-Xbar)(y-Ybar)}
-
- and finally
-
- r = CSxy / sqrt(CSxx * CSyy)
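The two-pass calculation above could be sketched as follows, in Python for brevity. The function name and the error handling are assumptions added for illustration, not part of the original description.

```python
import math


def correlation_two_pass(xs, ys):
    """Correlation coefficient via means, then corrected sums of squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    # First pass: the means Xbar and Ybar.
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Second pass: corrected sums of squares and cross products.
    csxx = csyy = csxy = 0.0
    for x, y in zip(xs, ys):
        dx = x - xbar
        dy = y - ybar
        csxx += dx * dx
        csyy += dy * dy
        csxy += dx * dy
    if csxx == 0.0 or csyy == 0.0:
        raise ValueError("correlation is undefined when x or y is constant")
    return csxy / math.sqrt(csxx * csyy)


# Perfectly linear data with a large offset in x (illustrative values).
xs = [100_000_000 + i for i in range(4)]
ys = [1, 3, 5, 7]
print(correlation_two_pass(xs, ys))
```

Because the means are subtracted before anything is squared, the quantities being summed stay small even when the data carry a large offset.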
-
- If two passes are impossible, there are ways to calculate the corrected
- sums of squares in a single pass. I think they share the stability of
- this calculation, but I can't remember the formulas right now.
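One well-known single-pass scheme is the running-mean update usually attributed to Welford. Whether it is the method the post has in mind is not known; the sketch below, in Python, is offered only as one possibility, and the function name is an assumption.

```python
import math


def correlation_one_pass(pairs):
    """Correlation coefficient from a single pass over (x, y) pairs."""
    n = 0
    xbar = ybar = 0.0
    csxx = csyy = csxy = 0.0
    for x, y in pairs:
        n += 1
        dx = x - xbar            # deviation from the old mean of x
        dy = y - ybar            # deviation from the old mean of y
        xbar += dx / n           # update the running means
        ybar += dy / n
        # Old deviation times new deviation keeps each sum "corrected".
        csxx += dx * (x - xbar)
        csyy += dy * (y - ybar)
        csxy += dx * (y - ybar)
    if n < 2 or csxx == 0.0 or csyy == 0.0:
        raise ValueError("need at least 2 points, with x and y not constant")
    return csxy / math.sqrt(csxx * csyy)


# The data can be any iterable, including one that can only be read once.
stream = zip([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(correlation_one_pass(stream))
```

Like the two-pass method, this only ever squares deviations from a mean, never the raw data, so it avoids subtracting two huge sums at the end.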
-
- Duncan Murdoch
-