NetNews Usenet Archive 1993 #3

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1993 #3 / NN_1993_3.iso / spool / sci / crypt / 7154 < prev next >

Wrap

Internet Message Format | 1993-01-26 | 7.6 KB

Xref: sparky sci.crypt:7154 alt.security.pgp:602 Path: sparky!uunet!olivea!decwrl!waikato.ac.nz!comp.vuw.ac.nz!cavebbs!tornado!sai From: sai@tornado.welly.gen.nz (Simon McAuliffe) Newsgroups: sci.crypt,alt.security.pgp Subject: The Sidelnikov Saga... Now he's wrong again. Message-ID: <D0F1XB1w165w@tornado.welly.gen.nz> Date: 26 Jan 93 11:52:00 GMT Organization: Tornado BBS, Wellington, New Zealand. Lines: 193 A while ago, I wrote: > It looks like _THIS_IS_ what Dr. Sidelnikov also did, and it means one > of three things: (ranked in order of likelihood as I see it) > > 1) Both Dr. Sidelnikov and I stuffed up. > > 2) This correlation test isn't appropriate for this situation. > > 3) Something funny is going on and either the RNG is badly > implemented or IDEA has a problem. > > Number 1 sounds like the most likely and number 3 is just plain scary. After looking at things more closely, I have decided that both number 1 and number 2 are true. This is not to say that a correlation test is not suitable, but that the correlation test performed was meaningless. The correlation formula posted recently makes good sense, but looking at the formula for the mean, you can't help feeling something is wrong. 1 mu = -----, n >= 2 n - 1 Correlation is basically a fairly simple idea. | . . | . | . | . . . . | | . | . . . . . | . . | . . .| . . | . |. ----------+---------- ----------+---------- ----------+---------- . . .| | . . . . . | . . . | |.. . . | . . | | . . . | . | | . | Correlation ~1 Correlation ~-1 Correlation ~0 Very roughly and informally, when the data is plotted as above correlated data tends to form fuzzy lines. A positive gradient gives a positive correlation and a negative gradient gives a negative correlation. In our situation the correlation is close to 0, we are just trying to decide if it is close enough. Okay, back to the formula for the mean. Substituting n = 2 into the equation yields mu = 1. Really?! Lets now imagine placing two points on our graph. The correlation measures how good a line forms between all the points. With only two points there is a perfect line. Given any two random (continuous) points connected by a line we must either have a positive gradient or a negative gradient (assuming a probability of 0 of them being horizontally or vertically adjacent). If the gradient is positive we have a correlation of 1, and if the gradient is negative we have a correlation of -1, therefore the expected value (mean) of the correlation for n = 2 is mu = P(gradient is -ve) * -1 + P(gradient is +ve) * -1 = -0.5 + 0.5 = 0 Not only does the working above show this to be true, but intuitively you would expect random numbers to be completely uncorrelated and therefore have a correlation value of 0. Why then does the formula give a value of 1? This is the maximum possible value of correlation, how can we have a maximum as the mean?! This problem is also true for n = 3, n = 4 and upwards, it just becomes less ridiculous for larger n. Correlation values range from -1 to +1. A formula which is always positive must be a little worrying in this case. Should the correlation be biased towards positive? Why so? Why not negative? The random variables are supposedly from a uniform distribution. The mean simply _MUST_ be 0 for a truely random source. Not just "nearly random." For any given random sample, we won't actually get a value of 0, but the expected value (the mean) is _EXACTLY_ 0. If this were not true, there would never really be any random numbers. So I think we can assume that the given equation is not really what it was intended to be. I don't know what Knuth's book really says because I don't have it, but perhaps there was something lost in the posting of it? If we take the square of the correlation values, then the mean variation from 0 seems to be roughly equal to this function. This hasn't been tested of course, but there is a tendency there. Perhaps then, this is what the equations are (the mean deviation from the mean of the squared correlation). That's not the impression I got from the posting. Never mind, in any case it isn't particularly useful. I now know why my last test gave such suprising results. Time to try a different approach. We can still use the correlation data obtained earlier (see my last post). Following is a list of the correlation values I used last time. These values are still valid as they do not involve the dubious correlation mean and standard deviation. Each correlation value is taken from an independent sample of 1048576 bytes (524288 byte pairs). -0.000151 0.001282 -0.001413 0.001116 0.001013 -0.002072 0.000340 -0.001034 -0.001972 -0.000230 0.000508 -0.000425 0.002383 -0.001605 0.000506 0.002695 -0.001196 0.000450 0.000016 0.002822 We now wish to construct a hypothesis test with the null hypothesis _ H0: x = mu (ie the data is random) and the alternative _ H1: x =/= mu (the data is _not_ random) The test statistic is, _ x - mu t = ------------- s / sqrt(n) _ where the mean, x is _ x = 0.0001517 and the sample standard deviation, s is s = 0.001463 As mentioned above, the mean correlation, mu is 0 for random data. So we now have, t = 0.0001517 --------------------- 0.001463 / sqrt(20) = 0.4634 This test statistic should approximate a t distribution with n - 1 = 19 degrees of freedom. So, to test H0 for truth at the alpha = 5% level, we require that 0.4634 < t (19) 0.05/2 => 0.4634 < 2.093 Which is obviously true, so at the 5% level we do not have enough evidence to reject H0. In fact, for this test we have an observed significance level of 64.8% so we can clearly see that H0 cannot be rejected and we can safely assume that the data is indeed random[ish] (at least as far as a contiguous byte correlation test is concerned). Of course what I have done here does not show that the random numbers are "cryptographically strong" it only shows that Dr. Sidelnikov's claim of "strong correlation between contiguous bytes" is false. He probably made the same mistake I made earlier! There's nothing to say I didn't make more mistakes of course. It wouldn't suprise me at all. That's why you're here - to find my mistakes. :) There may still be correlation between other bits. I believe somebody is working on this area as we speak however. There may also be other weaknesses in IDEA which result in weak random numbers. I imagine some people are working in this area too. Anyway, I hope this is of more interest than my last balls-up. And I hope that this isn't another one... :-) Disclaimer: As I said in my last post, I am not a statistician by any stretch of the imagination. Don't take what I say as gospel. Sai (e-mail sai@tornado.welly.gen.nz) (s-mail PO Box 3598 Wellington, New Zealand)