<HTML>
<HEAD>
<TITLE>Statistics::ChiSquare - How random is your data?</TITLE>
<LINK REL="stylesheet" HREF="../../../Active.css" TYPE="text/css">
<LINK REV="made" HREF="mailto:">
</HEAD>
<BODY>
<TABLE BORDER=0 CELLPADDING=0 CELLSPACING=0 WIDTH=100%>
<TR><TD CLASS=block VALIGN=MIDDLE WIDTH=100% BGCOLOR="#cccccc">
<STRONG><P CLASS=block> Statistics::ChiSquare - How random is your data?</P></STRONG>
</TD></TR>
</TABLE>
<A NAME="__index__"></A>
<!-- INDEX BEGIN -->
<UL>
<LI><A HREF="#name">NAME</A></LI><LI><A HREF="#supportedplatforms">SUPPORTED PLATFORMS</A></LI>
<LI><A HREF="#synopsis">SYNOPSIS</A></LI>
<LI><A HREF="#description">DESCRIPTION</A></LI>
<LI><A HREF="#examples">EXAMPLES</A></LI>
<LI><A HREF="#author">AUTHOR</A></LI>
</UL>
<!-- INDEX END -->
<HR>
<P>
<H1><A NAME="name">NAME</A></H1>
<P><CODE>Statistics::ChiSquare</CODE> - How random is your data?</P>
<P>
<HR>
<H1><A NAME="supportedplatforms">SUPPORTED PLATFORMS</A></H1>
<UL>
<LI>Linux</LI>
<LI>Solaris</LI>
<LI>Windows</LI>
</UL>
<HR>
<H1><A NAME="synopsis">SYNOPSIS</A></H1>
<PRE>
use Statistics::ChiSquare;</PRE>
<PRE>
print chisquare(@array_of_numbers);</PRE>
<P>Statistics::ChiSquare is available at a CPAN site near you.</P>
<P>
<HR>
<H1><A NAME="description">DESCRIPTION</A></H1>
<P>Suppose you flip a coin 100 times, and it turns up heads 70 times.
<EM>Is the coin fair?</EM></P>
<P>Suppose you roll a die 100 times, and it shows 30 sixes.
<EM>Is the die loaded?</EM></P>
<P>In statistics, the <STRONG>chi-square</STRONG> test calculates ``how random'' a series
of numbers is. But it doesn't simply say ``yes'' or ``no''. Instead, it
gives you a <EM>confidence interval</EM>, which sets upper and lower bounds
on the likelihood that the variation in your data is due to chance.
See the examples below.</P>
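<P>As a rough illustration of the arithmetic behind the test (a minimal sketch, not the module's own code), the chi-square statistic adds up, for each outcome, the squared gap between the observed and expected counts divided by the expected count. The helper name <CODE>chi_square_statistic</CODE> is hypothetical, and the sketch assumes every outcome is equally likely:</P>
<PRE>
```perl
use strict;
use warnings;

# Hypothetical helper, not part of Statistics::ChiSquare.
# Assumes each slot is equally likely, so the expected count
# for every slot is the total divided by the number of slots.
sub chi_square_statistic {
    my @observed = @_;
    my $total = 0;
    $total += $_ for @observed;
    my $expected = $total / @observed;

    my $statistic = 0;
    foreach my $count (@observed) {
        $statistic += ($count - $expected) ** 2 / $expected;
    }
    return $statistic;
}

# The coin from the text: 70 heads and 30 tails in 100 flips.
my $coin_statistic = chi_square_statistic(70, 30);
print "coin: $coin_statistic\n";
```
</PRE>
<P>For the coin, each of the two terms is 400/50 = 8, so the statistic is 16. The larger the statistic, the further the counts stray from an even split.</P>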
<P>If you've ever studied elementary genetics, you've probably heard
about Gregor Mendel. He was a wacky Austrian botanist who discovered
(in 1865) that traits could be inherited in a predictable fashion. He
did lots of experiments with cross breeding peas: green peas, yellow
peas, smooth peas, wrinkled peas. A veritable Brave New World of legumes.</P>
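<P>But Mendel's data looked too good to be true. In 1936 the statistician
R. A. Fisher used the chi-square test to argue that the results matched the
expected ratios more closely than chance alone would normally allow.</P>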
<P>There's just one function in this module: chisquare(). Instead of
returning the bounds on the confidence interval in a tidy little
two-element array, it returns an English string. This was a deliberate
design choice: many people misinterpret chi-square results, and the
string helps clarify the meaning.</P>
<P>The string returned by <CODE>chisquare()</CODE> will always match one of these patterns:</P>
<PRE>
"There's a &gt;\d+% chance, and a &lt;\d+% chance, that this data is random."</PRE>
<P>or</P>
<PRE>
"There's a &lt;\d+% chance that this data is random."</PRE>
<P>or</P>
<PRE>
"I can't handle \d+ choices without a better table."</PRE>
<P>That last one deserves a bit more explanation. The ``modern''
chi-square test uses a table of values (based on Pearson's
approximation) to avoid expensive calculations. Thanks to the table,
the <CODE>chisquare()</CODE> calculation is very fast, but there are some
collections of data it can't handle, including any collection with more
than 21 slots. So you can't calculate the randomness of a 30-sided
die.</P>
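<P>The table lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the helper name <CODE>bracket_probability</CODE> and the array <CODE>@table_df5</CODE> are hypothetical, and the table holds only four standard critical values for five degrees of freedom (a data set with six slots).</P>
<PRE>
```perl
use strict;
use warnings;

# Standard chi-square critical values for 5 degrees of freedom
# (six slots). Each row is [probability in percent, critical value].
# A real table would cover more probabilities and more slot counts.
my @table_df5 = ([50, 4.351], [10, 9.236], [5, 11.070], [1, 15.086]);

# Hypothetical helper: report which two table rows bracket a statistic.
sub bracket_probability {
    my ($statistic) = @_;
    my $upper = 100;    # percent bound carried over from the previous row
    foreach my $row (@table_df5) {
        my ($percent, $critical) = @$row;
        # The statistic is at or below this critical value, so the
        # probability lies between this row and the previous one.
        return "between $percent% and $upper%" if $critical >= $statistic;
        $upper = $percent;
    }
    return "below $upper%";
}

# 12.2 is the statistic for six counts of 8, 7, 9, 8, 8, 20
# when each slot expects 10.
print bracket_probability(12.2), "\n";
```
</PRE>
<P>With the statistic 12.2 this prints ``between 1% and 5%''. Because the lookup is a short scan of precomputed values, it is fast, but it can only answer for the slot counts the table covers.</P>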
<P>
<HR>
<H1><A NAME="examples">EXAMPLES</A></H1>
<P>Imagine a coin flipped 1000 times. The most likely outcome is
500 heads and 500 tails:</P>
<PRE>
@coin = (500, 500);
print chisquare(@coin);</PRE>
<P>prints ``There's a &gt;90% chance, and a &lt;100% chance, that this data is random.''</P>
<P>Imagine a die rolled 60 times that shows sixes just a wee bit too often.</P>
<PRE>
@die1 = (8, 7, 9, 8, 8, 20);
print chisquare(@die1);</PRE>
<P>prints ``There's a &gt;1% chance, and a &lt;5% chance, that this data is random.''</P>
<P>Imagine a die rolled 600 times that shows sixes <STRONG>way</STRONG> too often.</P>
<PRE>
@die2 = (80, 70, 90, 80, 80, 200);
print chisquare(@die2);</PRE>
<P>prints ``There's a &lt;1% chance that this data is random.''</P>
<P>How random is rand()?</P>
<PRE>
srand(time ^ $$);
@rands = ();
for (1 .. 60000) {
    $slot = int(rand(6));    # pick one of six slots, 0 through 5
    $rands[$slot]++;         # tally the hit for that slot
}
print "@rands\n";
print chisquare(@rands);</PRE>
<P>prints (on my machine)</P>
<PRE>
10156 10041 9991 9868 10034 9910
There's a &gt;10% chance, and a &lt;50% chance, that this data is random.</PRE>
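<P>So much for pseudorandom number generation. (Read literally, though, a
result between 10% and 50% gives no strong reason to doubt that
<CODE>rand()</CODE> is uniform.)</P>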
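<P>The counts printed above can be checked by hand. The following sketch is illustrative only: the variable names are hypothetical, and it assumes each of the six slots expects 10000 of the 60000 draws.</P>
<PRE>
```perl
use strict;
use warnings;

# Counts copied from the sample output above.
my @rands    = (10156, 10041, 9991, 9868, 10034, 9910);
my $expected = 60000 / 6;

# Sum of squared deviations from the expected count,
# each divided by the expected count.
my $statistic = 0;
$statistic += ($_ - $expected) ** 2 / $expected for @rands;

printf "%.4f\n", $statistic;
```
</PRE>
<P>The statistic comes to 5.2778. For five degrees of freedom the standard critical values are 4.351 at 50% and 9.236 at 10%, so the statistic sits between them, which agrees with the string printed above.</P>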
<P>
<HR>
<H1><A NAME="author">AUTHOR</A></H1>
<P>Jon Orwant</P>
<P>MIT Media Laboratory</P>
<P><A HREF="mailto:orwant@media.mit.edu">orwant@media.mit.edu</A></P>
<TABLE BORDER=0 CELLPADDING=0 CELLSPACING=0 WIDTH=100%>
<TR><TD CLASS=block VALIGN=MIDDLE WIDTH=100% BGCOLOR="#cccccc">
<STRONG><P CLASS=block> Statistics::ChiSquare - How random is your data?</P></STRONG>
</TD></TR>
</TABLE>
</BODY>
</HTML>