NetNews Usenet Archive 1993 #3

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1993 #3 / NN_1993_3.iso / spool / comp / programm / 3557 < prev next >

Wrap

Internet Message Format | 1993-01-23 | 2.5 KB

Path: sparky!uunet!spool.mu.edu!yale.edu!ira.uka.de!sbusol.rz.uni-sb.de!coli.uni-sb.de!coli-gate.coli.uni-sb.de!spackman From: spackman@disco-sol.dfki.uni-sb.de (Stephen Spackman) Newsgroups: comp.programming Subject: Re: hashing floating-point numbers Date: 23 Jan 1993 16:16:56 GMT Organization: DFKI Saarbruecken GmbH, D-W 6600 Saarbruecken Lines: 42 Message-ID: <SPACKMAN.93Jan23172204@disco-sol.dfki.uni-sb.de> References: <1993Jan22.175723.17245@chpc.utexas.edu> Reply-To: stephen@acm.org NNTP-Posting-Host: disco-sol.dfki.uni-sb.de In-reply-to: pefv700@chpc.utexas.edu's message of Fri, 22 Jan 93 17:57:23 GMT In article <1993Jan22.175723.17245@chpc.utexas.edu> pefv700@chpc.utexas.edu (Christopher Phillips) writes: |What are the usual methods for hashing floating-point numbers |given that numbers "equivalent" to a certain precision should |hash to the same value? If you think about the equivalence classes that you have in mind for a moment, you'll realise that this probably isn't a good idea anyway. Consider the following algorithm for your problem (which isn't usably efficient and doesn't have good statistical properties, but shows the problem up neatly): problematicFloatHash [ (epsilon :: Float), .. the difference defining "equivalence" (hash :: UninterpretedBinary(size Float) -> Hash), .. an arbitrary hash function for the bits (x :: Float) .. the input value ] : .if x = x + epsilon .then hash x .else problematicFloatHash[epsilon, hash, x + epsilon] . Basically all numbers with an exponent small enough that they can represent a difference of your epsilon will have to hash to the same value; if you choose a value for epsilon like 10^6, and you have, say, 48 bits in your mantissa, all numbers absolutely less than something like 2^48/10^6 ~ 2^28 will get folded together! The problem is much worse if the error is expressed in relative terms: quite likely this model will be continuous, and *all* numbers will have to hash to the same value. More likely what you want to do is round the things to a multiple of your bucket size *before* hashing them, and make sure that the context of use of your algorithm isn't sensitive to off-by-one errors in choice of bucket. (Of course, that's what fixed-size representations are all about). ---------------------------------------------------------------------- stephen p spackman +49 681 302 5288(o) 5282(sec) stephen@acm.org dfki / stuhlsatzenhausweg 3 / d-w-6600 saarbruecken 11 / germany ----------------------------------------------------------------------