home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!spool.mu.edu!yale.edu!ira.uka.de!sbusol.rz.uni-sb.de!coli.uni-sb.de!coli-gate.coli.uni-sb.de!spackman
- From: spackman@disco-sol.dfki.uni-sb.de (Stephen Spackman)
- Newsgroups: comp.programming
- Subject: Re: hashing floating-point numbers
- Date: 23 Jan 1993 16:16:56 GMT
- Organization: DFKI Saarbruecken GmbH, D-W 6600 Saarbruecken
- Lines: 42
- Message-ID: <SPACKMAN.93Jan23172204@disco-sol.dfki.uni-sb.de>
- References: <1993Jan22.175723.17245@chpc.utexas.edu>
- Reply-To: stephen@acm.org
- NNTP-Posting-Host: disco-sol.dfki.uni-sb.de
- In-reply-to: pefv700@chpc.utexas.edu's message of Fri, 22 Jan 93 17:57:23 GMT
-
- In article <1993Jan22.175723.17245@chpc.utexas.edu> pefv700@chpc.utexas.edu (Christopher Phillips) writes:
-
- |What are the usual methods for hashing floating-point numbers
- |given that numbers "equivalent" to a certain precision should
- |hash to the same value?
-
- If you think about the equivalence classes that you have in mind for a
- moment, you'll realise that this probably isn't a good idea anyway.
- Consider the following algorithm for your problem (which isn't usably
- efficient and doesn't have good statistical properties, but shows the
- problem up neatly):
-
- problematicFloatHash
- [ (epsilon :: Float), .. the difference defining "equivalence"
- (hash :: UninterpretedBinary(size Float) -> Hash),
- .. an arbitrary hash function for the bits
- (x :: Float) .. the input value
- ]
- : .if x = x + epsilon
- .then hash x
- .else problematicFloatHash[epsilon, hash, x + epsilon]
- .
-
- Basically all numbers with an exponent small enough that they can
- represent a difference of your epsilon will have to hash to the same
- value; if you choose a value for epsilon like 10^6, and you have, say,
- 48 bits in your mantissa, all numbers absolutely less than something
- like 2^48/10^6 ~ 2^28 will get folded together!
-
- The problem is much worse if the error is expressed in relative terms:
- quite likely this model will be continuous, and *all* numbers will have
- to hash to the same value.
-
- More likely what you want to do is round the things to a multiple of
- your bucket size *before* hashing them, and make sure that the context
- of use of your algorithm isn't sensitive to off-by-one errors in choice
- of bucket. (Of course, that's what fixed-size representations are all
- about).
- ----------------------------------------------------------------------
- stephen p spackman +49 681 302 5288(o) 5282(sec) stephen@acm.org
- dfki / stuhlsatzenhausweg 3 / d-w-6600 saarbruecken 11 / germany
- ----------------------------------------------------------------------
-