> The best practice I'm aware of for handling equality calculations
> with Floats is to avoid them completely.
I do wish that more people would read "What Every Computer Scientist
Should Know About Floating-Point Arithmetic". There may be a mistake
or two in that title, but you'll certainly find the paper around on
the Web, and Sun used to make a habit of shipping it with their
compilers. This is not directed at Robert Jarvis, but at his audience.
Floating-point equality tests are in fact perfectly well behaved (in
the absence of NaN, sigh). More than that, when the operands and
result are integers in the range -(2**53 - 1) .. +(2**53-1) held as
IEEE 754 double precision numbers, addition, subtraction,
multiplication, remainder() -- hence also division via rint((x -
remainder(x,y))/y) -- and comparison are EXACT.
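To make the exactness claim concrete, here is a small sketch in Python (chosen only because it is easy to run; its float is an IEEE 754 double on the usual platforms). The helper name check_exact is my own invention for illustration, not something from a library.

```python
import math

LIMIT = 2**53 - 1  # largest magnitude covered by the claim above


def check_exact(a: int, b: int) -> bool:
    """Return True if double arithmetic on a and b agrees with integer arithmetic.

    Assumes a, b and every intermediate result lie in -LIMIT .. +LIMIT, b != 0.
    """
    x, y = float(a), float(b)
    r = math.remainder(x, y)   # IEEE remainder: x - n*y, n the integer nearest x/y
    n = (x - r) / y            # that integer quotient, recovered by division
    return (
        x + y == a + b
        and x - y == a - b
        and x * y == a * b     # Python compares float with int exactly
        and n == round(n)      # the quotient is a whole number
        and n * y + r == a     # and it reconstructs x with no error
    )


print(check_exact(94906265, 94906265))      # True: the product is below 2**53
print(check_exact(7, 2))                    # True: remainder -1.0, quotient 4.0
print(float(2**53) + 1.0 == float(2**53))   # True: outside the range the 1 is lost
```

Note that the quotient obtained this way is rounded to nearest, not truncated, which is why 7 divided by 2 comes out as 4 with remainder -1.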
If you want to test whether a number x is a positive infinity, then x
== Inf is the best way to do it.
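As a sketch of the infinity test, and of the NaN wrinkle mentioned earlier, in Python (math.inf plays the role of Inf; the function name is just for illustration):

```python
import math


def is_positive_infinity(x: float) -> bool:
    # Plain equality is the right test: exactly one value equals +infinity.
    return x == math.inf


overflowed = 1e308 * 10          # multiplication overflows quietly to +infinity
nan = float("nan")

print(is_positive_infinity(overflowed))               # True
print(is_positive_infinity(1.7976931348623157e308))   # False: largest finite double
print(is_positive_infinity(-math.inf))                # False
print(nan == nan)                                     # False: the one ill-behaved case
```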
There are plenty of examples where floating-point equality is exactly
the right thing to do. Any blanket ban on floating point equality is
too strict.
The problem is not equality. The problem is that floating-point
arithmetic is BINARY, not decimal, and it's APPROXIMATE, not exact. It
just plain doesn't do what people expect. With the possible exception
of absolute value and unary minus, there is NO floating-point
operation which does what a naive user would expect. And while IEEE
floating-point is bizarre, it isn't outright broken like many of the
hardware floating-point systems that preceded it.
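A few lines of Python show the binary-versus-decimal surprise. This is only a sketch; it relies on the fact that Decimal(some_float) converts the stored binary value exactly, so it reveals what the hardware actually holds.

```python
from decimal import Decimal

# Decimal(float) is an exact conversion of the stored binary value.
stored_tenth = Decimal(0.1)

print(stored_tenth == Decimal("0.1"))      # False: 0.1 has no finite binary expansion
print(0.1 + 0.2 == 0.3)                    # False: the inputs and the sum are all rounded
print(0.5 + 0.25 == 0.75)                  # True: these are exact binary fractions
print(abs(-0.1) == 0.1, -(-0.1) == 0.1)    # True True: sign operations never round
```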
What I'd really like to get my hands on is the decimal floating-point
arithmetic in the revised IEEE standard. *That's* the arithmetic you
want for a spreadsheet. That's the arithmetic you want if people are
not to be tripped up by base 2 -vs- base 10. In fact you *can* get
your hands on a software implementation if you know where to look, but
wouldn't it be nice to have it going at full hardware speed?
> You should establish what you consider to be an acceptable epsilon
> value based on your understanding of your data and use it as follows:
It's not just your data you have to understand; more generally it is
your algorithm. Anyone who understands them well enough to choose a
good epsilon already knows how to do the fuzzy comparisons.
> maxEpsilon := 0.000001.
> . . .
> (f1 - f2) abs < maxEpsilon
>     ifTrue: ["f1 and f2 are approximately equal"]
>     ifFalse: ["f1 and f2 are not approximately equal"]
Urk. Absolute tolerances seldom work very well. See Knuth, The Art of
Computer Programming, Volume 2 "Seminumerical Algorithms" for a
thorough discussion of "fuzzy" floating-point comparison.
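Here is a rough Python sketch of why an absolute tolerance misbehaves at both ends of the scale. The relative test is a simplification in the spirit of Knuth's treatment, not his exact definition; the function names and the tolerance values are mine, chosen for illustration.

```python
def abs_close(a: float, b: float, eps: float = 1e-6) -> bool:
    """Absolute test, as in the quoted snippet."""
    return abs(a - b) < eps


def rel_close(a: float, b: float, rel: float = 1e-9) -> bool:
    """Relative test: the tolerance scales with the larger operand."""
    return abs(a - b) <= rel * max(abs(a), abs(b))


tiny_a, tiny_b = 1e-9, 2e-9    # differ by a factor of two
big_a = 1e10
big_b = big_a + 2e-6           # rounds to the very next double above 1e10

print(abs_close(tiny_a, tiny_b), rel_close(tiny_a, tiny_b))   # True False
print(abs_close(big_a, big_b), rel_close(big_a, big_b))       # False True
```

For small values the absolute test calls wildly different numbers equal; near 1e10 adjacent doubles are already more than 1e-6 apart, so the same test quietly degenerates into exact equality.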
The really nasty thing about fuzzy comparisons is that they aren't
transitive: (x fuzzyEquals: y) and: [y fuzzyEquals: z] does NOT imply
x fuzzyEquals: z. And yes, I *have* known programs (in APL and in IBM
Prolog) go wrong because their programmers didn't really understand
that they were getting fuzzy comparison and/or didn't appreciate the
consequences. (What Robert Jarvis is recommending is *explicitly*
doing fuzzy comparison with a *specifically* chosen *local* tolerance,
not implicit fuzzy comparison with a *global* tolerance. So it should
be less risky.)
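The failure of transitivity is easy to demonstrate. A minimal Python sketch, with a deliberately huge tolerance and values that are exact in binary so that nothing depends on rounding (fuzzy_equals is a hypothetical stand-in for fuzzyEquals:):

```python
def fuzzy_equals(a: float, b: float, tolerance: float = 1.0) -> bool:
    """Absolute-tolerance comparison, standing in for fuzzyEquals:."""
    return abs(a - b) < tolerance


x, y, z = 0.0, 0.75, 1.5

print(fuzzy_equals(x, y))   # True
print(fuzzy_equals(y, z))   # True
print(fuzzy_equals(x, z))   # False: "equal to equal" does not chain
```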
> Do not under any circumstances use floating point numbers in
> financial calculations. Floats are imprecise, often only
> approximate, and utterly inappropriate for any calculation where all
> the fiddly little decimal places really count.
Except decimal floats. Addition, subtraction, and multiplication of
in-range numbers stated in decimal with in-range results are *exact*.
We *really* want the new IEEE standard, don't we?
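One software implementation you can try today is Python's decimal module, which follows the General Decimal Arithmetic specification. Bear in mind that this sketch runs at the module's default precision of 28 digits, not the 16 or 34 digits of the hardware formats, and that division is still rounded.

```python
from decimal import Decimal

ten_cents = Decimal("0.10")

# Ten deposits of 0.10, summed in binary and in decimal floating point.
binary_sum = sum([0.1] * 10)
decimal_sum = sum([ten_cents] * 10)

print(binary_sum == 1.0)                      # False
print(decimal_sum == Decimal("1.00"))         # True
print(Decimal("1.10") * 3 == Decimal("3.30")) # True: in-range products are exact
print(Decimal(1) / Decimal(3) * 3 == 1)       # False: division still has to round
```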
I wish I understood the ANSI Smalltalk 'ScaledDecimal' interface a bit
better. I'm not sure I believe all of what I think I do understand.
And of course *some* financial calculations are *supposed* to be
approximate. The thing is, as always 'know what you are doing'.
> This does not require the use of a Float. In Smalltalk I'd use
> either a Fraction or a ScaledDecimal.
Unfortunately, there is no ScaledDecimal class in Squeak. I do have an
implementation of ScaledDecimal I wrote for another Smalltalk, but it
would require some work to fit with the traditional double dispatch,
and I am wary about changing the compiler to recognise ScaledDecimals.
Above all, I'm not sure I've interpreted the standard correctly.
--
Some good news for Richard: ScaledDecimal was added to Squeak 3.7
(which is now being released).
--
What Every Computer Scientist Should Know About Floating-Point
Arithmetic http://docs.sun.com/source/806-3568/ncg_goldberg.html
I wrote:
> Except decimal floats. Addition, subtraction, and multiplication of
> in-range numbers stated in decimal with in-range results are *exact*.
> We *really* want the new IEEE standard, don't we?
From some other comments I have seen, I should emphasise that the decimal
arithmetic in the new IEEE standard is *not* based on BCD, and should be
able to go nearly as fast as binary arithmetic. It has a slightly greater
range than IEEE binary arithmetic, and really, the only defect it has
relative to IEEE binary arithmetic is the 'wobbling precision' problem that
affects any base other than 2. Given the kluges Microsoft have layered on top
of IEEE binary arithmetic, corrupting the results of Excel calculations to
avoid surprising the naive, the sooner we get decent decimal arithmetic in
our hardware the better for everyone using spreadsheets. Less surprise
with *no* fiddling with the results of the hardware.
"Jarvis, Robert P. (Bob) (Contingent)" <bob.jarvis at timken.com> wrote:
> While it certainly sounds like an improvement, I think this is more of a
> band-aid than a fix. As soon as you exceed the precision of a floating
> point number, be it binary or decimal, you have the situation where a
> comparison like
>     f = f + 1.0
> answers 'true', at which point the auditors and lawyers will want to have a
> word.
You need to remember that not everyone is an American.
This country has 4 million people (including infants).
Let them each have 20 million dollars (please! start with me!).
Keep all money accounts in cents.
The sum fits in 53 bits; this amount can be represented exactly in
IEEE arithmetic.
It is true that not ALL financial calculations can be done using
floating point.
It is also true that if your language supports 64-bit integer arithmetic
(and COBOL has long had a minimum requirement of 18 decimal digits, and
C these days has 'long long') you get an even bigger accurate range by
using integers than floats.
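The arithmetic behind that example, and the point at which f = f + 1.0 really does answer true, can be checked in a few lines of Python. This is a sketch: the constant names are mine, and Context(prec=16) merely imitates the 16-digit precision of a decimal64 number in software.

```python
from decimal import Context, Decimal

PEOPLE = 4_000_000
CENTS_EACH = 20_000_000 * 100

total_cents = float(PEOPLE) * float(CENTS_EACH)   # 8e15, computed in doubles

print(total_cents == 8 * 10**15)         # True: the total is represented exactly
print(total_cents < 2**53)               # True: it fits in 53 bits
print(total_cents + 1.0 - total_cents)   # 1.0: a single cent still registers

f = float(2**53)
print(f == f + 1.0)                      # True: from here on the cent is lost

ctx16 = Context(prec=16)                 # 16 significant decimal digits
d = Decimal(10) ** 16
print(ctx16.add(d, Decimal(1)) == d)     # True: decimal floats saturate too

print(2**63 - 1 > 2**53)                 # True: 64-bit integers reach further
```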
Many financial calculations *MAY* be done safely using floating point.
As I said before, there are more than a few calculations where
the use of floating point will give the auditors and lawyers nothing
to complain of. Their concern is that you should keep accurate records
of *actual* sums of money which have passed through your hands. However,
projections, forecasts, statistical analyses, data mining, all that kind
of stuff may legitimately use approximations.
> My choice, if I was given one, would be to get rid of floating
> point entirely and replace it with some form of
> unlimited-precision scaled decimal.
Horses for courses. I don't know if you've ever tried doing any serious
matrix calculations in a symbolic algebra package like Macsyma or
Reduce, but if you haven't you'd be *amazed* how fast
unlimited-precision arithmetic can fill your memory up. There are
calculations where you have exact inputs and exact algorithms so exact
results are appropriate. There are also calculations where you have
approximate inputs and approximate algorithms so that approximate
results are appropriate. Smalltalk is one of the few languages that
offers BOTH. (Java of course has java.math.BigInteger and
java.math.BigDecimal, corresponding to Large{Positive,Negative}Integer
and ScaledDecimal, more or less. There doesn't seem to be anything like
Fraction, though.)
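To get a feel for how quickly exact arithmetic grows, here is a small Python sketch using fractions.Fraction, which plays roughly the role of Smalltalk's Fraction. The iteration x := x * (1 - x) is an arbitrary choice of mine; any nonlinear recurrence would do.

```python
from fractions import Fraction


def iterate_exact(steps: int) -> Fraction:
    """Iterate x := x * (1 - x) from 1/3 in exact rational arithmetic."""
    x = Fraction(1, 3)
    for _ in range(steps):
        x = x * (1 - x)
    return x


def iterate_float(steps: int) -> float:
    """The same iteration in ordinary doubles."""
    x = 1 / 3
    for _ in range(steps):
        x = x * (1 - x)
    return x


exact = iterate_exact(10)
approx = iterate_float(10)

print(exact.denominator == 3 ** (2 ** 10))   # True: the denominator squares each step
print(exact.denominator.bit_length())        # bits needed for the denominator alone
print(abs(float(exact) - approx) < 1e-12)    # True: the double is close, in 64 bits
```

After only ten steps the exact denominator is 3 raised to the 1024th power, while the floating-point answer agrees to many digits in a fixed 64 bits. Ten more steps would make the exact denominator roughly a thousand times longer again.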
--
Quoting http://www2.hursley.ibm.com/decimal/ :
"Most computers today support binary floating-point in hardware. While
suitable for many purposes, binary floating-point arithmetic should not
be used for financial, commercial, and user-centric applications or web
services because the decimal data used in these applications cannot be
represented exactly using binary floating-point. (See the Frequently
Asked Questions pages for more explanation and examples.)
The problems of binary floating-point can be avoided by using base 10
(decimal) exponents and preserving those exponents where possible. This
site describes a decimal arithmetic which achieves the necessary results
and is suitable for both hardware and software implementation. It brings
together the relevant concepts from a number of ANSI, IEEE, ECMA, and ISO
standards, and conforms to the proposed decimal formats and arithmetic in
the current draft of the ongoing IEEE 754 revision.
Notably, a single data type can be used for integer, fixed-point, and
floating-point decimal arithmetic, and the design permits compatible
fixed-size and arbitrary-precision implementations. For the background
and rationale for the design, see Decimal Floating-Point: Algorism for
Computers, Cowlishaw, 2003, 16th IEEE Symposium on Computer Arithmetic."