Bug #46577 dtoa missuse causes printing of floats beyond their precision (thus vary)
Submitted: 6 Aug 2009 3:10 Modified: 10 Sep 2009 16:10
Reporter: Ross McFarland Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Data Types Severity:S3 (Non-critical)
Version:6.0.11-alpha OS:FreeBSD
Assigned to: Alexey Kopytov CPU Architecture:Any
Tags: regression

[6 Aug 2009 3:10] Ross McFarland
Description:
see this bug i found and recently fixed in drizzle, same problem applies here as the relevant code hasn't been modified:
    https://bugs.launchpad.net/drizzle/+bug/408694

it's a mis-use of the new dtoa function, passing in the buffer size as ndigits rather than the maximum number of digits to 'print' (e.g. precision.)

patch is the first piece of the following, the test problems may not apply.
    https://code.launchpad.net/~rwmcfa1/drizzle/fix-bug408694

-rm

How to repeat:
run the following on linux & freebsd, slightly different answers are returned. note that this was with drizzle, but the relevant code/paths are very similar/the same so it's likely the results will be. (problem is there regardless.)

select log(3,9);

on linux that returns 2.0, on freebsd it returns 2.0000000000000004

note that the 2.0...04 has more sig-digits in it than a 64-bit double can hold (~16)

Suggested fix:
dtoa.c:224 is
  res= dtoa(x, 4, type == MY_GCVT_ARG_DOUBLE ? width : min(width, FLT_DIG),
            &decpt, &sign, &end, buf, sizeof(buf));
and should be:
  res= dtoa(x, 4, type == MY_GCVT_ARG_DOUBLE ? min(width, DBL_DIG) :
            min(width, FLT_DIG), &decpt, &sign, &end, buf, sizeof(buf))
[6 Aug 2009 5:48] Sveta Smirnova
Thank you for the report.

Verified as described on FreeBSD. On Linux I get:

select log(3,9);
log(3,9)
2

Only version 6.0 is affected, bug does not exist in Azalea.
[11 Aug 2009 16:12] Konstantin Osipov
Alexey, do you have an idea why this is not repeatable in azalea?
The code should be the same.
[11 Aug 2009 16:36] Ross McFarland
what is azalea? this dtoa path is new to 6.x. < 6 will use a completely different double to string code-path. if azalea is newer than 6.0.11-alpha then i have no clue why it would fail to reproducible.
[2 Sep 2009 19:10] Liz Drachnik
Hello Ross  - 

In order for us to continue the process of reviewing your contribution to MySQL - We need
you to review and sign the Sun|MySQL contributor agreement (the "SCA")

The process is explained here: 
http://forge.mysql.com/wiki/Sun_Contributor_Agreement

Getting a signed/approved SCA on file will help us facilitate your contribution-- this
one, and others in the future.

Thank you !
[3 Sep 2009 21:47] Ross McFarland
the patch was made and applied to drizzle, it's a few character change and shouldn't be considered a contribution. someone should just repeat the change in mysql. i don't want to fool with this process and/or sign anything for something this small.
[6 Sep 2009 12:16] Alexey Kopytov
Hi Ross,

This is not a bug. We want to print DOUBLEs with more than DBL_DIG signifiant digits in some cases to ensure we get the same binary numbers when reading printed decimal ones back in. For example, see bug #21497 (which is now back in drizzle, look at the type_float.test changes in your drizzle patch).

This behavior is documented in our manual, see dtoa-related changes in http://dev.mysql.com/doc/refman/5.4/en/type-conversion.html.

DBL_DIG=15 means that converting *any* decimal number with 15 digits to a binary value and back is "safe" (we get the same string). It does not mean we have to limit *all* binary floating point numbers to 15 digits when converting to a string.

Going back to your test case with LOG(3,9) producing 2.0000000000000004 on FreeBSD, the reason is an inefficient log() implementation in FreeBSD returning a different binary number than the Linux one. dtoa() just reflects this fact. Similarly, LOG(9,3) returns 0.5 on Linux and 0.4999999999999999 on FreeBSD which does not mean we have to print all DOUBLE values with just 1 digit.

Please file a FreeBSD bug.
[7 Sep 2009 0:08] Ross McFarland
this is not a problem with freebsd's log function. it is correct to 16 significant digits which is the value of DBL_DIG, past that the value can't be relied upon. it differs from 2.0 in the 17th significant digit. i understand what you're saying and don't know whether or not it changes the behavior in other places, but it _is_not_ correct to pull more than 16 sig. digits out of a ieee double value. so this is still a bug, but the proposed fix may not be correct. it's not clear what the passed in size should be, why is it FLT_DIG for floats and somehting totally different for doubles? that doesn't seem correct. i couldn't make much sense of the dtoa code itself when i tried to dig in to it to see what was going wrong, i just found that it seemed to be passing in the wrong value in this case.
[7 Sep 2009 9:42] Alexey Kopytov
Ross,

> it is correct to 16 significant digits which is the value of DBL_DIG,

It is actually 15:

$ gcc -E -dM - < /dev/null | grep DBL_DIG
#define __DBL_DIG__ 15
#define __LDBL_DIG__ 18

> it _is_not_ correct to pull more than 16 sig.

Yes, it is. Here is a more formal definition of FLT_DIG/DBL_DIG/LDBL_DIG taken from http://www.gnu.org/s/libc/manual/html_node/Floating-Point-Parameters.html:

"
FLT_DIG
    This is the number of decimal digits of precision for the float data type. Technically, if p and b are the precision and base (respectively) for the representation, then the decimal precision q is the maximum number of decimal digits such that any floating point number with q base 10 digits can be rounded to a floating point number with p base b digits and back again, without change to the q decimal digits.

DBL_DIG
DBL_DIG
LDBL_DIG
    These are similar to FLT_DIG, but for the data types double and long double, respectively. The values of these macros are supposed to be at least 10.
"

In no way the above definition implies that we cannot print more than DBL_DIG significant digits. In fact, in some cases we have to, as bug #21497 shows.

That misunderstanding is caused by the fact that printf() conversion _may_ produce garbage digits when passed a precision value greater than the number actually has. For example,  printf("%.17g", 0.1) prints "0.10000000000000001". Both this number and 0.1 correspond to the same ieee value (0x3fb999999999999a) when converted back to binary, so the least significant digit in this case is garbage.

dtoa() in mode 0 (which is what MySQL uses) does not have this problem. In mode 0 dtoa(d) produces the "shortest string that yields d when read in and rounded to nearest". The "shortest string" means it won't print any "garbage" past the least significant digit. The 'ndigits' argument is just an upper limit. Passing a field's width as ndigits means "no upper limit for significant digits if we have enough space".

Going back to the test case, 2 and 2.0000000000000004 correspond to two different binary numbers, 0x4000000000000000 and 0x4000000000000001 respectively. So it is correct to have distinct decimal representations in this case.

> why is it FLT_DIG for floats and somehting totally different for doubles

Passing FLT_DIG for FLOAT values is a hack I had to implement because there is no float version of dtoa(). dtoa() works with double values, so its results will not always be the shortest string that yields the same _float_ number when read in and rounded to nearest. That's why we intentionally limit float numbers to FLT_DIG digits. The downside is that we print _some_ float numbers not as precisely as we potentially could. But it's not a big problem, since in most cases if one cares about precision, he should be using DOUBLE rather than FLOAT in the first place.

Hope the above answers your quesions. Setting back to "Not a Bug".
[10 Sep 2009 16:10] Ross McFarland
why does the float case use min(width, FLT_DIG) which essentially will always be FLT_DIG then?
[10 Sep 2009 16:55] Alexey Kopytov
That's because in most cases FLOATs are handled as DOUBLEs internally. The only place where we pass MY_GCVT_ARG_FLOAT to my_gcvt() is Field_float::val_str() where we always have a string buffer of sufficient length to print all digits.

It's a separate problem (and also a source of some bugs, e.g. bug #4486). Fixing it is the goal of worklog #3977.
[10 Sep 2009 16:57] Alexey Kopytov
The correct bug number from the previous comment is bug #4485.