MySQL Bugs: #36829: Decimal to double conversion is inaccurate

Bug #36829	Decimal to double conversion is inaccurate
Submitted:	20 May 2008 21:04	Modified:	16 Apr 2010 17:27
Reporter:	Mark Callaghan	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: General	Severity:	S3 (Non-critical)
Version:	5.0.37, 5.0.60, 5.1.41-bzr	OS:	Linux
Assigned to:	Alexey Botchkov	CPU Architecture:	Any
Tags:	conversion, decimal, double, regression

Description:
This is a regression from MySQL 4.0 to MySQL 5.0. The problem occurs because 4.0 used strtod and atof, and 5.0 provides its own implementation via my_strtod. A value that strtod/atof are able to parse correctly on most platforms is  8.8279417262832873e-139

How to repeat:
main() {
    double d = 8.8279417262832873e-139;
    double d2 = atof("8.8279417262832873e-139");
    printf("%.16e\n", d);
    printf("%.16e\n", d2);
}

$ ./d
8.8279417262832873e-139
8.8279417262832873e-139

Output from instrumented version of mysqld that displays the input (string) and output (double) from StrToDouble for the conversion.
StrToDouble converts 8.8279417262832873e-139 to 8.8279417262832859e-139

SQL for which the error occurs:
create table d (d double);
insert into d values ("8.8279417262832873e-139");

I ran the repro case using gcc 3.4, Linux 2.6 and an x86_64 server.

Suggested fix:
Use strtod/atof from glibc on the platforms for which they work. System libraries from most of the Unix vendors and Linux are ieee754 compliant. I doubt that my_strtod is in MySQL 5.0.

I added a SQL function, ieee754_to_string, that converts double -> decimal with 17 digits of precision. This is used with mysqldump to dump tables that have columns of type Double. When the data is reloaded, the decimal -> double conversion should produce the same value as prior to the dump because double -> decimal -> double is lossless for most values when the double -> decimal conversion generates 17 digits of precision. However, that assumption has been broken by the implementation of my_strtod and now my data on reload does not match the data prior to the dump.

And while I am at it, why does my_atof pass in a bogus pointer to the end of the string when calling my_strtod. The interface for my_strtod is that the second arg is an IN/OUT param and specifies the end of the input on entry to my_strtod. my_atof makes up a bogus end pointer -- end + 64kb. I assume this is to save the  cost of calling strlen, but it now prevents any reimplementation of my_strtod from trusting the value on input. And my_atof better hope that the arithmetic doesn't overflow.

double my_atof(const char *nptr)
{
  int error;
  const char *end= nptr+65535;                  /* Should be enough */
  return (my_strtod(nptr, (char**) &end, &error));
}

Thank you for a problem report.

my_strntod_8bit also has code that uses pointer + 64k to guess at the end of the string.

The problem seems to be one of precision when converting the number to a string as the text protocol uses the "%.14g" format for conversion.

Discussing Alexey Kopytov, for the case with tables, the problem seems to lie within Field_double::val_str, which uses the "%-*.*g" when converting the double to string. The problem is that by default it uses a DBL_DIG precision, but this has been fixed in 6.0 by using dtoa which "automagically" chooses a precision.

This is a duplicate of Bug#21497.

Also related to WL#2934

The code that you mention in Field_double::val_str has not changed from 4.0.26 to 5.0.37:

  uint to_length=max(field_length, DOUBLE_TO_STRING_CONVERSION_BUFFER_SIZE);
  val_buffer->alloc(to_length);
  char *to=(char*) val_buffer->ptr();

  if (dec >= NOT_FIXED_DEC)
  {
    sprintf(to,"%-*.*g",(int) field_length,DBL_DIG,nr);
    to=strcend(to,' ');
  }

The comments above about StrToDouble refer to code in my branch that consolidates common code in field.cc that calls my_strntod.

I have changed my 5.0.37 branch so that my_strtod calls strtod() and I removed all code from my_strtod. With this change, the behavior is what I expect and matches MySQL 4.0 behavior.

The problem occurs when the String literal "8.2..." is converted to a double on insert, not when the double is converted to a string during the select.

My investigation was done on 5.1. Out of curiosity, I took your example and compiled it using my my_atof/my_atof:

int main()
{
  const char str[]= "8.8279417262832873e-139";
  double d = 8.8279417262832873e-139;
  double d2 = my_atof(str);
  printf("%s\n", str);
  printf("%.16e\n", d);
  printf("%.16e\n", d2);
  return 0;
}

davi@skynet:~/mysql-5.0-bugteam/strings$ ./str
8.8279417262832873e-139
8.8279417262832873e-139
8.8279417262832873e-139

Do you get different results?

Taking a more closely look on my example and mysql's strtod.. yikes, you are right. There is also a problem with strtod and it's subtle, will expand on this later.

Decided no to fix this in 5.0 as changes needed are too significant.
Besides that was fixed in 6.0.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/89435

2940 Alexey Botchkov	2009-11-05
      Bug#36829      Decimal to double conversion is inaccurate
          it's the 'string to double' conversion that isn't accurate here.
          if we use 'long double' instead of the 'double' to accumulate the
          result, the precision isn't lost.
          The 'double to string' conversion still isn't fixed in this case, but
          that's how it was in 4.0
      
      per-file comments:
        strings/strtod.c
      Bug#36829      Decimal to double conversion is inaccurate
         long double used to accumulate the result

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/94734

2940 Alexey Botchkov	2009-12-16
      Bug#36829      Decimal to double conversion is inaccurate
          it's the 'string to double' conversion that isn't accurate here.
          if we use 'long double' instead of the 'double' to accumulate the
          result, the precision isn't lost.
          The 'double to string' conversion still isn't fixed in this case, but
          that's how it was in 4.0
      
      per-file comments:
        strings/strtod.c
      Bug#36829      Decimal to double conversion is inaccurate
         long double used to accumulate the result

'long double' is non-standard and platform-specific, we should not use it. Even on x86, 'long double' is a synonym for 'double' on Windows. 

If the patch for this bug had a test case it would immediately reveal that after the patch the bug is still present on Windows.

I think the proper fix is to backport the dtoa's version of strtod() and get rid of our own one. Replacing the code doing string-to-number conversion is much less intrusive and less dangerous than changing string-to-number conversion. IIRC, in the dtoa integration patch, almost all changes in the test suite were caused by the difference in number-to-string conversions, only a few test cases were affected by the different string-to-number conversion routine.

Correction: the sentence in the previous comment should be "Replacing the code doing string-to-number conversion is much less intrusive and
less dangerous than changing number-to-number conversion".

fixed by the backported patch for WL#2934

This is part of the inclusion of dtoa in 5.5.3. No separate changelog entry.