MySQL Bugs: #94672: Optimizer trace uses imprecise routines to print floating point numbers

Bug #94672	Optimizer trace uses imprecise routines to print floating point numbers
Submitted:	15 Mar 2019 12:05	Modified:	4 Nov 2020 11:43
Reporter:	Alexey Kopytov	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.6, 5.7, 8.0	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
When the optimizer trace needs to print a floating point number
(e.g. display a plan cost), it uses the following code in
sql/opt_trace.cc:
---
Opt_trace_struct& Opt_trace_struct::do_add(const char *key, const Cost_estimate &value)
{
  char buf[32];                         // 32 is enough for digits of a double
  my_snprintf(buf, sizeof(buf), "%g", value.total_cost());
  DBUG_PRINT("opt", ("%s: %s", key, buf));
  stmt->add(key, buf, strlen(buf), false, false);
  return *this;
}
---

(there's a similar code in Opt_trace_struct::do_add(const char *key,
double val))

Using sprintf("%g") to convert doubles to decimal representation is
imprecise and may result in the same representation for different binary
numbers.  Here's a fragment of a trace demonstrating the problem. One
could guess a covering index scan was chosen over a range scan due to a
lower cost, but it is not obvious at all from just looking at the cost
values (they are both "2.06e6"):

---

                  "best_covering_index_scan": {
                    "index": "k_1",
                    "cost": 2.06e6,
                    "chosen": true
                  },
                  "setup_range_conditions": [
                  ],
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  },
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "PRIMARY",
                        "ranges": [
                          "1798955 < id"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": true,
                        "rows": 8233926,
                        "cost": 2.06e6,
                        "chosen": false,
                        "cause": "cost"
                      }
                    ],
---

The server code has a better routine to perform such conversion, it is
called my_gcvt() and is located in strings/dtoa.c. Here's how the same
trace looks like after replacing my_snprintf() with my_cvt(). It is, in
my opinion, more self-explaining than the previous one:

---
                 "best_covering_index_scan": {
                    "index": "k_1",
                    "cost": 2058480.5263157894,
                    "chosen": true
                  },
                  "setup_range_conditions": [
                  ],
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  },
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "PRIMARY",
                        "ranges": [
                          "1798955 < id"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": true,
                        "rows": 8233926,
                        "cost": 2058482.5100000002,
                        "chosen": false,
                        "cause": "cost"
                      }
                    ],
---

How to repeat:
Code analysis.

Suggested fix:
Use my_gcvt() instead of my_snprintf() to convert floating point numbers
to strings.

Hi Kaamos,

How are you ???

Thank you for your bug report.

I fully agree with your analysis.

Verified as reported.

Have a nice day.

This has been fix in 8.0 with the following commit: https://github.com/mysql/mysql-server/commit/ab4919602ffe5543b4ebbab4d79c008c023be1c6

Why did you use FLT_DIG instead of DBL_DIG to print a double number?

--- a/sql/opt_trace.cc
+++ b/sql/opt_trace.cc
@@ -29,7 +29,6 @@
 #include "my_pointer_arithmetic.h"
 #include "my_sys.h"
 #include "mysql/psi/psi_base.h"
-#include "mysql/service_my_snprintf.h"
 #include "mysqld_error.h"
 #include "prealloced_array.h"
 #include "sql/current_thd.h"
@@ -353,7 +352,7 @@ Opt_trace_struct& Opt_trace_struct::do_add(const char *key, double val)
 {
   DBUG_ASSERT(started);
   char buf[32];                         // 32 is enough for digits of a double
-  my_snprintf(buf, sizeof(buf), "%g", val);
+  my_gcvt(val, MY_GCVT_ARG_DOUBLE, FLT_DIG, buf, nullptr);
   DBUG_PRINT("opt", ("%s: %s", key, buf));
   stmt->add(key, buf, strlen(buf), false, false);
   return *this;
@@ -391,7 +390,7 @@ Opt_trace_struct& Opt_trace_struct::do_add(const char *key, Item *item)
 Opt_trace_struct& Opt_trace_struct::do_add(const char *key, const Cost_estimate &value)
 {
   char buf[32];                         // 32 is enough for digits of a double
-  my_snprintf(buf, sizeof(buf), "%g", value.total_cost());
+  my_gcvt(value.total_cost(), MY_GCVT_ARG_DOUBLE, FLT_DIG, buf, nullptr);
   DBUG_PRINT("opt", ("%s: %s", key, buf));
   stmt->add(key, buf, strlen(buf), false, false);
   return *this;
@@ -663,7 +662,7 @@ void Opt_trace_stmt::next_line()

I was merely suggesting to use my_gcvt() instead of my_snprintf().

But the commit that implemented that change looks wrong to me. Neither FLT_DIG nor DBL_DIG make any sense as a field width, simply because both constants define limits in terms of decimal digits, and the `width` argument to my_gcvt() defines the field width in characters, which may include sign, dot, and exponent.

It's not the only wrong thing about the commit mentioned here and in bug#101430. I'll report them separately.

Reported a follow-up bug#101457.

Thanks a lot, Kaamos.

See also BUG#101457.

Thanks a lot, Jon ......