MySQL Bugs: #112904: Precision lost with decimals in JSON text that is IEEE 754 64-bit float

Bug #112904	Precision lost with decimals in JSON text that is IEEE 754 64-bit float
Submitted:	31 Oct 2023 17:07	Modified:	2 Nov 2023 11:58
Reporter:	Kaiwang CHen (OCA)	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: JSON	Severity:	S3 (Non-critical)
Version:	5.7, 8.0	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
An IEEE 754 float may be set into a JSON value. However, when a JSON text with such a value is sent to MySQL, it may be parsed to be another number that differs at the last digit.

Although float by definition is approximate, we would still expect any code-point number to be stable. Consider such a case: 

1) The application writes to Instance A an IEEE 754 float to a json field by JSON_SET(). It will be a decimal.
2) The user generates a dump file which contains such a record.
3) He loads the dump file into Instance B.
4) He verifies the table in A and B with some checksum tools. The tables are considered not same without peeking into JSON values

This problem is because MySQL invokes the RapidJSON SAX parser with default configuration (kParseDefaultFlags) which does not preserve full precision, while RapidJSON does have a full precision flag that could be explored to workaround the problem. See http://rapidjson.org/md_doc_internals.html#ParsingDouble.

Performance with above configurations are reported be significant by benchmark. See https://github.com/miloyip/nativejson-benchmark. However, in practice the difference might be acceptable. A one-million executions test showed average overhead to be 0.7us comparing to 46.76us with default flag.

  mysqlslap --query="select cast('{\"x\":1.4334335999999999}' as json)" -I 1000000

  select exec_count, total_latency, max_latency, avg_latency from sys.statement_analysis;

Note that PostgreSQL uses numeric rather than integers and double to store JSON numbers. However, my_decimal in MySQL is 72 bytes which is much bigger than double. Storing decimal into binary json is not so economic.

How to repeat:
create table t1 (doc json);
insert t1 values ('{"x": 1}');
update t1 set doc = json_set(doc, "$.x", -0.00000000000005684341886080803);
 -- {"x": -0.00000000000005684341886080803}
select * from t1;
-- checksum A: 4058023359
select crc32(doc) from t1;

create table t2 (doc json);
insert t2 values ('{"x": -0.00000000000005684341886080803}');
-- checksum B: 3198227320
select crc32(doc) from t2;

Suggested fix:
Consider making IEEE 754 code point numbers stable across instances.

diff --git a/sql-common/json_dom.cc b/sql-common/json_dom.cc
index 1f807f55009d..63297f2d990a 100644
--- a/sql-common/json_dom.cc
+++ b/sql-common/json_dom.cc
@@ -564,7 +564,15 @@ Json_dom_ptr Json_dom::parse(const char *text, size_t length,
   Rapid_json_handler handler(depth_handler);
   rapidjson::MemoryStream ss(text, length);
   rapidjson::Reader reader;
-  bool success = reader.Parse<rapidjson::kParseDefaultFlags>(ss, handler);
+
+  bool success;
+#ifdef MYSQL_SERVER
+  THD *thd = current_thd;
+  if (thd && thd->variables.json_full_precision)
+    success = reader.Parse<rapidjson::kParseFullPrecisionFlag>(ss, handler);
+  else
+#endif
+    success = reader.Parse<rapidjson::kParseDefaultFlags>(ss, handler);
 
   if (success) return handler.get_built_doc();
 
diff --git a/sql-common/json_syntax_check.cc b/sql-common/json_syntax_check.cc
index dec238285f6b..55b2e47f787d 100644
--- a/sql-common/json_syntax_check.cc
+++ b/sql-common/json_syntax_check.cc
@@ -35,6 +35,11 @@
 #include "my_sys.h"
 #include "mysqld_error.h"
 
+#ifdef MYSQL_SERVER
+#include "sql/sql_class.h"       // THD
+#include "sql/current_thd.h"  // current_thd
+#endif
+
 bool Syntax_check_handler::StartObject() {
   m_too_deep_error_raised = check_json_depth(++m_depth, m_depth_handler);
   return !m_too_deep_error_raised;
@@ -64,7 +69,14 @@ bool is_valid_json_syntax(const char *text, size_t length, size_t *error_offset,
   Syntax_check_handler handler(depth_handler);
   rapidjson::Reader reader;
   rapidjson::MemoryStream ms(text, length);
-  const bool valid = reader.Parse<rapidjson::kParseDefaultFlags>(ms, handler);
+  bool valid;
+#ifdef MYSQL_SERVER
+  THD *thd = current_thd;
+  if (thd && thd->variables.json_full_precision)
+    valid = reader.Parse<rapidjson::kParseFullPrecisionFlag>(ms, handler);
+  else
+#endif
+    valid = reader.Parse<rapidjson::kParseDefaultFlags>(ms, handler);
 
   if (!valid && (error_offset != nullptr || error_message != nullptr)) {
     const std::pair<std::string, size_t> error = get_error_from_reader(reader);
diff --git a/sql/sys_vars.cc b/sql/sys_vars.cc
index 4d85a84e576f..96db17ef770d 100644
--- a/sql/sys_vars.cc
+++ b/sql/sys_vars.cc
@@ -7701,3 +7701,8 @@ static Sys_var_enum Sys_explain_format(
     SESSION_VAR(explain_format), CMD_LINE(OPT_ARG), explain_format_names,
     DEFAULT(static_cast<ulong>(Explain_format_type::TRADITIONAL)),
     NO_MUTEX_GUARD, NOT_IN_BINLOG, ON_CHECK(nullptr), ON_UPDATE(nullptr));
+
+static Sys_var_bool Sys_json_full_precision(
+    "json_full_precision", "full precision",
+    HINT_UPDATEABLE SESSION_VAR(json_full_precision), CMD_LINE(OPT_ARG),
+    DEFAULT(false), NO_MUTEX_GUARD, IN_BINLOG, ON_CHECK(nullptr), ON_UPDATE(nullptr));
diff --git a/sql/system_variables.h b/sql/system_variables.h
index 55a452ca1f5d..0f6acbe62e92 100644
--- a/sql/system_variables.h
+++ b/sql/system_variables.h
@@ -487,6 +487,7 @@ struct System_variables {
     committed or rolled back from another connection.
    */
   bool xa_detach_on_prepare;
+  bool json_full_precision;
 
   /**
     @sa Sys_debug_sensitive_session_string

Hi Mr. Chen,

Thank you for your bug report.

It looks very interesting, but there is something you should try.

Have you tried typecasting the floating point constants and the expression itself as DOUBLE. You could have also used scientific notation.

Also, do note that some of your constants have number of digits that are larger then 15. That is because even DOUBLE can not guarantee the sixteenth digits.

Let us know how did it go with typecasting and decreasing significant digits to 15.

Here are some examples:

-- CAST(str AS JSON)
-- A number in literal-string json is handled by ParseNumber() in the RapidJSON SAX parser

-- Here is released code behavior
-- Converted to json by Json_dom::parse(text,...) with kParseDefaultFlags
-- got: {"x": 1.4334336}
-- (gdb) 0100001101001001011101101000000001010110101011111000000000000000
-- (gdb) x /1gt &d
-- 0x7fffac6f52a0: 0011111111110110111011110101100000010010000011111100110011111101
set json_full_precision = off;
select cast('{\"x\":1.4334335999999999}' as json);

-- Here is full precision in discussion
-- Converted to json by Json_dom::parse(text,...) with kParseFullPrecisionFlag
-- got: {"x": 1.4334335999999999}
-- (gdb) x /1gt &d
-- 0x7fffac6f5298: 0011111111110110111011110101100000010010000011111100110011111100
set json_full_precision = on;
select cast('{\"x\":1.4334335999999999}' as json);

-- JSON_SET(doc, path, value)
-- A literal number is parsed by MySQL Lexer lex_one_token() and converted to target type then to json value

-- The literal value is parsed as decimal (Item_decimal), converted by str2my_decimal()
-- Converted to json by create_scalar<Json_decimal>(..., my_decimal)
-- got: {"x": 1.4334335999999999}
-- (gdb) p *r
-- $5 = {<decimal_t> = {intg = 1, frac = 16, len = 9, sign = false, buf = 0x7fff3401c8ac}, foo1 = 123, buffer = {1, 433433599, 999999900, 
--    -1886417009, -1886417009, -1886417009, -1886417009, -1886417009, -1886417009}, foo2 = 123, static test_value = 123}
select json_set('{}', "$.x", 1.4334335999999999);

-- The literal value is parsed as double (Item_float), converted by my_strtod()
-- Converted to json by create_scalar<Json_double>(..., double d)
-- got: {"x": 1.4334335999999999}
-- (gdb) x /1gt &d
-- 0x7fffac6f54d8: 0011111111110110111011110101100000010010000011111100110011111100
select json_set('{}', "$.x", 1.4334335999999999e0);

-- The value is a double converted from literal decimal, by my_decimal2double()
-- Converted to json by create_scalar<Json_double>(..., double d)
-- got: {"x": 1.4334335999999999}
-- (gdb) x /1gt &d
-- 0x7fffac6f54d8: 0011111111110110111011110101100000010010000011111100110011111100
select json_set('{}', "$.x", cast(1.4334335999999999 as double));

Here are more examples:

-- got: {"x": -0.00000000000005684341886080803} 
select json_set('{}', "$.x", -0.00000000000005684341886080803)

-- got: {"x": -0.00000000000005684341886080803}
select json_set('{}', "$.x", -0.00000000000005684341886080803e0);

-- got: {"x": -0.00000000000005684341886080803}
select json_set('{}', "$.x", -cast(0.00000000000005684341886080803 as double));

-- got: {"x": -0.00000000000005684341886080803}
select json_set('{}', "$.x", cast(-0.00000000000005684341886080803 as double));

-- default precision (last digit becomes 4)
-- got: {"x": -0.00000000000005684341886080804}
select cast('{"x": -0.00000000000005684341886080803}' as json);

-- full precision
-- got: {"x": -0.00000000000005684341886080803}
select cast('{"x": -0.00000000000005684341886080803}' as json);

The number literals and conversions:

(A) Literal -0.00000000000005684341886080803

It is parsed as Item_func_neg wrapping Item_decimal. The decimal is obtained by str2my_decimal().

(gdb) p *decimal_value
$62 = {<decimal_t> = {intg = 1, frac = 29, len = 9, sign = false, buf = 0x7fff3401c8e4}, foo1 = 123, buffer = {0, 0, 56843, 418860808, 
    30000000, -1886417009, -1886417009, -1886417009, -1886417009}, foo2 = 123, static test_value = 123}

When the whole expr is casted as double by my_decimal2double(), it is firstly serialized by decimal2string() to “-0.00000000000005684341886080803” then parsed by my_strtod().

(gdb) p *d
$64 = {<decimal_t> = {intg = 1, frac = 29, len = 9, sign = true, buf = 0x7fffac6f535c}, foo1 = 123, buffer = {0, 0, 56843, 418860808, 
    30000000, -1886417009, -1886417009, -1886417009, -1886417009}, foo2 = 123, static test_value = 123}
(gdb) p res
$68 = -5.6843418860808027e-14
(gdb) x /1gt &res
0x7fffac6f5158: 1011110100110000000000000000000000000000000000000000000000000001

(B) Literal -0.00000000000005684341886080803e0

It is parsed as Item_float. The double value is obtained by my_strtod().

(gdb) p res
$69 = 5.6843418860808027e-14
(gdb) x /1gt &res
0x7fffac6f3ab8: 0011110100110000000000000000000000000000000000000000000000000001

The JSON conversions:

(A) Default precision, internal:: StrtodNormalPrecision (), precision lost.

(gdb) p i64
$4 = 5684341886080803
(gdb) p d
$5 = 5684341886080803
(gdb) x /1gt &d
0x7fffac6f52a0: 0100001100110100001100011110000011111010111001101101011100100011
(gdb) n
1720                   if (d > (std::numeric_limits<double>::max)()) {
(gdb) p d
$6 = 5.684341886080804e-14
(gdb)  x /1gt &d
0x7fffac6f52a0: 0011110100110000000000000000000000000000000000000000000000000010

(B) Full precision, internal::StrtodFullPrecision()

(gdb) p d
$7 = 5684341886080803
(gdb) p i64
$8 = 5684341886080803
(gdb) p d
$9 = 5684341886080803
(gdb) x /1gt &d
0x7fffac6f5298: 0100001100110100001100011110000011111010111001101101011100100011
(gdb) n
1720                   if (d > (std::numeric_limits<double>::max)()) {
(gdb) p d
$10 = 5.6843418860808027e-14
(gdb) x /1gt &d
0x7fffac6f5298: 0011110100110000000000000000000000000000000000000000000000000001

Output conversions:

my_gcvt() with arg MY_GCVT_ARG_DOUBLE.

(A) MySQL double

(gdb) p x
$12 = -5.6843418860808027e-14
(gdb) x /1gt &x
0x7fffac6f4a48: 1011110100110000000000000000000000000000000000000000000000000001
(gdb) p buffer
$14 = 0x7fffac6f5960 "-0.", '0' <repeats 13 times>, "5684341886080803"

(B) JSON double, obtained by normal precision

(gdb) p x
$24 = -5.684341886080804e-14
(gdb) x /1gt &x
0x7fffac6f4788: 1011110100110000000000000000000000000000000000000000000000000010
(gdb) p buffer.m_ptr
$26 = 0x7fff34010340 "{\"x\": -0.", '0' <repeats 13 times>, "5684341886080804"

(C) JSON double, obtained by full precision

(gdb) p x
$21 = -5.6843418860808027e-14
(gdb) x /1gt &x
0x7fffac6f4788: 1011110100110000000000000000000000000000000000000000000000000001
(gdb) p buffer.m_ptr
$23 = 0x7fff3400ee70 "{\"x\": -0.", '0' <repeats 13 times>, "5684341886080803"

In short, MySQL double and JSON double are parsed in their own ways, and printed in the same way.
The JSON library with default flag works for most cases, but in certain cases it loses precision. To ensure consistency strictly, MySQL should implement RawNumber() by its own string to number functions rather than taking values from integers and double callbacks.  However, it incurs more overhead, and the full precision flag works fine by observation.

BTW, where is 15 significant digits defined? Is it cemment "double supports only 15 digits" in the code?
Then is it possible to print only significant bits?

Hi Mr. Chen,

Sorry, but your report is not a bug.

Simply, IEEE definition of double-precision floating point is quite clear. It is 16 digits that are supported, but 16th digit can be sometimes rounded. 

Anyway, the conversions are not done in our code, but in the FPU part of the CPU. 

MySQL does not do the conversion by itself. Code is generated that uses FPU's commands.

Not a bug.