Bug #116160 Precision loss of double value in JSON
Submitted: 19 Sep 10:28 Modified: 20 Sep 2:11
Reporter: Xizhe Zhang (OCA)
Status: Verified
Impact on me: None
Category: MySQL Server: JSON
Severity: S3 (Non-critical)
Version: 8.0
OS: Any
Assigned to:
CPU Architecture: Any

[19 Sep 10:28] Xizhe Zhang
Description:
The value of a double-type number in a JSON column changes again and again as it is re-parsed. Some precision loss is expected, but further precision loss after an earlier loss is confusing.

How to repeat:
-- 1. Create the table
CREATE TABLE `test_json` (
  `id` int DEFAULT NULL,
  `content` json DEFAULT NULL
) ENGINE=InnoDB;

-- 2. Insert two rows
mysql> insert into test_json values (1, '[9088544342.689998]');
Query OK, 1 row affected (0.00 sec)

mysql> insert into test_json values (2, '[9088544342.689999]');
Query OK, 1 row affected (0.00 sec)

-- 3. 9088544342.689998 will be changed to 9088544342.689999, and 9088544342.689999 will be changed to 9088544342.69
mysql> select * from test_json;
+------+---------------------+
| id   | content             |
+------+---------------------+
|    1 | [9088544342.689999] |
|    2 | [9088544342.69]     |
+------+---------------------+

-- 4. After dumping this table and loading it back, both values become 9088544342.69
mysql> select * from test_json;
+------+---------------------+
| id   | content             |
+------+---------------------+
|    1 | [9088544342.69]     |
|    2 | [9088544342.69]     |
+------+---------------------+

Suggested fix:
The Json_dom::parse function ultimately calls rapidjson's ParseNumber function. When inserting into a JSON column, would it be possible to verify only that the number is well-formed, without changing the number itself?
[19 Sep 11:22] MySQL Verification Team
Hi Mr. Zhang,

We managed to reproduce your results.

This bug is now verified for versions 8.0 and higher.
[19 Sep 16:47] Jean-François Gagné
IMHO, the problem here is that 9088544342.689999 becomes 9088544342.69 in JSON.

When doing the same test without JSON, 9088544342.689999 stays 9088544342.689999:

mysql [localhost:8030] {msandbox} ((none)) > create table test_jfg.t(id int NOT NULL PRIMARY KEY, d double NOT NULL);
Query OK, 0 rows affected (0.01 sec)

mysql [localhost:8030] {msandbox} ((none)) > insert into test_jfg.t values (1, '9088544342.689998'), (2, '9088544342.689999');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql [localhost:8030] {msandbox} ((none)) > select * from test_jfg.t;
+----+-------------------+
| id | d                 |
+----+-------------------+
|  1 | 9088544342.689999 |
|  2 | 9088544342.689999 |
+----+-------------------+
2 rows in set (0.00 sec)
[20 Sep 2:11] Xizhe Zhang
rapidjson parses a double-type number as follows, taking 9088544342.689998 as an example:
1. Ignore the decimal point and read the digits as the integer d = 9088544342689998.
2. Count the digits after the decimal point (6 here).
3. Compute d = d / 10^6 = (double)9088544342689998 / 1000000.

If you write a C++ program, you can find that
(double)9088544342689998 / 1000000 yields 9088544342.689999, because the division loses precision.

For 9088544342.689999, however, the problem already occurs when it is converted to 9088544342689999 in step 1. The hexadecimal representation of 9088544342689999 is 0x2049FBA11248CF, which is 54 bits wide. This exceeds the 53-bit significand of the double type, so the value is rounded to 0x2049FBA11248D0, which does fit in 53 significant bits; its decimal value is 9088544342690000. After the division in step 3:
d = (double)9088544342690000 / 1000000 = 9088544342.69

In summary, the precision loss in step 3 caused 9088544342.689998 to become 9088544342.689999, while the precision loss in step 1 caused 9088544342.689999 to further become 9088544342.69.
[20 Sep 8:58] Knut Anders Hatlen
See also bug#83954, which looks related.