Bug #116160 Precision loss of double value in JSON
Submitted: 19 Sep 10:28 Modified: 20 Sep 2:11
Reporter: Xizhe Zhang (OCA)
Status: Verified
Impact on me: None
Category: MySQL Server: JSON
Severity: S3 (Non-critical)
Version: 8.0
OS: Any
Assigned to:
CPU Architecture: Any

[19 Sep 10:28] Xizhe Zhang
Description:
The value of a double-type number in a JSON column changes again and again as it is re-parsed. Some precision loss is expected, but further precision loss after an earlier loss is confusing.

How to repeat:
-- 1. Create the table
CREATE TABLE `test_json` (
  `id` int DEFAULT NULL,
  `content` json DEFAULT NULL
) ENGINE=InnoDB;

-- 2. Insert two rows
mysql> insert into test_json values (1, '[9088544342.689998]');
Query OK, 1 row affected (0.00 sec)

mysql> insert into test_json values (2, '[9088544342.689999]');
Query OK, 1 row affected (0.00 sec)

-- 3. 9088544342.689998 will be changed to 9088544342.689999, and 9088544342.689999 will be changed to 9088544342.69
mysql> select * from test_json;
+------+---------------------+
| id   | content             |
+------+---------------------+
|    1 | [9088544342.689999] |
|    2 | [9088544342.69]     |
+------+---------------------+

-- 4. After dumping this table and loading it back, both values become 9088544342.69
mysql> select * from test_json;
+------+---------------------+
| id   | content             |
+------+---------------------+
|    1 | [9088544342.69]     |
|    2 | [9088544342.69]     |
+------+---------------------+

Suggested fix:
The Json_dom::parse function ultimately calls rapidjson's ParseNumber function. When inserting into a JSON column, would it be possible to verify only that the number is well-formed, without changing the number itself?
[19 Sep 11:22] MySQL Verification Team
Hi Mr. Zhang,

We managed to reproduce your results.

This bug is now verified for versions 8.0 and higher.
[19 Sep 16:47] Jean-François Gagné
IMHO, the problem here is that 9088544342.689999 becomes 9088544342.69 in JSON.

When doing the same test without JSON, 9088544342.689999 stays 9088544342.689999:

mysql [localhost:8030] {msandbox} ((none)) > create table test_jfg.t(id int NOT NULL PRIMARY KEY, d double NOT NULL);
Query OK, 0 rows affected (0.01 sec)

mysql [localhost:8030] {msandbox} ((none)) > insert into test_jfg.t values (1, '9088544342.689998'), (2, '9088544342.689999');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql [localhost:8030] {msandbox} ((none)) > select * from test_jfg.t;
+----+-------------------+
| id | d                 |
+----+-------------------+
|  1 | 9088544342.689999 |
|  2 | 9088544342.689999 |
+----+-------------------+
2 rows in set (0.00 sec)
[20 Sep 2:11] Xizhe Zhang
rapidjson parses a double-type number as follows, taking 9088544342.689998 as an example:
1. Ignore the decimal point and read the digits as the integer d = 9088544342689998.
2. Count the digits after the decimal point (6 here).
3. Compute d = d / 10^6 = (double)9088544342689998 / 1000000.

If you write a C++ program, you can find that
(double)9088544342689998 / 1000000 yields 9088544342.689999, because the division loses precision.

For 9088544342.689999, however, the problem already occurs when it is converted to 9088544342689999 in step 1. The hexadecimal representation of 9088544342689999 is 0x2049FBA11248CF, which is 54 bits wide. This exceeds the 53-bit significand of the double type, so the value is rounded to 0x2049FBA11248D0, which does fit in 53 significant bits; its decimal value is 9088544342690000. After the division in step 3:
d = (double)9088544342690000 / 1000000 = 9088544342.69

In summary, the precision loss in step 3 caused 9088544342.689998 to become 9088544342.689999, while the precision loss in step 1 caused 9088544342.689999 to further become 9088544342.69.
[20 Sep 8:58] Knut Anders Hatlen
See also bug#83954, which looks related.