Bug #93254 MAX treating Numbers as Strings when used in group by and JSON_UNQUOTE
Submitted: 20 Nov 2018 3:33 Modified: 17 Apr 22:22
Reporter: Stanley Leong Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: JSON Severity:S3 (Non-critical)
Version:5.7.24, 8.0.13 OS:Ubuntu
Assigned to: CPU Architecture:Other (x64)
Tags: GROUP BY, INTEGER, JSON_QUOTE, MAX, number

[20 Nov 2018 3:33] Stanley Leong
Description:
When using query like

SELECT data->>'$.group', max(data->'$.number') FROM test group by data->>'$.group';

The result returned from data->'$.number' seems to represent a String instead of a number.

However, the following statement reveals all fields as INTEGER

SELECT json_type(data->'$.number') FROM test;

The MAX aggregator appears to perform aggregation in string's lexicological order instead of number's.

How to repeat:
Prepare test data:

CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `data` json NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

INSERT INTO `test` (`id`,`data`) VALUES (1,'{\"group\": \"a\", \"number\": 5}');
INSERT INTO `test` (`id`,`data`) VALUES (2,'{\"group\": \"a\", \"number\": 10}');
INSERT INTO `test` (`id`,`data`) VALUES (3,'{\"group\": \"b\", \"number\": 50}');
INSERT INTO `test` (`id`,`data`) VALUES (4,'{\"group\": \"b\", \"number\": 1}');

Query test data:

mysql> select * from test;
+----+------------------------------+
| id | data                         |
+----+------------------------------+
|  1 | {"group": "a", "number": 5}  |
|  2 | {"group": "a", "number": 10} |
|  3 | {"group": "b", "number": 50} |
|  4 | {"group": "b", "number": 1}  |
+----+------------------------------+
4 rows in set (0.00 sec)

Verifying the data type returned from data->'$.number' is INTEGER:

mysql> SELECT json_type(data->'$.number') FROM test;
+-----------------------------+
| json_type(data->'$.number') |
+-----------------------------+
| INTEGER                     |
| INTEGER                     |
| INTEGER                     |
| INTEGER                     |
+-----------------------------+
4 rows in set (0.00 sec)

Query that manifest the bug:

mysql> SELECT data->>'$.group', max(data->'$.number') FROM test group by data->>'$.group';
+------------------+-----------------------+
| data->>'$.group' | max(data->'$.number') |
+------------------+-----------------------+
| a                | 5                     |
| b                | 50                    |
+------------------+-----------------------+
2 rows in set (0.00 sec)

Expected Result:
+------------------+-----------------------+
| data->>'$.group' | max(data->'$.number') |
+------------------+-----------------------+
| a                | 10                    |
| b                | 50                    |
+------------------+-----------------------+

Version I'm using:

mysql> select @@version;
+-------------------------+
| @@version               |
+-------------------------+
| 5.7.24-0ubuntu0.18.04.1 |
+-------------------------+
1 row in set (0.00 sec)

Suggested fix:
Fix the aggregator to aggregate in number's lexicological order for json_extract output of type INTEGER.
[20 Nov 2018 3:47] Stanley Leong
To prove that MySQL behaviour is inconsistent,

mysql> INSERT INTO `billing`.`test` (`id`, `data`) VALUES ('5', '{\"group\": \"b\", \"number\": 9}');
Query OK, 1 row affected (0.02 sec)

mysql> select * from test;
+----+------------------------------+
| id | data                         |
+----+------------------------------+
|  1 | {"group": "a", "number": 5}  |
|  2 | {"group": "a", "number": 10} |
|  3 | {"group": "b", "number": 50} |
|  4 | {"group": "b", "number": 1}  |
|  5 | {"group": "b", "number": 9}  |
+----+------------------------------+
5 rows in set (0.00 sec)

mysql> SELECT max(data->'$.number') FROM test;
+-----------------------+
| max(data->'$.number') |
+-----------------------+
| 50                    |
+-----------------------+
1 row in set (0.00 sec)

Without group by, the aggregator is performing in number's lexicological order correctly.
[20 Nov 2018 6:18] Umesh Shastry
Hello Stanley,

Thank you for the report and test case.

regards,
Umesh
[4 Jan 12:29] Knut Anders Hatlen
In the cases where the aggregation is performed using filesort on disk-based temporary tables, such as when SQL_BIG_RESULT or WITH ROLLUP is specified, it seems to be able to compute the maximum value for group "a" correctly:

mysql> SELECT SQL_BIG_RESULT data->>'$.group', max(data->'$.number') FROM test group by data->>'$.group';
+------------------+-----------------------+
| data->>'$.group' | max(data->'$.number') |
+------------------+-----------------------+
| a                | 10                    |
| b                | 50                    |
+------------------+-----------------------+
2 rows in set (0,00 sec)

mysql> SELECT data->>'$.group', max(data->'$.number') FROM test group by data->>'$.group' WITH ROLLUP;
+------------------+-----------------------+
| data->>'$.group' | max(data->'$.number') |
+------------------+-----------------------+
| a                | 10                    |
| b                | 50                    |
| NULL             | 50                    |
+------------------+-----------------------+
3 rows in set (0,00 sec)
[17 Apr 22:22] Jon Stephens
Documented fix in the MySQL 8.0.17 changelog as follows:

    MAX() and MIN() used on expressions returning JSON sometimes
    compared these values as strings rather than JSON values, which
    caused unexpected results; this was especially evident when the
    JSON values were numbers.

    This was due to the fact that GROUP BY when using temporary
    tables with indexes did not compare JSON values correctly.

Closed.