Bug #93254 MAX treating Numbers as Strings when used in group by and JSON_UNQUOTE
Submitted: 20 Nov 2018 3:33 Modified: 20 Nov 2018 6:18
Reporter: Stanley Leong Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: JSON Severity:S3 (Non-critical)
Version:5.7.24, 8.0.13 OS:Ubuntu
Assigned to: CPU Architecture:Other (x64)
Tags: GROUP BY, INTEGER, JSON_QUOTE, MAX, number

[20 Nov 2018 3:33] Stanley Leong
Description:
When using query like

SELECT data->>'$.group', max(data->'$.number') FROM test group by data->>'$.group';

The result returned from data->'$.number' seems to represent a String instead of a number.

However, the following statement reveals all fields as INTEGER

SELECT json_type(data->'$.number') FROM test;

The MAX aggregator appears to perform aggregation in string's lexicological order instead of number's.

How to repeat:
Prepare test data:

CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `data` json NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

INSERT INTO `test` (`id`,`data`) VALUES (1,'{\"group\": \"a\", \"number\": 5}');
INSERT INTO `test` (`id`,`data`) VALUES (2,'{\"group\": \"a\", \"number\": 10}');
INSERT INTO `test` (`id`,`data`) VALUES (3,'{\"group\": \"b\", \"number\": 50}');
INSERT INTO `test` (`id`,`data`) VALUES (4,'{\"group\": \"b\", \"number\": 1}');

Query test data:

mysql> select * from test;
+----+------------------------------+
| id | data                         |
+----+------------------------------+
|  1 | {"group": "a", "number": 5}  |
|  2 | {"group": "a", "number": 10} |
|  3 | {"group": "b", "number": 50} |
|  4 | {"group": "b", "number": 1}  |
+----+------------------------------+
4 rows in set (0.00 sec)

Verifying the data type returned from data->'$.number' is INTEGER:

mysql> SELECT json_type(data->'$.number') FROM test;
+-----------------------------+
| json_type(data->'$.number') |
+-----------------------------+
| INTEGER                     |
| INTEGER                     |
| INTEGER                     |
| INTEGER                     |
+-----------------------------+
4 rows in set (0.00 sec)

Query that manifest the bug:

mysql> SELECT data->>'$.group', max(data->'$.number') FROM test group by data->>'$.group';
+------------------+-----------------------+
| data->>'$.group' | max(data->'$.number') |
+------------------+-----------------------+
| a                | 5                     |
| b                | 50                    |
+------------------+-----------------------+
2 rows in set (0.00 sec)

Expected Result:
+------------------+-----------------------+
| data->>'$.group' | max(data->'$.number') |
+------------------+-----------------------+
| a                | 10                    |
| b                | 50                    |
+------------------+-----------------------+

Version I'm using:

mysql> select @@version;
+-------------------------+
| @@version               |
+-------------------------+
| 5.7.24-0ubuntu0.18.04.1 |
+-------------------------+
1 row in set (0.00 sec)

Suggested fix:
Fix the aggregator to aggregate in number's lexicological order for json_extract output of type INTEGER.
[20 Nov 2018 3:47] Stanley Leong
To prove that MySQL behaviour is inconsistent,

mysql> INSERT INTO `billing`.`test` (`id`, `data`) VALUES ('5', '{\"group\": \"b\", \"number\": 9}');
Query OK, 1 row affected (0.02 sec)

mysql> select * from test;
+----+------------------------------+
| id | data                         |
+----+------------------------------+
|  1 | {"group": "a", "number": 5}  |
|  2 | {"group": "a", "number": 10} |
|  3 | {"group": "b", "number": 50} |
|  4 | {"group": "b", "number": 1}  |
|  5 | {"group": "b", "number": 9}  |
+----+------------------------------+
5 rows in set (0.00 sec)

mysql> SELECT max(data->'$.number') FROM test;
+-----------------------+
| max(data->'$.number') |
+-----------------------+
| 50                    |
+-----------------------+
1 row in set (0.00 sec)

Without group by, the aggregator is performing in number's lexicological order correctly.
[20 Nov 2018 6:18] Umesh Shastry
Hello Stanley,

Thank you for the report and test case.

regards,
Umesh
[4 Jan 12:29] Knut Anders Hatlen
In the cases where the aggregation is performed using filesort on disk-based temporary tables, such as when SQL_BIG_RESULT or WITH ROLLUP is specified, it seems to be able to compute the maximum value for group "a" correctly:

mysql> SELECT SQL_BIG_RESULT data->>'$.group', max(data->'$.number') FROM test group by data->>'$.group';
+------------------+-----------------------+
| data->>'$.group' | max(data->'$.number') |
+------------------+-----------------------+
| a                | 10                    |
| b                | 50                    |
+------------------+-----------------------+
2 rows in set (0,00 sec)

mysql> SELECT data->>'$.group', max(data->'$.number') FROM test group by data->>'$.group' WITH ROLLUP;
+------------------+-----------------------+
| data->>'$.group' | max(data->'$.number') |
+------------------+-----------------------+
| a                | 10                    |
| b                | 50                    |
| NULL             | 50                    |
+------------------+-----------------------+
3 rows in set (0,00 sec)