MySQL Bugs: #106312: Huge value in GROUP BY clause cause unexpected output

Bug #106312	Huge value in GROUP BY clause cause unexpected output
Submitted:	27 Jan 2022 18:14	Modified:	28 Jan 2022 13:22
Reporter:	Yu Liang	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.7	OS:	Any (Ubuntu 20.04.3 LTS)
Assigned to:		CPU Architecture:	Any (Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz)
Tags:	GROUP BY

Description:
mysql> CREATE TABLE v0(c1 INT);
Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO v0 ( c1 ) VALUES ( 1 );
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO v0 ( c1 ) VALUES ( 2 );
Query OK, 1 row affected (0.00 sec)

mysql> SELECT c1 FROM v0 GROUP BY c1;
+------+
| c1   |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)

mysql> SELECT c1 FROM v0 GROUP BY 1;
+------+
| c1   |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)

-- Expected, using the first column in the table v0.

mysql> SELECT c1 FROM v0 GROUP BY 100;
ERROR 1054 (42S22): Unknown column '100' in 'group statement'

-- Expected

mysql> SELECT c1 FROM v0 GROUP BY 2980934862161213708244036181968970267084642523502984526032083243547504727735692194184722708345683;   -- Random big number. 
+------+
| c1   |
+------+
|    1 |
+------+
1 row in set, 1 warning (0.00 sec)

-- Expects error, gets row output instead

----------------------------------------------------------

In the query above, we create a table v0 with simple integer column c1. We insert two values: 1 and 2 into column c1. We then check the values inside c1 using the SELECT statements. 

We know that the common usage of GROUP BY clause is to pass column names into it. And by the SQL standard, we can also pass the id number of the table column to specify the columns that we want grouping. But if the id number is bigger than the total column number inside the table, an error would be returned. These are expected behaviors and they are correctly demonstrated by the sample query above. However, if we pass in a random large value into the GROUP BY clause, the error message goes away and the query returns just one row of the table. This is unexpected to us. IMHO, even given a huge number,  the system should return the same error message that warn users the implied column id not exists in 'group statement'. 

We cannot find any documentations that explain this behaviors of GROUP BY clause when pass in large values. Looking forward to more details about what is going on here. 

Hope the report helps making MySQL better. Enjoy coding and have a nice day!

How to repeat:
In the mysql server version 8.0.27, use the following query commands: 

mysql> CREATE TABLE v0(c1 INT);
Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO v0 ( c1 ) VALUES ( 1 );
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO v0 ( c1 ) VALUES ( 2 );
Query OK, 1 row affected (0.00 sec)

mysql> SELECT c1 FROM v0 GROUP BY c1;
+------+
| c1   |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)

mysql> SELECT c1 FROM v0 GROUP BY 1;
+------+
| c1   |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)

-- Expected, using the first column in the table v0.

mysql> SELECT c1 FROM v0 GROUP BY 100;
ERROR 1054 (42S22): Unknown column '100' in 'group statement'

-- Expected

mysql> SELECT c1 FROM v0 GROUP BY 2980934862161213708244036181968970267084642523502984526032083243547504727735692194184722708345683;   -- Random big number. 
+------+
| c1   |
+------+
|    1 |
+------+
1 row in set, 1 warning (0.00 sec)

-- Expects error, gets row output instead

Suggested fix:
mysql> CREATE TABLE v0(c1 INT);
mysql> INSERT INTO v0 ( c1 ) VALUES ( 1 );
mysql> INSERT INTO v0 ( c1 ) VALUES ( 2 );
mysql> SELECT c1 FROM v0 GROUP BY 2980934862161213708244036181968970267084642523502984526032083243547504727735692194184722708345683;   -- Random big number. 

-- Should return ERROR 1054 (42S22): Unknown column 'xxx' in 'group statement'

Hi Mr. Liang,

Thank you for your bug report.

We were able to repeat it and hence we do considered it a bug. However, this is a totally minor bug, since nobody uses such huge integers for column numbers. Hence, it is a low priority bug.

Verified as reported.

BUG#106312 is a duplicate of this bug.

Not a documentation issue.

An integer used with ORDER BY or GROUP is interpreted as the position of a column in the result set.

Any such value not matching the position of a column in the result should raise error 1054 (ER_BAD_FIELD_ERROR).

Permitting the use of a value that has no match and not rejecting the statement as stated above must be regarded as a defect to be corrected.

Referring to Development for handling.

Posted by developer:
 
The integer checked as a column indicator can overflow. In the code,
it is cast as an unsigned 32 bits integer. Giving max unsigned int32 (4,294,967,295) + 2, i.e.

   SELECT c1 FROM v0 GROUP BY 4294967297;

works the same as giving

   SELECT c1 FROM v0 GROUP BY 1;

A non-integer expression is not treated as a position indicated. If the integer given is large enough,
it will be represented internally as a DECIMAL and handled accordingly, i.e. not as a position indication at all.
E.g.

   > SELECT count(*) FROM v0 GROUP BY 18446744073709551615;
   ERROR 1054 (42S22): Unknown column '18446744073709551615' in 'group statement'

but:

   > SELECT count(*) FROM v0 GROUP BY 18446744073709551616;
   +----------+
   | count(*) |
   +----------+
   |        2 |
   +----------+

since 18446744073709551616 == max(unsigned 64 bits int) + 1, i.e. it can't be represented
as in integer internally.

This behavior is bizarre.

Thank you, Jon.