Bug #98035 ORDER BY does not work for a column with a string function and utf8mb4_0900_ai_c
Submitted: 20 Dec 2019 5:25 Modified: 22 Jan 2020 3:49
Reporter: Yuichi SUGIYAMA Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.18, 8.0.11 OS:Any
Assigned to: CPU Architecture:Any

[20 Dec 2019 5:25] Yuichi SUGIYAMA
Description:
The following ORDER BY for the utf8mb4_0900_ai_ci column returns in correct order.

mysql> SELECT c1, SUBSTR(c1, 1) AS dummy FROM t1 ORDER BY dummy ASC;
+------+-------+
| c1   | dummy |
+------+-------+
|      |       |
| NULL | NULL  |
|      |       |
| NULL | NULL  |
+------+-------+
4 rows in set (0.00 sec)

The NULL values should be presented first as same as the result of ORDER BY c1.
> > When doing an ORDER BY, NULL values are presented first if you do ORDER BY ... ASC and last if you do ORDER BY ... DESC.
> https://dev.mysql.com/doc/refman/8.0/en/working-with-null.html

mysql> SELECT c1, SUBSTR(c1, 1) AS dummy FROM t1 ORDER BY c1 ASC;
+------+-------+
| c1   | dummy |
+------+-------+
| NULL | NULL  |
| NULL | NULL  |
|      |       |
|      |       |
+------+-------+
4 rows in set (0.00 sec)

Other string functions also have same behavior.
e.g. 
SELECT c1, CONCAT(c1, '') AS dummy FROM t1 ORDER BY dummy ASC;
SELECT c1, LTRIM(c1) AS dummy FROM t1 ORDER BY dummy ASC;

And ORDER BY with DESC also does not work.
mysql> SELECT c1, SUBSTR(c1, 1) AS dummy FROM t1 ORDER BY dummy DESC;
+------+-------+
| c1   | dummy |
+------+-------+
|      |       |
| NULL | NULL  |
|      |       |
| NULL | NULL  |
+------+-------+

How to repeat:
mysql> CREATE TABLE t1 (c1 char(100) COLLATE 'utf8mb4_0900_ai_ci');
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO t1 VALUES (''), (NULL), (''), (NULL);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT c1, SUBSTR(c1, 1) AS dummy FROM t1 ORDER BY c1 ASC;
+------+-------+
| c1   | dummy |
+------+-------+
| NULL | NULL  |
| NULL | NULL  |
|      |       |
|      |       |
+------+-------+
4 rows in set (0.00 sec)

mysql> SELECT c1, SUBSTR(c1, 1) AS dummy FROM t1 ORDER BY dummy ASC;
+------+-------+
| c1   | dummy |
+------+-------+
|      |       |
| NULL | NULL  |
|      |       |
| NULL | NULL  |
+------+-------+
4 rows in set (0.00 sec)

mysql> SELECT c1, CONCAT(c1, '') AS dummy FROM t1 ORDER BY dummy ASC;
+------+-------+
| c1   | dummy |
+------+-------+
|      |       |
| NULL | NULL  |
|      |       |
| NULL | NULL  |
+------+-------+
4 rows in set (0.00 sec)

mysql> SELECT c1, LTRIM(c1) AS dummy FROM t1 ORDER BY dummy ASC;
+------+-------+
| c1   | dummy |
+------+-------+
|      |       |
| NULL | NULL  |
|      |       |
| NULL | NULL  |
+------+-------+
4 rows in set (0.00 sec)

mysql> SELECT c1, SUBSTR(c1, 1) AS dummy FROM t1 ORDER BY dummy DESC;
+------+-------+
| c1   | dummy |
+------+-------+
|      |       |
| NULL | NULL  |
|      |       |
| NULL | NULL  |
+------+-------+
4 rows in set (0.00 sec)

Suggested fix:
The ORDER BY SUBSTR(c1, 1) ASC returns same result as ORDER BY c1 ASC.
[20 Dec 2019 6:07] MySQL Verification Team
Hello SUGIYAMA-San,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[22 Jan 2020 3:49] Jon Stephens
Documented fix as follows in the MySQL 8.0.20 changelog:

    The NULL indicator was not properly written for items used as
    variable-length keys, such that all such items were assumed to
    be not NULL, which was considered equal to the empty string when
    using certain collations. One visible effect of this issue was
    that ordering by an expression using a nullable string was
    sometimes not performed correctly. An example of such a query, 
    where c1 contained both NULL and empty string values, is shown 
    here:

    SELECT c1, SUBSTR(c1,1) AS c2 FROM t ORDER BY c2;

Closed.