Bug #81810 Inconsistent sort order for blob/text between InnoDB and filesort
Submitted: 11 Jun 2016 0:01 Modified: 23 Jun 2016 18:01
Reporter: Manuel Ung Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.7.13, 5.6.31 OS:Any
Assigned to: CPU Architecture:Any

[11 Jun 2016 0:01] Manuel Ung
Description:
Text/blob fields are not handling the sorting of the empty string consistently between InnoDB and filesort.

How to repeat:
Use the following test case:

CREATE TABLE t1 (
  b LONGTEXT CHARACTER SET "latin1" COLLATE "latin1_bin",
  KEY b (b(32))
) ENGINE=innodb;

INSERT INTO t1 (b) VALUES ('a'), (_binary 0x1), (_binary 0x0), ('');

explain select hex(b) from t1 order by b;
select hex(b) from t1 order by b;

drop table t1;

CREATE TABLE t1 (
  b LONGTEXT CHARACTER SET "latin1" COLLATE "latin1_bin",
  PRIMARY KEY b (b(32))
) ENGINE=innodb;

INSERT INTO t1 (b) VALUES ('a'), (_binary 0x1), (_binary 0x0), ('');

explain select hex(b) from t1;
select hex(b) from t1;

drop table t1;

Observe different sort order:

filesort:
hex(b)

00
01
61

InnoDB index order:
hex(b)
00
01

61

Suggested fix:
In Field_blob::make_sort_key, a memset to zero is done for the empty string. 

Instead of memset'ing to zero, you should fill using pad_char from the collation instead.
[11 Jun 2016 4:07] Umesh Shastry
Hello Manuel Ung,

Thank you for the report and test case.
Verified as described with 5.6.31/5.7.13 builds.

Thanks,
Umesh
[23 Jun 2016 18:01] Manuel Ung
Related to this topic, it doesn't really make sense for Field_blob::make_sort_key to always write down the full length of the blob for binary collations. The length should be the minimum of the requested key length, and the blob length.

For example, if we had 3 rows:
a
abcd
abcde

and we request a key of size 4, we currently write:
'a'  0  1 0
'a' 'b' 4 0
'a' 'b' 5 0

It makes more sense to have
'a'  0  1 0
'a' 'b' 2 0
'a' 'b' 2 0

This issue has no effect on correctness, but you might see some order when you expect unordered data (although if it's "unordered" then any order is valid anyway).