MySQL Bugs: #99900: The result of distinct with varlen

Bug #99900	The result of distinct with varlen_keys is wrong
Submitted:	16 Jun 2020 12:51	Modified:	1 Jul 2020 14:28
Reporter:	bai Kevin	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: DML	Severity:	S3 (Non-critical)
Version:	8.0	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	distinct

Description:
When the table is bigger and the query needs to merge multiple chunk and
at the same time needs to remove the duplicate keys, then the result is
wrong because the duplicate keys can't be removed. Because the function
key_is_greater_than is wrong for the varchar or blob.
If the function cmp_varlen_keys returns true, it means that the key1 is
less than the key2. Otherwise, it means that the key1 is greater than or
equal with the key2. So if the key_is_greater_than for varlen_keys returns
true, it means that the key1 is greater than or equal with the key2. Then
it's wrong using !mcl.key_is_greater_than to determine if the key is equal
with the key2 in the function merge_buffers.

How to repeat:
CREATE TABLE t2 (
  auto int not null auto_increment,
  fld1 int(6) unsigned zerofill DEFAULT '000000' NOT NULL,
  companynr tinyint(2) unsigned zerofill DEFAULT '00' NOT NULL,
  fld3 char(30) DEFAULT '' NOT NULL,
  fld4 char(35) DEFAULT '' NOT NULL,
  fld5 char(35) DEFAULT '' NOT NULL,
  fld6 char(4) DEFAULT '' NOT NULL,
  UNIQUE fld1 (fld1),
  KEY fld3 (fld3),
  PRIMARY KEY (auto)
);
 
 
INSERT INTO t2 VALUES (1,000001,00,'Omaha','teethe','neat','');
INSERT INTO t2 VALUES (2,011401,37,'breaking','dreaded','Steinberg','W');
INSERT INTO t2 VALUES (3,011402,37,'Romans','scholastics','jarring','');
INSERT INTO t2 VALUES (4,011403,37,'intercepted','audiology','tinily','');
INSERT INTO t2 VALUES (5,011501,37,'bewilderingly','wallet','balled','');
INSERT INTO t2 VALUES (6,011701,37,'astound','parters','persist','W');
INSERT INTO t2 VALUES (7,011702,37,'admonishing','eschew','attainments','');
INSERT INTO t2 VALUES (8,011703,37,'sumac','quitter','fanatic','');
INSERT INTO t2 VALUES (9,012001,37,'flanking','neat','measures','FAS');
....

+SET @@sort_buffer_size = 32768;
+select count(*) from (select distinct fld3,repeat("a",length(fld3)*5),count(*) from t2 group by companynr,fld3 order by fld3) t;
+count(*)
+1188
+SET @@sort_buffer_size = 327680;
+select count(*) from (select distinct fld3,repeat("a",length(fld3)*5),count(*) from t2 group by companynr,fld3 order by fld3) t;
+count(*)
+1171

Suggested fix:
--- a/sql/filesort.cc
+++ b/sql/filesort.cc
@@ -1776,8 +1776,8 @@ struct Merge_chunk_greater {
     if (m_len) return memcmp(key1, key2, m_len) > 0;

     if (m_param)
-      return !cmp_varlen_keys(m_param->local_sortorder, m_param->use_hash, key1,
-                              key2);
+      return cmp_varlen_keys(m_param->local_sortorder, m_param->use_hash, key2,
+                             key1);

The data of sample table can be found in the mysql-test/include/common-tests.inc, line49 - line1247.

HI Mr. Kevin,

Thank you very much for your bug report.

I managed to repeat it without any problems:

count(*)
1188
count(*)
1171

Verified as reported.

HI Mr. Kevin,

Thank you very much for your bug report.

I managed to repeat it without any problems:

count(*)
1188
count(*)
1171

Verified as reported.

Posted by developer:
 
Fixed in 8.0.22.

Merging during filesort operations could fail to remove duplicates
for queries that used DISTINCT.

Thank you, Paul ......