Bug #99900 The result of distinct with varlen_keys is wrong
Submitted: 16 Jun 2020 12:51 Modified: 1 Jul 2020 14:28
Reporter: bai Kevin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: distinct

[16 Jun 2020 12:51] bai Kevin
Description:
When the table is bigger and the query needs to merge multiple chunk and
at the same time needs to remove the duplicate keys, then the result is
wrong because the duplicate keys can't be removed. Because the function
key_is_greater_than is wrong for the varchar or blob.
If the function cmp_varlen_keys returns true, it means that the key1 is
less than the key2. Otherwise, it means that the key1 is greater than or
equal with the key2. So if the key_is_greater_than for varlen_keys returns
true, it means that the key1 is greater than or equal with the key2. Then
it's wrong using !mcl.key_is_greater_than to determine if the key is equal
with the key2 in the function merge_buffers.

How to repeat:
CREATE TABLE t2 (
  auto int not null auto_increment,
  fld1 int(6) unsigned zerofill DEFAULT '000000' NOT NULL,
  companynr tinyint(2) unsigned zerofill DEFAULT '00' NOT NULL,
  fld3 char(30) DEFAULT '' NOT NULL,
  fld4 char(35) DEFAULT '' NOT NULL,
  fld5 char(35) DEFAULT '' NOT NULL,
  fld6 char(4) DEFAULT '' NOT NULL,
  UNIQUE fld1 (fld1),
  KEY fld3 (fld3),
  PRIMARY KEY (auto)
);
 
 
INSERT INTO t2 VALUES (1,000001,00,'Omaha','teethe','neat','');
INSERT INTO t2 VALUES (2,011401,37,'breaking','dreaded','Steinberg','W');
INSERT INTO t2 VALUES (3,011402,37,'Romans','scholastics','jarring','');
INSERT INTO t2 VALUES (4,011403,37,'intercepted','audiology','tinily','');
INSERT INTO t2 VALUES (5,011501,37,'bewilderingly','wallet','balled','');
INSERT INTO t2 VALUES (6,011701,37,'astound','parters','persist','W');
INSERT INTO t2 VALUES (7,011702,37,'admonishing','eschew','attainments','');
INSERT INTO t2 VALUES (8,011703,37,'sumac','quitter','fanatic','');
INSERT INTO t2 VALUES (9,012001,37,'flanking','neat','measures','FAS');
....

+SET @@sort_buffer_size = 32768;
+select count(*) from (select distinct fld3,repeat("a",length(fld3)*5),count(*) from t2 group by companynr,fld3 order by fld3) t;
+count(*)
+1188
+SET @@sort_buffer_size = 327680;
+select count(*) from (select distinct fld3,repeat("a",length(fld3)*5),count(*) from t2 group by companynr,fld3 order by fld3) t;
+count(*)
+1171

Suggested fix:
--- a/sql/filesort.cc
+++ b/sql/filesort.cc
@@ -1776,8 +1776,8 @@ struct Merge_chunk_greater {
     if (m_len) return memcmp(key1, key2, m_len) > 0;

     if (m_param)
-      return !cmp_varlen_keys(m_param->local_sortorder, m_param->use_hash, key1,
-                              key2);
+      return cmp_varlen_keys(m_param->local_sortorder, m_param->use_hash, key2,
+                             key1);
[16 Jun 2020 12:56] bai Kevin
The data of sample table can be found in the mysql-test/include/common-tests.inc, line49 - line1247.
[16 Jun 2020 14:56] MySQL Verification Team
HI Mr. Kevin,

Thank you very much for your bug report.

I managed to repeat it without any problems:

count(*)
1188
count(*)
1171

Verified as reported.
[16 Jun 2020 14:56] MySQL Verification Team
HI Mr. Kevin,

Thank you very much for your bug report.

I managed to repeat it without any problems:

count(*)
1188
count(*)
1171

Verified as reported.
[1 Jul 2020 14:28] Paul DuBois
Posted by developer:
 
Fixed in 8.0.22.

Merging during filesort operations could fail to remove duplicates
for queries that used DISTINCT.
[1 Jul 2020 14:41] MySQL Verification Team
Thank you, Paul ......