Bug #98592 distinct(hex(weight_string(ch))) cut result off
Submitted: 14 Feb 2020 3:37 Modified: 3 Apr 2020 12:46
Reporter: Fengchun Hua Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.7,8.0, 5.6.47, 5.7.29, 8.0.19 OS:Any
Assigned to: CPU Architecture:Any

[14 Feb 2020 3:37] Fengchun Hua
Description:
Here is the case:

create table t1 (ch varchar(1),name varchar(64))character set latin2 collate latin2_czech_cs;

insert into t1 values (0x4F,'LATIN CAPITAL LETTER O');

select hex(weight_string(ch)), name from t1;

output:
+------------------------+------------------------+
| hex(weight_string(ch)) | name                   |
+------------------------+------------------------+
| 140127014D014D00       | LATIN CAPITAL LETTER O |
+------------------------+------------------------+

but when you add distinct around hex fucntion:

select distinct(hex(weight_string(ch))) w, name from t1;

output:
+------+------------------------+
| w    | name                   |
+------+------------------------+
| 14   | LATIN CAPITAL LETTER O |
+------+------------------------+

field w is just prefix of the real result.

That is because when you add distinct, a tmp table will create, this function's result will save into result_field, but the result_field's length is shorter than expected.

How to repeat:
see Description 

Suggested fix:
in item_strfunc.cc
function bool Item_func_weight_string::resolve_type(THD *);

change this line:
set_data_type_string(
      field ? field->pack_length()
            : result_length ? result_length
                            : cs->mbmaxlen * max(args[0]->max_char_length(),
                                                 num_codepoints));

to:

  set_data_type_string(
          field ? field->pack_length()
                : result_length
                  ? result_length
                  : (uint32)cs->coll->strnxfrmlen(
                          cs, cs->mbmaxlen *
                              max<size_t>(args[0]->max_char_length(), num_codepoints)));

Do you forget to warp strnxfrmlen function around?
this max_length will always equals to args[0] 's max_char_length * cs->mbmaxlen, which is shorter than result converted by weight_string
[14 Feb 2020 4:35] MySQL Verification Team
Hello Fengchun Hua,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[3 Apr 2020 12:46] Jon Stephens
Documented fix in the MySQL 8.0.21 changelog as follows:

    SELECT DISTINCT( HEX( WEIGHT_STRING(varchar_column) ) ) returned
    a truncated result.

Closed.