MySQL Bugs: #101278: Field_string::cmp suboptimal string comparison

Submitted:	22 Oct 2020 17:51	Modified:	23 Oct 2020 7:17
Reporter:	Georgy Kirichenko
Status:	Verified
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	8.0	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	Contribution

Description:
Before Field_string::cmp actually compares two strings, it computes the byte length of each one by decoding it with my_charpos, which can be relatively expensive for variable-length encodings such as UTF-8. However, if the two strings already differ near their beginning, decoding the rest of both strings is wasted work.

For instance, given two strings like 'aaaaaaa..a' (100 characters long) and 'bbbb..b' (100 characters long), the comparison could stop immediately after comparing the first 'a' with the first 'b', without decoding the remaining 99*2 characters.
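To make the cost concrete, here is a simplified C++ model of the behavior described above. It is not MySQL's actual code: it assumes well-formed UTF-8 input and a plain binary (byte-order) comparison, ignoring collation weights and PAD SPACE handling. The names `utf8_char_len`, `charpos` (a rough stand-in for `my_charpos`), `naive_cmp` and the decode counter `g_decoded` are hypothetical, and `max_chars` plays the role of the column's character length.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical counter: how many characters have been decoded so far.
std::size_t g_decoded = 0;

// Byte length of the UTF-8 character that starts with lead byte c.
// Assumes well-formed UTF-8 (continuation bytes are never passed in).
std::size_t utf8_char_len(unsigned char c) {
  if (c < 0x80) return 1;         // 0xxxxxxx
  if ((c >> 5) == 0x6) return 2;  // 110xxxxx
  if ((c >> 4) == 0xE) return 3;  // 1110xxxx
  return 4;                       // 11110xxx
}

// Rough stand-in for my_charpos(): byte length of the first max_chars
// characters of s, or of all of s if it holds fewer characters.
std::size_t charpos(const std::string &s, std::size_t max_chars) {
  std::size_t pos = 0;
  for (std::size_t n = 0; n < max_chars && pos < s.size(); ++n) {
    pos += utf8_char_len(static_cast<unsigned char>(s[pos]));
    ++g_decoded;
  }
  return std::min(pos, s.size());  // clamp a truncated trailing character
}

// Model of the current behavior: decode both strings in full to get their
// byte lengths, and only then compare bytes (binary order, no padding).
int naive_cmp(const std::string &a, const std::string &b,
              std::size_t max_chars) {
  std::size_t a_len = charpos(a, max_chars);
  std::size_t b_len = charpos(b, max_chars);
  int r = std::memcmp(a.data(), b.data(), std::min(a_len, b_len));
  if (r != 0) return r < 0 ? -1 : 1;
  // Equal common prefix: the shorter string sorts first.
  return a_len < b_len ? -1 : (a_len > b_len ? 1 : 0);
}
```

Comparing `std::string(100, 'a')` with `std::string(100, 'b')` and `max_chars = 100` returns -1 and leaves `g_decoded` at 200, even though the very first byte already decides the result.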

My proposal is to virtually split each string into small 8-character chunks and compare them chunk by chunk until the first difference is found. In my benchmarks of queries like `select count(distinct c) from sbtest1;` on the standard sysbench dataset, this gives up to a 4x speedup.
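The chunked idea can be sketched in C++ under the same simplifying assumptions (well-formed UTF-8, binary byte-order comparison, no padding or collation weights). This illustrates the technique only and is not the attached patch; `utf8_char_len`, `skip_chars`, `chunked_cmp`, `kChunk` and the counter `g_decoded` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical counter: how many characters have been decoded so far.
std::size_t g_decoded = 0;

// Byte length of the UTF-8 character that starts with lead byte c.
// Assumes well-formed UTF-8.
std::size_t utf8_char_len(unsigned char c) {
  if (c < 0x80) return 1;         // 0xxxxxxx
  if ((c >> 5) == 0x6) return 2;  // 110xxxxx
  if ((c >> 4) == 0xE) return 3;  // 1110xxxx
  return 4;                       // 11110xxx
}

// Decode up to n characters of s starting at byte offset pos and return
// the byte offset just past them (stops early at the end of s).
std::size_t skip_chars(const std::string &s, std::size_t pos, std::size_t n) {
  for (std::size_t i = 0; i < n && pos < s.size(); ++i) {
    pos += utf8_char_len(static_cast<unsigned char>(s[pos]));
    ++g_decoded;
  }
  return std::min(pos, s.size());
}

// Chunked comparison: decode at most kChunk characters from each string,
// compare just those bytes, and stop at the first chunk that differs.
int chunked_cmp(const std::string &a, const std::string &b,
                std::size_t max_chars) {
  constexpr std::size_t kChunk = 8;
  std::size_t a_pos = 0, b_pos = 0, done = 0;
  while (done < max_chars) {
    std::size_t n = std::min(kChunk, max_chars - done);
    std::size_t a_end = skip_chars(a, a_pos, n);
    std::size_t b_end = skip_chars(b, b_pos, n);
    std::size_t a_len = a_end - a_pos;
    std::size_t b_len = b_end - b_pos;
    int r = std::memcmp(a.data() + a_pos, b.data() + b_pos,
                        std::min(a_len, b_len));
    if (r != 0) return r < 0 ? -1 : 1;
    // Same bytes but different lengths: one string ran out in this chunk.
    if (a_len != b_len) return a_len < b_len ? -1 : 1;
    if (a_len == 0) return 0;  // both strings exhausted
    a_pos = a_end;
    b_pos = b_end;
    done += n;
  }
  return 0;  // the first max_chars characters are identical
}
```

With the same 100-character inputs this version returns -1 after decoding only 16 characters (one 8-character chunk from each string). Because UTF-8 is prefix-free and each chunk holds whole characters, one chunk's bytes can only be a strict prefix of the other's when that string has run out, so the chunk-level decision matches a full byte-wise comparison. When the strings are equal it still decodes every character, just as the naive model does.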

How to repeat:
Initialize MySQL with the standard sysbench dataset, then execute queries like `select count(distinct c) from sbtest1;` and compare execution times for the patched and unpatched versions.

Suggested fix:
The contribution is attached.

Hello Georgy Kirichenko,

Thank you for the report and contribution.

regards,
Umesh