Bug #110444 Optimize FindLowestBitSet for Arm64
Submitted: 21 Mar 2023 3:02 Modified: 23 Oct 2024 20:15
Reporter: Cai Yibo (OCA) Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0 OS:Linux
Assigned to: CPU Architecture:ARM (aarch64)
Tags: contributions

[21 Mar 2023 3:02] Cai Yibo
Description:
FindLowestBitSet is optimized on x86 by taking advantage of the fact
that the input cannot be zero. [1]
We can implements similar refinement on Arm64 to reduce assembly instructions from 4 to 2. [2]
Microbenchmark shows 30% improvement on Graviton-3 (Neoverse-V1). [3]

[1] https://github.com/mysql/mysql-server/blob/8.0/sql/join_optimizer/bit_utils.h#L70
[2] https://godbolt.org/z/9GWq79a69
[3] https://github.com/cyb70289/mytests/blob/master/bench-lsb.cc

How to repeat:
N/A
[21 Mar 2023 3:03] Cai Yibo
patch

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: 0001-Optimize-FindLowestBitSet-for-Arm64.patch (text/x-patch), 1.21 KiB.

[21 Mar 2023 6:30] MySQL Verification Team
Hello Cai Yibo,

Thank you for the report and contribution.

regards,
Umesh
[23 Oct 2024 20:12] Knut Anders Hatlen
Posted by developer:
 
I don't think this change is needed any more, after this fix:

    Bug#35813111: Use C++ standard library for more bit operations
    [find lowest/highest bit, noclose]
    
    Some of the functions in bit_utils.h used various platform-dependent
    intrinsics to find the most significant or least significant bit that
    was set. Use std::countl_zero() and std::countr_zero() instead.
    
    Change-Id: Icc5555a54777ff54e34a94573466b11488856abc

Recent versions of clang and GCC appear to generate the same instructions for the current implementation of the function as both the hand-coded assembly and __builtin_ctzll. See the different versions in Compiler Explorer: https://godbolt.org/z/ocWbajaEd