| Bug #85819 | Optimize AARCH64 CRC32c implementation | ||
|---|---|---|---|
| Submitted: | 6 Apr 2017 3:04 | Modified: | 7 Apr 2017 8:22 |
| Reporter: | Yuqi Gu (OCA) | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: InnoDB Plugin storage engine | Severity: | S5 (Performance) |
| Version: | 5.7 | OS: | Linux |
| Assigned to: | | CPU Architecture: | ARM |
| Tags: | Contribution | ||
[6 Apr 2017 5:41]
Yuqi Gu
Contribution submitted via Github - Bug #85819 Add AArch64 optimized crc32c implementation #136 (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: bug85819-5.7.txt (text/plain), 5.40 KiB.
[6 Apr 2017 5:59]
Yuqi Gu
ARMv8 defines a set of optional CRC32/CRC32C instructions. A CRC32 function for AArch64 that uses these instructions performs better than one that uses table-based lookup.
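To make the comparison concrete, here is a minimal sketch (not the code from the attached patch) of a CRC32C routine that uses the ARMv8 CRC32C instructions through the ACLE intrinsics `__crc32cd` and `__crc32cb` when the compiler targets them (for example with `-march=armv8-a+crc`), and falls back to software otherwise. The function names are hypothetical, the fallback is a simple bit-at-a-time loop standing in for InnoDB's table-based code, and the 8-byte loads assume a little-endian target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>  /* __crc32cb, __crc32cd */
#endif

/* Software fallback: bit-at-a-time CRC32C (Castagnoli polynomial,
 * reflected form 0x82F63B78). Slow but easy to verify. */
uint32_t crc32c_update_sw(uint32_t crc, const uint8_t *buf, size_t len)
{
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++) {
            /* Mask is all ones when the low bit is set, else zero. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc;
}

#if defined(__ARM_FEATURE_CRC32)
/* Hardware path: 8 bytes per CRC32CX instruction, then a byte-wise tail.
 * Assumption: little-endian target, so a plain memcpy load is correct. */
uint32_t crc32c_update_hw(uint32_t crc, const uint8_t *buf, size_t len)
{
    while (len >= 8) {
        uint64_t word;
        memcpy(&word, buf, sizeof word);  /* safe unaligned load */
        crc = __crc32cd(crc, word);
        buf += 8;
        len -= 8;
    }
    while (len--) {
        crc = __crc32cb(crc, *buf++);
    }
    return crc;
}
#endif

/* Hypothetical entry point: standard CRC32C with initial value and
 * final XOR of 0xFFFFFFFF. */
uint32_t crc32c(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
#if defined(__ARM_FEATURE_CRC32)
    crc = crc32c_update_hw(crc, buf, len);
#else
    crc = crc32c_update_sw(crc, buf, len);
#endif
    return crc ^ 0xFFFFFFFFu;
}
```

Both paths should produce the standard CRC-32C check value 0xE3069283 for the ASCII string "123456789", which is a convenient sanity test before benchmarking.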
[6 Apr 2017 6:13]
Alexey Kopytov
Duplicate of bug #79144 ?
[7 Apr 2017 5:22]
Yuqi Gu
ARMv8 defines the PMULL crypto instruction. The new patch optimizes the crc32c calculation with the instruction when available rather than (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: bug85819-02-5.7.txt (text/plain), 8.38 KiB.
[7 Apr 2017 5:22]
Yuqi Gu
I updated the crc32 optimization code. ARMv8 defines the PMULL crypto instruction. The new patch optimizes the crc32c calculation with the PMULL instruction when available, rather than the original linear crc32 instructions.

Benchmark results:

| Platform \ Case (millisecond) | Software CRC | AArch64 CRC Intrinsics | AArch64 Crypto instruction |
|---|---|---|---|
| AMD Seattle (Softiron) | 1101.783 | 200.535 | 114.509 |
| Cavium ThunderX | 1504.497 | 479.690 | 286.274 |
| Hisilicon Taishan (Huawei) | 1035.202 | 232.984 | 115.580 |

The results show that the PMULL-based CRC32 for InnoDB on AArch64 is faster than the linear crc32 instructions.
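Because both the CRC32 and PMULL instructions are optional, "when available" implies a runtime check. The following is a rough sketch of one way to do that on Linux using `getauxval(AT_HWCAP)` with the `HWCAP_CRC32` and `HWCAP_PMULL` bits. The enum, the function names, and the selection policy (PMULL path only when CRC32 is also present) are assumptions for illustration and are not taken from the patch. On other platforms the sketch simply reports the software path.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_CRC32, HWCAP_PMULL */
#endif

/* Hypothetical identifiers for the three variants in the benchmark. */
typedef enum {
    CRC_IMPL_SOFTWARE,    /* table-based lookup */
    CRC_IMPL_CRC32_INSN,  /* linear crc32c instructions */
    CRC_IMPL_PMULL        /* PMULL-assisted variant */
} crc_impl_t;

/* Assumed policy: prefer PMULL, but only together with CRC32 support. */
crc_impl_t choose_crc_impl(bool has_crc32, bool has_pmull)
{
    if (has_crc32 && has_pmull) return CRC_IMPL_PMULL;
    if (has_crc32) return CRC_IMPL_CRC32_INSN;
    return CRC_IMPL_SOFTWARE;
}

/* Query the kernel-reported CPU features where supported. */
crc_impl_t detect_crc_impl(void)
{
    bool has_crc32 = false;
    bool has_pmull = false;
#if defined(__linux__) && defined(__aarch64__)
    unsigned long hwcap = getauxval(AT_HWCAP);
    has_crc32 = (hwcap & HWCAP_CRC32) != 0;
    has_pmull = (hwcap & HWCAP_PMULL) != 0;
#endif
    return choose_crc_impl(has_crc32, has_pmull);
}

/* Human-readable name, e.g. for a startup log line. */
const char *crc_impl_name(crc_impl_t impl)
{
    switch (impl) {
    case CRC_IMPL_PMULL:      return "pmull";
    case CRC_IMPL_CRC32_INSN: return "crc32-insn";
    case CRC_IMPL_SOFTWARE:   return "software";
    }
    assert(!"unknown crc_impl_t value");
    return "unknown";
}
```

Calling `detect_crc_impl()` once at startup and caching the result would avoid repeating the lookup on every checksum call.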
[7 Apr 2017 5:31]
Yuqi Gu
GH PR: https://github.com/mysql/mysql-server/pull/136
[7 Apr 2017 5:34]
Yuqi Gu
ARMv8 defines the PMULL crypto instruction. This patch optimizes the crc32c calculation with that instruction.
[7 Apr 2017 8:22]
MySQL Verification Team
Hello Yuqi Gu, Thank you for the report and contribution. Thanks, Umesh
[28 Apr 2017 16:10]
OCA Admin
Contribution submitted via Github - Bug #85819 Add AArch64 optimized crc32c implementation (*) Contribution by Yuqi Gu (Github guyuqi, mysql-server/pull/136#issuecomment-292444031): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: git_patch_114523745.txt (text/plain), 11.05 KiB.

Description: ARMv8 defines a set of optional CRC32/CRC32C instructions. A CRC32 function for AArch64 that uses these instructions performs better than one that uses table-based lookup.

How to repeat: The benchmark app source:

```cpp
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/time.h>

#ifdef ARMV8_CRC32
extern bool ut_crc32_sse2_enabled;
extern uint32_t ut_crc32_aarch64(const uint8_t* buf, uint64_t len);
#else
extern uint32_t ut_crc32_sw(const uint8_t* buf, uint64_t len);
extern void ut_crc32_slice8_table_init(void);
#endif

/* Returns the current wall-clock time in microseconds. */
long int GetTickCount()
{
  struct timeval tv;
  gettimeofday(&tv, NULL);
  return tv.tv_sec * 1000000 + tv.tv_usec;
}

int main()
{
  static const uint64_t kSize = 1024 * 1024 + 29;
  uint8_t* buf = (uint8_t *)malloc(sizeof(uint8_t) * kSize);
  uint32_t i;

#ifdef ARMV8_CRC32
  ut_crc32_sse2_enabled = true;
#else
  ut_crc32_slice8_table_init();
#endif

  /* Fill the buffer with reproducible pseudo-random bytes. */
  srand(0);
  for (i = 0; i < kSize; i++) {
    buf[i] = (uint8_t)(rand() % 256u);
  }

  uint32_t kLoop = 1024;
  long int start, end;
  uint32_t crc = 0;

  /* Time kLoop checksum passes over the whole buffer. */
  start = GetTickCount();
  for (i = 0; i < kLoop; i++) {
#ifdef ARMV8_CRC32
    crc = ut_crc32_aarch64(buf, kSize);
#else
    crc = ut_crc32_sw(buf, kSize);
#endif
  }
  end = GetTickCount();

  /* Dump the input only for very small buffers. */
  if (kSize < 20) {
    for (i = 0; i < kSize; i++) {
      printf("%3u,", (uint32_t)buf[i]);
    }
    printf("\n");
  }

  printf("crc result = %x, time cost per loop:%f ms\n", crc, (double)(end - start) / kLoop);
  free(buf);
  return 0;
}
```

Build the benchmark app on AArch64:

```diff
linux@wls-ci-arm:~/mysql-server$ git diff extra/CMakeLists.txt
diff --git a/extra/CMakeLists.txt b/extra/CMakeLists.txt
index 3adf988..4fa1004 100644
--- a/extra/CMakeLists.txt
+++ b/extra/CMakeLists.txt
@@ -141,6 +141,11 @@ IF(WITH_INNOBASE_STORAGE_ENGINE)
   MYSQL_ADD_EXECUTABLE(innochecksum innochecksum.cc ${INNOBASE_SOURCES})
   TARGET_LINK_LIBRARIES(innochecksum mysys mysys_ssl ${LZ4_LIBRARY})
   ADD_DEPENDENCIES(innochecksum GenError)
+
+  SET(CRC_TEST_SOURCES
+      ../storage/innobase/ut/ut0crc32.cc
+  )
+  MYSQL_ADD_EXECUTABLE(mycrctest mycrctest.cc ${CRC_TEST_SOURCES})
 ENDIF()
```

AArch64 Platform:

| Platform \ Case (millisecond) | Software CRC | AArch64 CRC Intrinsics |
|---|---|---|
| AMD Seattle (Softiron) | 1101.783 | 200.535 |
| Cavium ThunderX | 1504.497 | 479.690 |
| Hisilicon Taishan (Huawei) | 1035.202 | 232.984 |