Bug #110752 | el7 aarch64 might return invalid for value cache line causing crash during boot | ||
---|---|---|---|
Submitted: | 20 Apr 2023 19:07 | Modified: | 23 Jun 2023 19:02 |
Reporter: | Will Saxon | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Compiling | Severity: | S3 (Non-critical) |
Version: | 8.0.33 | OS: | Linux |
Assigned to: | | CPU Architecture: | ARM (Apple M1 Pro/AWS Graviton2) |
[20 Apr 2023 19:07]
Will Saxon
[21 Apr 2023 12:16]
MySQL Verification Team
Hi Mr. Saxon,

Thank you for your bug report. We have already received several bug reports with similar stacktraces. We discovered that a problem is not in our server. It was always found that one of two factors is causing this problem. The first is the setting of your Docker environment, especially the memory configuration. The second is a problem with the cache line size reported by the system. Run the following command:

    getconf -a | grep CACHE

If you get a result like this, then that is the problem:

    LEVEL1_ICACHE_SIZE 0
    LEVEL1_ICACHE_ASSOC 0
    LEVEL1_ICACHE_LINESIZE 0
    LEVEL1_DCACHE_SIZE 0
    LEVEL1_DCACHE_ASSOC 0
    LEVEL1_DCACHE_LINESIZE 0
    LEVEL2_CACHE_SIZE 0
    LEVEL2_CACHE_ASSOC 0
    LEVEL2_CACHE_LINESIZE 0
    LEVEL3_CACHE_SIZE 0
    LEVEL3_CACHE_ASSOC 0
    LEVEL3_CACHE_LINESIZE 0
    LEVEL4_CACHE_SIZE 0
    LEVEL4_CACHE_ASSOC 0
    LEVEL4_CACHE_LINESIZE 0

In either of the two cases, this is not our bug.
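The `getconf` check described above can also be scripted. What follows is a minimal sketch in Python, not part of the original report: the function names `parse_cache_values` and `libc_reports_no_cache_info` are hypothetical, and it assumes the `NAME VALUE` line format shown in the sample output.

```python
def parse_cache_values(getconf_output):
    """Return {name: int} for every cache-related line of `getconf -a` output."""
    values = {}
    for line in getconf_output.splitlines():
        parts = line.split()
        # Cache entries look like "LEVEL1_DCACHE_LINESIZE 64".
        # Lines without exactly one numeric value are skipped.
        if len(parts) == 2 and "CACHE" in parts[0] and parts[1].isdigit():
            values[parts[0]] = int(parts[1])
    return values


def libc_reports_no_cache_info(getconf_output):
    """True when cache entries exist and every one of them is 0."""
    values = parse_cache_values(getconf_output)
    return bool(values) and all(v == 0 for v in values.values())
```

Passing the text printed by `getconf -a` to `libc_reports_no_cache_info` returns `True` for the all-zero listing shown above. An empty input returns `False`, so a missing `getconf` is not mistaken for a zeroed cache report.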
[6 May 2023 0:49]
Will Saxon
Thank you for following up. We see the all-zeroes cache line, so perhaps that's the problem. I'm wondering if you can offer some guidance about how this is a problem in 8.0.33 but not 8.0.32? Our environment has not changed, and in particular the cache line appears to be all zeros on any CentOS 7 ARM host, both inside a container and when run directly on an EC2 instance. It seems like this would be a widespread issue affecting the majority of ARM users in the field. As for the Docker configuration, again, can you offer any guidance on what we should look for? Is there a KB article posted somewhere you can point me to? We see this with standard out-of-the-box Docker configuration, and again we're only seeing this on ARM whereas an otherwise-identical configuration on Intel works correctly. Thanks, -Will
[8 May 2023 12:43]
MySQL Verification Team
Hi Mr. Saxon, We have many customers and users running MySQL on ARM systems with Linux or with macOS. Your report is the only one about the problem with starting MySQL. Hence, this is some problem in your configuration. Not a bug.
[8 May 2023 14:48]
Will Saxon
I'm not arguing that it's a bug. I'm pointing out that with standard out of the box configuration, new installations work immediately with 8.0.32 and fail immediately on 8.0.33. You pointed out that in all cases where this has been reported, it was docker configuration or an empty cache line. I have replied that we are using standard/out of the box configuration for Docker, and that all the ARM systems I've looked at have a zero cache line. So, while I realize this is a bug database and not a support site, I was hoping that, given you've apparently seen this several times, you could point me to a KB article or documentation saying "you need to do XYZ to run MySQL 8.0.33 on ARM." Because, from my perspective, I changed nothing other than the version I am installing, and the new version doesn't work. If you are not prepared to do that, please go ahead and close this report (or I will, if that's my responsibility).
[8 May 2023 15:01]
MySQL Verification Team
Hi, Since we do not have any other reports on this problem with 8.0.33, we do not have any KB articles nor any documentation that we could provide for you .....
[8 May 2023 15:11]
Will Saxon
> We have already received several bug reports with similar stacktraces. We discovered that a problem is not in our server.

Then,

> Your report is the only one about the problem with starting MySQL.

Then,

> Since we do not have any other reports on this problem with 8.0.33, we do not have any KB articles nor any documentation that we could provide for you .....

Thank you for your efforts here. Please close the bug.
[5 Jun 2023 14:34]
Will Saxon
Hello, I have some more information on this that you might like to see. The Ubuntu project has a bug filed in their system where they see the same behavior on armhf with every release that packages 8.x: https://bugs.launchpad.net/ubuntu/+source/mysql-8.0/+bug/2019203 It references this commit: https://github.com/mysql/mysql-server/commit/be8348a7c3e8510b998a063065b626a459631b32 It appears that reverting this commit, which was introduced by the MySQL project for the 8.0.33 release, fixes the problem. Maybe you'd like to reconsider whether this is a bug, or at least acknowledge that this is not an end-user issue? I went ahead and modified the severity, etc. assuming so.
[6 Jun 2023 12:23]
MySQL Verification Team
Hi Mr. Saxon, If that patch is committed, then it is a solution for your problem. Also, if you take a look at our "Download" pages, you will see that Ubuntu on ARM is not among the supported operating systems. Regarding macOS on M1 or M2, it runs on our test systems without any problem.
[6 Jun 2023 15:33]
Will Saxon
Please actually read the bug report. This is not about running on macOS. The issue is running MySQL 8.0.33 on Linux on ARM64. I mentioned Apple M1 because I was trying to run in a CentOS 7 Docker container on a Mac. I mentioned the Ubuntu issue because it illustrates the same problem and that group tracked down which commit by the MySQL project introduced the behavior. We/I are personally trying to run MySQL 8.0.33 on *CentOS 7 on ARM64*, which is on your supported OS list at https://www.mysql.com/support/supportedplatforms/database.html. We have seen the behavior noted by this bug report on the Apple M1 (via Docker) and AWS Graviton implementations of the ARM64 architecture. Please take 5 minutes and actually *try installing your own package on this platform* instead of repeatedly dismissing this report. Your group has been defensive and rude about this for no obvious reason. Do you want people to report problems they encounter with your software or not?
[7 Jun 2023 12:11]
MySQL Verification Team
Hi Mr. Saxon, Yes, we are interested in bugs in our latest releases. However, we cannot repeat your problems. Before we release any package, we test it by installing it on the OS for which it is built. We test our own builds only. We test them on each operating system, on the supported CPUs, and in a Linux Docker environment. That is why we have tools for Docker that you can find on our download site. We have not encountered any problems with installation on any OS, nor with Docker. Regarding your problems with installations on AWS, you will have to report them to the cloud provider. We do test our packages on OCI. Can't repeat.
[7 Jun 2023 12:20]
MySQL Verification Team
Hi Mr. Saxon, We have one additional question for you. Have you tried installing our package on the CentOS 7 on ARM, as standalone, without Docker. It could be a problem with the Docker installation. If you succeed in installing it without Docker, please read chapter 2.5.6 in our Manual.
[7 Jun 2023 12:22]
MySQL Verification Team
Hi, Another area which could cause the problem is Docker configuration.
[7 Jun 2023 12:26]
MySQL Verification Team
Hi, This looks more and more like Docker misconfiguration. The stacktrace looks like you have not configured memory utilisation in Docker.
[7 Jun 2023 16:29]
Will Saxon
Screen log of new instance demonstrating the issue.
Attachment: screenlog.0 (application/octet-stream, text), 483.58 KiB.
[7 Jun 2023 16:41]
Will Saxon
> Have you tried installing our package on the CentOS 7 on ARM, as standalone, without Docker.

Yes, we have. I am sorry I wasn't more explicit about this in my comment from [6 Jun 15:33]. We realized this issue wasn't Docker-specific, which is why I removed Docker from this report's metadata the other day.

If it helps, I just set up a new EC2 instance with the latest CentOS 7 AMI provided by the CentOS project, set up your release repository, installed mysql-community-server, and experienced the failure immediately when trying to then start mysqld. I attached a screen log capturing the output of this effort; I would recommend reading it using the `more` utility to handle the escape codes.

Again, this works immediately using the default installed configuration with 8.0.32. We began seeing this issue with 8.0.33. We found that it was subsequently noticed and reported by other users to the Ubuntu project.
[7 Jun 2023 20:32]
Neil Hodges
I was digging around on the EC2 instance I've been testing with (CentOS 7.9.2009 ARM64, stock glibc 2.17, and no Docker) and discovered that the cache line size is set in sysfs:

    # grep . /sys/devices/system/cpu/cpu0/cache/index*/coherency_line_size
    /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size:64
    /sys/devices/system/cpu/cpu0/cache/index1/coherency_line_size:64
    /sys/devices/system/cpu/cpu0/cache/index2/coherency_line_size:64
    /sys/devices/system/cpu/cpu0/cache/index3/coherency_line_size:64

But getconf is unable to get it:

    # getconf -a | grep -i 'cache.*linesize'
    LEVEL1_ICACHE_LINESIZE 0
    LEVEL1_DCACHE_LINESIZE 0
    LEVEL2_CACHE_LINESIZE 0
    LEVEL3_CACHE_LINESIZE 0
    LEVEL4_CACHE_LINESIZE 0

Is there a reason why this block ( https://github.com/mysql/mysql-server/blob/ea7087d885006918ad54458e7aad215b1650312c/sql/me... ) is limited to S/390? It seems like it could solve this problem on any platform where the libc is unable to pull the cache line size for whatever reason. And if the libc's sysconf() is able to get the cache line size, the block would be skipped over and would have no impact.

Separately, if I pull down the official mysql-community-server image (Oracle Linux 8.7 with stock glibc 2.28) and create a Docker container, it has the sysfs entry populated:

    # grep . /sys/devices/system/cpu/cpu0/cache/index*/coherency_line_size
    /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size:64
    /sys/devices/system/cpu/cpu0/cache/index1/coherency_line_size:64
    /sys/devices/system/cpu/cpu0/cache/index2/coherency_line_size:64
    /sys/devices/system/cpu/cpu0/cache/index3/coherency_line_size:64

And getconf is able to get at it:

    # getconf -a | grep -i 'cache.*linesize'
    LEVEL1_ICACHE_LINESIZE 64
    LEVEL1_DCACHE_LINESIZE 64
    LEVEL2_CACHE_LINESIZE 0
    LEVEL3_CACHE_LINESIZE 0
    LEVEL4_CACHE_LINESIZE 0

To be clear, we are unwilling to modify our CentOS 7 OSes with glibc 2.28. That is not a reasonable path to take.
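The fallback order proposed in this comment (trust libc when it reports a value, otherwise consult sysfs) can be sketched as follows. This is an illustration in Python, not the MySQL source, which is C++ and is linked only through a truncated URL above. The helper names are hypothetical, and the sysfs path is the one from the listing in the comment.

```python
import os
from pathlib import Path

# Path taken from the sysfs listing quoted in the comment above.
SYSFS_LINE_SIZE = Path(
    "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"
)


def libc_line_size():
    """Ask libc for the L1 data cache line size; 0 means 'unknown'."""
    try:
        return max(os.sysconf("SC_LEVEL1_DCACHE_LINESIZE"), 0)
    except (ValueError, OSError):
        # The name is not known on this platform, or the query failed.
        return 0


def read_sysfs_line_size(path=SYSFS_LINE_SIZE):
    """Read the kernel-reported line size; 0 if the file is missing or malformed."""
    try:
        return max(int(Path(path).read_text().strip()), 0)
    except (OSError, ValueError):
        return 0


def cache_line_size(libc_value=None, sysfs_path=SYSFS_LINE_SIZE):
    """Prefer libc; consult sysfs only when libc reports nothing usable."""
    if libc_value is None:
        libc_value = libc_line_size()
    if libc_value > 0:
        return libc_value  # sysfs is never touched in this case
    return read_sysfs_line_size(sysfs_path)
```

With the values reported for the CentOS 7 instance (libc returns 0, sysfs contains 64), `cache_line_size` would return 64. On a system where libc already reports 64, the sysfs branch is skipped, which matches the "no impact" argument made in the comment.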
[8 Jun 2023 12:39]
MySQL Verification Team
Hi All, Thank you for all of your comments, especially the last comment from Mr. Hodges. This does not seem to be a bug in MySQL code, but rather a problem in the operating system or in the usage of containers. We must also inform you that we are not allowed to test reported bugs in any container, for many valid reasons that we are not allowed to reveal publicly. So far, this truly does not seem to be a MySQL bug.
[8 Jun 2023 13:39]
Neil Hodges
As I said, the problem occurs when we are NOT using a container. Please read my last message more carefully.

> I was digging around on the EC2 instance I've been testing with (CentOS 7.9.2009 ARM64, stock glibc 2.17, AND NO DOCKER) and discovered that the cache line size is set in sysfs:
>
> # grep . /sys/devices/system/cpu/cpu0/cache/index*/coherency_line_size
> /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size:64
> /sys/devices/system/cpu/cpu0/cache/index1/coherency_line_size:64
> /sys/devices/system/cpu/cpu0/cache/index2/coherency_line_size:64
> /sys/devices/system/cpu/cpu0/cache/index3/coherency_line_size:64
>
> But getconf is unable to get it:
>
> # getconf -a | grep -i 'cache.*linesize'
> LEVEL1_ICACHE_LINESIZE 0
> LEVEL1_DCACHE_LINESIZE 0
> LEVEL2_CACHE_LINESIZE 0
> LEVEL3_CACHE_LINESIZE 0
> LEVEL4_CACHE_LINESIZE 0

This quoted case is when MySQL crashes at startup. That's proof enough that containers have absolutely nothing to do with this. Is that clear?
[8 Jun 2023 13:55]
MySQL Verification Team
Hi, Yes, you are correct. This is NOT a problem with containers. This is a problem with the operating system, more precisely with the glibc installed. With Oracle Linux and a proper glibc, you get the proper results with `getconf`. We hope we were quite clear this time ......
[8 Jun 2023 15:00]
Will Saxon
Perhaps you should consider removing CentOS 7 and ARM64 from your list of supported platforms here: https://www.mysql.com/support/supportedplatforms/database.html Since you clearly do not support MySQL on this platform.
[8 Jun 2023 15:04]
MySQL Verification Team
Thanks. We agree with your conclusion.
[8 Jun 2023 16:29]
Terje Røsten
Packages were built and verified on Ampere A1 hardware: https://www.oracle.com/cloud/compute/arm/ You should be able to use "Oracle Cloud Free Tier" to get access to such platform.
[8 Jun 2023 16:38]
Terje Røsten
Output from such platform:

    $ getconf -a | grep -i 'cache.*linesize'
    LEVEL1_ICACHE_LINESIZE 64
    LEVEL1_DCACHE_LINESIZE 64
    LEVEL2_CACHE_LINESIZE 0
    LEVEL3_CACHE_LINESIZE 0
    LEVEL4_CACHE_LINESIZE 0
[8 Jun 2023 16:48]
Terje Røsten
While the /sys directory structure is different:

    $ tree /sys/devices/system/cpu/cpu0/cache/index0
    /sys/devices/system/cpu/cpu0/cache/index0
    ├── level
    ├── shared_cpu_list
    ├── shared_cpu_map
    ├── type
    └── uevent
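Because the sysfs layout differs between the platforms discussed here (the Ampere listing above has no `coherency_line_size` entry), a quick way to compare hosts is to survey which `index*` directories expose that attribute. The sketch below is a hypothetical helper in Python, not something taken from MySQL or from this report. It assumes only the directory layout shown in the listings.

```python
from pathlib import Path


def survey_cache_sysfs(cache_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Map each index* directory name to its coherency_line_size, or None if absent."""
    report = {}
    for index_dir in sorted(Path(cache_dir).glob("index*")):
        attr = index_dir / "coherency_line_size"
        try:
            report[index_dir.name] = int(attr.read_text().strip())
        except (OSError, ValueError):
            # Attribute missing (as in the tree output above) or not a number.
            report[index_dir.name] = None
    return report
```

A result of `None` for every index means sysfs cannot serve as a fallback source on that host. An empty dictionary means the cache directory itself is not present.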
[8 Jun 2023 16:52]
Terje Røsten
The s390 fix is https://bugs.mysql.com/bug.php?id=107081 so this bug can be seen as an extension of that fix to the aarch64 platform.
[9 Jun 2023 11:56]
MySQL Verification Team
Thank you, Terje .....
[23 Jun 2023 16:17]
Philip Olson
Posted by developer:

Fixed as of the upcoming MySQL Server 8.0.35 / 8.2.0 releases, and here's the proposed changelog entry from the documentation team:

On EL7 aarch64-based platforms, fixed an issue related to how fetching the CPU cache line size returned 0 that caused the MySQL server to unexpectedly halt.

Thank you for the bug report and staying persistent.

Note: this fix may make it into an earlier release, depending on various circumstances.
[23 Jun 2023 19:02]
Will Saxon
Thank you!
[26 Jun 2023 12:23]
MySQL Verification Team
Thanks, Philip !!!!