Bug #110752 el7 aarch64 might return invalid value for cache line size, causing crash during boot
Submitted: 20 Apr 2023 19:07
Modified: 23 Jun 2023 19:02
Reporter: Will Saxon
Status: Closed
Category: MySQL Server: Compiling
Severity: S3 (Non-critical)
Version: 8.0.33
OS: Linux
CPU Architecture: ARM (Apple M1 Pro/AWS Graviton2)

[20 Apr 2023 19:07] Will Saxon
Description:
The mysqld process cannot be started on new installations of mysql-community-server 8.0.33-1.el7.

We see errors in mysqld.log like:

2023-04-20T18:45:48.249435Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.33) starting as process 291
2023-04-20T18:45:48Z UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=9ae612eb139943fd0ad78ded1a800776626be701
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x44) [0x1e07324]
/usr/sbin/mysqld(print_fatal_signal(int)+0x33c) [0xf121fc]
/usr/sbin/mysqld(handle_fatal_signal+0x98) [0xf122d8]
[0xffff7f94a790]
/usr/sbin/mysqld(memory::Aligned_atomic<long>::Aligned_atomic()+0x70) [0x1ac75b0]
/usr/sbin/mysqld(Delegate::Delegate(unsigned int)+0x5c) [0x1ac785c]
/usr/sbin/mysqld(delegates_init()+0x40) [0x1ac79c0]
/usr/sbin/mysqld() [0xce3b60]
/usr/sbin/mysqld(mysqld_main(int, char**)+0x1d8c) [0xcea76c]
/lib64/libc.so.6(__libc_start_main+0xf0) [0xffff7f1a8724]
/usr/sbin/mysqld() [0xccfc40]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

How to repeat:
Simplest possible reproduction:

1. docker run -it --rm centos:7
2. rpm -i https://dev.mysql.com/get/mysql80-community-release-el7-7.noarch.rpm
3. yum install mysql-community-server sudo
4. sudo -u mysql /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid

I've tested the above on:

- macOS Ventura, Apple M1 w/ latest Docker Desktop for Mac
- CentOS 7.x, AWS EC2 instance using a Graviton 2 w/ latest Docker

Suggested fix:
For us, pinning to 8.0.32-1.el7 works, except that version is already gone from some mirrors.
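
A minimal sketch of that pinning workaround, assuming the 8.0.32-1.el7 build is still available from a configured repository. Passing name-version-release to `yum install` is standard yum usage; the `PKG`, `VER`, `APPLY`, and `pin_spec` names are hypothetical, and the script only prints the command unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch: install a pinned mysql-community-server build with yum.
# PKG, VER, APPLY and pin_spec are illustrative names, not MySQL tooling.
PKG="mysql-community-server"
VER="8.0.32-1.el7"    # last version reported in this thread to start correctly

pin_spec() {
    # yum accepts name-version-release to select one exact build.
    printf '%s-%s\n' "$PKG" "$VER"
}

if [ "${APPLY:-0}" = "1" ]; then
    # Real install: needs root and the MySQL community repository.
    yum install -y "$(pin_spec)"
else
    # Dry run: only show what would be executed.
    echo "would run: yum install -y $(pin_spec)"
fi
```

Dependencies such as the client packages may still resolve to a newer version, so it is worth checking the transaction summary before confirming.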
[21 Apr 2023 12:16] MySQL Verification Team
Hi Mr. Saxon,

Thank you for your bug report.

We have already received several bug reports with similar stacktraces. We discovered that a problem is not in our server.

In every case, one of two factors turned out to be causing the problem.

The first is the configuration of your Docker environment, especially the memory configuration.

The second is a problem with the cache line size. Run the following command:

getconf -a | grep CACHE

If you get a result like this, then that is the problem.

LEVEL1_ICACHE_SIZE                 0
LEVEL1_ICACHE_ASSOC                0
LEVEL1_ICACHE_LINESIZE             0
LEVEL1_DCACHE_SIZE                 0
LEVEL1_DCACHE_ASSOC                0
LEVEL1_DCACHE_LINESIZE             0
LEVEL2_CACHE_SIZE                  0
LEVEL2_CACHE_ASSOC                 0
LEVEL2_CACHE_LINESIZE              0
LEVEL3_CACHE_SIZE                  0
LEVEL3_CACHE_ASSOC                 0
LEVEL3_CACHE_LINESIZE              0
LEVEL4_CACHE_SIZE                  0
LEVEL4_CACHE_ASSOC                 0
LEVEL4_CACHE_LINESIZE              0

In either case, this is not our bug.
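
A minimal sketch of the getconf check as a script, assuming a glibc-style `getconf -a` listing with the name in the first column and the value in the second. Healthy output elsewhere in this report also shows 0 for the level 2 to 4 entries, so this sketch only inspects the level 1 entries; `zero_l1_linesizes` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: flag level 1 cache line sizes that getconf reports as 0.
# zero_l1_linesizes is an illustrative helper, not part of any package.

zero_l1_linesizes() {
    # Reads `getconf -a` style output on stdin and prints the names of
    # the LEVEL1 instruction/data cache LINESIZE entries whose value is 0.
    awk '$1 ~ /^LEVEL1_[ID]CACHE_LINESIZE$/ && $2 == "0" { print $1 }'
}

# On a live system, print any affected entries (prints nothing if none).
if command -v getconf >/dev/null 2>&1; then
    getconf -a 2>/dev/null | zero_l1_linesizes
fi
```

Any name printed by the script is a level 1 line size that the C library could not determine.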
[6 May 2023 0:49] Will Saxon
Thank you for following up.

We see the all-zeroes cache line, so perhaps that's the problem. I'm wondering if you can offer some guidance about how this is a problem in 8.0.33 but not 8.0.32? Our environment has not changed, and in particular the cache line appears to be all zeros on any CentOS 7 ARM host, both inside a container and when run directly on an EC2 instance. It seems like this would be a widespread issue affecting the majority of ARM users in the field.

As for the Docker configuration, again, can you offer any guidance on what we should look for? Is there a KB article posted somewhere you can point me to? We see this with standard out-of-the-box Docker configuration, and again we're only seeing this on ARM whereas an otherwise-identical configuration on Intel works correctly.

Thanks,

-Will
[8 May 2023 12:43] MySQL Verification Team
Hi Mr. Saxon,

We have many customers and users running MySQL on ARM systems with Linux or with macOS.

Your report is the only one about the problem with starting MySQL.

Hence, this is some problem in your configuration.

Not a bug.
[8 May 2023 14:48] Will Saxon
I'm not arguing that it's a bug. I'm pointing out that with standard out of the box configuration, new installations work immediately with 8.0.32 and fail immediately on 8.0.33.

You pointed out that in all cases where this has been reported, it was docker configuration or an empty cache line. I have replied that we are using standard/out of the box configuration for Docker, and that all the ARM systems I've looked at have a zero cache line.

So, while I realize this is a bug database and not a support site, I was hoping that, given you've apparently seen this several times, you could point me to a KB article or documentation saying "you need to do XYZ to run MySQL 8.0.33 on ARM." Because, from my perspective, I changed nothing other than the version I am installing, and the new version doesn't work.

If you are not prepared to do that, please go ahead and close this report (or I will, if that's my responsibility).
[8 May 2023 15:01] MySQL Verification Team
Hi,

Since we do not have any other reports on this problem with 8.0.33, we do not have any KB articles nor any documentation that we could provide for you .....
[8 May 2023 15:11] Will Saxon
> We have already received several bug reports with similar stacktraces. We discovered that a problem is not in our server.

Then,

> Your report is the only one about the problem with starting MySQL.

Then,

> Since we do not have any other reports on this problem with 8.0.33, we do not have any KB articles nor any documentation that we could provide for you .....

Thank you for your efforts here. Please close the bug.
[5 Jun 2023 14:34] Will Saxon
Hello,

I have some more information on this that you might like to see.

The Ubuntu project has a bug filed in their system where they see the same behavior on armhf with every release that packages 8.x:

https://bugs.launchpad.net/ubuntu/+source/mysql-8.0/+bug/2019203

It references this commit:

https://github.com/mysql/mysql-server/commit/be8348a7c3e8510b998a063065b626a459631b32

It appears that reverting this commit, which was introduced by the MySQL project for the 8.0.33 release, fixes the problem.

Maybe you'd like to reconsider whether this is a bug, or at least acknowledge that this is not an end-user issue? I went ahead and modified the severity, etc. assuming so.
[6 Jun 2023 12:23] MySQL Verification Team
Hi Mr. Saxon,

If that patch is committed, then it is a solution for your problem.

Also, if you take a look at our "Download" pages, you will see that Ubuntu on ARM is not among the supported operating systems.

Regarding macOS with M1 or M2, it runs on our test systems without any problem.
[6 Jun 2023 15:33] Will Saxon
Please actually read the bug report. This is not about running on macOS.

The issue is running MySQL 8.0.33 on Linux on ARM64. I mentioned Apple M1 because I was trying to run in a CentOS 7 Docker container on a Mac. I mentioned the Ubuntu issue because it illustrates the same problem and that group tracked down which commit by the MySQL project introduced the behavior.

We are trying to run MySQL 8.0.33 on *CentOS 7 on ARM64*, which is on your supported OS list at https://www.mysql.com/support/supportedplatforms/database.html. We have seen the behavior noted by this bug report on the Apple M1 (via Docker) and AWS Graviton implementations of the ARM64 architecture.

Please take 5 minutes and actually *try installing your own package on this platform* instead of repeatedly dismissing this report. Your group has been defensive and rude about this for no obvious reason. Do you want people to report problems they encounter with your software or not?
[7 Jun 2023 12:11] MySQL Verification Team
Hi Mr. Saxon,

Yes, we are interested in the bugs in our latest releases.

However, we cannot repeat your problems.

Before we release any package, we test it by installing it on the OS for which it is built. We test only our own builds. We test them on each operating system, on each supported CPU, and in a Linux Docker environment. That is why we provide Docker tools on our download site.

We have not encountered any problems with the installations on any OS, nor with Docker.

Regarding your problems with installations on AWS, you will have to report them to the cloud provider. We do test our packages on OCI.

Can't repeat.
[7 Jun 2023 12:20] MySQL Verification Team
Hi Mr. Saxon,

We have one additional question for you.

Have you tried installing our package on the CentOS 7 on ARM, as standalone, without Docker.

It could be a problem with the Docker installation. If you succeed in installing it without Docker, please read chapter 2.5.6 of our Manual.
[7 Jun 2023 12:22] MySQL Verification Team
Hi,

Another area which could cause the problem is Docker configuration.
[7 Jun 2023 12:26] MySQL Verification Team
Hi,

This looks more and more like Docker misconfiguration. 

The stacktrace looks like you have not configured memory utilisation in Docker.
[7 Jun 2023 16:29] Will Saxon
Screen log of new instance demonstrating the issue.

Attachment: screenlog.0 (application/octet-stream, text), 483.58 KiB.

[7 Jun 2023 16:41] Will Saxon
> Have you tried installing our package on the CentOS 7 on ARM, as standalone, without Docker.

Yes, we have. I am sorry I wasn't more explicit about this in my comment from [6 Jun 15:33]. We realized this issue wasn't Docker-specific, which is why I removed Docker from this report's metadata the other day.

If it helps, I just set up a new EC2 instance with the latest CentOS 7 AMI provided by the CentOS project, set up your release repository, installed mysql-community-server, and experienced the failure immediately when trying to then start mysqld. I attached a screen log capturing the output of this effort; I would recommend reading it using the `more` utility to handle the escape codes.

Again, this works immediately using the default installed configuration with 8.0.32. We began seeing this issue with 8.0.33. We found that it was subsequently noticed and reported by other users to the Ubuntu project.
[7 Jun 2023 20:32] Neil Hodges
I was digging around on the EC2 instance I've been testing with (CentOS 7.9.2009 ARM64, stock glibc 2.17, and no Docker) and discovered that the cache line size is set in sysfs:

# grep . /sys/devices/system/cpu/cpu0/cache/index*/coherency_line_size
/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size:64
/sys/devices/system/cpu/cpu0/cache/index1/coherency_line_size:64
/sys/devices/system/cpu/cpu0/cache/index2/coherency_line_size:64
/sys/devices/system/cpu/cpu0/cache/index3/coherency_line_size:64

But getconf is unable to get it:

# getconf -a | grep -i 'cache.*linesize'
LEVEL1_ICACHE_LINESIZE             0
LEVEL1_DCACHE_LINESIZE             0
LEVEL2_CACHE_LINESIZE              0
LEVEL3_CACHE_LINESIZE              0
LEVEL4_CACHE_LINESIZE              0

Is there a reason why this block ( https://github.com/mysql/mysql-server/blob/ea7087d885006918ad54458e7aad215b1650312c/sql/me... ) is limited to S/390? It seems like it could solve this problem on any platform where the libc is unable to obtain the cache line size for whatever reason. And if the libc's sysconf() is able to get the cache line size, the block would be skipped and have no impact.
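
The fallback order described here can be sketched in shell, assuming the sysfs file holds a plain decimal number. This is an illustration of the idea only, not the MySQL implementation: `as_uint` and `pick_line_size` are hypothetical helpers, the use of `index0` for the lookup is an assumption, and the last-resort default of 64 is also an assumption.

```shell
#!/bin/sh
# Sketch: prefer the libc-reported cache line size, fall back to sysfs.
# as_uint, pick_line_size and the default of 64 are illustrative only.

as_uint() {
    # Print $1 if it is a non-empty string of digits, otherwise print 0.
    case $1 in
        ''|*[!0-9]*) echo 0 ;;
        *) echo "$1" ;;
    esac
}

pick_line_size() {
    # $1: value reported by libc (via getconf), possibly empty or 0
    # $2: path to a sysfs coherency_line_size file
    reported=$(as_uint "$1")
    from_sysfs=0
    if [ -r "$2" ]; then
        from_sysfs=$(as_uint "$(cat "$2")")
    fi
    if [ "$reported" -gt 0 ]; then
        echo "$reported"      # libc answer is usable, sysfs is ignored
    elif [ "$from_sysfs" -gt 0 ]; then
        echo "$from_sysfs"    # libc returned 0, use the kernel's value
    else
        echo 64               # assumed last-resort default
    fi
}

libc_value=$(getconf LEVEL1_DCACHE_LINESIZE 2>/dev/null || true)
pick_line_size "$libc_value" /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
```

When the libc value is non-zero the sysfs file never influences the result, which matches the point that such a block would have no impact on platforms where sysconf() already works.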

Separately, if I pull down the official mysql-community-server image (Oracle Linux 8.7 with stock glibc 2.28) and create a Docker container, the sysfs entry is populated:

# grep . /sys/devices/system/cpu/cpu0/cache/index*/coherency_line_size
/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size:64
/sys/devices/system/cpu/cpu0/cache/index1/coherency_line_size:64
/sys/devices/system/cpu/cpu0/cache/index2/coherency_line_size:64
/sys/devices/system/cpu/cpu0/cache/index3/coherency_line_size:64

And getconf is able to get at it:

# getconf -a | grep -i 'cache.*linesize'
LEVEL1_ICACHE_LINESIZE             64
LEVEL1_DCACHE_LINESIZE             64
LEVEL2_CACHE_LINESIZE              0
LEVEL3_CACHE_LINESIZE              0
LEVEL4_CACHE_LINESIZE              0

To be clear, we are unwilling to modify our CentOS 7 systems to use glibc 2.28. That is not a reasonable path to take.
[8 Jun 2023 12:39] MySQL Verification Team
Hi All,

Thank you for all of your comments. Especially for the last comment from Mr. Hodges.

This does not seem to be a bug in MySQL code, but rather a problem in the operating system or in the use of containers.

We must also inform you that we are not allowed to test reported bugs in any container, for many valid reasons which we are not allowed to reveal publicly.

So far, this truly does not seem to be a MySQL bug.
[8 Jun 2023 13:39] Neil Hodges
As I said, the problem occurs when we are NOT using a container.  Please read my last message more carefully.

> I was digging around on the EC2 instance I've been testing with (CentOS 7.9.2009 ARM64, stock glibc 2.17, AND NO DOCKER) and discovered that the cache line size is set in sysfs:
> 
> # grep . /sys/devices/system/cpu/cpu0/cache/index*/coherency_line_size
> /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size:64
> /sys/devices/system/cpu/cpu0/cache/index1/coherency_line_size:64
> /sys/devices/system/cpu/cpu0/cache/index2/coherency_line_size:64
> /sys/devices/system/cpu/cpu0/cache/index3/coherency_line_size:64
> 
> But getconf is unable to get it:
> 
> # getconf -a | grep -i 'cache.*linesize'
> LEVEL1_ICACHE_LINESIZE             0
> LEVEL1_DCACHE_LINESIZE             0
> LEVEL2_CACHE_LINESIZE              0
> LEVEL3_CACHE_LINESIZE              0
> LEVEL4_CACHE_LINESIZE              0

This quoted case is when MySQL crashes at startup.

That's proof enough that containers have absolutely nothing to do with this.  Is that clear?
[8 Jun 2023 13:55] MySQL Verification Team
Hi,

Yes, you are correct.

This is NOT a problem with containers.

This is a problem with the operating system, more precisely with the installed glibc. With Oracle Linux and a proper glibc, you get the proper results with `getconf`.

We hope we were quite clear this time ......
[8 Jun 2023 15:00] Will Saxon
Perhaps you should consider removing CentOS 7 and ARM64 from your list of supported platforms here: https://www.mysql.com/support/supportedplatforms/database.html

Since you clearly do not support MySQL on this platform.
[8 Jun 2023 15:04] MySQL Verification Team
Thanks.

We agree with your conclusion.
[8 Jun 2023 16:29] Terje Røsten
Packages were built and verified on Ampere A1 hardware:

 https://www.oracle.com/cloud/compute/arm/

You should be able to use "Oracle Cloud Free Tier" to get access to such a platform.
[8 Jun 2023 16:38] Terje Røsten
Output from such a platform:

$ getconf -a | grep -i 'cache.*linesize'
LEVEL1_ICACHE_LINESIZE             64
LEVEL1_DCACHE_LINESIZE             64
LEVEL2_CACHE_LINESIZE              0
LEVEL3_CACHE_LINESIZE              0
LEVEL4_CACHE_LINESIZE              0
[8 Jun 2023 16:48] Terje Røsten
While the /sys directory structure is different:

$ tree /sys/devices/system/cpu/cpu0/cache/index0
/sys/devices/system/cpu/cpu0/cache/index0
├── level
├── shared_cpu_list
├── shared_cpu_map
├── type
└── uevent
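
Since the listing above has no coherency_line_size entry, a sysfs-based lookup cannot be assumed to work on every platform. A minimal sketch for checking which cache index directories expose the file, assuming the usual `index*` layout; `list_line_size_files` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report which cache index directories expose coherency_line_size.
# list_line_size_files is an illustrative helper, not part of any package.

list_line_size_files() {
    # $1: a cache directory such as /sys/devices/system/cpu/cpu0/cache
    for dir in "$1"/index*; do
        [ -d "$dir" ] || continue    # glob matched nothing, or not a directory
        if [ -r "$dir/coherency_line_size" ]; then
            echo "$(basename "$dir"): $(cat "$dir/coherency_line_size")"
        else
            echo "$(basename "$dir"): missing"
        fi
    done
}

list_line_size_files /sys/devices/system/cpu/cpu0/cache
```

On a host like the one listed above, every index would be reported as missing even though getconf returns 64 there.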
[8 Jun 2023 16:52] Terje Røsten
The s390 fix is

 https://bugs.mysql.com/bug.php?id=107081

so this bug can be seen as an extension of that fix to the aarch64 platform.
[9 Jun 2023 11:56] MySQL Verification Team
Thank you, Terje .....
[23 Jun 2023 16:17] Philip Olson
Posted by developer:
 
Fixed as of the upcoming MySQL Server 8.0.35 / 8.2.0 releases, and here's the proposed changelog entry from the documentation team:

On EL7 aarch64-based platforms, fixed an issue related to how fetching
the CPU cache line size returned 0 that caused the MySQL server to
unexpectedly halt.

Thank you for the bug report and for staying persistent.

Note: this fix may make it into an earlier release, depending on various circumstances.
[23 Jun 2023 19:02] Will Saxon
Thank you!
[26 Jun 2023 12:23] MySQL Verification Team
Thanks, Philip !!!!