Bug #107658 Shared_lock triggers Invalid write on Centos+ARM
Submitted: 26 Jun 2022 3:19
Modified: 5 Jul 2022 12:37
Reporter: Brian Yue (OCA)
Status: Unsupported
Category: MySQL Server
Severity: S3 (Non-critical)
Version: 8.0.25
OS: CentOS (Centos7.6)
Assigned to:
CPU Architecture: ARM (ARM: aarch64 (Kunpeng maybe))

[26 Jun 2022 3:19] Brian Yue
Description:
Hello,
  We have recently been testing MySQL 8.0.25 on CentOS 7.6 with an ARM CPU (aarch64). With the help of valgrind memcheck, we found several invalid writes triggered by Shared_spin_lock:

==400279== Invalid write of size 8
==400279==    at 0x4CB003C: memory::Aligned_atomic<long>::Aligned_atomic() (aligned_atomic.h:282)
==400279==    by 0x4CAF717: memory::Aligned_atomic<long>::Aligned_atomic(long) (aligned_atomic.h:286)
==400279==    by 0x4CAF6A7: lock::Shared_spin_lock::Shared_spin_lock() (shared_spin_lock.h:170)
==400279==    by 0x4CAA377: Delegate::Delegate(unsigned int) (rpl_handler.cc:100)
==400279==    by 0x4CAF3EB: Trans_delegate::Trans_delegate() (rpl_handler.h:285)
==400279==    by 0x4CAB537: delegates_init() (rpl_handler.cc:378)
==400279==    by 0x381BA03: init_server_components() (mysqld.cc:6102)
==400279==    by 0x3821C37: mysqld_main(int, char**) (mysqld.cc:7601)
==400279==    by 0x380E66B: main (main.cc:25)
==400279==  Address 0xd2c4d80 is 0 bytes after a block of size 0 alloc'd
==400279==    at 0xC454ADC: operator new[](unsigned long) (vg_replace_malloc.c:423)
==400279==    by 0x4CB0013: memory::Aligned_atomic<long>::Aligned_atomic() (aligned_atomic.h:282)
==400279==    by 0x4CAF717: memory::Aligned_atomic<long>::Aligned_atomic(long) (aligned_atomic.h:286)
==400279==    by 0x4CAF6A7: lock::Shared_spin_lock::Shared_spin_lock() (shared_spin_lock.h:170)
==400279==    by 0x4CAA377: Delegate::Delegate(unsigned int) (rpl_handler.cc:100)
==400279==    by 0x4CAF3EB: Trans_delegate::Trans_delegate() (rpl_handler.h:285)
==400279==    by 0x4CAB537: delegates_init() (rpl_handler.cc:378)
==400279==    by 0x381BA03: init_server_components() (mysqld.cc:6102)
==400279==    by 0x3821C37: mysqld_main(int, char**) (mysqld.cc:7601)
==400279==    by 0x380E66B: main (main.cc:25)

Each time we start the MySQL server under valgrind, we see several of these invalid writes.
  
Reported by GoldenDB team.

How to repeat:
This problem does not occur on Red Hat 7.4 + x86, so it appears to be platform-specific.
To reproduce it, use the same platform: CentOS 7.6 + ARM.
[28 Jun 2022 12:24] MySQL Verification Team
Hi Mr. Yue,

Thank you very much for your bug report.

However, your tests were done on an old MySQL release. Since then, we have fixed many memory problems, so you may be reporting an issue that is already resolved.

If you manage to repeat the same issues with 8.0.29, or even better 8.0.30 when it comes out, please write to us again.

However, do send us a full description of how you ran the tests, which options you used, and everything else down to the last detail, so that we do not have any problems repeating your results.

Thanks in advance.
[5 Jul 2022 11:20] Brian Yue
Hello,
  I ran the same test on MySQL 8.0.29 today and found that the problem still exists.

  My test procedure was as follows:

[yxx@dbs34 yxx]$ cat /etc/redhat-release
CentOS Linux release 7.6.1810 (AltArch)
[yxx@dbs34 yxx]$
[yxx@dbs34 yxx]$ lscpu
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    1
Core(s) per socket:    64
Socket(s):             2
NUMA node(s):          4
Model:                 0
BogoMIPS:              200.00
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              65536K
NUMA node0 CPU(s):     0-31
NUMA node1 CPU(s):     32-63
NUMA node2 CPU(s):     64-95
NUMA node3 CPU(s):     96-127
Flags:                 fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop
[yxx@dbs34 yxx]$
[yxx@dbs34 yxx]$ ./bin/mysqld --version
/goldendb/yxx/bin/mysqld  Ver 8.0.29 for Linux on aarch64 (Source distribution)
[yxx@dbs34 yxx]$
[yxx@dbs34 yxx]$ cat ~/etc/my.cnf
# Don't change configuration items to other sections
[general]
instance_num = 1

# generic configuration options
[mysqld]
log-error = /goldendb/yxx/log/mysqld1.log
skip-external-locking
transaction-isolation = READ-COMMITTED
lower_case_table_names=1
performance_schema = ON
innodb_data_file_path = ibdata1:500M:autoextend
innodb-file-per-table
innodb_buffer_pool_size = 600M
innodb_buffer_pool_instances = 1
innodb_buffer_pool_dump_at_shutdown = 0
innodb_buffer_pool_dump_now = 0
innodb_buffer_pool_load_at_startup = 0
innodb_buffer_pool_load_now = 0
innodb_log_files_in_group = 2
innodb_log_file_size = 8G
default_authentication_plugin = mysql_native_password
port=6302
socket=/goldendb/yxx/bin/mysql1.sock
bind_address=127.0.0.1
datadir=/goldendb/yxx/data/data
pid-file=/goldendb/yxx/bin/mysqld1.pid
innodb_data_home_dir=/goldendb/yxx/data/data
innodb_log_group_home_dir=/goldendb/yxx/data/redo
innodb_undo_directory=/goldendb/yxx/data/undo
server-id=16782862
basedir=/goldendb/yxx
log-bin=../binlog/mysql-bin
relay-log=../relaylog/relay-bin
tmpdir=/goldendb/yxx/data/tmp
secure-file-priv =
gtid_mode = on
enforce_gtid_consistency = on
lock_wait_timeout = 5

[client]
port=6302
socket=/goldendb/yxx/bin/mysql1.sock
[yxx@dbs34 yxx]$
[yxx@dbs34 yxx]$ valgrind --tool=memcheck  --leak-check=full --track-origins=yes   --log-file=valgrind.log mysqld --defaults-file='~/etc/my.cnf' &
[1] 107017
[yxx@dbs34 yxx]$
[yxx@dbs34 yxx]$ sleep 10
[yxx@dbs34 yxx]$
[yxx@dbs34 yxx]$ grep -A21 "Invalid" valgrind.log | head -20
==107017== Invalid write of size 8
==107017==    at 0x199DD18: Aligned_atomic (aligned_atomic.h:283)
==107017==    by 0x199DD18: Aligned_atomic (aligned_atomic.h:288)
==107017==    by 0x199DD18: Shared_spin_lock (shared_spin_lock.h:170)
==107017==    by 0x199DD18: Delegate::Delegate(unsigned int) (rpl_handler.cc:89)
==107017==    by 0x199E46F: Trans_delegate (rpl_handler.h:285)
==107017==    by 0x199E46F: delegates_init() (rpl_handler.cc:363)
==107017==    by 0xC2738F: init_server_components() (mysqld.cc:6008)
==107017==    by 0xC2D8DB: mysqld_main(int, char**) (mysqld.cc:7640)
==107017==    by 0x4FF15D3: (below main) (in /usr/lib64/libc-2.17.so)
==107017==  Address 0x58f0040 is 0 bytes after a block of size 0 alloc'd
==107017==    at 0x4874ADC: operator new[](unsigned long) (vg_replace_malloc.c:423)
==107017==    by 0x199DD17: Aligned_atomic (aligned_atomic.h:283)
==107017==    by 0x199DD17: Aligned_atomic (aligned_atomic.h:288)
==107017==    by 0x199DD17: Shared_spin_lock (shared_spin_lock.h:170)
==107017==    by 0x199DD17: Delegate::Delegate(unsigned int) (rpl_handler.cc:89)
==107017==    by 0x199E46F: Trans_delegate (rpl_handler.h:285)
==107017==    by 0x199E46F: delegates_init() (rpl_handler.cc:363)
==107017==    by 0xC2738F: init_server_components() (mysqld.cc:6008)
==107017==    by 0xC2D8DB: mysqld_main(int, char**) (mysqld.cc:7640)
[yxx@dbs34 yxx]$
[5 Jul 2022 12:12] MySQL Verification Team
Hi,

Your report is quite valid.

However, the problem does not occur in our code. It is caused by a bug in the STL of the C++ compiler used on CentOS. We do not maintain any of the compilers.

Hence, you should report it to them.
[5 Jul 2022 12:18] Brian Yue
Hello,
  I disagree with you. I have located the cause of this problem and have also fixed it.
  The cause is that the cache line information reported by the OS is invalid, and the mysql-server code does not account for that.
  The cache line information on the affected machine is as follows:

[root@dbs34 ~]# getconf -a | grep CACHE
LEVEL1_ICACHE_SIZE                 0
LEVEL1_ICACHE_ASSOC                0
LEVEL1_ICACHE_LINESIZE             0
LEVEL1_DCACHE_SIZE                 0
LEVEL1_DCACHE_ASSOC                0
LEVEL1_DCACHE_LINESIZE             0
LEVEL2_CACHE_SIZE                  0
LEVEL2_CACHE_ASSOC                 0
LEVEL2_CACHE_LINESIZE              0
LEVEL3_CACHE_SIZE                  0
LEVEL3_CACHE_ASSOC                 0
LEVEL3_CACHE_LINESIZE              0
LEVEL4_CACHE_SIZE                  0
LEVEL4_CACHE_ASSOC                 0
LEVEL4_CACHE_LINESIZE              0
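
For illustration, the kind of guard described above could look like the following C++ sketch. It is not the actual mysql-server code, nor the fix mentioned in this comment: the helper names and the 64-byte fallback are assumptions made for the example. Only `sysconf` and `_SC_LEVEL1_DCACHE_LINESIZE` (a glibc extension) are existing interfaces.

```cpp
#include <unistd.h>  // sysconf; _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension

#include <cassert>
#include <cstddef>

// Hypothetical fallback used when the OS reports no usable value.
// 64 bytes is an assumed default, not a value taken from this report.
constexpr std::size_t kAssumedCacheLineSize = 64;

// Turns a raw sysconf() result into a usable line size.
// sysconf() returns -1 on error and, as in the output above, may return 0.
inline std::size_t sanitize_cache_line_size(long reported) {
  return reported > 0 ? static_cast<std::size_t>(reported)
                      : kAssumedCacheLineSize;
}

// Queries the L1 data cache line size, never returning 0.
inline std::size_t cache_line_size_or_default() {
#if defined(_SC_LEVEL1_DCACHE_LINESIZE)
  return sanitize_cache_line_size(sysconf(_SC_LEVEL1_DCACHE_LINESIZE));
#else
  return kAssumedCacheLineSize;  // platform does not expose the query
#endif
}

// Rounds `bytes` up to a whole number of cache lines using integer math,
// so a positive request can never produce a zero-sized buffer.
inline std::size_t round_up_to_cache_lines(std::size_t bytes,
                                           std::size_t line) {
  assert(line > 0);
  return ((bytes + line - 1) / line) * line;
}
```

On a machine that reports `LEVEL1_DCACHE_LINESIZE 0`, `cache_line_size_or_default()` would return the fallback instead of 0, so a buffer sized with `round_up_to_cache_lines(sizeof(long), cache_line_size_or_default())` is at least one cache line long rather than zero bytes.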
[5 Jul 2022 12:23] MySQL Verification Team
Hi Mr. Yue,

It is great that you have discovered it.

Let us try to understand what you are proposing. You are claiming that this is a bug in the OS and that MySQL code should consider checking for all of the possible OS bugs in its own code.

Have we understood you correctly?
[5 Jul 2022 12:31] MySQL Verification Team
Hi,

One more question for you.

After checking the download pages, we have not been able to find any of our official packages for CentOS on the ARM CPU, only Red Hat ones.

Hence, can you tell us where you downloaded our package for that OS/CPU?
[5 Jul 2022 12:37] Brian Yue
Hi,
  I'm not sure whether my failure to get the cache line information on CentOS 7.6 is a bug, so I'm reporting this problem for you to consider. By the way, CentOS is also a popular OS, and many enterprises use it.
  As for the package, I didn't download one from the MySQL web pages. I downloaded the source code and compiled the MySQL server myself.

  If CentOS is not supported, or you think there is no need to consider this problem, please disregard this report. I will not follow up further.
[5 Jul 2022 12:40] MySQL Verification Team
Hi,

We shall get back to you on your last questions.

We are in consultations with our teams on whether self-compiled CentOS binaries are supported by us or not.

In short, we shall come back with an official answer.