MySQL Bugs: #96532: SQL node crashed while sysbench data loading and 'show create table'

Bug #96532	SQL node crashed while sysbench data loading and 'show create table'
Submitted:	14 Aug 2019 12:32	Modified:	19 Sep 2019 5:22
Reporter:	Hiroyuki Itoh	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	7.6.11	OS:	CentOS (7.6.1810)
Assigned to:		CPU Architecture:	x86
Tags:	ndbcluster, sqlnode

Description:
SQL node crashed while sysbench data loading and 'show create table'.

I got crashed first time when I check data loading progress that I used 'show create table sbtest5.sbtest1\G'.

How to repeat:
1. sysbench data loading
sysbench \
  /usr/share/sysbench/oltp_common.lua \
  --db-driver=mysql \
  --tables=8 \
  --table-size=10000000 \
  --threads=2 \
  --mysql_storage_engine=ndbcluster \
  --mysql-db=sbtest5 \
  --mysql-user=sbtest \
  --mysql-password=password \
  --mysql-host=10.3.65.13 \
  prepare

sysbench version: sysbench-1.0.17-2.el7.x86_64 (use akopytov_sysbench.repo)

2. SQL node: run below command
while [ true ] ; do echo ' show create table sbtest5.sbtest1\G' | mysql; done

-----------------

Log and environments.(Test Servers)

o OS
CentOS Linux release 7.6.1810 (Core) / 3.10.0-957.21.2.el7.x86_64

o CPU(lscpu. Data Node is HT off, SQL Node is HT on)
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    1
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz
Stepping:              2
CPU MHz:               3600.256
CPU max MHz:           3700.0000
CPU min MHz:           1200.0000
BogoMIPS:              6784.63
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-5
NUMA node1 CPU(s):     6-11
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d

o NDB Cluster info
ndb_mgm> show
Connected to Management Server at: 10.3.65.1:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=11   @10.3.65.11  (mysql-5.7.27 ndb-7.6.11, Nodegroup: 0, *)
id=12   @10.3.65.12  (mysql-5.7.27 ndb-7.6.11, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.3.65.1  (mysql-5.7.27 ndb-7.6.11)

[mysqld(API)]   9 node(s)
id=101 (not connected, accepting connect from any host)
id=102 (not connected, accepting connect from any host)
id=103  @10.3.65.13  (mysql-5.7.27 ndb-7.6.11)
id=104  @10.3.65.13  (mysql-5.7.27 ndb-7.6.11)
id=105  @10.3.65.14  (mysql-5.7.27 ndb-7.6.11)
id=106  @10.3.65.14  (mysql-5.7.27 ndb-7.6.11)
id=107 (not connected, accepting connect from any host)
id=108 (not connected, accepting connect from any host)
id=109 (not connected, accepting connect from any host)

ndb_mgm>

o config.ini
[tcp default]
SendBufferMemory=32M
ReceiveBufferMemory=32M

[ndb_mgmd default]
DataDir=/var/lib/mysql-cluster

[ndb_mgmd]
HostName=10.3.65.1
NodeId=1

[ndbd default]
DataDir=/var/lib/mysql-cluster
FileSystemPath=/var/lib/mysql-cluster
BackupDataDir=/var/ndbbackup
NoOfReplicas=2

LockPagesInMainMemory=1

SharedGlobalMemory=4G
DataMemory=72G
StringMemory=128M
LongMessageBuffer=128M

ODirect=1

BackupWriteSize=4M
BackupMaxWriteSize=16M

EnableRedoControl=1

TimeBetweenLocalCheckpoints=20
TimeBetweenGlobalCheckpoints=2000
RedoBuffer=128M
TransactionDeadlockDetectionTimeout=6000
StartPartitionedTimeout=4294967039
TwoPassInitialNodeRestartCopy=1
BuildIndexThreads=12
NoOfFragmentLogFiles=24
NoOfFragmentLogParts=16
FragmentLogFileSize=256M
InitFragmentLogFiles=FULL

CompressedLCP=1
CompressedBackup=1

MaxNoOfExecutionThreads = 12
MaxNoOfAttributes = 20000
MaxNoOfTables = 1024
MaxNoOfConcurrentOperations = 300000
MaxNoOfLocalOperations = 330000
MaxNoOfConcurrentScans = 500
MaxNoOfOrderedIndexes = 4000
MaxNoOfUniqueHashIndexes = 300
MaxNoOfConcurrentTransactions = 50000
MaxParallelScansPerFragment=1024

Numa=1

[ndbd]
HostName = 10.3.65.11
NodeId = 11

[ndbd]
HostName = 10.3.65.12
NodeId = 12

[mysqld]
NodeId=101

[mysqld]

[mysqld]

[mysqld]

[mysqld]

[mysqld]

[mysqld]

[mysqld]

[mysqld]

o my.cnf
[mysqld]
ndbcluster
datadir=/var/lib/mysql-cluster
socket=/var/lib/mysql-cluster/mysql.sock
user=mysql
symbolic-links=0
max_connections=4096
ndb-cluster-connection-pool=2
thread_cache_size=256
query_cache_type=0

skip-external-locking
skip-name-resolve

open_files_limit = 65535

core_file

log_timestamps = SYSTEM
log-error=/var/log/mysqld.log

[mysqld_safe]
pid-file=/var/run/mysqld/mysqld.pid

[client]
socket=/var/lib/mysql-cluster/mysql.sock

[mysql_cluster]
ndb-connectstring=10.3.65.1

o sql node error log
2019-08-14T20:53:22.736969+09:00 592 [Note] NDB Binlog: CREATE TABLE Event: REPL$sbtest5/sbtest4
2019-08-14T20:54:31.335056+09:00 1 [Note] NDB Schema dist: Data node: 12 reports unsubscribe from node 105, subscriber bitmask 000000000
2019-08-14T20:54:31.335082+09:00 1 [Note] NDB Schema dist: Data node: 11 reports unsubscribe from node 105, subscriber bitmask 000000000
11:54:31 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=5
max_threads=4096
thread_count=5
connection_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1635872 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f64004f3500
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f646c585e30 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xf8605b]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x837af1]
/lib64/libpthread.so.0(+0xf5d0)[0x7f64929095d0]
/usr/sbin/mysqld(_ZNK13NdbDictionary5Table19getPartitionBalanceEv+0x1)[0x1032f11]
/usr/sbin/mysqld(_ZN13ha_ndbcluster18append_create_infoEP6String+0xb6)[0xfa3e66]
/usr/sbin/mysqld(_Z17store_create_infoP3THDP10TABLE_LISTP6StringP24st_ha_create_informationb+0xee5)[0xd9aa65]
/usr/sbin/mysqld(_Z18mysqld_show_createP3THDP10TABLE_LIST+0x9d4)[0xd9c2e4]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x2ea1)[0xd4e751]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x3ad)[0xd5215d]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0xa65)[0xd52ca5]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x19f)[0xd5470f]
/usr/sbin/mysqld(handle_connection+0x290)[0xe162b0]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0x14b1d94]
/lib64/libpthread.so.0(+0x7dd5)[0x7f6492901dd5]
/lib64/libc.so.6(clone+0x6d)[0x7f64913baead]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f6400002720): show create table sbtest5.sbtest1
Connection ID (thread ID): 26662
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
2019-08-14T11:54:32.196803Z 0 [Warning] Could not increase number of max_open_files to more than 5000 (request: 65535)
2019-08-14T11:54:32.197001Z 0 [Warning] Changed limits: table_open_cache: 447 (requested 2000)
2019-08-14T20:54:32.359415+09:00 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).

o backtrace
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f64929069d1 in __pthread_kill (threadid=<optimized out>, signo=11) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:59
59        val = INTERNAL_SYSCALL (tgkill, err, 3, THREAD_GETMEM (THREAD_SELF, pid),
Missing separate debuginfos, use: debuginfo-install mysql-cluster-community-server-7.6.11-1.el7.x86_64
(gdb) bt
#0  0x00007f64929069d1 in __pthread_kill (threadid=<optimized out>, signo=11) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:59
#1  0x0000000000837a65 in handle_fatal_signal (sig=11)
    at /export/home2/pb2/build/sb_1-34547247-1560245527.93/rpm/BUILD/mysql-cluster-gpl-7.6.11/mysql-cluster-gpl-7.6.11/sql/signal_handler.cc:220
#2  <signal handler called>
#3  NdbDictionary::Table::getPartitionBalance (this=this@entry=0x0)
    at /export/home2/pb2/build/sb_1-34547247-1560245527.93/rpm/BUILD/mysql-cluster-gpl-7.6.11/mysql-cluster-gpl-7.6.11/storage/ndb/src/ndbapi/NdbDictionary.cpp:780
#4  0x0000000000fa3e66 in ha_ndbcluster::append_create_info (this=0x7f64000086b0, packet=<optimized out>)
    at /export/home2/pb2/build/sb_1-34547247-1560245527.93/rpm/BUILD/mysql-cluster-gpl-7.6.11/mysql-cluster-gpl-7.6.11/sql/ha_ndbcluster.cc:10255
#5  0x0000000000d9aa65 in store_create_info (thd=thd@entry=0x7f64004f3500, table_list=<optimized out>, packet=packet@entry=0x7f646c583a70,
    create_info_arg=create_info_arg@entry=0x0, show_database=show_database@entry=false)
    at /export/home2/pb2/build/sb_1-34547247-1560245527.93/rpm/BUILD/mysql-cluster-gpl-7.6.11/mysql-cluster-gpl-7.6.11/sql/sql_show.cc:1926
#6  0x0000000000d9c2e4 in mysqld_show_create (thd=thd@entry=0x7f64004f3500, table_list=table_list@entry=0x7f64000030d8)
    at /export/home2/pb2/build/sb_1-34547247-1560245527.93/rpm/BUILD/mysql-cluster-gpl-7.6.11/mysql-cluster-gpl-7.6.11/sql/sql_show.cc:967
#7  0x0000000000d4e751 in mysql_execute_command (thd=thd@entry=0x7f64004f3500, first_level=first_level@entry=true)
    at /export/home2/pb2/build/sb_1-34547247-1560245527.93/rpm/BUILD/mysql-cluster-gpl-7.6.11/mysql-cluster-gpl-7.6.11/sql/sql_parse.cc:3576
#8  0x0000000000d5215d in mysql_parse (thd=thd@entry=0x7f64004f3500, parser_state=parser_state@entry=0x7f646c585500)
    at /export/home2/pb2/build/sb_1-34547247-1560245527.93/rpm/BUILD/mysql-cluster-gpl-7.6.11/mysql-cluster-gpl-7.6.11/sql/sql_parse.cc:5570
#9  0x0000000000d52ca5 in dispatch_command (thd=thd@entry=0x7f64004f3500, com_data=com_data@entry=0x7f646c585da0, command=COM_QUERY)
    at /export/home2/pb2/build/sb_1-34547247-1560245527.93/rpm/BUILD/mysql-cluster-gpl-7.6.11/mysql-cluster-gpl-7.6.11/sql/sql_parse.cc:1484
#10 0x0000000000d5470f in do_command (thd=thd@entry=0x7f64004f3500)
    at /export/home2/pb2/build/sb_1-34547247-1560245527.93/rpm/BUILD/mysql-cluster-gpl-7.6.11/mysql-cluster-gpl-7.6.11/sql/sql_parse.cc:1025
#11 0x0000000000e162b0 in handle_connection (arg=arg@entry=0x329f660)
    at /export/home2/pb2/build/sb_1-34547247-1560245527.93/rpm/BUILD/mysql-cluster-gpl-7.6.11/mysql-cluster-gpl-7.6.11/sql/conn_handler/connection_handler_per_thread.cc:306
#12 0x00000000014b1d94 in pfs_spawn_thread (arg=0x35a9a90)
    at /export/home2/pb2/build/sb_1-34547247-1560245527.93/rpm/BUILD/mysql-cluster-gpl-7.6.11/mysql-cluster-gpl-7.6.11/storage/perfschema/pfs.cc:2190
#13 0x00007f6492901dd5 in start_thread (arg=0x7f646c586700) at pthread_create.c:307
#14 0x00007f64913baead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

Suggested fix:
Do not crash sql node while sysbench data loading and 'show create table'.

Hi,

having issues reproducing this. If I understand correctly, your cluster continued to work normally, it's just SQL node that crashed?

Is the sql node crashed the one you are using to run sysbench or the one you are using to show create or you are using a single SQL node for both and that one crashed?

thanks

Hi, 

crashed SQL node(mysqld) only. data node(ndbmtd) and mgm node(ndb_mgmd) don't crash.

I run sysbench 10.3.65.22 server(use sysbench only).

I run sysbench from 10.3.65.22 to 10.3.65.13(SQL node).
I execute 'show create table\G' on 10.3.65.13 server.

Crash SQL node is 10.3.65.13 only, 10.3.65.14(another SQL node) is no problem.

Let me know, if you have another question.

thanks.

Thanks, I will redo the tests.

Thanks for the report, verified as described!

kind regards