Description:
Since the gcov runs in PB2 for MySQL Cluster 7.4 trees began on Feb 24 2015, test ndb.ndb_suma_handover has failed regularly due to a segmentation fault (signal 11); see the log below:
2015-08-18 00:26:28 [ndbd] INFO -- DBTC instance 3: Removed node 2 from takeover queue, 0 failed nodes remaining
completing gcp 10/10 in execTAKE_OVERTCCONF
2015-08-18 00:26:28 [ndbd] INFO -- DBTC instance 2: Removed node 2 from takeover queue, 0 failed nodes remaining
completing gcp 10/10 in execTAKE_OVERTCCONF
2015-08-18 00:26:28 [ndbd] INFO -- NR Status: node=2,OLD=Node failed, fail handling ongoing,NEW=Node failure handling complete
2015-08-18 00:26:28 [ndbd] INFO -- Node 2 has completed node fail handling
2015-08-18 00:26:29 [ndbd] INFO -- Adjusting disk write speed bounds due to : Node restart ongoing
2015-08-18 00:26:40 [ndbd] INFO -- Suma: handover to node 3 gci: 17 buckets: 00000002 (2)
17/0 (16/4294967295) switchover complete bucket 1 state: 100
shutdown handover
2015-08-18 00:26:49 [ndbd] INFO -- Restarting system
2015-08-18 00:26:49 [ndbd] ALERT -- Node 4: Forced node shutdown completed. Initiated by signal 11.
-----------FAILED DATA NODE OUTPUT LOG END----------
When running the test locally, one sometimes gets a crash in the call to ndb_mgm_get_latest_error_line() with a NULL handle:
(gdb) bt
#0 0x0000000001195da2 in ndb_mgm_get_latest_error_line (h=0x0)
at /home/msundell/dev/mysql-7.4/src/storage/ndb/src/mgmapi/mgmapi.cpp:436
#1 0x000000000114ea5b in TransporterRegistry::start_clients_thread (this=0x1ea2040 <globalTransporterRegistry>)
at /home/msundell/dev/mysql-7.4/src/storage/ndb/src/common/transporter/TransporterRegistry.cpp:2169
#2 0x000000000114c867 in run_start_clients_C (me=0x1ea2040 <globalTransporterRegistry>)
at /home/msundell/dev/mysql-7.4/src/storage/ndb/src/common/transporter/TransporterRegistry.cpp:1836
#3 0x00000000011be120 in ndb_thread_wrapper (_ss=0x1f1fdc0)
at /home/msundell/dev/mysql-7.4/src/storage/ndb/src/common/portlib/NdbThread.c:205
#4 0x00007fb33b863204 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fb33ab8671d in clone () from /lib64/libc.so.6
storage/ndb/src/common/transporter/TransporterRegistry.cpp:
2158 else
2159 {
2160 DBUG_PRINT("info", ("mgmd close connection early"));
2161 g_eventLogger->info
2162 ("Management server closed connection early. "
2163 "It is probably being shut down (or has problems). "
2164 "We will retry the connection. %d %s %s line: %d",
2165 ndb_mgm_get_latest_error(m_mgm_handle),
2166 ndb_mgm_get_latest_error_desc(m_mgm_handle),
2167 ndb_mgm_get_latest_error_msg(m_mgm_handle),
2168 ndb_mgm_get_latest_error_line(m_mgm_handle)
2169 );
How to repeat:
Look in PB2 for mysql-5.6-cluster-7.4.
Or run something like ./mtr --mem --gcov --repeat=10 ndb.ndb_suma_handover.
Suggested fix:
Backport the changes to the ndb_mgm_get_latest_error-functions from Bug#11760802 (SEVERAL MGMAPI FUNCTIONS RETURN 0(SUCCESS) WHEN NO HANDLE OR NOT CONNECTED), which allow these functions to be called with a NULL handle without crashing.
Or test for a NULL handle in TransporterRegistry::start_clients_thread() before calling the ndb_mgm_get_latest_error-functions in the printout.
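The two alternatives can be sketched as follows. This is only an illustration, not the actual patch: FakeMgmHandle, fake_get_latest_error_line(), fake_get_latest_error_msg() and describe_early_close() are hypothetical stand-ins, and the values returned for a NULL handle are assumptions rather than what the real mgmapi functions return.

```cpp
#include <string>

// Hypothetical stand-in for the management handle. This is NOT the real
// struct behind NdbMgmHandle; it only carries the fields the printout needs.
struct FakeMgmHandle {
  int last_error;
  int last_error_line;
  const char* last_error_msg;
};

// Alternative 1: the accessors themselves tolerate a NULL handle.
// The fallback values (0 and "no handle") are assumptions for this sketch.
int fake_get_latest_error_line(const FakeMgmHandle* h) {
  if (h == nullptr) return 0;
  return h->last_error_line;
}

const char* fake_get_latest_error_msg(const FakeMgmHandle* h) {
  if (h == nullptr) return "no handle";
  return h->last_error_msg;
}

// Alternative 2: the caller tests the handle before building the message,
// so the accessors are never reached with a NULL pointer.
std::string describe_early_close(const FakeMgmHandle* h) {
  const std::string prefix = "Management server closed connection early.";
  if (h == nullptr) {
    return prefix + " (no management handle)";
  }
  return prefix + " " + std::to_string(h->last_error) + " " +
         h->last_error_msg + " line: " + std::to_string(h->last_error_line);
}
```

With the first alternative every existing caller is protected at once; with the second only the printout in start_clients_thread() is protected, but the MGM API itself stays unchanged.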