Bug #72929 Ndb kernel thread is stuck in: Job Handling for about 18s caused data node crash
Submitted: 10 Jun 2014 6:58 Modified: 8 Dec 2014 7:58
Reporter: paul He Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S1 (Critical)
Version:5.6.17-ndb-7.3.5-cluster-gpl-log, 7.3.7 OS:Linux (Red Hat Enterprise Linux Server release 6.2 (Santiago))
Assigned to: CPU Architecture:Any

[10 Jun 2014 6:58] paul He
Description:
massive overload on the data node cause the data node crash。

There are following error messages in the data node error log file.

Time: Monday 9 June 2014 - 14:35:55
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 6 received; Aborted
Error object: /pb2/build/sb_0-11918329-1396368427.42/rpm/BUILD/mysql-cluster-gpl-7.3.5/mysql-cluster-gpl-7.3.5/storage/ndb/src/kernel/ndbd.cpp
Program: ndbmtd
Pid: 36594 thr: 10
Version: mysql-5.6.17 ndb-7.3.5
Trace: /mysqlhome/log/ndbd/ndb_11_trace.log.24

There are following error messages in the data node file.

2014-06-09 14:35:58 [ndbd] WARNING  -- Ndb kernel thread 5 is stuck in: Job Handling elapsed=17124
2014-06-09 14:35:58 [ndbd] INFO     -- Watchdog: User time: 69359  System time: 7784
2014-06-09 14:35:58 [ndbd] INFO     -- Signal 6 received; Aborted
2014-06-09 14:35:58 [ndbd] INFO     -- /pb2/build/sb_0-11918329-1396368427.42/rpm/BUILD/mysql-cluster-gpl-7.3.5/mysql-cluster-gpl-7.3.5/storage/ndb/src/kernel/ndbd.cpp
2014-06-09 14:35:58 [ndbd] INFO     -- Watchdog is killing system the hard way
2014-06-09 14:35:58 [ndbd] ALERT    -- Node 11: Forced node shutdown completed. Initiated by signal 6. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

How to repeat:
Execute one sql statement cause the data node crash.The SQL statement is as following:

use test
SELECT f_bms_odometers_no('bms_so_key','SODETAIL')
FROM tbl_a sm INNER JOIN tbl_b oh
ON sm.wh_code = oh.wh_code AND sm.sono=oh.wms_order_no;

CREATE FUNCTION f_bms_odometers_no(odometer_code VARCHAR (100) , prefix VARCHAR (20)) RETURNS varchar(255) CHARSET utf8 COLLATE utf8_bin
BEGIN
DECLARE str VARCHAR(255) DEFAULT '';
DECLARE no_count_id INT DEFAULT -1; 
SELECT no_count INTO no_count_id  FROM plat_odometer WHERE no_name = odometer_code;
IF (no_count_id >= 0) THEN
SET no_count_id = no_count_id + 1;
UPDATE plat_odometer SET no_count = no_count_id WHERE no_name = odometer_code;
SET str= CONCAT(prefix ,DATE_FORMAT(NOW(),'%Y%m%d'), RIGHT(CONCAT('0000000000',no_count_id),10));
END IF;
RETURN str;
END;

Note:
Execute the following sql statement returen only one row:
SELECT * FROM plat_odometer WHERE no_name ='bms_so_key';
[10 Jun 2014 14:53] MySQL Verification Team
Thank you for the report.
I cannot repeat with dummy schema/data, could you please provide complete repeatable test case(schema, data, and any other missing details(other than provided in bug report) which help us to reproduce the issue)?

Also, could you please attach the cluster logs? Preferably using the ndb_error_reporter utility:

  http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-programs-ndb-error-reporter.html

Thanks,
Umesh
[11 Jun 2014 13:53] paul He
test case which trigger the bug

Attachment: test case.TXT (text/plain), 10.79 KiB.

[19 Jun 2014 8:07] paul He
Hi,Umesh.
I have not any feedback from you for about a week.Is there any information that you would like me to provide?
[19 Jun 2014 8:35] MySQL Verification Team
Hi Paul,

I tried it couple of times(with single connection or even with mysqlslap with >15 concurrency) but it is not crashing, select takes >1hrs and query is killed but not seen node crashes.  Shall keep you updated on this.

Thanks,
Umesh
[26 Aug 2014 5:26] paul He
I am sorry that there is some mistake in the test case.I have correct the error and I will upload the test case again (named test_case_v2.sql).The ndbd will crash in about 3 minutes after execute the test_case_v2.sql.
[26 Aug 2014 5:27] paul He
test_case_v2.sql

Attachment: test_case_v2.sql (application/octet-stream, text), 10.76 KiB.

[26 Aug 2014 5:35] MySQL Verification Team
Thank you for the modified test case.
Shall run few tests and confirm if it crashes.

Thanks,
Umesh
[8 Dec 2014 7:58] MySQL Verification Team
Thank you Paul, and my apologies for taking long time on this.
With exact conf file, and updated test case I'm seeing different issue than the originally reported one which is fixed as per #18229003 and in NDB 7.1.31, 7.2.16, and 7.3.5. For change logs, please refer http://dev.mysql.com/doc/relnotes/mysql-cluster/7.3/en/mysql-cluster-news-5-6-17-ndb-7-3-5...

mysql> source test_case_v2.sql;
Database changed
Query OK, 0 rows affected (0.14 sec)

Query OK, 0 rows affected (0.04 sec)

Query OK, 0 rows affected (0.14 sec)

Query OK, 0 rows affected (0.06 sec)

Query OK, 0 rows affected (0.12 sec)

Query OK, 0 rows affected (0.03 sec)

Query OK, 1 row affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (17.59 sec)

Query OK, 1 row affected (0.51 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

ERROR 1296 (HY000): Got error 901 'Inconsistent ordered index. The index needs to be dropped and recreated' from NDBCLUSTER
mysql>
mysql> show errors;
+-------+------+---------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                 |
+-------+------+---------------------------------------------------------------------------------------------------------+
| Error | 1296 | Got error 901 'Inconsistent ordered index. The index needs to be dropped and recreated' from NDBCLUSTER |
| Error | 1296 | Got error 122 'Unknown error code' from NDBCLUSTER                                                      |
| Error | 1181 | Got error 122 during ROLLBACK                                                                           |
+-------+------+---------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

mysql> show warnings;
+---------+------+---------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                 |
+---------+------+---------------------------------------------------------------------------------------------------------+
| Warning | 1296 | Got error 901 'Inconsistent ordered index. The index needs to be dropped and recreated' from NDB        |
| Error   | 1296 | Got error 901 'Inconsistent ordered index. The index needs to be dropped and recreated' from NDBCLUSTER |
| Warning | 1296 | Got error 0 'No error' from NDB                                                                         |
| Error   | 1296 | Got error 122 'Unknown error code' from NDBCLUSTER                                                      |
| Error   | 1181 | Got error 122 during ROLLBACK                                                                           |
+---------+------+---------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

// Node error log

Time: Monday 8 December 2014 - 08:28:54
Status: Temporary error, restart node
Message: WatchDog terminate, internal error or massive overload on the machine running this node (Internal error, programming error or missing error message, please report a bug)
Error: 6050
Error data: Job Handling
Error object: /export/home/pb2/build/sb_0-11918329-1396368454.51/mysql-cluster-com-7.3.5/storage/ndb/src/kernel/vm/WatchDog.cpp
Program: ndbd
Pid: 26811
Version: mysql-5.6.17 ndb-7.3.5
Trace: /export/umes

// back trace

(gdb) bt
#0  FastScheduler::traceDumpGetJam (this=<optimized out>, thr_no=<optimized out>, thrdTheEmulatedJam=@0x7f6366ab7608: 0x4, thrdTheEmulatedJamIndex=@0x7f6366ab7618: 1721)
    at /export/home/pb2/build/sb_0-11918329-1396368454.51/mysql-cluster-com-7.3.5/storage/ndb/src/kernel/vm/FastScheduler.cpp:504
#1  0x00000000006c9b39 in WriteMessage (thrdMessageID=6050, thrdProblemData=0x7824a5 "Job Handling",
    thrdObjRef=0x782228 "/export/home/pb2/build/sb_0-11918329-1396368454.51/mysql-cluster-com-7.3.5/storage/ndb/src/kernel/vm/WatchDog.cpp", nst=@0x7f6366ab766c: NST_Watchdog)
    at /export/home/pb2/build/sb_0-11918329-1396368454.51/mysql-cluster-com-7.3.5/storage/ndb/src/kernel/error/ErrorReporter.cpp:376
#2  0x00000000006ca005 in ErrorReporter::handleError (messageID=6050, problemData=0x7824a5 "Job Handling",
    objRef=0x782228 "/export/home/pb2/build/sb_0-11918329-1396368454.51/mysql-cluster-com-7.3.5/storage/ndb/src/kernel/vm/WatchDog.cpp", nst=NST_Watchdog)
    at /export/home/pb2/build/sb_0-11918329-1396368454.51/mysql-cluster-com-7.3.5/storage/ndb/src/kernel/error/ErrorReporter.cpp:246
#3  0x00000000006ba0aa in WatchDog::shutdownSystem (this=<optimized out>, last_stuck_action=0x0)
    at /export/home/pb2/build/sb_0-11918329-1396368454.51/mysql-cluster-com-7.3.5/storage/ndb/src/kernel/vm/WatchDog.cpp:399
#4  0x00000000006ba8c6 in WatchDog::run (this=0x25152a0) at /export/home/pb2/build/sb_0-11918329-1396368454.51/mysql-cluster-com-7.3.5/storage/ndb/src/kernel/vm/WatchDog.cpp:385
#5  0x00000000006babb9 in runWatchDog (w=0x1) at /export/home/pb2/build/sb_0-11918329-1396368454.51/mysql-cluster-com-7.3.5/storage/ndb/src/kernel/vm/WatchDog.cpp:37
#6  0x00000000006f4ae7 in ndb_thread_wrapper (_ss=0x24c64e0) at /export/home/pb2/build/sb_0-11918329-1396368454.51/mysql-cluster-com-7.3.5/storage/ndb/src/common/portlib/NdbThread.c:202
#7  0x00007f63667d6df3 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f63658db47d in clone () from /lib64/libc.so.6
[8 Dec 2014 8:12] MySQL Verification Team
// 7.3.7 also affected

mysql> show variables like '%version%';
+-------------------------+------------------------------------------------------+
| Variable_name           | Value                                                |
+-------------------------+------------------------------------------------------+
| innodb_version          | 5.6.21                                               |
| ndb_version             | 459527                                               |
| ndb_version_string      | ndb-7.3.7                                            |
| ndbinfo_version         | 459527                                               |
| protocol_version        | 10                                                   |
| slave_type_conversions  |                                                      |
| version                 | 5.6.21-ndb-7.3.7-cluster-commercial-advanced-log     |
| version_comment         | MySQL Cluster Server - Advanced Edition (Commercial) |
| version_compile_machine | x86_64                                               |
| version_compile_os      | linux-glibc2.5                                       |
+-------------------------+------------------------------------------------------+
10 rows in set (0.00 sec)

mysql> source test_case_v2.sql;
Database changed
Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.04 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.06 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.04 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (15.76 sec)

Query OK, 1 row affected (0.44 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

ERROR 1296 (HY000): Got error 901 'Inconsistent ordered index. The index needs to be dropped and recreated' from NDBCLUSTER

//  ndbd error log
Time: Monday 8 December 2014 - 09:07:59
Status: Temporary error, restart node
Message: WatchDog terminate, internal error or massive overload on the machine running this node (Internal error, programming error or missing error message, please report a bug)
Error: 6050
Error data: Job Handling
Error object: /export/home/pb2/build/sb_0-13403650-1412862503.48/mysql-cluster-com-7.3.7/storage/ndb/src/kernel/vm/WatchDog.cpp
Program: ndbd
Pid: 1446
Version: mysql-5.6.21 ndb-7.3.7
Trace: /export/umesh

// 

#0  FastScheduler::traceDumpGetJam (this=<optimized out>, thr_no=<optimized out>, thrdTheEmulatedJam=@0x7fc5b5839608: 0x4, thrdTheEmulatedJamIndex=@0x7fc5b5839618: 1604)
    at /export/home/pb2/build/sb_0-13403650-1412862503.48/mysql-cluster-com-7.3.7/storage/ndb/src/kernel/vm/FastScheduler.cpp:508
508     /export/home/pb2/build/sb_0-13403650-1412862503.48/mysql-cluster-com-7.3.7/storage/ndb/src/kernel/vm/FastScheduler.cpp: No such file or directory.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-55.0.4.el7_0.1.x86_64 libgcc-4.8.2-16.2.el7_0.x86_64 libstdc++-4.8.2-16.2.el7_0.x86_64
(gdb) bt
#0  FastScheduler::traceDumpGetJam (this=<optimized out>, thr_no=<optimized out>, thrdTheEmulatedJam=@0x7fc5b5839608: 0x4, thrdTheEmulatedJamIndex=@0x7fc5b5839618: 1604)
    at /export/home/pb2/build/sb_0-13403650-1412862503.48/mysql-cluster-com-7.3.7/storage/ndb/src/kernel/vm/FastScheduler.cpp:508
#1  0x00000000006d08b9 in WriteMessage (thrdMessageID=6050, thrdProblemData=0x81e3d5 "Job Handling",
    thrdObjRef=0x81e490 "/export/home/pb2/build/sb_0-13403650-1412862503.48/mysql-cluster-com-7.3.7/storage/ndb/src/kernel/vm/WatchDog.cpp", nst=@0x7fc5b583966c: NST_Watchdog)
    at /export/home/pb2/build/sb_0-13403650-1412862503.48/mysql-cluster-com-7.3.7/storage/ndb/src/kernel/error/ErrorReporter.cpp:376
#2  0x00000000006d0dc5 in ErrorReporter::handleError (messageID=6050, problemData=0x81e3d5 "Job Handling",
    objRef=0x81e490 "/export/home/pb2/build/sb_0-13403650-1412862503.48/mysql-cluster-com-7.3.7/storage/ndb/src/kernel/vm/WatchDog.cpp", nst=NST_Watchdog)
    at /export/home/pb2/build/sb_0-13403650-1412862503.48/mysql-cluster-com-7.3.7/storage/ndb/src/kernel/error/ErrorReporter.cpp:246
#3  0x00000000006c0dea in WatchDog::shutdownSystem (this=<optimized out>, last_stuck_action=0x0)
    at /export/home/pb2/build/sb_0-13403650-1412862503.48/mysql-cluster-com-7.3.7/storage/ndb/src/kernel/vm/WatchDog.cpp:408
#4  0x00000000006c14d4 in WatchDog::run (this=0x29497d0) at /export/home/pb2/build/sb_0-13403650-1412862503.48/mysql-cluster-com-7.3.7/storage/ndb/src/kernel/vm/WatchDog.cpp:394
#5  0x00000000006c1919 in runWatchDog (w=0x1) at /export/home/pb2/build/sb_0-13403650-1412862503.48/mysql-cluster-com-7.3.7/storage/ndb/src/kernel/vm/WatchDog.cpp:37
#6  0x00000000006fc067 in ndb_thread_wrapper (_ss=0x28fc150) at /export/home/pb2/build/sb_0-13403650-1412862503.48/mysql-cluster-com-7.3.7/storage/ndb/src/common/portlib/NdbThread.c:202
#7  0x00007fc5b5558df3 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fc5b465d47d in clone () from /lib64/libc.so.6
[8 Dec 2014 8:22] MySQL Verification Team
// Build used

revision-id: mauritz.sundell@oracle.com-20141009130427-cwuq3yrz4eu63xn6
date: 2014-10-09 15:04:27 +0200
build-date: 2014-10-09 15:29:27 +0200
revno: 4444
branch-nick: mysql-cluster-7.3.7-release

MySQL source 5.6.21