Bug #78161 nextEvent() says:'!(gci_ops && (gci == gci_ops->m_gci)' on a TE_NUL event
Submitted: 21 Aug 2015 9:47 Modified: 24 Aug 2015 16:39
Reporter: Ole John Aske Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: NDB API Severity:S3 (Non-critical)
Version:7.4.7 OS:Any
Assigned to: CPU Architecture:Any

[21 Aug 2015 9:47] Ole John Aske
Description:
Got the crash from our regression test ndb_reconnect

2015-08-18 16:13:51 [NdbApi] INFO     -- Flushing incomplete GCI:s < 336/7
2015-08-18 16:13:51 [NdbApi] INFO     -- Flushing incomplete GCI:s < 336/7
2015-08-18 16:13:51 25893 [Note] NDB Binlog: starting log at epoch 336/7
2015-08-18 16:13:51 25893 [Note] NDB Binlog: Got first event
2015-08-18 16:13:51 25893 [Note] NDB Binlog: ndb tables writable
2015-08-18 16:13:51 25893 [Note] NDB Binlog: Startup and setup completed
2015-08-18 16:13:51 25893 [Note] NDB Schema dist: Data node: 1 reports subscribe from node 192, subscriber bitmask 00
nextEvent2: gci_ops->m_gci 1443109011464 (336/8)  gci 1443109011463 (336/7) type 0x00002000 data's operation 0x0000000d Ndb Binlog schema change monitoring
mysqld: /local/home/club/club/git_mysql-5.6-cluster-7.4/262/mysql-cluster-gpl-7.4.8/storage/ndb/src/ndbapi/NdbEventOperationImpl.cpp:1772: NdbEventOperation* NdbEventBuffer::nextEvent2(): Assertion `gci_ops && (gci == gci_ops->m_gci)' failed.

..........

stack_bottom = 7ff0ef9fdd00 thread_stack 0x40000
2015-08-18 16:13:51 25893 [Note] NDB: Global schema lock release
/local/home/club/club/git_mysql-5.6-cluster-7.4/262/mysql-cluster-gpl-7.4.8/sql/mysqld(my_print_stacktrace+0x35)[0xb6fdf9]
/local/home/club/club/git_mysql-5.6-cluster-7.4/262/mysql-cluster-gpl-7.4.8/sql/mysqld(handle_fatal_signal+0x404)[0x7e6060]
/lib64/libpthread.so.0(+0xf500)[0x7ff11b489500]
/lib64/libc.so.6(gsignal+0x35)[0x7ff11a3368a5]
/lib64/libc.so.6(abort+0x175)[0x7ff11a338085]
/lib64/libc.so.6(+0x2ba1e)[0x7ff11a32fa1e]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x7ff11a32fae0]
/local/home/club/club/git_mysql-5.6-cluster-7.4/262/mysql-cluster-gpl-7.4.8/sql/mysqld(_ZN14NdbEventBuffer10nextEvent2Ev+0x3ac)[0xcaea94]
/local/home/club/club/git_mysql-5.6-cluster-7.4/262/mysql-cluster-gpl-7.4.8/sql/mysqld(_ZN3Ndb10nextEvent2Ev+0x1c)[0xc4da2c]
/local/home/club/club/git_mysql-5.6-cluster-7.4/262/mysql-cluster-gpl-7.4.8/sql/mysqld(_ZN3Ndb9nextEventEv+0x18)[0xc4da46]
/local/home/club/club/git_mysql-5.6-cluster-7.4/262/mysql-cluster-gpl-7.4.8/sql/mysqld(_ZN17Ndb_binlog_thread6do_runEv+0x17f2)[0xbe929a]
/local/home/club/club/git_mysql-5.6-cluster-7.4/262/mysql-cluster-gpl-7.4.8/sql/mysqld(_ZN13Ndb_component8run_implEv+0x6a)[0xc0aca2]
/local/home/club/club/git_mysql-5.6-cluster-7.4/262/mysql-cluster-gpl-7.4.8/sql/mysqld(Ndb_component_run_C+0x25)[0xc0ab36]
/lib64/libpthread.so.0(+0x7851)[0x7ff11b481851]
/lib64/libc.so.6(clone+0x6d)[0x7ff11a3ec90d]

From the assert we find:

- event type is the internal place holder event TE_NUL (0x00...0d)
- The 'data' for this event sees the GCI: (336/7)
- First item in the gci_ops sees the GCI (336/8)

Thus the GCI from data and gci_ops are off by one. So the previous
call to remove_consumed_gci_ops skipped a gci_ops to far -
or there never existed a gci_ops entry for (336/7) in the first place.

I am suspecting that a gci_ops entry is never inserted for a TE_NUL
as it is a pure temporary event later to be replaced when the real data
arrived, or ignored by nextEvent().

However, nextEvent() assert in the gci before possible ignoring it,
which likely is the root cause.

How to repeat:
ndb_reconnect, and likely ndb_share

Suggested fix:
move assert to after ignoring TE_NUL.
[24 Aug 2015 16:39] Jon Stephens
Fixed in NDB 7.4.8. No changes visible to end users.

Closed.
[24 Aug 2015 16:40] Jon Stephens
Also fixed in NDB 7.3.11.