Description:
This is a libndbclient issue.
When we pass a key hint to Ndb.startTransaction(const NdbDictionary::Table*, const char*, Uint32) for a table partitioned on a VARBINARY primary key, we see intermittent failures: sometimes a SIGSEGV, and sometimes the following message on stdout:
"TransporterFacade::getIsNodeSendable: Illegal node type: 1 of node: 30376"
These errors produce no failure messages in the cluster logs.
When we replace:
Ndb.startTransaction(table_obj, string_key, key_len)
with
Ndb.startTransaction()
the problem goes away.
The most common error is:
SIGSEGV (0xb) at pc=0x6bada4b0, pid=4408, tid=3086907072
Example stack traces:
Example 1:
Stack: [0xbfe00000,0xc0000000), sp=0xbfff95c4, free space=2021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libndbclient.so.0+0x534b0] _ZN14NdbTransaction4initEv+0x20
C [libndbclient.so.0+0x48f35] _ZN3Ndb21startTransactionLocalEjj+0x65
C [libndbclient.so.0+0x49017] _ZN3Ndb16startTransactionEPKN13NdbDictionary5TableEPKcj+0x77
C [libndbj.so+0xca33] Java_com_nortel_ahp_base_db_mysql_ndbapi_Ndb_startTransactionStr+0x193
j com.nortel.ahp.base.db.mysql.ndbapi.Ndb.startTransactionStr(JLjava/lang/String;Ljava/lang/String;)J+0
Example 2:
C [libndbclient.so.0+0x489b0] _ZN3Ndb26getConnectedNdbTransactionEj+0x10
C [libndbclient.so.0+0x48c4c] _ZN3Ndb9doConnectEj+0xdc
C [libndbclient.so.0+0x48f1c] _ZN3Ndb21startTransactionLocalEjj+0x4c
C [libndbclient.so.0+0x49017] _ZN3Ndb16startTransactionEPKN13NdbDictionary5TableEPKcj+0x77
C [libndbj.so+0xca33] Java_com_nortel_ahp_base_db_mysql_ndbapi_Ndb_startTransactionStr+0x193
j com.nortel.ahp.base.db.mysql.ndbapi.Ndb.startTransactionStr(JLjava/lang/String;Ljava/lang/String;)J+0
We are running a test cluster on localhost with 2 data nodes (NDBD), 2 management nodes (NDB_MGMD), and 16 API node slots:
config.ini:
[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=80M # Reduced to total 100M per replica
IndexMemory=20M
NoOfFragmentLogFiles=25
TimeBetweenLocalCheckpoints=6
MaxNoOfConcurrentOperations=12500
TransactionInactiveTimeout=30000 # 30 seconds of inactivity => rollback
[NDB_MGMD]
Hostname=localhost
nodeid=62
portnumber=23131
DataDir=/var/lib/mysql-cluster/dbmgmd1
[NDB_MGMD]
Hostname=localhost
nodeid=63
portnumber=23132
DataDir=/var/lib/mysql-cluster/dbmgmd2
[NDBD]
HostName=localhost
datadir=/var/lib/mysql-cluster/dbdata1
nodeid=1
[NDBD]
HostName=localhost
datadir=/var/lib/mysql-cluster/dbdata2
nodeid=2
# Auto-enumerated API node slots,
# counting down from 61
#
[MYSQLD]
nodeid=61
[MYSQLD]
nodeid=60
[MYSQLD]
nodeid=59
[MYSQLD]
nodeid=58
[MYSQLD]
nodeid=57
[MYSQLD]
nodeid=56
[MYSQLD]
nodeid=55
[MYSQLD]
nodeid=54
[MYSQLD]
nodeid=53
[MYSQLD]
nodeid=52
[MYSQLD]
nodeid=51
[MYSQLD]
nodeid=50
[MYSQLD]
nodeid=49
[MYSQLD]
nodeid=48
[MYSQLD]
nodeid=47
[MYSQLD]
nodeid=46
How to repeat:
I tried to reproduce this with a sample program that passes a string key to:
Ndb.startTransaction(table_obj, string_key, key_len)
but it worked correctly, so I do not have a standalone reproduction to share.