Bug #45053 Using keys with big charsets could cause a node crash
Submitted: 24 May 2009 11:33
Modified: 6 Jul 2009 13:45
Reporter: Hindisvik Reykjavik
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: mysql-5.1-telco-7.0
OS: Linux
Assigned to: Pekka Nousiainen
CPU Architecture: Any
Tags: 2341, 4028, 7.0.5, cluster, Failure, ndb

[24 May 2009 11:33] Hindisvik Reykjavik
Description:
When executing a query on a table with a varchar column (size > 341), the whole cluster crashes: all nodes disconnect.

How to repeat:
CREATE TABLE `xxxx` (
`xx` varchar(450) collate utf8_unicode_ci NOT NULL,
`date` date NOT NULL,
`id_l` int(11) NOT NULL,
`id_g` int(11) NOT NULL,
`id_p` int(11) NOT NULL,
KEY `xx` (`xx`(100),`date`,`id_l`,`id_g`,`id_p`)
) ENGINE=NDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

SELECT * FROM xxxx WHERE xx = 'mystring';

It crashes with the following.
On my SQL node: [ERROR] Got error 4028 when reading table
On the MGM:
2009-05-24 12:38:32 [MgmSrvr] ALERT -- Node 2: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2009-05-24 12:38:32 [MgmSrvr] ALERT -- Node 1: Node 2 Disconnected
2009-05-24 12:38:32 [MgmSrvr] ALERT -- Node 3: Node 2 Disconnected
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: Communication to Node 2 closed
2009-05-24 12:38:32 [MgmSrvr] ALERT -- Node 3: Network partitioning - arbitration required
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: President restarts arbitration thread [state=7]
2009-05-24 12:38:32 [MgmSrvr] ALERT -- Node 3: Arbitration won - positive reply from node 1
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: GCP Take over started
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: Node 3 taking over as DICT master
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: GCP Take over completed
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: kk: 13412/5 0 0
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: LCP Take over started
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: ParticipatingDIH = 0000000000000000
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: ParticipatingLQH = 0000000000000000
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=0 0000000000000000]
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=0 0000000000000000]
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=0 0000000000000000]
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: m_LCP_COMPLETE_REP_From_Master_Received = 1
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: LCP Take over completed (state = 4)
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: ParticipatingDIH = 0000000000000000
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: ParticipatingLQH = 0000000000000000
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=0 0000000000000000]
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=0 0000000000000000]
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=0 0000000000000000]
2009-05-24 12:38:32 [MgmSrvr] INFO -- Node 3: m_LCP_COMPLETE_REP_From_Master_Received = 1
2009-05-24 12:38:33 [MgmSrvr] ALERT -- Node 3: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2009-05-24 12:38:33 [MgmSrvr] ALERT -- Node 1: Node 3 Disconnected

And on the NDB data nodes (2 data nodes), this:
Time: Saturday 23 May 2009 - 17:56:50
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbtux/DbtuxCmp.cpp
Error object: DBTUX (Line: 138) 0x0000000a
Program: ndbd
Pid: 6493
Trace: /servers/mysql/cluster/ndb_2_trace.log.11
Version: mysql-5.1.32 ndb-7.0.5-beta
***EOM***
Or sometimes this:
Time: Sunday 24 May 2009 - 12:38:32
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbtux/DbtuxSearch.cpp
Error object: DBTUX (Line: 111) 0x0000000a
Program: ndbd
Pid: 32017
Trace: /servers/mysql/cluster/ndb_2_trace.log.14
Version: mysql-5.1.32 ndb-7.0.5-beta
***EOM*** 

Suggested fix:
If it's not possible to have a varchar > 341, why is it possible to create one? How can its creation be prevented?
[24 May 2009 14:25] Pekka Nousiainen
Varchar can have size up to max tuple size, about 8000.
utf8 reserves 3 bytes for a char so limit is about 8000/3.

You've found a bug, although this code has not changed in
years, except for dependency on MySQL character sets.
Would be interesting to know if replacing utf8 with
the default (case independent ascii?) makes a difference.
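
As a rough illustration of the arithmetic above, the sketch below uses the approximate figures from this comment (a tuple size of about 8000 bytes, 3 bytes reserved per utf8 character). The function names are hypothetical and the length prefix is ignored. It also prints the reserved size at the reported 341/342 boundary; that 342 characters is the first length to exceed 1024 bytes is only an arithmetic observation, not a confirmed cause.

```python
# Figures taken from the comment above; both are approximate.
MAX_TUPLE_BYTES = 8000     # "about 8000"
UTF8_BYTES_PER_CHAR = 3    # utf8 reserves 3 bytes per character


def reserved_bytes(n_chars, bytes_per_char=UTF8_BYTES_PER_CHAR):
    """Worst-case bytes reserved for n_chars characters (length prefix ignored)."""
    return n_chars * bytes_per_char


def max_chars(max_bytes=MAX_TUPLE_BYTES, bytes_per_char=UTF8_BYTES_PER_CHAR):
    """Largest character count whose reserved size fits in max_bytes."""
    return max_bytes // bytes_per_char


print(max_chars())          # 2666, i.e. "about 8000/3"
print(reserved_bytes(341))  # 1023
print(reserved_bytes(342))  # 1026
```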
[24 May 2009 17:11] Hindisvik Reykjavik
Unfortunately, it's in production, so I cannot easily run tests (especially when I know it crashes my cluster).

Do you think there will be a patch or fix for that issue?
[25 May 2009 10:49] Hartmut Holzgraefe
Looks to be related to both the column length and the collation used:
I can reproduce the crash with utf8_unicode_ci but not with utf8_general_ci ...

Minimized test case:

DROP TABLE IF EXISTS `t1`;

CREATE TABLE `t1` (
  id int primary key auto_increment,
  `msg` varchar(342) NOT NULL,       -- works with 341, breaks with 342
  KEY `msg` (`msg`(100))
) ENGINE=ndb DEFAULT CHARSET=utf8 
  COLLATE=utf8_unicode_ci;           -- works with utf8_general_ci

insert into t1 values(NULL, md5(rand()));

SELECT * FROM t1 WHERE msg = 'mystring';
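
To check the length/collation combinations systematically, the CREATE TABLE statement from the test case can be generated for each pair. The following is only a sketch: the helper names are hypothetical, it prints SQL rather than connecting to a server, and the statements should only be run against a disposable test cluster since the failing combination crashes a data node.

```python
from itertools import product


def create_table_sql(length, collation, table="t1"):
    """Build the minimized test table for a given VARCHAR length and collation."""
    return (
        f"CREATE TABLE `{table}` (\n"
        f"  id int primary key auto_increment,\n"
        f"  `msg` varchar({length}) NOT NULL,\n"
        f"  KEY `msg` (`msg`(100))\n"
        f") ENGINE=ndb DEFAULT CHARSET=utf8 COLLATE={collation};"
    )


def test_matrix(lengths=(341, 342),
                collations=("utf8_general_ci", "utf8_unicode_ci")):
    """Return (length, collation, sql) for every length/collation pair."""
    return [(n, c, create_table_sql(n, c)) for n, c in product(lengths, collations)]


for length, collation, sql in test_matrix():
    print(f"-- length={length}, collation={collation}")
    print(sql)
```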
[25 May 2009 12:59] Hartmut Holzgraefe
By the way, I can't reproduce a full disconnect; my test case only crashes one data node at a time (in a local test setup with 2 data nodes and 2 replicas).
[25 May 2009 13:18] Hindisvik Reykjavik
And if you run the query twice?

It seems that if I run OPTIMIZE TABLE XXX, it crashes the cluster too... Could you confirm?

Thank you
[25 May 2009 18:48] Hindisvik Reykjavik
> I can reproduce the crash with utf8_unicode_ci but not with utf8_general_ci ...

That's true for me too; setting utf8_general_ci seems to work fine... strange!

But I confirm that for me it crashes all my nodes (2 data nodes with replicas=2).
[1 Jun 2009 8:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/75357

2943 Pekka Nousiainen	2009-06-01
      bug#45053 01_buf.diff
      Increase data buffers to handle any xfrm-ed string.
      modified:
        mysql-test/suite/ndb/r/ndb_index_ordered.result
        mysql-test/suite/ndb/t/ndb_index_ordered.test
        storage/ndb/src/kernel/blocks/dbtux/Dbtux.hpp
        storage/ndb/src/kernel/blocks/dbtux/DbtuxScan.cpp
[1 Jun 2009 8:56] Bugs System
Pushed into 5.1.34-ndb-6.2.19 (revid:pekka@mysql.com-20090601085220-n597i3xpcbfw207g) (version source revid:pekka@mysql.com-20090601085220-n597i3xpcbfw207g) (merge vers: 5.1.34-ndb-6.2.19) (pib:6)
[3 Jun 2009 6:28] Bugs System
Pushed into 5.1.34-ndb-6.3.26 (revid:jonas@mysql.com-20090603062427-hn6jf5iymkowxtbh) (version source revid:jonas@mysql.com-20090603062427-hn6jf5iymkowxtbh) (merge vers: 5.1.34-ndb-6.3.26) (pib:6)
[3 Jun 2009 6:29] Bugs System
Pushed into 5.1.34-ndb-7.0.7 (revid:jonas@mysql.com-20090603062551-k7appx8hh5lpehed) (version source revid:jonas@mysql.com-20090603062551-k7appx8hh5lpehed) (merge vers: 5.1.34-ndb-7.0.7) (pib:6)
[6 Jul 2009 13:45] Jon Stephens
Documented bugfix in the NDB-6.2.19, 6.3.26, and 7.0.7 changelogs as follows:

        Problems could arise when using VARCHAR columns whose size was
        greater than 341 characters and which used the utf8_unicode_ci 
        collation. In some cases, this combination of conditions could 
        cause certain queries and OPTIMIZE TABLE statements to crash 
        mysqld.
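
A minimal sketch of how the changelog entry could be turned into a check, assuming a literal reading of its wording (VARCHAR longer than 341 characters with the utf8_unicode_ci collation) and the first fixed releases listed above. The function names are hypothetical, and release series not mentioned in this report are reported as unknown rather than guessed.

```python
# First fixed release per NDB series, as listed in the changelog note above.
FIXED_IN = {(6, 2): 19, (6, 3): 26, (7, 0): 7}


def has_fix(ndb_version):
    """Return True/False for a listed series, or None if the series is not listed.

    ndb_version is a (major, minor, patch) tuple, e.g. (7, 0, 5).
    """
    major, minor, patch = ndb_version
    first_fixed = FIXED_IN.get((major, minor))
    if first_fixed is None:
        return None
    return patch >= first_fixed


def column_at_risk(length, collation):
    """Literal reading of the changelog: VARCHAR > 341 chars with utf8_unicode_ci."""
    return length > 341 and collation == "utf8_unicode_ci"


print(has_fix((7, 0, 5)))                      # False
print(has_fix((7, 0, 7)))                      # True
print(column_at_risk(450, "utf8_unicode_ci"))  # True
```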