MySQL Bugs: #46956: Node restart does not work if data is on zfs and a node is hard aborted

Bug #46956	Node restart does not work if data is on zfs and a node is hard aborted
Submitted:	27 Aug 2009 14:40	Modified:	28 Aug 2009 7:01
Reporter:	detlef Ulherr
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	mysql-5.1-telco-7.0	OS:	Solaris (Solaris 10 U7)
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	7.0.7

Description:
When a data node stores its data on a ZFS file system and is hard aborted, it refuses to restart.

The client that generates the load is a ksh script that loops over a batch of 100 inserts into the database. The inserts are executed with the mysql command-line client.

It is called from a remote system.

After about 7000 to 10000 inserted rows, I call uadmin 1 0. The data node then refuses to start once the system comes back up.

The error message is:
2009-08-26 07:11:17 [ndbd] INFO     -- Error while reading REDO log. from 16530
part: 3 D=11, F=0 Mb=0 FP=1 W1=575 W2=65564 : Invalid logword gci: 291
2009-08-26 07:11:17 [ndbd] INFO     -- DBLQH (Line: 16570) 0x00000008
2009-08-26 07:11:17 [ndbd] INFO     -- Error handler startup restarting system
2009-08-26 07:11:17 [ndbd] INFO     -- Angel received ndbd startup failure count 1.
2009-08-26 07:11:17 [ndbd] INFO     -- Error handler shutdown completed - exiting

When I repeat the same test with the data stored on UFS, everything works fine.

How to repeat:

The problem is 100% reproducible.

Just repeat this test case

Thank you for the report.

Looks like you forgot to attach the test case. Please attach it.

Just to highlight the testcase.

Create the table:
create table test_tbl (col1 integer, col2 text) engine=ndbcluster;

Create a file load.sql containing 100 lines of:
insert into test_tbl values (000000, 'iiiiiiiiiiii');
insert into test_tbl values (000003, 'iiiiiiiiiiii');
insert into test_tbl values (000002, 'iiiiiiiiiiii');
.... snip
insert into test_tbl values (000099, 'iiiiiiiiiiii');
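Rather than typing the 100 lines by hand, load.sql can be generated. The following is only a sketch: it assumes the values simply run in order from 000000 to 000099 (the excerpt above is not strictly sequential, which should not matter for the test) and that the file is written to the current directory.

```shell
#!/bin/bash
# Sketch: generate load.sql with 100 INSERT statements.
# Assumption: values run sequentially from 000000 to 000099.

: > load.sql    # create or truncate the file
i=0

while [ $i -lt 100 ]
do
        # %06d zero-pads the counter to six digits, as in the report
        printf "insert into test_tbl values (%06d, 'iiiiiiiiiiii');\n" "$i" >> load.sql
        i=$((i+1))
done
```

Running the sketch once in the directory from which load.ksh is started is enough, because load.ksh sources load.sql by relative path.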

Create a shell script load.ksh:
#!/bin/bash
i=0

while [ $i -lt 200000 ]
do
        let i=$i+100
        echo inserting the next 100 values  $i already done
        /usr/local/mysql/bin/mysql -h $1 -D testdb -uclient -pclient -e "source load.sql"

done

Assuming you have created the testdb database and the user client with the appropriate credentials, run load.ksh.

After roughly 7000 inserts, call uadmin 1 0 and boot the system again.

Update: This seems to be a known ZFS bug that is fixed in the upcoming U8
  (or in newer OpenSolaris releases).

We haven't yet verified that it actually works on the newer ZFS releases,
but the workaround suggested by the ZFS team (for the bug that they mentioned) solved our problem too.