Bug #46956 Node restart does not work if data is on zfs and a node is hard aborted
Submitted: 27 Aug 2009 14:40
Modified: 28 Aug 2009 7:01
Reporter: detlef Ulherr
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql-5.1-telco-7.0
OS: Solaris (Solaris 10 U7)
Assigned to: Assigned Account
CPU Architecture: Any
Tags: 7.0.7
Triage: Triaged: D2 (Serious) / R6 (Needs Assessment) / E6 (Needs Assessment)

[27 Aug 2009 14:40] detlef Ulherr
Description:
When the data node stores its data on a ZFS file system and is hard aborted, it refuses to restart.

The client that generates the load is a ksh script that loops over a batch of 100 inserts into the database. The inserts are executed with the mysql command-line client.

It is called from a remote system.

After roughly 7000 to 10000 inserted rows I call uadmin 1 0. The data node then refuses to start after the system comes back up.

The error message is:
2009-08-26 07:11:17 [ndbd] INFO     -- Error while reading REDO log. from 16530
part: 3 D=11, F=0 Mb=0 FP=1 W1=575 W2=65564 : Invalid logword gci: 291
2009-08-26 07:11:17 [ndbd] INFO     -- DBLQH (Line: 16570) 0x00000008
2009-08-26 07:11:17 [ndbd] INFO     -- Error handler startup restarting system
2009-08-26 07:11:17 [ndbd] INFO     -- Angel received ndbd startup failure count 1.
2009-08-26 07:11:17 [ndbd] INFO     -- Error handler shutdown completed - exiting
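
To confirm after the reboot that a data node failed with this particular error, the log can be searched for the message text. The following is only a sketch: the function name has_redo_error is hypothetical, and the log file location depends on the installation, so it is passed in as an argument.

```shell
#!/bin/bash
# Hypothetical helper: returns success if the given ndbd log file
# contains the REDO log read failure quoted in this report.
has_redo_error() {
    # -q: print nothing, only set the exit status
    grep -q "Error while reading REDO log" "$1"
}

# Usage sketch (logfile is an assumption; set it to your node's log):
# has_redo_error "$logfile" && echo "REDO log read failure found"
```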

When I repeat the same test with the data stored under UFS everything works fine.

How to repeat:

The problem is 100% reproducible.

Just repeat this test case
[27 Aug 2009 16:40] Sveta Smirnova
Thank you for the report.

Looks like you forgot to attach the test case. Please attach it.
[28 Aug 2009 7:01] detlef Ulherr
Just to highlight the testcase.

create the table 
create table test_tbl (col1 integer, col2 text) engine=ndbcluster;

create a file load.sql containing 100 lines of
insert into test_tbl values (000000, 'iiiiiiiiiiii');
insert into test_tbl values (000003, 'iiiiiiiiiiii');
insert into test_tbl values (000002, 'iiiiiiiiiiii');
.... snip
insert into test_tbl values (000099, 'iiiiiiiiiiii');
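
Instead of writing the 100 lines by hand, load.sql could be generated with a small loop. This is a minimal sketch, not part of the original test case: the function name gen_load_sql is hypothetical, and it assumes the values simply run from 000000 to 000099 in order.

```shell
#!/bin/bash
# Sketch: generate a file with 100 INSERT statements for test_tbl.
# gen_load_sql is a hypothetical name; the table name and row layout
# are taken from the listing above.
gen_load_sql() {
    out=${1:-load.sql}          # output file, defaults to load.sql
    n=0
    : > "$out"                  # create or truncate the output file
    while [ "$n" -lt 100 ]
    do
        # %06d zero-pads the counter to six digits, as in the listing
        printf "insert into test_tbl values (%06d, 'iiiiiiiiiiii');\n" "$n" >> "$out"
        n=$((n + 1))
    done
}

gen_load_sql load.sql
```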

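create a shell script load.ksh
#!/bin/bash
# Usage: load.ksh <mysql-host>
# Runs load.sql (100 inserts per pass) until 200000 rows have been sent.
i=0

while [ "$i" -lt 200000 ]
do
        echo "inserting the next 100 values, $i already done"
        # $1 is the host of the MySQL server to connect to
        /usr/local/mysql/bin/mysql -h "$1" -D testdb -uclient -pclient -e "source load.sql"
        i=$((i + 100))
done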

Assuming you created the database testdb and the user client with the appropriate credentials, run load.ksh, passing the host of the MySQL server as its first argument.

After roughly 7000 inserts, call uadmin 1 0 and boot the system again.
[3 Sep 2009 13:16] Jonas Oreland
Update: It seems that this is a known ZFS bug that is fixed in the upcoming U8
  (or in newer OpenSolaris releases).

We haven't yet verified that it actually works on the newer ZFS releases,
but the workaround suggested by the ZFS team (for the bug that they mentioned) solved our problem too.