Bug #19141 DD: CREATE LFG during DROP of a NDB table cause DN failure and corruption
Submitted: 17 Apr 2006 15:43 Modified: 23 Apr 2006 1:15
Reporter: Jonathan Miller Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:5.1.10 OS:Linux (Linux 32 Bit OS)
Assigned to: Jonas Oreland CPU Architecture:Any

[17 Apr 2006 15:43] Jonathan Miller
Description:
I had the new cid_ndb_dd.pl script running against  a 4 data node cluster. The script was creating a database called TESTER, creating a table called t1, inserting a random amount of data, deleting from the table all data, dropping the table, dropping the database and repeating.

During this, I logged into mysql and started creating and dropping a log file group. I hit a time where the perl script was dropping the table as I was creating the LFG:

mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 49 to server version: 5.1.10-beta-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> CREATE LOGFILE GROUP lg1 ADD UNDOFILE './lg1/undofile.dat' INITIAL_SIZE 300 UNDO_BUFFER_SIZE=4M ENGINE=NDB;
Query OK, 0 rows affected (1.31 sec)

mysql> DROP LOGFILE GROUP lg1 ENGINE=NDB; Query OK, 0 rows affected (2.37 sec)

mysql> CREATE LOGFILE GROUP lg1 ADD UNDOFILE './lg1/undofile.dat' INITIAL_SIZE 300 UNDO_BUFFER_SIZE=4M ENGINE=NDB;
Query OK, 0 rows affected (1.21 sec)

mysql> DROP LOGFILE GROUP lg1 ENGINE=NDB; Query OK, 0 rows affected (0.97 sec)

mysql> CREATE LOGFILE GROUP lg1 ADD UNDOFILE './lg1/undofile.dat' INITIAL_SIZE 300 UNDO_BUFFER_SIZE=4M ENGINE=NDB;
ERROR 1507 (HY000): Failed to create LOGFILE GROUP
mysql> exit

Script:
529 rows of data inserted.....
Data Deleted.
DBD::mysql::st execute failed: Unknown table 't1' at cid_ndb_dd.pl line 483.
Drop table error: Unknown table 't1' at cid_ndb_dd.pl line 483.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| TESTER             |
| cluster            |
| mysql              |
| test               |
+--------------------+
5 rows in set (0.01 sec)

mysql> use TESTER;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+------------------+
| Tables_in_TESTER |
+------------------+
| t1               |
+------------------+
1 row in set (0.00 sec)

mysql> drop table t1;
ERROR 1051 (42S02): Unknown table 't1'

Time: Monday 17 April 2006 - 17:21:27
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbdict/Dbdict.cpp
Error object: DBDICT (Line: 6724) 0x0000000a
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 9738
Trace: /space/run/ndb_2_trace.log.1
Version: Version 5.1.10 (beta)

--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 1019937 gsn: 199 "PREP_DROP_TAB_REQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 1019935 length: 4 trace: 0 #sec: 0 fragInf: 0
 senderRef: fa0002 senderData: 759 TableId: 6
--------------- Signal ----------------
r.bn: 256 "DBUTIL", r.proc: 2, r.sigId: 1019936 gsn: 303 "UTIL_UNLOCK_REQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 1019935 length: 4 trace: 0 #sec: 0 fragInf: 0
 senderData: 0
 senderRef: fa0002
 lockId: 1
 lockKey: e1
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 1019935 gsn: 231 "UTIL_LOCK_CONF" prio: 1
s.bn: 256 "DBUTIL", s.proc: 2, s.sigId: 1019934 length: 4 trace: 0 #sec: 0 fragInf: 0
 senderData: 0
 senderRef: 1000002
 lockId: 1
 lockKey: e1
--------------- Signal ----------------
r.bn: 256 "DBUTIL", r.proc: 2, r.sigId: 1019934 gsn: 222 "UTIL_LOCK_REQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 1019933 length: 4 trace: 0 #sec: 0 fragInf: 0
 senderData: 0
 senderRef: fa0002
 lockId: 1
 requestInfo: 0

--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 1019933 gsn: 82 "DROP_TABLE_REQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 1019932 length: 4 trace: 0 #sec: 0 fragInf: 0
 H'000002f3 H'00fa0002 H'00000006 H'00000039
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 1019932 gsn: 605 "ALTER_INDX_CONF" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 1019931 length: 6 trace: 0 #sec: 0 fragInf: 0
 H'00fa0002 H'000002f3 H'00000003 H'00000005 H'00000006 H'00000039
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 1019931 gsn: 605 "ALTER_INDX_CONF" prio: 1
s.bn: 250 "DBDICT", s.proc: 3, s.sigId: -1 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00fa0002 H'000002f4 H'000000c0
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 1019930 gsn: 605 "ALTER_INDX_CONF" prio: 1
s.bn: 250 "DBDICT", s.proc: 5, s.sigId: -1 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00fa0002 H'000002f4 H'000000c0
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 1019929 gsn: 605 "ALTER_INDX_CONF" prio: 1
s.bn: 250 "DBDICT", s.proc: 4, s.sigId: -1 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00fa0002 H'000002f4 H'000000c0
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 1019928 gsn: 605 "ALTER_INDX_CONF" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 1019927 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00fa0002 H'000002f4 H'000000c0
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 1019927 gsn: 603 "ALTER_INDX_REQ" prio: 1

How to repeat:
Start this script:
perl cid_ndb_dd.pl -ho localhost --port 3306 -u root

In another window log into mysql and run the following command until drop of table and create of FLG happen together.
mysql> CREATE LOGFILE GROUP lg1 ADD UNDOFILE './lg1/undofile.dat' INITIAL_SIZE 300 UNDO_BUFFER_SIZE=4M ENGINE=NDB;

mysql> DROP LOGFILE GROUP lg1 ENGINE=NDB;
[17 Apr 2006 15:47] Jonathan Miller
Trying to recover without using --initial on the failed DN

Time: Monday 17 April 2006 - 17:25:03
Status: Temporary error, restart node
Message: Assertion (Internal error, programming error or missing error message, please report a bug)
Error: 2301
Error data: Invalid memory access: ptr (ffffff00 0xd3d17c10) magic: (00000000 000000e2) memroot: 0xd3d18000 page: ffffff00
Error object: LGMAN (Line: 42) 0x0000000a
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 9930
Trace: /space/run/ndb_2_trace.log.2
Version: Version 5.1.10 (beta)
***EOM***

--------------- Signal ----------------
r.bn: 260 "LGMAN", r.proc: 2, r.sigId: 192856 gsn: 164 "CONTINUEB" prio: 1
s.bn: 260 "LGMAN", s.proc: 2, s.sigId: 192855 length: 2 trace: 2 #sec: 0 fragInf: 0
 H'00000006 H'00004000
--------------- Signal ----------------
r.bn: 260 "LGMAN", r.proc: 2, r.sigId: 192855 gsn: 164 "CONTINUEB" prio: 1
s.bn: 260 "LGMAN", s.proc: 2, s.sigId: 192853 length: 2 trace: 2 #sec: 0 fragInf: 0
 H'00000004 H'ffffff00
--------------- Signal ----------------
r.bn: 254 "CMVMI", r.proc: 2, r.sigId: 192854 gsn: 247 "EVENT_REP" prio: 1
s.bn: 260 "LGMAN", s.proc: 2, s.sigId: 0 length: 7 trace: 2 #sec: 0 fragInf: 0
 H'00000031 H'6c707041 H'676e6979 H'646e7520 H'6f74206f H'50434c20 H'0033203a
--------------- Signal ----------------
r.bn: 260 "LGMAN", r.proc: 2, r.sigId: 192853 gsn: 386 "START_RECREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 192850 length: 1 trace: 2 #sec: 0 fragInf: 0
 receivingNodeId: 3 senderRef: (28737, 27760)
 keepGci: 1735289209 lastCompletedGci: 1684960544 newestGci: 1869881455
--------------- Signal ----------------
r.bn: 248 "DBACC", r.proc: 2, r.sigId: 192852 gsn: 253 "EXPANDCHECK2" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 192850 length: 2 trace: 2 #sec: 0 fragInf: 0
 H'00000001 H'00000002
--------------- Signal ----------------
r.bn: 248 "DBACC", r.proc: 2, r.sigId: 192851 gsn: 253 "EXPANDCHECK2" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 192850 length: 2 trace: 2 #sec: 0 fragInf: 0
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 192850 gsn: 89 "Unknown" prio: 1
s.bn: 262 "RESTORE", s.proc: 2, s.sigId: 192848 length: 2 trace: 2 #sec: 0 fragInf: 0
 H'00000002 H'01060002
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 2, r.sigId: 192849 gsn: 164 "CONTINUEB" prio: 1
s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 192846 length: 1 trace: 0 #sec: 0 fragInf: 0
 Scanning the memory channel again with no delay
--------------- Signal ----------------
r.bn: 262 "RESTORE", r.proc: 2, r.sigId: 192848 gsn: 255 "FSCLOSECONF" prio: 1
s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 192846 length: 1 trace: 2 #sec: 0 fragInf: 0
 UserPointer: 0
--------------- Signal ----------------
r.bn: 252 "QMGR", r.proc: 2, r.sigId: 192847 gsn: 164 "CONTINUEB" prio: 0
s.bn: 252 "QMGR", s.proc: 2, s.sigId: 192845 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 2, r.sigId: 192846 gsn: 164 "CONTINUEB" prio: 0
s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 192844 length: 1 trace: 0 #sec: 0 fragInf: 0
 Scanning the memory channel every 10ms
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 2, r.sigId: 192843 gsn: 257 "FSCLOSEREQ" prio: 0
s.bn: 262 "RESTORE", s.proc: 2, s.sigId: 192842 length: 4 trace: 2 #sec: 0 fragInf: 0
 UserPointer: 0
 FilePointer: 60
Flags: H'00000000, Don't remove file
--------------- Signal ----------------
r.bn: 262 "RESTORE", r.proc: 2, r.sigId: 192842 gsn: 164 "CONTINUEB" prio: 1
s.bn: 262 "RESTORE", s.proc: 2, s.sigId: 192841 length: 2 trace: 2 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 262 "RESTORE", r.proc: 2, r.sigId: 192841 gsn: 164 "CONTINUEB" prio: 1
s.bn: 262 "RESTORE", s.proc: 2, s.sigId: 192840 length: 2 trace: 2 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 262 "RESTORE", r.proc: 2, r.sigId: 192840 gsn: 164 "CONTINUEB" prio: 1
s.bn: 262 "RESTORE", s.proc: 2, s.sigId: 192839 length: 2 trace: 2 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 262 "RESTORE", r.proc: 2, r.sigId: 192839 gsn: 164 "CONTINUEB" prio: 1
s.bn: 262 "RESTORE", s.proc: 2, s.sigId: 192838 length: 2 trace: 2 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 262 "RESTORE", r.proc: 2, r.sigId: 192838 gsn: 164 "CONTINUEB" prio: 1
s.bn: 262 "RESTORE", s.proc: 2, s.sigId: 192837 length: 2 trace: 2 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 262 "RESTORE", r.proc: 2, r.sigId: 192837 gsn: 164 "CONTINUEB" prio: 1
s.bn: 262 "RESTORE", s.proc: 2, s.sigId: 192836 length: 2 trace: 2 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
[17 Apr 2006 16:34] Jonathan Miller
Data node recoverd with --initial, but database is still corrupted.

Node 2: Started (version 5.1.10)

ndb_mgm> exit

$ mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 62 to server version: 5.1.10-beta-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use TESTER;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> drop table t1;
ERROR 1051 (42S02): Unknown table 't1'
mysql> show tables;
+------------------+
| Tables_in_TESTER |
+------------------+
| t1               |
+------------------+
1 row in set (0.00 sec)

mysql> DROP DATABASE TESTER;
ERROR 1051 (42S02): Unknown table 't1'
mysql> show tables;
ERROR 1046 (3D000): No database selected
mysql> use TESTER;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+------------------+
| Tables_in_TESTER |
+------------------+
| t1               |
+------------------+
1 row in set (0.00 sec)

mysql>
[21 Apr 2006 14:46] Tomas Ulin
pushed to 5.1.10
[23 Apr 2006 1:15] Jon Stephens
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented bugfix in 5.1.10 changelog. Closed.