Bug #21050 | Drop db during DN recovery causes other DN to fail & causes recovery issues | ||
---|---|---|---|
Submitted: | 13 Jul 2006 23:03 | Modified: | 2 Sep 2006 6:09 |
Reporter: | Jonathan Miller | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 5.1.12 | OS: | Linux (Linux 32 Bit OS) |
Assigned to: | Pekka Nousiainen | CPU Architecture: | Any |
Tags: | MySQLD Crash back trace included |
[13 Jul 2006 23:03]
Jonathan Miller
[13 Jul 2006 23:05]
Jonathan Miller
DN Error log: Time: Friday 14 July 2006 - 00:45:01 Status: Temporary error, restart node Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug) Error: 2341 Error data: dbdict/Dbdict.cpp Error object: DBDICT (Line: 13547) 0x0000000a Program: /home/ndbdev/jmiller/builds/libexec/ndbd Pid: 18504 Trace: /space/run/ndb_2_trace.log.2 Version: Version 5.1.12 (beta) Trace: BLOCK ADDR ADDR ADDR ADDR ADDR ADDR ADDR ADDR ?010826 011344 DBDICT 010851 010869 010936 DBLQH 002481 DBTC 004050 DBTUP 002032 DBDICT 010851 010869 010941 DBDICT 010851 010869 010884 008403 008416 DBDICT 006393 006411 --------------- Signal ---------------- r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 11279186 gsn: 420 "DICT_UNLOCK_ORD" prio: 1 s.bn: 246 "DBDIH", s.proc: 3, s.sigId: 1109139 length: 2 trace: 2 #sec: 0 fragInf: 0 H'00000000 H'00000001 --------------- Signal ---------------- r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 11279185 gsn: 605 "ALTER_INDX_CONF" prio: 1 s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 11279184 length: 3 trace: 0 #sec: 0 fragInf: 0 H'00fa0002 H'00003298 H'00000010 --------------- Signal ---------------- r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 11279184 gsn: 603 "ALTER_INDX_REQ" prio: 1 s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 11279163 length: 8 trace: 0 #sec: 0 fragInf: 0 H'00fa0002 H'00003297 H'00100003 H'00000002 H'00000003 H'00000003 H'00000000 H'00003298 --------------- Signal ---------------- r.bn: 262 "RESTORE", r.proc: 2, r.sigId: 11279183 gsn: 502 "NODE_START_REP" prio: 1 s.bn: 251 "NDBCNTR", s.proc: 2, s.sigId: 11279162 length: 1 trace: 2 #sec: 0 fragInf: 0 H'00000003 --------------- Signal ---------------- r.bn: 261 "PGMAN", r.proc: 2, r.sigId: 11279182 gsn: 502 "NODE_START_REP" prio: 1 s.bn: 251 "NDBCNTR", s.proc: 2, s.sigId: 11279162 length: 1 trace: 2 #sec: 0 fragInf: 0 H'00000003 --------------- Signal ---------------- r.bn: 260 "LGMAN", r.proc: 2, r.sigId: 11279181 gsn: 502 "NODE_START_REP" prio: 1 s.bn: 251 "NDBCNTR", s.proc: 2, s.sigId: 11279162 length: 1 trace: 2 #sec: 0 fragInf: 0 H'00000003 --------------- Signal ----------------
[27 Jul 2006 21:49]
Jonathan Miller
Did a little more investigation on this one with the cid_dd_ndb.pl script. If you kill a DN at the right position in the test (i.e. not during the delete data from table) the test will continue. The if you restart the DN the test will hang at the drop table part of the test. Once the DN recovers the other DN dies. You have to shutdown the cluster and restart to get both DN operational again and you have to ctl-c out of the test. If you restart the test, if first tries to drop the database if it exists. When you do this, the MySQLD will crash. #0 0x004af402 in __kernel_vsyscall () (gdb) bt #0 0x004af402 in __kernel_vsyscall () #1 0x0064364f in pthread_kill () from /lib/libpthread.so.0 #2 0x0834a457 in write_core () #3 0x081f6336 in handle_segfault () #4 <signal handler called> #5 0x00537ce8 in strcmp () from /lib/libc.so.6 #6 0x08240d8c in remove_db_from_cache () #7 0x0830588b in mysql_rm_db () #8 0x0821375b in mysql_execute_command () #9 0x08217d2b in mysql_parse () #10 0x08218609 in dispatch_command () #11 0x0821973a in do_command () #12 0x08219af1 in handle_one_connection () #13 0x00640bd4 in start_thread () from /lib/libpthread.so.0 #14 0x005984fe in clone () from /lib/libc.so.6 The mysqld_safe restarts the mysqld and then if you run the test again, it procceds without issue. So we have a couple of issues here. 1) The drop database hangs when a data node dies 2) The 2nd data node dies when the first recovers 3) An additional attempt to drop the database from the same mysqld process will cause a crash one the cluster has been recoverd unless that mysqld process as been restarted first. Since we have hangs and crashes associated with this bug, I am rasing it to P1. Thanks /jeb
[18 Aug 2006 7:57]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/10612 ChangeSet@1.2245, 2006-08-18 09:56:52+02:00, pekka@orca.ndb.mysql.com +8 -0 ndb - bug#18781 bug#21017 bug#21050 : block index ops during NR + fix asserts
[23 Aug 2006 8:47]
Pekka Nousiainen
patch for 2), open new bug# if other bugs remains
[1 Sep 2006 7:58]
Jonas Oreland
pushed to 5.1.12
[1 Sep 2006 13:21]
Jon Stephens
Documented bugfix in 5.1.12 changelog.
[1 Sep 2006 19:27]
Jonas Oreland
pushed to 5.0.25
[2 Sep 2006 6:09]
Jon Stephens
Documented bugfix in 5.0.25 changelog.