| Bug #28123 | rpl_ndb_mix_innodb.test casue slave to core on sol10-sparc-a | ||
|---|---|---|---|
| Submitted: | 26 Apr 2007 14:26 | Modified: | 10 Jul 2007 7:17 |
| Reporter: | Jonathan Miller | Email Updates: | |
| Status: | Duplicate | Impact on me: | |
| Category: | MySQL Cluster: Replication | Severity: | S1 (Critical) |
| Version: | 5.1-telco | OS: | Solaris (sol10-sparc-a) |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
[26 Apr 2007 14:26]
Jonathan Miller
[2 Jul 2007 6:48]
Guangbao Ni
Loaded symbols for /usr/platform/SUNW,Sun-Fire-V210/lib/libc_psr.so.1
#0 0xff11fc54 in _lwp_kill () from /usr/lib/libc.so.1
(gdb) bt
#0 0xff11fc54 in _lwp_kill () from /usr/lib/libc.so.1
#1 0x00339730 in write_core (sig=11) at stacktrace.c:229
#2 0x001b69c0 in handle_segfault (sig=11) at mysqld.cc:2239
#3 0xff355bb4 in __sighndlr () from /usr/lib/libthread.so.1
#4 0xff34f80c in call_user_handler () from /usr/lib/libthread.so.1
#5 <signal handler called>
#6 0xff0b455c in strlen () from /usr/lib/libc.so.1
#7 0xff107058 in _doprnt () from /usr/lib/libc.so.1
#8 0xff108ab4 in fprintf () from /usr/lib/libc.so.1
#9 0x002b2190 in ndbcluster_commit (hton=0x0, thd=0xffffffff, all=false)
at ha_ndbcluster.cc:5152
#10 0x0029bff8 in ha_commit_one_phase(THD*, bool) (thd=0xb969e0, all=false)
at handler.cc:789
#11 0x0029c5a4 in ha_commit_trans(THD*, bool) (thd=0xb969e0, all=false)
at handler.cc:755
#12 0x0029c968 in ha_autocommit_or_rollback(THD*, int) (thd=0xb969e0,
error=0) at handler.cc:899
#13 0x00273594 in Rows_log_event::do_update_pos(st_relay_log_info*) (
---Type <return> to continue, or q <return> to quit---
this=0xc93fc8, rli=0xb1d118) at log_event.cc:6176
#14 0x0032508c in exec_relay_log_event (thd=0xb969e0, rli=0xb1d118)
at log_event.h:828
#15 0x00325aa4 in handle_slave_sql (arg=0xb1d118) at slave.cc:2407
[3 Jul 2007 7:03]
Guangbao Ni
in slave.err file, there are the following error messages: 070629 15:34:18 [ERROR] Slave: Error in Update_rows event: commit of row events failed, Error_code: 1 070629 15:34:18 [ERROR] Slave: It was not possible to update the positions of the relay log information: the slave may be in an inconsistent state. Stopped in /export/home/ngb/telco_bug28346/mysql-test/var/log/slave-relay-bin.000003 position 194946, Error_code: 1105 070629 15:34:18 [ERROR] Slave (additional info): Got error 4350 'Transaction already aborted' from NDBCLUSTER Error_code: 1296 070629 15:34:18 [Warning] Slave: Got error 4350 'Transaction already aborted' from NDB Error_code: 1296 070629 15:34:18 [Warning] Slave: Got error 4350 'Transaction already aborted' from NDBCLUSTER Error_code: 1296 070629 15:34:18 [Warning] Slave: Got error 4350 'Transaction already aborted' from NDBCLUSTER Error_code: 1296 070629 15:34:18 [Warning] Slave: Got error 4350 during COMMIT Error_code: 1180 070629 15:34:18 [Warning] Slave: Unknown error Error_code: 1105 070629 15:34:18 [Warning] Slave: Unknown error Error_code: 1105 070629 15:34:18 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 194800 NDB: Found 3 NdbTransaction's that have not been released NDB: Found 1 NdbReceiver that has not been released NDB: Found 3 NdbTransaction's that have not been released NDB: Found 1 NdbReceiver that has not been released 070629 15:34:49 [Note] Slave: received end packet from server, apparent master shutdown: 070629 15:34:49 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'master-bin.000001' position 197628 070629 15:34:49 [ERROR] Slave I/O thread: error reconnecting to master 'root@127.0.0.1:9306': Error: 'Can't connect to MySQL server on '127.0.0.1' (146)' errno: 2003 retry-time: 1 retries: 10 070629 15:34:49 [Note] /export/home/ngb/telco_bug28346/sql/mysqld: Normal shutdown 070629 15:34:49 [Note] Event Scheduler: Purging the queue. 0 events 070629 15:34:49 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read 070629 15:34:49 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 197628 070629 15:34:49 [Note] Stopping Cluster Binlog 070629 15:34:49 [Warning] Forcing shutdown of 1 plugins 070629 15:34:49 [Note] Plugin 'ndbcluster' will be forced to shutdown 070629 15:34:49 [Note] Stopping Cluster Utility thread NDB: table share ./mysql/ndb_apply_status with use_count 1 not freed NDB: table share ./tpcb/teller with use_count 1 not freed NDB: table share ./tpcb/branch with use_count 1 not freed NDB: table share ./tpcb/account with use_count 1 not freed NDB: table share ./tpcb/history with use_count 1 not freed NDB: table share ./mysql/ndb_schema with use_count 1 not freed 070629 15:34:51 [Note] /export/home/ngb/telco_bug28346/sql/mysqld: Shutdown complete
[9 Jul 2007 1:32]
Guangbao Ni
for rpl_ndb_mix_innodb
If master and slave both run on Solaris, and when master table is INNODB and slave table is NDB, in Field_blob::unpack() and Field_blob::pack(),
these two function will both call get_length() function,
in uint32 Field_blob::get_length() function,
uint32 Field_blob::get_length(const char *pos)
{
switch (packlength) {
case 1:
return (uint32) (uchar) pos[0];
case 2:
{
uint16 tmp;
#ifdef WORDS_BIGENDIAN
if (table->s->db_low_byte_first)
tmp=sint2korr(pos); //master is INNODB, will run this
else
#endif
shortget(tmp,pos); //slave is NDB, will run this
return (uint32) tmp;
}
case 3:
return (uint32) uint3korr(pos);
case 4:
{
uint32 tmp;
#ifdef WORDS_BIGENDIAN
if (table->s->db_low_byte_first)
tmp=uint4korr(pos);
else
#endif
longget(tmp,pos);
return (uint32) tmp;
}
}
return 0; // Impossible
}
and from the result of printout, i can see the master prints the result of get_length() is 36 and the slave print out 603979776. it just means the different length byte order.
[9 Jul 2007 9:16]
Guangbao Ni
and I simplify the test case as followings,
--disable_query_log
--source include/have_ndb.inc
--source include/have_innodb.inc
--source include/have_binlog_format_mixed.inc
--source include/master-slave.inc
--enable_query_log
let $off_set = 9;
let $rpl_format = 'MIX';
# Create database/tables and stored procdures
connection master;
#--source include/tpcb.inc
CREATE DATABASE tpcb;
CREATE TABLE tpcb.history (id MEDIUMINT NOT NULL AUTO_INCREMENT,
tdate DATETIME, uuidf LONGBLOB,
filler CHAR(80),PRIMARY KEY (id));
# Switch tables on slave to use NDB
--sync_slave_with_master
USE tpcb;
ALTER TABLE history ENGINE NDB;
--echo
# Load DB tpcb and run some transactions
connection master;
--disable_query_log
#CALL tpcb.load();
INSERT INTO tpcb.history VALUES(1, NOW(),
UUID(),'completed trans');
--sync_slave_with_master
[10 Jul 2007 7:17]
Guangbao Ni
duplicate with bug#29549
