Bug #36826 | rpl_slave_status fails sporadically in pushbuild | ||
---|---|---|---|
Submitted: | 20 May 2008 17:48 | Modified: | 29 Jul 2008 10:10 |
Reporter: | Sven Sandberg | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.1 | OS: | Any |
Assigned to: | Sven Sandberg | CPU Architecture: | Any |
Tags: | pushbuild, rpl_slave_status, sporadic, test failure |
[20 May 2008 17:48]
Sven Sandberg
[20 May 2008 17:49]
Sven Sandberg
Here's a result diff: rpl.rpl_slave_status 'row' [ fail ] --- /home/sven/bk/debug-max/5.1-bugteam/mysql-test/suite/rpl/r/rpl_slave_status.result 2008-05-20 20:24:41.000000000 +0300 +++ /home/sven/bk/debug-max/5.1-bugteam/mysql-test/suite/rpl/r/rpl_slave_status.reject 2008-05-20 20:31:53.000000000 +0300 @@ -98,30 +98,31 @@ flush privileges; stop slave; start slave; -show slave status; -Slave_IO_State # +"Failed while waiting for slave SQL to start" +SHOW SLAVE STATUS; +Slave_IO_State Connecting to master Master_Host 127.0.0.1 Master_User rpl -Master_Port MASTER_MYPORT +Master_Port 10080 Connect_Retry 1 Master_Log_File master-bin.000001 -Read_Master_Log_Pos # -Relay_Log_File # -Relay_Log_Pos # +Read_Master_Log_Pos 1134 +Relay_Log_File slave-relay-bin.000002 +Relay_Log_Pos 636 Relay_Master_Log_File master-bin.000001 Slave_IO_Running No -Slave_SQL_Running Yes +Slave_SQL_Running No Replicate_Do_DB Replicate_Ignore_DB Replicate_Do_Table Replicate_Ignore_Table Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table -Last_Errno 0 -Last_Error +Last_Errno 1032 +Last_Error Could not execute Delete_rows event on table mysql.user; Can't find record in 'user', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log master-bin.000001, end_log_pos 986 Skip_Counter 0 -Exec_Master_Log_Pos # -Relay_Log_Space # +Exec_Master_Log_Pos 620 +Relay_Log_Space 1305 Until_Condition None Until_Log_File Until_Log_Pos 0 @@ -133,10 +134,7 @@ Master_SSL_Key Seconds_Behind_Master NULL Master_SSL_Verify_Server_Cert No -Last_IO_Errno # -Last_IO_Error # -Last_SQL_Errno 0 -Last_SQL_Error -drop table t1; -delete from mysql.user where user='rpl'; -drop table t1; +Last_IO_Errno 1045 +Last_IO_Error error connecting to master 'rpl@127.0.0.1:10080' - retry-time: 1 retries: 10 +Last_SQL_Errno 1032 +Last_SQL_Error Could not execute Delete_rows event on table mysql.user; Can't find record in 'user', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log master-bin.000001, end_log_pos 986 mysqltest: In included file "./include/wait_for_slave_sql_to_start.inc": Result content mismatch Aborting: rpl.rpl_slave_status failed in default mode. To continue, re-run with '--force'. Stopping All Servers
[20 May 2008 17:51]
Sven Sandberg
Here's the end of the output from running mysqlbinlog on slave-bin.000001: # at 476 (0x1dc) # at 517 (0x205) #080520 19:31:49 server id 1 end_log_pos 517 Table_map: `test`.`t1` mapped to number 16 #080520 19:31:49 server id 1 end_log_pos 551 Write_rows: table id 16 flags: STMT_END_F BINLOG ' BQszSBMBAAAAKQAAAAUCAAAAABAAAAAAAAAABHRlc3QAAnQxAAEDAAE= BQszSBcBAAAAIgAAACcCAAAQABAAAAAAAAEAAf/+AQAAAA== '/*!*/; # at 551 (0x227) #080520 19:31:49 server id 1 end_log_pos 620 Query thread_id=3 exec_time=0 error_code=0 SET TIMESTAMP=1211304709/*!*/; COMMIT /*!*/; # at 620 (0x26c) #080520 19:31:50 server id 1 end_log_pos 688 Query thread_id=3 exec_time=0 error_code=0 SET TIMESTAMP=1211304710/*!*/; BEGIN /*!*/; # at 688 (0x2b0) # at 841 (0x349) #080520 19:31:50 server id 1 end_log_pos 841 Table_map: `mysql`.`user` mapped to number 2 #080520 19:31:50 server id 1 end_log_pos 986 Delete_rows: table id 2 flags: STMT_END_F BINLOG ' BgszSBMBAAAAmQAAAEkDAAAAAAIAAAAAAAAABW15c3FsAAR1c2VyACf+/v7+/v7+/v7+/v7+/v7+ /v7+/v7+/v7+/v7+/v7+/vz8/AMDAwND/rT+MP4p9wH3AfcB9wH3AfcB9wH3AfcB9wH3AfcB9wH3 AfcB9wH3AfcB9wH3AfcB9wH3AfcB9wH3AfcB9wH3AQICAgAAAAAA BgszSBkBAAAAkQAAANoDAAAQAAIAAAAAAAEAJ///////AAAAAIAJMTI3LjAuMC4xA3JwbCkqNjI0 NDU5Qzg3RTUzNEExMjY0OThBREUxQjEyRTBDNjZFREEwMzVBMwEBAQEBAQEBAQEBAQEBAQEBAQEC AQEBAQEBAQEBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA== '/*!*/; # at 986 (0x3da) #080520 19:31:50 server id 1 end_log_pos 1055 Query thread_id=3 exec_time=0 error_code=0 SET TIMESTAMP=1211304710/*!*/; COMMIT /*!*/; # at 1055 (0x41f) #080520 19:31:50 server id 1 end_log_pos 1134 Query thread_id=3 exec_time=0 error_code=0 SET TIMESTAMP=1211304710/*!*/; flush privileges /*!*/; # at 1134 (0x46e) #080520 19:31:55 server id 1 end_log_pos 1153 Stop DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
[20 May 2008 17:54]
Sven Sandberg
Note that the slave's binlog contains a Delete_rows_log_event that removes the record from mysql.user, but SHOW SLAVE STATUS indicates that a Delete_rows_log_event could not be executed since the row is not in the table. This hints that it could be the case that the slave tries to execute the event twice. Also note that SHOW SLAVE STATUS indicates that the slave IO thread had errors.
[23 May 2008 13:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/46998
[26 May 2008 9:20]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47040
[26 May 2008 13:06]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47056
[26 May 2008 13:08]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47057
[27 May 2008 15:53]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47095
[28 May 2008 15:21]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47154
[7 Jul 2008 16:21]
Eric Jensen
Is it possible that this bug is the cause of my problems with RBR? Here is an excerpt of my post from http://forums.mysql.com/read.php?26,216227,216227#msg-216227 I can't stop and start the replication at all without breaking the replication. If I do "stop slave" and then "start slave" again it immediately has an error like this, seemingly for random tables that have been written to (not one in particular): Could not execute Write_rows event on table XXX; Duplicate entry 'YYY' for key 'ZZZ', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000003, end_log_pos 704849 I'm running the Linux (AMD64 / Intel EM64T) build of 5.1.25 on a xeon box...nothing that unusual about my environment or database. Only thing I could think of is that my app does some large insert delayed ... values ... statements.
[7 Jul 2008 17:44]
Sven Sandberg
Eric, this bug is probably unrelated to your problems. This bug is about a mistake in how our internal test was constructed. It is impossible to know what your problem is, because some information is missing. You'll have to debug your SQL yourself. The error you get indicates that a row was inserted on the master with the same key as a row on the slave. Try to figure out which row, and what could cause your error.
[7 Jul 2008 18:22]
Mats Kindahl
Hi Eric! As Sven mentions, your problems have nothing to do with this bug. The error HA_ERR_FOUND_DUPP_KEY is generated by the storage engine whenever there is a duplicate key in the table. Since you are not giving much information about the setup, it is hard to say exactly what is the reason for your failure, but if I were to make a guess, I suspect that for some reason, replication was cut of in the middle of a statement (or transaction). If transactional engines are used on both the master and the slave, this is not a problem, but if a non-transactional engine like MyISAM is used, applied rows on the slave will not be rolled back but replication will start from the beginning of the partially applied statement when you do a START SLAVE, causing it to stop again with a duplicate key error. What you need to do is to figure out what rows are missing on the slave, add them manually, and then do: mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; mysql> START SLAVE; This will cause replication to skip one event or transaction, and start replication after that. This will skip the partially applied statement, so unless you have applied the missing rows manually, the databases will be out of sync. Please note that you might have another situation, and also that you have to check that the databases are in sync after you have managed to get replication running again. In the future, please ask these kinds of questions to the Replication list (see http://lists.mysql.com/) since you are more likely to get a more constructive answer to your problem there.
[7 Jul 2008 18:55]
Eric Jensen
Thank you guys for the response. Obviously we shouldn't be perpetuating this discussion within this bug, but I actually did post this question to the replication list (almost a month ago now) and forum and haven't gotten a response. It sounds to me that what I'm running into is a fundamental limitation of the current mysql RBR and not a bug. Mats, you are saying that with a non-transactional engine, it is impossible to guarantee that stopping and starting RBR won't break replication, right? In my app, I'm doing lots of batched, large "insert values ..." statements that take the bulk of the time, so it's tough to be able to stop replication and not hit the middle of one of them. Therefore, starting it back up I always have an inconsistency. If I am understanding this correctly, then this should really be documented better. Or maybe more extreme, using RBR with non-transactional engines should generate at least some sort of warning? I'm having a hard time thinking of scenario where someone would want to use it unless they were using transactional tables. Thanks, eric
[28 Jul 2008 16:50]
Bugs System
Pushed into 5.1.27 (revid:joerg@mysql.com-20080626161101-h5hsaluwtf05xdr2) (version source revid:joerg@mysql.com-20080626161101-h5hsaluwtf05xdr2) (pib:3)
[29 Jul 2008 10:10]
Jon Stephens
Discussed with Sven on IRC - test case fix only, to end-user changes to document. Closed without further action.