| Bug #31190 | Exec_Master_Log_Pos in SHOW SLAVE STATUS can be incorrect in some cases | ||
|---|---|---|---|
| Submitted: | 25 Sep 2007 18:31 | Modified: | 29 Feb 2008 15:51 |
| Reporter: | Harrison Fisk | Email Updates: | |
| Status: | Duplicate | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
| Version: | 5.0.30sp1-5.0.46 | OS: | Any |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
| Tags: | bfsm_2007_10_18, Exec_master_log_pos, replication, stop slave | ||
[25 Sep 2007 18:31]
Harrison Fisk
Relay log which caused the problem
Attachment: ubuntu3-relay-bin.001329.gz (application/x-gzip, text), 122.83 KiB.
[23 Jan 2008 18:28]
Andrei Elkin
It might be this one is a duplicate for bug#12691. There is a note on bug#12691 page explaining a feature in behaviour of "Exec_master_log_pos is not getting corrupted although the couter indeed behaves sometimes non-monotonically ...". As the patch for the earlier bug " [27 Nov 2007 11:48] Bugs System Pushed into 5.0.54 " we indeed may not reproduce the current, in either version.
[27 Feb 2008 12:21]
Robert Krzykawski
I am experiencing the same problem.. It's not often, but it happens. We have noticed this on 4.1.22, 5.0.32 (enterprise), 5.0.50 (enterprise).. Today for example, it happened again. We stopped replication with "slave stop;", did a "show slave status\G" and issued a "change master..." And the position was wrong. Now, when checking the logs, relay logs and masters binlog, in fact the exec_master_log_pos reported on the slave does not exist on the master. If i would restart the slave before issuing a "change master" i guess there would be no problem, since the slave would continue reading it's relaylogs and so on, but since we wanted to switch from using a ssh tunnel between the master and the slave to our new wan connection, we needed to change host. <snip from mysqld.log> 080227 4:27:50 [Note] Slave I/O thread: connected to master 'rep@127.0.0.1:3351', replication started in log 'x-bin.001507' at position 237826070 080227 4:37:52 [Note] Slave I/O thread exiting, read up to log 'x-bin.001507', position 281271771 080227 4:37:52 [Note] Slave SQL thread exiting, replication stopped in log 'x-bin.001507' at position 1756 080227 4:41:57 [Note] Slave SQL thread initialized, starting replication in log 'x-bin.001507' at position 1756, relay log '/db/x/log/relaylog/x-relay.000001' position: 98 080227 4:41:58 [Note] Slave I/O thread: connected to master 'rep@192.168.30.5:3306', replication started in log 'x-bin.001507' at position 1756 080227 4:41:58 [ERROR] Error reading packet from server: error reading log entry ( server_errno=1236) 080227 4:41:58 [ERROR] Got fatal error 1236: 'error reading log entry' from master when reading data from binary log 080227 4:41:58 [Note] Slave I/O thread exiting, read up to log 'x-bin.001507', position 1756 080227 4:44:33 [Note] Error reading relay log event: slave SQL thread was killed </snip from mysqld.log> <snip from binlog> [root@host binlog]# mysqlbinlog --start-position=1756 x-bin.001507 /*!40019 SET @@session.max_insert_delayed_threads=0*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; ERROR: Error in Log_event::read_log_event(): 'Event too big', data_len: 1751607660, event_type: 34 # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; </snip from binlog> This could be serious if you have a slave that you really depend upon.. In this case, we could just roll back to a snapshot of this backup database (we do binary snapshots in the SAN for this backup database every hour), but say that it was in production.. - it could cause a big mess.
[27 Feb 2008 15:33]
Harrison Fisk
I have not been able to repeat this under 5.0.54 or above. Previously, I was able to repeat it in under 5 minutes regularly, and I let a test run for 4 hours without repeating it. So it seems to me that it is indeed fixed, or at least the issue my test case was revealing was fixed.
[29 Feb 2008 15:51]
Ramil Kalimullin
See bug #12691: Exec_master_log_pos corrupted with SQL_SLAVE_SKIP_COUNTER

Description: In certain circumstances the Exec_master_log_pos value in SHOW SLAVE STATUS can be reported as incorrect. It reports a very small size (the size of the event it seems) and not the correct value. The behavior required to reproduce it: 1. Use transactions 2. Update a non-transactional table during the transaction 3. Issue STOP SLAVE at exactly the correct time during log processing Inside of the transaction log, the end_log_pos is incorrect (as documented). Normally, this isn't a problem as the slave always keeps track of the beginning of the transaction location to report in SHOW SLAVE STATUS. However, due to the non-transactional table, it can report the incorrect position in some cases. Exec_master_log_pos is used for many different things, such as correcting corrupt relay logs, setting up a new slave, etc... so this information *must* be correct. The show slave status with the incorrect information: mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Master_Host: 192.168.1.105 Master_User: rep Master_Port: 3306 Connect_Retry: 60 Master_Log_File: hfisk-desktop-bin.000009 Read_Master_Log_Pos: 289315421 Relay_Log_File: ubuntu3-relay-bin.001329 Relay_Log_Pos: 830664 Relay_Master_Log_File: hfisk-desktop-bin.000009 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 78 Relay_Log_Space: 1048828 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL 1 row in set (0.01 sec) Error log contents proving that position 78 is incorrect: 070923 3:44:45 [Note] Slave SQL thread initialized, starting replication in log 'hfisk-desktop-bin.000009' at position 288796019, relay log '/usr/local/mysql/data/ubuntu3-relay-bin.001329' position: 528932 070923 3:44:45 [Note] Slave I/O thread: connected to master 'rep@192.168.1.105:3306', replication started in log 'hfisk-desktop-bin.000009' at position 289315421 070923 3:44:48 [Note] Slave I/O thread exiting, read up to log 'hfisk-desktop-bin.000009', position 289315421 070923 3:44:48 [Note] Slave SQL thread exiting, replication stopped in log 'hfisk-desktop-bin.000009' at position 78 Notice that the SQL thread started with: hfisk-desktop-bin.000009' at position 288796019 and ended with: 'hfisk-desktop-bin.000009' at position 78 I will attach the relay log ubuntu3-relay-bin.001329 for your viewing. How to repeat: 1. Setup replication 2. Create 1 transactional table and 1 non-transaction table, for example: CREATE TABLE test ( a int ) ENGINE=memory; CREATE TABLE test2 ( a int ) ENGINE=innodb; 3. Setup a process to insert/truncate from these tables. The script I used with gypsy was: n|1|BEGIN; n|2|INSERT INTO test VALUES (?)|int n|2|INSERT INTO test2 VALUES (?)|int n|1|COMMIT; With truncate occuring occasionally from a batch process. 4. Create a script to stop replication and check the position, I used: #/bin/bash while true; do mysqladmin -u root stop-slave mysql --vertical -u root -e "show slave status" | grep "Relay_Mast\|Exec" FOO=`mysql -e "show slave status\G" | grep "Exec" | awk '{ print $2}'` if [ $FOO -lt 1000 ] then echo "Found one"; exit fi sleep 1 mysqladmin -u root start-slave mysql --vertical -u root -e "show slave status" | grep "Relay_Mast\|Exec" sleep 1 done 5. Wait a while with both processes running for the script to stop Suggested fix: Make the Exec_master_log_pos always correct. Since Exec_master_log_pos relies on it, this will most likely involve making sure that the end_log_pos is correct for all statements.