Description:
I have a setup master/slave in which applications are pointed only to master. Yesterday Master had got crashed due to "multi bit error on dimm detected" in Front indication panel in orange color.
After reboot I see the master went for recovery then it came up and application started to use the Master normally but still the err msg exist in Front indication panel. But now the slave had stuck out due to primary key constraint for a table.
The problem I face is master has a table data until what exist in binlog of master. But slave has the table data what is not in master binlog. Below is the table details when compared to Master table and Slave table.
Last info in binlog timestamp is 2013-04-29 02:13:11
System shut down timestamp 2013-04-29 02:54
On Master
==========
mysql> select * from audit where id=11298907;
------------------------------------------------------------
| id | ipaddress | dated | msisdn |
+----------+----------------+-------------------------------
|11298907 | 82.25.226.183 | 2013-04-29 02:13:11 | 998282821|
-----------------------------------------------------------
mysql> select * from audit where id > 11298907 limit 1;
-----------------------------------------------------------
| id | ipaddress | dated |msisdn |
+----------+----------------+------------------------------
|11298908 | 82.25.226.183 | 2013-04-29 04:31:13 | 992828111|
------------------------------------------------------------
On Slave
==========
mysql> select * from audit where id=11298907;
------------------------------------------------------------
| id | ipaddress | dated | msisdn |
+----------+----------------+-------------------------------
|11298907 | 82.25.226.183 | 2013-04-29 02:13:11 | 998282821|
-----------------------------------------------------------
mysql> select * from audit where id > 11298907 limit 1;
-----------------------------------------------------------
| id | ipaddress | dated |msisdn |
+--------+----------------+---------------------------------
|11298908 | 82.25.226.183 | 2013-04-29 02:13:12 | 762616173|
------------------------------------------------------------
Both the servers are ntp synched with GST. And they remain same after Master reboot too. It took almost 2 hrs to bring up Master Server. How come only slave can have the data that Master is not aware of?
Slave takes information from Master's binlog once it is written to disk, and that is how Slave DB server gets updated. ( I can get convinced when before flushing to disk the DB server got crashed and that is why I get previous data on Master). But what about the lines that found missing for me in Master binlog?
I don't find any trimmed information from Master err.log . As I see this is a vital flaw when it comes to Slave continuing from point of failure, it throws primary key constraint for the records for new entries from Master to Slave. I'm reporting this as a bug.
How to repeat:
Hit with a regular TPS of 100 to a Master 5.0.77 Mysql server under SBL that has slave of the same type hearing in the same network. Only normal DMLS with Seconds_Behind_Master=0. Crash the Master mysql server by doing a force reboot. After Master recovery, cross check the auto increments on Master and Slave tables. It says Slave having greater values than Master. Thus proving Master lags behind slave and Slave couldn't hear from Master again.
Suggested fix:
After recovery and rollback of Master DB data provide information about Master binlog position as rolled back checkpoint into binary file to rollback on Slave, whilst relay log info persist in Slave and auto rollback on Slave to the point where Master binlog checkpoint points.