Description:
Sometimes a query succeeds on the master, yet an error code (an EE_ error code, not an ER_ error code) is still written to the binary log for that query.
Example:
CREATE TABLE t (a INT);
manually delete t.MYI
DROP TABLE t;
The server says "ok" to the client (no error), but because my_delete("t.MYI") failed with error code EE_DELETE (a non-critical failure, since the MYI did not exist and our goal was to remove it anyway), thd->net.last_error is EE_DELETE (6). So the query is logged in the binlog with error code 6.
To sum up: the client got no error, but an error code is written to the binlog.
So if the query runs fine on the slave (my_delete() succeeds), the slave will say: "got error 0 (no error), expected error 6 (invalid error code)".
The customer's case:
mysqld got a "disk full" on the master when writing a MyISAM table; mysqld retried the write and succeeded, so no error was reported to the client. But thd->net.last_error was still left set to EE_WRITE, which was written to the binlog, and the slave stopped. (True, the "disk full" was logged in the master's .err file.)
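The mechanism can be modelled outside the server. This is only a minimal sketch, not the actual mysqld code: Session, BinlogEvent, delete_file() and drop_table() are hypothetical stand-ins for thd->net, the binlog query event, my_delete() and the DROP TABLE code path. The only value taken from this report is EE_DELETE = 6.

```cpp
#include <string>

// Hypothetical constants; only the value 6 for EE_DELETE comes from the report.
constexpr int NO_ERROR_CODE = 0;
constexpr int EE_DELETE_CODE = 6;

// Hypothetical stand-in for thd->net: remembers the last error seen.
struct Session {
    int last_error = NO_ERROR_CODE;
};

// Hypothetical stand-in for a binlog query event.
struct BinlogEvent {
    std::string query;
    int error_code;
};

// Models my_delete(): on failure it records an EE_ code in the session.
bool delete_file(Session& s, bool file_exists) {
    if (!file_exists) {
        s.last_error = EE_DELETE_CODE;
        return false;
    }
    return true;
}

struct DropResult {
    bool client_ok;      // what the client is told
    BinlogEvent logged;  // what ends up in the binlog
};

// Models DROP TABLE: a missing .MYI is tolerated, so the client gets "ok",
// but the stale last_error is still copied into the binlog event.
DropResult drop_table(Session& s, const std::string& name, bool myi_exists) {
    delete_file(s, myi_exists);  // return value deliberately ignored
    return { true, { "DROP TABLE " + name, s.last_error } };
}
```

With the .MYI missing, drop_table() reports success to the client while the logged event carries code 6; with the file present, the logged code is 0.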
How to repeat:
On master:
CREATE TABLE t (a INT);
manually delete t.MYI
DROP TABLE t;
see slave stop with error
Suggested fix:
As EE_ errors in the binlog can probably be considered negligible (because if the master logged no ER_ error in the binlog, it considers the query successful), there are 2 ways:
1) master should write thd->net.last_error to binlog only if it's an ER_
2) slave should ignore error read in binlog if it's not an ER_.
Advantages of 2):
* to remove the bug, users just need to upgrade their slave, which is usually easier than upgrading the master.
* EE_ codes are in the binlog, they can be used for debugging
Advantage of 1):
more logical (don't write an error code if you consider the query successful).
I vote for 2) and will propose a patch along those lines.
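A rough sketch of what 2) could look like on the slave side. This is not the proposed patch: the function names are hypothetical, and it assumes that server ER_ codes start at 1000 while EE_ codes are small numbers below that.

```cpp
// Assumption: ER_ codes begin at 1000; anything lower is treated as EE_.
constexpr int FIRST_ER_CODE = 1000;

// Option 2: an expected code that is not an ER_ code counts as "no error".
int normalize_expected_error(int code_from_binlog) {
    return code_from_binlog >= FIRST_ER_CODE ? code_from_binlog : 0;
}

// Hypothetical model of the slave's check after executing a query.
// ignore_non_er = false reproduces the current behaviour,
// ignore_non_er = true applies option 2.
bool slave_should_stop(int expected_from_binlog, int actual_on_slave,
                       bool ignore_non_er) {
    const int expected = ignore_non_er
        ? normalize_expected_error(expected_from_binlog)
        : expected_from_binlog;
    return expected != actual_on_slave;
}
```

With the DROP TABLE example, slave_should_stop(6, 0, false) is true (the bug), while slave_should_stop(6, 0, true) is false. A genuine ER_ mismatch such as expected 1062 (ER_DUP_ENTRY) against actual 0 still stops the slave.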