Bug #50078 | Executing REPAIR on master with broken table does not correctly update slave | ||
---|---|---|---|
Submitted: | 5 Jan 2010 8:45 | Modified: | 11 May 2010 15:43 |
Reporter: | Roel Van de Paar | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Documentation | Severity: | S2 (Serious) |
Version: | ALL | OS: | Any |
Assigned to: | Jon Stephens | CPU Architecture: | Any |
[5 Jan 2010 8:45]
Roel Van de Paar
[11 Jan 2010 10:50]
Sveta Smirnova
Thank you for the report. Strictly say this is not a bug: master doesn't know which records replicated on slave. But report can be verified as feature request though.
[12 Jan 2010 19:03]
Ann Harrison
Even more strictly speaking, the problem is that repair is recreating rows that were stored by a statement that failed. Many years ago, Monty explained to me that MyISAM is a transactional engine with the restriction that each transaction can include at most one statement. Turns out that's not true, of course, but if it were, the repair wouldn't bring back any of the results of the big recursive insert. Combining a storage engine without verb-level undo and statement level replication is a formula for disaster.
[17 Jan 2010 21:12]
Roel Van de Paar
> Changing how the repair command is written into the binary log will not solve the problem. I think it will (or at least alleviate it): The repair is aware of what is changing/happening: mysql> repair table test1; +-------------+--------+----------+------------------------------------------+ | Table | Op | Msg_type | Msg_text | +-------------+--------+----------+------------------------------------------+ | roelt.test1 | repair | warning | Number of rows changed from 5 to 8800548 | | roelt.test1 | repair | status | OK | +-------------+--------+----------+------------------------------------------+ 2 rows in set (1.59 sec) It is aware of the previous state: 5 rows. It is aware of the new state: x rows. As such, it could write rows 6-x to the binary log. True, this is an example only, and there are other cases. Any improvements of the repair ('intelligent repair') would make the situation much better as it is today though. Another option is to fully replicate a successfully repaired table. > What if the repair fails? Then, fail replication on the slave with a error: 'Repair of table <tablename> on master failed. There is now a difference between table <tablename> on the master and the slave. Please check the table on both servers and keep the best copy.'
[17 Jan 2010 21:17]
Roel Van de Paar
Another consideration one may have is that we are trying to make MyISAM transactional. But this is not the case: the issue is rather the repair which tries to 'smart' modify the table. And, in the case of replication, this causes an issue, as per the above.
[24 Feb 2010 8:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/101294 3354 Alfranio Correia 2010-02-24 BUG#50078 Executing REPAIR on master with broken table does not correctly update slave Replicating modified rows when running a repair on a myisam's table requires changes on myisam because these modified rows do not notify the handlers used by the replication, most likely due to the low level nature of the operations. Furthermore, if such changes were implemented on myisam and replicated, this would work only in RBR and would harm a basic principle that DDL statements should be replicated as plain text. For those reasons, we do not change the replication in such way. Generating an incident event and stopping slaves when a repair changes a table would be an alternative. In other words, we want to avoid stopping the slaves when there is nothing to be repaired. However, this would require either to modify the handler's interface, change its semantics or break encapsulation: -- Modifying the handler's Interface -- There is nothing in the handler interface that allows us to know if the underlying table was changed or not. int handler::ha_repair(THD* thd, HA_CHECK_OPT* check_opt) int handler::ha_repair(THD* thd, HA_CHECK_OPT* check_opt) { int result; mark_trx_read_write(); if ((result= repair(thd, check_opt))) return result; return update_frm_version(table); } -- Changing its semantics -- The return value does not allow us to distinguish between a successful execution that has changed data, a successful execution that has changed nothing and between a failed execution that has changed data and another one that has changed nothing. So, we would need to return a bitmap to circumvent this. Indirectly, this approach represents a change on the handler's interface. -- Breaking encapsulation -- The message reported by the repair command when something is changed is generated by the engine and to know what exactly happened would require to parse the THD::Protocol *protocol thus breaking encapsulation. So, in this patch we only raise the level of the message that reports that the myisam's table was changed while running a repair, from a note to a warning message. === modified file 'storage/myisam/ha_myisam.cc' --- storage/myisam/ha_myisam.cc 2009-10-27 14:27:27 +0000 +++ storage/myisam/ha_myisam.cc 2010-02-23 16:11:14 +0000 @@ -1053,7 +1053,7 @@ !(check_opt->flags & T_VERY_SILENT)) { char llbuff[22],llbuff2[22]; - sql_print_information("Found %s of %s rows when repairing '%s'", + sql_print_warning("Found %s of %s rows when repairing '%s'", llstr(file->state->records, llbuff), llstr(start_records, llbuff2), table->s->path.str); @ mysql-test/r/mysqlcheck.result Changed test case due to the warning message. @ mysql-test/r/repair.result Changed test case due to the warning message. @ mysql-test/suite/federated/federated.result Changed test case due to the warning message. @ mysql-test/suite/federated/federated.test Changed test case due to the warning message. @ mysql-test/t/mysqlcheck.test Changed test case due to the warning message. @ mysql-test/t/repair.test Changed test case due to the warning message. @ storage/myisam/ha_myisam.cc Changed the message from a note to a warning.
[25 Feb 2010 10:51]
Roel Van de Paar
Alfranio, Would there be any way we can do: > Found %s1 of %s2 rows when repairing '%s3' IF %s1<%s2 THEN STOP SLAVE & list inconsistency error? Though I can see from your comments that such a solution would likely be complex, I think we do need to stop the slave and tell people that their data is now inconsistent. Otherwise, the error may go unnoticed (automatic repairs?).
[9 Mar 2010 0:43]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/102636 3375 Alfranio Correia 2010-03-09 BUG#50078 Executing REPAIR on master with broken table does not correctly update slave REPAIR commands executed on tables that support it and whose logging are not suppressed (i.e. LOCAL or NO_WRITE_TO_BINLOG) are written to the binary log along with an incident event to indicate that the master and slaves might be out of sync and some manual maintenence is required.
[19 Mar 2010 16:50]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/103868 3412 Alfranio Correia 2010-03-19 BUG#50078 Executing REPAIR on master with broken table does not correctly update slave When the REPAIR command is executed on the master with a broken table, the slave will be out of sync because changes on the table will not be propagated to it. To avoid that a possible divergence goes unnoticed, we do what follows. Every REPAIR commands executed on tables that support it and whose logging are not suppressed (i.e. LOCAL or NO_WRITE_TO_BINLOG) are written to the binary log along with an incident event INCIDENT_TABLE_REPAIRED to indicate that the master and slaves may be out of sync and some manual maintenance is required. @ mysql-test/suite/rpl/r/rpl_repair_incident.result Added a test case. @ mysql-test/suite/rpl/t/rpl_repair_incident-master.opt Added a test case. @ mysql-test/suite/rpl/t/rpl_repair_incident.test Added a test case. @ sql/log_event.cc Added a new type of incident event INCIDENT_TABLE_REPAIRED and by consequence added a description on it. @ sql/rpl_constants.h Added a new type of incident event INCIDENT_TABLE_REPAIRED. @ sql/sql_table.cc Modifieded the function mysql_admin_table in order to write an incident event to binary log when a REPAIR command is executed.
[22 Mar 2010 3:13]
Roel Van de Paar
Hi Alfranio, thanks for the patch you created. Looking at it, I cannot immediately see what error/warning would be generated - or is the idea to create an incident first, then have the incident handled by the various clients? Also, ideally, besides logging an error to the error log, it would be good to return an error to the client immediately when a REPAIR was issued so the user becomes aware of the issue immediately.
[7 Apr 2010 14:49]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/105159 3447 Alfranio Correia 2010-04-07 BUG#50078 Executing REPAIR on master with broken table does not correctly update slave When the REPAIR command is executed on the master with a broken table, the slave will be out of sync because changes on the table will not be propagated to it. To avoid that a possible divergence goes unnoticed, we do what follows. Every REPAIR commands executed on tables that support it and whose logging are not suppressed (i.e. LOCAL or NO_WRITE_TO_BINLOG) are written to the binary log along with an incident event INCIDENT_TABLE_REPAIRED to indicate that the master and slaves may be out of sync and some manual maintenance is required. @ mysql-test/r/archive.result Updated the result file with information on incident event. @ mysql-test/r/ps.result Updated the result file with information on incident event. @ mysql-test/suite/rpl/r/rpl_repair_incident.result Added a test case. @ mysql-test/suite/rpl/t/rpl_repair_incident-master.opt Added a test case. @ mysql-test/suite/rpl/t/rpl_repair_incident.test Added a test case. @ sql/log_event.cc Added a new type of incident event INCIDENT_TABLE_REPAIRED and by consequence added a description on it. @ sql/rpl_constants.h Added a new type of incident event INCIDENT_TABLE_REPAIRED. @ sql/sql_table.cc Modifieded the function mysql_admin_table in order to write an incident event to binary log when a REPAIR command is executed.
[22 Apr 2010 15:58]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/106376 3447 Alfranio Correia 2010-04-22 BUG#50078 Executing REPAIR on master with broken table does not correctly update slave When the REPAIR command is executed on the master with a broken table, the slave will be out of sync because changes on the table will not be propagated to it. To avoid that a possible divergence goes unnoticed, we do what follows. Every REPAIR commands executed on tables that support it and whose logging are not suppressed (i.e. LOCAL or NO_WRITE_TO_BINLOG) are written to the binary log along with an incident event INCIDENT_TABLE_REPAIRED to indicate that the master and slaves may be out of sync and some manual maintenance is required. @ mysql-test/r/archive.result Updated the result file with information on incident event. @ mysql-test/r/ps.result Updated the result file with information on incident event. @ mysql-test/suite/rpl/r/rpl_repair_incident.result Added a test case. @ mysql-test/suite/rpl/t/rpl_repair_incident-master.opt Added a test case. @ mysql-test/suite/rpl/t/rpl_repair_incident.test Added a test case. @ sql/log_event.cc Added a new type of incident event INCIDENT_TABLE_REPAIRED and by consequence added a description on it. @ sql/rpl_constants.h Added a new type of incident event INCIDENT_TABLE_REPAIRED. @ sql/sql_table.cc Modified the function mysql_admin_table in order to write an incident event to binary log when a REPAIR command is executed.
[23 Apr 2010 10:11]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/106421 3447 Alfranio Correia 2010-04-23 BUG#50078 Executing REPAIR on master with broken table does not correctly update slave When the REPAIR command is executed on the master with a broken table, the slave will be out of sync because changes on the table will not be propagated to it. To avoid that a possible divergence goes unnoticed, we do what follows. Every REPAIR commands executed on tables that support it and whose logging are not suppressed (i.e. LOCAL or NO_WRITE_TO_BINLOG) are written to the binary log along with an incident event INCIDENT_TABLE_REPAIRED to indicate that the master and slaves may be out of sync and some manual maintenance is required. @ mysql-test/r/archive.result Updated the result file with information on incident event. @ mysql-test/r/ps.result Updated the result file with information on incident event. @ mysql-test/suite/rpl/r/rpl_repair_incident.result Added a test case. @ mysql-test/suite/rpl/t/rpl_repair_incident-master.opt Added a test case. @ mysql-test/suite/rpl/t/rpl_repair_incident.test Added a test case. @ sql/log_event.cc Added a new type of incident event INCIDENT_TABLE_REPAIRED and by consequence added a description on it. @ sql/rpl_constants.h Added a new type of incident event INCIDENT_TABLE_REPAIRED. @ sql/sql_table.cc Modified the function mysql_admin_table in order to write an incident event to binary log when a REPAIR command is executed.
[4 May 2010 14:01]
Jon Stephens
Since this is really a Docs bug, I've changed category/status/lead accordingly and assigned to myself for resolution. Also updated affected/target versions.
[11 May 2010 15:43]
Jon Stephens
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.
[8 Jun 2010 2:09]
Roel Van de Paar
For anyone reading this, the outcome of this bug so far was that it was documented that a user must check both the master and the slave for data consistency after a MyISAM table was broken.