Bug #54250 | Feature request for RBR auto-fix slave functionality | ||
---|---|---|---|
Submitted: | 5 Jun 2010 3:18 | Modified: | 6 Jun 2010 17:20 |
Reporter: | Shannon Wade | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S4 (Feature request) |
Version: | OS: | Any | |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[5 Jun 2010 3:18]
Shannon Wade
[7 Feb 2014 9:15]
Simon Mudd
Please adjust version to reflect 5.6 and 5.7.
[7 Feb 2014 9:24]
Simon Mudd
Also worth pointing out that if binlog_row_image = minimal then it may not be possible to recover completely automatically, but it might be desirable to fix things as well as possible. So I would like the auto-repair mode, in the case where a full row is not available to complete the repair, to also allow a "best effort repair" (not the default): in the case of a failed update (the row was missing), an insert of the PK plus "those columns provided by the update" is performed. The other columns are not known so cannot be handled, and it may be that there are NOT NULL columns which would then need manual intervention (which is fine), but if they do allow NULLs or have defaults then the incomplete update may be considered better than individual manual fixes. If "best effort repair mode" is enabled then any events handled this way should be counted separately, as normal auto-fix mode should safely repair any broken rows (when it can work).
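A minimal sketch of the "best effort repair" decision described above, written as a standalone Python model rather than server code. The `Column` type, the `best_effort_row` function and the schema layout are hypothetical names for illustration; nothing here is an existing MySQL API. It assumes the failed update supplies only the PK and the changed columns, and that a row can be built only if every other column has a default or allows NULL.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Sentinel meaning "this column has no DEFAULT clause" (hypothetical model).
_NO_DEFAULT = object()


@dataclass(frozen=True)
class Column:
    """Hypothetical column description: nullability and optional default."""
    nullable: bool = True
    default: Any = _NO_DEFAULT


def best_effort_row(schema: Dict[str, Column],
                    partial: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Build a full row from a minimal row image.

    Returns the row to insert, or None when some missing column is
    NOT NULL without a default, i.e. manual intervention is needed.
    """
    row: Dict[str, Any] = {}
    for name, col in schema.items():
        if name in partial:
            row[name] = partial[name]      # value provided by the update
        elif col.default is not _NO_DEFAULT:
            row[name] = col.default        # fall back to the column default
        elif col.nullable:
            row[name] = None               # unknown value, NULL is allowed
        else:
            return None                    # cannot repair automatically
    return row
```

A caller would count every row built this way separately from ordinary auto-fix repairs, since the resulting row may be incomplete.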
[7 Feb 2014 9:24]
Simon Mudd
See: bug#71618 for an example.
[6 Jun 2015 7:06]
Simon Mudd
Shannon opened this FR for me 5 years ago today. Today I had to work around an issue which this FR would have helped me with. So it's still valid now that 5.6.25 is the current 5.6 version and 5.7.7 rc is the soon-to-be-GA 5.7 version. Nothing has changed.
To be clear, I'd like to see:
* a slave_exec_mode=AUTO_REPAIR which does the following in RBR:
  * updates which fail due to a missing row trigger an insert, and a counter is incremented.
  * deletes which fail due to a missing row do nothing, and a counter is incremented.
  * inserts which fail due to a row already existing trigger an update, and a counter is incremented.
  * In the 3 cases above replication continues.
  * In the case of minimal RBR try to continue if possible. If you can't, then stop (e.g. changing an update to an insert when you don't have all the columns and they don't have defaults might mean you can't continue).
In any case, also add similar counters for when you're in slave_exec_mode=IDEMPOTENT:
* idempotent_skip_duplicate_row
* idempotent_skip_missing_row
* idempotent_.... other counters here
This solves a real problem and avoids basically switching to a "suck it and hope for the best" IDEMPOTENT mode which makes all errors invisible (that's good, you can't see them). While many people may be able to stop their systems, poke them and make them better, for others like myself the most important thing is to keep replication flowing to slaves so you don't rely on a single box which "might" die, leaving you with nothing. Later I can look at figuring out why stuff broke and what to do about it; one task would be to get things back in sync again, but all the while the "MySQL service" is still running. If you can't take downtime you really can't stop systems, as that affects clients and applications which expect the database to be there all the time. So fixing errors means fixing them on a broken "but working" system until it's no longer broken.
I'd be most delighted if this functionality makes it to MySQL as I've been bitten on a number of occasions by RBR breakage and life is not "good" when this happens. This auto-repair functionality would in many cases allow replication to be self-healing: not only would you avoid making the state of the downstream database servers worse (which is effectively what the IDEMPOTENT mode does), you'd actually be making it better.
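The requested AUTO_REPAIR behaviour can be sketched as a small in-memory model in Python. This is an illustration only: AUTO_REPAIR is the mode proposed in this request, not an existing slave_exec_mode value, and the `AutoRepairApplier` class and its counter names are hypothetical. The sketch assumes full row images, so a failed update always carries enough columns to become an insert.

```python
from collections import Counter
from typing import Any, Dict


class AutoRepairApplier:
    """Toy model of a replica applying row events to one table keyed by PK."""

    def __init__(self) -> None:
        self.table: Dict[Any, Dict[str, Any]] = {}
        # Hypothetical counter names; the request only asks that counters exist.
        self.counters: Counter = Counter()

    def apply_insert(self, pk: Any, row: Dict[str, Any]) -> None:
        if pk in self.table:
            # Row already exists: treat the insert as an update and count it.
            self.counters["insert_became_update"] += 1
        self.table[pk] = dict(row)

    def apply_update(self, pk: Any, after_image: Dict[str, Any]) -> None:
        if pk not in self.table:
            # Row is missing: insert the after-image instead and count it.
            self.counters["update_became_insert"] += 1
            self.table[pk] = dict(after_image)
        else:
            self.table[pk].update(after_image)

    def apply_delete(self, pk: Any) -> None:
        if pk not in self.table:
            # Row is missing: nothing to do, but keep the event visible.
            self.counters["delete_missing_row"] += 1
            return
        del self.table[pk]
```

In all three repaired cases the applier keeps going, and the counters record how often a repair happened, so the breakage stays visible instead of being silently skipped.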
[9 Dec 2018 9:12]
Jo Goossens
This sounds like a great idea and way better than current implementation!