Bug #86171 | master_pos_wait incompatible with MTS | ||
---|---|---|---|
Submitted: | 4 May 2017 1:53 | Modified: | 2 Nov 2017 13:49 |
Reporter: | Trey Raymond | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.6 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | replication |
[4 May 2017 1:53]
Trey Raymond
[4 May 2017 4:07]
Trey Raymond
this needs polish, but here's a pretty basic way to get current position that supports mts and non mts: ( select master_log_name,master_log_pos from mysql.slave_worker_info w join ( select @tid:=@tid+1 as tid,t.PROCESSLIST_TIME from performance_schema.threads t join (select @tid:=0) tmp where t.NAME='thread/sql/slave_worker' order by THREAD_ID asc ) tmp on tmp.tid=w.Id where tmp.PROCESSLIST_TIME=0 order by master_log_name asc,master_log_pos asc limit 1 ) union all ( select master_log_name,master_log_pos from slave_worker_info order by master_log_name desc,master_log_pos desc limit 1 ) union all ( select master_log_name,master_log_pos from slave_relay_log_info ) limit 1
[17 Aug 2017 10:33]
MySQL Verification Team
Hi, If I understand you correctly you are reporting that when you exec master_pos_wait() it's waiting till the checkpoint and not returning "during" checkpoint as soon as event is read? The manual clearly states "...It blocks until the slave has read and applied all updates up to the specified position...". The update is fully "applied" only after the checkpoint so it's actually behaving as expected. Or am I misunderstanding your report? kind regards Bogdan
[18 Aug 2017 19:22]
Trey Raymond
It's quite a stretch to say ""The update is fully "applied" only after the checkpoint"" - every thread on the db will consider an update applies after it commits, with no regard to checkpointing. This isn't a very easy fix, I know. It's not trivial for a thread to check every worker's execution since the last checkpoint, as it's not ordered or distributed in a predictable way. Problem is, the function's not very useful without a fix (well, not useful with MTS). I don't know how much it's used industry wide - we don't depend on it, but others might. I've seen switchover scripts and such that will call this after setting one host to read_only to wait for the new master to sync. In this case, it could wait until the max value of slave_checkpoint_period, as stopped writes wouldn't trigger one based on _group, and that might be an enormous amount of time. Anyone using a script like that should be aware of this. If fixing the function is impractical, then the documentation should at least be clear on this, specifying that it will wait until there's a checkpoint at or past the specified position, vs just waiting until it's been applied which is often much shorter.
[21 Aug 2017 18:59]
Trey Raymond
If you do decide to fix, logic would be like: Check all worker threads that have been active since the last checkpoint, and see if they have all processed up to/including the specified log position. this data is tracked internally, just needs logic to use it. Checking just the thread which processed the specified position isn't good enough, as others could be further behind.
[23 Aug 2017 14:19]
MySQL Verification Team
Hi, I do agree documentation here might be sketchy so that should be fixed for sure hence marking the bug as verified. As for the switch over, well those scripts will still work, they will just have to wait few more seconds/minutes longer then they would really have to - but on the other side, before checkpoint it is "unsafe" really to assume anything hence imo this behavior is, for production, and this case of switchover scripting, better and safer, then reporting "as soon" and not "after checkpoint" as it is now. In any case I agree that at least documentation should be clear on what's going on. many thanks Bogdan
[23 Aug 2017 17:26]
Trey Raymond
Yeah, it's not a good way to do switchovers, and doesn't affect me personally - this issue came up on freenode a while back when someone was looking for help, using some mysql switchover tool that used this method. it's not great, but people are using it, maybe they'll stop if it's more clearly documented.
[2 Nov 2017 13:49]
Margaret Fisher
Posted by developer: Thanks for the information. Added for 5.6, 5.7, and 8.0 in https://dev.mysql.com/doc/refman/5.6/en/miscellaneous-functions.html#function_master-pos-w... On a multi-threaded slave, the function waits until expiry of the limit set by the slave_checkpoint_group or slave_checkpoint_period system variable, when the checkpoint operation is called to update the status of the slave. Depending on the setting for the system variables, the function might therefore return some time after the specified position was reached.