Description:
There are 4 ( 0 - 3 ) stages in the binlog commit.
The log_status only hold mutex LOCK_log to protect binlog.
inline mysql_mutex_t *get_log_lock() { return &LOCK_log; }
void Log_resource_binlog_wrapper::lock() {
mysql_mutex_lock(binlog->get_log_lock());
}
The binlog position is updated at the state #1, but the GTID will be updated in the state #3.
In the stage #2, It will release the mutex LOCK_log.
if (change_stage(thd, Commit_stage_manager::SYNC_STAGE, wait_queue, &LOCK_log,
&LOCK_sync)) {
DBUG_PRINT("return", ("Thread ID: %u, commit_error: %d", thd->thread_id(),
thd->commit_error));
return finish_commit(thd);
}
If we get the result in the stage #2, we will get the GTID behind the file position.
When the bug occurs, the GTID is 1-14, and the position is 1937.
select local from performance_schema.log_status\G
*************************** 1. row ***************************
local: {"gtid_executed": "00008035-0000-0000-0000-000000008035:1-14", "binary_log_file": "binlog.000004", "binary_log_position": 1937}
If we get the result after the stage 3, the GTID is 1-15.
select local from performance_schema.log_status\G
*************************** 1. row ***************************
local: {"gtid_executed": "00008035-0000-0000-0000-000000008035:1-15", "binary_log_file": "binlog.000004", "binary_log_position": 1937}
Percona server use 2 more mutex LOCK_sync, and LOCK_commit to fix this issue.
void Log_resource_gtid_state_wrapper::lock() {
DEBUG_SYNC(current_thd, "log_resource_gtid_lock");
mysql_mutex_lock(binlog->get_sync_lock());
mysql_mutex_lock(binlog->get_commit_lock());
global_sid_lock->wrlock();
}
However, It is still a bug in Percona version when the binlog_order_commits is OFF.
How to repeat:
1. start a gdb
# gdb
(gdb) set non-stop on
(gdb) attach ${the process id of mysqld}
(gdb) set substitute-path ../../mysql-8.0.34/ /home/root/mysql-server/
(gdb) b sql/binlog.cc:9036
Breakpoint 1 at 0x1d5d435: file ../../mysql-8.0.34/sql/binlog.cc, line 9036.
(gdb) c -a
Continuing.
2. start a dml or ddl sql
# the sql will suspend at the break point
mysql [localhost:8035] {msandbox} ((none)) > create database a;
3. get the result from log_status
# the sql will wait the LOCK_log mutex
time:05:52)[db: (none)] select local from performance_schema.log_status\G
4. continue the dml or ddl query on the gdb
(gdb) thread 38
[Switching to thread 38 (Thread 0x7fe25c30f640 (LWP 187653))]
#0 MYSQL_BIN_LOG::ordered_commit () at ../../mysql-8.0.34/sql/binlog.cc:9036
9036 if (change_stage(thd, Commit_stage_manager::SYNC_STAGE, wait_queue, &LOCK_log,
(gdb) n
(gdb) c -a
Continuing.
5. get wrong GTID from the log_status
time:05:52)[db: (none)] select local from performance_schema.log_status\G
*************************** 1. row ***************************
local: {"gtid_executed": "00008035-0000-0000-0000-000000008035:1-14", "binary_log_file": "binlog.000004", "binary_log_position": 1937}
time:05:52)[db: (none)] select local from performance_schema.log_status\G
*************************** 1. row ***************************
local: {"gtid_executed": "00008035-0000-0000-0000-000000008035:1-15", "binary_log_file": "binlog.000004", "binary_log_position": 1937}
Suggested fix:
Adding 2 more mutex LOCK_sync, and LOCK_commit to fix this issue.
void Log_resource_gtid_state_wrapper::lock() {
DEBUG_SYNC(current_thd, "log_resource_gtid_lock");
mysql_mutex_lock(binlog->get_sync_lock());
mysql_mutex_lock(binlog->get_commit_lock());
global_sid_lock->wrlock();
}