Description:
There are 4 ( 0 - 3 ) stages in the binlog commit.
The log_status only hold mutex LOCK_log to protect binlog.
inline mysql_mutex_t *get_log_lock() { return &LOCK_log; }
void Log_resource_binlog_wrapper::lock() {
mysql_mutex_lock(binlog->get_log_lock());
}
The binlog position is updated at the state #1, but the GTID will be updated in the state #3.
In the stage #2, It will release the mutex LOCK_log.
if (change_stage(thd, Commit_stage_manager::SYNC_STAGE, wait_queue, &LOCK_log,
&LOCK_sync)) {
DBUG_PRINT("return", ("Thread ID: %u, commit_error: %d", thd->thread_id(),
thd->commit_error));
return finish_commit(thd);
}
If we get the result in the stage #2, we will get the GTID behind the file position.
When the bug occurs, the GTID is 1-14, and the position is 1937.
select local from performance_schema.log_status\G
*************************** 1. row ***************************
local: {"gtid_executed": "00008035-0000-0000-0000-000000008035:1-14", "binary_log_file": "binlog.000004", "binary_log_position": 1937}
If we get the result after the stage 3, the GTID is 1-15.
select local from performance_schema.log_status\G
*************************** 1. row ***************************
local: {"gtid_executed": "00008035-0000-0000-0000-000000008035:1-15", "binary_log_file": "binlog.000004", "binary_log_position": 1937}
Percona server use 2 more mutex LOCK_sync, and LOCK_commit to fix this issue.
void Log_resource_gtid_state_wrapper::lock() {
DEBUG_SYNC(current_thd, "log_resource_gtid_lock");
mysql_mutex_lock(binlog->get_sync_lock());
mysql_mutex_lock(binlog->get_commit_lock());
global_sid_lock->wrlock();
}
However, It is still a bug in Percona version when the binlog_order_commits is OFF.
How to repeat:
1. start a gdb
# gdb
(gdb) set non-stop on
(gdb) attach ${the process id of mysqld}
(gdb) set substitute-path ../../mysql-8.0.34/ /home/root/mysql-server/
(gdb) b sql/binlog.cc:9036
Breakpoint 1 at 0x1d5d435: file ../../mysql-8.0.34/sql/binlog.cc, line 9036.
(gdb) c -a
Continuing.
2. start a dml or ddl sql
# the sql will suspend at the break point
mysql [localhost:8035] {msandbox} ((none)) > create database a;
3. get the result from log_status
# the sql will wait the LOCK_log mutex
time:05:52)[db: (none)] select local from performance_schema.log_status\G
4. continue the dml or ddl query on the gdb
(gdb) thread 38
[Switching to thread 38 (Thread 0x7fe25c30f640 (LWP 187653))]
#0 MYSQL_BIN_LOG::ordered_commit () at ../../mysql-8.0.34/sql/binlog.cc:9036
9036 if (change_stage(thd, Commit_stage_manager::SYNC_STAGE, wait_queue, &LOCK_log,
(gdb) n
(gdb) c -a
Continuing.
5. get wrong GTID from the log_status
time:05:52)[db: (none)] select local from performance_schema.log_status\G
*************************** 1. row ***************************
local: {"gtid_executed": "00008035-0000-0000-0000-000000008035:1-14", "binary_log_file": "binlog.000004", "binary_log_position": 1937}
time:05:52)[db: (none)] select local from performance_schema.log_status\G
*************************** 1. row ***************************
local: {"gtid_executed": "00008035-0000-0000-0000-000000008035:1-15", "binary_log_file": "binlog.000004", "binary_log_position": 1937}
Suggested fix:
Adding 2 more mutex LOCK_sync, and LOCK_commit to fix this issue.
void Log_resource_gtid_state_wrapper::lock() {
DEBUG_SYNC(current_thd, "log_resource_gtid_lock");
mysql_mutex_lock(binlog->get_sync_lock());
mysql_mutex_lock(binlog->get_commit_lock());
global_sid_lock->wrlock();
}
Description: There are 4 ( 0 - 3 ) stages in the binlog commit. The log_status only hold mutex LOCK_log to protect binlog. inline mysql_mutex_t *get_log_lock() { return &LOCK_log; } void Log_resource_binlog_wrapper::lock() { mysql_mutex_lock(binlog->get_log_lock()); } The binlog position is updated at the state #1, but the GTID will be updated in the state #3. In the stage #2, It will release the mutex LOCK_log. if (change_stage(thd, Commit_stage_manager::SYNC_STAGE, wait_queue, &LOCK_log, &LOCK_sync)) { DBUG_PRINT("return", ("Thread ID: %u, commit_error: %d", thd->thread_id(), thd->commit_error)); return finish_commit(thd); } If we get the result in the stage #2, we will get the GTID behind the file position. When the bug occurs, the GTID is 1-14, and the position is 1937. select local from performance_schema.log_status\G *************************** 1. row *************************** local: {"gtid_executed": "00008035-0000-0000-0000-000000008035:1-14", "binary_log_file": "binlog.000004", "binary_log_position": 1937} If we get the result after the stage 3, the GTID is 1-15. select local from performance_schema.log_status\G *************************** 1. row *************************** local: {"gtid_executed": "00008035-0000-0000-0000-000000008035:1-15", "binary_log_file": "binlog.000004", "binary_log_position": 1937} Percona server use 2 more mutex LOCK_sync, and LOCK_commit to fix this issue. void Log_resource_gtid_state_wrapper::lock() { DEBUG_SYNC(current_thd, "log_resource_gtid_lock"); mysql_mutex_lock(binlog->get_sync_lock()); mysql_mutex_lock(binlog->get_commit_lock()); global_sid_lock->wrlock(); } However, It is still a bug in Percona version when the binlog_order_commits is OFF. How to repeat: 1. start a gdb # gdb (gdb) set non-stop on (gdb) attach ${the process id of mysqld} (gdb) set substitute-path ../../mysql-8.0.34/ /home/root/mysql-server/ (gdb) b sql/binlog.cc:9036 Breakpoint 1 at 0x1d5d435: file ../../mysql-8.0.34/sql/binlog.cc, line 9036. (gdb) c -a Continuing. 2. start a dml or ddl sql # the sql will suspend at the break point mysql [localhost:8035] {msandbox} ((none)) > create database a; 3. get the result from log_status # the sql will wait the LOCK_log mutex time:05:52)[db: (none)] select local from performance_schema.log_status\G 4. continue the dml or ddl query on the gdb (gdb) thread 38 [Switching to thread 38 (Thread 0x7fe25c30f640 (LWP 187653))] #0 MYSQL_BIN_LOG::ordered_commit () at ../../mysql-8.0.34/sql/binlog.cc:9036 9036 if (change_stage(thd, Commit_stage_manager::SYNC_STAGE, wait_queue, &LOCK_log, (gdb) n (gdb) c -a Continuing. 5. get wrong GTID from the log_status time:05:52)[db: (none)] select local from performance_schema.log_status\G *************************** 1. row *************************** local: {"gtid_executed": "00008035-0000-0000-0000-000000008035:1-14", "binary_log_file": "binlog.000004", "binary_log_position": 1937} time:05:52)[db: (none)] select local from performance_schema.log_status\G *************************** 1. row *************************** local: {"gtid_executed": "00008035-0000-0000-0000-000000008035:1-15", "binary_log_file": "binlog.000004", "binary_log_position": 1937} Suggested fix: Adding 2 more mutex LOCK_sync, and LOCK_commit to fix this issue. void Log_resource_gtid_state_wrapper::lock() { DEBUG_SYNC(current_thd, "log_resource_gtid_lock"); mysql_mutex_lock(binlog->get_sync_lock()); mysql_mutex_lock(binlog->get_commit_lock()); global_sid_lock->wrlock(); }