MySQL Bugs: #110266: The secondary node of the mgr cluster has a deadlock

Bug #110266	The secondary node of the mgr cluster has a deadlock
Submitted:	3 Mar 2023 6:40	Modified:	7 Mar 2023 10:53
Reporter:	xichao li (OCA)	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Group Replication	Severity:	S3 (Non-critical)
Version:	8.0.27	OS:	Any
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
In our mysql mgr cluster (3 nodes, single master architecture), a large number of deadlock information is found on all slave nodes, resulting in a very slow application rate, resulting in application delay (fortunately, this situation does not affect the external service provided by the master node).

Key configurations are as follows:
binlog_transaction_dependency_tracking = WRITESET
transaction_write_set_extraction=XXHASH64
slave_preserve_commit_order=on

By comparison, it is found that the slave node deadlock always occurs on the unique index, which seems to be different from the existing several deadlock scenarios:
trx1 hold    lock : row 2023021616301333611500340  lock mode S locks gap before rec
     waiting lock : row 2023021611073027911500253  lock_mode X locks gap before rec insert intention waiting
trx  hold    lock : row 2023021611073027911500253  lock mode S locks gap before rec
     waiting lock : row 2023021616301333611500340  lock_mode X locks gap before rec insert intention waiting

According to the implementation algorithm of mysql write set, whether the primary node can judge whether multiple transactions can be concurrent is whether the primary key and unique index key value can be used to judge whether multiple transactions can be concurrent when playing back from the library.
However, when playing back concurrently from the library, there may be a conflict on the gap due to intermittent locking on the unique index, resulting in deadlock.

We intercepted an error log and the information that the index structure has been deadlocked, See attachment.

How to repeat:
Concurrent execution of the following sessions on the primary node in the mgr cluster may be easily triggered.
REPLACE INTO  stat_call_task_robot_day(tenant_id,talk_duration,time_dimension,ring_duration,robot_id,task_id,outbound_count,id,ring_co unt,answer_count,transfer_agent_count) VALUES  ('xxxxxx16639893744376c5f82cf',19,'20230216',1,'2
023021616372847111400222','700007025',1,738417224,1,1,0), (' xxxxxx16639893744376c5f82cf, 16, '20230216', 45, '2023021615335851411500521', '700007025', 1738172,1,1,0 25), ('xxxxxx16639893744376c5f82cf',15,'20230216',4,'202302161
4375465611500774','700007025',1,738417226,1,1,0), (' xxxxxx16639893744376c5f82cf, 16, '20230216', 11, '2023021616584982811400249', '700007025', 1738172, 27,1,1,0), ('xxxxxx16639893744376c5f82cf',16,'20230216',8,'20230216115731744
11400181','700007024',1,738417228,1,1,0), (' xxxxxx16639893744376c5f82cf, 12 '20230216', 14, '2023021617000720311400315', '700007025', 1738172, 29,1,1,0), ('xxxxxx16639893744376c5f82cf',11,'20230216',5,'2023021610420902111400006
','700007024',1,738417230,1,1,0), (' xxxxxx16639893744376c5f82cf ', 15, '20230216', 23, '2023021616203760911500975', '700007025', 1738172, 31,1,1,0), ('xxxxxx16639893744376c5f82cf',17,'20230216',9,'2023021610330361611500068','70000
7024',1,738417232,1,1,0), (' xxxxxx16639893744376c5f82cf ', 6, '20230216' 9, '2023021614171737011500437', '700007024', 1738172,1,1,0 33). (' xxxxxx16639893744376c5f82cf, 16, '20230216', 6, '2023021614114127711400023', '700007024', 1,73
8417234,1,1,0), (' xxxxxx16639893744376c5f82cf, 16, '20230216', 15, '2023021615490926311500953', '700007025', 1738172 35,1,1,0), (' xxxxxx16639893744376c5f82cf, 17, '20230216', 15, '2023021610185065411500590', '700007024', 1738172, 36
, 1, 0), (' xxxxxx16639893744376c5f82cf '7,' 20230216 ', 10, '2023021615134017711500859', '700007025', 1738172, 37,1,1,0), (' xxxxxx16639893744376c5f82cf, 16, '20230216', 4, '2023021616015767011400578', '700007025', 1738172, 38,1,1,0),
(' xxxxxx16639893744376c5f82cf, 11, '20230216', 26, '2023021611283073011500963', '700007024', 1738172. 39,1,1,0), (' xxxxxx16639893744376c5f82cf, 19, '20230216', 6, '2023021614363521111400759', '700007025', 1738172 40,1,1,0), (' sunin
Gbank16639893744376c5f82cf ', 15, 20230216, 14, '2023021616460814811400654', '700007025', 1738172, 41,1,1,0), (' xxxxxx16639893744376c5f82cf, 13, '20230216', 6, '2023021616200063611400389', '700007025', 1738172, 42,1,1,0), (' xxxxxx166
16, 39893744376 c5f82cf ', '20230216', 5, '2023021612353281711500171', '700007024', 1738172, 43,1,1,0);
As we are triggered by the system, we cannot provide completed cases.

mgr slave deadlock info

Attachment: deadlock-info.txt (text/plain), 21.60 KiB.

Hi Mr. li,

Thank you for your bug report.

However, we need a full test case.

Also, we need to know whether the number of threads that are executing concurrently these transactions are the same on the master and all of the slaves.

Waiting on your feedback.

Is the same, and replica_parallel_workers=16 for all slave nodes.

The complete configuration is as follows:

[mysqld]
user=mysql
### MGR-Base ###
group_replication_start_on_boot = off
group_replication_bootstrap_group = off
group_replication_group_name = "e8f1472a-3a6c-11ed-85f1-5254004717be"
group_replication_local_address = "x.y.30.137:23306"
group_replication_group_seeds = "x.y.30.137:23306,x.y.30.138:23306,x.y.30.139:23306"
group_replication_single_primary_mode = true
group_replication_enforce_update_everywhere_checks = false
group_replication_ip_allowlist= "x.y.30.137,x.y.30.138,x.y.30.139"
report_host="x.y.30.137"
### MGR-Tune ###
group_replication_consistency = EVENTUAL
group_replication_flow_control_mode = DISABLED
group_replication_exit_state_action = OFFLINE_MODE
group_replication_unreachable_majority_timeout = 13
group_replication_autorejoin_tries = 0
super_read_only = on
group_replication_member_expel_timeout = 5
### MGR-End ###
### General ###
server-id=30137
sql_require_primary_key=1
# default_authentication_plugin=mysql_native_password
transaction_isolation=READ-COMMITTED
basedir = /opt/mysql
datadir = /mysql/data
port = 3306
socket=/mysql/data/mysql.sock
tmpdir=/mysql/data/
pid-file=/mysql/data/mysqld.pid
sql_mode=ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
default_time_zone = '+8:00'
log_error_verbosity = 2
core_file
local_infile= 1
log_timestamps= system
thread_stack= 192K
### innodb monitor ###
performance_schema = 1
# innodb_monitor_enable="module_innodb"
# innodb_monitor_enable="module_server"
# innodb_monitor_enable="module_dml"
# innodb_monitor_enable="module_ddl"
# innodb_monitor_enable="module_trx"
# innodb_monitor_enable="module_os"
# innodb_monitor_enable="module_purge"
# innodb_monitor_enable="module_log"
# innodb_monitor_enable="module_lock"
# innodb_monitor_enable="module_buffer"
# innodb_monitor_enable="module_index"
# innodb_monitor_enable="module_ibuf_system"
# innodb_monitor_enable="module_buffer_page"
# innodb_monitor_enable="module_adaptive_hash"
skip_name_resolve=ON
# host_cache_size=0
### CharacterSet ###
character_set_server=utf8mb4
# collation_server=utf8mb4_0900_ai_ci
collation_server=utf8mb4_general_ci
### Connection ###
max_allowed_packet=32M
max_connect_errors=100000
max_connections=2000
open_files_limit=65536
table_open_cache=2048
table_definition_cache=4096
### Log ###
log-error=/opt/logs/mysql_error.log
log-bin=/mysql/binlog/mysql-bin
log_bin_index=/mysql/binlog/mysql-bin.index
binlog_cache_size=1M
# expire_logs_days=7
binlog_transaction_dependency_tracking = WRITESET
binlog_expire_logs_seconds=604800 
sync_binlog=1
binlog-format=ROW
slow_query_log=1
slow_query_log_file=/opt/logs/mysql_slow.log
#long_query_time=0 <debug mode>
long_query_time=0.1
### Memory ###
key_buffer_size=32M
sort_buffer_size=512K
read_buffer_size=256K
read_rnd_buffer_size=512K
### Innodb ###
innodb_thread_concurrency = 8
default-storage-engine=InnoDB
## Config innodb_buffer_pool_size=<physical memory>*0.4-0.6 ##
innodb_buffer_pool_size=11G
#After init db,can not modify: innodb_data_file_path
innodb_buffer_pool_instances=4
innodb_file_per_table=on
innodb_data_file_path=ibdata1:128M:autoextend
innodb_log_file_size=1G
innodb_log_files_in_group = 3
innodb_log_buffer_size=64M
innodb_flush_method=O_DIRECT
innodb_temp_data_file_path = ibtmp1:12M:autoextend:max:10G
innodb_max_undo_log_size = 4G
# innodb_undo_tablespaces = 95 #deprecated param
innodb_undo_log_truncate = 1
# innodb_undo_directory = /mysql/data/undolog

innodb_buffer_pool_load_at_startup = 1
innodb_buffer_pool_dump_at_shutdown = 1
innodb_io_capacity = 4000
innodb_io_capacity_max = 8000
innodb_flush_sync = 0
innodb_flush_neighbors = 0
innodb_write_io_threads = 8
innodb_read_io_threads = 8
innodb_purge_threads = 4
innodb_page_cleaners = 4
innodb_open_files = 65535
innodb_max_dirty_pages_pct = 50
innodb_lru_scan_depth = 4000
innodb_checksum_algorithm = crc32
innodb_lock_wait_timeout = 10
innodb_rollback_on_timeout = 1
# innodb_print_all_deadlocks = 1
innodb_online_alter_log_max_size = 4G
innodb_stats_on_metadata = 0
innodb_sort_buffer_size = 67108864
### Others ###
lower_case_table_names=1
secure-file-priv=/mysql/bak/file/
explicit_defaults_for_timestamp=true
### Replication ###
binlog_order_commits=true

relay_log_recovery=1
relay_log_purge=0
skip-replica-start=1
# relay-log-info-repository=table  # default value
# master-info-repository=table   # default value
gtid_mode = on
enforce_gtid_consistency = 1
log_replica_updates
# slave-rows-search-algorithms = 'INDEX_SCAN,HASH_SCAN'    # default value
relay-log=/mysql/data/mysql-relay-bin
relay-log-index=/mysql/data/mysql-relay-bin.index
sync_source_info=1
sync_relay_log=1000
sync_relay_log_info=1

replica_parallel_type = LOGICAL_CLOCK
replica_parallel_workers = 16
replica_preserve_commit_order = 1
replica_transaction_retries = 128
replica_pending_jobs_size_max = 2147483648
[client]
socket=/mysql/data/mysql.sock
[mysqldump]
quick
max_allowed_packet=32M
default-character-set=utf8mb4
[mysql]
no-auto-rehash
default-character-set=utf8mb4
show-warnings
prompt="(\u@\h) [\d] \R:\m:\\s \c> "

Hi,

Thank you for all the data you provided. Unfortunately not enough for us to manage to reproduce this in our lab.

Few things to note, 8.0.27 is more than a year old (it was released in October 2021, it is March 2023 today and 8.0.32 is current release). We improved a lot of things with regards to replication / group replication and parallel slave execution that are all segments you have issues with.

Can you make a test with 8.0.32 and reproduce this problem yourself using latest mysql server release? You have all the necessary resources to reproduce this that you cannot share so I would have to ask you to first try to reproduce this with 8.0.32 before we move further as I tried reproducing this with 8.0.32 and I cannot.

Thanks

Ok, thank you.

We're still working on ways to simplify cases to bring this up, but it hasn't come up yet.

At the same time, we will try to upgrade the database version based on your suggestions.

Hi,

Let us know if you manage to reproduce this with latest MySQL

kind regards

Excuse me, has this problem been solved? I have also encountered this problem, Could you help me, thank you

Encountered this problem. MGR 8023.
When there is high latency and concurrency.

We need reproducible test case to be able to try to fix this potential issue.

When reproducing, the following conditions can be considered

1. have mul feilds unique index
2. writeset(MGR only writeset in applier channel)
3. high latency and concurrency
4. replace/ insert on dup statement

Hope it can help you

8023-debug can reproduce after a period of time, as follows,stuck forever

| 111 | system user |                    | NULL               | Connect | 1132 | waiting for handler commit                        | Group replication applier module                      |
| 114 | system user |                    | NULL               | Query   | 1130 | Waiting for slave workers to process their queues | NULL                                                  |
| 115 | system user |                    | test               | Query   | 2356 | Applying batch of row changes (update)            | replace into test.testpri2(a,b,c) values(144,144,144) |
| 116 | system user |                    | NULL               | Query   | 2356 | Waiting for preceding transaction to commit       | NULL                                                  |
| 117 | system user |                    | NULL               | Query   | 2356 | Waiting for preceding transaction to commit       | NULL                                                  |
| 118 | system user |                    | test               | Query   | 2356 | Applying batch of row changes (update)            | replace into test.testpri2(a,b,c) values(143,143,143) |

But I'm not sure if you can reproduce it.

mysql> show create table test.testpri2 \G
*************************** 1. row ***************************
       Table: testpri2
Create Table: CREATE TABLE `testpri2` (
  `id` int NOT NULL AUTO_INCREMENT,
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL,
  `c` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `a` (`a`,`b`)
) ENGINE=InnoDB AUTO_INCREMENT=43255 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

init data:
for ((i=0;i<10000;i++))  do  /newdata/mysql/mysql8023/install/bin/mysql -S'/newdata/mysql/mysql8023/tmp/mysql3329.sock'  -e "insert into test.testpri2(a,b,c) values($i,$i,$i)";do

Terminal 1:
for ((i=10000;i<20000;i++))  do  /newdata/mysql/mysql8023/install/bin/mysql -S'/newdata/mysql/mysql8023/tmp/mysql3329.sock'  -e "replace into test.testpri2(a,b,c) values($i,$i,$i)";done

Terminal 2:
for ((i=0;i<10000;i++))  do  /newdata/mysql/mysql8023/install/bin/mysql -S'/newdata/mysql/mysql8023/tmp/mysql3329.sock'  -e "replace into test.testpri2(a,b,c) values($i,$i,$i)";done
Terminal 3:
for ((i=0;i<10000;i++))  do  /newdata/mysql/mysql8023/install/bin/mysql -S'/newdata/mysql/mysql8023/tmp/mysql3329.sock'  -e "replace into test.testpri2(a,b,c) values($i,$i,$i)";done
Terminal 4:
for ((i=0;i<10000;i++))  do  /newdata/mysql/mysql8023/install/bin/mysql -S'/newdata/mysql/mysql8023/tmp/mysql3329.sock'  -e "replace into test.testpri2(a,b,c) values($i,$i,$i)";done

innodb lock info

Attachment: innodb_lock_info (application/octet-stream, text), 11.02 KiB.

The deadlock is probably the result of quite a lot of locking involved in "replace into" in case where the tupple provided by user conflicts with an existing tuple on the unique index.
See the minimal repro:
```
create table t(id int not null primary key, val int not null unique key);

insert into t values (1,1),(10,10),(20,20);

begin;
replace into t values (17,1);
select INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, LOCK_DATA from performance_schema.data_locks;
+------------+-----------+---------------+-------------+-----------+
| INDEX_NAME | LOCK_TYPE | LOCK_MODE     | LOCK_STATUS | LOCK_DATA |
+------------+-----------+---------------+-------------+-----------+
| NULL       | TABLE     | IX            | GRANTED     | NULL      |
| val        | RECORD    | X             | GRANTED     | 1, 1      |
| val        | RECORD    | X             | GRANTED     | 10, 10    |
| PRIMARY    | RECORD    | X,REC_NOT_GAP | GRANTED     | 1         |
| PRIMARY    | RECORD    | X,GAP         | GRANTED     | 20        |
| PRIMARY    | RECORD    | X,GAP         | GRANTED     | 17        |
| val        | RECORD    | X,GAP         | GRANTED     | 1, 17     |
+------------+-----------+---------------+-------------+-----------+
```
I need to investigate why each of this locks is taken.
If there is a good reason for each, then it's not a bug.
If some of them could be omitted, then it would be a feature request to optimize performance.
But if some of them is taken by mistake, then it is indeed a bug.

appliers, AFAIU, run in READ COMMITTED, so I've tried to use it, and the set of taken locks is then a bit smaller

```
+------------+-----------+---------------+-------------+-----------+
| INDEX_NAME | LOCK_TYPE | LOCK_MODE     | LOCK_STATUS | LOCK_DATA |
+------------+-----------+---------------+-------------+-----------+
| NULL       | TABLE     | IX            | GRANTED     | NULL      |
| val        | RECORD    | X             | GRANTED     | 1, 1      |
| val        | RECORD    | X             | GRANTED     | 10, 10    |
| PRIMARY    | RECORD    | X,REC_NOT_GAP | GRANTED     | 1         |
| val        | RECORD    | X,GAP         | GRANTED     | 1, 17     |
+------------+-----------+---------------+-------------+-----------+
```

Hm..also, the bug report says about S locks, but here we only see X locks, so perhaps this is not the good way to reproduce and investigate it?

This might be related to:
[Bug#32617942 ROW_INS_SCAN_SEC_INDEX_FOR_DUPLICATE SHOULD NOT LOCK THE FIRST UNEQUAL RECORD](https://mybug.mysql.oraclecorp.com/orabugs/site/bug.php?id=32617942)
and (a bit less) to
[Bug#32618107 TRANSACTION PREPARE FOR READ COMMITTED TRANSACTION COULD RELEASE ALL S LOCKS](https://mybug.mysql.oraclecorp.com/orabugs/site/bug.php?id=32618107)

The improtant detail is that some of the records in secondary index are delete-marked.
I'll investigate more later.

I have tried with 3 nodes(single primary) hosted on different boxes, with exact provided conf(provided by xichao li), using the test case provided by peng gao(tried resizing data size) but not seeing any issues. As my colleagues mentioned earlier, please check with latest builds(current GA 8.0.36) and report us back if you are seeing the issue(if possible then test case).

regards,
Umesh