Bug #110110 MGR SECONDARY UNREACHABLE
Submitted: 17 Feb 2023 12:25 Modified: 21 Feb 2023 3:01
Reporter: Kai Tian Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: InnoDB Plugin storage engine Severity:S3 (Non-critical)
Version:8.0.31 OS:Red Hat (8.7)
Assigned to: CPU Architecture:x86 (Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz)

[17 Feb 2023 12:25] Kai Tian
Description:
MGR SECONDARY MEMBER_STATE CHANGE TO UNREACHABLE,AND RECEIVE THE FLOWING LOGS:

Version:8.0.31

PRIMARY:
2023-02-17T08:37:58.299551Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address mysql10513:3306 has become unreachable.'
2023-02-17T08:38:13.294373Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address mysql10514:3306 has become unreachable.'
2023-02-17T08:38:13.294445Z 0 [ERROR] [MY-011495] [Repl] Plugin group_replication reported: 'This server is not able to reach a majority of members in the group. This server will now block all updates. The server will remain blocked until contact with the majority is re
stored. It is possible to use group_replication_force_members to force a new group membership.'

SECONDARY:
2023-02-17T08:38:48.321734Z 17 [ERROR] [MY-010880] [InnoDB] Found index <auto_key1> whose column info does not match that of MySQL.
2023-02-17T08:38:48.321778Z 17 [ERROR] [MY-010882] [InnoDB] Build InnoDB index translation table for Table /data/mysql/tmp/#sql299f9_11_0 failed
2023-02-17T08:38:48.321787Z 17 [ERROR] [MY-011052] [InnoDB] Index <auto_key1> of tmp/#sql299f9_11_0 has 2 columns unique inside InnoDB, but MySQL is asking statistics for 3 columns. Have you mixed data dictionary from different installation?
2023-02-17T08:38:59Z UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=53d8f46f67d9f61e2a8d9ba6b79c874d195fcd66
Thread pointer: 0x7febb4000ec0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ff940169c60 thread_stack 0x100000
/data/mysql/basedir/mysql/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2e) [0x206ab2e]
/data/mysql/basedir/mysql/bin/mysqld(print_fatal_signal(int)+0x3c3) [0x110a893]
/data/mysql/basedir/mysql/bin/mysqld(handle_fatal_signal+0x65) [0x110a905]
/lib64/libpthread.so.0(+0x12cf0) [0x7ff954de1cf0]
/data/mysql/basedir/mysql/bin/mysqld(row_sel_convert_mysql_key_to_innobase(dtuple_t*, unsigned char*, unsigned long, dict_index_t*, unsigned char const*, unsigned long)+0x1a1) [0x22b2641]
/data/mysql/basedir/mysql/bin/mysqld(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)+0xd2) [0x2152f12]
/data/mysql/basedir/mysql/bin/mysqld(handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)+0x287) [0x1212407]
/data/mysql/basedir/mysql/bin/mysqld(RefIterator<false>::Read()+0xc7) [0x14bced7]
/data/mysql/basedir/mysql/bin/mysqld(FilterIterator::Read()+0x14) [0x14ac6c4]
/data/mysql/basedir/mysql/bin/mysqld(NestedLoopIterator::Read()+0xee) [0x14ac8be]
/data/mysql/basedir/mysql/bin/mysqld(NestedLoopIterator::Read()+0xa7) [0x14ac877]
/data/mysql/basedir/mysql/bin/mysqld(MaterializeIterator<DummyIteratorProfiler>::Init()+0x313) [0x14b1de3]
/data/mysql/basedir/mysql/bin/mysqld(HashJoinIterator::Init()+0x5e) [0x14bb4ae]
/data/mysql/basedir/mysql/bin/mysqld(Query_expression::ExecuteIteratorQuery(THD*)+0x302) [0x1078c92]
/data/mysql/basedir/mysql/bin/mysqld(mysql_execute_command(THD*, bool)+0xc1c) [0xfa42dc]
/data/mysql/basedir/mysql/bin/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x403) [0xfa7fa3]
/data/mysql/basedir/mysql/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x2048) [0xfaa4f8]
/data/mysql/basedir/mysql/bin/mysqld(do_command(THD*)+0x1df) [0xfab55f]
/data/mysql/basedir/mysql/bin/mysqld() [0x10fb0d8]
/data/mysql/basedir/mysql/bin/mysqld() [0x27e7c7a]
/lib64/libpthread.so.0(+0x81ca) [0x7ff954dd71ca]
/lib64/libc.so.6(clone+0x43) [0x7ff952f12e73]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7febb4111b20): is an invalid pointer
Connection ID (thread ID): 17
Status: NOT_KILLED

How to repeat:
We rebooted the secondary node,and add to the MGR cluster,it unreachable again.
[17 Feb 2023 13:44] MySQL Verification Team
Hi Mr. Tian,

Thank you for your interesting bug report.

However, we can not repeat it.

Hence, what we need from you is a fully repeatable test case, in the form of a set of SQL commands that always lead to the crash like the one you reported.

There is nothing that we can do without such a test case.

Can't repeat.
[21 Feb 2023 3:01] Kai Tian
We tried to avoid this problem by turning off innodb_adaptive_hash_index, but the problem persists.

Then we migrated the query function from MGR to slave DB, and then it reported the same error.

We crawled the SQL that caused the problem,here is part of the sql:
with x as (select *, row_number() over(partition by task_id order by gmt_modified desc) ran from spr_t_task_history)
select mcl.name 一,
	-- 日
	ifnull(cl.chuli, 0) * 1.5 二,
	ifnull(cl.chuli, 0) 三,
	ifnull(tg.tongguo, 0) 四,
	ifnull(jj.jujue, 0) 五,
	case when cl.chuli is not null then concat((ifnull(tg.tongguo, 0) / ifnull(cl.chuli, 0)) *100, '%')
	else '0%' end 六,
	ifnull(fk.fangkuan, 0) 七,
	case when tg.tongguo is not null then concat((ifnull(fk.fangkuan, 0) / ifnull(tg.tongguo, 0)) *100, '%')
	else '0%' end 八,
	ifnull(jfcl.jfchuli, 0) 九,
	ifnull(jftg.jftongguo, 0) 十,
	case when jfcl.jfchuli is not null then concat((ifnull(jftg.jftongguo, 0) / ifnull(jfcl.jfchuli, 0)) *100, '%')
	else '0%' end 十一,
	ifnull(fjfcl.fjfchuli, 0) 十二,
	ifnull(fjftg.fjftongguo, 0) 十三,
	case when fjfcl.fjfchuli is not null then concat((ifnull(fjftg.fjftongguo, 0) / ifnull(fjfcl.fjfchuli, 0)) *100, '%')
	else '0%' end 十四,
	-- 月
	ifnull(mcl.chuli, 0) * 1.5 十五,
	ifnull(mcl.chuli, 0) 十六,
	ifnull(mtg.tongguo, 0) 十七,
	ifnull(mjj.jujue, 0) 十八,
	case when mcl.chuli is not null then concat((ifnull(mtg.tongguo, 0) / ifnull(mcl.chuli, 0)) *100, '%')
	else '0%' end 十九,
	ifnull(mfk.fangkuan, 0) 二十,
	case when mtg.tongguo is not null then concat((ifnull(mfk.fangkuan, 0) / ifnull(mtg.tongguo, 0)) *100, '%')
	else '0%' end 二十一,
	ifnull(mjfcl.jfchuli, 0) 二十二,
	ifnull(mjftg.jftongguo, 0) 二十三,
	case when mjfcl.jfchuli is not null then concat((ifnull(mjftg.jftongguo, 0) / ifnull(mjfcl.jfchuli, 0)) *100, '%')
	else '0%' end 二十四,
	ifnull(mfjfcl.fjfchuli, 0) 二十五,
	ifnull(mfjftg.fjftongguo, 0) 二十六,
	case when mfjfcl.fjfchuli is not null then concat((ifnull(mfjftg.fjftongguo, 0) / ifnull(mfjfcl.fjfchuli, 0)) *100, '%')
	else '0%' end 二十七,
	ifnull(fkczth.fkcztuihui, 0) 二十八,
	case when mcl.chuli is not null then concat((ifnull(fkczth.fkcztuihui, 0) / ifnull(mcl.chuli, 0)) *100, '%')
	else '0%' end 二十九
from (
	-- 月
	select zts.id clid, zts.name, count(*) chuli, count(case when rto.business_type = 2 or rto.apply_type <> 1 then rto.id end) xcfgrchuli
	from rob_t_order rto
	left join spr_t_process stp on stp.transaction_id = rto.id and stp.is_delete = 0
	left join spr_t_task stt on stt.process_id = stp.id and stt.is_delete = 0
	left join (select * from x) stth on stth.task_id = stt.id and stth.is_delete = 0 and stth.ran = 1
	left join zeu_t_staff zts on stt.executor_id = zts.id and zts.is_delete = 0
	where rto.is_delete = 0
	and stt.definition_id = 'financingInfoSecondCheck'
	and stth.gmt_modified >= date_add(curdate(),interval -day(curdate())+1 day)
	and stth.gmt_modified < date_add(current_date, interval 0 day)
	and stth.state in (3, 8)
    and rto.business_type = 2
	group by zts.id, zts.name) mcl
left join (
	select zts.id tgid, zts.name, count(*) tongguo
	from rob_t_order rto
	left join spr_t_process stp on stp.transaction_id = rto.id and stp.is_delete = 0
	left join spr_t_task stt on stt.process_id = stp.id and stt.is_delete = 0
	left join (select * from x) stth on stth.task_id = stt.id and stth.is_delete = 0 and stth.ran = 1
	left join zeu_t_staff zts on stt.executor_id = zts.id and zts.is_delete = 0
	where rto.is_delete = 0
	and stt.definition_id = 'financingInfoSecondCheck'
	and stth.gmt_modified >= date_add(curdate(),interval -day(curdate())+1 day)
	and stth.gmt_modified < date_add(current_date, interval 0 day)
	and stth.state = 3
    and rto.business_type = 2
	group by zts.id, zts.name) mtg
on mcl.clid = mtg.tgid
left join (
	select zts.id jjid, zts.name, count(*) jujue
	from rob_t_order rto
	left join spr_t_process stp on stp.transaction_id = rto.id and stp.is_delete = 0
	left join spr_t_task stt on stt.process_id = stp.id and stt.is_delete = 0
	left join (select * from x) stth on stth.task_id = stt.id and stth.is_delete = 0 and stth.ran = 1
	left join zeu_t_staff zts on stt.executor_id = zts.id and zts.is_delete = 0
	where rto.is_delete = 0
	and stt.definition_id = 'financingInfoSecondCheck'
	and stth.gmt_modified >= date_add(curdate(),interval -day(curdate())+1 day)
	and stth.gmt_modified < date_add(current_date, interval 0 day)
	and stth.state = 8
    and rto.business_type = 2
	group by zts.id, zts.name) mjj
on mcl.clid = mjj.jjid
left join (
	select zts.id fkid, zts.name, count(*) fangkuan
	from rob_t_order rto
	left join spr_t_process stp on stp.transaction_id = rto.id and stp.is_delete = 0
	left join spr_t_task stt on stt.process_id = stp.id and stt.is_delete = 0
	left join (select * from x) stth on stth.task_id = stt.id and stth.is_delete = 0 and stth.ran = 1
	left join zeu_t_staff zts on stt.executor_id = zts.id and zts.is_delete = 0
	where rto.is_delete = 0
	and stt.definition_id = 'financingInfoSecondCheck'
	and rto.payment_time >= date_add(curdate(),interval -day(curdate())+1 day)
    and rto.business_type = 2
	group by zts.id, zts.name) mfk
on mcl.clid = mfk.fkid

After executed this SQL twice,the slave DB report error MY-001782 & MY-001756.
[21 Feb 2023 13:07] MySQL Verification Team
Hi Mr. Tian,

We already wrote that we require a fully and fully repeatable test case.

It presumes that we have mysqldump of all tables involved in that query. We would also prefer that the query is written in the single line.

You do not have to dump the entire tables , but only those parts that are relevant to the query in question.

Also, we shall test your query on a single machine, since this is not a Cluster problem, but it a problem in the optimiser or InnoDB.

To upload the test case use the table with a title of "Files".

Let us know when you upload the test case.