Bug #102838 Locks that should not be added
Submitted: 6 Mar 2021 6:53 Modified: 26 Mar 2021 7:54
Reporter: linfeng chen Email Updates:
Status: Closed Impact on me:
None 
Category: MySQL Server: InnoDB storage engine Severity: S6 (Debug Builds)
Version: 5.7.34 OS: Any
Assigned to: CPU Architecture:Any

[6 Mar 2021 6:53] linfeng chen
Description:
I found a latch that is acquired recursively, which should never happen.
The call chain is as follows:
fil_reinit_space_header_for_table()
{
    /*void
    btr_search_s_lock_all()
    {
	for (ulint i = 0; i < btr_ahi_parts; ++i) {
	    rw_lock_s_lock(btr_search_latches[i]);
	}
    }*/
    btr_search_s_lock_all(); // S-locks all adaptive hash index latches.
    buf_LRU_flush_or_remove_pages();
        buf_LRU_drop_page_hash_for_tablespace(); // called by the function above, which in turn calls:
            buf_LRU_drop_page_hash_batch();
                btr_search_drop_page_hash_when_freed();
                    btr_search_drop_page_hash_index();
                    ......
                    latch = btr_search_latches[ahi_slot];
                    rw_lock_s_lock(latch); // Acquires the S lock again. This leads to a crash (assert) in the debug version.
                    ......
                    rw_lock_s_unlock(latch);
                    ......
                    rw_lock_x_lock(latch); // Acquires the X lock while still holding S. This leads to a crash (assert) in the debug version and to a deadlock in the release version.
}
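The debug-build crash comes from an ownership assertion. The following is a minimal sketch (the class name and structure are hypothetical, not InnoDB's actual rw_lock_t) of a debug wrapper that tracks which threads hold a latch in S mode, which is essentially what ut_ad(!rw_lock_own(lock, RW_LOCK_S)) enforces:

```cpp
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <unordered_set>

// Hypothetical debug wrapper, for illustration only: it records which
// threads currently hold the latch in S mode so that a recursive S
// request by the same thread can be detected, mirroring the
// ut_ad(!rw_lock_own(lock, RW_LOCK_S)) check in InnoDB debug builds.
class DebugRWLock {
public:
    // Returns false instead of aborting, so the check is observable;
    // a debug build of the server would assert at this point.
    bool s_lock() {
        const std::thread::id me = std::this_thread::get_id();
        {
            std::lock_guard<std::mutex> g(owners_mutex_);
            if (s_owners_.count(me) != 0) {
                return false;  // recursive S request detected
            }
        }
        lock_.lock_shared();
        std::lock_guard<std::mutex> g(owners_mutex_);
        s_owners_.insert(me);
        return true;
    }

    void s_unlock() {
        {
            std::lock_guard<std::mutex> g(owners_mutex_);
            s_owners_.erase(std::this_thread::get_id());
        }
        lock_.unlock_shared();
    }

private:
    std::shared_mutex lock_;                       // the reader-writer latch itself
    std::mutex owners_mutex_;                      // protects the owner set
    std::unordered_set<std::thread::id> s_owners_; // debug-only bookkeeping
};
```

In the call chain above, btr_search_s_lock_all() plays the role of the first s_lock(), and the rw_lock_s_lock(latch) inside btr_search_drop_page_hash_index() is the recursive request that the debug check rejects.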

How to repeat:
I changed the code related to index cleanup so that it does not delete the adaptive hash index data.

Suggested fix:
A single thread can also deadlock against itself.
If a thread already holds the S lock on a latch and requests the same S lock again, the request is not ignored:
void
rw_lock_s_lock_func(
/*================*/
	rw_lock_t*	lock,	
	ulint		pass,	
	const char*	file_name,
	ulint		line)
{
	ut_ad(!rw_lock_own(lock, RW_LOCK_S)); /* see NOTE above */
	ut_ad(!rw_lock_own(lock, RW_LOCK_X));
	if (!rw_lock_s_lock_low(lock, pass, file_name, line)) {
	     rw_lock_s_lock_spin(lock, pass, file_name, line);
	}
}
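Why the second request hangs in a release build can be sketched with a small state model (hypothetical, for illustration only; not InnoDB's rw_lock_t): once a writer is queued on a writer-preferring rw-lock, new S requests must wait behind it, even a request from the thread that already holds the latch in S mode.

```cpp
// Minimal state model of a writer-preferring rw-lock. The names are
// hypothetical; the point is only to show the blocking rules.
struct RWLockModel {
    int  s_holders = 0;     // threads currently holding the latch in S mode
    bool x_waiting = false; // a writer is queued and has priority

    // A new S request must wait while a writer is queued.
    bool s_request_blocks() const { return x_waiting; }

    // A new X request must wait while any S holder remains.
    bool x_request_blocks() const { return s_holders > 0; }
};

// Scenario from this report: thread T holds S on an AHI latch (via
// btr_search_s_lock_all), another thread queues an X request, and T then
// re-requests the same latch inside btr_search_drop_page_hash_index().
// T's second request waits behind the writer, and the writer waits
// behind T's first S hold: neither can ever proceed.
bool single_thread_deadlock() {
    RWLockModel latch;
    latch.s_holders = 1;    // T holds S
    latch.x_waiting = true; // a writer is queued
    return latch.s_request_blocks() && latch.x_request_blocks();
}
```

The rw_lock_x_lock(latch) case is even simpler: the thread holding S waits for its own S hold to be released, which never happens.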
I have actually tested this, and the deadlock does occur.
I hope you will take a serious look at this report.
[8 Mar 2021 13:29] MySQL Verification Team
Hi Mr. chen,

Thank you for your bug report.

Indeed, in the debug version this leads to the bug.

However, we would require a proper test case in order to verify it.

Thanks in advance.
[9 Mar 2021 11:00] linfeng chen
The test case:
1. Modify the code so that the adaptive hash index data is not cleared when the index is dropped:
if (ahi) {
		//btr_search_drop_page_hash_when_freed(page_id, page_size);
	}

	if (ahi) {
		for (i = 0; i < FSP_EXTENT_SIZE; i++) {
			if (!xdes_mtr_get_bit(descr, XDES_FREE_BIT, i, mtr)) {

				/* Drop search system page hash index
				if the page is found in the pool and
				is hashed */

				//btr_search_drop_page_hash_when_freed(
				//	page_id_t(space,
				//		  first_page_in_extent + i),
				//	page_size);
			}
		}
	}
2. Turn on the adaptive hash index.
3. Create a table that can use the adaptive hash index.
4. Do something to generate adaptive hash data for the table.
5. truncate table_name;
[9 Mar 2021 12:16] linfeng chen
You also need to remove the following code.
btr_search_drop_page_hash_when_freed(){
......
//ut_ad(!block->page.file_page_was_freed);
......
}

It will crash at the following position:
btr_search_drop_page_hash_index()
{
......
ut_ad(!btr_search_own_any(RW_LOCK_S));
......
}
[9 Mar 2021 12:17] linfeng chen
crash

Attachment: crash.PNG (image/png, text), 69.21 KiB.

[9 Mar 2021 13:39] MySQL Verification Team
Hi Mr. chen,

Let us explain to you what we require. In order to further process this report, we require a full test case.

The test case should consist of a set of SQL statements that always leads, in your case, to the assert in the debug version and to the deadlock in the production version.

We cannot proceed further without such a test case, one that is always repeatable.
[10 Mar 2021 8:49] linfeng chen
1. Modify the code:
	if (ahi) {
		//btr_search_drop_page_hash_when_freed(page_id, page_size);
	}
	if (ahi) {
		for (i = 0; i < FSP_EXTENT_SIZE; i++) {
			if (!xdes_mtr_get_bit(descr, XDES_FREE_BIT, i, mtr)) {

				/* Drop search system page hash index
				if the page is found in the pool and
				is hashed */

				//btr_search_drop_page_hash_when_freed(
					//page_id_t(space,
						  //first_page_in_extent + i),
					//page_size);
			}
		}
	}
2. Compile and install the release build of the server.
3. Start the server and log in.
4. Execute the following SQL:
create database db8;
use db8;
set global innodb_adaptive_hash_index=on;
create table t1 (f1 int,f2 int,key(f2),f3 int) engine=innodb;
create index idx1 on t1(f3);
DROP PROCEDURE IF EXISTS loaddata;
delimiter |
CREATE PROCEDURE loaddata()
BEGIN
  DECLARE c1 INT DEFAULT 1;
  WHILE c1 < 30000 DO
     insert into t1 values(c1,c1,c1);
	 SET c1 = c1 + 1;
  END WHILE;
END |
delimiter ;
call loaddata();
DROP PROCEDURE loaddata;

DROP PROCEDURE IF EXISTS testfun;
delimiter |
CREATE PROCEDURE testfun()
BEGIN
  DECLARE c1 INT DEFAULT 1;
  DECLARE c2 INT DEFAULT 1;
  WHILE c2 < 3000 DO
    SET c1 = 1;
    WHILE c1 < 300 DO
     select f1 from t1 where f3=c1;
	 SET c1 = c1 + 1;
    END WHILE;
	SET c2 = c2 + 1;
  END WHILE;
END |
delimiter ;
call testfun();
DROP PROCEDURE testfun;
truncate t1;
5. A deadlock will occur in the server:
btr_search_drop_page_hash_index()
{
......
rw_lock_x_lock(latch);//deadlock 
......
}
[10 Mar 2021 8:56] linfeng chen
1. Modify the code:
	if (ahi) {
		//btr_search_drop_page_hash_when_freed(page_id, page_size);
	}
	if (ahi) {
		for (i = 0; i < FSP_EXTENT_SIZE; i++) {
			if (!xdes_mtr_get_bit(descr, XDES_FREE_BIT, i, mtr)) {

				/* Drop search system page hash index
				if the page is found in the pool and
				is hashed */

				//btr_search_drop_page_hash_when_freed(
					//page_id_t(space,
						  //first_page_in_extent + i),
					//page_size);
			}
		}
	}
2. Compile and install the debug build of the server.
3. Start the server and log in.
4. Execute the following SQL:
create database db8;
use db8;
set global innodb_adaptive_hash_index=on;
create table t1 (f1 int,f2 int,key(f2),f3 int) engine=innodb;
create index idx1 on t1(f3);
DROP PROCEDURE IF EXISTS loaddata;
delimiter |
CREATE PROCEDURE loaddata()
BEGIN
  DECLARE c1 INT DEFAULT 1;
  WHILE c1 < 30000 DO
     insert into t1 values(c1,c1,c1);
	 SET c1 = c1 + 1;
  END WHILE;
END |
delimiter ;
call loaddata();
DROP PROCEDURE loaddata;

DROP PROCEDURE IF EXISTS testfun;
delimiter |
CREATE PROCEDURE testfun()
BEGIN
  DECLARE c1 INT DEFAULT 1;
  DECLARE c2 INT DEFAULT 1;
  WHILE c2 < 3000 DO
    SET c1 = 1;
    WHILE c1 < 300 DO
     select f1 from t1 where f3=c1;
	 SET c1 = c1 + 1;
    END WHILE;
	SET c2 = c2 + 1;
  END WHILE;
END |
delimiter ;
call testfun();
DROP PROCEDURE testfun;
truncate t1;
5. The database will crash on the following assertion:
btr_search_drop_page_hash_index()
{
......
ut_ad(!btr_search_own_any(RW_LOCK_S));// crash
......
}
[10 Mar 2021 9:06] linfeng chen
We should not take these S locks.
If adaptive hash data that needs to be cleaned up is found, a deadlock or crash occurs. The lock should not be taken merely to prevent the adaptive hash index from being turned off. It also causes serious performance problems, especially when truncating partitioned tables.
[10 Mar 2021 9:13] linfeng chen
crash

Attachment: crash.PNG (image/png, text), 91.11 KiB.

[10 Mar 2021 13:18] MySQL Verification Team
Hi Mr. chen,

Only a couple of additional questions.

You wrote that we should modify the code and then install the release binary. Why should we modify the code if we run the test case on the unchanged binary?

If a test is to be run on our server with changed code, then this report cannot be checked. Also, we do not see where or why our original code should be changed at all. If it works without changes, then it is not our bug.

Last, have you noticed the same behaviour with the latest 8.0?
[11 Mar 2021 1:20] linfeng chen
1. There will be no deadlocks or crashes if I don't change the code.
2. The reason I changed the code is to show that we don't need to protect the adaptive hash data, because buf_LRU_flush_or_remove_pages does not change it.
3. Adding this lock causes serious performance problems, which I have actually tested.
4. If you remove this lock, performance improves a lot.
5. It is safe to remove this lock because buf_LRU_flush_or_remove_pages does not access the adaptive hash data.
6. If you have any questions about how to verify this performance issue, I can tell you.
[11 Mar 2021 1:22] linfeng chen
8.0 has already removed this part of the code.
[11 Mar 2021 13:12] MySQL Verification Team
Hi Mr. chen,

It is highly doubtful that changes made specifically for version 8.0 will be ported back to 5.7.

However, we shall check ....
[12 Mar 2021 12:58] MySQL Verification Team
Hi Mr. chen,

There were deliberations on whether to port a fix from 8.0 to 5.7, and it was decided that users should be urged to upgrade to 8.0 for all of its benefits, and that version 5.7 will not be improved further.

Hence, this lock stays in 5.7.

Thank you for your report, though.
[18 Mar 2021 13:08] MySQL Verification Team
Hi Mr. chen,

This is just to inform you that this bug had a different outcome after all: it was fixed. The fix is coming in 5.7.35, and, right now, nobody knows when that release will be out, not even approximately.

In any case, good news!
[26 Mar 2021 7:54] linfeng chen
Thank you very much for informing me of this news!
I hope to get the new version soon.
[6 Sep 2021 20:52] Rodrigo Alonso
Hello there,

I noticed that the code changes here have not been applied to MySQL 5.7.35. Are they going to be merged? Is there a patch file that could be downloaded? Has the fix been tested and approved somewhere?

Thanks for the info,

Rodrigo.
[7 Sep 2021 11:59] MySQL Verification Team
Hi Mr. Alonso,

Actually, the bug was fixed in 5.7.35. If you do not see it, that is because it was fixed under its duplicate, an internal bug, which is why you do not see it listed under this number, #102838.
[7 Sep 2021 13:50] Rodrigo Alonso
Thanks for the reply. As I am using the Percona distribution, I wanted to check the actual code that fixes the issue. Do you have a diff/commit so I can verify that this bug was fixed in the Percona branch as well? Thanks a lot!
Rodrigo.
[8 Sep 2021 10:59] MySQL Verification Team
Hi,

No, we do not have the actual diff available.