Bug #90987 | partition table index next same scan ineffective | ||
---|---|---|---|
Submitted: | 23 May 2018 10:03 | Modified: | 1 Jun 2018 2:18 |
Reporter: | Wei Zhao (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Storage Engines | Severity: | S5 (Performance) |
Version: | 5.7.17 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[23 May 2018 10:03]
Wei Zhao
[23 May 2018 10:04]
Wei Zhao
this patch fixes the bug (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: partition_index_same.diff (application/octet-stream, text), 971 bytes.
[23 May 2018 10:30]
Wei Zhao
the hash function and partitioning is not mysql official feature, but the issue still exists if you use official partitioning.
[23 May 2018 10:53]
MySQL Verification Team
C:\tmp\mysql-5.7.17-winx64>bin\mysql -uroot -p Enter password: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.7.17 MySQL Community Server (GPL) Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> CREATE DATABASE a; Query OK, 1 row affected (0.00 sec) mysql> USE a Database changed mysql> create table t4 (a int unsigned auto_increment primary key, b int, c int, key(b)) PARTITION BY LIST (murmurHashCodeAndMod(`a`,4))(PARTITION p0 VALUES IN (0), PARTITION p1 VALUES IN (1),PARTITION p2 VALUES IN (2) ,PARTITION p3 VALUES IN (3) ); ERROR 1064 (42000): Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed near ')(PARTITION p0 VALUES IN (0), PARTITION p1 VALUES IN (1),PARTITION p2 VALUES IN ' at line 1 mysql>
[23 May 2018 12:07]
MySQL Verification Team
Hi Mr. Zhao, In order to process this report we have to be able to reproduce the behaviour. Our Miguel Godofredo Solorzano has found a difficulty that I encounter as well in creating the table. Hence, could you, first of all, clear out that small matter???
[23 May 2018 12:10]
MySQL Verification Team
One more small detail. Since we can not run your test case, please send us innodb status while the second transaction waits for the locks. Simply, this could be some gap lock or similar.
[24 May 2018 3:02]
Wei Zhao
Hi Sinisa, This is a test case that you should be able to run on original mysql-5.7: 1. in session 1, do these ops: mysql> create table t4 (a int unsigned auto_increment , b int, c int, key(b), primary key(a,b)) PARTITION BY LIST (b)(partition p0 values in (0,1, 2 ,3, 4), partition p1 values in(5,6,7,8,9,10)); Query OK, 0 rows affected (0.02 sec) mysql> insert into t4(b,c) values(1,11),(2,22),(3,33),(4,44),(5,55),(6,66),(7,77),(8,88),(9,99); Query OK, 9 rows affected (0.01 sec) Records: 9 Duplicates: 0 Warnings: 0 mysql> begin;update t4 set c=c+1 where b=3; Query OK, 0 rows affected (0.00 sec) Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> 2. in session 2, do these ops: mysql> begin; update t4 set c=c+1 where b=2; Query OK, 0 rows affected (0.00 sec) ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction mysql> session 2's stmt is blocked and later the lock wait timeouts.
[24 May 2018 13:49]
MySQL Verification Team
Hi, Would you be so kind to provide us with innodb status after second transaction has started and began waiting. Also, please do not forget that deadlock is probably occurring here since number of rows per partition is way too small. That leads to scanning of all rows and all rows get locked. Hence a lock wait ....... InnoDB status would show up whether it is a consequence of this , well documented behaviour. Scanning small number of rows is MUCH faster then using index. We are waiting on your feedback.
[25 May 2018 2:29]
Wei Zhao
the 'show engine innodb status;' output after session 2 is blocked
Attachment: idb-stat.txt.gz (application/gzip, text), 2.72 KiB.
[25 May 2018 2:35]
Wei Zhao
Hi Sinisa, Thank you for your patient explanation, I've uploaded the 'show engine innodb status;' output, taken after session 2 is blocked waiting for the row lock. There is no dead lock when session 2 blocks, as the uploaded output also shows. And session 2's stmt returns with ER_LOCK_WAIT_TIMEOUT rather than ER_LOCK_DEADLOCK, showing the same fact.
[25 May 2018 12:31]
MySQL Verification Team
Hi Mr. Zhao, Actually, yes, lock wait timeout is expected in these situations and it is what you get. As the status clearly shows, all rows in the partition are scanned, during which scanning all rows are locked, which is why one transaction has to wait for another. Number of row is too small for the indexed search. This is all explained in our Reference Manual, sections on InnoDB and Optimiser. Thank you for your report.
[28 May 2018 10:57]
Wei Zhao
But in the test case example, the transaction(T2) in the 2nd session doesn't have to be blocked by the 1st session's transaction (T1), because in RC isolation level, according to how mysql and innodb works, T1 should lock (4,14,114) and T2 should lock only (2,12,112). The reason T2 is blocked is because when T2 executes the update stmt, handler::index_next() is called instead of handler::index_next_same(), so innodb tries to return the row (4, 14,114), thus T2 tries to acquire row lock on (4, 14,114), and got blocked by T1. See my patch and do some debugging, and you will figure out why this happens.
[30 May 2018 12:44]
MySQL Verification Team
Hi Zhao, I have done some debugging and it seems that you are right !!! Verified as a performance problem.
[1 Jun 2018 2:18]
Wei Zhao
Hi Sinisa, Good to know the bug is verified.but it's a bug of the mysql server code, not a bug of innodb --- mysql server makes use of innodb in a wrong way. The correct way to scan the partitioned table when executing the test case I provided, is to do an 'index next same' scan, because the scan is an equality scan on a non-unique index. The cause of the bug is that in the two functions int Partition_helper::handle_unordered_next(uchar *buf, bool is_next_same) and int Partition_helper::handle_ordered_next(uchar *buf, bool is_next_same) the 'is_next_same' argument is actually ignored or neglected, scans of 'next same' type is actually performed as ordinary range scan(read_range_next_in_part() is called rather than index_next_same_in_part()), causing transaction T2 to try to read&lock the row R1(R1.b=3), and got blocked by T1 because T1 already locked row R1. The correct way for T2 to execute its update stmt is to stop the index scan after innodb returning the row R2(R2.b=2) because innodb should be doing an index next same scan and should find that there is no more index entries of same value(b=2), and thus T2's update stmt should return rather than be blocked by T1. And my patch fixes the issue. My explanation in previous message refers to rows not consistent with the test case I provided at the top of the bug report, which might have made some confusion. The explanation in this message is consistent with the test case, hopefully it makes things clear.
[1 Jun 2018 11:52]
MySQL Verification Team
I am putting this bug in the category that belongs to the executioner, more precisely to the SE API.