Bug #87021 wrong index was chosen when different condition on update
Submitted: 12 Jul 2017 3:50 Modified: 31 Jul 2017 14:20
Reporter: Ye Jinrong Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.7.18 OS:CentOS
Assigned to: MySQL Verification Team CPU Architecture:Any

[12 Jul 2017 3:50] Ye Jinrong
Description:
when i using different condition on update, mysql may choose wrong index for it.

How to repeat:
the table ddl and data:
CREATE TABLE t1 (
c1 int(10) unsigned NOT NULL DEFAULT '0',
c2 int(10) unsigned NOT NULL DEFAULT '0',
c3 int(10) unsigned NOT NULL DEFAULT '0',
c4 int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (c1),
KEY c2 (c2)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

select * from t1;
+----+----+----+----+
| c1 | c2 | c3 | c4 |
+----+----+----+----+
| 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 |
| 3 | 3 | 3 | 0 |
| 4 | 2 | 2 | 0 |
| 6 | 8 | 5 | 0 |
| 7 | 6 | 6 | 10 |
| 10 | 10 | 4 | 0 |
+----+----+----+----+

different condition on update.
case 1: where c2>=6, primary key was choosen, and lock all rows
session1:
mysql>EXPLAIN update t1 set c4=123 where c2>=6\G
*************************** 1. row ***************************
id: 1
select_type: UPDATE
table: t1
partitions: NULL
type: index
possible_keys: c2
key: PRIMARY
key_len: 4
ref: NULL
rows: 7
filtered: 100.00
Extra: Using where

mysql> begin; update t1 set c4=123 where c2>=6\G
Query OK, 0 rows affected (0.00 sec)

Query OK, 3 rows affected (1.47 sec)
Rows matched: 3  Changed: 3  Warnings: 0

session2: (was blocked)
mysql> begin; select * from t1 where c2 = 3 for update;

check the lock info:
mysql> select * from sys.innodb_lock_waits\G
*************************** 1. row ***************************
                wait_started: 2017-07-12 11:48:23
                    wait_age: 00:00:23
               wait_age_secs: 23
                locked_table: `yejr`.`t1`
                locked_index: PRIMARY
                 locked_type: RECORD
              waiting_trx_id: 266168
         waiting_trx_started: 2017-07-12 11:48:23
             waiting_trx_age: 00:00:23
     waiting_trx_rows_locked: 2
   waiting_trx_rows_modified: 0
                 waiting_pid: 26
               waiting_query: select * from t1 where c2 = 3 for update
             waiting_lock_id: 266168:387:3:4
           waiting_lock_mode: X
             blocking_trx_id: 266167
                blocking_pid: 25
              blocking_query: select * from sys.innodb_lock_waits
            blocking_lock_id: 266167:387:3:4
          blocking_lock_mode: X
        blocking_trx_started: 2017-07-12 11:48:21
            blocking_trx_age: 00:00:25
    blocking_trx_rows_locked: 8
  blocking_trx_rows_modified: 3
     sql_kill_blocking_query: KILL QUERY 25
sql_kill_blocking_connection: KILL 25

case 2: where c2>=8, secondary index `c2` was choosen
session1:
mysql> EXPLAIN update t1 set c4=123 where c2>=8\G
*************************** 1. row ***************************
id: 1
select_type: UPDATE
table: t1
partitions: NULL
type: range
possible_keys: c2
key: c2
key_len: 4
ref: const
rows: 2
filtered: 100.00
Extra: Using where

mysql> begin; update t1 set c4=123 where c2>=8\G
Query OK, 0 rows affected (0.00 sec)

Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0

session2: (execute immediatly)
mysql> begin; select * from t1 where c2 = 3 for update;
Query OK, 0 rows affected (0.00 sec)
+----+----+----+----+
| c1 | c2 | c3 | c4 |
+----+----+----+----+
|  3 |  3 |  3 |  0 |
+----+----+----+----+
1 row in set (0.00 sec)

Suggested fix:
shoud not choose primary key and lock all rows when c2>=6.
[27 Jul 2017 5:10] MySQL Verification Team
Hi,
statistic for a table with so few rows cannot give optimizer any data that can be of use to calculate the plan. For that few rows difference between any of the plans, including a full table scan, is so negligible that there is no "wrong" plan.

If you have a sample with larger dataset where you can reproduce a bug please let us know.

kind regards
Bogdan Kecman
[27 Jul 2017 6:27] Ye Jinrong
let's look another table

CREATE TABLE `sid` (
  `id` int(10) unsigned NOT NULL DEFAULT '0',
  `name` varchar(50) NOT NULL DEFAULT '',
  `aid` int(10) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`aid`),
  KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8

mysql> show index from sid;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| sid   |          0 | PRIMARY  |            1 | aid         | A         |      100063 |     NULL | NULL   |      | BTREE      |         |               |
| sid   |          1 | id       |            1 | id          | A         |       95810 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

session1:
mysql> explain update sid set name = 'afasdfads' where id >= 94773\G
*************************** 1. row ***************************
           id: 1
  select_type: UPDATE
        table: sid
   partitions: NULL
         type: index
possible_keys: id
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 100063
     filtered: 100.00
        Extra: Using where

mysql> begin; update sid set name = 'afasdfads' where id >= 94773;

and session2:
mysql> begin; select * from sid where id = 9490 for update;
-- was blocked until session 1 commit or rollback

r+------+-------------------------------------------+-------+
| id   | name                                      | aid   |
+------+-------------------------------------------+-------+
| 9490 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | 48197 |
+------+-------------------------------------------+-------+
1 row in set (27.71 sec)  -- waiting some seconds after session1 commit or rollback

look lock wait status
mysql> select * from sys.innodb_lock_waits\G
*************************** 1. row ***************************
                wait_started: 2017-07-27 14:22:01
                    wait_age: 00:00:23
               wait_age_secs: 23
                locked_table: `yejr`.`sid`
                locked_index: PRIMARY
                 locked_type: RECORD
              waiting_trx_id: 266756
         waiting_trx_started: 2017-07-27 14:22:01
             waiting_trx_age: 00:00:23
     waiting_trx_rows_locked: 2
   waiting_trx_rows_modified: 0
                 waiting_pid: 94
               waiting_query: select * from sid where id = 9490 for update
             waiting_lock_id: 266756:255:198:60
           waiting_lock_mode: X
             blocking_trx_id: 266753
                blocking_pid: 93
              blocking_query: select * from sys.innodb_lock_waits
            blocking_lock_id: 266753:255:198:60
          blocking_lock_mode: X
        blocking_trx_started: 2017-07-27 14:19:21
            blocking_trx_age: 00:03:03
    blocking_trx_rows_locked: 100346
  blocking_trx_rows_modified: 90784
     sql_kill_blocking_query: KILL QUERY 93
sql_kill_blocking_connection: KILL 93
[28 Jul 2017 12:25] MySQL Verification Team
Hi,
Can you provide
 - show table status
 - explain of that query
 - does the explain of that query changes after you do analyze table

all best
Bogdan
[31 Jul 2017 2:27] Ye Jinrong
1. table status;
mysql>show table status like 'sid'\G
*************************** 1. row ***************************
           Name: sid
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 100063
 Avg_row_length: 68
    Data_length: 6832128
Max_data_length: 0
   Index_length: 1589248
      Data_free: 4194304
 Auto_increment: 100001
    Create_time: 2016-11-26 22:35:02
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

2. the two explain
sql1:
mysql> explain update sid set name = 'afasdfads' where id >= 94773\G
*************************** 1. row ***************************
           id: 1
  select_type: UPDATE
        table: sid
   partitions: NULL
         type: index
possible_keys: id
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 100063
     filtered: 100.00
        Extra: Using where
1 row in set (0.01 sec)

sql2:
mysql> explain select * from sid where id = 9490 for update\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sid
   partitions: NULL
         type: ref
possible_keys: id
          key: id
      key_len: 4
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

3. new explain after analyze table
mysql> analyze table sid;
+----------+---------+----------+----------+
| Table    | Op      | Msg_type | Msg_text |
+----------+---------+----------+----------+
| yejr.sid | analyze | status   | OK       |
+----------+---------+----------+----------+
1 row in set (0.03 sec)

mysql> explain update sid set name = 'afasdfads' where id >= 94773\G
*************************** 1. row ***************************
           id: 1
  select_type: UPDATE
        table: sid
   partitions: NULL
         type: index
possible_keys: id
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 99388
     filtered: 100.00
        Extra: Using where
1 row in set (0.00 sec)

mysql>explain select * from sid where id = 9490 for update\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sid
   partitions: NULL
         type: ref
possible_keys: id
          key: id
      key_len: 4
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)
[31 Jul 2017 13:25] MySQL Verification Team
Hi,
This still does not look like a bug.

What is the output of 

SELECT COUNT(*) FROM sid WHERE  id >= 94773;

If it's more then half of the 100063 (total no of raws) it's expected to do a full table scan (faster) and that will lock the whole table (like every full table scan will do). It is by design.

all best
Bogdan
[31 Jul 2017 13:41] Ye Jinrong
Hi,

mysql>SELECT COUNT(*) FROM sid WHERE  id >= 94773;
+----------+
| COUNT(*) |
+----------+
|    90784 |
+----------+
1 row in set (0.03 sec)

mysql>select count(*) from sid;
+----------+
| count(*) |
+----------+
|   100000 |
+----------+
1 row in set (0.04 sec)

I find the following in document:
Each table index is queried, and the best index is used unless the optimizer believes that it is more efficient to use a table scan. At one time, a scan was used based on whether the best index spanned more than 30% of the table, but a fixed percentage no longer determines the choice between using an index or a scan. The optimizer now is more complex and bases its estimate on additional factors such as table size, number of rows, and I/O block size.

from: https://dev.mysql.com/doc/refman/5.7/en/where-optimization.html

It is reasonable to do a full table scan when scan more than 20%-30% of the table, but when update a table, i think mysql can not lock all rows, it is not so reasonable :(

At last, as you says:
If it's more then half of the 100063 (total no of raws) it's expected to do a full table scan (faster) and that will lock the whole table (like every full table scan will do). It is by design.

My question is, what is the percentage, 50% or 20%-30%, which shoud be more than accurate?

thanks much :)
[31 Jul 2017 13:57] MySQL Verification Team
> It is reasonable to do a full table scan when scan more than 20%-30%
> of the table, but when update a table, i think mysql can not lock all 
> rows, it is not so reasonable :(

You cannot do a full table scan without locking full table.

> My question is, what is the percentage, 50% or 20%-30%, 
> which shoud be more than accurate?

There is no "set" percentage. Basic theory say if you need to get more then 30% it's cheaper to scan full table then to jump trough index. This is not the only factor optimizer takes into account so there is no fixed value where the optimizer will decide to do one or the other thing, and the way it works changes trough versions (it's getting better and smarter).

all best
Bogdan
[31 Jul 2017 14:20] Ye Jinrong
Hi Bogdan,

thank you much once more :)