Bug #93360 | mysterious optimizer behavior | ||
---|---|---|---|
Submitted: | 27 Nov 2018 9:41 | Modified: | 12 Dec 2018 14:45 |
Reporter: | dean winchester | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.6.16,5.7.23 | OS: | Linux |
Assigned to: | CPU Architecture: | x86 | |
Tags: | Optimizer |
[27 Nov 2018 9:41]
dean winchester
[27 Nov 2018 10:21]
MySQL Verification Team
Thank you for the bug report. Your server reported version is quite older 5.6.16 current released version is 5.6.42 so please check the latest version (a lot of bugs were fixed between both versions so there is a chance to be fixed) if the issue still exists provide a complete test case including even fake data but enough to be repeatable. Thanks.
[28 Nov 2018 3:58]
dean winchester
OK I simplified the query to this (and run the query on 5.7.23): `select * from user u left outer join user_verify uv on u.id = uv.user_id order by u.id LIMIT 100` the query plan is still `+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+ | 1 | SIMPLE | u | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | Using temporary; Using filesort | | 1 | SIMPLE | uv | NULL | ALL | PRIMARY | NULL | NULL | NULL | 1 | 100.00 | Using where; Using join buffer (Block Nested Loop) | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+` and here's the sample data to reproduce: `DROP TABLE IF EXISTS `user`; CREATE TABLE `user` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `member_id` bigint(20) DEFAULT NULL, `phone` varchar(20) DEFAULT NULL, `email` varchar(100) DEFAULT NULL, `is_valid` tinyint(1) NOT NULL DEFAULT '0', `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`), UNIQUE KEY `idx_member_id` (`member_id`), UNIQUE KEY `idx_phone` (`phone`), UNIQUE KEY `idx_email` (`email`) ) ENGINE=InnoDB AUTO_INCREMENT=10153663 DEFAULT CHARSET=utf8mb4; INSERT INTO `user` VALUES (10153653,1203214322,NULL,NULL,1,'2018-11-07 14:39:23','2018-11-28 03:49:31'),(10153654,1182281903,NULL,NULL,1,'2018-11-08 02:12:56','2018-11-28 03:49:31'),(10153655,1487930373,NULL,NULL,1,'2018-11-11 02:02:16','2018-11-28 03:49:31'),(10153656,1361606434,NULL,NULL,1,'2018-11-12 02:07:54','2018-11-28 03:49:31'),(10153657,1360695368,NULL,NULL,1,'2018-11-12 03:47:46','2018-11-28 03:49:31'),(10153658,1898784232,NULL,NULL,1,'2018-11-13 10:27:37','2018-11-28 03:49:31'),(10153659,1595489636,NULL,NULL,1,'2018-11-13 14:22:08','2018-11-28 03:49:31'),(10153660,1684481093,NULL,NULL,1,'2018-11-14 01:52:37','2018-11-28 03:49:31'),(10153661,1499820628,NULL,NULL,1,'2018-11-14 05:56:24','2018-11-28 03:49:31'),(10153662,1313030544,NULL,NULL,1,'2018-11-15 02:47:42','2018-11-28 03:49:31'); DROP TABLE IF EXISTS `user_verify`; CREATE TABLE `user_verify` ( `user_id` bigint(20) NOT NULL, `verify_type` varchar(30) NOT NULL, `verify_title` varchar(100) DEFAULT NULL, `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`user_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4; INSERT INTO `user_verify` VALUES (10153653,'PARTNER',NULL,'2018-11-28 03:50:21','2018-11-28 03:50:21');`
[28 Nov 2018 3:59]
dean winchester
add version
[3 Dec 2018 14:00]
MySQL Verification Team
Hi, I truly fail to see how is this a bug. First of all, table `user_verify` has too small number of rows that would making index improve things. On the contrary, it would slow down the execution. Second, in the query where index is used, it has been forced upon the optimiser, so no gains again. Last, but not least, in the query where filesort is not used, there is not ORDER BY, so why would it be used.
[4 Dec 2018 2:35]
dean winchester
Hello, see this query: select * from user u left outer join user_verify uv on u.id = uv.user_id order by u.id limit 100; and the query plan: +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ | 1 | SIMPLE | u | NULL | ALL | NULL | NULL | NULL | NULL | 556838 | 100.00 | Using temporary; Using filesort | | 1 | SIMPLE | uv | NULL | ALL | PRIMARY | NULL | NULL | NULL | 1 | 100.00 | Using where; Using join buffer (Block Nested Loop) | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ With about 622K records in `user` table and just 1 record in `user_verify`, the query took 1.6 seconds (run on 5.7.23), which is really absurd to me. And this is the profiling of the query: SELECT event_name AS Stage, TIMER_WAIT FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=167; +-------------------------------------+-------------+ | Stage | TIMER_WAIT | +-------------------------------------+-------------+ | stage/sql/starting | 58000000 | | stage/sql/checking permissions | 2000000 | | stage/sql/checking permissions | 1000000 | | stage/sql/Opening tables | 14000000 | | stage/sql/init | 25000000 | | stage/sql/System lock | 5000000 | | stage/sql/optimizing | 6000000 | | stage/sql/statistics | 16000000 | | stage/sql/preparing | 19000000 | | stage/sql/Creating tmp table | 36000000 | | stage/sql/Sorting result | 4000000 | | stage/sql/executing | NULL | | stage/sql/Sending data | 81551000000 | | stage/sql/converting HEAP to ondisk | 16642000000 | | stage/sql/Creating sort index | 75299000000 | | stage/sql/end | 1000000 | | stage/sql/query end | 7000000 | | stage/sql/removing tmp table | 810000000 | | stage/sql/closing tables | 3000000 | | stage/sql/freeing items | 84000000 | | stage/sql/logging slow query | 68000000 | | stage/sql/cleaning up | NULL | +-------------------------------------+-------------+ I think the problem is that since it's a left join, and the primary key of `user_verify` is ignored, mysql doesn't know how many rows of `user_verify` a single `user` record will be joined to, so it fail to terminate early as soon as scanning the first 100 rows of `user` table. This might be some sort of edge case, but it's really bad when you run into it. PS: in my case(with 622K records in `user` table), the magic number seem to be 6. Soon as there are 6 records in `user_verify` table, the query is executed as expected: +----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------+ | 1 | SIMPLE | u | NULL | index | NULL | PRIMARY | 8 | NULL | 100 | 100.00 | NULL | | 1 | SIMPLE | uv | NULL | eq_ref | PRIMARY | PRIMARY | 8 | test.u.id | 1 | 100.00 | NULL | +----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------+
[4 Dec 2018 13:39]
MySQL Verification Team
Hi, Actually, it is not absurd at all. This is because you are using outer join and the outer join can be only resolved in this manner. Trying INNER JOIN would render a different plan.
[5 Dec 2018 3:45]
dean winchester
Can't agree with you. inner join won't give me what I want coz I need to select the first 100 records of `user` together with 0 or 1 record of `user_verify` for each `user`. So I believe in this case a left join is totally legit. Anyhow, having to use a hint 'force index for join (primary)' to 'speed up' a simple query from over 1 second to less than 1ms seems a problem (or at least something to be improved) to me.
[5 Dec 2018 13:43]
MySQL Verification Team
Hi, Let me see if I understand you correctly. If you use FORCE INDEX on PRIMARY for the `user` table in the query: select * from user u left outer join user_verify uv on u.id = uv.user_id order by u.id limit 100; then you get the execution time from one second to one millisecond ???? Can you run both queries ten times and get the average, providing nothing else runs on the server ??? If that is the case, then we might be looking at a performance bug.
[6 Dec 2018 2:00]
dean winchester
OK let me summarize it. Here're two tables involved: user (with 620k records): CREATE TABLE `user` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `member_id` bigint(20) DEFAULT NULL COMMENT '用户号以及推荐码,对外都应该只暴露本字段而不是id', `phone` varchar(20) DEFAULT NULL, `email` varchar(100) DEFAULT NULL, `is_valid` tinyint(1) NOT NULL DEFAULT '0', `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`), UNIQUE KEY `idx_member_id` (`member_id`), UNIQUE KEY `idx_phone` (`phone`), UNIQUE KEY `idx_email` (`email`) ) ENGINE=InnoDB AUTO_INCREMENT=641121 DEFAULT CHARSET=utf8mb4; user_verify (with 1-6 records, in my test case below, only 1 record): create table user_verify ( user_id bigint(20) not null, verify_type varchar(30) not null, verify_title varchar(100), create_time timestamp not null default current_timestamp, update_time timestamp not null default current_timestamp on update current_timestamp, primary key (user_id) ); And here're two queries: query 1: select * from user u left outer join user_verify uv on u.id = uv.user_id order by u.id limit 100; explain result: +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ | 1 | SIMPLE | u | NULL | ALL | NULL | NULL | NULL | NULL | 619080 | 100.00 | Using temporary; Using filesort | | 1 | SIMPLE | uv | NULL | ALL | PRIMARY | NULL | NULL | NULL | 1 | 100.00 | Using where; Using join buffer (Block Nested Loop) | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ and query 1 took about 1.3 seconds except for the first time, which took over 2 seconds, might be buffer pool, or OS page cache etc, let's not worry about that in this case. And query 2 (the only difference is the force index part): select * from user u left outer join user_verify uv force index for join (primary) on u.id = uv.user_id order by u.id limit 100; explain: +----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------+ | 1 | SIMPLE | u | NULL | index | NULL | PRIMARY | 8 | NULL | 100 | 100.00 | NULL | | 1 | SIMPLE | uv | NULL | eq_ref | PRIMARY | PRIMARY | 8 | user_db.u.id | 1 | 100.00 | NULL | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------+ And running query 2, mysql client always reports 0.00 sec. Looking into performance_schema.events_statements_history, the TIMER_WAIT reports numbers like 850000000(that's 0.85ms right?)
[6 Dec 2018 14:57]
MySQL Verification Team
Hi, I do not get your results. I get these ... First, both explains: explain select * from user u left outer join user_verify uv on u.id = uv.user_id order by u.id limit 100; +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+ | 1 | SIMPLE | u | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | Using temporary; Using filesort | | 1 | SIMPLE | uv | NULL | ALL | PRIMARY | NULL | NULL | NULL | 1 | 100.00 | Using where; Using join buffer (Block Nested Loop) | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+ explain select * from user u left outer join user_verify uv force index for join (primary) on u.id = uv.user_id order by u.id limit 100; +----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------+ | 1 | SIMPLE | u | NULL | index | NULL | PRIMARY | 8 | NULL | 10 | 100.00 | NULL | | 1 | SIMPLE | uv | NULL | eq_ref | PRIMARY | PRIMARY | 8 | test.u.id | 1 | 100.00 | NULL | +----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------+ Now, the times ... For the first query: real 0m0.176s user 0m0.013s sys 0m0.011s For the second query: real 0m0.172s user 0m0.012s sys 0m0.011s
[6 Dec 2018 16:27]
MySQL Verification Team
A small note. I have tested your test case on both 5.7.24 and 8.0.14 and got identical results.
[7 Dec 2018 1:37]
dean winchester
Hello, Can you try both queries with more records in `user` table (in my case, there're more that 600K)? With only 10 records in `user` table, I doubt there'd be much difference. Second, I don't understand why the second query took so much time. Even with 600K records in `user` table, I still got this with force index: real 0m0.019s user 0m0.012s sys 0m0.004s but without force index: real 0m1.407s user 0m0.012s sys 0m0.004s
[7 Dec 2018 13:40]
MySQL Verification Team
Hi, Can you upload those 60K rows to our SFTP site. Please, use "Files" tab for that operation. Times are different due to the hardware, OS (and its settings) and MySQL tuning ... Plus, I run debug version so that I can catch any reports. Thank you in advance.
[7 Dec 2018 15:52]
dean winchester
Yeah I can do that, but you'll have to wait until Monday coz the data was on my work laptop. However you can easily generate the data yourself. CREATE TABLE `user` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `member_id` bigint(20) DEFAULT NULL, `phone` varchar(20) DEFAULT NULL, `email` varchar(100) DEFAULT NULL, `is_valid` tinyint(1) NOT NULL DEFAULT '0', `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`), UNIQUE KEY `idx_member_id` (`member_id`), UNIQUE KEY `idx_phone` (`phone`), UNIQUE KEY `idx_email` (`email`) ) ENGINE=InnoDB AUTO_INCREMENT=641121 DEFAULT CHARSET=utf8mb4; Just generate id together with member_id from 1 to 600K, columns like phone and email don't matter, just leave them to be NULL. A simple script can do that. Anyway, if you still want it, I can upload it on Monday.
[7 Dec 2018 16:36]
MySQL Verification Team
Hi, Since distribution and width of rows can be important we shall wait on your data during next week.
[11 Dec 2018 3:33]
dean winchester
Hello, I've uploaded data for `user` table, filename is mysql-bug-data-93360.gz.
[11 Dec 2018 14:16]
MySQL Verification Team
Hi, Can you please let us know to which SFTP site have you uploaded that file. To which directory EXACTLY have you uploaded it and how big is the file ??? Thanks in advance.
[12 Dec 2018 2:49]
dean winchester
Hello, I uploaded to sftp.oracle.com/support/incoming/, filename is mysql-bug-data-93360.gz, file size is about 7MB
[12 Dec 2018 14:11]
MySQL Verification Team
Just now trying to reproduce it.
[12 Dec 2018 14:45]
MySQL Verification Team
Hi, This is the output from the first select: time "select * from user u left outer join user_verify uv on u.id = uv.user_id order by u.id limit 100" real 0m7.194s user 0m0.012s sys 0m0.006s This is from the second select: time "select * from user u left outer join user_verify uv force index for join (primary) on u.id = uv.user_id order by u.id limit 100" real 0m0.225s user 0m0.012s sys 0m0.009s Hence , I am verifying this bug as a Performance bug in the "Optimizer" category. Thank you for your contribution.
[12 Dec 2018 14:47]
MySQL Verification Team
Additional explanation. I have got the same results on both 5.7.24 and 8.0.14.