Bug #98770 | Non-covering full index scan is always preferred over table scan for GROUP BY | ||
---|---|---|---|
Submitted: | 28 Feb 2020 4:02 | Modified: | 3 Mar 2020 13:14 |
Reporter: | Øystein Grøvlen | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 8.0.19 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[28 Feb 2020 4:02]
Øystein Grøvlen
[28 Feb 2020 13:27]
MySQL Verification Team
Hi Øystein, Thank you for yet another useful report. There are only couple of requests from me. We do not use DBT3, so, could you upload for us a test case with a smaller size then that huge database ??? Testing a bug on 50 Gb is a bit too much. May be you could upload a dump only of that table ??? We use mostly sysbench and our internal "World" database, without any additional requests. Next, EXPLAINs do not show "Using index", which means that data pages are read. That column is actually empty, so this looks like second bug report. There is no "Using where" nor "Using index", so may be there should be some third category. Do you concur ??? Next, you have shown that MEMORY engine is faster then TempTable engine. That would be a third bug report. Do you concur with this as well ??? Hence , we have three bugs in a single report. Would you agree that this should lead to three bug reports, all of which quite useful ??? Many thanks in advance.
[28 Feb 2020 18:41]
Øystein Grøvlen
Hi Sinisa! 1. Here is the results from running a similar query against the world database: mysql> SELECT sql_text, timer_wait/1000000000.0 "Time (ms)" FROM performance_schema.events_statements_history order by timer_start; +--------------------------------------------------------------------+-----------+ | sql_text | Time (ms) | +--------------------------------------------------------------------+-----------+ | truncate performance_schema.events_statements_history | 3.1604 | | select avg(population) from city group by countrycode | 7.2727 | | select avg(population) from city group by countrycode | 7.0528 | | select avg(population) from city group by countrycode | 6.9788 | | select avg(population) from city group by countrycode | 7.1160 | | select avg(population) from city group by countrycode | 6.9783 | | select avg(population) from city group by countrycode | 7.0807 | | select avg(population) from city group by countrycode | 6.9727 | | select avg(population) from city group by countrycode | 6.9246 | | select avg(population) from city group by countrycode | 6.9634 | | select avg(population) from city group by countrycode | 6.9464 | | select avg(population) from city use index () group by countrycode | 3.1152 | | select avg(population) from city use index () group by countrycode | 2.9697 | | select avg(population) from city use index () group by countrycode | 3.0242 | | select avg(population) from city use index () group by countrycode | 3.1219 | | select avg(population) from city use index () group by countrycode | 3.1007 | | select avg(population) from city use index () group by countrycode | 3.0109 | | select avg(population) from city use index () group by countrycode | 2.9983 | | select avg(population) from city use index () group by countrycode | 2.9582 | | select avg(population) from city use index () group by countrycode | 3.3237 | | select avg(population) from city use index () group by countrycode | 3.1143 | +--------------------------------------------------------------------+-----------+ 21 rows in set (0.01 sec) As you can see, the default plan takes around 7 ms, while a table scan uses around 3 ms. 2. I see no problem with the EXPLAIN output. There is no "Using index" because the index scan is not covering, and there is no "Using where" because there is no WHERE clause. I do think it is a good idea to add information about everything that is not used. That will be a long list! 3. Yes, it looks like the query is 10% faster when using the MEMORY engine for temporary tables. I can file a separate bug for that.
[28 Feb 2020 19:10]
Øystein Grøvlen
If the query is disk-bound, the use of a non-covering index scan is a total disaster. This is the result of running the query on a DBT3 SF10 database with a 1 GB buffer pool (O_DIRECT is used to make this really disk-bound): -------------- select max(total) from ( select l_partkey, l_suppkey, sum(l_quantity) total from lineitem group by l_partkey, l_suppkey) dt -------------- +------------+ | max(total) | +------------+ | 780.00 | +------------+ 1 row in set (1 hour 42 min 50.52 sec) -------------- select max(total) from ( select l_partkey, l_suppkey, sum(l_quantity) total from lineitem use index () group by l_partkey, l_suppkey) dt -------------- +------------+ | max(total) | +------------+ | 780.00 | +------------+ 1 row in set (1 min 46.85 sec) -------------- set internal_tmp_mem_storage_engine=MEMORY -------------- Query OK, 0 rows affected (0.00 sec) -------------- select max(total) from ( select l_partkey, l_suppkey, sum(l_quantity) total from lineitem group by l_partkey, l_suppkey) dt -------------- +------------+ | max(total) | +------------+ | 780.00 | +------------+ 1 row in set (1 hour 42 min 14.12 sec) -------------- select max(total) from ( select l_partkey, l_suppkey, sum(l_quantity) total from lineitem use index () group by l_partkey, l_suppkey) dt -------------- +------------+ | max(total) | +------------+ | 780.00 | +------------+ 1 row in set (7 min 37.11 sec) Using index scans is 60 times slower with index scan, and in this case, table scan is much faster even when the temporary tables overflows to InnoDB.
[3 Mar 2020 13:14]
MySQL Verification Team
Hi Øystein Grøvlen, Thank you for the results. I have repeated your test case results. I think that this report actually covers three bugs, but will wait for other reports from Øystein. Verified as reported.