Bug #87998 | MySQL Group By slow even when indexed | ||
---|---|---|---|
Submitted: | 5 Oct 2017 16:42 | Modified: | 10 Oct 2017 15:07 |
Reporter: | Bram Kivenko | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.7 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | query performance |
[5 Oct 2017 16:42]
Bram Kivenko
[6 Oct 2017 12:02]
MySQL Verification Team
Hi! Thank you for your bug report. What we see is actually the expected behaviour. It is , of course, much faster to aggregate over the table that is 70 bytes wide, in comparison with the table that is 400 bytes wide. This is a simple law of physics. We would need a full and always repeatable test case to see if there is anything at odds, but before you do that, please try the following two / three steps: * Set latin1 on all CHAR or VARCHAR columns * If that is not possible, use some other fixed-length character set * Apply fixed length character set, both for table and for all other charset settings ... * Use FORCE INDEX instead of the USE INDEX * See the length of your VARCHAR columns and see whether you can convert it to CHAR. For example, if all your colB tuples contain between 30 and 40 chars, set it to CHAR(40) Please, let us know how it worked and if it does not we will have to try and repeat your test case. I must point out, one more time, that aggregating over 70 bytes is MUCH faster then aggregating over 400 bytes. Regarding index, it might not be wise to use if filtering conditions are done by index search, which would result in much smaller table ... I hope that I was clear enough.
[6 Oct 2017 12:30]
Bram Kivenko
I will try to create a repeatable example, however, while I do this, I want to make sure the problem is described correctly, because there seems to be a misunderstanding: There are two constructions offered, one seems to be 10x faster than the other. The slow construction does this: Queries a table for information of interest, which is indexed and the index is definitely being used, the results of that query are aggregated into a small result set of < 2000 records. The fast construction does this: Queries a table for information of interest, inserts that information into a new temporary table, the new table is then table scanned to produce the same result set above. To be clear, I am counting time from the start of querying the original table in both instances, therefore the workload cannot be less for the second construction. I will attempt to prove that: this doesn't have anything to do with row length, and that even if I additionally group by a column that is not indexed (and therefore requires the original records be read in full) that the second construction will still be faster.
[6 Oct 2017 13:00]
MySQL Verification Team
We are waiting on your feedback. Please, follow all other instructions as well.
[6 Oct 2017 20:57]
Bram Kivenko
Upon further investigation... It turns out, it is caused by the underlying table actually having colB defined as type text, not a varchar. The performance difference is caused exclusively because of this. However, this leaves me with 2 issues: (a) The index can be used without referencing the table rows at all, but this optimization does not seem to apply with group-by. (b) temporary tables seem to take a huge hit for text columns even though the text content itself is small. The DYNAMIC row format, for example is supposed to store small text data inline, but this does not translate to temporary tables, or at least the intended performance does not translate. So, this is not a bug, but there is some obvious performance enhancement available here that is not being used.
[9 Oct 2017 11:50]
MySQL Verification Team
Hi! Thank you for your investigations. I agree that this is not a bug, but I do not see a way how this situation could be optimised any further. You see, CLOB columns, which include both TEXT and BLOB, are stored in separate pages. That is how InnoDB SE is designed and has many advantages for fetching parts of the tuples, when CLOB is not required. The only possibility is to use VARCHAR, if possible.
[10 Oct 2017 14:49]
Bram Kivenko
Before closing the bug report, I am curious to know what happens with the following statement: SELECT colA, colB, count(*) FROM someTable use index (idx_someTable_Date_colA_colB) WHERE Date >= ? and Date < ? GROUP BY colA, colB || ''; (1) The above executes with the same performance as a varchar even if colB is a text column. Does it truncate long content for colB? Does it have other potential drawbacks? (2) Can't short text content be embedded inline in the internal temp table? It appears this optimization is available with DYNAMIC row formats when selecting from the original table, but the temp table seemingly does not make use of this optimization.
[10 Oct 2017 15:07]
MySQL Verification Team
Adding that operation at the end changes last column type. Speed will depend most of all of the lengths of the contents in colB in most of the tuples. So, your expression might slow down everything or accelerate , depending on the contents. When the optimiser chooses the algorithms, he uses only global info plus indices, does not scan a table. DYNAMIC row formats is an old format, not used any more.