Bug #119421 Regression in 8.4 vs 8.0 where an index with a second column being a prefix value is not used for a range scan
Submitted: 18 Nov 12:51 Modified: 3 Dec 11:36
Reporter: Glyn Astill Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.4 OS:Linux
Assigned to: CPU Architecture:Any
Tags: 8.4 regression, index prefix, range scan

[18 Nov 12:51] Glyn Astill
Description:
We have a table with a composite secondary index containing char(n) types where the second indexed field is a length prefix value less than n. Example:

  CREATE TABLE t (
      a int PRIMARY KEY AUTO_INCREMENT, 
      b char(20),
      c char(20),
      d int
  );
  CREATE INDEX t_i1 ON t (b, c(16), d);

A query against the table on MySQL 8.4 with an equality condition on the leftmost field and multiple values for the second field using "IN" or "=" results in a plan where the second field is not included in the index condition to do a range scan.

* In MySQL Community Server 8.0.39 We get an Index range scan on column a and b and then a filter.
* In MySQL Community Server 8.4.7 We get an Index lookup on column a and then a filter.

How to repeat:
The reproduction below uses a primary key for simplicity but the same behaviour remains:

CREATE DATABASE test;
USE test;
CREATE TABLE test.repro (
  a char(20),
  b char(20),
  PRIMARY KEY (a, b(16))
);

-- The view "dba.generator_64k" simply returns a series of integers from 1 to 64,000.
INSERT INTO test.repro (a, b) 
SELECT concat(left(md5(cast(n/1000 as signed)),19), '='), left(md5(n),20) from dba.generator_64k n;

-- I will attach some traces after the bug submission.
-- Unexpected behaviour "b" not in index lookup
EXPLAIN ANALYZE SELECT count(*) FROM test.repro
WHERE repro.a = 'eccbc87e4b5ce2fe283=' AND repro.b IN ('7a4bf9ba2bd774068ad5', '6fab6e3aa34248ec1e34')\G 

-- Expected "a" and "b" in index lookup and we get a range scan
EXPLAIN ANALYZE SELECT count(*) FROM test.repro
WHERE repro.a = 'eccbc87e4b5ce2fe283=' AND (repro.b LIKE '7a4bf9ba2bd774068ad5' OR repro.b LIKE '6fab6e3aa34248ec1e34')\G 

-- If the where clause is just reduced to a single value for "b" we get a more reasonable/expected result,
-- If we were to try "(repro.b = '7a4bf9ba2bd774068ad5' OR repro.b = '6fab6e3aa34248ec1e34')" then the undesired behaviour exists just the same as using IN.
-- If we change "b" from char(n) to text the initial query is again reasonable/expected result.
-- If we change the index from "b(16)" to "b(20)" or just "b" the initial query is again reasonable/expected result.

Suggested fix:
8.4 should behave the same as 8.4.
[18 Nov 12:52] Glyn Astill
Trace showing the 8.0 behaviour (desired)

Attachment: repo_trace_8.0.txt (text/plain), 10.46 KiB.

[18 Nov 12:53] Glyn Astill
Trace showing the 8.4 behaviour (undesired)

Attachment: repro_trace_8.4.txt (text/plain), 20.38 KiB.

[18 Nov 13:17] Glyn Astill
In suggested fix, I intended to write "8.0 should behave the same as 8.4."
[18 Nov 13:32] Glyn Astill
lets try again ... "8.4 should behave the same as 8.0."
[18 Nov 16:54] Jean-François Gagné
Probably a duplicate of Bug#118009, see also Bug#119286.
[18 Nov 23:09] Glyn Astill
Bug#118009 is interesting, but if related I fail to see how the explanation there relates to it only affecting fields with a length. If we change column "b" from char(20) to text, keeping the prefix index "b(16)" behaviour is as expected.
[18 Nov 23:40] Jean-François Gagné
There might be a better explanation in Bug#119286, with respect to the fix of that duplicate bug (in 8.4) introducing a performance regression.

In Bug#119286, I link the commit fixing this in 8.4, you might also find more explanation in that commit which I also link below.

https://github.com/mysql/mysql-server/commit/6ae1d0216c416c3053a301c7b4df2ce4c0bf212d

I might also be wrong about duplicate.  I am commenting here trying to be helpful, sorry if I am wrong.
[3 Dec 11:36] Knut Anders Hatlen
Thanks for the bug report. Jean-François is correct about which commit introduced this change, and I agree that it seems to describe the same problem as in bug#118009, so I'm closing it as a duplicate.

You can see why the change was done if you take your original repro test case and modify it so that it populates the table with this row:

INSERT INTO test.repro VALUES ('eccbc87e4b5ce2fe283=', '6fab6e3aa㉞248ec1e34');

In 8.0, your query

  SELECT count(*) FROM test.repro
  WHERE repro.a = 'eccbc87e4b5ce2fe283=' AND repro.b IN ('7a4bf9ba2bd774068ad5', '6fab6e3aa34248ec1e34')

returns zero rows when you have the prefix index. If you remove the prefix index, it returns one row.

In 8.4, it returns one row regardless of whether there is a prefix index.

You're right that changing the type of b from char(20) to text seem to preserve the old behaviour. Unfortunately, that means it still returns wrong results for the text data type with a prefix index, so it was probably an oversight in the original bug fix.