Bug #104128 Poor performance for queries with tuple comparison
Submitted: 28 Jun 2021 7:00 Modified: 28 Jun 2021 15:26
Reporter: Hamid Alaei Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.25 OS:Ubuntu (18.04)
Assigned to: CPU Architecture:x86 (Core i7)

[28 Jun 2021 7:00] Hamid Alaei
Description:
According to (https://dev.mysql.com/doc/refman/8.0/en/row-constructor-optimization.html), using tuple comparison, if not mixed with single column comparison, should be equivalent of rewriting the query using single column comparison and "and" "or" operations in terms of how efficient the DB can benefit from an index. However, I find a scenario in which using tuple comparison is even worse than scanning the whole table. Not only the DB can't use the index to skip rows not in the range, but also the query run-time is several times slower than running an equivalent query on a table without index.
I find the issue while testing a Laravel PR. I have my test case there too:
https://github.com/laravel/framework/pull/37762#issuecomment-868484453

How to repeat:
# Table:
CREATE TABLE `foos` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `foo` int NOT NULL,
  `bar` int NOT NULL,
  `baz` int NOT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `foos_foo_bar_baz_index` (`foo`,`bar`,`baz`)
) ENGINE=InnoDB AUTO_INCREMENT=2000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

# Data:
The table is seeded with 2 million records, with 0<=foo<10, 0<=bar<10, 0<=baz<20000. This distribution noticeably slows down queries that can't use the entire 12-byte length of the multi-column index.

for ($foo = 0; $foo < 10; $foo++) {
    for ($bar = 0; $bar < 10; $bar++) {
        for ($i = 0; $i < 20000; $i += 100) {
            $values = [];
            $date = now();
            for ($baz = $i; $baz < $i + 100; $baz++) {
                $values[] = [
                    'foo' => $foo,
                    'bar' => $bar,
                    'baz' => $baz,
                    'created_at' => $date,
                    'updated_at' => $date,
                ];
            }
            \App\Models\Foo::query()->insert($values);
        }
    }
}

# Query with tuple comparison:

select * from `foos` where (`foo`, `bar`, `baz`) > (9, 9, 19000) order by `foo` asc, `bar` asc, `baz` asc limit 51

Runtime: 16 seconds.
key_len reported by explain select statement: 12.
This result must be a bug from MySQL. It is multiple times slower than scanning the whole table, e.g. by select count(*).

# Equivalent single comparison query:
select * from `foos` where (`foo` > 9 or (`foo` = 9 and (`bar` > 9 or (`bar` = 9 and (`baz` > 19000))))) order by `foo` asc, `bar` asc, `baz` asc limit 51;

Runtime: less than 2 milliseconds.
key_len reported by explain select statement: 12.

# Note
It seems to me that with tuple comparison, the index is used to filter the rows after the given range, but MySQL always start the search from the beginning of the index, even though the desired rows are at the end of the index. So if I change the range query from > (9, 9, 19000) to > (0, 0, 0), the query will run much faster.
[28 Jun 2021 8:45] MySQL Verification Team
Bug 32585422 - "ROW COMPARISONS" ARE EXECUTED INEFFICIENTLY
[28 Jun 2021 13:14] MySQL Verification Team
Hi Mr. Alael,

Thank you very much for your bug report.

However, this bug is a duplicate of an internally-filed bug report, which is not yet scheduled for fixing, but it is already verified.

Since it is an internal bug, public can not have access to it.

A duplicate.
[28 Jun 2021 15:26] Hamid Alaei
I hope you consider increase the priority of this issue. Please note that removing the multi-column index in my test-case speeds up the performance by a factor of 20, which is wired. I never expected an indexing slows down queries by orders of magnitude!
Also if it isn't possible to fix the issue any time soon, it will be great if you can update the documentations at least.

Thanks.
[7 Oct 2021 19:32] Jean-François Gagné
I am not sure I am happy with this bug being flagged as a duplicate of an internal bug.  Because the internal bug is not publicly accessible, there is no way to know if this is fixed or not, and in which version it is fixed.  IMHO, this should be flagged as Verified so the public can follow progress on the fix.
[8 Oct 2021 5:42] MySQL Verification Team
The internal bug:
 Bug 32585422 - "ROW COMPARISONS" ARE EXECUTED INEFFICIENTLY 

was eventually closed by development as "Not a bug"
with the following comment:

----------
We have documented the following related to row comparisons :
For row comparisons, (a, b) >= (x, y) is equivalent to:
 (a > x) OR ((a = x) AND (b >= y))
For row comparisons, (a, b) < (x, y) is equivalent to:
 (a < x) OR ((a = x) AND (b < y))

This means we compare the row constructor according the rule mentioned above.
However, this does not mean we transform the given row constructor according
to this rule while optimizing the query, because it gets complicated with
increase in number of columns in the row constructor (for eg: (a,b,c) >
(x,y,z), etc).
According to the documentation :

We only do this optimizations for equality and IN() predicate.
https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html#row-constructor-range-opti...

For queries, involving operators other than IN() and equality, it is
recommended to re-write those queries as non-constructor expressions.
https://dev.mysql.com/doc/refman/8.0/en/row-constructor-optimization.html 

-------------

Having said this,  an internal WL was opened to improve these types of queries:
WL#14731 - Range optimizations for row comparisons
[2 Aug 2023 19:54] Jean-François Gagné
Somehow related: Bug#111952.

> It is multiple times slower than scanning the whole table, e.g. by select count(*).

Care about this statement: select count(*) only needs scanning the secondary index, while the query being slow in this report, if being run using the index, needs to do primary key lookup to fetch the columns not in the index (select *).

Hamid: it would be interesting to show the EXPLAIN and SHOW STATUS of running the query with and without the row constructor being expanded.  You can look in Bug#111952 for inspiration on how to do this.
[3 Aug 2023 12:33] MySQL Verification Team
Merci, Mr. Gagne ....