MySQL Bugs: #104128: Poor performance for queries with tuple comparison

Bug #104128	Poor performance for queries with tuple comparison
Submitted:	28 Jun 2021 7:00	Modified:	28 Jun 2021 15:26
Reporter:	Hamid Alaei	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	8.0.25	OS:	Ubuntu (18.04)
Assigned to:		CPU Architecture:	x86 (Core i7)

Description:
According to (https://dev.mysql.com/doc/refman/8.0/en/row-constructor-optimization.html), using tuple comparison, if not mixed with single column comparison, should be equivalent of rewriting the query using single column comparison and "and" "or" operations in terms of how efficient the DB can benefit from an index. However, I find a scenario in which using tuple comparison is even worse than scanning the whole table. Not only the DB can't use the index to skip rows not in the range, but also the query run-time is several times slower than running an equivalent query on a table without index.
I find the issue while testing a Laravel PR. I have my test case there too:
https://github.com/laravel/framework/pull/37762#issuecomment-868484453

How to repeat:
# Table:
CREATE TABLE `foos` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `foo` int NOT NULL,
  `bar` int NOT NULL,
  `baz` int NOT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `foos_foo_bar_baz_index` (`foo`,`bar`,`baz`)
) ENGINE=InnoDB AUTO_INCREMENT=2000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

# Data:
The table is seeded with 2 million records, with 0<=foo<10, 0<=bar<10, 0<=baz<20000. This distribution noticeably slows down queries that can't use the entire 12-byte length of the multi-column index.

for ($foo = 0; $foo < 10; $foo++) {
    for ($bar = 0; $bar < 10; $bar++) {
        for ($i = 0; $i < 20000; $i += 100) {
            $values = [];
            $date = now();
            for ($baz = $i; $baz < $i + 100; $baz++) {
                $values[] = [
                    'foo' => $foo,
                    'bar' => $bar,
                    'baz' => $baz,
                    'created_at' => $date,
                    'updated_at' => $date,
                ];
            }
            \App\Models\Foo::query()->insert($values);
        }
    }
}

# Query with tuple comparison:

select * from `foos` where (`foo`, `bar`, `baz`) > (9, 9, 19000) order by `foo` asc, `bar` asc, `baz` asc limit 51

Runtime: 16 seconds.
key_len reported by explain select statement: 12.
This result must be a bug from MySQL. It is multiple times slower than scanning the whole table, e.g. by select count(*).

# Equivalent single comparison query:
select * from `foos` where (`foo` > 9 or (`foo` = 9 and (`bar` > 9 or (`bar` = 9 and (`baz` > 19000))))) order by `foo` asc, `bar` asc, `baz` asc limit 51;

Runtime: less than 2 milliseconds.
key_len reported by explain select statement: 12.

# Note
It seems to me that with tuple comparison, the index is used to filter the rows after the given range, but MySQL always start the search from the beginning of the index, even though the desired rows are at the end of the index. So if I change the range query from > (9, 9, 19000) to > (0, 0, 0), the query will run much faster.

Bug 32585422 - "ROW COMPARISONS" ARE EXECUTED INEFFICIENTLY

Hi Mr. Alael,

Thank you very much for your bug report.

However, this bug is a duplicate of an internally-filed bug report, which is not yet scheduled for fixing, but it is already verified.

Since it is an internal bug, public can not have access to it.

A duplicate.

I hope you consider increase the priority of this issue. Please note that removing the multi-column index in my test-case speeds up the performance by a factor of 20, which is wired. I never expected an indexing slows down queries by orders of magnitude!
Also if it isn't possible to fix the issue any time soon, it will be great if you can update the documentations at least.

Thanks.

I am not sure I am happy with this bug being flagged as a duplicate of an internal bug.  Because the internal bug is not publicly accessible, there is no way to know if this is fixed or not, and in which version it is fixed.  IMHO, this should be flagged as Verified so the public can follow progress on the fix.

The internal bug:
 Bug 32585422 - "ROW COMPARISONS" ARE EXECUTED INEFFICIENTLY 

was eventually closed by development as "Not a bug"
with the following comment:

----------
We have documented the following related to row comparisons :
For row comparisons, (a, b) >= (x, y) is equivalent to:
 (a > x) OR ((a = x) AND (b >= y))
For row comparisons, (a, b) < (x, y) is equivalent to:
 (a < x) OR ((a = x) AND (b < y))

This means we compare the row constructor according the rule mentioned above.
However, this does not mean we transform the given row constructor according
to this rule while optimizing the query, because it gets complicated with
increase in number of columns in the row constructor (for eg: (a,b,c) >
(x,y,z), etc).
According to the documentation :

We only do this optimizations for equality and IN() predicate.
https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html#row-constructor-range-opti...

For queries, involving operators other than IN() and equality, it is
recommended to re-write those queries as non-constructor expressions.
https://dev.mysql.com/doc/refman/8.0/en/row-constructor-optimization.html 

-------------

Having said this,  an internal WL was opened to improve these types of queries:
WL#14731 - Range optimizations for row comparisons

Somehow related: Bug#111952.

> It is multiple times slower than scanning the whole table, e.g. by select count(*).

Care about this statement: select count(*) only needs scanning the secondary index, while the query being slow in this report, if being run using the index, needs to do primary key lookup to fetch the columns not in the index (select *).

Hamid: it would be interesting to show the EXPLAIN and SHOW STATUS of running the query with and without the row constructor being expanded.  You can look in Bug#111952 for inspiration on how to do this.

Merci, Mr. Gagne ....