Bug #111952 Row constructor inequality comparison on Primary Key results in full table scan.
Submitted: 2 Aug 2023 19:50 Modified: 3 Aug 2023 7:43
Reporter: Jean-François Gagné Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.34, 5.7.43, 8.1.0 OS:Any
Assigned to: CPU Architecture:Any

[2 Aug 2023 19:50] Jean-François Gagné
Description:
Hi,

this bug is a follow-up on Bug#104128.  That bug was closed as a Duplicate of an internal Verified bug (Bug#32585422), which ended-up being closed as Not a Bug, with an internal Work log (WL#14731) being open to "improve these types of queries".  This looks like a lot of shenanigans to hide a sub-optimal behavior.  This new bug sheds additional light on all this.

As explained in Bug#104128 and shown in How to repeat, a query with a row constructor inequality comparison takes more time to execute than the expanded row constructor condition while we expect query performance to be similar.  The reproduction in this report is a little different than in Bug#104128: that other report row constructor comparison is on a secodary index, and this report is on the primary key.  More details are given in this report: an EXPLAIN shows the difference in plan, and a SHOW STATUS shows the amount of data scanned.

I set this as S2 (Serious) because I consider row constructor an essential feature to simplify writing queries.  Where I work, we often do "paginated" queries scanning a multi-column indexes (both secondary and primary), and using row constructor result in much simpler SQL.

If there is an argument about this being a bug, I would settle on a feature request (the existence of a WL shows this should somehow be verified).  Also, if this is not a bug (but still a feature request), documenting the sub-optimal behavior in [1] is probably important.

[1]: https://dev.mysql.com/doc/refman/8.0/en/row-constructor-optimization.html

How to repeat details the behavior for MySQL 8.0.34, but the behavior of this is the same for 5.7.43 and 8.1.0.

Many thanks for looking into this,

Jean-François Gagné

How to repeat:
# Create a dbdeployer sandbox.
dbdeployer deploy single mysql_8.0.34

# Create a test schema.
./use <<< "create database test_jfg"

# Create a test table
./use test_jfg <<< "
  create table u(
    p int not null,
    i bigint not null,
    v int not null default 0,
    PRIMARY KEY (p,i))"

# Fill the table with data.
for i in $(seq 1 30000); do
  for j in $(seq 1 100); do echo "($i,$j)"; done
done |
  paste -s -d "$(printf ',%.0s' {1..1000})\n" |
  sed -e 's/.*/INSERT INTO u(p,i) VALUES &;/' |
  ./use test_jfg

-- Explain for the expanded row constructor condition shows a range scan,
--   with the query being quick, and only calling the storage engine / examining rows as needed.
mysql [localhost:8034] {msandbox} (test_jfg) > explain select p, i from u where (p = 25001 and i > 50) or p > 25001 order by p, i limit 1 offset 99\G flush status; select p, i from u where (p = 25001 and i > 50) or p > 25001 order by p, i limit 1 offset 99; show status like 'Handler_read_%';
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 12
          ref: NULL
         rows: 955168
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

+-------+----+
| p     | i  |
+-------+----+
| 25002 | 50 |
+-------+----+
1 row in set (0.00 sec)

+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 2     |
| Handler_read_last     | 0     |
| Handler_read_next     | 99    |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)

-- Explain for the row constructor condition shows a full table scan (type: index),
--   with the query being quick slower than above,
--   and calling the storage engine / examining rows much more than expected.
mysql [localhost:8034] {msandbox} (test_jfg) > explain select p, i from u where (p,i) > (25001,50) order by p, i limit 1 offset 99\G flush status; select p, i from u where (p,i) > (25001,50) order by p, i limit 1 offset 99; show status like 'Handler_read_%';
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
   partitions: NULL
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 12
          ref: NULL
         rows: 100
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

+-------+----+
| p     | i  |
+-------+----+
| 25002 | 50 |
+-------+----+
1 row in set (0.34 sec)

+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| Handler_read_first    | 1       |
| Handler_read_key      | 1       |
| Handler_read_last     | 0       |
| Handler_read_next     | 2500149 |
| Handler_read_prev     | 0       |
| Handler_read_rnd      | 0       |
| Handler_read_rnd_next | 0       |
+-----------------------+---------+
7 rows in set (0.00 sec)
[3 Aug 2023 7:43] MySQL Verification Team
Hello Jean-François,

Thank you for the report and feedback.
Verified as described.

regards,
Umesh
[30 Oct 2024 6:12] tianfeng li
It seems to be the duplicate issue as BUG#108116, and a contribution that improves the row constructor expression optimization was submitted.