Bug #114512 MySQL 8: Order by with hash join incredibly slow
Submitted: 29 Mar 2024 9:35 Modified: 17 Jun 2024 9:51
Reporter: Jakub Szafrański Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.3.0, 8.0.36 OS:Any
Assigned to: CPU Architecture:Any

[29 Mar 2024 9:35] Jakub Szafrański
Description:
I encountered an issue today and I'm not sure if this is a bug in MySQL 8 or if I'm doing something wrong.

Preface

At the bottom of the post I attached a query to reproduce the issue locally.

I have three tables: clients and two more. All tables have ~50 columns. I'm making a very simple query that joins all tables together to the clients table by a foreign key and then just orders them.

The clients table has 1M rows. The two other tables are empty.

The problematic query looks as follows:

select
    test_client.*,
    test_join1.*,
    test_join2.*
from
    test_client
    left join test_join1 on test_client.id = test_join1.client_id
    left join test_join2 on test_client.id = test_join2.client_id
order by test_client.id
limit 10;

This query takes 4-5 seconds to complete.

When I remove ORDER BY it executed in 9 ms. If I disable hash joins by using the /*+ NO_BNL() */ planner hint it also completes in 9 ms. If I query only for one column from each table instead of all of them, the query also completes in 9ms.

The issue became visible after upgrading from 5.7 to 8.3. The query was not using hash joins in 5.7 and thus no problem was observed.

How to repeat:
I reproduced the issue by using the offical mysql 8.3.0 docker image. The issue seems to happen regardless of the operating system or architecture (I tested in on ARM @ macbook m3 and on intel x86_64 on linux).

The query that is slow is:

select
    test_client.*,
    test_join1.*,
    test_join2.*
from
    test_client
    left join test_join1 on test_client.id = test_join1.client_id
    left join test_join2 on test_client.id = test_join2.client_id
order by test_client.id
limit 10;

The following query can be used to generate a test data set for the query above to cause a visible problem:

drop table if exists test_join1;
drop table if exists test_join2;
drop table if exists test_client cascade;

create table if not exists test_client (
    id int auto_increment primary key,
    col1 varchar(255),
    col2 varchar(255),
    col3 varchar(255),
    col4 varchar(255),
    col5 varchar(255),
    col6 varchar(255),
    col7 varchar(255),
    col8 varchar(255),
    col9 varchar(255),
    col10 varchar(255),
    col11 varchar(255),
    col12 varchar(255),
    col13 varchar(255),
    col14 varchar(255),
    col15 varchar(255),
    col16 varchar(255),
    col17 varchar(255),
    col18 varchar(255),
    col19 varchar(255),
    col20 varchar(255),
    col21 varchar(255),
    col22 varchar(255),
    col23 varchar(255),
    col24 varchar(255),
    col25 varchar(255),
    col26 varchar(255),
    col27 varchar(255),
    col28 varchar(255),
    col29 varchar(255),
    col30 varchar(255),
    col31 varchar(255),
    col32 varchar(255),
    col33 varchar(255),
    col34 varchar(255),
    col35 varchar(255),
    col36 varchar(255),
    col37 varchar(255),
    col38 varchar(255),
    col39 varchar(255),
    col40 varchar(255),
    col41 varchar(255),
    col42 varchar(255),
    col43 varchar(255),
    col44 varchar(255),
    col45 varchar(255),
    col46 varchar(255),
    col47 varchar(255),
    col48 varchar(255),
    col49 varchar(255)
);
create table if not exists test_join1 (
    id int auto_increment primary key,
    client_id int,
    col1 varchar(255),
    col2 varchar(255),
    col3 varchar(255),
    col4 varchar(255),
    col5 varchar(255),
    col6 varchar(255),
    col7 varchar(255),
    col8 varchar(255),
    col9 varchar(255),
    col10 varchar(255),
    col11 varchar(255),
    col12 varchar(255),
    col13 varchar(255),
    col14 varchar(255),
    col15 varchar(255),
    col16 varchar(255),
    col17 varchar(255),
    col18 varchar(255),
    col19 varchar(255),
    col20 varchar(255),
    col21 varchar(255),
    col22 varchar(255),
    col23 varchar(255),
    col24 varchar(255),
    col25 varchar(255),
    col26 varchar(255),
    col27 varchar(255),
    col28 varchar(255),
    col29 varchar(255),
    col30 varchar(255),
    col31 varchar(255),
    col32 varchar(255),
    col33 varchar(255),
    col34 varchar(255),
    col35 varchar(255),
    col36 varchar(255),
    col37 varchar(255),
    col38 varchar(255),
    col39 varchar(255),
    col40 varchar(255),
    col41 varchar(255),
    col42 varchar(255),
    col43 varchar(255),
    col44 varchar(255),
    col45 varchar(255),
    col46 varchar(255),
    col47 varchar(255),
    col48 varchar(255),
    col49 varchar(255),
    foreign key (client_id) references test_client(id)
);
create table if not exists test_join2 (
    id int auto_increment primary key,
    client_id int,
    col1 varchar(255),
    col2 varchar(255),
    col3 varchar(255),
    col4 varchar(255),
    col5 varchar(255),
    col6 varchar(255),
    col7 varchar(255),
    col8 varchar(255),
    col9 varchar(255),
    col10 varchar(255),
    col11 varchar(255),
    col12 varchar(255),
    col13 varchar(255),
    col14 varchar(255),
    col15 varchar(255),
    col16 varchar(255),
    col17 varchar(255),
    col18 varchar(255),
    col19 varchar(255),
    col20 varchar(255),
    col21 varchar(255),
    col22 varchar(255),
    col23 varchar(255),
    col24 varchar(255),
    col25 varchar(255),
    col26 varchar(255),
    col27 varchar(255),
    col28 varchar(255),
    col29 varchar(255),
    col30 varchar(255),
    col31 varchar(255),
    col32 varchar(255),
    col33 varchar(255),
    col34 varchar(255),
    col35 varchar(255),
    col36 varchar(255),
    col37 varchar(255),
    col38 varchar(255),
    col39 varchar(255),
    col40 varchar(255),
    col41 varchar(255),
    col42 varchar(255),
    col43 varchar(255),
    col44 varchar(255),
    col45 varchar(255),
    col46 varchar(255),
    col47 varchar(255),
    col48 varchar(255),
    col49 varchar(255),
    foreign key (client_id) references test_client(id)
);

SET @@cte_max_recursion_depth  = 1000001;

insert into test_client
WITH RECURSIVE nrows(id) AS (
SELECT 1 as id UNION ALL
SELECT id + 1 from nrows WHERE  id<=1000000
)
SELECT id, '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '' FROM nrows;

create index test_client_id on test_client (id);
[29 Mar 2024 9:41] Jakub Szafrański
Plans (explain & analyze) for the problematic query

Attachment: 1_plans-regular-query.txt (text/plain), 1.89 KiB.

[29 Mar 2024 9:41] Jakub Szafrański
Plans (explain & analyze) for the query without ORDER BY

Attachment: 2_plans-without-order-by.txt (text/plain), 1.67 KiB.

[29 Mar 2024 9:41] Jakub Szafrański
Plans (explain & analyze) for the query with disabled hash join (via a plan hinter)

Attachment: 3_plans-without-hash-join.txt (text/plain), 1.58 KiB.

[2 Apr 2024 9:26] MySQL Verification Team
Hello Jakub Szafrański,

Thank you for the report and feedback.
Could you please provide configuration files from both the instances(5.7, 8.3.0) if not on defaults? Thank you. 

regards,
Umesh
[2 Apr 2024 9:28] Jakub Szafrański
Hi Umesh,

Both are on default. I just ran the instances using:

docker run --rm --name mysql-8-demo -e MYSQL_ROOT_PASSWORD=root -v /Users/js/mysql-8-datadir:/var/lib/mysql -d -p 3308:3306 mysql:8

/Users/js/mysql-8-datadir was empty when running. Same for 5.7. It uses the default docker image with all the default configuration.
[5 Apr 2024 8:11] MySQL Verification Team
Hello Jakub Szafrański,

Thank you for the details.
With the provided test case I confirmed at our end that there is no regression in 8.0/8.3 compared to 5.7 but it is clear that without hash joins we get better performance(pls correct me if I'm missing anything). Verifying for now but if you think there is regression in 8.x then please do share exact test case to confirm. 

Joining test details shortly.

Sincerely,
Umesh
[5 Apr 2024 8:12] MySQL Verification Team
Test results - 5.7.44, 8.0.36 and 8.3.0

Attachment: 114512_5.7.44_8.0.36_8.3.0.results (application/octet-stream, text), 39.21 KiB.

[5 Apr 2024 8:30] Jakub Szafrański
Hi Umesh,

I understand why it's confusing if this should be considered a regression or not. From client POV this is a regression, because the same query on 5.7 went from 9 milliseconds to 5 seconds.

Even in your results you can see that the default query takes ~8 seconds to complete.

But I did check your results and indeed you get the same bad performance on 5.7 as well.

The issue appeared for us after upgrading to 8.x from 5.7 on exactly the same query, and the documentation states that hash joins are now preferred since 8.0.18: https://dev.mysql.com/doc/refman/8.0/en/hash-joins.html

This means that most likely my steps to reproduce are flawed. In a real product the query became very slow without any changes, I must have changed something in the minimal reproduction that makes it slow on 5.7 as well. I will try to revamp my reproduction steps to better show the issue.
[6 Jun 2024 5:53] MySQL Verification Team
Hello Jakub,

Development has requested few details to proceed further on this. Please provide the exact version of 5.7 being used by you. Additionally, please share the following details for both versions (5.7 and 8.0) you are using:

• Status variables details
• Configuration variables
• The query used
• Explain result

regards,
Umesh
[17 Jun 2024 5:22] MySQL Verification Team
Hello Jakub,

This is just a follow up message.
Could you please provide earlier requested details? Development is awaiting for the response on this. Thank you.

regards,
Umesh
[17 Jun 2024 8:03] Jakub Szafrański
Hello Umesh,

Apologies, I missed the notification.

As for the queries and the plan - they're attached in the first comments to this bug report.

I just checked latest mysql version (8.4.0) and the issue still persist, so I will attach the result of `explain analyze` for the same query running at 8.4.0 in the next message.

As for the variables - like I said before, to reproduce the issue I'm running a fresh docker image with all the settings set to the defaults. 

I'll attach both variables anyway, but I just wanted to clarify that I'm specifically reproducing it on a fresh docker image each time to make sure I'm running everything on the default configuration.
[17 Jun 2024 8:03] Jakub Szafrański
Plans (explain & analyze) for the problematic query on 8.4.0

Attachment: hash-join-840.txt (text/plain), 955 bytes.

[17 Jun 2024 8:04] Jakub Szafrański
8.4.0 status variables

Attachment: status_variables.txt (text/plain), 255.49 KiB.

[17 Jun 2024 8:04] Jakub Szafrański
8.4.0 configuration variables

Attachment: variables.txt (text/plain), 405.62 KiB.

[17 Jun 2024 8:06] Jakub Szafrański
And just to recap,

The problematic query is:

```
explain analyze
select
    test_client.*,
    test_join1.*,
    test_join2.*
from
    test_client
    left join test_join1 on test_client.id = test_join1.client_id
    left join test_join2 on test_client.id = test_join2.client_id
order by test_client.id
limit 10;
```

which works in a few milliseconds on 5.7 but takes a few seconds on 8.x (we're observing the issue on aurora and we're able to reproduce it locally as well).

If I add `/*+ NO_BNL() */` to the query in 8.x the query takes a few ms to complete again.
[17 Jun 2024 8:57] MySQL Verification Team
Hello Jakub,

Thank you for providing the requested details, could you please confirm/provide the exact version of 5.7.x being used by you? Development wanted to ensure there is no regression from 5.7 to 8.x. Thank you. 

Sincerely,
Umesh
[17 Jun 2024 9:51] Jakub Szafrański
Hi Umesh,

I can't find what version oif 5.7 we were using when we didn't observe the issue. I just tried reproducing the issue on 5.7.42 and I couldn't, so this indicates that my reproduction steps aren't good enough.
[17 Jun 2024 9:54] MySQL Verification Team
Hello Jakub,

Thank you for your response.

regards,
Umesh