Bug #112035 Materializing performance_schema.data_locks can lead to excessive mem usage/OOM
Submitted: 10 Aug 2023 19:33 Modified: 19 Sep 9:05
Reporter: Marc Reilly Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Performance Schema Severity:S2 (Serious)
Version:8.0.37, 8.4.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: data_locks, OOM, performance schema, server crash

[10 Aug 2023 19:33] Marc Reilly
Description:
Issue 1: When materializing performance_schema.data_locks when there are a lot of locks held on the server can lead to a server exit due to OOM. This is quite simple to reproduce using ro queries on a server. 

Issue 2: The memory allocated here during materialization does not seem to be tracked in the performance schema memory instrumentation, making it difficult to troubleshoot where memory is being allocated on a live server. 

Here I'm using a 10000000 row table to demonstrate on a r5.large ec2 instance (2 vcpu, 16GiB memory ), to repro you can adjust the number of rows/locks taken. 
Steps:
# generate sysbench data and enable P_S instruments
# Based on my above specs I'm using 10 tables with 10000000 rows. 
sysbench /usr/local/share/sysbench/oltp_write_only.lua --db-driver=mysql --mysql-user=msandbox --mysql-password=msandbox --mysql-db=test --mysql-socket=/tmp/mysql_sandbox8034.sock --mysql-port=8034 --threads=10 --rand-type=uniform --db-ps-mode=disable --tables=10 --forced-shutdown --table-size=10000000 --create_secondary=0 prepare
update performance_schema.setup_consumers set enabled=1;
update performance_schema.setup_instruments set enabled=1;

Open 3 sessions:
# Execute 1st and leave open
Session 1:
PAGER tail
begin;
select * from sbtest1 for update;
<do the same for sbtest2,3 etc for more rows>

# Run after session 1
Session 2: 
SELECT * FROM performance_schema.data_locks WHERE OBJECT_SCHEMA = 'test' LIMIT 100;

# Observe system memory usage while session 2 is running
Session 3:
while sleep 5; do sandboxes/msb_8_0_34/use -e "select * from sys.memory_global_by_current_bytes limit 10";sudo dmesg | grep "Out of memory"; free -m;sleep 3;done

How to repeat:
Session 1: Run a query which generates a lot of row locks. Here I'm demonstrating using sysbench with select for update
PAGER tail
begin;
select * from sbtest1 for update;
<do the same for sbtest2,3 etc for more rows>

Session 2: Once session 1 is finished gathering the locks, run a query against performance_schema.data_locks in session 2 to get info on locks against the test schema. This will fail and cause server to OOM. 
mysql [localhost:8034] {msandbox} (test) > SELECT * FROM performance_schema.data_locks WHERE OBJECT_SCHEMA = 'test' LIMIT 100;
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql_sandbox8034.sock' (111)
ERROR:
Can't connect to the server

Session 3: Here, once session 2 starts executing, I'm checking OS level stats and performance schema. Note how free -m is reporting the drop in mem until we OOM, performance schema is not changing at all. 
 while sleep 5; do sandboxes/msb_8_0_34/use -e "select * from sys.memory_global_by_current_bytes limit 10";sudo dmesg | grep "Out of memory"; free -m;sleep 3;done
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| event_name                                                                  | current_count | current_alloc | current_avg_alloc | high_count | high_alloc | high_avg_alloc |
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| memory/innodb/buf_buf_pool                                                  |            80 | 10.22 GiB     | 130.88 MiB        |         80 | 10.22 GiB  | 130.88 MiB     |
| memory/performance_schema/events_statements_summary_by_digest               |             1 | 40.28 MiB     | 40.28 MiB         |          1 | 40.28 MiB  | 40.28 MiB      |
| memory/innodb/ut0link_buf                                                   |             2 | 24.00 MiB     | 12.00 MiB         |          2 | 24.00 MiB  | 12.00 MiB      |
| memory/innodb/log_buffer_memory                                             |             1 | 16.00 MiB     | 16.00 MiB         |          1 | 16.00 MiB  | 16.00 MiB      |
| memory/performance_schema/events_statements_history_long                    |             1 | 14.34 MiB     | 14.34 MiB         |          1 | 14.34 MiB  | 14.34 MiB      |
| memory/performance_schema/events_errors_summary_by_thread_by_error          |           257 | 12.81 MiB     | 51.04 KiB         |        257 | 12.81 MiB  | 51.04 KiB      |
| memory/performance_schema/events_statements_summary_by_thread_by_event_name |             1 | 11.55 MiB     | 11.55 MiB         |          1 | 11.55 MiB  | 11.55 MiB      |
| memory/performance_schema/events_statements_summary_by_digest.digest_text   |             1 | 9.77 MiB      | 9.77 MiB          |          1 | 9.77 MiB   | 9.77 MiB       |
| memory/performance_schema/events_statements_history_long.digest_text        |             1 | 9.77 MiB      | 9.77 MiB          |          1 | 9.77 MiB   | 9.77 MiB       |
| memory/performance_schema/events_statements_history_long.sql_text           |             1 | 9.77 MiB      | 9.77 MiB          |          1 | 9.77 MiB   | 9.77 MiB       |
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
               total        used        free      shared  buff/cache   available
Mem:           15779        5445        8019           0        2314       10074
Swap:              0           0           0
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| event_name                                                                  | current_count | current_alloc | current_avg_alloc | high_count | high_alloc | high_avg_alloc |
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| memory/innodb/buf_buf_pool                                                  |            80 | 10.22 GiB     | 130.88 MiB        |         80 | 10.22 GiB  | 130.88 MiB     |
| memory/performance_schema/events_statements_summary_by_digest               |             1 | 40.28 MiB     | 40.28 MiB         |          1 | 40.28 MiB  | 40.28 MiB      |
| memory/innodb/ut0link_buf                                                   |             2 | 24.00 MiB     | 12.00 MiB         |          2 | 24.00 MiB  | 12.00 MiB      |
| memory/innodb/log_buffer_memory                                             |             1 | 16.00 MiB     | 16.00 MiB         |          1 | 16.00 MiB  | 16.00 MiB      |
| memory/performance_schema/events_statements_history_long                    |             1 | 14.34 MiB     | 14.34 MiB         |          1 | 14.34 MiB  | 14.34 MiB      |
| memory/performance_schema/events_errors_summary_by_thread_by_error          |           257 | 12.81 MiB     | 51.04 KiB         |        257 | 12.81 MiB  | 51.04 KiB      |
| memory/performance_schema/events_statements_summary_by_thread_by_event_name |             1 | 11.55 MiB     | 11.55 MiB         |          1 | 11.55 MiB  | 11.55 MiB      |
| memory/performance_schema/events_statements_summary_by_digest.digest_text   |             1 | 9.77 MiB      | 9.77 MiB          |          1 | 9.77 MiB   | 9.77 MiB       |
| memory/performance_schema/events_statements_history_long.digest_text        |             1 | 9.77 MiB      | 9.77 MiB          |          1 | 9.77 MiB   | 9.77 MiB       |
| memory/performance_schema/events_statements_history_long.sql_text           |             1 | 9.77 MiB      | 9.77 MiB          |          1 | 9.77 MiB   | 9.77 MiB       |
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
               total        used        free      shared  buff/cache   available
Mem:           15779        7267        6198           0        2314        8253
Swap:              0           0           0
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| event_name                                                                  | current_count | current_alloc | current_avg_alloc | high_count | high_alloc | high_avg_alloc |
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| memory/innodb/buf_buf_pool                                                  |            80 | 10.22 GiB     | 130.88 MiB        |         80 | 10.22 GiB  | 130.88 MiB     |
| memory/performance_schema/events_statements_summary_by_digest               |             1 | 40.28 MiB     | 40.28 MiB         |          1 | 40.28 MiB  | 40.28 MiB      |
| memory/innodb/ut0link_buf                                                   |             2 | 24.00 MiB     | 12.00 MiB         |          2 | 24.00 MiB  | 12.00 MiB      |
| memory/innodb/log_buffer_memory                                             |             1 | 16.00 MiB     | 16.00 MiB         |          1 | 16.00 MiB  | 16.00 MiB      |
| memory/performance_schema/events_statements_history_long                    |             1 | 14.34 MiB     | 14.34 MiB         |          1 | 14.34 MiB  | 14.34 MiB      |
| memory/performance_schema/events_errors_summary_by_thread_by_error          |           257 | 12.81 MiB     | 51.04 KiB         |        257 | 12.81 MiB  | 51.04 KiB      |
| memory/performance_schema/events_statements_summary_by_thread_by_event_name |             1 | 11.55 MiB     | 11.55 MiB         |          1 | 11.55 MiB  | 11.55 MiB      |
| memory/performance_schema/events_statements_summary_by_digest.digest_text   |             1 | 9.77 MiB      | 9.77 MiB          |          1 | 9.77 MiB   | 9.77 MiB       |
| memory/performance_schema/events_statements_history_long.digest_text        |             1 | 9.77 MiB      | 9.77 MiB          |          1 | 9.77 MiB   | 9.77 MiB       |
| memory/performance_schema/events_statements_history_long.sql_text           |             1 | 9.77 MiB      | 9.77 MiB          |          1 | 9.77 MiB   | 9.77 MiB       |
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
               total        used        free      shared  buff/cache   available
Mem:           15779       10261        3203           0        2314        5258
Swap:              0           0           0

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql_sandbox8034.sock' (111)
[  954.783966] Out of memory: Killed process 4993 (mysqld) total-vm:34598780kB, anon-rss:15830440kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:31688kB oom_score_adj:0
               total        used        free      shared  buff/cache   available
Mem:           15779         192       15521           0          66       15390
Swap:              0           0           0

Suggested fix:
restrict the amount of memory the data_locks table can use during materialization. The query should fail or memory be reused/spill when it passes a certain threshold - it should not use unlimited memory by default which can lead to crashing the server. 

https://github.com/mysql/mysql-server/blob/8.0/storage/perfschema/pfs_data_lock.cc#L261
[16 Aug 2023 9:00] MySQL Verification Team
Hello Marc Reilly,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[16 Aug 2023 9:06] MySQL Verification Team
-- verified on

mysql> system hostname -f
support-cluster03.regionaliad02.mysql2iad.oraclevcn.com
mysql>
mysql> system free -t -g
              total        used        free      shared  buff/cache   available
Mem:            117           5          53           5          58         105
Swap:             7           2           5
Total:          125           8      

--
rm -rf 112035/
bin/mysqld --no-defaults --initialize-insecure --basedir=$PWD --datadir=$PWD/112035 --log-error-verbosity=3
bin/mysqld_safe --no-defaults --mysqld-version='' --basedir=$PWD --datadir=$PWD/112035 --core-file --socket=/tmp/mysql.sock  --port=3306 --log-error=$PWD/112035/log.err --mysqlx-port=33330 --mysqlx-socket=/tmp/mysql_x_ushastry.sock --log-error-verbosity=3  --secure-file-priv="" --local-infile=1  2>&1 &

[umshastr@support-cluster03:/export/home/tmp/ushastry/mysql-8.0.34]$ bin/mysql -uroot -S /tmp/mysql.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.34 MySQL Community Server - GPL

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.01 sec)

mysql>

-- sysbench

bin/sysbench ./share/sysbench/oltp_write_only.lua --db-driver=mysql --mysql-user=root --mysql-db=test --mysql-socket=/tmp/mysql.sock --mysql-port=8034 --threads=10 --rand-type=uniform --db-ps-mode=disable --tables=10 --forced-shutdown --table-size=10000000 --create_secondary=0 prepare

 bin/sysbench ./share/sysbench/oltp_write_only.lua --db-driver=mysql --mysql-user=root --mysql-db=test --mysql-socket=/tmp/mysql.sock --mysql-port=8034 --threads=10 --rand-type=uniform --db-ps-mode=disable --tables=10 --forced-shutdown --table-size=10000000 --create_secondary=0 prepare
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Initializing worker threads...

Creating table 'sbtest5'...
Creating table 'sbtest1'...
Creating table 'sbtest3'...Creating table 'sbtest2'...Creating table 'sbtest6'...

Creating table 'sbtest8'...
Creating table 'sbtest9'...
Creating table 'sbtest7'...
Creating table 'sbtest10'...
Creating table 'sbtest4'...
Inserting 10000000 records into 'sbtest7'
Inserting 10000000 records into 'sbtest6'
Inserting 10000000 records into 'sbtest1'
Inserting 10000000 records into 'sbtest2'
Inserting 10000000 records into 'sbtest9'
Inserting 10000000 records into 'sbtest8'
Inserting 10000000 records into 'sbtest5'
Inserting 10000000 records into 'sbtest4'
Inserting 10000000 records into 'sbtest3'
Inserting 10000000 records into 'sbtest10'

-- follow steps

-- memory usage before and after

MEM     TIME+ COMMAND
23107 umshastr  20   0 3417196 694620  37260 S 105.1  0.6  27:36.77 mysqld

.

8604 total,  5535272 free,  2853332 used. 98281752 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23107 umshastr  20   0   16.5g  12.2g  37260 S 100.0 10.4  27:57.40 mysqld
.
23107 umshastr  20   0   82.8g  48.9g  37260 S 100.7 41.7  29:03.56 mysqld
[21 Nov 2023 10:20] Edward Gilmore
Added the following note to the MySQL Server 8.3.0 release notes:
	
Executing SELECT * from performance_schema.data_locks on a server under heavy
load could cause MySQL to consume too much memory and close unexpectedly.
As of this release, memory used executing such a query is now instrumented with
memory/performance_schema/data_container, enabling you to observe memory consumption.
[21 Nov 2023 16:06] Marc Reilly
Hi,
This bug also affects the 8.0 stream, will it be backported there too?
Thanks,
Marc
[6 Dec 2023 9:41] Edward Gilmore
Added to the 8.0.37 release notes.
[17 Jan 2:16] Marc Reilly
FYI that I'm still seeing the crashes in 8.3:
```
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
2024-01-17T02:08:32Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=997abe13436676064035a91d553a37e323af7eb4
Thread pointer: 0x7fe9d000dce0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fe9fc3f8bc0 thread_stack 0x100000
 #0 0x7fea14454dcf <unknown>
 #1 0x7fea144a153c <unknown>
 #2 0x7fea14454d25 <unknown>
 #3 0x7fea144287f2 <unknown>
 #4 0x7fea148a19e5 <unknown>
 #5 0x7fea148ad04b <unknown>
 #6 0x7fea148ad0b6 <unknown>
 #7 0x7fea148ad318 <unknown>
 #8 0x931f8d <unknown>
 #9 0x16aa177 <unknown>
 #10 0x16aa495 <unknown>
 #11 0x1c70436 <unknown>
 #12 0x1c106d3 <unknown>
 #13 0xd05c8b <unknown>
 #14 0xf9ac18 <unknown>
 #15 0xf90cdc <unknown>
 #16 0xb77597 <unknown>
 #17 0xb7785f <unknown>
 #18 0xafdbda <unknown>
 #19 0xafd002 <unknown>
 #20 0xadd100 <unknown>
 #21 0xae0aa0 <unknown>
 #22 0xab0ec6 <unknown>
 #23 0xab18ca <unknown>
 #24 0xbffd26 <unknown>
 #25 0x1c2e5a1 <unknown>
 #26 0x7fea1449f7f1 <unknown>
 #27 0x7fea1443f44f <unknown>
 #28 0xffffffffffffffff <unknown>

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fe9d001c700): SELECT * FROM performance_schema.data_locks WHERE OBJECT_SCHEMA = 'test' LIMIT 100
Connection ID (thread ID): 13
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
```

Interestingly, 8.0.36 now throws an error, but this seems to introduce an assertion failure on subsequent runs. I have opened a bug report here for this assertion failure: https://bugs.mysql.com/bug.php?id=113656
[23 Jan 8:59] Jakub Lopuszanski
Hi Marc,
can you please clarify what repro steps are you using for 8.3?
[24 Jan 7:00] Marc Reilly
Hi Jakub,
here are the steps, same as above:

resource: r5.large (2vcpu 16GiB)
OS: Amazon linux 2023
Linux 6.1.72-96.166.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Jan 17 00:42:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

sudo yum install https://dev.mysql.com/get/mysql80-community-release-el9-3.noarch.rpm -y
sudo yum update -y
sudo yum-config-manager --disable mysql80-community
sudo yum-config-manager --enable mysql-innovation-community
sudo yum install mysql mysql-server -y
sudo service mysqld start

# prepare sysbench
create database sysbench;
update performance_schema.setup_consumers set enabled=1;
update performance_schema.setup_instruments set enabled=1;

# load sysbench data
sysbench /usr/local/share/sysbench/oltp_write_only.lua --db-driver=mysql --mysql-user=root --mysql-password="Passw0rd(" --mysql-db=sysbench --mysql-socket=/var/lib/mysql/mysql.sock --mysql-port=3306 --threads=10 --rand-type=uniform --db-ps-mode=disable --tables=10 --forced-shutdown --table-size=10000000 --create_secondary=0 prepare

# repro crash using 3 separate terminals:
Session 1:
use sysbench;
PAGER tail
begin;
select * from sbtest1 for update;
select * from sbtest2 for update;
select * from sbtest3 for update;
select * from sbtest4 for update;

# Run after session 1, will lead to bad alloc crash
Session 2: 
SELECT * FROM performance_schema.data_locks WHERE OBJECT_SCHEMA = 'sysbench' LIMIT 100;

# Observe system memory usage while session 2 is running
Session 3:
while sleep 5; do mysql -uroot -p"Passw0rd(" -e "select * from sys.memory_global_by_current_bytes limit 10";sudo dmesg | grep "Out of memory"; free -m;done

PFS before crash, stack will be the same as mentioned above(pasted again below[1]:
mysql: [Warning] Using a password on the command line interface can be insecure.
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| event_name                                                                  | current_count | current_alloc | current_avg_alloc | high_count | high_alloc | high_avg_alloc |
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| memory/performance_schema/data_container                                    |        161870 | 9.26 GiB      | 59.98 KiB         |     161870 | 13.88 GiB  | 89.93 KiB      |
| memory/innodb/buf_buf_pool                                                  |             1 | 130.88 MiB    | 130.88 MiB        |          1 | 130.88 MiB | 130.88 MiB     |
| memory/performance_schema/events_statements_summary_by_digest               |             1 | 40.28 MiB     | 40.28 MiB         |          1 | 40.28 MiB  | 40.28 MiB      |
| memory/innodb/ut0link_buf                                                   |             2 | 24.00 MiB     | 12.00 MiB         |          2 | 24.00 MiB  | 12.00 MiB      |
| memory/innodb/log_buffer_memory                                             |             1 | 16.00 MiB     | 16.00 MiB         |          1 | 16.00 MiB  | 16.00 MiB      |
| memory/performance_schema/events_statements_history_long                    |             1 | 14.42 MiB     | 14.42 MiB         |          1 | 14.42 MiB  | 14.42 MiB      |
| memory/performance_schema/events_errors_summary_by_thread_by_error          |           257 | 13.61 MiB     | 54.21 KiB         |        257 | 13.61 MiB  | 54.21 KiB      |
| memory/performance_schema/events_statements_summary_by_thread_by_event_name |             1 | 11.60 MiB     | 11.60 MiB         |          1 | 11.60 MiB  | 11.60 MiB      |
| memory/performance_schema/events_statements_summary_by_digest.digest_text   |             1 | 9.77 MiB      | 9.77 MiB          |          1 | 9.77 MiB   | 9.77 MiB       |
| memory/performance_schema/events_statements_history_long.sql_text           |             1 | 9.77 MiB      | 9.77 MiB          |          1 | 9.77 MiB   | 9.77 MiB       |
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
               total        used        free      shared  buff/cache   available
Mem:           15779       10969        1948           0        2862        4510
Swap:              0           0           0
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104

[1]
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
2024-01-24T06:54:33Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=997abe13436676064035a91d553a37e323af7eb4
Thread pointer: 0x7fa810001580
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fa8383f8bc0 thread_stack 0x100000
 #0 0x7fa850e54dcf <unknown>
 #1 0x7fa850ea153c <unknown>
 #2 0x7fa850e54d25 <unknown>
 #3 0x7fa850e287f2 <unknown>
 #4 0x7fa8512a19e5 <unknown>
 #5 0x7fa8512ad04b <unknown>
 #6 0x7fa8512ad0b6 <unknown>
 #7 0x7fa8512ad318 <unknown>
 #8 0x931f8d <unknown>
 #9 0x16aa177 <unknown>
 #10 0x16aa495 <unknown>
 #11 0x1c70436 <unknown>
 #12 0x1c106d3 <unknown>
 #13 0xd05c8b <unknown>
 #14 0xf9ac18 <unknown>
 #15 0xf90cdc <unknown>
 #16 0xb77597 <unknown>
 #17 0xb7785f <unknown>
 #18 0xafdbda <unknown>
 #19 0xafd002 <unknown>
 #20 0xadd100 <unknown>
 #21 0xae0aa0 <unknown>
 #22 0xab0ec6 <unknown>
 #23 0xab18ca <unknown>
 #24 0xbffd26 <unknown>
 #25 0x1c2e5a1 <unknown>
 #26 0x7fa850e9f7f1 <unknown>
 #27 0x7fa850e3f44f <unknown>
 #28 0xffffffffffffffff <unknown>

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fa81012bb10): SELECT * FROM performance_schema.data_locks WHERE OBJECT_SCHEMA = 'sysbench' LIMIT 100
Connection ID (thread ID): 8
Status: NOT_KILLED
[24 Jan 9:59] Jakub Lopuszanski
Thank you very much for detailed repo steps!

So, IIUC, Session 1 is acquiring locks on around 4*10M rows - obviously materializing info about all of them would be problematic, but Session 2 tries to list just 100 of them, yet p_s logic is unnecessarily materializing all rows, even though just 100 were needed.

So, one tempting way to fix it is to somehow "push down" the information about LIMIT 100 to the p_s module. But that would be just a band-aid, only helpful to kind people like you, who remember to add LIMITs to their SELECTs. But what about users who don't? They'd still get OOM.

So, another solution would be to somehow stream all of that in chunks.
AFAIU p_s already does "stream it in chunks", the problem is that "a chunk" is variable-length as it is defined in terms of transactions, as opposed to rows - so, a single chunk contains all locks of a few transactions. If any of the transactions takes makes a lot of lock requests, as is the case here, then all of them must end up in a single chunk.

So, another idea would be to somehow redefine "the chunk", so that we can upper bound its size.
One such idea, (which rots in my drawer) was to iterate over lock_sys's hash buckets, so "a chunk" = "all lock requests which hash into the same bucket".
This of course has it's own failure mode: if a given bucket is very long, then the chunk will be long, too.
The hash function is based on b-tree page id, so all lock requests for the same leaf page (and any other pages which have same hash) would end up in the same bucket. That might be fine - there is only so many rows you can have in a single page, and only so many concurrent transactions, and so many different types of locks, and collisions among page ids of hot pages should be rare in real world setting. These are all bounded numbers, as opposed to the number of rows, which can grow unbounded.
There were other problems with this solution, though, among them, that the number of buckets is so huge, that iterating over them takes a lot of time, because even though most of them are empty, you still have to somehow determine that in a thread-safe way. One way would be to use atomic counters (so we don't need mutexes nor rw-locks), but I've tried various ways to keep these stats in cache-friendly way, but still the mere need to fetch all the cache lines with the stats seemed to work too slow. Perhaps I should revisit this idea with some hierarchical data-structure with partial sums, or non-zero-indicators...

Also, I know Marc Alff is also working on reimplementing p_s to be more "streamlined", so perhaps his ideas could solve this bug, too.
[24 Jan 9:59] Marc ALFF
Per testing done by Marc Reilly (Thanks),
we now see this in 8.3:

+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| event_name                                                                  | current_count | current_alloc | current_avg_alloc | high_count | high_alloc | high_avg_alloc |
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| memory/performance_schema/data_container                                    |        161870 | 9.26 GiB      | 59.98 KiB         |     161870 | 13.88 GiB  | 89.93 KiB      |

This proves two things:
- first, this is indeed the 8.3 binary with the previous fix, because "memory/performance_schema/data_container" is visible.
- second, the data container allocates way too much memory during a scan, and/or is not purged between incremental scans.

Rework on this issue is needed, reopening bug.
[21 Feb 1:00] Charles Thompson
Is this bug actually fixed in 8.0.37 or is it broken as well since this bug was re-opened for 8.3?
[2 May 1:42] Marc Reilly
@Charles Thompson : I confirmed that 8.0.37 shows same behavior. The wait event is added in 8.0.37 which allows you to monitor the memory usage, but std::bad_alloc/OOM will still occur.

8.0.37 
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
2024-05-02T00:48:40Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=760904bb5c412602b3f2c1b7241648b1a2970a7b
Thread pointer: 0x7f1b4800ee80
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f1b50408b30 thread_stack 0x100000
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x21211bd]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(print_fatal_signal(int)+0x37f) [0xfe111f]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(handle_fatal_signal+0xa5) [0xfe11d5]
/lib64/libc.so.6(+0x54dd0) [0x7f1b7d254dd0]
/lib64/libc.so.6(+0xa153c) [0x7f1b7d2a153c]
/lib64/libc.so.6(raise+0x16) [0x7f1b7d254d26]
/lib64/libc.so.6(abort+0xd3) [0x7f1b7d2287f3]
/lib64/libstdc++.so.6(+0xa19e6) [0x7f1b7d6a19e6]
/lib64/libstdc++.so.6(+0xad04c) [0x7f1b7d6ad04c]
/lib64/libstdc++.so.6(+0xad0b7) [0x7f1b7d6ad0b7]
/lib64/libstdc++.so.6(+0xad319) [0x7f1b7d6ad319]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(PFS_data_lock_container::add_lock_row(char const*, unsigned long, char const*, unsigned long, unsigned long long, unsigned long long, unsigned long long, char const*, unsigned long, char const*, unsigned long, char const*, unsigned long, char const*, unsigned long, char const*, unsigned long, void const*, char const*, char const*, char const*, char const*)+0) [0x2859f20]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(PFS_data_lock_container::add_lock_row(char const*, unsigned long, char const*, unsigned long, unsigned long long, unsigned long long, unsigned long long, char const*, unsigned long, char const*, unsigned long, char const*, unsigned long, char const*, unsigned long, char const*, unsigned long, void const*, char const*, char const*, char const*, char const*)+0x4d0) [0x285a3f0]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(Innodb_data_lock_iterator::scan_trx(PSI_server_data_lock_container*, bool, trx_t const*, bool, unsigned long, unsigned long)+0x474) [0x21c98c4]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(Innodb_data_lock_iterator::scan(PSI_server_data_lock_container*, bool)+0x2bd) [0x21ca53d]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(table_data_locks::rnd_next()+0x8a) [0x27fe4ea]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(ha_perfschema::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)+0x96) [0x27bef16]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)+0x2af) [0x10f4e6f]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(RefIterator<false>::Read()+0x134) [0x13beed4]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(LimitOffsetIterator::Read()+0x7d) [0x13b0a2d]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(Query_expression::ExecuteIteratorQuery(THD*)+0x293) [0xf4c303]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(Query_expression::execute(THD*)+0x2c) [0xf4c69c]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x1e4) [0xecce64]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(mysql_execute_command(THD*, bool)+0xb3f) [0xe7310f]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x520) [0xe76da0]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0xdee) [0xe7813e]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld(do_command(THD*)+0x21e) [0xe7a51e]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld() [0xfd13e8]
/home/ec2-user/opt/mysql/8.0.37/bin/mysqld() [0x27c86e5]
/lib64/libc.so.6(+0x9f7f2) [0x7f1b7d29f7f2]
/lib64/libc.so.6(+0x3f450) [0x7f1b7d23f450]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f1b480dbbb0): SELECT * FROM performance_schema.data_locks WHERE OBJECT_SCHEMA = 'sysbench' LIMIT 100
Connection ID (thread ID): 11
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

I also tried setting connection_memory_limit but that does not take effect here:
mysql [localhost:8037] {msandbox2} ((none)) > select @@global_connection_memory_tracking,@@connection_memory_limit;
+-------------------------------------+---------------------------+
| @@global_connection_memory_tracking | @@connection_memory_limit |
+-------------------------------------+---------------------------+
|                                   1 |                  20971520 |
+-------------------------------------+---------------------------+
1 row in set (0.00 sec)

mysql [localhost:8037] {msandbox2} ((none)) > SELECT * FROM performance_schema.data_locks WHERE OBJECT_SCHEMA = 'sysbench' LIMIT 100;
<crash>
+------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| event_name                               | current_count | current_alloc | current_avg_alloc | high_count | high_alloc | high_avg_alloc |
+------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| memory/performance_schema/data_container |        111150 | 13.88 GiB     | 130.93 KiB        |     464267 | 13.88 GiB  | 31.35 KiB      |
+------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
[2 May 1:42] Marc Reilly
8.4 is the same:
mysql [localhost:8400] {msandbox2} ((none)) > select @@global_connection_memory_tracking,@@connection_memory_limit;
+-------------------------------------+---------------------------+
| @@global_connection_memory_tracking | @@connection_memory_limit |
+-------------------------------------+---------------------------+
|                                   1 |                  20971520 |
+-------------------------------------+---------------------------+
1 row in set (0.00 sec)

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
2024-05-02T01:30:41Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=f1df040df33f237c18376119eef189c9b25f0c90
Thread pointer: 0x7fb443a2ac50
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fb43c3f8b30 thread_stack 0x100000
 #0 0x103ff76 print_fatal_signal at mysql-8.4.0/sql/signal_handler.cc:319
 #1 0x10402ec _Z19handle_fatal_signaliP9siginfo_tPv at mysql-8.4.0/sql/signal_handler.cc:399
 #2 0x7fb454654dcf <unknown> at sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
 #3 0x7fb4546a153c __pthread_kill_implementation at /usr/src/debug/glibc-2.34-52.amzn2023.0.9.x86_64/nptl/pthread_kill.c:44
 #4 0x7fb454654d25 __GI_raise at sysdeps/posix/raise.c:26
 #5 0x7fb4546287f2 __GI_abort at /usr/src/debug/glibc-2.34-52.amzn2023.0.9.x86_64/stdlib/abort.c:79
 #6 0x7fb454aa19e5 <unknown> at libstdc++-v3/libsupc++/vterminate.cc:95
 #7 0x7fb454aad04b <unknown> at libstdc++-v3/libsupc++/eh_terminate.cc:48
 #8 0x7fb454aad0b6 <unknown> at libstdc++-v3/libsupc++/eh_terminate.cc:58
 #9 0x7fb454aad318 <unknown> at libstdc++-v3/libsupc++/eh_throw.cc:95
 #10 0x293c04f _ZN17PFS_std_allocatorI13row_data_lockE8allocateEm at mysql-8.4.0/storage/perfschema/pfs_std_allocator.h:55
 #11 0x293c04f _ZNSt16allocator_traitsI28PFS_data_container_allocatorI13row_data_lockEE8allocateERS2_m at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/alloc_traits.h:318
 #12 0x293c04f _ZNSt12_Vector_baseI13row_data_lock28PFS_data_container_allocatorIS0_EE11_M_allocateEm at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/stl_vector.h:346
 #13 0x293c04f _ZNSt6vectorI13row_data_lock28PFS_data_container_allocatorIS0_EE17_M_realloc_insertIJRKS0_EEEvN9__gnu_cxx17__normal_iteratorIPS0_S3_EEDpOT_ at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/vector.tcc:440
 #14 0x293c51f _ZNSt6vectorI13row_data_lock28PFS_data_container_allocatorIS0_EE9push_backERKS0_ at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/stl_vector.h:1198
 #15 0x293c51f _ZN23PFS_data_lock_container12add_lock_rowEPKcmS1_myyyS1_mS1_mS1_mS1_mS1_mPKvS1_S1_S1_S1_ at mysql-8.4.0/storage/perfschema/pfs_data_lock.cc:333
 #16 0x2262683 _ZN25Innodb_data_lock_iterator8scan_trxEP30PSI_server_data_lock_containerbPK5trx_tbmm at mysql-8.4.0/storage/innobase/handler/p_s.cc:897
 #17 0x22629e3 _ZN25Innodb_data_lock_iterator13scan_trx_listI12ut_list_baseI5trx_t28ut_list_base_explicit_getterIS2_XadL_ZNS2_8trx_listEEEEEEEmP30PSI_server_data_lock_containerbPT_ at mysql-8.4.0/storage/innobase/handler/p_s.cc:765
 #18 0x22629e3 _ZN25Innodb_data_lock_iterator4scanEP30PSI_server_data_lock_containerb at mysql-8.4.0/storage/innobase/handler/p_s.cc:681
 #19 0x28de349 _ZN16table_data_locks8rnd_nextEv at mysql-8.4.0/storage/perfschema/table_data_locks.cc:189
 #20 0x289b3c5 _ZN13ha_perfschema10index_readEPhPKhj16ha_rkey_function at mysql-8.4.0/storage/perfschema/ha_perfschema.cc:1991
 #21 0x289b3c5 _ZN13ha_perfschema10index_readEPhPKhj16ha_rkey_function at mysql-8.4.0/storage/perfschema/ha_perfschema.cc:1966
 #22 0x116aa6e _ZN7handler14index_read_mapEPhPKhm16ha_rkey_function at mysql-8.4.0/sql/handler.h:5623
 #23 0x116aa6e _ZN7handler17ha_index_read_mapEPhPKhm16ha_rkey_function at mysql-8.4.0/sql/handler.cc:3289
 #24 0x1462023 _ZN11RefIteratorILb0EE4ReadEv at mysql-8.4.0/sql/iterators/ref_row_iterators.cc:381
 #25 0x144b6bc _ZN19LimitOffsetIterator4ReadEv at mysql-8.4.0/sql/iterators/composite_iterators.cc:175
 #26 0xf9ecca _ZN16Query_expression20ExecuteIteratorQueryEP3THD at mysql-8.4.0/sql/sql_union.cc:1778
 #27 0xf9f08b _ZN16Query_expression7executeEP3THD at mysql-8.4.0/sql/sql_union.cc:1834
 #28 0xf1de6a _ZN11Sql_cmd_dml13execute_innerEP3THD at mysql-8.4.0/sql/sql_select.cc:1059
 #29 0xf29370 _ZN11Sql_cmd_dml7executeEP3THD at mysql-8.4.0/sql/sql_select.cc:782
 #30 0xec2f30 _Z21mysql_execute_commandP3THDb at mysql-8.4.0/sql/sql_parse.cc:4737
 #31 0xec6fff _Z20dispatch_sql_commandP3THDP12Parser_state at mysql-8.4.0/sql/sql_parse.cc:5392
 #32 0xec9a70 _Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command at mysql-8.4.0/sql/sql_parse.cc:2136
 #33 0xeca685 _Z10do_commandP3THD at mysql-8.4.0/sql/sql_parse.cc:1465
 #34 0x102fbdf handle_connection at mysql-8.4.0/sql/conn_handler/connection_handler_per_thread.cc:304
 #35 0x28a5084 pfs_spawn_thread at mysql-8.4.0/storage/perfschema/pfs.cc:3051
 #36 0x7fb45469f7f1 start_thread at /usr/src/debug/glibc-2.34-52.amzn2023.0.9.x86_64/nptl/pthread_create.c:443
 #37 0x7fb45463f44f clone3 at sysdeps/unix/sysv/linux/x86_64/clone3.S:81
 #38 0xffffffffffffffff <unknown>

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fb443b91560): SELECT * FROM performance_schema.data_locks WHERE OBJECT_SCHEMA = 'sysbench' LIMIT 100
Connection ID (thread ID): 12
Status: NOT_KILLED
[19 Sep 9:05] Edward Gilmore
Posted by developer:
 
Added the following note to the MySQL Server 8.0.40 release notes:
	
Redesigned the performance schema data_locks and data_lock_waits tables so
that querying them does not require an exclusive global mutex on the
transaction or lock system. It now iterates over buckets of hash tables
that hold the locks to only latch the actively processed shard, when
previously it iterated over the transactions. This also improves the
iteration logic complexity in terms of speed and memory to decrease the
impact of these queries on the rest of the system.

Note that the query result might show an incomplete list of transaction locks
if it committed, started, or otherwise changed the set of owned locks
in-between visiting two buckets. This differs from previous behavior which
always showed a consistent snapshot of locks held by individual
transactions, although two different transactions could have been
presented at different moments. In other words, the new approach gives a
consistent view of a single wait queue to show conflicting locks with a
waiting lock because they are always in the same bucket, while the old
approach could miss some of them because they belonged to other
transactions. The old approach would always show all the other locks held
by a reported transaction but could miss locks of other transactions even
if they were conflicting.