Bug #120499 Recursive stored procedure with max_sp_recursion_depth >= 250 crashes mysqld
Submitted: 20 May 5:02
Reporter: miao runpei
Status: Open
Category: MySQL Server: Stored Routines
Severity: S2 (Serious)
Version: 9.6.0
OS: Linux
Assigned to:
CPU Architecture: x86 (AMD EPYC 9K65)
Tags: asan, check_stack_overrun, crash, max_sp_recursion_depth, recursion, regression, sp_head, Stack Overflow, stored procedure

[20 May 5:02] miao runpei
Description:
MySQL 9.6.0 mysqld crashes (AddressSanitizer: stack-overflow) when a
stored procedure calls itself recursively and the session variable
max_sp_recursion_depth has been raised to a value approximately >= 250.

The standard recursion guard ER_SP_RECURSION_LIMIT (1456) correctly
rejects the call when max_sp_recursion_depth is set to 128 or lower,
but for settings in the [250..255] range the server keeps recursing and
exhausts the 1 MiB default thread_stack after roughly 240-255 recursive
frames, killing mysqld with SIGSEGV.  mysqld_safe restarts the server,
but every in-flight connection observes "Lost connection to MySQL server
during query" (client error 2013, CR_SERVER_LOST).

Documented maximum for max_sp_recursion_depth is 255 (see the manual at
[1]), so the value used in the PoC is fully within the documented
range — the server should either honour it safely or reject it up
front, not crash.

This is a regression from 8.4.9, which correctly returns
ER_STACK_OVERRUN_NEED_MORE (1436) for the same PoC on the same default
thread_stack and remains up.

Discovered by automated SP fuzzing (custom mutator that injects
"SET max_sp_recursion_depth = 255" + recursive CALL into seed
routines).  First hit during a 1-hour ASAN fuzz session; minimised to a
14-line deterministic reproducer.

[1] https://dev.mysql.com/doc/refman/9.0/en/server-system-variables.html#sysvar_max_sp_recursi...

Build:
  mysqld  Ver 9.6.0-asan for Linux on x86_64 (Source distribution)
  cmake flags: -DWITH_ASAN=ON -DWITH_ASAN_SCOPE=ON
               -DOPTIMIZE_SANITIZER_BUILDS=ON
               -DCMAKE_BUILD_TYPE=RelWithDebInfo
               (clang/clang++)
  thread_stack: 1048576 (default)
  OS: TencentOS Server 4.4 / Linux 6.6.98 x86_64
  CPU: AMD EPYC 9K65

ASAN summary:
  SUMMARY: AddressSanitizer: stack-overflow sql/sql_parse.cc:3031
           in mysql_execute_command(THD*, bool)

Crash signature — a ~9-frame block repeats ~27 times for 246 total
frames:
  mysql_execute_command            sql/sql_parse.cc:3031
  sp_instr_stmt::exec_core         sql/sp_instr.cc:1102
  sp_lex_instr::reset_lex_and_exec_core         sql/sp_instr.cc:467
  sp_lex_instr::validate_lex_and_execute_core   sql/sp_instr.cc:788
  sp_instr_stmt::execute           sql/sp_instr.cc:1028
  sp_head::execute                 sql/sp_head.cc:2252
  sp_head::execute_procedure       sql/sp_head.cc:3129
  Sql_cmd_call::execute_inner      sql/sql_call.cc:233
  Sql_cmd_dml::execute             sql/sql_select.cc:798
  (loop)
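
The repeat count can be checked against the full ASAN report by counting
how often one frame of the block appears. A minimal sketch: the helper
name count_frame and the file name asan_report.txt are hypothetical, and
the pattern assumes the usual ASAN frame format "#N 0x... in func(args) file:line".

```shell
# Count stack-trace lines in an ASAN report that name a given function.
# Usage: count_frame <function-name> <report-file>
count_frame() {
  # -F: fixed-string match; -c: print the number of matching lines.
  # The trailing "(" keeps sp_head::execute from also matching
  # sp_head::execute_procedure.
  grep -F -c -- " in $1(" "$2"
}

# Example (asan_report.txt is a hypothetical file name):
#   count_frame 'sp_head::execute_procedure' asan_report.txt
```

Each recursion level contributes one sp_head::execute_procedure frame, so
that count approximates the number of repeated blocks in the trace.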

Reproduced 3/3 on a freshly restarted server.  Full ASAN report
attached separately.

SUMMARY: AddressSanitizer: stack-overflow /root/mysql-server/sql/sql_parse.cc:3031 in mysql_execute_command(THD*, bool)
Thread T55 created by T0 here:
    #0 0x559627350cc6 in pthread_create (/root/mysql-asan/install/bin/mysqld+0x1a0cc6) (BuildId: 47f91daa8e0e24a8921dc11531a45ff1d3c1caea)
    #1 0x55962ad40341 in pfs_spawn_thread_vc(unsigned int, unsigned int, my_thread_handle*, pthread_attr_t const*, void* (*)(void*), void*) /root/mysql-server/storage/perfschema/pfs.cc:3116:22
    #2 0x559627a5d793 in Per_thread_connection_handler::add_connection(Channel_info*) /root/mysql-server/sql/conn_handler/connection_handler_per_thread.cc:421:7
    #3 0x559627c5b8a9 in Connection_handler_manager::process_new_connection(Channel_info*) /root/mysql-server/sql/conn_handler/connection_handler_manager.cc:265:29
    #4 0x5596273bd49a in Connection_acceptor<Mysqld_socket_listener>::connection_event_loop() /root/mysql-server/sql/conn_handler/connection_acceptor.h:66:41
    #5 0x5596273b0eec in mysqld_main(int, char**) /root/mysql-server/sql/mysqld.cc:10254:27
    #6 0x7f5c77c0e5df in __libc_start_call_main (/lib64/libc.so.6+0x25df) (BuildId: c8073ac2381f4976e9fe7bad4a1add4e7930e51c)

==mysqld==448391==ABORTING

How to repeat:
Server-side setup (any default install; the bug needs only the default
1 MiB thread_stack):

    mysqld --thread_stack=1048576    # default; no special config

Client-side, on a fresh connection:

----- begin SQL -----
DROP PROCEDURE IF EXISTS test.p1;
SET @@session.max_sp_recursion_depth = 255;

DELIMITER //
CREATE PROCEDURE test.p1(a INT) BEGIN
  CALL test.p1(a+1);
END //
DELIMITER ;

CALL test.p1(0);
----- end SQL -----

Run:
    mysql -uroot test < poc.sql

Observed (every run, ~0.5 s wall clock):
    ERROR 2013 (HY000) at line 12: Lost connection to MySQL server
    during query

Server log shows:
    AddressSanitizer:DEADLYSIGNAL
    mysqld_safe Number of processes running now: 0
    mysqld_safe mysqld restarted

Parameter sweep (same PoC, only max_sp_recursion_depth changed):
    depth=0   -> ER_SP_RECURSION_LIMIT (1456)   -- handled, server OK
    depth=32  -> ER_SP_RECURSION_LIMIT (1456)   -- handled, server OK
    depth=64  -> ER_SP_RECURSION_LIMIT (1456)   -- handled, server OK
    depth=128 -> ER_SP_RECURSION_LIMIT (1456)   -- handled, server OK
    depth=250 -> ER 2013 LOST CONNECTION         -- CRASH
    depth=255 -> ER 2013 LOST CONNECTION         -- CRASH (documented max)
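
The sweep above could be scripted roughly as follows. This is a sketch,
not the harness actually used: the function names gen_poc and run_sweep
and the 5-second pause are assumptions, and it presumes the same
passwordless root login as the "mysql -uroot test" command in this report.

```shell
# Print the PoC with max_sp_recursion_depth set to the given value.
gen_poc() {
  cat <<EOF
DROP PROCEDURE IF EXISTS test.p1;
SET @@session.max_sp_recursion_depth = $1;

DELIMITER //
CREATE PROCEDURE test.p1(a INT) BEGIN
  CALL test.p1(a+1);
END //
DELIMITER ;

CALL test.p1(0);
EOF
}

# Run the PoC once per depth and print the first line of client output.
run_sweep() {
  for depth in 0 32 64 128 250 255; do
    printf 'depth=%s -> ' "$depth"
    gen_poc "$depth" | mysql -uroot test 2>&1 | head -n 1
    # Assumed pause so mysqld_safe can restart the server after a crash.
    sleep 5
  done
}
```

Calling run_sweep against a disposable server prints one result line per
depth; gen_poc on its own only writes SQL to stdout and touches nothing.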

Cross-version check (same PoC, same default thread_stack):
    MySQL 8.4.9 (official docker image):
        ERROR 1436 (HY000): Thread stack overrun:
            895088 bytes used of a 1048576 byte stack, and 160000 bytes needed.
        -> handled gracefully, server stays up.
    MySQL 9.6.0 ASAN build:
        crashes as described above.

So the safety net that 8.4 had (check_stack_overrun firing as part of
sp_head::execute) is no longer effective in 9.6 in this code path.
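
The numbers in the 8.4.9 message are consistent with a guard of the form
"used >= stack_size - margin". The sketch below only replays that
arithmetic; the comparison is an assumption about how check_stack_overrun
decides, and stack_check is a hypothetical helper, not server code.

```shell
# Report whether a guard of the form  used >= stack_size - margin  fires.
# Usage: stack_check <bytes-used> <stack-size> <margin>
stack_check() {
  if [ "$1" -ge $(( $2 - $3 )) ]; then
    echo "overrun"
  else
    echo "ok"
  fi
}

# Numbers from the 8.4.9 message: 895088 used, 1048576 stack, 160000 needed.
# Threshold is 1048576 - 160000 = 888576, and 895088 >= 888576.
stack_check 895088 1048576 160000   # prints "overrun"
```

Under that assumption 8.4.9 stops about 153 KB short of the end of the
stack, which is the headroom 9.6.0 appears to run through in this path.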