Bug #116171 std::bad_optional_access due to non-thread-safe access to vio->thread_id
Submitted: 20 Sep 2024 4:28
Modified: 20 Sep 2024 11:15
Reporter: Jinyou Ma
Status: Can't repeat
Category: MySQL Server: Connection Handling
Severity: S3 (Non-critical)
Version:
OS: Linux
Assigned to:
CPU Architecture: Any

[20 Sep 2024 4:28] Jinyou Ma
Description:
When MySQL closes all connections, vio_shutdown accesses vio->thread_id:

```
    assert(vio->thread_id.has_value());
    if (vio->thread_id.value() != 0 && vio->poll_shutdown_flag.test_and_set()) {
```
However, vio->thread_id is not thread-safe and is not protected by any mutex.
If `create_and_init_vio` resets thread_id at that point, MySQL crashes with std::bad_optional_access:
```
#ifdef USE_PPOLL_IN_VIO
    if (vio != nullptr) {
      // Unset thread_id, to ensure that all shutdowns explicitly set the
      // current real_id from the THD.
      vio->thread_id.reset();
```
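
To illustrate the consequence described above, here is a minimal single-threaded sketch. `FakeVio` and `shutdown_like_access` are hypothetical names made up for this example, not MySQL code, and the sketch does not reproduce the race itself. It only shows that calling `value()` on an `std::optional` that has been `reset()` throws `std::bad_optional_access`. Since `assert()` compiles to nothing when `NDEBUG` is defined, a release build would presumably reach `value()` without any check.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the relevant part of Vio; not the real struct.
struct FakeVio {
  std::optional<unsigned long> thread_id;
};

// Mirrors the access pattern quoted above. The has_value() check lives only
// in assert(), so it disappears when NDEBUG is defined. The exception is
// caught here only so the outcome can be reported instead of terminating.
std::string shutdown_like_access(FakeVio &vio) {
  try {
    // value() throws std::bad_optional_access if the optional is empty.
    unsigned long id = vio.thread_id.value();
    return id != 0 ? "would signal thread" : "no thread to signal";
  } catch (const std::bad_optional_access &) {
    return "bad_optional_access";
  }
}
```

In the backtrace from this report the exception is not caught, so `__cxa_throw` ends in `std::terminate` and the process aborts with signal 6.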

The error log is:
```
2024-09-17T21:35:19.701451Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 5477696  user: ''.
terminate called after throwing an instance of 'std::bad_optional_access'
  what():  bad optional access
2024-09-17T21:35:19Z UTC - mysqld got signal 6 ;
```

The backtrace is below:

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=<optimized out>)
    at pthread_kill.c:44
#1  0x0000000000ddb2a1 in my_write_core (sig=6)
    at /usr/src/debug/percona-server-8.0.32-24.1.el9.x86_64/percona-server-8.0.32-24/mysys/stacktrace.cc:322
#2  handle_fatal_signal (sig=6)
    at /usr/src/debug/percona-server-8.0.32-24.1.el9.x86_64/percona-server-8.0.32-24/sql/signal_handler.cc:252
#3  handle_fatal_signal (sig=6)
    at /usr/src/debug/percona-server-8.0.32-24.1.el9.x86_64/percona-server-8.0.32-24/sql/signal_handler.cc:224
#4  <signal handler called>
#5  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at pthread_kill.c:44
#6  0x00007f6e4628b9b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#7  0x00007f6e4623e646 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#8  0x00007f6e462287f3 in __GI_abort () at abort.c:79
#9  0x00007f6e466a1b21 in __gnu_cxx::__verbose_terminate_handler ()
    at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#10 0x00007f6e466ad52c in __cxxabiv1::__terminate (handler=<optimized out>)
    at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#11 0x00007f6e466ad597 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#12 0x00007f6e466ad7f9 in __cxxabiv1::__cxa_throw (obj=<optimized out>,
    tinfo=0x3606dc0 <typeinfo for std::bad_optional_access>,
    dest=0xa81280 <std::bad_optional_access::~bad_optional_access()>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:95
#13 0x0000000000897654 in std::__throw_bad_optional_access () at /usr/include/c++/11/optional:102
#14 0x000000000093e272 in std::optional<unsigned long>::value() & (this=<optimized out>, this=<optimized out>)
    at /usr/include/c++/11/optional:960
#15 vio_shutdown (vio=<optimized out>, how=<optimized out>)
    at /usr/src/debug/percona-server-8.0.32-24.1.el9.x86_64/percona-server-8.0.32-24/vio/viosocket.cc:537
#16 0x0000000000c0f8dc in THD::disconnect (this=0x4532440, server_shutdown=<optimized out>)
    at /usr/src/debug/percona-server-8.0.32-24.1.el9.x86_64/percona-server-8.0.32-24/sql/sql_class.cc:1713
#17 0x0000000000c1d32b in close_connection (thd=0x4532440, sql_errno=0, server_shutdown=<optimized out>,
    generate_event=<optimized out>)
    at /usr/src/debug/percona-server-8.0.32-24.1.el9.x86_64/percona-server-8.0.32-24/sql/sql_connect.cc:1207
#18 0x0000000000b609d5 in Do_THD::operator() (thd=<optimized out>, this=<synthetic pointer>)
    at /usr/src/debug/percona-server-8.0.32-24.1.el9.x86_64/percona-server-8.0.32-24/sql/mysqld_thd_manager.cc:78
#19 std::for_each<THD**, Do_THD> (__f=..., __last=0x9038b90, __first=0x9037fc0)
    at /usr/include/c++/11/bits/stl_algo.h:3820
#20 Global_THD_manager::do_for_all_thd (this=0x41f0da0, func=0x7f6e3dbf0b40)
    at /usr/src/debug/percona-server-8.0.32-24.1.el9.x86_64/percona-server-8.0.32-24/sql/mysqld_thd_manager.cc:313
#21 0x0000000000b46628 in close_connections ()
    at /usr/src/debug/percona-server-8.0.32-24.1.el9.x86_64/percona-server-8.0.32-24/sql/mysqld.cc:2383
#22 signal_hand (arg=arg@entry=0x0)
    at /usr/src/debug/percona-server-8.0.32-24.1.el9.x86_64/percona-server-8.0.32-24/sql/mysqld.cc:3903
#23 0x00000000019b67b6 in pfs_spawn_thread (arg=0x436d7a0)
    at /usr/src/debug/percona-server-8.0.32-24.1.el9.x86_64/percona-server-8.0.32-24/storage/perfschema/pfs.cc:2987
#24 0x00007f6e46289c02 in start_thread (arg=<optimized out>) at pthread_create.c:443
#25 0x00007f6e4630ec40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

How to repeat:
I have not found a way to reproduce it.

Suggested fix:
1. Protect vio->thread_id with a new mutex
2. Use an atomic instead of std::optional
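
A rough sketch of both suggestions, under stated assumptions: `GuardedThreadId`, `AtomicThreadId`, `kUnset` and `should_signal` are hypothetical names invented for illustration, not existing MySQL code. Because the snippet above treats a thread id of 0 as meaningful, the atomic variant assumes the maximum `unsigned long` value is never a real id and uses it as the "unset" marker. In both variants the reader takes one snapshot and checks it, so a concurrent reset cannot make the read throw.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <optional>

// Option 1 (hypothetical): keep the optional, but guard every access.
class GuardedThreadId {
 public:
  void set(unsigned long id) {
    std::lock_guard<std::mutex> lock(mutex_);
    id_ = id;
  }
  void reset() {
    std::lock_guard<std::mutex> lock(mutex_);
    id_.reset();
  }
  // Returns a copy taken under the lock; the caller checks has_value().
  std::optional<unsigned long> get() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return id_;
  }

 private:
  mutable std::mutex mutex_;
  std::optional<unsigned long> id_;
};

// Option 2 (hypothetical): replace the optional with an atomic and a sentinel.
class AtomicThreadId {
 public:
  // Assumption: this value is never used as a real thread id.
  static constexpr unsigned long kUnset = ~0UL;

  void set(unsigned long id) { id_.store(id, std::memory_order_release); }
  void reset() { id_.store(kUnset, std::memory_order_release); }
  // A single atomic load; the sentinel maps back to an empty optional.
  std::optional<unsigned long> get() const {
    unsigned long v = id_.load(std::memory_order_acquire);
    if (v == kUnset) return std::nullopt;
    return v;
  }

 private:
  std::atomic<unsigned long> id_{kUnset};
};

// Non-throwing replacement for the "value() != 0" test on a snapshot.
bool should_signal(const std::optional<unsigned long> &id) {
  return id.has_value() && *id != 0;
}
```

Either variant only makes the read itself safe. Whether skipping the signal is the right behavior when the id is unset is a separate question that the report does not answer.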
[20 Sep 2024 11:15] MySQL Verification Team
Hi Mr. Ma,

Thank you for your bug report.

However, we noticed that you did not use our official server binaries.

Please let us know if you have managed to crash the MySQL server.

Also, we do need a fully repeatable test case.

We do accept reports based on a fully detailed source code analysis that explains a scenario in which a crash can happen.

However, we do not see a detailed description of such a scenario based on an analysis of our latest source code.

Can't repeat.