Bug #102189 SSL memory leak in the VIO struct on cleanup
Submitted: 7 Jan 21:42 Modified: 15 Apr 14:57
Reporter: Fadi Hanna Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: C API (client library) Severity:S2 (Serious)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[7 Jan 21:42] Fadi Hanna
Description:
This bug affects the async client case, in the event of a connection timeout.

In some cases, we could be initiating a VIO shutdown and cleanup on a
VIO struct that has a non-null ssl_arg using the fallback vio_delete()
function instead of the vio_ssl_delete() function. This causes a memory
leak on the SSL struct because nothing is freeing it.

The ssl_arg is deleted in the call to vio_ssl_delete(), which gets called if the type == VIO_TYPE_SSL. However in some cases, the type we get is VIO_TYPE_TCPIP (probably some transient state, interrupted from becoming VIO_TYPE_SSL maybe by the timeout), and because of that, we call vio_delete() instead of vio_ssl_delete(), which doesn't free up the SSL data.

How to repeat:
Create an infinite loop that establishes a connection to a mysql server using async client APIs, and terminate connection attempt at random points to simulate timeouts. Notice how memory starts increasing.

Suggested fix:
Call SSL_free() inside of internal_vio_delete(Vio* vio), and remove it from the vio_ssl_delete(Vio* vio) function (the latter ends up calling the former)
[11 Jan 19:41] Fadi Hanna
Fix: https://github.com/facebook/mysql-5.6/commit/2c46939e959a374afc8afde64500738005163678
[12 Jan 12:53] MySQL Verification Team
Hi Mr. Hanna,

Thank you for your bug report.

We have  made a detailed check of your code analysis and concluded that you are correct.

Thank you also, for your patch.

Verified as reported.
[26 Jan 19:19] Fadi Hanna
Quick update: we discovered that this fix would cause a double freeing with the sync path, and cause a crash.

We have a better fix in this patch: https://github.com/facebook/mysql-5.6/commit/3af4f875216c6fac196c7d4bfd650f8da56f1112
[27 Jan 13:17] MySQL Verification Team
Thanks again .....
[4 Feb 11:49] Georgi Kodinov
Posted by developer:
 
This is the beginning of a great problem analysis, but I do not find it complete and hence I do not agree with the fix proposed. That's because your fix is breaking the code layering. 
VIOs come in multiple types. And each has its own methods that handle various transitions in its state according to the type. The socket VIO (that uses vio_delete) and the SSL vio (that uses vio_ssl_delete()) are two completely separate VIOs. Hence doing stuff that needs to be done for one in the methods of the other is not only unnecessary but also resource consuming because if a socket VIO is just a socket vio it will never have an SSL set into it. 
Live VIOs can be transformed from one kind to another via the vio_reset() call. 

Also note that the leak doesn't happen with synchronous mysql_real_connect() which indicates that the issue is not with the VIO code itself.

I've taken a look at the async code and this is my educated guess:

The async code first creates an SSL context, stores it into the connect async context of the MYSQL handle, then goes about doing SSL_connect() on that. But, I assume, this doesn't complete because of lack of bytes in the buffer. So instead of transforming the vio into a SSL vio (and setting the SSL context into it) it will just keep the SSL into the async connect context and return "want more data" to the controlling application. 
Now the controlling application most probably goes about doing mysql_close() on that MYSQL structure. 
And, while doing that, it completely forgets to free the SSL handle from the async connect context, if it's present. 

And voila: a leak.

Thus I assume that the (beginning) of the correct fix for this would be something like:
C:\Users\GKODINOV\dev\mysql-8.0>git diff
diff --git a/sql-common/client.cc b/sql-common/client.cc
index 1316d54a764..2ac0901da9c 100644
--- a/sql-common/client.cc
+++ b/sql-common/client.cc
@@ -3253,6 +3253,10 @@ void mysql_extension_free(MYSQL_EXTENSION *ext) {
         my_free(ext->mysql_async_context->connect_context->scramble_buffer);
         ext->mysql_async_context->connect_context->scramble_buffer = nullptr;
       }
+      if (ext->mysql_async_context->connect_context->ssl) {
+        SSL_free(ext->mysql_async_context->connect_context->ssl);
+        ext->mysql_async_context->connect_context->ssl = nullptr;
+      }
       my_free(ext->mysql_async_context->connect_context);
       ext->mysql_async_context->connect_context = nullptr;
     }

But I need a stable reproduction case (as mandated by the bug reporting guidelines) so I can prove my theory above and verify the proposed fix. 

Thus setting the bug to "need more info".
[4 Feb 11:55] Georgi Kodinov
Posted by developer:
 
After writing the above I've discovered your "improved" fix and am glad that I was ballpark correct :) 
But I still need a test case before accepting the contribution. Hint: use DBUG_EXECUTE_IF to trigger the condition I've described above and then call mysql_close() to check if it frees everything properly.
[5 Mar 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[8 Mar 18:49] Fadi Hanna
Hello, unfortunately it would be hard to provide a repro for this issue, as the clients used to hit the issue are using some wrapper libraries that are not open source.
[9 Mar 13:26] MySQL Verification Team
Hi Mr. Hanna,

Thank you for your feedback. In the case that you find a way for us to repeat this bug, please, just update immediately this report and we shall do our best to attend to it.
[29 Mar 16:18] Aaron Balsara
We have a stable repro that fails with valgrind without the patch, and passes with it.

diff --git a/mysql-test/t/async_client_ssl_leak.test b/mysql-test/t/async_client_ssl_leak.test
new file mode 100644
index 00000000000..26edfa4ac1e
--- /dev/null
+++ b/mysql-test/t/async_client_ssl_leak.test
@@ -0,0 +1 @@
+--exec $MYSQL_CLIENT_TEST test_async_client_ssl_leak
diff --git a/testclients/mysql_client_test.cc b/testclients/mysql_client_test.cc
index 7047c578a8d..edd8b07a9e3 100644
--- a/testclients/mysql_client_test.cc
+++ b/testclients/mysql_client_test.cc
@@ -5123,6 +5123,30 @@ static void test_prepare_multi_statements() {
   mysql_close(mysql_local);
 }
 
+static void test_async_client_ssl_leak() {
+  MYSQL *mysql_local;
+  myheader("async_client_ssl_leak");
+  DBUG_SET("+d,async_client_ssl_leak");
+
+  if (!(mysql_local = mysql_client_init(NULL))) {
+    fprintf(stderr, "\n mysql_client_init() failed");
+    exit(1);
+  }
+
+  net_async_status status;
+  bool exit_loop = false;
+  do {
+    status = mysql_real_connect_nonblocking(
+        mysql_local, opt_host, opt_user, opt_password, current_db, opt_port,
+        opt_unix_socket, CLIENT_MULTI_STATEMENTS);
+    DBUG_EXECUTE_IF("async_client_ssl_leak_ssl_started", { exit_loop = true; });
+  } while (status == NET_ASYNC_NOT_READY && !exit_loop);
+
+  mysql_close(mysql_local);
+  DBUG_SET("-d,async_client_ssl_leak");
+  DBUG_SET("-d,async_client_ssl_leak_ssl_started");
+}
+
 /* Test simple bind store result */
 
 static void test_store_result() {
@@ -20800,6 +20824,7 @@ static struct my_tests_st my_tests[] = {
     {"test_multi_stmt", test_multi_stmt},
     {"test_multi_statements", test_multi_statements},
     {"test_prepare_multi_statements", test_prepare_multi_statements},
+    {"test_async_client_ssl_leak", test_async_client_ssl_leak},
     {"test_store_result", test_store_result},
     {"test_store_result1", test_store_result1},
     {"test_store_result2", test_store_result2},
diff --git a/vio/viossl.cc b/vio/viossl.cc
index 34e7946ebc1..bee70852eba 100644
--- a/vio/viossl.cc
+++ b/vio/viossl.cc
@@ -461,6 +461,11 @@ static ssize_t ssl_handshake_loop(Vio *vio, SSL *ssl, ssl_handshake_func_t func,
 
     /* Process the SSL I/O error. */
     if (!ssl_should_retry(vio, handshake_ret, &event, ssl_errno_holder)) break;
+    DBUG_EXECUTE_IF("async_client_ssl_leak", {
+      DBUG_SET("+d,async_client_ssl_leak_ssl_started");
+      DBUG_RETURN(VIO_SOCKET_WANT_READ);
+    });
+
 
     if (!vio->is_blocking_flag) {
       switch (event) {
[31 Mar 12:09] MySQL Verification Team
Hi Mr. Balsara,

We are grateful for your patch.

Would you be so kind to share your test case with us ?????
[2 Apr 18:09] Manuel Ung
Isn't the test case right there in Aaron's comment?
[5 Apr 13:04] MySQL Verification Team
Hi Mr. Ung,

It seems that you are correct.
[14 Apr 14:34] Georgi Kodinov
Posted by developer:
 
I can confirm the supplied test case works as advertised.
[14 Apr 15:10] MySQL Verification Team
Thank you, Joro.
[15 Apr 14:57] Paul DuBois
Posted by developer:
 
Fixed in 8.0.25.

Client programs using the asynchronous C API functions could leak
memory if a connection timeout occurred.
[16 Apr 12:31] MySQL Verification Team
Thank you, Paul.
[24 Apr 14:39] Paul DuBois
Posted by developer:
 
Fixed in 8.0.26, not 8.0.25.
[24 Apr 14:48] Paul DuBois
Posted by developer:
 
Client programs using the asynchronous C API functions could leak
memory if a connection timeout occurred. Thanks for Facebook for a
contribution used in fixing this issue.