Bug #115703 Asynchronous interface uses unsafe static local variables to (re-)store state
Submitted: 27 Jul 11:55 Modified: 12 Sep 2:53
Reporter: Rick Eisermann Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: C API (client library) Severity:S2 (Serious)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[27 Jul 11:55] Rick Eisermann
Description:
Using the asynchronous interface causes intermittent crashes under high concurrency when fetching large results. After quite some debugging, I think that the cause is WL#13510 (https://github.com/mysql/mysql-server/commit/bc94adeb128a1ad464ca0c2c771855eee497d740), which adds support of compression for the asynchronous interface.

This worklog introduces net_read_uncompressed_nonblocking (https://github.com/mysql/mysql-server/blob/596f0d238489a9cf9f43ce1ff905984f58d227b6/sql-co...) and net_read_compressed_nonblocking (https://github.com/mysql/mysql-server/blob/596f0d238489a9cf9f43ce1ff905984f58d227b6/sql-co...).
These functions use static local variables to store (https://github.com/mysql/mysql-server/blob/596f0d238489a9cf9f43ce1ff905984f58d227b6/sql-co... resp. https://github.com/mysql/mysql-server/blob/596f0d238489a9cf9f43ce1ff905984f58d227b6/sql-co...) and restore (https://github.com/mysql/mysql-server/blob/596f0d238489a9cf9f43ce1ff905984f58d227b6/sql-co... resp. https://github.com/mysql/mysql-server/blob/596f0d238489a9cf9f43ce1ff905984f58d227b6/sql-co...) state between invocations.

Static local variables are not thread-safe. But even turning them into static thread_local variables is not enough.
Usage of the asynchronous interface, especially within an event system, can lead to interleaved invocations of these functions, if net_read_packet_nonblocking results in NET_ASYNC_NOT_READY, which would be typical for large results which are split into multiple packets.
This will lead to mingled states of save_pos, total_length, net->where_b and net->read_pos.

How to repeat:
I've managed to create a simple test program. It creates two connections and two querys and tries to fetch their results.

Query one gives a result which has to be split into multiple packets due to its size. The result of the second query is fetched when the result of the first query returns a packet of maximum length but before it is fetched completely.

It is crucial to hit this state, that is having received the first but not the second packet, which requires quite some timing. The test program counts invocations of mysql_store_result_nonblocking to watch for the right moment, so you'll have to experiment to get the correct value.

I've also modified net_read_uncompressed_nonblocking to print the values of save_pos, total_length, net->where_b and net->read_pos, but this is not included here.

-------

#include <iostream>
#include <string>
#include <cstring>
#include <mysql/mysql.h>

namespace {
	void test() {
		// Initialize two connections.
		// To keep things simple, both connections have the same setup.
		MYSQL* connection_1 = mysql_init(nullptr);

		if (connection_1 == nullptr) {
			std::cerr << "mysql_init failed";

			return;
		}

		MYSQL* connection_2 = mysql_init(nullptr);

		if (connection_2 == nullptr) {
			std::cerr << "mysql_init failed";

			mysql_close(connection_1);

			return;
		}

		const char* host = "127.0.0.1";
		const char* user = "test";
		const char* passwd = "test";
		const char* db = nullptr;
		unsigned int port = 0;
		const char* unix_socket = nullptr;
		unsigned long clientflag = 0;
		net_async_status status;

		do {
			status = mysql_real_connect_nonblocking(connection_1, host, user, passwd, db, port, unix_socket, clientflag);
		} while (status == NET_ASYNC_NOT_READY);

		if (status == NET_ASYNC_ERROR) {
			std::cerr << "mysql_real_connect_nonblocking failed" << std::endl;

			mysql_close(connection_1);

			return;
		}

		do {
			status = mysql_real_connect_nonblocking(connection_2, host, user, passwd, db, port, unix_socket, clientflag);
		} while (status == NET_ASYNC_NOT_READY);

		if (status == NET_ASYNC_ERROR) {
			std::cerr << "mysql_real_connect_nonblocking failed" << std::endl;

			mysql_close(connection_1);
			mysql_close(connection_2);

			return;
		}

		// The query on the first connection must cause a packet split.
		const char* query = "SELECT REPEAT(1, 256 * 256 * 256 + 1)";
		std::size_t query_length = std::strlen(query);

		do {
			status = mysql_real_query_nonblocking(connection_1, query, query_length);
		} while (status == NET_ASYNC_NOT_READY);

		if (status == NET_ASYNC_ERROR) {
			std::cerr << "mysql_real_query_nonblocking failed" << std::endl;

			mysql_close(connection_1);
			mysql_close(connection_2);

			return;
		}

		// The query on the second connection can be anything.
		query = "SELECT 0";
		query_length = std::strlen(query);

		do {
			status = mysql_real_query_nonblocking(connection_2, query, query_length);
		} while (status == NET_ASYNC_NOT_READY);

		if (status == NET_ASYNC_ERROR) {
			std::cerr << "mysql_real_query_nonblocking failed" << std::endl;

			mysql_close(connection_1);
			mysql_close(connection_2);

			return;
		}

		// We are going to interleave result fetching of query one and two.
		// Fetching the result of the second query must start after the first packet of the first query has been received.
		MYSQL_RES* result_1 = nullptr;
		int count = 0;

		do {
			++count;
			status = mysql_store_result_nonblocking(connection_1, &result_1);

			// At this point, the first maximum length packet of the first query's result has been received.
			// Now go and fetch the result of the second query.
			// The count of 1025 may be very individual. If you don't get a crash, check the output of the count variable.
			if (status == NET_ASYNC_NOT_READY && count == 1025) {
				MYSQL_RES* result_2 = nullptr;
				net_async_status status_2;

				do {
					status_2 = mysql_store_result_nonblocking(connection_2, &result_2);
				} while (status_2 == NET_ASYNC_NOT_READY);

				if (status_2 == NET_ASYNC_ERROR) {
					std::cerr << "mysql_store_result_nonblocking failed" << std::endl;

					if (result_1 != nullptr) {
						mysql_free_result(result_1);
					}

					if (result_2 != nullptr) {
						mysql_free_result(result_2);
					}

					mysql_close(connection_1);
					mysql_close(connection_2);

					return;
				}

				MYSQL_ROW row = mysql_fetch_row(result_2);

				if (row != nullptr) {
					//std::cout << row[0] << std::endl;
				}

				mysql_free_result(result_2);
			}
		} while (status == NET_ASYNC_NOT_READY);

		if (status == NET_ASYNC_ERROR) {
			std::cerr << "mysql_store_result_nonblocking failed" << std::endl;

			if (result_1 != nullptr) {
				mysql_free_result(result_1);
			}

			mysql_close(connection_1);
			mysql_close(connection_2);

			return;
		}

		std::cout << "mysql_store_result_nonblocking count: " << count << std::endl;

		// This will cause a crash
		MYSQL_ROW row = mysql_fetch_row(result_1);

		if (row != nullptr) {
			//std::cout << row[0] << std::endl;
		}

		mysql_free_result(result_1);
		mysql_close(connection_1);
		mysql_close(connection_2);
	}
}

int main(int argc, char* argv[]) {
	mysql_library_init(0, nullptr, nullptr);
	test();
	mysql_library_end();

	return 0;
}

Suggested fix:
Static local variables should be turned into variables attached to NET_ASYNC so that they dependent on the connection.
[27 Jul 13:53] Rick Eisermann
Output of valgrind:

==1882== Memcheck, a memory error detector
==1882== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1882== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==1882== Command: mysql/example
==1882== 
==1882== Invalid read of size 8
==1882==    at 0x4887DB4: mysql_fetch_row (client.cc:8414)
==1882==    by 0x109759: (anonymous namespace)::test() (in /home/ab/mysql/example)
==1882==    by 0x1097C3: main (in /home/ab/mysql/example)
==1882==  Address 0x30 is not stack'd, malloc'd or (recently) free'd
==1882== 
==1882== 
==1882== Process terminating with default action of signal 11 (SIGSEGV)
==1882==  Access not within mapped region at address 0x30
==1882==    at 0x4887DB4: mysql_fetch_row (client.cc:8414)
==1882==    by 0x109759: (anonymous namespace)::test() (in /home/ab/mysql/example)
==1882==    by 0x1097C3: main (in /home/ab/mysql/example)
==1882==  If you believe this happened as a result of a stack
==1882==  overflow in your program's main thread (unlikely but
==1882==  possible), you can try to increase the size of the
==1882==  main thread stack using the --main-stacksize= flag.
==1882==  The main thread stack size used in this run was 8388608.
==1882== 
==1882== HEAP SUMMARY:
==1882==     in use at exit: 17,237,537 bytes in 5,140 blocks
==1882==   total heap usage: 9,677 allocs, 4,537 frees, 51,475,697 bytes allocated
==1882== 
==1882== LEAK SUMMARY:
==1882==    definitely lost: 0 bytes in 0 blocks
==1882==    indirectly lost: 0 bytes in 0 blocks
==1882==      possibly lost: 0 bytes in 0 blocks
==1882==    still reachable: 17,237,537 bytes in 5,140 blocks
==1882==         suppressed: 0 bytes in 0 blocks
==1882== Rerun with --leak-check=full to see details of leaked memory
==1882== 
==1882== For lists of detected and suppressed errors, rerun with: -s
==1882== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
[27 Jul 13:55] Rick Eisermann
Relevant output of gdb:

Program received signal SIGSEGV, Segmentation fault.
mysql_fetch_row () at /home/ab/mysql/mysql-server-trunk/sql-common/client.cc:8414
8414	  if (res->methods) {
[29 Jul 9:26] MySQL Verification Team
Hi Mr. Esermann,

Thank you for your bug report.

We repeated your problem with 8.0.39.

It affects 8.0 and higher versions.

Verified as reported.
[31 Jul 6:47] Rick Eisermann
It seems that Facebook, the "culprit" of the original implementation, discovered their mistake recently (https://github.com/facebook/mysql-5.6/commit/5457fe7f63eb96bce3a980b2a3391a8b5cb6cd07). It still makes me wonder, how such a serious bug has passed review.
[31 Jul 10:37] MySQL Verification Team
Mr. Eisermann,

We truly do not have the answer to your last question.

It would require men-weeks to answer a question like that ......
[12 Sep 2:53] Jon Stephens
Documented fix as follows in the MySQL 8.0.40, 8.4.3, and 9.1.0 changelogs:

    The asynchronous interface used unsafe static local variables to
    store state information.

Closed.
[16 Sep 11:20] MySQL Verification Team
thank you, Jon.