Bug #101789 | CPU 100% MySQL 8 on Windows 2019 | | |
---|---|---|---|
Submitted: | 28 Nov 2020 21:07 | Modified: | 18 Jan 2021 17:27 |
Reporter: | Alex K. | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S2 (Serious) |
Version: | 8.0.21, 8.0.22 | OS: | Windows (2019) |
Assigned to: | | CPU Architecture: | x86 (32 cores)
[28 Nov 2020 21:07]
Alex K.
[29 Nov 2020 16:28]
MySQL Verification Team
Can you check whether this my.ini parameter will solve the problem?

[mysqld]
innodb_parallel_read_threads=1
[29 Nov 2020 16:33]
Alex K.
Hi, the default value is 4. Should I try lowering it to 1?
[30 Nov 2020 15:02]
MySQL Verification Team
Hi Mr. K. Thank you for your bug report. Please set that system variable to 1 and try again, and let us know whether you can still repeat this behaviour. If you can, please send us the full output of the SHOW ENGINE INNODB STATUS command, which is described in our Reference Manual. We are waiting on your feedback.
[30 Nov 2020 18:32]
Alex K.
Hi there, I've added innodb_parallel_read_threads=1 to my.ini. Since this change there hasn't been a bottleneck; it's been almost 30 hours now (we usually had a bottleneck every 8 hours). So is this related to a bug that remains to be fixed, or to something about the parallel read threads? Does innodb_parallel_read_threads=1 mean our SELECTs will be slower? Thanks, Alex
[1 Dec 2020 17:36]
Sunny Bains
Hi, can you please post an equivalent table definition (ideally with anonymized data) on which you are seeing this problem. Regards.
[2 Dec 2020 13:15]
MySQL Verification Team
Hi Mr. K, can you also send us the contents of your my.ini file?
[2 Dec 2020 14:13]
Alex K.
Hi, I have uploaded my.ini file. Please help me with "equivalent table definition" as I don't know exactly what you mean. Thanks.
[2 Dec 2020 14:14]
Alex K.
By the way, still no bottleneck! It's been several days now.
[2 Dec 2020 14:28]
Alex K.
Unbelievable, I have to say it. We're having another bottleneck right now. The CPU is completely pegged at 100%. Simple queries like this one:

SELECT count(id) FROM g3hy2_usergroups

are taking forever to finish, and a lot of them show up in SHOW PROCESSLIST. At the same time, while the bottleneck is happening, if I run this query manually it returns results just fine.
[2 Dec 2020 14:31]
MySQL Verification Team
Hi, please note that we also asked for an equivalent table definition with as many rows as you can provide. Waiting on your feedback.
[2 Dec 2020 19:29]
Alex K.
Hi, I have uploaded the table that we most often see in the process list during the bottleneck, including the small amount of data it contains. We have replaced some sensitive data with ******. Let me know if that helps. Thanks.
[2 Dec 2020 20:59]
MySQL Verification Team
On Windows with 8.0.22, I did a test with the sample table having 37 rows, running the query:

select count(id) from table;

in 1 thread:  ~2000 qps
in 2 threads: ~3000 qps
in 3 threads: ~3500 qps
in 4 threads:    78 qps

At 4+ threads there's a major performance hit. This was running with --innodb-parallel-read-threads=0 --innodb-thread-concurrency=0. I think something needs investigating here.
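For reference, a load driver in the spirit of this test can be sketched with the MySQL C API: spawn N client threads, have each run the same COUNT query in a loop, and report an aggregate QPS. The connection parameters, table name, thread count, and duration below are illustrative assumptions, not the actual harness used for the numbers above.

#include <mysql.h>

#include <atomic>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>

static std::atomic<long> queries{0};  // total queries completed by all threads

static void worker(int seconds) {
  MYSQL *conn = mysql_init(nullptr);
  // Placeholder credentials; adjust host/user/password/schema as needed.
  if (!mysql_real_connect(conn, "127.0.0.1", "root", "", "test", 3306,
                          nullptr, 0)) {
    std::fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
    mysql_close(conn);
    return;
  }
  auto end = std::chrono::steady_clock::now() + std::chrono::seconds(seconds);
  while (std::chrono::steady_clock::now() < end) {
    if (mysql_query(conn, "SELECT count(id) FROM g3hy2_usergroups") == 0) {
      MYSQL_RES *res = mysql_store_result(conn);
      mysql_free_result(res);
      ++queries;
    }
  }
  mysql_close(conn);
}

int main(int argc, char **argv) {
  const int threads = argc > 1 ? std::atoi(argv[1]) : 4;  // number of clients
  const int seconds = 10;
  std::vector<std::thread> pool;
  for (int i = 0; i < threads; ++i) pool.emplace_back(worker, seconds);
  for (auto &t : pool) t.join();
  std::printf("%d threads: ~%ld qps\n", threads, queries.load() / seconds);
  return 0;
}

Running it with 1, 2, 3, 4, ... as the argument should show the same cliff once the number of client threads crosses the core count on an affected build.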
[2 Dec 2020 21:41]
MySQL Verification Team
thread stacks taken at a point in time with slow QPS
Attachment: bug101789_8.0.22_thread_stacks.txt (text/plain), 124.01 KiB.
[2 Dec 2020 21:43]
MySQL Verification Team
I've seen enough to verify this on 8.0.22 on Windows.

mysql> show full processlist;
+-----+-----------------+----------------+------+---------+------+------------------------+-----------------------------------------+
| Id  | User            | Host           | db   | Command | Time | State                  | Info                                    |
+-----+-----------------+----------------+------+---------+------+------------------------+-----------------------------------------+
|   5 | event_scheduler | localhost      | NULL | Daemon  |  846 | Waiting on empty queue | NULL                                    |
|  17 | root            | localhost:5735 | test | Query   |    0 | init                   | show full processlist                   |
| 178 | root            | localhost:9392 | test | Query   |    1 | executing              | select count(id) from g3hy2_usergroups  |
| 179 | root            | localhost:9394 | test | Query   |    3 | executing              | select count(id) from g3hy2_usergroups  |
| 180 | root            | localhost:9395 | test | Query   |    1 | executing              | select count(id) from g3hy2_usergroups  |
| 181 | root            | localhost:9396 | test | Query   |    2 | executing              | select count(id) from g3hy2_usergroups  |
| 182 | root            | localhost:9397 | test | Query   |    0 | executing              | select count(id) from g3hy2_usergroups  |
| 183 | root            | localhost:9398 | test | Query   |    2 | executing              | select count(id) from g3hy2_usergroups  |
| 184 | root            | localhost:9399 | test | Query   |    3 | executing              | select count(id) from g3hy2_usergroups  |
| 185 | root            | localhost:9400 | test | Query   |    3 | executing              | select count(id) from g3hy2_usergroups  |
| 186 | root            | localhost:9402 | test | Query   |    2 | executing              | select count(id) from g3hy2_usergroups  |
| 187 | root            | localhost:9403 | test | Query   |    1 | executing              | select count(id) from g3hy2_usergroups  |
| 188 | root            | localhost:9404 | test | Query   |    1 | executing              | select count(id) from g3hy2_usergroups  |
| 189 | root            | localhost:9405 | test | Query   |    2 | executing              | select count(id) from g3hy2_usergroups  |
+-----+-----------------+----------------+------+---------+------+------------------------+-----------------------------------------+
14 rows in set (0.00 sec)

It becomes VERY slow at 10 threads:

#  Runtime   QPS   Qrun  RowRcvd  RowAfctd
#-------------------------------------------------------
#00:11:13      8   4802     1207         0
#00:11:16     11   4836     1241         0
#00:11:19      8   4862     1267         0
#00:11:23      8   4887     1292         0
#00:11:26      8   4911     1316         0
#00:11:29      8   4935     1340         0
#00:11:32      7   4958     1363         0
#00:11:35      8   4983     1388         0
#00:11:38      8   5009     1414         0
#00:11:41      8   5033     1438         0
#00:11:45      8   5059     1464         0
#00:11:48      8   5084     1489         0
#00:11:51      8   5110     1515         0
#00:11:54      9   5137     1542         0
#00:11:57      9   5165     1570         0
#00:12:01      9   5192     1597         0
#00:12:04      9   5219     1624         0
#00:12:07      8   5245     1650         0
#00:12:10      8   5270     1675         0
#00:12:13      8   5296     1701         0
#-------------------------------------------------------
[2 Dec 2020 21:47]
Alex K.
Hi, I'm glad this is verified. On our server, those selects take forever to return results (they never stop). Alex
[2 Dec 2020 21:47]
MySQL Verification Team
Same behavior on 8.0.21.
[2 Dec 2020 21:59]
MySQL Verification Team
5.7.21 with 4 threads also ate CPU, but that's because it had huge throughput, so that's fine.

#------------------------------------------------
#  Runtime     QPS     Qrun  RowRcvd  RowAfc
#------------------------------------------------
#00:00:00      831     2493     2493
#00:00:05    32159    98971    98971
#00:00:10    35906   206689   206689
#00:00:16    40470   328101   328101
#00:00:23    50768   480407   480407
#00:00:33    71166   693907   693907
#00:00:36    20790   756277   756277
#00:00:39    20781   818621   818621
#00:00:42    20579   880360   880360
#00:00:45    20628   942246   942246
[2 Dec 2020 22:30]
MySQL Verification Team
Similar problem on Linux with 8.0.22. The moment the number of threads goes above the number of CPU cores, overall performance gets ~10x slower than a single thread; it falls off a cliff fast. This seems unexpected to me. 5.7.31 didn't hit such a hard ceiling. At least devs can now profile this more easily with "perf" on Linux. I tested with --innodb-parallel-read-threads=1.
[2 Dec 2020 22:39]
Sunny Bains
Looks like the issue is in this code:

void IB_thread::start() {
  ut_a(state() == State::NOT_STARTED);

  m_state->store(State::ALLOWED_TO_START);

  while (state() == State::ALLOWED_TO_START) {
    UT_RELAX_CPU();
  }

  const auto state_after_start = state();
  ut_a(state_after_start == State::STARTED ||
       state_after_start == State::STOPPED);
}

I suspect that a memory barrier will be required in the check for state. Why this problem only manifests on Windows (assuming it's a memory barrier issue), I can't say.
[3 Dec 2020 12:51]
MySQL Verification Team
Thank you very much, Shane and Sunny. I took a look myself, and it seems the problem started when we stopped using the configuration #defines HAVE_PAUSE_INSTRUCTION and HAVE_FAKE_PAUSE_INSTRUCTION.
[3 Dec 2020 12:53]
MySQL Verification Team
Increasing severity.
[11 Dec 2020 21:51]
Alex K.
Hi there, is there any update on this? Thanks, Alex
[11 Dec 2020 23:23]
Sunny Bains
We have figured out the underlying issue. The fix will be pushed soon. Thank you for helping us debug this issue. Regards, -sunny
[14 Dec 2020 13:49]
MySQL Verification Team
Thank you, Sunny and Alex.
[14 Dec 2020 13:55]
Alex K.
Hi, is this going to be included in the next MySQL Community release, for example 8.0.23? Is that how I can get the fix? Thanks, Alex
[6 Jan 2021 12:15]
Alex K.
Hi, and happy new year. Is there any update on this? Thanks, Alex
[9 Jan 2021 5:56]
Darshan M N
Hi, the fix will be pushed in a few days, as internal testing is still going on. It will be available in the 8.0.24 release.
[11 Jan 2021 15:02]
MySQL Verification Team
Thank you, Darshan ...........
[18 Jan 2021 17:18]
Daniel Price
Posted by developer:

commit 3b36c7dad05c288c03208c8b10863f1c53e0e153
Date: Mon Jan 11 19:09:01 2021 +0530

Bug#32224707 CPU 100% MYSQL 8 ON WINDOWS 2019

Issue
=====
1. On Windows, running SELECT COUNT(*) queries in parallel from several client threads, with the number of client threads exceeding the number of cores of the machine, can cause stalls for a very long time. The stall is seen in IB_thread::start(), while spawning threads for the parallel read, where we spin-wait in a loop waiting for the state to be set. It turns out we cannot rely on UT_RELAX_CPU() alone when the number of threads to be spawned exceeds the number of cores of the machine. This is because UT_RELAX_CPU() uses the YieldProcessor() call, which only makes the processor available to other logical processors; if all the logical processors are busy in the spin wait, no real work gets done, causing the hang-like situation.

2. The parallel reader infrastructure allows the user to set the parameter innodb_parallel_read_threads (default 4) to configure the number of threads to be spawned to carry out the parallel read operation. This parameter is supposed to be an upper limit, with the infrastructure using only as many threads as required. But we currently spawn as many threads as the parameter value even when that is not necessary.

Fix
===
1. If we are still stuck in spin-wait after calling UT_RELAX_CPU() multiple times, allow the thread to release its time slice.
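As an illustration of the fix described above, here is a minimal sketch of a bounded spin-wait that falls back to yielding the time slice. The spin limit, the function name, and the use of std::this_thread::yield() are assumptions made for the sketch, not the actual server patch.

#include <atomic>
#include <thread>
#if defined(_WIN32)
#include <windows.h>    // YieldProcessor()
#else
#include <immintrin.h>  // _mm_pause()
#endif

enum class State { NOT_STARTED, ALLOWED_TO_START, STARTED, STOPPED };

// Spin briefly on the pause/YieldProcessor() hint, then start giving up the
// time slice so the spawned thread can actually run and advance the state,
// even when every logical core is occupied by a spinning waiter.
void wait_until_started(const std::atomic<State> &state) {
  constexpr int kSpinLimit = 100;  // hypothetical bound, not the real value
  int spins = 0;
  while (state.load(std::memory_order_acquire) == State::ALLOWED_TO_START) {
    if (++spins < kSpinLimit) {
#if defined(_WIN32)
      YieldProcessor();  // roughly what UT_RELAX_CPU() maps to on Windows
#else
      _mm_pause();
#endif
    } else {
      std::this_thread::yield();  // release the time slice to other threads
      spins = 0;
    }
  }
}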
[18 Jan 2021 17:27]
Daniel Price
Posted by developer:

Fixed as of the upcoming 8.0.24 release, and here's the proposed changelog entry from the documentation team:

On Windows, stalls were caused by concurrent SELECT COUNT(*) queries from multiple client threads where the number of client threads exceeded the number of machine cores.
[19 Jan 2021 14:05]
MySQL Verification Team
Thank you, Daniel .....
[19 Mar 2021 7:01]
MySQL Verification Team
On Linux this manifested too; here's the perf output of 8.0.23.

- 100.00% 100.00% mysqld
   - 93.64% start_thread
      - 93.55% pfs_spawn_thread
         - 93.55% handle_connection
            - 93.54% do_command
               - 93.40% dispatch_command
                  - 93.20% dispatch_sql_command
                     - 92.81% mysql_execute_command
                        - 92.56% Sql_cmd_dml::execute
                           - 91.82% SELECT_LEX_UNIT::execute
                              - 91.80% SELECT_LEX_UNIT::ExecuteIteratorQuery
                                 - 91.64% UnqualifiedCountIterator::Read
                                    - 91.63% get_exact_record_count
                                       - 91.62% ha_innobase::records
                                          - 91.58% row_scan_index_for_mysql
                                             - 91.50% row_mysql_parallel_select_count_star
                                                - 91.05% Parallel_reader::run
                                                   - 91.04% Parallel_reader::parallel_read
                                                        77.14% IB_thread::start
                                                        13.61% IB_thread::wait

Confirmed it's fixed in current trunk for Linux too.
[13 Apr 2021 14:11]
Daniel Price
Posted by developer:

Revised entry:

Fixed as of the upcoming 8.0.24 release: On Windows, stalls were caused by concurrent SELECT COUNT(*) queries where the number of parallel read threads exceeded the number of machine cores.

Patch applied to other platforms in 8.0.25:
Bug #32678019 BUSY WAIT IN IB_THREAD::START CAN CAUSE PERFORMANCE ISSUES
[14 Apr 2021 12:58]
MySQL Verification Team
Thank you, Daniel.