| Bug #101789 | CPU 100% MySQL 8 on Windows 2019 | ||
|---|---|---|---|
| Submitted: | 28 Nov 2020 21:07 | Modified: | 18 Jan 2021 17:27 | 
| Reporter: | Alex K. | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: InnoDB storage engine | Severity: | S2 (Serious) | 
| Version: | 8.0.21, 8.0.22 | OS: | Windows (2019) | 
| Assigned to: | CPU Architecture: | x86 (32 cores) | |
   [28 Nov 2020 21:07]
   Alex K.        
  
 
   [29 Nov 2020 16:28]
   MySQL Verification Team        
  Can you check if my.ini parameter will solve this? [mysqld] innodb_parallel_read_threads=1
   [29 Nov 2020 16:33]
   Alex K.        
  Hi, the default value is 4. Should i try to lower it to 1 ?
   [30 Nov 2020 15:02]
   MySQL Verification Team        
  Hi Mr. K. Thank you for your bug report. Please, do set that system variable to 1 and try again. Let us know if you can repeat this behaviour. If you manage to do it, then, please send us the full output from the innodb status command. This command is described in our Reference Manual. We are waiting on your feedback.
   [30 Nov 2020 18:32]
   Alex K.        
  Hi there, I've added innodb_parallel_read_threads=1 to my.ini Since this change, there hasn't been a bottleneck. It's almost 30 hours now (usually we had a bottleneck every 8 hours). So is this related to a bug that remains to be fixed? or something related to the parallel threads? Does innodb_parallel_read_threads=1 means our SELECTS will be slower? Thanks Alex
   [1 Dec 2020 13:21]
   MySQL Verification Team        
  Here's the previous bugs: BUG 31016076 - SELECT COUNT(*) ON LARGE TABLE HANG FOREVER EVEN AFTER CTRL+C BUG 31015525 - INTERRUPTING PARALLEL INNODB READS LEAVES PARALLEL READ THREADS IN AN INFINITE LOOP supposed to have been fixed in 8.0.22. seems it's not completely solved?
   [1 Dec 2020 17:36]
   Sunny Bains        
  Hi, Can you please post an equivalent table definition (ideally with anonymized data) on which you are see this problem. Regards.
   [2 Dec 2020 13:15]
   MySQL Verification Team        
  Hi Mr. K, Can you also send us contents of your my.ini file ????
   [2 Dec 2020 14:13]
   Alex K.        
  Hi, I have uploaded my.ini file. Please help me with "equivalent table definition" as I don't know exactly what you mean. Thanks.
   [2 Dec 2020 14:14]
   Alex K.        
  By the way, still no bottleneck! it's been several days now.
   [2 Dec 2020 14:28]
   Alex K.        
  Unbelievable, I had to say it. We're having another bottleneck right now. CPU is completely bottlenecked at 100%. Simple queries like this one: SELECT count(id) FROM g3hy2_usergroups Taking forever to load, a lot of them in the SHOW PROCESSLIST. At the same time when the bottleneck happens, when I manually run this query, it returns results just fine.
   [2 Dec 2020 14:31]
   MySQL Verification Team        
  Hi, Please, do note that we also asked for equivalent table definition with as many rows as you can provide. Waiting on your feedback.
   [2 Dec 2020 19:29]
   Alex K.        
  Hi, I have uploaded the table that we mostly see on the process list during bottleneck, including the small data that it contains. We have replaced some sensitive data with ******. Let me know if that helps. Thanks.
   [2 Dec 2020 20:59]
   MySQL Verification Team        
  On windows 8.0.22, I did a test with the sample table having 37 rows. Ran the query: select count(id) from table; in 1 thread: ~2000 qps in 2 threads: ~3000 qps in 3 threads: ~3500 qps in 4 threads: 78 qps at 4+ threads there's a major performance hit. This was running with --innodb-parallel-read-threads=0 --innodb-thread-concurrency=0 I think something needs investigating here.
   [2 Dec 2020 21:41]
   MySQL Verification Team        
  thread stacks taken at a point in time with slow QPS
Attachment: bug101789_8.0.22_thread_stacks.txt (text/plain), 124.01 KiB.
   [2 Dec 2020 21:43]
   MySQL Verification Team        
  I've seen enough to verify this on 8.0.22 on Windows. mysql> show full processlist; +-----+-----------------+----------------+------+---------+------+------------------------+-------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+-----------------+----------------+------+---------+------+------------------------+-------------------------------------------+ | 5 | event_scheduler | localhost | NULL | Daemon | 846 | Waiting on empty queue | NULL | | 17 | root | localhost:5735 | test | Query | 0 | init | show full processlist | | 178 | root | localhost:9392 | test | Query | 1 | executing | select count(id) from g3hy2_usergroups | | 179 | root | localhost:9394 | test | Query | 3 | executing | select count(id) from g3hy2_usergroups | | 180 | root | localhost:9395 | test | Query | 1 | executing | select count(id) from g3hy2_usergroups | | 181 | root | localhost:9396 | test | Query | 2 | executing | select count(id) from g3hy2_usergroups | | 182 | root | localhost:9397 | test | Query | 0 | executing | select count(id) from g3hy2_usergroups | | 183 | root | localhost:9398 | test | Query | 2 | executing | select count(id) from g3hy2_usergroups | | 184 | root | localhost:9399 | test | Query | 3 | executing | select count(id) from g3hy2_usergroups | | 185 | root | localhost:9400 | test | Query | 3 | executing | select count(id) from g3hy2_usergroups | | 186 | root | localhost:9402 | test | Query | 2 | executing | select count(id) from g3hy2_usergroups | | 187 | root | localhost:9403 | test | Query | 1 | executing | select count(id) from g3hy2_usergroups | | 188 | root | localhost:9404 | test | Query | 1 | executing | select count(id) from g3hy2_usergroups | | 189 | root | localhost:9405 | test | Query | 2 | executing | select count(id) from g3hy2_usergroups | +-----+-----------------+----------------+------+---------+------+------------------------+-------------------------------------------+ 14 rows in set (0.00 sec) It becomes VERY slow at 10 threads; # Runtime QPS Qrun RowRcvd RowAfctd #------------------------------------------------------- #00:11:13 8 4802 1207 0 #00:11:16 11 4836 1241 0 #00:11:19 8 4862 1267 0 #00:11:23 8 4887 1292 0 #00:11:26 8 4911 1316 0 #00:11:29 8 4935 1340 0 #00:11:32 7 4958 1363 0 #00:11:35 8 4983 1388 0 #00:11:38 8 5009 1414 0 #00:11:41 8 5033 1438 0 #00:11:45 8 5059 1464 0 #00:11:48 8 5084 1489 0 #00:11:51 8 5110 1515 0 #00:11:54 9 5137 1542 0 #00:11:57 9 5165 1570 0 #00:12:01 9 5192 1597 0 #00:12:04 9 5219 1624 0 #00:12:07 8 5245 1650 0 #00:12:10 8 5270 1675 0 #00:12:13 8 5296 1701 0 #-------------------------------------------------------
   [2 Dec 2020 21:47]
   Alex K.        
  Hi, I'm glad this is verified. On our server, those selects take forever to return results (they never stop). Alex
   [2 Dec 2020 21:47]
   MySQL Verification Team        
  same behavior on 8.0.21.
   [2 Dec 2020 21:59]
   MySQL Verification Team        
  5.7.21 with 4 threads also ate CPU, but that's because it had a huge throughput, so that's fine. #------------------------------------------------ # Runtime QPS Qrun RowRcvd RowAfc #------------------------------------------------ #00:00:00 831 2493 2493 #00:00:05 32159 98971 98971 #00:00:10 35906 206689 206689 #00:00:16 40470 328101 328101 #00:00:23 50768 480407 480407 #00:00:33 71166 693907 693907 #00:00:36 20790 756277 756277 #00:00:39 20781 818621 818621 #00:00:42 20579 880360 880360 #00:00:45 20628 942246 942246
   [2 Dec 2020 22:30]
   MySQL Verification Team        
  Similar problem on Linux 8.0.22. The moment the number of threads goes above the number of CPU cores, the overall performance gets ~10x slower than a single thread. It falls off a cliff fast. This seems unexpected to me. 5.7.31 didn't hit such a hard ceiling. At least devs can profile this with "perf" on linux easier now. I tested with --innodb-parallel-read-threads=1.
   [2 Dec 2020 22:39]
   Sunny Bains        
  Looks like the issue is in this code:
void IB_thread::start() {
  ut_a(state() == State::NOT_STARTED);
  m_state->store(State::ALLOWED_TO_START);
  while (state() == State::ALLOWED_TO_START) {
    UT_RELAX_CPU(); 
  }
  const auto state_after_start = state();
  
  ut_a(state_after_start == State::STARTED ||
       state_after_start == State::STOPPED);
}
I suspect that a memory barrier will be required in the check for state. Why this problem only manifests on Windows (assuming it's a memory barrier issue), I can't say.
 
   [3 Dec 2020 12:51]
   MySQL Verification Team        
  Thank you very much, Shane and Sunny. I took a look myself and it seems that problem started when we stopped using configuration #define's HAVE_PAUSE_INSTRUCTION and HAVE_FAKE_PAUSE_INSTRUCTION.
   [3 Dec 2020 12:53]
   MySQL Verification Team        
  Increasing severity.
   [11 Dec 2020 21:51]
   Alex K.        
  Hi there, Is there any update on this? Thanks Alex
   [11 Dec 2020 23:23]
   Sunny Bains        
  We have figured out the underlying issue.The fix will be pushed soon. Thank you for helping us debug this issue. Regards, -sunny
   [14 Dec 2020 13:49]
   MySQL Verification Team        
  Thank you, Sunny and Alex.
   [14 Dec 2020 13:55]
   Alex K.        
  Hi, is this going to be included in the next MySQL community upgrade? for example 8.0.23? Is this the way that I can get the fix? Thanks Alex
   [6 Jan 2021 12:15]
   Alex K.        
  Hi and happy new year. is there any update on this? Thanks Alex
   [9 Jan 2021 5:56]
   Darshan M N        
  Hi, the fix will be pushed in a few days as internal testing is still going on. It would be available in the 8.0.24 release.
   [11 Jan 2021 15:02]
   MySQL Verification Team        
  Thank you, Darshan ...........
   [18 Jan 2021 17:18]
   Daniel Price        
  Posted by developer:
 
commit 3b36c7dad05c288c03208c8b10863f1c53e0e153
Date:   Mon Jan 11 19:09:01 2021 +0530
    Bug#32224707 CPU 100% MYSQL 8 ON WINDOWS 2019
    
    Issue
    =====
    1. On windows, running SELECT COUNT(*) queries parallely from various client
    threads, with the number of client threads exceeding the number of cores of the
    machine, can cause stalls for a very long time. The stall is seen in
    IB_thread::start(), while spawning threads for parallel read, where we spin-wait
    in a loop waiting for the state to be set.
    
    It is seen that we cannot just rely on UT_RELAX_CPU() when the number of threads
    to be spawned exceeds the number of cores of a machine. This is because
    UT_RELAX_CPU uses the YieldProcesser() system call,  which only makes the
    processor available to other logical processors. But if all the logical
    processors are busy in spin wait then no real work is done causing the
    hang-like, situation.
    
    2. The parallel reader infrastructure, allows the user to set the parameter
    innodb_parallel_reader_threads (default being 4), to configure the number of
    threads to be spawned to carry out the parallel read operation. This parameter
    is supposed to be an upper limit, with the infrastructure using only as many
    threads required as possible. But we currently spawn as many threads as the
    parameter value even though it's not necessary.
    
    Fix
    ===
    1. If we are still stuck in spin-wait, after calling UT_RELAX_CPU multiple
    times, allow the thread to release its time slice.
 
   [18 Jan 2021 17:27]
   Daniel Price        
  Posted by developer: Fixed as of the upcoming 8.0.24 release, and here's the proposed changelog entry from the documentation team: On Windows, stalls were caused by concurrent SELECT COUNT(*) queries from multiple client threads where the number of client threads exceeded the number of machine cores.
   [19 Jan 2021 14:05]
   MySQL Verification Team        
  Thank you, Daniel .....
   [19 Mar 2021 7:01]
   MySQL Verification Team        
  On linux this manifested too, here's the perf output of 8.0.23.
-  100.00%   100.00%  mysqld
   - 93.64% start_thread
      - 93.55% pfs_spawn_thread
         - 93.55% handle_connection
            - 93.54% do_command
               - 93.40% dispatch_command
                  - 93.20% dispatch_sql_command
                     - 92.81% mysql_execute_command
                        - 92.56% Sql_cmd_dml::execute
                           - 91.82% SELECT_LEX_UNIT::execute
                              - 91.80% SELECT_LEX_UNIT::ExecuteIteratorQuery
                                 - 91.64% UnqualifiedCountIterator::Read
                                    - 91.63% get_exact_record_count
                                       - 91.62% ha_innobase::records
                                          - 91.58% row_scan_index_for_mysql
                                             - 91.50% row_mysql_parallel_select_count_star
                                                - 91.05% Parallel_reader::run
                                                   - 91.04% Parallel_reader::parallel_read
                                                        77.14% IB_thread::start
                                                        13.61% IB_thread::wait
Confirmed it's fixed in current trunk for linux too.
 
   [13 Apr 2021 14:11]
   Daniel Price        
  Posted by developer: Revised entry: Fixed as of the upcoming 8.0.24 release: On Windows, stalls were caused by concurrent SELECT COUNT(*) queries where the number of parallel read threads exceeded the number of machine cores. Patch applied to other platforms in 8.0.25: Bug #32678019 BUSY WAIT IN IB_THREAD::START CAN CAUSE PERFORMANCE ISSUES
   [13 Apr 2021 14:11]
   Daniel Price        
  Posted by developer: Revised entry: Fixed as of the upcoming 8.0.24 release: On Windows, stalls were caused by concurrent SELECT COUNT(*) queries where the number of parallel read threads exceeded the number of machine cores. Patch applied to other platforms in 8.0.25: Bug #32678019 BUSY WAIT IN IB_THREAD::START CAN CAUSE PERFORMANCE ISSUES

