Bug #87637 | High CPU occupancy rate | ||
---|---|---|---|
Submitted: | 1 Sep 2017 8:44 | Modified: | 18 Dec 2017 8:46 |
Reporter: | Xiaoyu Wang (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: DML | Severity: | S5 (Performance) |
Version: | 5.7.17 | OS: | Linux |
Assigned to: | CPU Architecture: | Any | |
Tags: | MySQL Community |
[1 Sep 2017 8:44]
Xiaoyu Wang
[1 Sep 2017 8:49]
Xiaoyu Wang
patch
Attachment: high_CPU_occupancy_rate.patch (application/octet-stream, text), 6.68 KiB.
[1 Sep 2017 15:06]
MySQL Verification Team
Hi! First of all, thank you for your bug report and for your patch. Unfortunately, we can not accept your patch until you become our OCA contributor. If you wish to become a contributor, please, visit this page: http://www.oracle.com/technetwork/community/oca-486395.html Regarding the bug itself, based on the patch that you have sent us, it seems to be a tuning problem. Please, try to follow our manual and try to make a combination of the following startup variables which would make a problem go away. Before doing that, please make sure that you are using our binary. This is the list of the most important variables that you should tune: innodb_log_file_size innodb_log_files_in_group innodb_open_files innodb_buffer_pool_instances innodb_log_buffer_size innodb_checksums innodb_doublewrite innodb_support_xa innodb_thread_concurrency innodb_flush_log_at_trx_commit innodb_flush_method innodb_max_dirty_pages_pct innodb_adaptive_flushing innodb_read_io_threads innodb_write_io_threads innodb_io_capacity innodb_io_capacity_max innodb_purge_threads innodb_use_native_aio innodb_stats_persistent innodb_checksum_algorithm innodb_stats_on_metadata innodb_lock_wait_timeout innodb_flush_neighbors innodb_spin_wait_delay We have had many reports of high CPU usage and in almost all of these cases tuning of the above variables solved the problem. You should find a combination of the values and if that does not work we shall have to look further. In that case, we need all the info of the sysbench script that you have run, including how did you run sysbench, how many threads and all other options .....
[4 Sep 2017 6:18]
Xiaoyu Wang
Hi! Thanks for your replying and sorry for my ambiguous expression. We believe tuning can better CPU usage, but what we are concentrated on may be another direction.Please allow me to make it clear. The complete info of sysbench script will be at the end part. case 1: Corresponding source code is at “buf0flu.cc: 3092”(version 5.7.17). In situation that There is no read or write operation, mysql is not busy. We attached two counters, one for a while-loop(buf0flu.cc: 3181), and one for an else-branch(buf0flu.cc: 3357). Result showed that else_count was almost as the same as loop_count. Thus, we can avoid idling by make mysql sleep when it is not busy. The sleep time should be no larger than 1 second, and at present, default sleep time is 0.01 seconds. We want mysql sleep for a little while, and be ready for read/write operation at most time. case 2: According to what “perf” shows, we look into “buf_get_n_pending_read_ios()”. And it contains a “srv_n_instances” times loop. “buf_get_n_pnding_read_ios()” costs because, on one hand, it’s called whenever mysql is scheduled, on the other hand, every time it’s called, loop has to be finished. Instead of loop, MY_BITMAP costs less. Compared to “buf_get_n_pending_read_ios()”, “lock_rec_has_to_wait_in_queue” occupies much more. And this can be next problem to solve. sysbench info: how we run script: ./sysbench --num-threads=100 --test=./tests/db/update_index.lua.varchar --oltp-table-size=1 --oltp-tables-count=1 --oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-index-updates=1 --oltp-non-index-updates=0 --report-interval=1 --mysql-host=my_host --mysql-port=my_port --mysql-user=test --mysql-password=test --mysql-db=test_db --max-time=360000 --max-requests=2000000000 run script update_index.lua.varchar: pathtest = string.match(test, "(.*/)") or "" dofile(pathtest .. "common.lua") function thread_init(thread_id) set_vars() end function event(thread_id) local table_name local strlen table_name = "sbtest1_varchar" strlen = sb_rand(1, 299) rs = db_query("UPDATE ".. table_name .." SET c= rand_string("..strlen..") WHERE id=1") end
[5 Sep 2017 1:52]
Xiaoyu Wang
After applying the patch, CPU usage decreased observably . We monitored it for about 1 minute, and here statistic is. case 1: It toke up 40%-100% before applying this patch, and 0-4% after. case 2: It toke up 100%-180% variably before, and 100% steadily. Additionally, I submitted application to become a contributor.
[11 Sep 2017 22:04]
MySQL Verification Team
Hi, looking at your example and your fix, you are trying to hack a solution for what thread_pool plugin solves gracefully. Please look at: https://dev.mysql.com/doc/refman/5.7/en/thread-pool.html all best Bogdan
[12 Sep 2017 2:20]
Xiaoyu Wang
Hi, Thanks for your reply. We also tested with Percona, which supports thread_pool, and CPU usage is still so high. And based on statistic of counters, we think idling is the key. We do not see the relation between thread_pool and this bug, we do not know why and how thread_pool solves, so, could you please explain in detail? The statistic of counters follows. Meanings of variables: start_time, end_time: in million second, used to determine an interval interval: in million second, increase counters during it loop_count: times of stepping into the while-loop which is located in buf0flu.cc : 3181 else_count: times of steeping into the else-branch which is located in buf0flu.cc : 3357 Before applying patch: Almost all statistic is like 2017-08-31T09:45:49.176001+08:00 0 [Note] InnoDB: start_time= 1504143949175 end_time= 1504143949176 interval= 1 loop_count= 7082 else_count= 7082 Only 1/20 or less statistic is like 2017-08-31T09:45:50.191131+08:00 0 [Note] InnoDB: start_time= 1504143949191 end_time= 1504143950191 interval= 1000 loop_count= 1 else_count= 0 After applying patch: All statistic is like 2017-08-30T20:00:21.053916+08:00 0 [Note] InnoDB: start_time= 1504094419053 end_time= 1504094421053 interval= 2000 loop_count= 58 else_count= 56 In addition, MySQL Enterprise solves with thread_pool, MySQL Community may solve with this patch. Yours Sincerely, Xiaoyu Wang
[12 Sep 2017 18:24]
MySQL Verification Team
Hi! Thread Pool Plugin does MUCH more then what you are suggesting here ...... Whether community version would get some improvements or not is a decision to be made on much higher level. Meanwhile, you can improve things significantly by tuning in the following variables: innodb_thread_concurrency innodb_flush_log_at_trx_commit innodb_flush_method innodb_max_dirty_pages_pct innodb_adaptive_flushing innodb_read_io_threads innodb_write_io_threads innodb_io_capacity innodb_io_capacity_max innodb_purge_threads innodb_spin_wait_delay Tuning in those variables, so that they are optimised for your hardware, OS, schemata and your SQL statements can bring you much more benefit then the patch that you have submitted.
[13 Sep 2017 4:02]
Xiaoyu Wang
Hi, Maybe tuning solves, but sidestepping this problem is not right, as the idling matters, after all. There may be relation between idling and variables mentioned before, could you please make it clear? If tuning solves, show us your configuration and comparassion of results, please. Applying this patch without tuning is effective, just as statistic shows, and if we can combine tuning and applying patch, it could perform better. About using this patch, I've signed oca and been waiting for reply. Thanks
[13 Sep 2017 13:24]
MySQL Verification Team
Hi, When you finalize the OCA process your code will go to trough the contribution process with the dev team. I agree with Sinisa that these can be solved by retuning the config but the dev team might find your contribution of use, after you finalize the OCA application. Thanks Bogdan
[13 Sep 2017 15:16]
MySQL Verification Team
Hi! Exchanging data about tuning, would not help at all. Tuning is separate for each installation and has to take into account : * Operating System and its version * Versions of system software * Tuning of OS * MySQL version and make * Plugins in use, like GR * Schemata * Replication configuration * SQL statements and applications in general That is why each installations requires different configuration and optimisation. I am working on the machine with 8 cores, using SSD. Its configuration would slow down your installation considerably.
[14 Sep 2017 4:01]
Xiaoyu Wang
Hi, Although the same configuration performs differently on different machine, it would help us understand the meaning of tuning. So, paste it, please. More importantly, it is idling which causes high CPU occupancy, as statistic shows. I agree that tuning solves most of problems in a general way, but if we find the essense of problem, why do not suit the remedy to the case? If tuning solves, tell us whether it influences idling, please.If not, tuning covers up instead of solving. About case2, ignored for long, a bitmap performs better than add operation, without doubt. CPU usage decreases from 130-180% to 100%, observably. Please take it into consideration.
[14 Sep 2017 15:05]
MySQL Verification Team
Here is one example for you ... This is an excerpt from my.cnf. I can not provide you with the entire configuration file as it contains some internal info, for example, sensitive directories and security protocols: read_buffer_size = 4M sort_buffer_size = 4M table_definition_cache = 3000 table_open_cache = 30000 tmp_table_size = 64M wait_timeout = 300 innodb_adaptive_hash_index = 0 innodb_purge_batch_size = 300 innodb_adaptive_flushing = 1 innodb_stats_on_metadata = 0 innodb_file_per_table innodb_flush_log_at_trx_commit = 2 innodb_support_xa = 1 innodb_io_capacity = 2000 innodb_lock_wait_timeout = 50 innodb_log_buffer_size = 32M innodb_buffer_pool_size = 161G innodb_buffer_pool_instances = 64 innodb_log_files_in_group = 2 innodb_log_file_size = 512M innodb_max_dirty_pages_pct = 80 innodb_open_files = 3000 innodb_thread_concurrency = 32 innodb_read_io_threads=16 innodb_write_io_threads=8 innodb_thread_sleep_delay=100000 innodb_use_native_aio=off innodb_purge_threads=2 innodb-spin-wait-delay=512 But, it is just a configuration on one specific installation. You should not use it even for guidance, as it may harm your performance and even make your server unusable or even crash it. In short, you have to find the best configuration for your system, not copy something that works well on a totally different installation.
[19 Sep 2017 9:44]
Xiaoyu Wang
Please keep it open, before we reach a consensus. Thanks.
[19 Sep 2017 10:39]
Xiaoyu Wang
Hi, I tried tuning some variables. Result is as follows. case 1: loop_count and else_count decreased by about 0.5%, and the number is still too large. Process still idled nearly all the time. case 2: CPU usage reached 200%, even. I think, tuning costs a lot of time, and may cause potential problems. Next time we encounter another problem, we tune again? How can we guarantee a new solution will not revive a solved problem? Experts may solve it perfectly, but MySQL is generally used, if we can solve within source code, we should not leave it to users. We know why CPU usage is so high, based on statistic. We know how to fix it.Take it into your consideration, please. In addition, it is the 4th time to re-open this bug. Before consensus is reached, do not close this bug, please. Thank you
[19 Sep 2017 11:01]
MySQL Verification Team
Hi, Not sure what consensus you are talking about? Bug is not "closed", I set the "not a bug" status, that's all. As for the config optimization, bugs system is not a place to discuss / train / test optimization methods. You can discuss optimization methods on MySQL forum ( https://forums.mysql.com/ ) or you can contact our MySQL Support team and discuss proper configuration optimization with them, or find any number of 3rd party consulting agencies to help you deal with it; in any way the bugs db is not the place to discuss "what's the best way to optimize your mysql config". As for the patch, I tried it in 5 different scenarios and it doesn't make "my properly configured, for my needs, system" better in any way (faster, less cpu usage, less memory usage etc etc.. none of those were achieved) so I agree with Sinisa's assessment. With regards to "Experts may solve it perfectly, but MySQL is generally used, if we can solve within source code, we should not leave it to users"; I do not agree. Config is there so you can tweak the system to your needs. If you don't know how there are DBA's for hire or consultants, support engineers etc. Making forceful changes into source code forcing stuff is wrong path that I doubt we will ever make. kind regards Bogdan
[20 Sep 2017 3:35]
Xiaoyu Wang
Hi, Sorry for my misunderstanding about 'closed' and 'not a bug'. The consensus I'm talking about is the reason why tuning works, and the relation between tuning and idling. Because idling causes high CPU and I do not find that tuning reduces it. About the patch. buf_get_n_pending_read_ios() is always called as a 'bool' condition in if-clause, so replacing the while-loop with bitmap is theoretically better and our test result shows that it practically makes difference. As for "Making forceful changes...", we find the cause and fix it, so we can not call it making forceful changes.Idling is the major cause, and this patch reduce idling ,as statistic shows. That's why we say this patch works. Maybe you can count how many times it idles before and after applying this patch. By the way, it is the 5th time to open this bug. Please do not set it "not-a-bug", because this performace problem is not solved by tuning, and relation between uning and idling is not clear. Thanks a lot
[20 Sep 2017 4:50]
Sunny Bains
From what I understand the patch helps in short circuiting the loop, skipping buffer pool instances that don't have any pending IO. This makes sense. Scanning a large number of instances where only a few have pending IO is a waste of CPU. Xiaoyu, is my understanding correct?
[20 Sep 2017 5:09]
Xiaoyu Wang
Thanks, that's exactly what I mean. And that is about case2. As for case1, we find idling causes high CPU usage. Best wishes
[20 Sep 2017 7:27]
Xiaoyu Wang
Hi, This bug is result from two aspects. First, when there is nothing to do(no activity, no pending read ios, no page to flush,as shown in buf0flu.cc:3213), it steps into an else-branch(buf0flu.cc:3375), does nothing and continues looping. We attach two counters to record how many times it steps into the while-loop(buf0flu.cc:3208) and else-branch(buf0flu.cc:3375), and the result shows that the process steps into the else-branch, almost all the time. So, we want it sleep for a little while, but we are not absolutely sure about how long it should sleep. If the process sleeps too long, it won't be as responsive as it is now. As we understand, work cycle is 1 second, so we make it sleep for 0.01s, so that 99% of the time is preserved for responsing. And the time to sleep is adjustable, as a variable 'idle_sleep_us' is attached. Second, as Sunny understands, with bitmap, we reduce the loop times from '<=64' to '<= 2'. Considering that buf_get_n_pending_read_ios is called frequently, bitmap makes difference. We have reached a consensus about 2nd aspect, please take 1st part into your consideration. Thanks a lot. Sincerely Xiaoyu
[20 Sep 2017 23:57]
MySQL Verification Team
I'm verifying this and after OCA is signed you can push this as a contribution so dev team can then decide how to integrate it. all best Bogdan
[26 Sep 2017 2:05]
Xiaoyu Wang
Hi Bogdan, My OCA gets approved, and I am honored to be able to contribute.Could you please let me know how to push the patch as a contribution? Thank you, sincerely Xiaoyu
[28 Sep 2017 4:14]
MySQL Verification Team
> Could you please let me know how to push the patch as a contribution? Hi, It's already done by us so it should show up on the dev contribution report soon. Thank you Bogdan
[17 Nov 2017 4:07]
Xiaoyu Wang
patch for solving high CPU usage (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: high_CPU_occupancy_rate.patch (application/octet-stream, text), 6.68 KiB.
[18 Dec 2017 8:00]
Allen Lai
Hi Xiaoyu, I'm trying to merge your patch to InnoDB code. And trying to verify this fix. But I have question, that is why you need to modify function buf_get_n_pending_read_ios? Seems your modification will cause the return value to be different. BTW, would you please give us your contact information? Just for efficient communication. Thanks, Allen.
[18 Dec 2017 8:46]
Xiaoyu Wang
Hi Allen, Thanks for your replying. We find that function buf_get_n_pending_read_ios is only called in if-clauses, and we care about whether the return value is 0 or not. So, whether the return value is 1, 2 or 3 does not matter. That's why we do not need to calculate the total number of n_pend_reads, as well as the reason why using a bitmap is acceptable. Please contact me through the email 934109129@qq.com