Bug #98974 | InnoDB temp table could hurt InnoDB perf badly | ||
---|---|---|---|
Submitted: | 17 Mar 2020 14:30 | Modified: | 23 Apr 2020 17:43 |
Reporter: | Fungo Wang (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S5 (Performance) |
Version: | 8.0.19, 5.7.29 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | checkpoint, flush_list_mutex, page cleaner, temp |
[17 Mar 2020 14:30]
Fungo Wang
[18 Mar 2020 13:51]
MySQL Verification Team
Hi Mr. Wang, Thank you for your bug report. However, this is clearly not a bug. Because, what if there are none temporary tables ??? Then there is no additional time taken to hold that mutex. At best, this is a feature request. But, we would need a well defined feature request. Your idea of having pages of temporary tables held separately is not good enough. Simply, buffer pool pages are allocated based on the LRU algorithm. The only solution to the problem is the introduction of a new variable, which would define number of instances dedicated to temporary tables, with a minimum being one instance. If you have any better idea for this feature request, let us know.
[18 Mar 2020 14:26]
Fungo Wang
Hi Sinisa, > However, this is clearly not a bug. Because, what if there are none temporary tables ??? Then there is no additional time taken to hold that mutex. This bug is dedicated to InnoDB temp table. You can not stop customers from using temp table, as long as large temp table is used, the bug could be triggered. So I don't agree with you * this is clearly not a bug*, it is. :( > At best, this is a feature request. But, we would need a well defined feature request. Your idea of having pages of temporary tables held separately is not good enough. Simply, buffer pool pages are allocated based on the LRU algorithm. I have reported several bugs about InnoDB tmp table, and I also find some other bugs related to temptable recently (I'll report later). Filing feature request to improve temptable is in my plan, not yet in this bug report. > The only solution to the problem is the introduction of a new variable, which would define number of instances dedicated to temporary tables, with a minimum being one instance. The idea is to manage temptable pages separately, how to implement this deserve a detail design. I don't think this is *the only solution*, to create a new temp table buffer pool... We could, such as, maintain a dedicated LRU_tmp and FLUSH_list_tmp in the original buf pool instance.
[18 Mar 2020 14:57]
MySQL Verification Team
Hi, This is definitely a feature request. However, in order to verify it, we need this feature request to be well defined. We do not need design, but a precise definition of the new feature. We do not think that you have yet provided one. However, we would be happy if you would provide us with one. Thanks in advance.
[18 Mar 2020 15:13]
Fungo Wang
Hi Sinisa, I still think this report is perf bug about InnoDB temp table, and sincerely suggest you reconsider it. About the feature request to handle temp tablespace pages separately, I'll file a new one later. Thank you.
[18 Mar 2020 15:18]
MySQL Verification Team
Hi, I do not think that this report can be verified as a performance bug, simply because the behaviour conforms to the current design. Hence, we could consider only a feature request, for which you are filing a separate report.
[19 Mar 2020 6:03]
Jimmy Yang
Sinisa, this actually represents a set of problems we were experiencing online involving temp table. Another serious one is https://bugs.mysql.com/bug.php?id=98869. In both cases, the impact on buffer pool lru or flush_list from temp table pages has caused serious and hard to understand jitter problem on the system. We are asserting these are bugs because they have cause real pain to MySQL customers as well as DBAs to figure out the cause of these performance jitter. And in the 98869 case, we have to fall back to MyISAM temp table (we were still on 8.0.13), otherwise the system fluctuate so much, users won’t be able to tolerate . So these are real usable problems, not feature. And possible solution is to separate lru or flush list handling for temp tablespace pages, so it won’t affect user tables, or at least check the loop while holding flush list mutex in uf_pool_get_oldest_modification_approx()
[19 Mar 2020 13:30]
MySQL Verification Team
Hi Jimmy, A bug on the performance of the new temporary tables already exists and is verified some time ago. And it was reported by the company in which you are employed. Regarding this bug, reporter has promised a properly defined feature request. When that one is verified, it will cover all necessary remaining issues.
[19 Mar 2020 14:26]
Fungo Wang
Hi, The bug #98869, if it's the bug you mentioned already exists, affects only 8.0, it's a regression perf bug for 8.0. > "I believe this regression was introuced by WL#11613, which aimed to reclaim disk space used by temporary table." As I already pointed out in that bug report. But this one also affects 5.7 if you check the func `buf_pool_get_oldest_modification()` in 5.7 ``` 410lsn_t 411buf_pool_get_oldest_modification(void) 412/*==================================*/ 413{ 414 lsn_t lsn = 0; 415 lsn_t oldest_lsn = 0; 416 417 /* When we traverse all the flush lists we don't want another 418 thread to add a dirty page to any flush list. */ 419 log_flush_order_mutex_enter(); 420 421 for (ulint i = 0; i < srv_buf_pool_instances; i++) { 422 buf_pool_t* buf_pool; 423 424 buf_pool = buf_pool_from_array(i); 425 426 buf_flush_list_mutex_enter(buf_pool); 427 428 buf_page_t* bpage; 429 430 /* We don't let log-checkpoint halt because pages from system 431 temporary are not yet flushed to the disk. Anyway, object 432 residing in system temporary doesn't generate REDO logging. */ 433 for (bpage = UT_LIST_GET_LAST(buf_pool->flush_list); 434 bpage != NULL 435 && fsp_is_system_temporary(bpage->id.space()); 436 bpage = UT_LIST_GET_PREV(list, bpage)) { 437 /* Do nothing. */ 438 } ``` Also they are different bugs from the customer perspective, as they triggered by different usage/scenario. From the developer perspective, they are also different code paths. This bug shows another view of the defect of InnoDB temp table. Which make the improvement/enhancement more urgent... So I don't know why this bug did not get verified as a perf bug.
[19 Mar 2020 14:38]
MySQL Verification Team
HI Mr. Wang, The measurements that you have provided are for 8.0, not 5.7. If you would come with a benchmark for 5.7 with a heavy performance impact, which we could repeat, then this could be a performance bug that can be verified.
[20 Mar 2020 7:16]
Fungo Wang
Hi, 1. I think the code logic of buf_pool_get_oldest_modification() in 5.7, which I have posted, shows the issue apparently, the loop logic about temp table space (which is the culprit) is same with 8.0, you can do analysis. 2. The repro steps I provided also applies to 5.7, you can try to repro this in 5.7, and 8.0. 3. It is a perf bug in 8.0 anyway, no matter whether it applies to 5.7 or not. And I still don't know why it did not get verified at the first place :(
[20 Mar 2020 7:33]
Fungo Wang
I have just ran the repro steps, in case you won't bother to do it :) Using the same repro steps, and this is the result for 5.7.29. 1. innodb_buffer_pool_size = 256G 2. Create and fill about 50G temp table with configuration | innodb_io_capacity | 100 | | innodb_io_capacity_max | 100 | 3. Then run a sysbench oltp_rw workload [ 50s ] thds: 126 tps: 13217.53 qps: 263457.72 (r/w/o: 184624.48/52399.18/26434.06) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00 [ 51s ] thds: 126 tps: 11551.90 qps: 231128.97 (r/w/o: 161688.58/46338.59/23101.80) lat (ms,95%): 16.41 err/s: 0.00 reconn/s: 0.00 [ 52s ] thds: 126 tps: 8823.84 qps: 176627.81 (r/w/o: 123727.76/35250.36/17649.68) lat (ms,95%): 17.01 err/s: 0.00 reconn/s: 0.00 [ 53s ] thds: 126 tps: 12007.16 qps: 240238.28 (r/w/o: 168118.30/48106.66/24013.33) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00 [ 54s ] thds: 126 tps: 11842.77 qps: 237035.38 (r/w/o: 165890.77/47457.08/23687.54) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00 [ 55s ] thds: 126 tps: 10420.60 qps: 208067.03 (r/w/o: 145663.42/41565.41/20838.20) lat (ms,95%): 17.63 err/s: 0.00 reconn/s: 0.00 [ 56s ] thds: 126 tps: 13342.07 qps: 266797.34 (r/w/o: 186852.94/53257.27/26687.13) lat (ms,95%): 15.55 err/s: 0.00 reconn/s: 0.00 [ 57s ] thds: 126 tps: 13366.68 qps: 267270.60 (r/w/o: 187004.52/53533.72/26732.36) lat (ms,95%): 15.55 err/s: 0.00 reconn/s: 0.00 [ 58s ] thds: 126 tps: 10809.26 qps: 215986.19 (r/w/o: 151211.63/43159.04/21615.52) lat (ms,95%): 16.41 err/s: 0.00 reconn/s: 0.00 [ 59s ] thds: 126 tps: 8389.98 qps: 168364.67 (r/w/o: 117820.77/33760.93/16782.97) lat (ms,95%): 17.32 err/s: 0.00 reconn/s: 0.00 [ 60s ] thds: 126 tps: 10673.95 qps: 213192.02 (r/w/o: 149322.31/42521.80/21347.90) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00 [ 61s ] thds: 126 tps: 11027.06 qps: 220392.16 (r/w/o: 154221.81/44116.23/22054.12) lat (ms,95%): 17.01 err/s: 0.00 reconn/s: 0.00 You can see the fluctuation of QPS numbers. The reason it's much better than 8.0 is buf_pool_get_oldest_modification() is called at a much low frequency than 8.0 (which means another regression point in 8.0). the perf result during run above sysbench: - 0.66% mysqld mysqld [.] buf_pool_get_oldest_modification - buf_pool_get_oldest_modification - 88.86% buf_flush_page_cleaner_coordinator start_thread - 11.14% log_checkpoint srv_master_thread start_thread
[20 Mar 2020 7:45]
Satya Bodapati
This is definitely known issue. Whatever you call it 'bug' or 'feature request'. "not a bug" is not acceptable. Dedicated temporary buffer pool instance -> not good idea. All temp tables will use same instance and there will be contention Dedicated temporary buffer pools -> If user has large spare memory, good idea. Easy to implement. Dedicated temporary buffer pools with auto-resize. Start with lower size and resize based on needs. Downsizing algorithm could be little complex.. May be Separate lists for temp pages in each buffer pool -> Best interms of memory usage but complex solution.
[20 Mar 2020 7:47]
Fungo Wang
To simulate the frequent calling of buf_pool_get_oldest_modification() behaviour, which is triggered by log_checkpoint background thread in 8.0. I create a sysbench lua script as below, to call "SHOW ENGINE INNODB STATUS" repeatedly, which will call buf_pool_get_oldest_modification(). ``` cat show_engine.lua #!/usr/bin/env sysbench require("oltp_common") function prepare_statements() end function event() con:query("SHOW ENGINE INNODB STATUS") end ``` As the same time run oltp_rw workload, also run this show_engine script with 1 thread. [ 25s ] thds: 126 tps: 167.00 qps: 2908.02 (r/w/o: 2330.02/244.00/334.00) lat (ms,95%): 2680.11 err/s: 0.00 reconn/s: 0.00 [ 26s ] thds: 126 tps: 24.00 qps: 704.00 (r/w/o: 344.00/312.00/48.00) lat (ms,95%): 1618.78 err/s: 0.00 reconn/s: 0.00 [ 27s ] thds: 126 tps: 37.00 qps: 818.00 (r/w/o: 518.00/226.00/74.00) lat (ms,95%): 2449.36 err/s: 0.00 reconn/s: 0.00 [ 28s ] thds: 126 tps: 65.00 qps: 1125.00 (r/w/o: 862.00/133.00/130.00) lat (ms,95%): 3639.94 err/s: 0.00 reconn/s: 0.00 [ 29s ] thds: 126 tps: 86.00 qps: 1780.01 (r/w/o: 1243.01/365.00/172.00) lat (ms,95%): 4128.91 err/s: 0.00 reconn/s: 0.00 [ 30s ] thds: 126 tps: 10.00 qps: 397.00 (r/w/o: 149.00/228.00/20.00) lat (ms,95%): 2045.74 err/s: 0.00 reconn/s: 0.00 [ 31s ] thds: 126 tps: 149.00 qps: 2554.01 (r/w/o: 2049.01/207.00/298.00) lat (ms,95%): 2632.28 err/s: 0.00 reconn/s: 0.00 [ 32s ] thds: 126 tps: 5.00 qps: 399.00 (r/w/o: 106.00/283.00/10.00) lat (ms,95%): 787.74 err/s: 0.00 reconn/s: 0.00 [ 33s ] thds: 126 tps: 0.00 qps: 150.00 (r/w/o: 1.00/149.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 34s ] thds: 126 tps: 145.00 qps: 2187.00 (r/w/o: 1793.00/104.00/290.00) lat (ms,95%): 3448.53 err/s: 0.00 reconn/s: 0.00 [ 35s ] thds: 126 tps: 28.00 qps: 1114.99 (r/w/o: 628.99/430.00/56.00) lat (ms,95%): 204.11 err/s: 0.00 reconn/s: 0.00 [ 36s ] thds: 126 tps: 121.00 qps: 2493.01 (r/w/o: 1685.00/566.00/242.00) lat (ms,95%): 1235.62 err/s: 0.00 reconn/s: 0.00 Now the numbers are pretty close to 8.0. The perf result: - 71.61% mysqld mysqld [.] buf_pool_get_oldest_modification - buf_pool_get_oldest_modification - 80.37% log_print srv_printf_innodb_monitor innobase_show_status ha_show_status mysql_execute_command mysql_parse dispatch_command do_command handle_connection pfs_spawn_thread start_thread - 17.74% buf_flush_page_cleaner_coordinator start_thread - 1.88% log_checkpoint srv_master_thread start_thread
[20 Mar 2020 7:48]
Satya Bodapati
Another idea if user don't care so much about rollback, May be expose Temptable engine to users.(It is used for internal optimizer tables already) CREATE TEMPORARY TABLE t1(a INT) ENGINE=TEMPTABLE?
[20 Mar 2020 8:34]
Sunny Bains
This is a known issue. It will be fixed in due time.
[20 Mar 2020 13:06]
MySQL Verification Team
Hi Mr. Wang, I have run your LUA script on latest 5.7 and got similar QPS variation. Verified as a perforance issue.
[22 Mar 2020 9:24]
Fungo Wang
I filed bug #99020 for the feature request.
[23 Apr 2020 17:43]
Daniel Price
Posted by developer: Fixed as of the upcoming 8.0.22 release, and here's the proposed changelog entry from the documentation team: Numerous system temporary table pages at the tail of the buffer pool flush list caused a performance degradation. The flush_list_mutex was held while the flush list scan traversed over system temporary table pages. The flush list scan now excludes system temporary table pages.
[24 Apr 2020 12:32]
MySQL Verification Team
Thank you, Daniel.
[24 Apr 2020 12:49]
Christos Chatzaras
So this issue doesn't affect 5.7?
[9 Apr 2021 15:44]
Daniel Price
Posted by developer: Fixed as of the upcoming 5.7.35, 8.0.22 releases, and here's the proposed changelog entry from the documentation team: Numerous system temporary table pages at the tail of the buffer pool flush list caused a performance degradation. The flush_list_mutex was held while the flush list scan traversed over system temporary table pages. The flush list scan now excludes system temporary table pages.
[12 Apr 2021 12:43]
MySQL Verification Team
Thank you, Daniel ......