Bug #98869 | Temp ibt tablespace truncation at disconnection stuck InnoDB under large BP | ||
---|---|---|---|
Submitted: | 8 Mar 2020 15:46 | Modified: | 25 Nov 2020 18:25 |
Reporter: | Fungo Wang (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S5 (Performance) |
Version: | 8.0.19 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | ibt, intrinsic, LRU_list_mutex, truncate |
[8 Mar 2020 15:46]
Fungo Wang
[8 Mar 2020 15:47]
Fungo Wang
This bug can also be illustrated in another perspective. Under idle work load, the cpu usage will bump to 100% at disconnection. 11:14:19 PM UID PID %usr %system %guest %CPU CPU Command 11:14:20 PM 60343 60409 1.00 0.00 0.00 1.00 18 mysqld 11:14:21 PM 60343 60409 0.00 0.00 0.00 0.00 2 mysqld 11:14:22 PM 60343 60409 1.00 0.00 0.00 1.00 18 mysqld 11:14:23 PM 60343 60409 0.00 0.00 0.00 0.00 2 mysqld 11:14:24 PM 60343 60409 1.00 0.00 0.00 1.00 18 mysqld 11:14:25 PM 60343 60409 1.00 0.00 0.00 1.00 2 mysqld 11:14:26 PM 60343 60409 52.00 1.00 0.00 53.00 20 mysqld // disconection 11:14:27 PM 60343 60409 100.00 0.00 0.00 100.00 4 mysqld 11:14:28 PM 60343 60409 101.00 0.00 0.00 101.00 4 mysqld 11:14:29 PM 60343 60409 99.00 0.00 0.00 99.00 4 mysqld 11:14:30 PM 60343 60409 101.00 0.00 0.00 101.00 4 mysqld 11:14:31 PM 60343 60409 57.00 0.00 0.00 57.00 4 mysqld 11:14:32 PM 60343 60409 0.00 0.00 0.00 0.00 2 mysqld 11:14:33 PM 60343 60409 1.00 0.00 0.00 1.00 18 mysqld 11:14:34 PM 60343 60409 0.00 1.00 0.00 1.00 18 mysqld And the on-cpu perf result shows this stack is the hottest. - 56.55% mysqld mysqld [.] buf_LRU_flush_or_remove_pages buf_LRU_flush_or_remove_pages Fil_shard::space_truncate ibt::Tablespace::truncate ibt::Tablespace_pool::free_ts innobase_close_connection closecon_handlerton plugin_foreach_with_mask plugin_foreach_with_mask THD::release_resources handle_connection pfs_spawn_thread start_thread
[9 Mar 2020 14:02]
MySQL Verification Team
Hi Mr. Wang, Thank you for your bug report. I do have only one question and it is related to your second method of observing this behaviour. What exactly do we have to do to observe the behaviour, except for having an idle server and we disconnect a client from the server. Also, is that output from the program `top` or some other ??? Waiting on your feedback.
[9 Mar 2020 14:50]
Fungo Wang
Hi Sinisa, It's my negligence, I should have write it more clearly. You need execute either 1. select count(*) from information_schema.processlist; // Intrinsic Temporary Table or 2. create a temporary innodb table explicitly. // External Temporary Table After that disconnection will repro. The cpu usage is the result of `pidstat -u -p <mysql pid> 1`. The stack is from `perf record -ag -p <pid>` and `perf report`.
[9 Mar 2020 15:27]
MySQL Verification Team
Hi Mr. Wang, After following your last instructions, I managed to repeat the behaviour, Verified as reported.
[19 Mar 2020 6:11]
Jimmy Yang
Also see https://bugs.mysql.com/bug.php?id=98974
[19 Mar 2020 13:32]
MySQL Verification Team
Thank you, Jimmy.
[20 Mar 2020 6:40]
Satya Bodapati
If AHI helps for user temporary tables (CREATE TEMPORARY TABLE) has to be evaluated. And the truncation of ibt tablespaces should follow the new undo style truncation. New undo style truncation is drop + create and uses new space_ids. AHI in general is an issue with dropping indexes, tables. See https://jira.percona.com/browse/PS-5769
[20 Mar 2020 6:43]
Satya Bodapati
fil_replace_tablespace() instead of fil_truncate_tablespace()
[20 Mar 2020 8:02]
Fungo Wang
Hi Satya, > And the truncation of ibt tablespaces should follow the new undo style truncation. New undo style truncation is drop + create and uses new space_ids. We also talked about this solution internally with my colleagues, but this may not work very well if there are many short connections (say 64K, as mysql can not support more than 64K connections, 100K is a fake number, check bug #98624 and #98911), the space id for ibt tables will be exhausted quickly. Check this in WL#11613, > A higher range of 400K space_ids are reserved for session temporary tablespaces. 400K is chosen because max_connections is 100K and we will reserve 4 space_ids per session. With this work, only 2 will be needed. We reserve more for some future purpose. So the total number of space_ids needed is 100K * 4 = 400K. Only 400K space id is reserved for ibt tablespaces.
[20 Mar 2020 9:26]
Satya Bodapati
@Fungo Wang I see that MySQL is limiting at 65k connections at MDL levels. When I implemented the feature, I choose 100K because it was the theoretical connection limit. But how IBT space_ids ran out quickly at 65K connections or less? This is bit surprising. Coming to the space_ids. Undo space_id reservation logic is bit different. They are just managed with 7bit space_id values (2^7 space_ids). There is space_id bank and groups. The IBT space_ids are just in-memory and the limit can be increased if required. BTW, why not thread-pool or proxysql or other techniques to reduce number of total connections to mysql?
[20 Mar 2020 13:09]
MySQL Verification Team
Thank you, Satya, for your comments. And I agree with. you completely. Especially on the last comment.
[20 Mar 2020 14:06]
Fungo Wang
Hi Satya, > I see that MySQL is limiting at 65k connections at MDL levels. When I implemented the feature, I choose 100K because it was the theoretical connection limit. What's "the theoretical connection limit" mean, this 100K number is based on ? > But how IBT space_ids ran out quickly at 65K connections or less? This is bit surprising. If we choose to drop + create, and use new space ids on creation. Each connection will use 2 space ids at max, and after disconnection the old space ids can not bee used, cause the potentail AHI issues, we don't know when it's safe to reuse. > Coming to the space_ids. Undo space_id reservation logic is bit different. They are just managed with 7bit space_id values (2^7 space_ids). > There is space_id bank and groups. Thanks for the info, I have not noticed this space_id logic before:) While checking the undo truncate logic, looks like it's same with ibt truncation, buf_remove_t is BUF_REMOVE_ALL_NO_WRITE, and LRU will be scanned. This is not needed, smells like a new bug :) 4343bool fil_replace_tablespace(space_id_t old_space_id, space_id_t new_space_id, 4344 page_no_t size_in_pages) { 4345 fil_space_t *space = fil_space_get(old_space_id); 4346 std::string space_name(space->name); 4347 std::string file_name(space->files.front().name); 4348 4349 /* Delete the old file and space object. */ 4350 dberr_t err = fil_delete_tablespace(old_space_id, BUF_REMOVE_ALL_NO_WRITE); > The IBT space_ids are just in-memory and the limit can be increased if required. This may not be an ideal option, as said above, short connections could quickly exhaust the range. > BTW, why not thread-pool or proxysql or other techniques to reduce number of total connections to mysql? Yeah, 3rd party products could help. I guess thread-pool is aimed to limit the the number of threads, not connections. Proxysql may help to solve connections issue, I'm not familiar with it...
[21 Mar 2020 3:12]
Satya Bodapati
"While checking the undo truncate logic, looks like it's same with ibt truncation, buf_remove_t is BUF_REMOVE_ALL_NO_WRITE, and LRU will be scanned. This is not needed, smells like a new bug :)" Yes Indeed, REMOVE_ALL_NO_WRITE is not necessary 4 row0mysql.cc row_drop_tablespace 4281 err = fil_delete_tablespace(space_id, BUF_REMOVE_FLUSH_NO_WRITE); 1 fil0fil.cc fil_replace_tablespace 4624 dberr_t err = fil_delete_tablespace(old_space_id, BUF_REMOVE_ALL_NO_WRITE); clearly no AHI exists for UNDO, so FLUH_NO_WRITE is sufficient.
[21 Mar 2020 3:17]
Satya Bodapati
RENAME TABLE also has same problem (ALL_NO_WRITE used)
[22 Mar 2020 9:28]
Fungo Wang
Hi Satya, I have just filed bug #99021 for the new bug we discussed here about BUF_REMOVE_ALL_NO_WRITE, if you are interested and may want to follow. Thanks!
[23 Mar 2020 13:56]
MySQL Verification Team
Thank you both for your contribution. I will attend to both of the new bugs.
[25 Nov 2020 18:25]
Daniel Price
Posted by developer: Fixed in 8.0.23 by WL14100. Bug numbers added to the 8.0.23 changelog entry for WL14100.
[30 Nov 2020 14:42]
MySQL Verification Team
Thank you, Daniel.
[20 Jan 2021 10:11]
WANG GUANGYOU
I check the implementaion. https://dev.mysql.com/worklog/task/?id=14100 Does it worth to check every page when read?
[20 Jan 2021 10:12]
WANG GUANGYOU
I check the implementaion. https://dev.mysql.com/worklog/task/?id=14100 Does it worth to check every page when read?
[20 Jan 2021 13:28]
MySQL Verification Team
HI Mr. Wang, It depends on the user. In your case we would say that it would be worth it.
[17 Dec 2021 10:54]
MySQL Verification Team
Bug #105926 marked as duplicate of this one