Bug #110652 | InnoDB:trx hangs due to wrong trx->in_innodb value | ||
---|---|---|---|
Submitted: | 11 Apr 2023 7:58 | Modified: | 23 Feb 18:18 |
Reporter: | Shaohua Wang (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S2 (Serious) |
Version: | 8.0.28 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | Contribution |
[11 Apr 2023 7:58]
Shaohua Wang
[11 Apr 2023 8:02]
Shaohua Wang
Contributed patch (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: trx_hang.patch (application/octet-stream, text), 722 bytes.
[11 Apr 2023 8:32]
MySQL Verification Team
Hello Shaohua, Thank you for the report and contribution. regards, Umesh
[12 Apr 2023 8:08]
Jakub Lopuszanski
Hello Shaohua Wang, > In TrxFactory, we doesn't initialize in_innodb & in_depth to 0. That's true, but have you seen the Pool's constructor? ``` Pool(size_t size) : m_end(), m_start(), m_size(size), m_last() { ut_a(size >= sizeof(Element)); m_lock_strategy.create(); ut_a(m_start == nullptr); m_start = reinterpret_cast<Element *>( ut::zalloc_withkey(UT_NEW_THIS_FILE_PSI_KEY, m_size)); m_last = m_start; m_end = &m_start[m_size / sizeof(*m_start)]; /* Note: Initialise only a small subset, even though we have allocated all the memory. This is required only because PFS (MTR) results change if we instantiate too many mutexes up front. */ init(std::min(size_t(16), size_t(m_end - m_start))); ut_ad(m_pqueue.size() <= size_t(m_last - m_start)); } ``` it seems to use zalloc, so at least initially all the objects should be zeroed. So, if they later become non-zero, then there must be some bug somewhere else, I think. I think adding the assertions to `trx_free` as you propose may help find such places. Perhaps we should add it to `trx_validate_state_before_free`. Also, I see `trx_free_prepared_or_active_recovered` doesn't call `trx_validate_state_before_free` before calling `trx_free` - perhaps it should? Also there's [Bug#29871880 TRX_T::IN_INNODB SHOULD BE PROTECTED FROM CONCURRENT ACCESS](https://mybug.mysql.oraclecorp.com/orabugs/site/bug.php?id=29871880) which is perhaps related to all this?
[12 Apr 2023 9:17]
Shaohua Wang
Just found one bug by the added asserts. static xa_status_code innobase_rollback_by_xid( handlerton *hton, /*!< in: InnoDB handlerton */ XID *xid) /*!< in: X/Open XA transaction identification */ { assert(hton == innodb_hton_ptr); trx_t *trx = trx_get_trx_by_xid(xid); if (trx != nullptr) { TrxInInnoDB trx_in_innodb(trx); int ret = innobase_rollback_trx(trx); trx_deregister_from_2pc(trx); ut_ad(!trx->will_lock); # trx is returned to trx pools before calling TrxInInnoDB::exit(), # it causes concurrent access to trx->in_depth which is not allowed. trx_free_for_background(trx); return (ret != 0 ? XAER_RMERR : XA_OK); } else { return (XAER_NOTA); } }
[12 Apr 2023 9:19]
Shaohua Wang
Contributed patch (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: trx_hang_v2.patch (application/octet-stream, text), 1.81 KiB.
[23 Oct 2023 7:51]
Shaohua Wang
Contributed patch (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: trx_hang_v3.patch (application/octet-stream, text), 3.75 KiB.
[23 Oct 2023 11:59]
Jakub Lopuszanski
Hello Shaohua Wang, Thank you very much for analysis and a patch! I am puzzled by the cases like this one: ``` $3 = 1254150 $4 = 4294967295 $5 = 12869799 $6 = 4294967295 ``` where the `in_innodb` doesn't seem like anything that could result (in simple manner) from ++/--/&/| operations which we can find in code for `in_innodb` even assuming data races (unless we consider some very creative optimizations by the compiler such as trying to spill a register to in_innodb field under assumption that no other thread should be using it anyway). Therefore I have a question to you: how exactly were these values obtained? (The `$` seem to suggest you've used `gdb`, and the comment in one of the patches contains a sentence "There are many invalid values in the same core file." which further supports the hypothesis that you've used `gdb` on an already crashed system, but how exactly did you find the `trx_t` objects to iterate over?)
[3 Nov 2023 6:57]
Shaohua Wang
Hi Jacub, I defined a gdb function to print these infos. call it in a frame where there is a valid trx object. print_trx_in_innodb trx https://sourceware.org/gdb/onlinedocs/gdb/Define.html define print_trx_in_innodb set $trx=$arg0 while ($trx) if ($trx->in_innodb > 1) print $trx->in_innodb print $trx->in_depth end set $trx = $trx->mysql_trx_list->next end set $trx=$arg0 while ($trx) if ($trx->in_innodb > 1) print $trx->in_innodb print $trx->in_depth end set $trx = $trx->mysql_trx_list->prev end end
[6 Nov 2023 15:32]
Jakub Lopuszanski
Thank you for sharing the script! Also I've read your email, thanks, let me summarise it here, so that all relevant info is in this bug tracker. One of your earlier comments contained: $3 = 1254150 $4 = 4294967295 $5 = 12869799 $6 = 4294967295 where I was puzzled not so much by 4294967295 (which I interpreted to be -1 mod 2^32, which is easy to explain as a result of a single race condition), but rather by the 12869799, which seemed to require the race condition to occur at least 12 million times. Based on your comment "print trx->in_innodb/in_depth, too many invalid values!!!" and the script you've shared, I assume the first number in each pair is `in_innodb`. In your email you've explained how such a large value of `in_innodb` can occur: All it takes is that a race condition happens just once, so that in_depth becomes -1. Because from this point onwards each enter() will do: ++trx->in_depth; // becomes 0 if (trx->in_depth > 1) { // 0 is not larger than 1 return; } ++trx->in_innodb; // gets larger and larger and exit() will do just: --trx->in_depth; // becomes -1 again if (trx->in_depth > 0) { // -1 == 4294967295 is larger than 0 return; } Thus, the in_innodb will be incremented with each enter()+exit() pair, even if no more data races occur. Thank you for this analysis!
[7 Nov 2023 10:37]
Shaohua Wang
Hi Jakub, agree your analysis for different in_innodb values. That's great! One more debugging information, for the hanging trx: trx->version = 665826
[9 Nov 2023 9:10]
MySQL Verification Team
see also: https://bugs.mysql.com/bug.php?id=91078
[23 Feb 18:18]
Philip Olson
Posted by developer: Thank you, 8.0.37 release note added.
[23 Feb 18:19]
Philip Olson
Posted by developer: Fixed as of the upcoming MySQL Server 8.0.37 and 8.4.0 releases, and here's the proposed changelog entry from the documentation team: A trx would unexpectedly halt after encountering an incorrect trx->in_innodb value. Our thanks to Shaohua Wang for the contribution. Thank you for the bug report.