Bug #110652 InnoDB:trx hangs due to wrong trx->in_innodb value
Submitted: 11 Apr 2023 7:58 Modified: 23 Feb 18:18
Reporter: Shaohua Wang (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:8.0.28 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution

[11 Apr 2023 7:58] Shaohua Wang
Description:
Fail to kill a session:
| 1428539 | ***            | *.*.*.*:3306 | ***           | Killed           | 1863872 | updating                                                      | DELETE FROM sc

Stack trace:
#0  0x00007fe181a42e8d in nanosleep () from /lib64/libpthread.so.0
#1  0x00000000023651f5 in sleep_for<long, std::ratio<1, 1000000> > (__rtime=...)
    at /opt/rh/devtoolset-7/root/usr/include/c++/7/thread:373
#2  wait (trx=<optimized out>)
    at storage/innobase/include/trx0trx.h:1517
#3  TrxInInnoDB::enter (trx=0x7fe17a00ce18, disable=<optimized out>)
    at storage/innobase/include/trx0trx.h:1447
#4  0x000000000236549b in TrxInInnoDB (disable=false, trx=<optimized out>,
    this=0x7fd4250af6f8)
    at storage/innobase/include/trx0trx.h:1358
#5  ha_innobase::unlock_row (this=0x7fd3f5430828)
    at storage/innobase/handler/ha_innodb.cc:9751
#6  0x000000000102651f in Sql_cmd_delete::delete_from_single_table (
    this=this@entry=0x7fd3f9cc31f8, thd=thd@entry=0x7fd3d5b95800)
    at sql/sql_delete.cc:536
#7  0x0000000001026a9b in Sql_cmd_delete::execute_inner (this=0x7fd3f9cc31f8,
    thd=0x7fd3d5b95800)
    at sql/sql_delete.cc:881
#8  0x00000000010cb610 in Sql_cmd_dml::execute (this=0x7fd3f9cc31f8, thd=0x7fd3d5b95800)
    at sql/sql_select.cc:725
#9  0x00000000010702f5 in mysql_execute_command (thd=thd@entry=0x7fd3d5b95800,
    first_level=first_level@entry=true)
    at sql/sql_parse.cc:4471
#10 0x0000000001073f5f in mysql_parse (thd=thd@entry=0x7fd3d5b95800,
    parser_state=parser_state@entry=0x7fd4250b17e0)
    at sql/sql_parse.cc:5122
#11 0x000000000107593b in dispatch_command (thd=thd@entry=0x7fd3d5b95800,
    com_data=com_data@entry=0x7fe17cc9b0b0, command=command@entry=COM_QUERY)
    at sql/sql_parse.cc:1844

(gdb) p trx->in_innodb
$2 = 4294967295
(gdb) p trx->lock.was_chosen_as_deadlock_victim
$1 = false

print trx->in_innodb/in_depth, too many invalid values!!!
$1 = 4294967295 $2 = 1
$3 = 1254150 $4 = 4294967295
$5 = 12869799 $6 = 4294967295
$7 = 4294967295 $8 = 1  --duplicated
$9 = 3635408 $10 = 4294967295
$11 = 6968179 $12 = 4294967295
$13 = 3761588 $14 = 0
$15 = 226010 $16 = 4294967295
$17 = 1257935 $18 = 4294967295
$19 = 3409511 $20 = 4294967295
$21 = 7230873 $22 = 0
$23 = 2222821 $24 = 4294967295
$25 = 6734274 $26 = 0
$27 = 2435286 $28 = 4294967295
$29 = 58104809 $30 = 4294967293
$31 = 16879690 $32 = 4294967294
$33 = 16715155 $34 = 4294967294
$35 = 207950 $36 = 0
$37 = 1066851 $38 = 4294967295
$39 = 758 $40 = 0
$41 = 51600504 $42 = 4294967294
$43 = 40931056 $44 = 4294967294
$45 = 5591301 $46 = 4294967294

How to repeat:
not yet

Suggested fix:
Analysis:
In TrxFactory, we doesn't initialize in_innodb & in_depth to 0. In release build, they can be any random value. If in_innodb has a invalid value with TRX_FORCE_ROLLBACK flag, the trx will hang(TrxInInnoDB::wait() for ever).

another possible cause is that TrxInInnoDB::enter() and TrxInInnoDB::exit() has not called in pair in some edge cases. Maybe call TrxInInnoDB::exit() without calling TrxInInnoDB::enter(). No clues are found when checking the code carefully.

Solution:
1. initialize initialize in_innodb & in_depth to 0 when put trx into trx pool.
2. assert in_inodb & in_depth when return trx into trx pool.
[11 Apr 2023 8:02] Shaohua Wang
Contributed patch

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: trx_hang.patch (application/octet-stream, text), 722 bytes.

[11 Apr 2023 8:32] MySQL Verification Team
Hello Shaohua,

Thank you for the report and contribution.

regards,
Umesh
[12 Apr 2023 8:08] Jakub Lopuszanski
Hello Shaohua Wang,
> In TrxFactory, we doesn't initialize in_innodb & in_depth to 0.

That's true, but have you seen the Pool's constructor?
```
  Pool(size_t size) : m_end(), m_start(), m_size(size), m_last() {
    ut_a(size >= sizeof(Element));

    m_lock_strategy.create();

    ut_a(m_start == nullptr);

    m_start = reinterpret_cast<Element *>(
        ut::zalloc_withkey(UT_NEW_THIS_FILE_PSI_KEY, m_size));

    m_last = m_start;

    m_end = &m_start[m_size / sizeof(*m_start)];

    /* Note: Initialise only a small subset, even though we have
    allocated all the memory. This is required only because PFS
    (MTR) results change if we instantiate too many mutexes up
    front. */

    init(std::min(size_t(16), size_t(m_end - m_start)));

    ut_ad(m_pqueue.size() <= size_t(m_last - m_start));
  }
```
it seems to use zalloc, so at least initially all the objects should be zeroed.

So, if they later become non-zero, then there must be some bug somewhere else, I think.
I think adding the assertions to `trx_free` as you propose may help find such places.
Perhaps we should add it to `trx_validate_state_before_free`.
Also, I see `trx_free_prepared_or_active_recovered` doesn't call `trx_validate_state_before_free` before calling `trx_free` - perhaps it should?
Also there's 
[Bug#29871880 TRX_T::IN_INNODB SHOULD BE PROTECTED FROM CONCURRENT ACCESS](https://mybug.mysql.oraclecorp.com/orabugs/site/bug.php?id=29871880)
which is perhaps related to all this?
[12 Apr 2023 9:17] Shaohua Wang
Just found one bug by the added asserts.

static xa_status_code innobase_rollback_by_xid(
    handlerton *hton, /*!< in: InnoDB handlerton */
    XID *xid)         /*!< in: X/Open XA transaction
                      identification */
{
  assert(hton == innodb_hton_ptr);

  trx_t *trx = trx_get_trx_by_xid(xid);

  if (trx != nullptr) {
    TrxInInnoDB trx_in_innodb(trx);

    int ret = innobase_rollback_trx(trx);

    trx_deregister_from_2pc(trx);
    ut_ad(!trx->will_lock);
    # trx is returned to trx pools before calling TrxInInnoDB::exit(),
    # it causes concurrent access to trx->in_depth which is not allowed.
    trx_free_for_background(trx);   

    return (ret != 0 ? XAER_RMERR : XA_OK);
  } else {
    return (XAER_NOTA);
  }
}
[12 Apr 2023 9:19] Shaohua Wang
Contributed patch

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: trx_hang_v2.patch (application/octet-stream, text), 1.81 KiB.

[23 Oct 2023 7:51] Shaohua Wang
Contributed patch

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: trx_hang_v3.patch (application/octet-stream, text), 3.75 KiB.

[23 Oct 2023 11:59] Jakub Lopuszanski
Hello Shaohua Wang,
Thank you very much for analysis and a patch!

I am puzzled by the cases like this one:
```
$3 = 1254150 $4 = 4294967295
$5 = 12869799 $6 = 4294967295
```
where the `in_innodb` doesn't seem like anything that could result (in simple manner) from ++/--/&/| operations which we can find in code for `in_innodb` even assuming data races (unless we consider some very creative optimizations by the compiler such as trying to spill a register to in_innodb field under assumption that no other thread should be using it anyway).

Therefore I have a question to you: how exactly were these values obtained?
(The `$` seem to suggest you've used `gdb`, and the comment in one of the patches contains a sentence "There are many invalid values in the same core file." which further supports the hypothesis that you've used `gdb` on an already crashed system, but how exactly did you find the `trx_t` objects to iterate over?)
[3 Nov 2023 6:57] Shaohua Wang
Hi Jacub, I defined a gdb function to print these infos.

call it in a frame where there is a valid trx object.
print_trx_in_innodb trx
https://sourceware.org/gdb/onlinedocs/gdb/Define.html

define print_trx_in_innodb
  set $trx=$arg0
  while ($trx)
    if ($trx->in_innodb > 1)
      print $trx->in_innodb
      print $trx->in_depth
    end
    set $trx = $trx->mysql_trx_list->next
  end

  set $trx=$arg0
  while ($trx)
    if ($trx->in_innodb > 1)
      print $trx->in_innodb
      print $trx->in_depth
    end
    set $trx = $trx->mysql_trx_list->prev
  end
end
[6 Nov 2023 15:32] Jakub Lopuszanski
Thank you for sharing the script!
Also I've read your email, thanks, let me summarise it here, so that all relevant info is in this bug tracker.

One of your earlier comments contained:

$3 = 1254150 $4 = 4294967295
$5 = 12869799 $6 = 4294967295

where I was puzzled not so much by 4294967295 (which I interpreted to be -1 mod 2^32, which is easy to explain as a result of a single race condition), but rather by the 12869799, 
which seemed to require the race condition to occur at least 12 million times.
Based on your comment "print trx->in_innodb/in_depth, too many invalid values!!!" and the script you've shared, I assume the first number in each pair is `in_innodb`.

In your email you've explained how such a large value of `in_innodb` can occur:
All it takes is that a race condition happens just once, so that in_depth becomes -1.
Because from this point onwards each enter() will do:

    ++trx->in_depth; // becomes 0

    if (trx->in_depth > 1) { // 0 is not larger than 1
      return;
    }

    ++trx->in_innodb; // gets larger and larger

and exit() will do just:

    --trx->in_depth; // becomes -1 again

    if (trx->in_depth > 0) { // -1 == 4294967295 is larger than 0
      return;
    }
Thus, the in_innodb will be incremented with each enter()+exit() pair, even if no more data races occur.

Thank you for this analysis!
[7 Nov 2023 10:37] Shaohua Wang
Hi Jakub, agree your analysis for different in_innodb values. That's great!

One more debugging information, for the hanging trx:
trx->version = 665826
[9 Nov 2023 9:10] MySQL Verification Team
see also:
 https://bugs.mysql.com/bug.php?id=91078
[23 Feb 18:18] Philip Olson
Posted by developer:
 
Thank you, 8.0.37 release note added.
[23 Feb 18:19] Philip Olson
Posted by developer:
 
Fixed as of the upcoming MySQL Server 8.0.37 and 8.4.0 releases, and here's the proposed changelog entry from the documentation team:

A trx would unexpectedly halt after encountering an incorrect
trx->in_innodb value.

Our thanks to Shaohua Wang for the contribution.

Thank you for the bug report.