Bug #110797 log0files_capacity.cc * OS_FILE_LOG_BLOCK_SIZE <= soft_logical_capacity
Submitted: 25 Apr 2023 7:04 Modified: 20 Dec 2023 17:48
Reporter: Olivier Doucet Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:8.0 OS:Linux
Assigned to: CPU Architecture:x86

[25 Apr 2023 7:04] Olivier Doucet
Description:
Starting a few days ago, a MySQL server started rebooting several times (5 to 50 times in just a few hours) with the following error : 

```
[ERROR] [MY-013183] [InnoDB] Assertion failure: log0files_capacity.cc:478:2 * OS_FILE_LOG_BLOCK_SIZE <= soft_logical_capacity thread 139797725161216
/usr/sbin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x33a) [0x24a92ba]
/usr/sbin/mysqld() [0x233c2b8]
/usr/sbin/mysqld(Log_files_capacity::update(Log_files_dict const&, unsigned long, unsigned long)+0x4c) [0x233d76c]
/usr/sbin/mysqld() [0x2343a9d]
/usr/sbin/mysqld() [0x2344a40]
/usr/sbin/mysqld(log_files_governor(log_t*)+0xb8) [0x2344fd8]
/usr/sbin/mysqld(std::thread::_State_impl<std::thread::_Invoker<std::tuple<Detached_thread, void (*)(log_t*), log_t*> > >::_M_run()+0xce) [0x234e6ce]
```
Link to source code involved: https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/log/log0files_capacity.cc#...

I've never been out of disk space (100 GiB remaining on this volume), and don't know why this error popped only a few days ago (this server has been up for months).

The crash always happens at night, I guess when specific workload is performed (I do not have control on this workload). Then, MySQL is rebooting on a loop several times  then it stabilizes. Downtime is small (~ 3 seconds) has there is no log replay needed, but it is still a nuisance. 

The bug happened first on version 8.0.30, that's why I made an upgrade to latest version yesterday (8.0.32) but crashes still occur with the exact same error.
The same bug happens also on Percona Server 8.0.30 and 8.0.32, that's why I think this bug is upstream. 

Relevant config directives : 
```
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 8M
innodb_log_file_size = 512M
innodb_log_files_in_group = 2
max_binlog_cache_size = 32M
skip_log_bin=1
back_log = 8192
log_error_verbosity=3
max_binlog_size = 1G 
# innodb_redo_log_capacity 104857600 (default)
# innodb_redo_log_encrypt off (default)
```

MySQL is launched in a docker container with :
```
--net=host --ulimit nofile=32768:32768 --cap-add=sys_nice 
```

If I follow source code : 
```
soft_logical_capacity = soft_logical_capacity_for_hard(hard_logical_capacity);

# https://github.com/mysql/mysql-server/blob/e57893fef5b2739c59b8f354d199251514571f61/storag...
/** Computes soft logical capacity, that corresponds to the provided
  hard logical capacity of the redo log (@see hard_logical_capacity()).
  @param[in]  hard_logical_capacity  logical capacity for the log writer
  @return soft logical capacity */
  static lsn_t soft_logical_capacity_for_hard(lsn_t hard_logical_capacity);

# https://github.com/mysql/mysql-server/blob/e57893fef5b2739c59b8f354d199251514571f61/storag...
/** Hard limit for logical capacity of the redo log. This limitation includes
  "extra_writer_margin" that belongs to the log writer thread. The log writer
  does not to exceed this limit. If space isn't reclaimed after 1 sec wait,
  it writes only as much as possible or crashes the InnoDB.
  @note This value changes only during calls to @see update or @see initialize.
  @return limitation for free space in the redo log for the log writer thread */
  lsn_t hard_logical_capacity() const;

# https://github.com/mysql/mysql-server/blob/e57893fef5b2739c59b8f354d199251514571f61/storag...
constexpr uint32_t OS_FILE_LOG_BLOCK_SIZE = 512;

# So to me, error means soft_logical_capacity < 1024
```

How to repeat:
Unfortunately, I'm currently unable to repeat it as I do not know what triggers this bug. Some hints on this assertion may help.
[25 Apr 2023 11:40] MySQL Verification Team
Hi Mr. Doucet,

Thank you for your bug report.

However, what you reported looks like a duplicate of one of our internal bug reports. The following bug:

https://bugs.mysql.com/bug.php?id=110271

You can not access that report, since it is internal, but we have left a notice there that you are informed when the original bug is fixed. Since that a problem occurs during the attempted resize of the log files.

Duplicate.
[11 May 2023 11:33] Olivier Doucet
Is there a known workaround ? Like setting some specific parameter on log file size, ... ?
This bug triggers dozens of time every day.
[11 May 2023 12:36] MySQL Verification Team
Hi,

So far, it seems that the only workaround is to set all of the InnoDB-specific log file sizes to some large value, so that resizing never have to occur. This should be done when no other SQL statement is running.
[12 May 2023 13:59] Jakub Lopuszanski
Hello, Olivier Doucet!

Do you know if https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_redo_log_capa... was modified in this period?
This is a dynamic sysvar which one can modify using SET GLOBAL.
One way to check if somebody did that is to search the error log for an [INFO] message:
"User has set innodb_redo_log_capacity to"... 
(Unfortunately, [INFO] is a rather low severity, so would need https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_log_error_verb... set to 3)

I ask because if nobody was modifying the redo log capacity, then `Log_files_capacity::get_suggested_hard_logical_capacity(..)` should pick the `Log_resize_mode::NONE` case in which it should just return `hard_logical_capacity_for_physical(..)` which despite admittedly complicated logic asserts that returned value is at least LOG_WRITER_HARD_CAPACITY_MIN, which is ~5M:
```
  ut_a(LOG_WRITER_HARD_CAPACITY_MIN <= ret);

  return ret;
```
Then Log_files_capacity::update should pass this `hard_logical_capacity` to `Log_files_capacity::update_exposed`, which as you know then passes it to `soft_logical_capacity_for_hard` which basically returns 95% of that value (so still ~5M).

Hm, actually if there was an ongoing resize, we'd end up in Log_resize_mode::RESIZING_DOWN case for which results also is at least the one returned from `hard_logical_capacity_for_physical`, so again I don't see how anything smaller than 5M could be the result.

It looks like either there's some subtle type error which I don't see (something which should be 64-bit is 32-bit resulting in a wraparound?) or stack got corrupted ?

Are there any other messages in the error log?
[12 May 2023 14:17] Olivier Doucet
Hello,

innodb_redo_log_capacity is never updated dynamically on these servers. 

As this was an updated mysql server from 5.6 (through 5.7), we did not specify innodb_redo_log_capacity but used innodb_log_file_size instead.
That's why, on each boot, we have this warning: 
Deprecated configuration parameters innodb_log_file_size and/or innodb_log_files_in_group have been used to compute innodb_redo_log_capacity=1073741824. Please use innodb_redo_log_capacity instead.

The last 392 crashes (in 12 days !) on this specific server are: 
303 [MY-013183] [InnoDB] Assertion failure: log0files_capacity.cc:478:2 * OS_FILE_LOG_BLOCK_SIZE <= soft_logical_capacity
 60 [MY-013183] [InnoDB] Assertion failure: log0files_capacity.cc:320:LOG_WRITER_HARD_CAPACITY_MIN + OS_FILE_LOG_BLOCK_SIZE <= physical_capacity * FREE_FILE_RATIO - OVERHEAD 
 29 [MY-013183] [InnoDB] Assertion failure: log0files_capacity.cc:329:LOG_WRITER_HARD_CAPACITY_MIN <= ret

We also have this ERROR but with no impact : 
Plugin mysqlx reported: 'Setup of socket: '/var/lib/mysql/mysqlx.sock' failed, can't create lock file

innodb_redo_log_capacity has changed yesterday as we increased innodb_log_file_size from 8M to 32M then 64M but it is still crashing. We updated it through "set global" + my.cnf update
This bug happens on three different mysql servers that are absolutely not related to each other (different customers), but all three have a very low workload (< 10 connections / minute).

All servers are x86_64 (docker container running on virtual machines)
[13 May 2023 15:46] Olivier Doucet
if it helps, here are the redo_log status less than one second before a crash : 

Innodb_redo_log_read_only       OFF
Innodb_redo_log_uuid    1075899837
Innodb_redo_log_checkpoint_lsn  764784025595
Innodb_redo_log_current_lsn     764784025595
Innodb_redo_log_flushed_to_disk_lsn     764784025595
Innodb_redo_log_logical_size    512
Innodb_redo_log_physical_size   33554432
Innodb_redo_log_capacity_resized        1073741824
Innodb_redo_log_resize_status   OK
Innodb_redo_log_enabled ON
Innodb_log_waits        0
Innodb_log_write_requests       2301
Innodb_log_writes       423
Innodb_os_log_fsyncs    422
Innodb_os_log_pending_fsyncs    0
Innodb_os_log_pending_writes    0
Innodb_os_log_written   259072
[15 May 2023 12:09] MySQL Verification Team
Hi,

The cause of your problems may be all due to the usage of the deprecated setting.

It would be wise to stop using that setting and instead change my.cnf so that innodb_redo_log_capacity is set explicitly. Set it an 1 Gb or 2 Gb or 4 Gb. You are the one who knows the sizes of transactions and the concurrency on your server. You should also check free space on your disk or partition or quota that you have available.

It is possible that usage of deprecated variable causes some of the asserts.  It is not advisable to ignore any warning that MySQL server issues. They are there for good reasons.
[15 May 2023 13:55] Jakub Lopuszanski
1. Do you have a core dump file?
--core-file --innodb_buffer_pool_in_core_file=OFF

If so, might be interesting to use `gdb` to extract as much info as possible from the global `log_sys` object, in particular from `log_sys->m_capacity`.

2. Could you please share stacktraces for the other two kinds of assertion failures?
[17 May 2023 7:08] Olivier Doucet
Here are the informations from a core dump : 

(see btfull.txt file)

(gdb) p log_sys->m_capacity
$5 = {m_exposed = {m_soft_logical_capacity = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 952928256}, static is_always_lock_free = true}, m_hard_logical_capacity = {<std::__atomic_base<unsigned long>> = {
        static _S_alignment = 8, _M_i = 1003082752}, static is_always_lock_free = true}, m_adaptive_flush_min_age = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 833811968}, static is_always_lock_free = true},
    m_adaptive_flush_max_age = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 893369856}, static is_always_lock_free = true}, m_agressive_checkpoint_min_age = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8,
        _M_i = 923148800}, static is_always_lock_free = true}}, m_current_physical_capacity = 1073741824, m_target_physical_capacity = 1073741824, m_resize_mode = Log_resize_mode::NONE}

(gdb) p OS_FILE_LOG_BLOCK_SIZE
$6 = 512

As a reminder, the error is : ut_a(2 * OS_FILE_LOG_BLOCK_SIZE <= soft_logical_capacity);
Here, if I understand correctly, we have: 
soft_logical_capacity =  952928256
hard_logical_capacity = 1003082752

I dont understand why ut_a() is triggered here.

Full log_sys structure:
(gdb) p log_sys
$12 = (log_t *) 0x7f8c3313ee80
(gdb) p *((log_t*) (0x7f8c3313ee80))
(see log_sys.txt file))

Tell me if you need any more informations.
[17 May 2023 7:08] Olivier Doucet
bt full from gdb

Attachment: btfull.txt (text/plain), 18.66 KiB.

[17 May 2023 7:08] Olivier Doucet
log_sys structure from gdb

Attachment: log_sys.txt (text/plain), 13.71 KiB.

[17 May 2023 12:03] MySQL Verification Team
Hi Mr. Doucet,

Is it still possible for you to upload the compressed core file ???

Also, you should let us know which binary did you exactly use ......
[17 May 2023 13:00] Olivier Doucet
Forcing innodb_redo_log_capacity=1024M does not solve the issue, but gives a different error: 

Assertion failure: log0files_capacity.cc:320:LOG_WRITER_HARD_CAPACITY_MIN + OS_FILE_LOG_BLOCK_SIZE <= physical_capacity * FREE_FILE_RATIO - OVERHEAD 

	 /usr/sbin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x33a) [0x24a92ba]
	 /usr/sbin/mysqld(Log_files_capacity::hard_logical_capacity_for_physical(unsigned long)+0x128) [0x233c608]
	 /usr/sbin/mysqld(Log_files_capacity::update(Log_files_dict const&, unsigned long, unsigned long)+0x41) [0x233d761]
[18 May 2023 6:22] Olivier Doucet
well for the stacktrace it depends, there was another ~ 30 crashes in a night and we still get the "old" message
Assertion failure: log0files_capacity.cc:478:2 * OS_FILE_LOG_BLOCK_SIZE <= soft_logical_capacity
[18 May 2023 12:40] MySQL Verification Team
Hi,

We do have couple of additional questions.

Are you using our binary for 8.0.33 ???

If don't, please do.

Next, that change lead to some new info, so we have to ask again about the free space on the partition where all log files are ....
[19 May 2023 11:09] Jakub Lopuszanski
Thank you for these details!
> As a reminder, the error is : ut_a(2 * OS_FILE_LOG_BLOCK_SIZE <= soft_logical_capacity);
> Here, if I understand correctly, we have: 
> soft_logical_capacity =  952928256
> hard_logical_capacity = 1003082752

Keep in mind that there are two separate things: 
1. The local variable named `soft_logical_capacity` which is compared to 2*OS_FILE_LOG_BLOCK_SIZE in the assert. It seems to be <optimized out>, so we don't really know what was its value at the moment of assertion failure. Or if it even is meaningful to talk about "its value" if it is optimized out.
2. The log_sys->m_capacity.m_exposed.m_soft_logical_capacity atomic variable, for which you see the value 952928256. But this is the "old" value assigned before this unlucky call - we know this because in this call we have not yet reached to the point of assigning to it because assertion has failed:
```
void Log_files_capacity::update_exposed(lsn_t hard_logical_capacity) {
  const lsn_t soft_logical_capacity =
      soft_logical_capacity_for_hard(hard_logical_capacity);
// we fail here:
  ut_a(2 * OS_FILE_LOG_BLOCK_SIZE <= soft_logical_capacity);

  m_exposed.m_hard_logical_capacity.store(hard_logical_capacity);
//thus this is not reached yet:
  m_exposed.m_soft_logical_capacity.store(soft_logical_capacity);
```

Similarly there's a local argument named `hard_logical_capacity` and a global atomic variable log_sys->m_capacity.m_exposed.m_hard_logical_capacity.
We see from gdb that the atomic's value was 1003082752, and from the stacktraces you've attached we see the same value being passed to the function:
```
#8  0x000000000233c2b8 in Log_files_capacity::update_exposed (this=this@entry=0x7f8c3313f5e0, hard_logical_capacity=1003082752) at /compile/storage/innobase/log/log0files_capacity.cc:478
```
So at least these two match, which suggest that this call doesn't differ much from one of the previous calls. This is intriguing then, why it fails only now.

Now, what would be interesting to see is assembly code used for `Log_files_capacity::soft_logical_capacity_for_hard(hard_logical_capacity)`.
From what I can see in https://godbolt.org/z/MP5deYo9q, modern compilers in general seem to optimize it pretty hard to use some clever xmm tricks.
Perhaps they make some mistake when doing to many tricks.
Perhaps we (developers) made a mistake with all this type juggling (lsn_t, double, ulint, uint64_t,..) and forgot about some edge case?

I don't know what compiler was used for your binary, nor how exactly it would optimize this call in this particular context.
Therefore, it would be great if you could share the output `gdb` gives for 
1. `disassemble Log_files_capacity::soft_logical_capacity_for_hard` and
2. `disassemble Log_files_capacity::update_exposed`
Hopefully this will provide some hints for what could be the culprit here.
I ask for these two separately, because this line:
```
        soft_logical_capacity = <optimized out>
```
suggests that the whole call to `soft_logical_capacity_for_hard` was inlined.

3. Could you please share how exactly did you obtain the `mysqld` binary?
[22 May 2023 9:45] Olivier Doucet
Here are the informations you asked for: 

(gdb) disassemble Log_files_capacity::soft_logical_capacity_for_hard
Dump of assembler code for function Log_files_capacity::soft_logical_capacity_for_hard(unsigned long):
   0x000000000233bff0 <+0>:     endbr64
   0x000000000233bff4 <+4>:     test   %rdi,%rdi
   0x000000000233bff7 <+7>:     js     0x233c078 <Log_files_capacity::soft_logical_capacity_for_hard(unsigned long)+136>
   0x000000000233bff9 <+9>:     pxor   %xmm0,%xmm0
   0x000000000233bffd <+13>:    cvtsi2sd %rdi,%xmm0
   0x000000000233c002 <+18>:    mulsd  0x1185f9e(%rip),%xmm0        # 0x34c1fa8
   0x000000000233c00a <+26>:    movsd  0x8d408e(%rip),%xmm1        # 0x2c100a0
   0x000000000233c012 <+34>:    movsd  0x8d403e(%rip),%xmm3        # 0x2c10058
   0x000000000233c01a <+42>:    movapd %xmm0,%xmm2
   0x000000000233c01e <+46>:    andpd  %xmm1,%xmm2
   0x000000000233c022 <+50>:    ucomisd %xmm2,%xmm3
   0x000000000233c026 <+54>:    jbe    0x233c042 <Log_files_capacity::soft_logical_capacity_for_hard(unsigned long)+82>
   0x000000000233c028 <+56>:    cvttsd2si %xmm0,%rax
   0x000000000233c02d <+61>:    pxor   %xmm2,%xmm2
   0x000000000233c031 <+65>:    andnpd %xmm0,%xmm1
   0x000000000233c035 <+69>:    cvtsi2sd %rax,%xmm2
   0x000000000233c03a <+74>:    orpd   %xmm1,%xmm2
   0x000000000233c03e <+78>:    movapd %xmm2,%xmm0
   0x000000000233c042 <+82>:    movsd  0x8d3fee(%rip),%xmm1        # 0x2c10038
   0x000000000233c04a <+90>:    comisd %xmm1,%xmm0
   0x000000000233c04e <+94>:    jae    0x233c060 <Log_files_capacity::soft_logical_capacity_for_hard(unsigned long)+112>
   0x000000000233c050 <+96>:    cvttsd2si %xmm0,%rax
   0x000000000233c055 <+101>:   and    $0xfffffffffffffe00,%rax
   0x000000000233c05b <+107>:   retq
   0x000000000233c05c <+108>:   nopl   0x0(%rax)
   0x000000000233c060 <+112>:   subsd  %xmm1,%xmm0
   0x000000000233c064 <+116>:   cvttsd2si %xmm0,%rax
   0x000000000233c069 <+121>:   btc    $0x3f,%rax
   0x000000000233c06e <+126>:   and    $0xfffffffffffffe00,%rax
   0x000000000233c074 <+132>:   retq
   0x000000000233c075 <+133>:   nopl   (%rax)
   0x000000000233c078 <+136>:   mov    %rdi,%rax
   0x000000000233c07b <+139>:   and    $0x1,%edi
   0x000000000233c07e <+142>:   pxor   %xmm0,%xmm0
   0x000000000233c082 <+146>:   shr    %rax
   0x000000000233c085 <+149>:   or     %rdi,%rax
   0x000000000233c088 <+152>:   cvtsi2sd %rax,%xmm0
   0x000000000233c08d <+157>:   addsd  %xmm0,%xmm0
   0x000000000233c091 <+161>:   jmpq   0x233c002 <Log_files_capacity::soft_logical_capacity_for_hard(unsigned long)+18>
End of assembler dump.

(gdb) disassemble Log_files_capacity::update_exposed
Dump of assembler code for function Log_files_capacity::update_exposed(unsigned long):
   0x000000000233c220 <+0>:     endbr64
   0x000000000233c224 <+4>:     push   %rbp
   0x000000000233c225 <+5>:     mov    %rsp,%rbp
   0x000000000233c228 <+8>:     push   %r13
   0x000000000233c22a <+10>:    mov    %rsi,%r13
   0x000000000233c22d <+13>:    push   %r12
   0x000000000233c22f <+15>:    mov    %rdi,%r12
   0x000000000233c232 <+18>:    mov    %rsi,%rdi
   0x000000000233c235 <+21>:    push   %rbx
   0x000000000233c236 <+22>:    sub    $0x8,%rsp
   0x000000000233c23a <+26>:    callq  0x233bff0 <Log_files_capacity::soft_logical_capacity_for_hard(unsigned long)>
   0x000000000233c23f <+31>:    cmp    $0x3ff,%rax
   0x000000000233c245 <+37>:    jbe    0x233c2a0 <Log_files_capacity::update_exposed(unsigned long)+128>
   0x000000000233c247 <+39>:    mov    %rax,%rbx
   0x000000000233c24a <+42>:    xchg   %r13,0x8(%r12)
   0x000000000233c24f <+47>:    xchg   %rax,(%r12)
   0x000000000233c253 <+51>:    mov    %rbx,%rdi
   0x000000000233c256 <+54>:    callq  0x233c170 <Log_files_capacity::sync_flush_logical_capacity_for_soft(unsigned long)>
   0x000000000233c25b <+59>:    xchg   %rax,0x18(%r12)
   0x000000000233c260 <+64>:    mov    %rbx,%rdx
   0x000000000233c263 <+67>:    mov    %rbx,%rax
   0x000000000233c266 <+70>:    shr    $0x3,%rdx
   0x000000000233c26a <+74>:    sub    %rdx,%rax
   0x000000000233c26d <+77>:    and    $0xfffffffffffffe00,%rax
   0x000000000233c273 <+83>:    xchg   %rax,0x10(%r12)
   0x000000000233c278 <+88>:    mov    %rbx,%rax
   0x000000000233c27b <+91>:    shr    $0x5,%rax
   0x000000000233c27f <+95>:    sub    %rax,%rbx
   0x000000000233c282 <+98>:    and    $0xfffffffffffffe00,%rbx
   0x000000000233c289 <+105>:   xchg   %rbx,0x20(%r12)
   0x000000000233c28e <+110>:   add    $0x8,%rsp
   0x000000000233c292 <+114>:   pop    %rbx
   0x000000000233c293 <+115>:   pop    %r12
   0x000000000233c295 <+117>:   pop    %r13
   0x000000000233c297 <+119>:   pop    %rbp
   0x000000000233c298 <+120>:   retq
   0x000000000233c299 <+121>:   nopl   0x0(%rax)
   0x000000000233c2a0 <+128>:   mov    $0x1de,%edx
   0x000000000233c2a5 <+133>:   lea    0x122fa7c(%rip),%rsi        # 0x356bd28
   0x000000000233c2ac <+140>:   lea    0x122fb15(%rip),%rdi        # 0x356bdc8
   0x000000000233c2b3 <+147>:   callq  0x24a8f80 <ut_dbg_assertion_failed(char const*, char const*, unsigned long)>
End of assembler dump.
[22 May 2023 9:50] Olivier Doucet
And here is another error I sometimes get : 
LOG_WRITER_HARD_CAPACITY_MIN + OS_FILE_LOG_BLOCK_SIZE <= physical_capacity * FREE_FILE_RATIO - OVERHEAD

#0  0x00007fa5c007ca35 in pthread_kill () from /lib64/libpthread.so.0
#1  0x0000000001214c8d in handle_fatal_signal (sig=6) at /compile/sql/signal_handler.cc:252
#2  <signal handler called>
#3  0x00007fa5be42eaff in raise () from /lib64/libc.so.6
#4  0x00007fa5be401ea5 in abort () from /lib64/libc.so.6
#5  0x0000000000c9f07b in my_server_abort () at /compile/sql/signal_handler.cc:311
#6  0x000000000219b94e in my_abort () at /compile/mysys/my_init.cc:264
#7  0x00000000024a92ba in ut_dbg_assertion_failed (expr=expr@entry=0x356be88 "LOG_WRITER_HARD_CAPACITY_MIN + OS_FILE_LOG_BLOCK_SIZE <= physical_capacity * FREE_FILE_RATIO - OVERHEAD",
    file=file@entry=0x356bd28 "/mnt/jenkins/compile/storage/innobase/log/log0files_capacity.cc", line=line@entry=320)
    at /compile/storage/innobase/ut/ut0dbg.cc:99
#8  0x000000000233c608 in Log_files_capacity::hard_logical_capacity_for_physical (physical_capacity=<optimized out>) at /compile/storage/innobase/log/log0files_capacity.cc:320
#9  0x000000000233c76e in Log_files_capacity::get_suggested_hard_logical_capacity (this=this@entry=0x7f9abe13f5e0, current_checkpoint_age=current_checkpoint_age@entry=0)
    at /compile/storage/innobase/log/log0files_capacity.cc:459
#10 0x000000000233d761 in Log_files_capacity::update (this=0x7f9abe13f5e0, files=..., current_logical_size=512, current_checkpoint_age=0) at /compile/storage/innobase/log/log0files_capacity.cc:201
#11 0x0000000002343a9d in log_files_update_capacity_limits (log=...) at /compile/storage/innobase/log/log0files_governor.cc:1117
#12 0x0000000002344a40 in log_files_governor_iteration_low (log=..., has_writer_mutex=<optimized out>) at /compile/storage/innobase/log/log0files_governor.cc:1233
#13 0x0000000002344fd8 in log_files_governor_iteration (log=...) at /compile/storage/innobase/log/log0files_governor.cc:1330
#14 log_files_governor (log_ptr=0x7f9abe13ee80) at /compile/storage/innobase/log/log0files_governor.cc:1371
#15 0x000000000234e6ce in std::__invoke_impl<void, void (*&)(log_t*), log_t*&> (__f=<synthetic pointer>: <optimized out>) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:90
#16 std::__invoke<void (*&)(log_t*), log_t*&> (__fn=<synthetic pointer>: <optimized out>) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:96
#17 std::_Bind<void (*(log_t*))(log_t*)>::__call<void, , 0ul>(std::tuple<>&&, std::_Index_tuple<0ul>) (__args=..., this=<synthetic pointer>) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:420
#18 std::_Bind<void (*(log_t*))(log_t*)>::operator()<, void>() (this=<synthetic pointer>) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/functional:503
#19 Detached_thread::operator()<void (*)(log_t*), log_t*> (f=<optimized out>, this=<optimized out>) at /compile/storage/innobase/include/os0thread-create.h:194
#20 std::__invoke_impl<void, Detached_thread, void (*)(log_t*), log_t*> (__f=...) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61
#21 std::__invoke<Detached_thread, void (*)(log_t*), log_t*> (__fn=...) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:96
#22 std::thread::_Invoker<std::tuple<Detached_thread, void (*)(log_t*), log_t*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_thread.h:253
#23 std::thread::_Invoker<std::tuple<Detached_thread, void (*)(log_t*), log_t*> >::operator() (this=<optimized out>) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_thread.h:260
#24 std::thread::_State_impl<std::thread::_Invoker<std::tuple<Detached_thread, void (*)(log_t*), log_t*> > >::_M_run (this=0x7fa5ab071f40) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_thread.h:211
#25 0x00007fa5bee02ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#26 0x00007fa5c00751cf in start_thread () from /lib64/libpthread.so.0
#27 0x00007fa5be419e73 in clone () from /lib64/libc.so.6
[22 May 2023 10:18] Olivier Doucet
OS is centos8.
I guess the code was compiled with GCC 8.5.0 (rhel 8.7) and is using glibc 2.28
Also, note that mysqld binary is launched with jemalloc 3.6.0
[22 May 2023 12:50] MySQL Verification Team
First function is inlined and second function is NOT inlined.

Code seems to be OK.
[25 May 2023 7:54] Jakub Lopuszanski
Hello Olivier!

Thank you for all the valuable details!
I am yet to analyse the assembly (I am not an expert in it, so it will take some time).
Meanwhile I want to clarify:
1. Do I understand correctly that you are (or somebody on your behalf, but not Oracle) compiling mysqld from source? Of course you can do that and it should be a supported scenario. It just opens up a whole lot of new questions, like:
2. Is it an unaltered MySQL Server source? If not, what patches are you applying on top of our source? 
3. Which exact commit and from which repo was used?
[25 May 2023 8:19] Olivier Doucet
coredump was obtained with Percona Server 8.0.32 and this rpm exactly : 
http://197.155.77.15/percona/ps-80/yum/release/8.2/RPMS/x86_64/percona-server-server-8.0.3...

I was able to reproduce the crash several times with a vanilla MySQL Server, but all our internal tooling is based on PS that's why all further investigations were done with this version.

Tell me if you really need to obtain a new coredump with an official mysql binary, but it will need some time on our side.
[25 May 2023 12:03] MySQL Verification Team
Hi,

Would you be so kind and do all your investigations with our 8.0.33 binary for your OS, which is some Linux variant.

We simply do not want to hunt for bugs in some other sources nor in some other builds.

We support only our products.
[26 May 2023 6:21] Olivier Doucet
gdb session with mysql official binary

Attachment: btfull.txt (text/plain), 18.85 KiB.

[26 May 2023 6:37] Olivier Doucet
I totally understand your statement. I've switch to official MySQL docker image, that use https://repo.mysql.com/yum/mysql-8.0-community/docker/el/8/x86_64/mysql-community-server-m...

I cannot use 8.0.33 version as I need to switch back to Percona Server afterward, and 8.0.33 is not released yet.

gdb session available as attached file.

I've uploaded *private*mysql-bug-data-110797.zip coredump (3.1 GiB compressed to 80MiB) to sftp.oracle.com as mysql-bug-data-110797.zip
[26 May 2023 12:39] Jakub Lopuszanski
I wonder what is the content of MXCSR register and if it perhaps changes somehow over time?
[26 May 2023 15:13] Jakub Lopuszanski
This other assertion failure which you reported:
```
LOG_WRITER_HARD_CAPACITY_MIN + OS_FILE_LOG_BLOCK_SIZE <= physical_capacity * FREE_FILE_RATIO - OVERHEAD
```

also looks very interesting given that the code is:
```
  static_assert(
      LOG_WRITER_HARD_CAPACITY_MIN + OVERHEAD + OS_FILE_LOG_BLOCK_SIZE +
              NEXT_FILE_EARLIER_MARGIN_FOR_LOG_CAPACITY_MIN <=
          LOG_CAPACITY_MIN * FREE_FILE_RATIO,
      "The minimum redo capacity should be considered good enough for "
      "innodb_thread_concurrency = 0 and 64k pages.");

  ut_a(LOG_CAPACITY_MIN <= physical_capacity);

  /* Combining these two assertions we get: */
  ut_a(LOG_WRITER_HARD_CAPACITY_MIN + OS_FILE_LOG_BLOCK_SIZE <=
       physical_capacity * FREE_FILE_RATIO - OVERHEAD);
```
and indeed one can verify that from mathematical standpoint, it is "impossible" to pass through ut_a(LOG_CAPACITY_MIN <= physical_capacity) assert, and then fail on the next assert, if one takes into account the earlier `static_assert`:

LOG_WRITER_HARD_CAPACITY_MIN + OS_FILE_LOG_BLOCK_SIZE <=
/* because NEXT_FILE_EARLIER_MARGIN_FOR_LOG_CAPACITY_MIN is non-negative */
LOG_WRITER_HARD_CAPACITY_MIN + OS_FILE_LOG_BLOCK_SIZE + NEXT_FILE_EARLIER_MARGIN_FOR_LOG_CAPACITY_MIN <=
/* because static_assert holds */
LOG_CAPACITY_MIN * FREE_FILE_RATIO - OVERHEAD <=
/* because the first ut_a(..) holds */
physical_capacity *FREE_FILE_RATIO - OVERHEAD

But taken together this is just the body of the second ut_a(..) which you see failing.

Looks like something very strange is going on with floating point arithmetic on your machine, or some miscompilation occurred, or I am missing something.

What CPU are you running it on?
What docker version?
What exact version of compiler was used?
> I guess the code was compiled with GCC 8.5.0 
then why do callstacks contain mentions of toolset 11?
> at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:90
[26 May 2023 15:20] Olivier Doucet
bt full from mysql-community-minimal-8.0.32-1.el8.x86_64

Attachment: btfull.txt (text/plain), 19.64 KiB.

[26 May 2023 15:31] Olivier Doucet
bt full : LOG_WRITER_HARD_CAPACITY_MIN + OS_FILE_LOG_BLOCK_SIZE <= physical_capacity * FREE_FILE_RATIO - OVERHEAD

Attachment: btfull.txt (text/plain), 20.27 KiB.

[26 May 2023 15:34] Olivier Doucet
Let's ignore  first coredump and informations, obtained with Percona Server RPM, and focus on the new one obtained with MySQL official RPM mysql-community-minimal-8.0.32-1.el8.x86_64

I've uploaded coredump and bt full, so that you can reproduce and investigate easier.

From what I can see, it was compiled with gcc 11.

It is running on docker 20.10.6, on a virtual machine, using ovirt (libvirt 9.0.0-7).
Physical server under the hood is CentOS stream 9, kernel 5.14.0, with CPU Xeon(R) Gold 6130.

(gdb) disassemble Log_files_capacity::soft_logical_capacity_for_hard
Dump of assembler code for function Log_files_capacity::soft_logical_capacity_for_hard(unsigned long):
   0x00000000022d42b0 <+0>:     endbr64
   0x00000000022d42b4 <+4>:     test   %rdi,%rdi
   0x00000000022d42b7 <+7>:     js     0x22d4338 <Log_files_capacity::soft_logical_capacity_for_hard(unsigned long)+136>
   0x00000000022d42b9 <+9>:     pxor   %xmm0,%xmm0
   0x00000000022d42bd <+13>:    cvtsi2sd %rdi,%xmm0
   0x00000000022d42c2 <+18>:    mulsd  0x1168c3e(%rip),%xmm0        # 0x343cf08
   0x00000000022d42ca <+26>:    movsd  0x1022b6e(%rip),%xmm1        # 0x32f6e40
   0x00000000022d42d2 <+34>:    movsd  0x1022b86(%rip),%xmm3        # 0x32f6e60
   0x00000000022d42da <+42>:    movapd %xmm0,%xmm2
   0x00000000022d42de <+46>:    andpd  %xmm1,%xmm2
   0x00000000022d42e2 <+50>:    ucomisd %xmm2,%xmm3
   0x00000000022d42e6 <+54>:    jbe    0x22d4302 <Log_files_capacity::soft_logical_capacity_for_hard(unsigned long)+82>
   0x00000000022d42e8 <+56>:    cvttsd2si %xmm0,%rax
   0x00000000022d42ed <+61>:    pxor   %xmm2,%xmm2
   0x00000000022d42f1 <+65>:    andnpd %xmm0,%xmm1
   0x00000000022d42f5 <+69>:    cvtsi2sd %rax,%xmm2
   0x00000000022d42fa <+74>:    orpd   %xmm1,%xmm2
   0x00000000022d42fe <+78>:    movapd %xmm2,%xmm0
   0x00000000022d4302 <+82>:    movsd  0x8c6e66(%rip),%xmm1        # 0x2b9b170
   0x00000000022d430a <+90>:    comisd %xmm1,%xmm0
   0x00000000022d430e <+94>:    jae    0x22d4320 <Log_files_capacity::soft_logical_capacity_for_hard(unsigned long)+112>
   0x00000000022d4310 <+96>:    cvttsd2si %xmm0,%rax
   0x00000000022d4315 <+101>:   and    $0xfffffffffffffe00,%rax
   0x00000000022d431b <+107>:   retq
   0x00000000022d431c <+108>:   nopl   0x0(%rax)
   0x00000000022d4320 <+112>:   subsd  %xmm1,%xmm0
   0x00000000022d4324 <+116>:   cvttsd2si %xmm0,%rax
   0x00000000022d4329 <+121>:   btc    $0x3f,%rax
   0x00000000022d432e <+126>:   and    $0xfffffffffffffe00,%rax
   0x00000000022d4334 <+132>:   retq
   0x00000000022d4335 <+133>:   nopl   (%rax)
   0x00000000022d4338 <+136>:   mov    %rdi,%rax
   0x00000000022d433b <+139>:   and    $0x1,%edi
   0x00000000022d433e <+142>:   pxor   %xmm0,%xmm0
   0x00000000022d4342 <+146>:   shr    %rax
   0x00000000022d4345 <+149>:   or     %rdi,%rax
   0x00000000022d4348 <+152>:   cvtsi2sd %rax,%xmm0
   0x00000000022d434d <+157>:   addsd  %xmm0,%xmm0
   0x00000000022d4351 <+161>:   jmpq   0x22d42c2 <Log_files_capacity::soft_logical_capacity_for_hard(unsigned long)+18>
End of assembler dump.

(gdb) disassemble Log_files_capacity::update_exposed
Dump of assembler code for function Log_files_capacity::update_exposed(unsigned long):
   0x00000000022d44e0 <+0>:     endbr64
   0x00000000022d44e4 <+4>:     push   %rbp
   0x00000000022d44e5 <+5>:     mov    %rsp,%rbp
   0x00000000022d44e8 <+8>:     push   %r13
   0x00000000022d44ea <+10>:    mov    %rsi,%r13
   0x00000000022d44ed <+13>:    push   %r12
   0x00000000022d44ef <+15>:    mov    %rdi,%r12
   0x00000000022d44f2 <+18>:    mov    %rsi,%rdi
   0x00000000022d44f5 <+21>:    push   %rbx
   0x00000000022d44f6 <+22>:    sub    $0x8,%rsp
   0x00000000022d44fa <+26>:    callq  0x22d42b0 <Log_files_capacity::soft_logical_capacity_for_hard(unsigned long)>
   0x00000000022d44ff <+31>:    cmp    $0x3ff,%rax
   0x00000000022d4505 <+37>:    jbe    0x22d4560 <Log_files_capacity::update_exposed(unsigned long)+128>
   0x00000000022d4507 <+39>:    mov    %rax,%rbx
   0x00000000022d450a <+42>:    xchg   %r13,0x8(%r12)
   0x00000000022d450f <+47>:    xchg   %rax,(%r12)
   0x00000000022d4513 <+51>:    mov    %rbx,%rdi
   0x00000000022d4516 <+54>:    callq  0x22d4430 <Log_files_capacity::sync_flush_logical_capacity_for_soft(unsigned long)>
   0x00000000022d451b <+59>:    xchg   %rax,0x18(%r12)
   0x00000000022d4520 <+64>:    mov    %rbx,%rdx
   0x00000000022d4523 <+67>:    mov    %rbx,%rax
   0x00000000022d4526 <+70>:    shr    $0x3,%rdx
   0x00000000022d452a <+74>:    sub    %rdx,%rax
   0x00000000022d452d <+77>:    and    $0xfffffffffffffe00,%rax
   0x00000000022d4533 <+83>:    xchg   %rax,0x10(%r12)
   0x00000000022d4538 <+88>:    mov    %rbx,%rax
   0x00000000022d453b <+91>:    shr    $0x5,%rax
   0x00000000022d453f <+95>:    sub    %rax,%rbx
   0x00000000022d4542 <+98>:    and    $0xfffffffffffffe00,%rbx
   0x00000000022d4549 <+105>:   xchg   %rbx,0x20(%r12)
   0x00000000022d454e <+110>:   add    $0x8,%rsp
   0x00000000022d4552 <+114>:   pop    %rbx
   0x00000000022d4553 <+115>:   pop    %r12
   0x00000000022d4555 <+117>:   pop    %r13
   0x00000000022d4557 <+119>:   pop    %rbp
   0x00000000022d4558 <+120>:   retq
   0x00000000022d4559 <+121>:   nopl   0x0(%rax)
   0x00000000022d4560 <+128>:   mov    $0x1de,%edx
   0x00000000022d4565 <+133>:   lea    0x120e2a4(%rip),%rsi        # 0x34e2810
   0x00000000022d456c <+140>:   lea    0x120e315(%rip),%rdi        # 0x34e2888
   0x00000000022d4573 <+147>:   callq  0x243af50 <ut_dbg_assertion_failed(char const*, char const*, unsigned long)>
End of assembler dump.
(gdb)

And another coredump with error "LOG_WRITER_HARD_CAPACITY_MIN + OS_FILE_LOG_BLOCK_SIZE <= physical_capacity * FREE_FILE_RATIO - OVERHEAD"
=> see file btfull.txt with comment "bt full : LOG_WRITER_HARD_CAPACITY_MIN + OS_FILE_LOG_BLOCK_SIZE <= physical_capacity * FREE_FILE_RATIO - OVERHEAD"
[5 Jun 2023 7:38] Olivier Doucet
Hello,
Is there anything else I can do to help ? It affects several mysql servers here, and we have globally around 40-50 crashes a day.
[5 Jun 2023 9:39] Jakub Lopuszanski
Hi Olivier,
I don't know yet how to access the core files you've uploaded.
I have to figure it out, as I don't see any download link in the bugs.mysql.com UI nor any instruction on how to obtain it.

Sinisa, can you help me with that?
[6 Jun 2023 12:05] MySQL Verification Team
Hi,

Try increasing log file sizes, but please follow the procedures from our Reference Manual.
[7 Jun 2023 9:05] Olivier Doucet
The MySQL server is writing a few kilobytes every now and then.
Currently I have innodb_redo_log_capacity=1073741824 (1 GiB). It seems more than enough to cope with load.
All redo operations seems to fit in 32MiB : 

ls -lah \#innodb_redo/
total 1.1G
-rw-r----- 1 mysql mysql  32M Jun  7 11:05 #ib_redo2079
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2080_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2081_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2082_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2083_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2084_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2085_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2086_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2087_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2088_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2089_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2090_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2091_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2092_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2093_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2094_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2095_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2096_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2097_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2098_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2099_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2100_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2101_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2102_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2103_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2104_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2105_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2106_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2107_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2108_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2109_tmp
-rw-r----- 1 mysql mysql  32M Jun  7 05:57 #ib_redo2110_tmp
[7 Jun 2023 9:43] Jakub Lopuszanski
Thank you Olivier!
[7 Jun 2023 10:28] Jakub Lopuszanski
OK Olivier, I really want to help you, but I am not the one who sets the rules here (trust me, I'd love to) and the rules seem to be that we can't use Docker.

Yes, I know.

We have to get creative. To interpret the core file, I need the exact same binary, as in the docker file, but *I* can't access it.
So, I need you to help me to extract information about the binary and libraries involved.
Here are some commands, that if run from within the docker, will hopefully list important info:

file /usr/sbin/mysqld
rpm -qa
dpkg -l
cat /etc/os-release

The first one should output something like
"/usr/sbin/mysqld: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=fd1f537767fe1f4564705ca5348a4183518a57e1, for GNU/Linux 3.2.0, stripped"
and that BuildID should help us identify the exact binary.
The second might fail, then try the third.
The last one might also be useful.

Sorry for inconvenience. 
The only other way I can think of is if you are able to reproduce it using our official distribution channels - in particular without Docker.
[7 Jun 2023 12:00] Olivier Doucet
Hello,

# file /usr/sbin/mysqld
/usr/sbin/mysqld: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=6b049f17400f850658b2eb3ff165ec9a085d9655, for GNU/Linux 3.2.0, stripped

# rpm -qa |grep mysql
mysql-community-server-minimal-8.0.32-1.el8.x86_64
mysql-shell-8.0.32-1.el8.x86_64
mysql-community-minimal-debugsource-8.0.32-1.el8.x86_64
mysql-community-server-minimal-debuginfo-8.0.32-1.el8.x86_64

# cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="8.7"
[...]

Hope it helps ^^
[20 Jun 2023 8:49] Olivier Doucet
Any update ? Tell me how I can help :)
[20 Jun 2023 14:03] Jakub Lopuszanski
I'm sorry for delays - there's only one person working on this, me, and I am allocated 25% to InnoDB, and have several InnoDB bugs assigned, and my daughter was sick, and ... this task is still in my todo list :(

One thing worth checking meanwhile is if this is somehow dependent on docker or "machine" it is run on. Perhaps also run some hardware checks, to eliminate faulty memory, CPU etc.
[27 Jun 2023 16:35] Olivier Doucet
Hello,
> One thing worth checking meanwhile is if this is somehow dependent on docker or "machine" it is run on. 
No, as we encountered this bug on 4 different mysql servers from different customers, running on different VM + different physical servers.
One thing to note is that all four servers have a very low usage.
[21 Nov 2023 9:18] Olivier Doucet
Hello,
Any update on this ? 
Same bug happens on (at least) : 
mysql 8.0.30
mysql 8.0.34
[21 Nov 2023 10:54] MySQL Verification Team
Hi,

We can only inform you that a bug that is similar to yours have been fixed in 8.2.0 and it will not be ported back to any previous version.

You can try MySQL 8.2.0 in a standalone environment and let us know if you have problems there as well.
[20 Dec 2023 17:48] Olivier Doucet
Can you please point me to the bug report you are referring to ?
[8 Jan 9:54] MySQL Verification Team
Hi Mr. Doucet,

This is an internal bug, not visible to the public, so it's number would not mean anything to you ......
[8 Jan 19:29] Jakub Lopuszanski
As I've said earlier we aren't allowed to use Docker, nor Docker Hub.
I am restricted to use Podman and container-registry.oracle.com.
There's no "mysql:8.0.32-oracle" in this repo :(
But there's oraclelinux:8-slim in it.
Unfortunately, I was not able to match the version of /usr/lib64/libpthread-2.28.so
Your core file seems to be using:
```
[root@jakub-lopuszanski-el8-1704713503 ~]# eu-unstrip -n --core=./coredump-mysql-community-server-minimal-8.0.32-1.el8.x86_64  | grep pthr
0x7ff689166000+0x220000 2607a035577563ba65ce4f9ba1d10415f944e307@0x7ff6891662b8 - - /usr/lib64/libpthread-2.28.so
```
but in the docker image I have:
```
bash-4.4# file /usr/lib64/libpthread-2.28.so
/usr/lib64/libpthread-2.28.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=d1adef9579ce9780c97db1a37bd08552a91f4564, for GNU/Linux 3.2.0, not stripped
```
So, the sha1 doesn't match even though it's the same "version" "2.28".
I believe this is the reason why gdb has problems to decode some callstacks:
```
(gdb) bt
#0  0x00007ff689175a45 in pthread_sigmask () from /lib64/libpthread.so.0
#1  0x00007ff632f5a890 in ?? ()
#2  0x00007ff632f5a8f8 in ?? ()
#3  0x0000000000000002 in ?? ()
#4  0x00007ff632f5a5d0 in ?? ()
#5  0x00007ff632f5a850 in ?? ()
#6  0x0000000000000006 in ?? ()
#7  0x0000000000000000 in ?? ()
```
even though I can access things like `p *(Log_files_capacity*)0x7ff676afff20` or `p *log_sys`, so MySQL symbols seem to be found fine.
When I do `disassemble Log_files_capacity::soft_logical_capacity_for_hard` I get the same assembly as you.
I've tried to read the memory location which are loaded into xmm registers, to check the hypothesis, that perhaps they got corrupted somehow:
```
(gdb) p/x *(uint64_t*)0x343cf08
$3 = 0x3fee666666666666
(gdb) p/x *(uint64_t*)0x32f6e40
$4 = 0x7fffffffffffffff
(gdb) p/x *(uint64_t*)0x32f6e60
$5 = 0x4330000000000000
(gdb) p/x *(uint64_t*)0x2b9b170
$6 = 0x43e0000000000000
```
which are the expected values, but I am not sure if they are taken from the core dump, or from the binary (given that this is an address to the data section)
I was trying to look into xmm register values, to see if they contain what they should, and they don't:
```
(gdb) p/x $xmm0
$13 = {v8_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0xffff, 0x0, 0x0, 0x0}, v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xe5, 0x6e, 0x66, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x0,
    0x0, 0x0, 0x0, 0x6ee5, 0x66, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x666ee5, 0x0}, v2_int64 = {0x0, 0x666ee5}, uint128 = 0x666ee50000000000000000}
(gdb) p/x $xmm1
$14 = {v8_bfloat16 = {0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0xa1, 0x40, 0x39, 0x89, 0xf6, 0x7f, 0x0, 0x0, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {
    0x40a1, 0x8939, 0x7ff6, 0x0, 0x8, 0x0, 0x0, 0x0}, v4_int32 = {0x893940a1, 0x7ff6, 0x8, 0x0}, v2_int64 = {0x7ff6893940a1, 0x8}, uint128 = 0x800007ff6893940a1}
(gdb) p/x $xmm2
$15 = {v8_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x58, 0x93, 0x50, 0x89, 0xf6, 0x7f, 0x0, 0x0, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {
    0x9358, 0x8950, 0x7ff6, 0x0, 0x5, 0x0, 0x0, 0x0}, v4_int32 = {0x89509358, 0x7ff6, 0x5, 0x0}, v2_int64 = {0x7ff689509358, 0x5}, uint128 = 0x500007ff689509358}
(gdb) p/x $xmm3
$16 = {v8_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0,
    0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
```
in particular, I'd expect xmm3 to contain data loaded by:
```
0x00000000022d42d2 <+34>:    movsd  0x1022b86(%rip),%xmm3        # 0x32f6e60
```
so, `0x4330` should appear somewhere in it, but it doesn't.
This might be because ut_dbg_assertion_failed(..) or something below it, like abort(..) has already cleared it.
When I compile this source with godbolt, I get similar assembly:
https://godbolt.org/z/5vnhzneG5
and it seems to run fine as expected for the input hard_logical_capacity=1003082752.
I've searched the web for known bugs around xmm, floating point conversion, Docker, oVirt, but haven't found anything obvious.

As you know we don't support Docker. I don't think we support oVirt. I am certain we don't support Docker inside oVirt :)

If there is any bug here, then I don't think it is in the Redo Log System code itself - the same code in godbolt runs fine.
And as stated earlier: the assertion failure doesn't make sense mathematically anyway (it is impossible given the set of other asserts which somehow held in earlier lines).
This all looks like something is corrupting contents of registers, or the data section?
I am inclined to think the problem is somewhere in the interaction between oVirt+Docker+mysqld.

Another possibility is that the bug is in some other place of MySQL, and that other place somehow overwrites memory or causes stack corruption or something?

You've said earlier that you suspect this happens when a certain workload is being executed.

Would it be possible for you to extract SQL queries from binlog which happen before failure?
Is there anything more in the error log?
Any other clues?