Bug #92324 mysqld crashes abnormally after the machine aborted because of a CPU problem
Submitted: 6 Sep 2018 16:32 Modified: 19 Sep 2018 15:14
Reporter: chunyang xu
Status: Not a Bug
Category:MySQL Server Severity:S1 (Critical)
Version:5.7.18 OS:Linux (suse 12 sp 2)
Assigned to: MySQL Verification Team CPU Architecture:x86
Tags: assert, crash

[6 Sep 2018 16:32] chunyang xu
Description:
According to the core file generated by mysqld when it crashed, the backtrace is as follows:

(gdb) bt
#0  0x00007fe9f35a5101 in pthread_kill () from /lib64/libpthread.so.0
#1  0x00000000007e9455 in handle_fatal_signal (sig=6) at /mysqldata/mysql_install/install_soft/mysql-5.7.18/sql/signal_handler.cc:220
#2  <signal handler called>
#3  0x00007fe9f1f3a8d7 in raise () from /lib64/libc.so.6
#4  0x00007fe9f1f3bcaa in abort () from /lib64/libc.so.6
#5  0x00000000007ba31d in ut_dbg_assertion_failed (expr=expr@entry=0x160f090 "addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA", 
    file=file@entry=0x160f000 "/mysqldata/mysql_install/install_soft/mysql-5.7.18/storage/innobase/include/fut0lst.ic", line=line@entry=85)
    at /mysqldata/mysql_install/install_soft/mysql-5.7.18/storage/innobase/ut/ut0dbg.cc:67
#6  0x00000000007bbd4e in flst_read_addr (mtr=<optimized out>, faddr=0x7fd7fd72142b "") at /mysqldata/mysql_install/install_soft/mysql-5.7.18/storage/innobase/include/fut0lst.ic:85
#7  0x00000000011e6545 in flst_read_addr (mtr=<optimized out>, faddr=0x7fd7fd72142b "") at /mysqldata/mysql_install/install_soft/mysql-5.7.18/storage/innobase/fut/fut0lst.cc:385
#8  flst_get_prev_addr (mtr=0x7fc5a1d30360, node=0x7fd7fd72142b "") at /mysqldata/mysql_install/install_soft/mysql-5.7.18/storage/innobase/include/fut0lst.ic:168
#9  flst_remove (base=base@entry=0x7fd7fd71402e "", node2=0x7fd7fd72142b "", mtr=mtr@entry=0x7fc5a1d30360) at /mysqldata/mysql_install/install_soft/mysql-5.7.18/storage/innobase/fut/fut0lst.cc:338
#10 0x00000000010ca810 in trx_purge_remove_log_hdr (mtr=0x7fc5a1d30360, log_hdr=<optimized out>, rseg_hdr=0x7fd7fd714026 "\377\377\377\376")
    at /mysqldata/mysql_install/install_soft/mysql-5.7.18/storage/innobase/trx/trx0purge.cc:410
#11 trx_purge_truncate_rseg_history (rseg=0xedf99708, limit=limit@entry=0xedfb8ac8) at /mysqldata/mysql_install/install_soft/mysql-5.7.18/storage/innobase/trx/trx0purge.cc:604
#12 0x00000000010cffde in trx_purge_truncate_history (limit=0xedfb8ac8, view=<optimized out>) at /mysqldata/mysql_install/install_soft/mysql-5.7.18/storage/innobase/trx/trx0purge.cc:1225
#13 0x00000000010d0b53 in trx_purge_truncate () at /mysqldata/mysql_install/install_soft/mysql-5.7.18/storage/innobase/trx/trx0purge.cc:1802
#14 trx_purge (n_purge_threads=<optimized out>, batch_size=<optimized out>, truncate=<optimized out>) at /mysqldata/mysql_install/install_soft/mysql-5.7.18/storage/innobase/trx/trx0purge.cc:1900
#15 0x00000000010ab3e5 in srv_do_purge (n_total_purged=<synthetic pointer>, n_threads=4) at /mysqldata/mysql_install/install_soft/mysql-5.7.18/storage/innobase/srv/srv0srv.cc:2621
#16 srv_purge_coordinator_thread (arg=<optimized out>) at /mysqldata/mysql_install/install_soft/mysql-5.7.18/storage/innobase/srv/srv0srv.cc:2793
#17 0x00007fe9f359f744 in start_thread () from /lib64/libpthread.so.0
#18 0x00007fe9f1fefaad in clone () from /lib64/libc.so.6

So the crash is caused by the following function:

/********************************************************************//**
Reads a file address.
@return file address */
UNIV_INLINE
fil_addr_t
flst_read_addr(
/*===========*/
        const fil_faddr_t*      faddr,  /*!< in: pointer to file faddress */
        mtr_t*                  mtr)    /*!< in: mini-transaction handle */
{
        fil_addr_t      addr;

        ut_ad(faddr && mtr);

        addr.page = mtr_read_ulint(faddr + FIL_ADDR_PAGE, MLOG_4BYTES, mtr);
        addr.boffset = mtr_read_ulint(faddr + FIL_ADDR_BYTE, MLOG_2BYTES,
                                      mtr);
        ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA);
        ut_a(ut_align_offset(faddr, UNIV_PAGE_SIZE) >= FIL_PAGE_DATA);
        return(addr);
}

The assertion ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA); fails.
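
To make the failing condition concrete, here is a minimal standalone sketch of what the assertion checks. It is not InnoDB code: the FileAddr struct and the decode_addr and addr_is_valid helpers are hypothetical names introduced for illustration. The constants are redefined locally with the values InnoDB 5.7 uses (FIL_NULL is 0xFFFFFFFF, FIL_PAGE_DATA is 38), and the 6-byte address is assumed to be stored big-endian as a 4-byte page number followed by a 2-byte byte offset.

```cpp
#include <cassert>
#include <cstdint>

// Redefined locally so the sketch stands alone (values as in InnoDB 5.7).
constexpr uint32_t FIL_NULL = 0xFFFFFFFFu;  // "no page" marker
constexpr uint32_t FIL_PAGE_DATA = 38;      // first byte after the page header
constexpr unsigned FIL_ADDR_PAGE = 0;       // offset of the 4-byte page number
constexpr unsigned FIL_ADDR_BYTE = 4;       // offset of the 2-byte byte offset

// Hypothetical stand-in for fil_addr_t.
struct FileAddr {
    uint32_t page;
    uint16_t boffset;
};

// Hypothetical helper: decode a 6-byte big-endian file address.
FileAddr decode_addr(const unsigned char* faddr) {
    FileAddr a;
    a.page = (uint32_t(faddr[FIL_ADDR_PAGE]) << 24)
           | (uint32_t(faddr[FIL_ADDR_PAGE + 1]) << 16)
           | (uint32_t(faddr[FIL_ADDR_PAGE + 2]) << 8)
           |  uint32_t(faddr[FIL_ADDR_PAGE + 3]);
    a.boffset = uint16_t((unsigned(faddr[FIL_ADDR_BYTE]) << 8)
                         | unsigned(faddr[FIL_ADDR_BYTE + 1]));
    return a;
}

// The condition from the failing ut_a(): an address is acceptable if it is
// the null address, or if it points past the page header.
bool addr_is_valid(const FileAddr& a) {
    return a.page == FIL_NULL || a.boffset >= FIL_PAGE_DATA;
}
```

Under these assumptions, an address with a real page number and a byte offset below 38 is rejected, which is the state the backtrace implies was read from the undo log history list.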

How to repeat:
It is hard to repeat, but I think it can be reproduced using gdb.

I think the problem is caused by the following function not having finished:
/********************************************************************//**
Writes a file address. */
UNIV_INLINE
void
flst_write_addr(
/*============*/
        fil_faddr_t*    faddr,  /*!< in: pointer to file faddress */
        fil_addr_t      addr,   /*!< in: file address */
        mtr_t*          mtr)    /*!< in: mini-transaction handle */
{
        ut_ad(faddr && mtr);
        ut_ad(mtr_memo_contains_page_flagged(mtr, faddr,
                                             MTR_MEMO_PAGE_X_FIX
                                             | MTR_MEMO_PAGE_SX_FIX));
        ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA);
        ut_a(ut_align_offset(faddr, UNIV_PAGE_SIZE) >= FIL_PAGE_DATA);

        mlog_write_ulint(faddr + FIL_ADDR_PAGE, addr.page, MLOG_4BYTES, mtr);
        mlog_write_ulint(faddr + FIL_ADDR_BYTE, addr.boffset,
                         MLOG_2BYTES, mtr);
}

If mlog_write_ulint(faddr + FIL_ADDR_PAGE, addr.page, MLOG_4BYTES, mtr); has been executed but the next line has not, and the machine then crashes, the problem occurs.
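
The hypothesized half-finished update can be modelled with a small standalone sketch. This is only an illustration of the state described above, not InnoDB code, and it does not show that such a state can actually reach disk. RawAddr, write_page, write_boffset, passes_assertion and torn_update are hypothetical names. The constants carry the InnoDB 5.7 values, and the big-endian 4+2 byte layout is assumed.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t FIL_NULL = 0xFFFFFFFFu;  // "no page" marker
constexpr uint32_t FIL_PAGE_DATA = 38;      // first byte after the page header

// Hypothetical 6-byte file address: 4-byte page number, 2-byte byte offset.
struct RawAddr {
    unsigned char bytes[6];
};

void write_page(RawAddr& r, uint32_t page) {
    r.bytes[0] = static_cast<unsigned char>(page >> 24);
    r.bytes[1] = static_cast<unsigned char>(page >> 16);
    r.bytes[2] = static_cast<unsigned char>(page >> 8);
    r.bytes[3] = static_cast<unsigned char>(page);
}

void write_boffset(RawAddr& r, uint16_t boffset) {
    r.bytes[4] = static_cast<unsigned char>(boffset >> 8);
    r.bytes[5] = static_cast<unsigned char>(boffset);
}

uint32_t read_page(const RawAddr& r) {
    return (uint32_t(r.bytes[0]) << 24) | (uint32_t(r.bytes[1]) << 16)
         | (uint32_t(r.bytes[2]) << 8) | uint32_t(r.bytes[3]);
}

uint16_t read_boffset(const RawAddr& r) {
    return uint16_t((unsigned(r.bytes[4]) << 8) | unsigned(r.bytes[5]));
}

// Same condition as the ut_a() in flst_read_addr.
bool passes_assertion(const RawAddr& r) {
    return read_page(r) == FIL_NULL || read_boffset(r) >= FIL_PAGE_DATA;
}

// Start from the null address (FIL_NULL, 0) and apply only the first of the
// two writes, as in the scenario where the second write never happens.
RawAddr torn_update(uint32_t new_page) {
    RawAddr r{};
    write_page(r, FIL_NULL);
    write_boffset(r, 0);
    write_page(r, new_page);  // the boffset write is deliberately skipped
    return r;
}
```

In this model the address holds a real page number with a stale byte offset of 0, so the check fails. Once the second write is applied with an offset of at least 38, the check passes again.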

Suggested fix:
Fix the problem and make mysqld more robust.
[17 Sep 2018 15:39] MySQL Verification Team
Hi,

I'm not sure I understand the bug report so please clarify if you can.

1. You stated that the problem came because of CPU issues. There is no way we can solve CPU issues in code :(. The best we can do, if a problem with the CPU is discovered, is to crash, preventing wrong data from entering the database.

2. "Crashing MySQL using GDB": I'm not sure about this. We are talking about a highly concurrent system, so how do you think we can prevent a crash under gdb when you can execute arbitrary stuff anywhere in the code? As for executing flst_write_addr only partially, I don't see how that can happen in normal operation...

All best
Bogdan
[18 Sep 2018 0:26] chunyang xu
Sorry, I made a mistake. It cannot be reproduced using gdb. Maybe this issue was caused by the hardware.