Description:
We run Percona Server 5.6.38-rel83.0 on a CentOS 6 x86_64 server with 96 GB RAM, about 200K InnoDB tables (that's not a typo) and a handful of MyISAM ones.
About once a week the MySQL server crashes, leaving this backtrace:
------------------------------------------------------------------------
2018-02-12 21:19:27 7fbdb74a2700 InnoDB: Assertion failure in thread 140452800636672 in file ha_innodb.cc line 12153
InnoDB: Failing assertion: index->table->stat_initialized
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
02:19:28 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/
key_buffer_size=33554432
read_buffer_size=2097152
max_used_connections=1210
max_threads=1502
thread_count=164
connection_count=164
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6205868 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7fbd7c8c7000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fbdb74a1d30 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x8db9bc]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x65b821]
/lib64/libpthread.so.0(+0xf7e0)[0x7fd1572917e0]
/lib64/libc.so.6(gsignal+0x35)[0x7fd1554ca495]
/lib64/libc.so.6(abort+0x175)[0x7fd1554cbc75]
/usr/sbin/mysqld[0x994015]
/usr/sbin/mysqld[0x9984f5]
/usr/sbin/mysqld(_ZN7handler7ha_openEP5TABLEPKcii+0x33)[0x59e663]
/usr/sbin/mysqld(_Z21open_table_from_shareP3THDP11TABLE_SHAREPKcjjjP5TABLEb+0x694)[0x763d74]
/usr/sbin/mysqld(_Z10open_tableP3THDP10TABLE_LISTP18Open_table_context+0x1116)[0x690d56]
/usr/sbin/mysqld(_Z11open_tablesP3THDPP10TABLE_LISTPjjP19Prelocking_strategy+0x6c5)[0x698b95]
/usr/sbin/mysqld(_Z30open_normal_and_derived_tablesP3THDP10TABLE_LISTj+0x51)[0x699471]
/usr/sbin/mysqld[0x55e324]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1b9e)[0x6df10e]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x5b8)[0x6e4848]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x100e)[0x6e5fee]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x1a2)[0x6b2682]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x6b2720]
/lib64/libpthread.so.0(+0x7aa1)[0x7fd157289aa1]
/lib64/libc.so.6(clone+0x6d)[0x7fd155580bcd]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fbd8c880010): is an invalid pointer
Connection ID (thread ID): 32018292
Status: NOT_KILLED
------------------------------------------------------------------------
Fortunately the server restarts when this happens, but each crash causes roughly 20 minutes of downtime, which is a real problem for us. The my.cnf used on that server is attached.
I'm not sure what else I can provide to make this easier to debug. I first reported the issue on Percona's bug tracker at https://jira.percona.com/browse/PS-3826 but did not get an answer; I don't think the bug is Percona-specific.
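Several mysqld frames in the backtrace above appear as bare addresses with no symbol name (for example 0x994015 and 0x9984f5, the two frames just below abort). In case it helps with triage, here is a minimal Python sketch that pulls those unresolved addresses out of a pasted trace so they can be fed to a symbol resolver such as addr2line against a binary with matching debug symbols. The function name `unresolved_frames` and the sample text are illustrative only, and the regular expression assumes the frame layout shown in this report.

```python
import re

# Matches frames like "/usr/sbin/mysqld(symbol+0x2c)[0x8db9bc]"
# as well as symbol-less frames like "/usr/sbin/mysqld[0x994015]".
FRAME_RE = re.compile(
    r"^(?P<binary>/\S+?)(?:\((?P<symbol>[^)]*)\))?\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)

def unresolved_frames(trace_text, binary="/usr/sbin/mysqld"):
    """Return addresses of frames in `binary` that carry no symbol name."""
    addrs = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m and m.group("binary") == binary and not m.group("symbol"):
            addrs.append(m.group("addr"))
    return addrs

# A few frames copied from the backtrace in this report.
sample = """\
/lib64/libc.so.6(abort+0x175)[0x7fd1554cbc75]
/usr/sbin/mysqld[0x994015]
/usr/sbin/mysqld[0x9984f5]
/usr/sbin/mysqld(_ZN7handler7ha_openEP5TABLEPKcii+0x33)[0x59e663]
"""
print(unresolved_frames(sample))
```

The resulting addresses only become meaningful once resolved against the exact binary that crashed, so the symbols must come from the same build.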
Thanks! My my.cnf follows, for reference:
------------------------------------------------------------------------
[mysqld]
audit_log_policy = NONE
auto-increment-increment = 2
auto-increment-offset = 2
bind_address = 0.0.0.0
binlog-format = ROW
datadir = /var/lib/mysql/
expand_fast_index_creation = 1
expire_logs_days = 7
explicit_defaults_for_timestamp = TRUE
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_size = 4030554K
innodb_change_buffering = inserts
innodb_data_file_path = ibdata1:256M;ibdata2:16M:autoextend
innodb_data_home_dir = /var/lib/mysql
innodb_file_format = Barracuda
innodb_file_format_max = Barracuda
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_log_buffer_size = 128M
innodb_log_file_size = 100M
innodb_log_group_home_dir = /var/lib/mysql
innodb_print_all_deadlocks = on
join_buffer_size = 64M
key_buffer_size = 32M
log-bin = /var/lib/mysql/mysql-bin
log-error = /var/lib/mysql/mysql.err
log-queries-not-using-indexes = 1
log-slave-updates
log-slow-admin-statements = 1
long_query_time = 10
master-info-file = /var/lib/mysql/mysql-master.info
max_allowed_packet = 128M
max_connections = 1500
max_heap_table_size = 256M
max_heap_table_size = 4294967295
min_examined_row_limit = 1
myisam_sort_buffer_size = 32M
net_read_timeout = 600
net_write_timeout = 600
open_files_limit = 65536
performance_schema = off
port = 3306
query_cache_size = 0
query_response_time_range_base = 10
query_response_time_stats = 1
read_buffer_size = 2M
read_rnd_buffer_size = 1024M
relay-log-index = /var/lib/mysql/mysql-relay-bin.index
relay-log-info-file = /var/lib/mysql/mysql-relay-log.info
relay_log_info_repository = TABLE
relay_log_recovery = ON
relay-log = /var/lib/mysql/mysql-relay-bin
report-host = nxapi-db2
secure_file_priv = '/var/mysql_outfile/'
server-id = 837
skip-external-locking = 1
slave-parallel-workers = 4
slave_pending_jobs_size_max = 33554432
slow_query_log = 0
slow_query_log_file = /var/lib/mysql/mysql.slow
socket = /var/run/mysqld/mysqld.sock
sort_buffer_size = 2M
sql-mode = "NO_ENGINE_SUBSTITUTION"
sync-binlog = 1
table_definition_cache = 800
table_open_cache = 4000
thread_cache_size = 8
tmpdir = /home/mysql/tmp
tmp_table_size = 256M
transaction-isolation = READ-COMMITTED
wait_timeout = 600
------------------------------------------------------------------------
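One thing visible in the config above: max_heap_table_size is set twice, with different values. As far as I know, when an option is repeated in an option file the last occurrence takes effect. The following is a minimal Python sketch for spotting such repeats. It makes two simplifying assumptions: it ignores which [section] an option belongs to, and it treats dashes and underscores in option names as equivalent. The function name `duplicate_options` is illustrative only.

```python
from collections import defaultdict

def duplicate_options(cnf_text):
    """Map each option that appears more than once to its values, in file order."""
    seen = defaultdict(list)
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and [section] headers.
        if not line or line.startswith(("#", ";", "[")):
            continue
        name, _, value = line.partition("=")
        # Normalise "log-bin" and "log_bin" to the same key.
        seen[name.strip().replace("-", "_")].append(value.strip())
    return {k: v for k, v in seen.items() if len(v) > 1}

# A few lines copied from the my.cnf in this report.
sample = """\
[mysqld]
max_connections = 1500
max_heap_table_size = 256M
max_heap_table_size = 4294967295
tmp_table_size = 256M
"""
print(duplicate_options(sample))
```

Options without a value (such as log-slave-updates) are recorded with an empty string, so a flag repeated by accident would be reported as well.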
How to repeat:
I do not know how to reproduce the problem. We run other database servers with the same configuration and they do not crash, but they host fewer tables and are less busy.