MySQL Bugs: #87477: Semaphore wait crash due to long held lock during online index creation

Bug #87477	Semaphore wait crash due to long held lock during online index creation
Submitted:	18 Aug 2017 14:27	Modified:	3 Dec 2018 20:50
Reporter:	Tate McDaniel	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.7.12	OS:	Debian (8.5)
Assigned to:		CPU Architecture:	Any

Description:
Server crashed due to: 
InnoDB: ###### Diagnostic info printed to the standard error stream
2017-08-15T14:16:33.070609Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2017-08-15 16:16:33 0x7f78ae3fe700  InnoDB: Assertion failure in thread 140156296226560 in file ut0ut.cc line 921
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
14:16:33 UTC - mysqld got signal 6 ;

At the time there was an ALTER TABLE ADD INDEX operation going on.  The semaphore happened due to an update on the column that the index was being built on. 

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1325322461
--Thread 140155153766144 has waited at row0upd.cc line 2137 for 952.00 seconds the semaphore:
S-lock on RW-latch at 0x7f7504755ba8 created in file dict0dict.cc line 2673
a writer (thread id 140155058185984) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: ffffffffdffffffe
Last time read locked in file row0upd.cc line 2137
Last time write locked in file /mnt/workspace/percona-server-5.7-debian-binary/label_exp/debian-jessie-64bit/percona-server-5.7-5.7.12-5/storage/innobase/btr/b
tr0bulk.cc line 53
OS WAIT ARRAY INFO: signal count 555241020
RW-shared spins 0, rounds 610092192, OS waits 387308161
RW-excl spins 0, rounds 26362799724, OS waits 589927746
RW-sx spins 37826656, rounds 1064451859, OS waits 32365388
Spin rounds per wait: 610092192.00 RW-shared, 26362799724.00 RW-excl, 28.14 RW-sx

---TRANSACTION 31221190861, ACTIVE 952 sec updating or deleting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 175398105, OS thread handle 140155153766144, query id 11778771527 REDACTED test updating
UPDATE s3_delete_deletedImages SET `404`=`404`+1 WHERE result_image = 'REDACTED'
---TRANSACTION 31221109974, ACTIVE 15287 sec
mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
MySQL thread id 175173359, OS thread handle 140155058185984, query id 11771604710 localhost root altering table
ALTER TABLE `test`.`s3_delete_deletedImages` ADD INDEX (`404`)
Trx read view will not see trx with id >= 31221109976, sees < 31221109975

This was a latch and not a row level lock that caused this and that should not happen in an online alter.  I've looked at the code and it appears the thread hung up here: 

if (!index->is_committed()) {
/* The index->online_status may change if the index is
or was being created online, but not committed yet. It
is protected by index->lock. */
mtr_s_lock(dict_index_get_lock(index), &mtr);

where it is trying to get the index lock.  

cat /etc/mysql/my.cnf

[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = -10
malloc-lib      = /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
numa_interleave = 1     
log-error       = /var/lib/mysql/mysql-error.log

[mysqld]
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /var/lib/mysql

ignore-db-dir   = mysqltmp
tmpdir          = /var/lib/mysql/mysqltmp

lc-messages-dir = /usr/share/mysql
ignore-db-dir   = lost+found

skip-external-locking
log_error_verbosity     = 2
bind-address            = 0.0.0.0

table_definition_cache  = 20000
table_open_cache        = 20000
thread_cache_size       = 4000
wait_timeout            = 61

server-id = 111111
auto-increment-increment = 1
auto-increment-offset = 1

# MyISAM #
key-buffer-size                = 16M

# SAFETY #
max-allowed-packet             = 16M
max-connect-errors             = 1000000
sysdate-is-now                 = 1
innodb-strict-mode             = 0
max_execution_time=300000

# BINARY LOGGING #
binlog_format                  = ROW
log-bin                        = /var/lib/mysql/mysql-bin
log-slave-updates              = ON
expire-logs-days               = 5
max_binlog_size                = 750M
sync-binlog                    = 0 

# CACHES AND LIMITS #
tmp-table-size                 = 384M
max-heap-table-size            = 384M
query-cache-type               = 0
query-cache-size               = 0
max-connections                = 12000
thread-cache-size              = 4000
open-files-limit               = 65535
table-definition-cache         = 20000
table-open-cache               = 20000

thread_stack            = 256K

# INNODB #
innodb_stats_on_metadata        = OFF
innodb-flush-method             = O_DIRECT
innodb-log-files-in-group       = 2
innodb-log-file-size            = 4G
innodb-file-per-table           = 1
innodb_buffer_pool_instances    = 16
innodb_read_io_threads          = 12
innodb_write_io_threads         = 12
innodb_lock_wait_timeout        = 10
innodb_io_capacity              = 4000 
innodb_flush_log_at_trx_commit  = 2 
innodb_old_blocks_time          = 1000
innodb_file_format_max          = Barracuda
innodb_file_format              = Barracuda
innodb_autoinc_lock_mode        = 2
innodb_read_ahead_threshold     = 0

innodb_adaptive_hash_index_parts   = 16
innodb_buffer_pool_load_at_startup = 1
innodb_buffer_pool_dump_pct        = 50
innodb_flush_neighbors             = 0
innodb-buffer-pool-size            = 300G

# LOGGING #
log-error                      = /var/lib/mysql/mysql-error.log
log-queries-not-using-indexes  = 0
long_query_time                = 1
slow-query-log                 = 0
slow-query-log-file            = /var/lib/mysql/mysql-slow.log

# RELAY #
#skip-slave-start
#read-only
relay-log                 = mysql-test01-relay-bin
report-host               = mysql-test.test
relay_log_info_repository = TABLE
relay_log_recovery        = ON

[mysqldump]
quick
quote-names
max_allowed_packet      = 32M

!includedir /etc/mysql/conf.d/

How to repeat:
Unable to repeat.

Providing as much detail as possible.

DML stopped here:

Thread 134 (Thread 0x7f9a340ad700 (LWP 160092)):
#0  in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  in wait at ./storage/innobase/os/os0event.cc:165
#2  in os_event::wait_low at ./storage/innobase/os/os0event.cc:335
#3  in sync_array_wait_event at ./storage/innobase/sync/sync0arr.cc:475
#4  in rw_lock_s_lock_spin at ./storage/innobase/sync/sync0rw.cc:425
#5  in rw_lock_s_lock_func at ./storage/innobase/include/sync0rw.ic:433
#6  in pfs_rw_lock_s_lock_func at ./storage/innobase/include/sync0rw.ic:796
#7  in s_lock at ./storage/innobase/include/mtr0mtr.ic:244
#8  in row_ins_sec_index_entry_low at ./storage/innobase/row/row0ins.cc:2914
#9  in row_ins_sec_index_entry at ./storage/innobase/row/row0ins.cc:3385
#10 in row_ins_index_entry at ./storage/innobase/row/row0ins.cc:3431
#11 in row_ins_index_entry_step at ./storage/innobase/row/row0ins.cc:3579
#12 in row_ins at ./storage/innobase/row/row0ins.cc:3721
#13 in row_ins_step at ./storage/innobase/row/row0ins.cc:3906
#14 in row_insert_for_mysql_using_ins_graph at ./storage/innobase/row/row0mysql.cc:1733
#15 in ha_innobase::write_row at ./storage/innobase/handler/ha_innodb.cc:7494
#16 in handler::ha_write_row at ./sql/handler.cc:7923
#17 in write_record at ./sql/sql_insert.cc:1871
#18 in Sql_cmd_insert::mysql_insert at ./sql/sql_insert.cc:769
#19 in Sql_cmd_insert::execute at ./sql/sql_insert.cc:3103
#20 in mysql_execute_command at ./sql/sql_parse.cc:3606
#21 in mysql_parse at ./sql/sql_parse.cc:5611
#22 in Query_log_event::do_apply_event at ./sql/log_event.cc:4595
#23 in Log_event::apply_event at ./sql/log_event.cc:3242
#24 in apply_event_and_update_pos at ./sql/rpl_slave.cc:4702
#25 in exec_relay_log_event at ./sql/rpl_slave.cc:5212
#26 in handle_slave_sql at ./sql/rpl_slave.cc:7320

----
/* Ensure that we acquire index->lock when inserting into an
index with index->online_status == ONLINE_INDEX_COMPLETE, but
could still be subject to rollback_inplace_alter_table().
This prevents a concurrent change of index->online_status.
The memory object cannot be freed as long as we have an open
reference to the table, or index->table->n_ref_count > 0. */
const bool check = !index->is_committed();
if (check) {
DEBUG_SYNC_C("row_ins_sec_index_enter");
if (mode == BTR_MODIFY_LEAF) {
search_mode |= BTR_ALREADY_S_LATCHED;
mtr_s_lock(dict_index_get_lock(index), &mtr);                      <------------------------------
} else {
mtr_sx_lock(dict_index_get_lock(index), &mtr);
}

if (row_log_online_op_try(
   index, entry, thr_get_trx(thr)->id)) {
goto func_exit;
}
}

See: https://bugs.mysql.com/bug.php?id=82940

I' setting this one as duplicate of #82940