Bug #87476 Semaphore wait crash due to long held lock during online index creation
Submitted: 18 Aug 2017 14:27 Modified: 23 Aug 2017 10:32
Reporter: Tate McDaniel Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.7.12 OS:Debian (8.5)
Assigned to: CPU Architecture:Any

[18 Aug 2017 14:27] Tate McDaniel
Description:
Server crashed due to: 
InnoDB: ###### Diagnostic info printed to the standard error stream
2017-08-15T14:16:33.070609Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2017-08-15 16:16:33 0x7f78ae3fe700  InnoDB: Assertion failure in thread 140156296226560 in file ut0ut.cc line 921
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
14:16:33 UTC - mysqld got signal 6 ;

At the time there was an ALTER TABLE ADD INDEX operation going on.  The semaphore happened due to an update on the column that the index was being built on. 

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1325322461
--Thread 140155153766144 has waited at row0upd.cc line 2137 for 952.00 seconds the semaphore:
S-lock on RW-latch at 0x7f7504755ba8 created in file dict0dict.cc line 2673
a writer (thread id 140155058185984) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: ffffffffdffffffe
Last time read locked in file row0upd.cc line 2137
Last time write locked in file /mnt/workspace/percona-server-5.7-debian-binary/label_exp/debian-jessie-64bit/percona-server-5.7-5.7.12-5/storage/innobase/btr/b
tr0bulk.cc line 53
OS WAIT ARRAY INFO: signal count 555241020
RW-shared spins 0, rounds 610092192, OS waits 387308161
RW-excl spins 0, rounds 26362799724, OS waits 589927746
RW-sx spins 37826656, rounds 1064451859, OS waits 32365388
Spin rounds per wait: 610092192.00 RW-shared, 26362799724.00 RW-excl, 28.14 RW-sx

---TRANSACTION 31221190861, ACTIVE 952 sec updating or deleting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 175398105, OS thread handle 140155153766144, query id 11778771527 REDACTED test updating
UPDATE s3_delete_deletedImages SET `404`=`404`+1 WHERE result_image = 'REDACTED'
---TRANSACTION 31221109974, ACTIVE 15287 sec
mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
MySQL thread id 175173359, OS thread handle 140155058185984, query id 11771604710 localhost root altering table
ALTER TABLE `test`.`s3_delete_deletedImages` ADD INDEX (`404`)
Trx read view will not see trx with id >= 31221109976, sees < 31221109975

This was a latch and not a row level lock that caused this and that should not happen in an online alter.  I've looked at the code and it appears the thread hung up here: 

if (!index->is_committed()) {
/* The index->online_status may change if the index is
or was being created online, but not committed yet. It
is protected by index->lock. */
mtr_s_lock(dict_index_get_lock(index), &mtr);

where it is trying to get the index lock.  

cat /etc/mysql/my.cnf

[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = -10
malloc-lib      = /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
numa_interleave = 1     
log-error       = /var/lib/mysql/mysql-error.log

[mysqld]
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /var/lib/mysql

ignore-db-dir   = mysqltmp
tmpdir          = /var/lib/mysql/mysqltmp

lc-messages-dir = /usr/share/mysql
ignore-db-dir   = lost+found

skip-external-locking
log_error_verbosity     = 2
bind-address            = 0.0.0.0

table_definition_cache  = 20000
table_open_cache        = 20000
thread_cache_size       = 4000
wait_timeout            = 61

server-id = 111111
auto-increment-increment = 1
auto-increment-offset = 1

# MyISAM #
key-buffer-size                = 16M

# SAFETY #
max-allowed-packet             = 16M
max-connect-errors             = 1000000
sysdate-is-now                 = 1
innodb-strict-mode             = 0
max_execution_time=300000

# BINARY LOGGING #
binlog_format                  = ROW
log-bin                        = /var/lib/mysql/mysql-bin
log-slave-updates              = ON
expire-logs-days               = 5
max_binlog_size                = 750M
sync-binlog                    = 0 

# CACHES AND LIMITS #
tmp-table-size                 = 384M
max-heap-table-size            = 384M
query-cache-type               = 0
query-cache-size               = 0
max-connections                = 12000
thread-cache-size              = 4000
open-files-limit               = 65535
table-definition-cache         = 20000
table-open-cache               = 20000

thread_stack            = 256K

# INNODB #
innodb_stats_on_metadata        = OFF
innodb-flush-method             = O_DIRECT
innodb-log-files-in-group       = 2
innodb-log-file-size            = 4G
innodb-file-per-table           = 1
innodb_buffer_pool_instances    = 16
innodb_read_io_threads          = 12
innodb_write_io_threads         = 12
innodb_lock_wait_timeout        = 10
innodb_io_capacity              = 4000 
innodb_flush_log_at_trx_commit  = 2 
innodb_old_blocks_time          = 1000
innodb_file_format_max          = Barracuda
innodb_file_format              = Barracuda
innodb_autoinc_lock_mode        = 2
innodb_read_ahead_threshold     = 0

innodb_adaptive_hash_index_parts   = 16
innodb_buffer_pool_load_at_startup = 1
innodb_buffer_pool_dump_pct        = 50
innodb_flush_neighbors             = 0
innodb-buffer-pool-size            = 300G

# LOGGING #
log-error                      = /var/lib/mysql/mysql-error.log
log-queries-not-using-indexes  = 0
long_query_time                = 1
slow-query-log                 = 0
slow-query-log-file            = /var/lib/mysql/mysql-slow.log

# RELAY #
#skip-slave-start
#read-only
relay-log                 = mysql-test01-relay-bin
report-host               = mysql-test.test
relay_log_info_repository = TABLE
relay_log_recovery        = ON

[mysqldump]
quick
quote-names
max_allowed_packet      = 32M

!includedir /etc/mysql/conf.d/

How to repeat:
Unable to repeat.

Providing as much detail as possible.
[18 Aug 2017 14:29] Tate McDaniel
This submission was duplicated by accident.  Please close this one and see identical bug id=87477
[23 Aug 2017 10:32] MySQL Verification Team
Hello Tate McDaniel,

Thank you for taking the time to report a problem.  Unfortunately you are not using a current version of the product you reported a problem with -- MySQL 5.7.12 is old and many bugs fixed since then and the problem might already be fixed. Please download MySQL 5.7.19 from http://www.mysql.com/downloads/ 

If you are able to reproduce the bug with one of the latest versions, please provide reproducible test case and change the version on this bug report to the version you tested and change the status back to "Open".  Again, thank you for your continued support of MySQL.

Thanks,
Umesh