Bug #88523 InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server
Submitted: 16 Nov 2017 14:13    Modified: 26 Sep 2018 13:14
Reporter: Kevin F
Status: Can't repeat    Impact on me: None
Category: MySQL Server: InnoDB storage engine    Severity: S3 (Non-critical)
Version: 5.7.16/5.7.22    OS: Microsoft Windows
Assigned to:    CPU Architecture: Any

[16 Nov 2017 14:13] Kevin F
Description:
MySQL server is now restarting itself every 18 minutes or so with the following in the error log:
InnoDB: ###### Diagnostic info printed to the standard error stream
2017-11-16T13:18:39.650036Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2017-11-16 13:18:39 0x1bcc  InnoDB: Assertion failure in thread 7116 in file ut0ut.cc line 916
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
13:18:39 UTC - mysqld got exception 0x80000003 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=34
max_threads=151
thread_count=6
connection_count=6
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68011 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
7ff7eeb7e262    mysqld.exe!my_sigabrt_handler()[my_thr_init.c:449]
7ff7eef28489    mysqld.exe!raise()[winsig.c:587]
7ff7eef27380    mysqld.exe!abort()[abort.c:82]
7ff7eec8f178    mysqld.exe!ut_dbg_assertion_failed()[ut0dbg.cc:67]
7ff7eec8f391    mysqld.exe!ib::fatal::~fatal()[ut0ut.cc:916]
7ff7eebfd43d    mysqld.exe!srv_error_monitor_thread()[srv0srv.cc:1724]
7ffc55d613d2    KERNEL32.DLL!BaseThreadInitThunk()
7ffc57d854e4    ntdll.dll!RtlUserThreadStart()
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

There appears to be a transaction which is being rolled back when the server restarts. The following can be found in information_schema.innodb_trx\G:

                    trx_id: 309725166
                 trx_state: ROLLING BACK
               trx_started: 2017-11-16 13:30:18
     trx_requested_lock_id: NULL
          trx_wait_started: NULL
                trx_weight: 17856930
       trx_mysql_thread_id: 0
                 trx_query: NULL
       trx_operation_state: NULL
         trx_tables_in_use: 0
         trx_tables_locked: 1
          trx_lock_structs: 1
     trx_lock_memory_bytes: 1136
           trx_rows_locked: 0
         trx_rows_modified: 17856929
   trx_concurrency_tickets: 0
       trx_isolation_level: REPEATABLE READ
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
 trx_adaptive_hash_latched: 0
 trx_adaptive_hash_timeout: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0

Is it possible to stop this transaction being rolled back?
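
For readers in the same situation: InnoDB resumes the rollback of an uncommitted transaction after every restart, so the rollback itself cannot simply be cancelled, but its progress can be watched. A query along these lines (a sketch against MySQL 5.7's information_schema; rerun it periodically and compare trx_rows_modified, which shrinks toward 0 as the undo is applied) shows how far along the rollback is:

```sql
-- Sketch: monitor the progress of a background rollback.
-- trx_rows_modified decreases toward 0 as undo records are applied.
SELECT trx_id,
       trx_state,
       trx_started,
       trx_rows_modified
FROM information_schema.innodb_trx
WHERE trx_state = 'ROLLING BACK';
```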

How to repeat:
I cannot provide instructions on how to recreate this elsewhere.
[17 Nov 2017 16:40] MySQL Verification Team
Hi!

Thank you for your bug report. However, this is most likely not a bug. We have seen many situations like this one, and in the vast majority of cases it turned out that some transaction was waiting on end-user input, on a long-running statement, on a network (or other) resource, or similar. This happens because a read/write transaction holds many locks until it is committed or rolled back.

If you want to see which connection is the culprit for the locks being held, then when you notice a wait longer than 5 minutes, run:

SHOW ENGINE INNODB STATUS;

and you will see which connection is holding the locks. Then you should determine what that transaction is actually waiting on. A connection can be in the "Sleep" state, in which case the problem lies in the application, which has made the transaction's duration dependent on end-user input.
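
The advice above can also be expressed as a query. This sketch (MySQL 5.7, assuming the PROCESS privilege) joins each open InnoDB transaction to its client connection, showing who holds locks and for how long; background transactions have trx_mysql_thread_id = 0 and show NULL connection details:

```sql
-- Sketch: map each open InnoDB transaction to its client connection.
SELECT t.trx_id,
       t.trx_state,
       TIMESTAMPDIFF(SECOND, t.trx_started, NOW()) AS trx_age_seconds,
       t.trx_rows_locked,
       p.id      AS connection_id,
       p.user,
       p.host,
       p.command,
       p.time    AS seconds_in_state
FROM information_schema.innodb_trx t
LEFT JOIN information_schema.processlist p
       ON p.id = t.trx_mysql_thread_id
ORDER BY t.trx_started;
```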
[18 Dec 2017 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[25 Aug 2018 22:41] MySQL Verification Team
https://bugs.mysql.com/bug.php?id=92193 marked as duplicate of this one.
[26 Aug 2018 3:45] Arpan Roy
At first the issue was observed once every 2 or 3 days. Yesterday the issue was observed 4 times in 5 hours. Why the sudden increase? Any thoughts? Was any solution provided for this issue so that it will not recur?
[27 Aug 2018 3:23] Arpan Roy
Should I reinstall MySQL?
[27 Aug 2018 12:07] MySQL Verification Team
No need to reinstall anything. You just need to use the InnoDB status output to see which transaction and which connection is holding locks. Then you should examine your application and see what it is waiting on.
[26 Sep 2018 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[8 Apr 2019 8:10] MySQL Verification Team
https://bugs.mysql.com/bug.php?id=94930 marked as duplicate of this one.
[9 Dec 2019 13:12] MySQL Verification Team
The following bug is also a duplicate of this one:

https://bugs.mysql.com/bug.php?id=97911
[29 Apr 8:57] Cedric Tabin
Hello,

We run into this exact bug on a regular basis (MySQL 5.7.27), even under really low load. For context, we have an application server (GlassFish/Payara) that manages a pool of connections.

What we have seen is that *sometimes* (we are unable to find a pattern for when it occurs) some semaphores appear and then the database is almost completely locked. Even a restart of the database (with the application server shut down, so no connection is actually open to the database) does not solve the problem: the semaphore is still there, doing *seemingly* nothing other than holding some locks.

The only way we have found to resolve this problem is:
- shut down mysql (kill it, because otherwise the semaphore does not let the service stop)
- start the database with innodb_force_recovery=3 => no recovery, and no semaphore is created
- dump the crashed tables => maybe a hint here: it is always the same table, which holds some millions of records with a LINESTRING column that has a SPATIAL INDEX on it
- drop the table
- restart mysql normally => the semaphore disappears
- reimport the table
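
The procedure above hinges on innodb_force_recovery. A minimal my.cnf fragment for the second step might look like this (a sketch: level 3 starts the server without rolling back transactions after crash recovery, and the manual page linked in the crash log warns that the setting must be removed again before normal operation):

```ini
# my.cnf fragment -- temporary, for the dump-and-drop procedure only.
# innodb_force_recovery = 3 skips transaction rollback after crash
# recovery; remove this line before the final normal restart.
[mysqld]
innodb_force_recovery = 3
```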

As of today, our database has again been holding a semaphore for two days and does not release it (there was no load at all during the night), even after the application server is down (hence mysql has no more connections). Even when the application server was up, SHOW PROCESSLIST showed only sleeping threads.

At this point, even with the SHOW ENGINE INNODB STATUS output (see below), I am unable to link a transaction to a thread id. Even so, I killed all the threads shown in the processlist, and still the semaphore remains.

=====================================
2020-04-29 10:50:43 0x7f886c3d9700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 10 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 2 srv_active, 0 srv_shutdown, 53 srv_idle
srv_master_thread log flush and writes: 54
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1542927
--Thread 140224065615616 has waited at srv0srv.cc line 1981 for 274  seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x555e3e4abfa8 created in file dict0dict.cc line 1183
a writer (thread id 140224065615616) has reserved it in mode  wait exclusive
number of readers 3, waiters flag 0, lock_word: fffffffffffffffd
Last time read locked in file row0purge.cc line 862
Last time write locked in file /export/home/pb2/build/sb_0-34537258-1560178574.29/release/mysql-5.7.27/storage/innobase/srv/srv0srv.cc line 1981
--Thread 140224057222912 has waited at row0purge.cc line 462 for 274  seconds the semaphore:
SX-lock on RW-latch at 0x7f886854f390 created in file dict0dict.cc line 2737
a writer (thread id 140223971325696) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file gis0sea.cc line 135
Last time write locked in file /export/home/pb2/build/sb_0-34537258-1560178574.29/release/mysql-5.7.27/storage/innobase/btr/btr0cur.cc line 974
--Thread 140224048830208 has waited at row0purge.cc line 462 for 274  seconds the semaphore:
SX-lock on RW-latch at 0x7f886854f390 created in file dict0dict.cc line 2737
a writer (thread id 140223971325696) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file gis0sea.cc line 135
Last time write locked in file /export/home/pb2/build/sb_0-34537258-1560178574.29/release/mysql-5.7.27/storage/innobase/btr/btr0cur.cc line 974
OS WAIT ARRAY INFO: signal count 806248
RW-shared spins 0, rounds 627, OS waits 106
RW-excl spins 0, rounds 112, OS waits 54
RW-sx spins 1743336, rounds 50344804, OS waits 1527452
Spin rounds per wait: 627.00 RW-shared, 112.00 RW-excl, 28.88 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 4735765
Purge done for trx's n:o < 4576395 undo n:o < 966042 state: running
History list length 651
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421699342314232, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421699342313312, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
46364 OS file reads, 68 OS file writes, 20 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 3078, seg size 3080, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34673, node heap has 9 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 9 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 183794286284
Log flushed up to   183794286284
Pages flushed up to 183794286284
Last checkpoint at  183794286275
0 pending log flushes, 0 pending chkp writes
18 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137428992
Dictionary memory allocated 274317
Buffer pool size   8191
Free buffers       1
Database pages     8171
Old database pages 3035
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 1471, not young 900778
0.00 youngs/s, 0.00 non-youngs/s
Pages read 40390, created 34, written 41
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 8171, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=12507, Main thread ID=140224065615616, state: enforcing dict cache limit
Number of rows inserted 0, updated 0, deleted 1, read 77284
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
[29 Apr 12:23] MySQL Verification Team
Hi Mr. Tabin,

We do not think that you ran into any bug here, since there was no assert due to a wait longer than innodb_lock_wait_timeout.

What you present is only a problem that, in the majority of cases, stems from imperfections in entity design and in the application, which either waits for end-user input or for some other processing.
[29 Apr 12:47] MySQL Verification Team
Hi Mr. Tabin,

There are also background threads in InnoDB SE.

Those can wait on latches too. This can be improved by better configuration, better hardware, and better maintenance, such as defragmentation or similar.
[28 Aug 12:06] jingbo zhao
Shall we add a parameter to control the automatic crash timeout, instead of using the hard-coded 600 seconds?
[28 Aug 12:47] MySQL Verification Team
Hi Mr. F,

That value is not settable.
[28 Aug 13:13] MySQL Verification Team
Hi Mr. F,

Just for your information, there is an internal feature request filed to make this value configurable, but it is unknown if and when it will be implemented.