Bug #82940 mysqld crashes itself when creating index
Submitted: 10 Sep 2016 14:44 Modified: 3 Dec 2018 20:53
Reporter: Tim Smith
Status: Verified
Category: MySQL Server: DDL  Severity: S1 (Critical)
Version: 5.7, 8.0  OS: Any (Windows Server 2012 R2)
Assigned to:  CPU Architecture: Any

[10 Sep 2016 14:44] Tim Smith
Description:
mysqld crashes itself when creating an index on a table while concurrent insert operations are running.

How to repeat:
1. Create an index on a table using ALGORITHM=INPLACE (the statement form is sketched after these steps).
2. After the merge sort phase, mysqld crashes.
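For illustration, the operation involved is an online (in-place) index build of roughly this form; the table and column names here are hypothetical, not taken from this report:

  -- Build a secondary index in place while the table stays writable.
  ALTER TABLE t
    ADD INDEX idx_example (col1),
    ALGORITHM = INPLACE;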

2016-09-10T05:20:53.163031Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 3140 has waited at row0ins.cc line 2912 for 241.00 seconds the semaphore:
S-lock on RW-latch at 000000BA400DEDF8 created in file dict0dict.cc line 2687
a writer (thread id 5020) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: dffffffe
Last time read locked in file row0ins.cc line 2912
Last time write locked in file G:\ade\build\sb_0-19699473-1468326568.64\mysql-5.7.14\storage\innobase\btr\btr0bulk.cc line 53
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 1, pwrites 0

=====================================
2016-09-10 05:21:02 0x66c INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 37 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 9889 srv_active, 0 srv_shutdown, 4 srv_idle
srv_master_thread log flush and writes: 9893
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1210991
--Thread 3140 has waited at row0ins.cc line 2912 for 250.00 seconds the semaphore:
S-lock on RW-latch at 000000BA400DEDF8 created in file dict0dict.cc line 2687
a writer (thread id 5020) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: dffffffe
Last time read locked in file row0ins.cc line 2912
Last time write locked in file G:\ade\build\sb_0-19699473-1468326568.64\mysql-5.7.14\storage\innobase\btr\btr0bulk.cc line 53

...

InnoDB: ###### Diagnostic info printed to the standard error stream
2016-09-10T05:32:45.831540Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2016-09-10 13:32:45 0x43c  InnoDB: Assertion failure in thread 1084 in file ut0ut.cc line 916
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
05:32:45 UTC - mysqld got exception 0x80000003 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
key_buffer_size=8388608
read_buffer_size=65536
max_used_connections=2
max_threads=151
thread_count=3
connection_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 58347 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
7ff7f4f8bad2    mysqld.exe!my_errno()
7ff7f5337979    mysqld.exe!my_wildcmp_mb()
7ff7f5336870    mysqld.exe!my_wildcmp_mb()
7ff7f508dd38    mysqld.exe!?reserve@?$vector@EV?$allocator@E@std@@@std@@QEAAX_K@Z()
7ff7f508df51    mysqld.exe!?reserve@?$vector@EV?$allocator@E@std@@@std@@QEAAX_K@Z()
7ff7f500b0fd    mysqld.exe!?reserve@?$vector@EV?$allocator@E@std@@@std@@QEAAX_K@Z()
7ff9933313d2    KERNEL32.DLL!BaseThreadInitThunk()
7ff9959354e4    ntdll.dll!RtlUserThreadStart()

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

When mysqld is started again, besides the crash recovery process, the following warning is also printed:

2016-09-10T05:36:22.278914Z 0 [Warning] InnoDB: Purge reached the head of the history list, but its length is still reported as 2314242! Make a detailed bug report, and submit it to http://bugs.mysql.com

Suggested fix:
Please fix it.
[24 Oct 2016 13:12] Miguel Solorzano
Please try version 5.7.16. Thanks.
[25 Oct 2016 14:29] Shane Bester
>
>  mysqld crashes itself when creating an index on a table while concurrent insert operations are running.
>

What is the table structure, and what index is being created?
How many rows are in the table?
[22 Mar 2018 21:59] Bogdan Kecman
Hi,

A few questions:
 - are you able to reproduce this problem *every* time?
 - you say "add index", but you haven't shown the table structure - which index should be added?
 - if I understand correctly, you reproduce it with:
1. create table t1 ...
2. insert 200m records into table t1
3. alter table t1 add index `newIdx ` (`d`, `f`) ALGORITHM = INPLACE;

and when the ALTER is supposed to finish, your mysqld crashes?

thanks
Bogdan
[23 Apr 2018 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[1 Dec 2018 7:23] Shane Bester
See: https://bugs.mysql.com/bug.php?id=87477
[1 Dec 2018 7:59] Shane Bester
To repeat this, I assume one has to have a continuous stream of inserts coming into the table.

Then add the index (while it builds, check the InnoDB status for long semaphore waits at innobase\btr\btr0bulk.cc line 53).

Drop the index.

Repeat the add/drop index while the inserts keep adding to the table size.
The larger the table is, the longer those line 53 semaphore waits should be?
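A rough sketch of that loop, assuming a hypothetical table t with an indexable column q and a payload column (illustration only; the actual testcase is attached later in this report):

  -- Session 1: keep a continuous stream of inserts running (loop this from the client).
  INSERT INTO t (q, payload) VALUES (uuid(), repeat(uuid(), 10));

  -- Session 2: repeatedly add and drop the index while session 1 keeps inserting.
  ALTER TABLE t ADD INDEX idx (q), ALGORITHM = INPLACE, LOCK = NONE;
  SHOW ENGINE INNODB STATUS;   -- watch for long semaphore waits at btr0bulk.cc line 53
  ALTER TABLE t DROP INDEX idx;
  -- Repeat; the larger the table grows, the longer the waits should last.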
[1 Dec 2018 9:47] Shane Bester
In my sample testcase, after a while I see things like:

mysql> show processlist;
+----+------+-----------+------+---------+------+----------------+--------------------------------------------------------------
| Id | User | Host      | db   | Command | Time | State          | Info
+----+------+-----------+------+---------+------+----------------+--------------------------------------------------------------
|  2 | roto | localhost | test | Query   |   14 | update         | insert into t(c,d,e,f,m,b,l) values(uuid(),uuid(),uuid(),uuid
|  5 | root | localhost | test | Query   |  464 | altering table | alter table t add index idx(q), algorithm=inplace, lock=none
|  6 | roto | localhost | test | Query   |    0 | starting       | show processlist
+----+------+-----------+------+---------+------+----------------+--------------------------------------------------------------

+---------+--------------------+-------------------------+
| inserts | avg inserts/second | max time for one insert |
+---------+--------------------+-------------------------+
| 4100000 |          5421.3225 |               11.807000 |
+---------+--------------------+-------------------------+
1 row in set (12 min 36.27 sec)
...
+---------+--------------------+-------------------------+
| inserts | avg inserts/second | max time for one insert |
+---------+--------------------+-------------------------+
| 5800000 |          1363.0876 |               14.974402 |
+---------+--------------------+-------------------------+
1 row in set (1 hour 10 min 55.05 sec)

----------

+-------------------+
| time to add index |
+-------------------+
|         12.368071 |
+-------------------+
1 row in set (59 min 17.17 sec)

+-------------------+
| time to add index |
+-------------------+
|         14.165798 |
+-------------------+
1 row in set (59 min 36.46 sec)

+-------------------+
| time to add index |
+-------------------+
|         13.523217 |
+-------------------+
1 row in set (59 min 55.69 sec)

+-------------------+
| time to add index |
+-------------------+
|         13.621230 |
+-------------------+
1 row in set (1 hour 14.46 sec)

+-------------------+
| time to add index |
+-------------------+
|        471.157830 |
+-------------------+
1 row in set (1 hour 8 min 10.64 sec)

....
[1 Dec 2018 12:44] Shane Bester
With the datadir on a ramdisk, here's the problem in smaller form.

Attachment: mytest_innodb_status.txt (text/plain), 18.76 KiB.

[3 Dec 2018 8:48] Shane Bester
My processlist and InnoDB status showing the problem.

Attachment: proc1.7z (application/octet-stream, text), 1.90 MiB.

[3 Dec 2018 8:49] Shane Bester
My last upload shows the problem. I think we're good to verify this as a real problem on tables having a couple hundred million rows. Bogdan can check all this.

Id	User	Host	db	Command	Time	State	Info
12	root	localhost	mysql	Sleep	11		NULL
16	root	localhost	test	Query	2451	altering table	alter table t add index idx(q), algorithm=inplace, lock=none
37	root	localhost	test	Query	171	update	insert into t(c,d,e,f,m,b,l) values(uuid(),uuid(),uuid(),uuid(),floor(rand()*100000),floor(rand()*100000),repeat(uuid(),10))
154974	root	localhost	mysql	Sleep	25		NULL
155110	root	localhost	NULL	Query	0	starting	show full processlist

C:\ade\mysql-advanced-5.7.24-winx64\bin>goto d 

C:\ade\mysql-advanced-5.7.24-winx64\bin>mysql -uroot -e"show engine innodb status;show full processlist;do sleep(1);" 
Type	Name	Status
InnoDB		
=====================================
2018-12-03 09:17:33 0x47cc INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 1 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 144771 srv_active, 0 srv_shutdown, 682 srv_idle
srv_master_thread log flush and writes: 145453
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 14754773
--Thread 3036 has waited at row0ins.cc line 2936 for 172.00 seconds the semaphore:
S-lock on RW-latch at 0000019A27BB06C8 created in file dict0dict.cc line 2737
a writer (thread id 2244) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: dffffffe
Last time read locked in file row0ins.cc line 2936
Last time write locked in file G:\\ade\\build\\sb_0-30854123-1538632997.08\\mysqlcom-5.7.24\\storage\\innobase\\btr\\btr0bulk.cc line 53
OS WAIT ARRAY INFO: signal count 14519389
RW-shared spins 0, rounds 19722040, OS waits 9059445
RW-excl spins 0, rounds 5842250, OS waits 187562
RW-sx spins 10282, rounds 191483, OS waits 5441
Spin rounds per wait: 19722040.00 RW-shared, 5842250.00 RW-excl, 18.62 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 109708898
Purge done for trx's n:o < 108598012 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 283232681808232, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283232681809976, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283232681809104, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283232681806488, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 109708897, ACTIVE 172 sec inserting
mysql tables in use 1, locked 1
1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 1
MySQL thread id 37, OS thread handle 3036, query id 768978048 localhost ::1 root update
insert into t(c,d,e,f,m,b,l) values(uuid(),uuid(),uuid(),uuid(),floor(rand()*100000),floor(rand()*100000),repeat(uuid(),10))
---TRANSACTION 108850281, ACTIVE 2452 sec
mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
MySQL thread id 16, OS thread handle 2244, query id 762945328 localhost ::1 root altering table
alter table t add index idx(q), algorithm=inplace, lock=none
Trx read view will not see trx with id >= 108850283, sees < 108850282
--------
FILE I/O
--------
[3 Dec 2018 20:53] Bogdan Kecman
I was actually going to report this bug myself, but waited to collect more details, and then found one already exists. The major issue with this behavior is storage speed: I never had this issue on spindles and could not even reproduce it on old SSDs, but when I switched to RAID0 with a few SSDs it popped up.

hence verified

Thanks to everyone who worked on this one.
b.
[4 Dec 2018 6:44] Shane Bester
NEW TESTCASE, easier to see the problem....

Attachment: 2018-12-04-mytest.sql (application/octet-stream, text), 3.53 KiB.

[4 Dec 2018 6:45] Shane Bester
Bogdan, it's not really a storage speed issue.
Try my NEW TESTCASE I've uploaded, on fast storage if you wish.

It gets progressively worse as the table reaches 100m, then 200m rows ....

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:02:36.242565 |     1527396 |               13.906059 |
+----------------------------+-------------+-------------------------+
1 row in set (1 min 59.62 sec)

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:04:52.353065 |     3272874 |               15.146061 |
+----------------------------+-------------+-------------------------+
1 row in set (4 min 15.73 sec)

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:07:20.293676 |     5132723 |               16.593053 |
+----------------------------+-------------+-------------------------+
1 row in set (6 min 43.68 sec)

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:09:55.702783 |     7056367 |               18.529590 |
+----------------------------+-------------+-------------------------+
1 row in set (9 min 19.08 sec)

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:12:37.633985 |     9041517 |               19.617059 |
+----------------------------+-------------+-------------------------+
1 row in set (12 min 1.01 sec)

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:15:27.159192 |    11106178 |               21.221147 |
+----------------------------+-------------+-------------------------+
1 row in set (14 min 50.54 sec)

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:18:24.615798 |    13294656 |               22.486048 |
+----------------------------+-------------+-------------------------+
1 row in set (17 min 48.00 sec)

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:21:27.555321 |    15604019 |               24.370060 |
+----------------------------+-------------+-------------------------+
1 row in set (20 min 50.94 sec)

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:24:40.548847 |    18025429 |               26.383068 |
+----------------------------+-------------+-------------------------+
1 row in set (24 min 3.93 sec)

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:28:06.725599 |    20555769 |               28.684078 |
+----------------------------+-------------+-------------------------+
1 row in set (27 min 30.11 sec)

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:31:40.867069 |    23143998 |               30.539068 |
+----------------------------+-------------+-------------------------+
1 row in set (31 min 4.25 sec)

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:35:23.810573 |    25858786 |               32.602071 |
+----------------------------+-------------+-------------------------+
1 row in set (34 min 47.19 sec)

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:39:45.982168 |    29081415 |               34.329068 |
+----------------------------+-------------+-------------------------+
1 row in set (39 min 9.36 sec)

+----------------------------+-------------+-------------------------+
| now(6)                     | new inserts | max time for one insert |
+----------------------------+-------------+-------------------------+
| 2018-12-04 08:44:39.914171 |    32614872 |               37.295837 |
+----------------------------+-------------+-------------------------+
1 row in set (44 min 3.30 sec)

......

Eventually it takes longer than InnoDB is willing to wait .....
[4 Dec 2018 6:52] Shane Bester
On my system, if the entire ALTER TABLE .. ADD KEY .. takes 'x' seconds, the semaphore wait would be around 'x/6' seconds long. So to hit the 900-second timeout for InnoDB, the entire ALTER should take around 3-4 hours, afaict.

It's also possible my testcase will require a much larger than default
innodb_online_alter_log_max_size, since we're inserting rows continuously (an example is sketched below).
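For anyone reproducing this: one way to keep the online ALTER from failing because the online modification log fills up is to raise that variable before starting the ALTER. The value below is only an illustrative example, not a recommendation from this report:

  -- innodb_online_alter_log_max_size is a dynamic global variable (default 128M).
  -- Example only: allow up to 2G of online modification log during the ALTER.
  SET GLOBAL innodb_online_alter_log_max_size = 2 * 1024 * 1024 * 1024;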
[4 Dec 2018 19:37] Bogdan Kecman
Yup, this new test case makes it rather easy to reproduce everywhere.
[9 Dec 2018 5:22] monty solomon
InnoDB crashed during a long-running ALTER TABLE statement while running Percona Server 5.7.24-26.

2018-12-08T05:21:52.665720Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2018-12-08 05:21:52 0x7f4c65ce4700  InnoDB: Assertion failure in thread 139966102259456 in file ut0ut.cc line 947
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
05:21:52 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=67108864
read_buffer_size=1048576
max_used_connections=12
max_threads=1001
thread_count=9
connection_count=7
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1359714 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xedf27c]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0xd206f1]
/lib64/libpthread.so.0(+0xf7e0)[0x7f59fb3b87e0]
/lib64/libc.so.6(gsignal+0x35)[0x7f59f95f1625]
/lib64/libc.so.6(abort+0x175)[0x7f59f95f2e05]
/usr/sbin/mysqld[0x777028]
/usr/sbin/mysqld(_ZN2ib5fatalD1Ev+0xee)[0x1136dbe]
/usr/sbin/mysqld(srv_error_monitor_thread+0xaf0)[0x10d18e0]
/lib64/libpthread.so.0(+0x7aa1)[0x7f59fb3b0aa1]
/lib64/libc.so.6(clone+0x6d)[0x7f59f96a793d]
[9 Dec 2018 5:22] monty solomon
Events status:
LLA = Last Locked At  LUA = Last Unlocked At
WOC = Waiting On Condition  DL = Data Locked

Event scheduler status:
State      : INITIALIZED
Thread id  : 0
LLA        : n/a:0
LUA        : n/a:0
WOC        : NO
Workers    : 0
Executed   : 0
Data locked: NO

Event queue status:
Element count   : 0
Data locked     : NO
Attempting lock : NO
LLA             : drop_schema_events:342
LUA             : drop_schema_events:344
WOC             : NO
Next activation : never
2018-12-08T05:10:00.660102Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139965939672832 has waited at row0ins.cc line 2953 for 241.00 seconds the semaphore:
S-lock on RW-latch at 0x7f4bbc1a9018 created in file dict0dict.cc line 2751
a writer (thread id 139965938075392) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: ffffffffdffffffd
Last time read locked in file row0ins.cc line 2953
Last time write locked in file /mnt/workspace/percona-server-5.7-redhat-binary-rocks-new/label_exp/min-centos-6-x64/test/rpmbuild/BUILD/percona-server-5.7.24-26/percona-server-5.7.24-26/storage/innobase/btr/btr0bulk.cc line 53
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 1, pwrites 1
[9 Dec 2018 5:22] monty solomon
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 32325
--Thread 139965939672832 has waited at row0ins.cc line 2953 for 256.00 seconds the semaphore:
S-lock on RW-latch at 0x7f4bbc1a9018 created in file dict0dict.cc line 2751
a writer (thread id 139965938075392) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: ffffffffdffffffe
Last time read locked in file row0ins.cc line 2953
Last time write locked in file /mnt/workspace/percona-server-5.7-redhat-binary-rocks-new/label_exp/min-centos-6-x64/test/rpmbuild/BUILD/percona-server-5.7.24-26/percona-server-5.7.24-26/storage/innobase/btr/btr0bulk.cc line 53
OS WAIT ARRAY INFO: signal count 63932
RW-shared spins 0, rounds 133592, OS waits 23111
RW-excl spins 0, rounds 191153, OS waits 1431
RW-sx spins 29392, rounds 163477, OS waits 980
Spin rounds per wait: 133592.00 RW-shared, 191153.00 RW-excl, 5.56 RW-sx
[9 Dec 2018 5:23] monty solomon
INNODB MONITOR OUTPUT

Attachment: monitor.txt (text/plain), 11.48 KiB.

[9 Dec 2018 5:30] monty solomon
I can't update the version or OS above.

Our crash was on version 5.7.24 on CentOS 6.10.
[14 Dec 2018 13:37] Sinisa Milivojevic
The described assert is not related to this bug.

This is about long semaphore waits, where the assert is fully intentional.

And it is not a crash, but a controlled assert. Those asserts are fully described in our Reference Manual.
[17 Apr 3:01] Satya Bodapati
What the _____ is a "controlled" assert?

And you say it is NOT a crash? How can you say that? Most crashes are because of corruption and release-build asserts.

This bug is in verified state, and yet you wrote "and it is NOT a crash".

Despite of whatever clever language you use, there is crash in release build and it is loss of service (debug build asserts are fine, that I understand)