Bug #78277 | InnoDB deadlock, thread stuck on kernel calls from transparent page compression | ||
---|---|---|---|
Submitted: | 30 Aug 2015 16:57 | Modified: | 7 Mar 2020 13:22 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S2 (Serious) |
Version: | 5.7.8 | OS: | CentOS |
Assigned to: | CPU Architecture: | Any |
[30 Aug 2015 16:57]
Mark Callaghan
[30 Aug 2015 17:45]
Mark Callaghan
I have two servers with CentOS 6.6 and Linux kernel 3.10.53. Both had InnoDB deadlocks from long lock waits. One didn't get stuck so the background InnoDB thread killed the server and I didn't get thread stacks. That was during a DROP DATABASE command and the InnoDB dictionary and the filesystem were out of sync after recovery -- some ibd files were missing.
[31 Aug 2015 1:46]
Mark Callaghan
Reboot #2 has arrived for the host with CentOS 6.6 and kernel 3.10.53. Did kill -9 on mysqld and the process is still there in state Z after 40 minutes. Delete the database directory but space is still in use according to df. I assume that space won't be released until the zombie process goes away. I give up on trying to use transparent compression on this host.
[4 Sep 2015 0:50]
Mark Callaghan
On the host for which the feature seemed to work there are still many problems. http://smalldatum.blogspot.com/2015/09/second-day-with-innodb-transparent-page.html
[8 Sep 2015 18:16]
Mark Callaghan
A test where DROP TABLE is 43X slower with transparent page compression http://smalldatum.blogspot.com/2015/09/third-day-with-innodb-transparent-page.html
[1 Oct 2015 15:13]
Mark Callaghan
For the setup described in http://bugs.mysql.com/bug.php?id=78672 I use recent ubuntu (14.04) with XFS and DROP DATABASE takes 247 seconds after running linkbench for ~14 hours. The database is 3 non-partitioned tables with ~10G of data. Had I run this for a larger database there would have been a long lock wait and InnoDB would have killed mysqld. For ext4 with old-style compression and a 14G database it take 1 second for DROP DATABASE. I am trying to get a similar test for ext4 but bug 78672 gets in the way.
[1 Oct 2015 16:02]
Mark Callaghan
For ext4 I did DROP TABLE immediately after the load step. So this wasn't subject to 14 hours of page writes with holepunch per write. The database was ~11G before the drop and the drop took 175 seconds. This is all because unlink() is slow after holepunch. The problem reproduces on very system I have tested, and the response I get after each discussion is that it works, just on something I have yet to try. Maybe ReiserFS?
[2 Oct 2015 2:53]
Mark Callaghan
When unlink() during DROP DATABASE is slow then the InnoDB dict mutex is locked and other InnoDB DDL is blocked for however many minutes the unlink takes. Here is PMP output showing where the threads are stuck. unlink,os_file_delete_func,pfs_os_file_delete_func,fil_delete_tablespace,row_drop_single_table_tablespace,row_drop_table_for_mysql,ha_innobase::delete_table,ha_delete_table,mysql_rm_table_no_locks,mysql_rm_db,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone pthread_cond_wait@@GLIBC_2.3.2,wait,os_event::wait_low,os_event_wait_low,sync_array_wait_event,wait,spin_and_wait,enter,PolicyMutex<TTASEventMutex<GenericPolicy>,dict_table_open_on_name,create_table_info_t::create_table_update_dict,ha_innobase::create,ha_create_table,rea_create_table,create_table_impl,mysql_create_table_no_lock,mysql_create_table,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
[2 Oct 2015 2:58]
Mark Callaghan
And inserts into existing InnoDB tables are blocked for the duration of the unlink() call. unlink,os_file_delete_func,pfs_os_file_delete_func,fil_delete_tablespace,row_drop_single_table_tablespace,row_drop_table_for_mysql,ha_innobase::delete_table,ha_delete_table,mysql_rm_table_no_locks,mysql_rm_db,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,c pthread_cond_wait@@GLIBC_2.3.2,wait,os_event::wait_low,os_event_wait_low,sync_array_wait_event,wait,spin_and_wait,enter,PolicyMutex<TTASEventMutex<GenericPolicy>,dict_table_open_on_name,ha_innobase::open,handler::ha_open,open_table_from_share,open_table,open_and_process_table,open_tables,open_tables_for_query,Sql_cmd_insert::mysql_insert,Sql_cmd_insert::execute,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
[2 Oct 2015 15:02]
Mark Callaghan
More fun with a database that was ~50GB on disk. Slow DROP TABLE and slow DROP DATABASE are not good. time bin/mysql -uroot -ppw -e 'drop database linkdb' real 14m23.933s ... --- And then I repeated the test and attempted an insert into an innodb table in a database that wasn't being dropped after the slow drop table started ... 2015-10-02T14:58:49.198385Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
[2 Oct 2015 15:25]
Mark Callaghan
Rebooted my host with XFS after the test to reclaim kernel memory from XFS metadata and the filesystem cam back corrupt. Maybe too much holepunching.
[8 Oct 2015 21:42]
Sunny Bains
I did some testing on 5.7.9. and could not reproduce the problem. I also compared the performance of old compression, transparent page compression and no compression on XFS v5.7.8 calls punch hole on every write and that could be one source of this problem: Configuration: Linux supra06 3.8.13-98.2.2.el7uek.x86_64 #2 SMP Tue Sep 15 16:23:09 PDT 2015 x86_64 x86_64 x86_64 GNU/Linux Same config as mentioned here: http://mysqlserverteam.com/innodb-transparent-page-compression/ The test is a short one, will run a longer test, probably 24hrs over the weekend. linkbench -D requesters=64 -Dmaxid1=1000000001 -c config/MyConfig.properties -D requests=500000000 -D maxtime=1800 -r Without compression: INFO 2015-10-08 11:12:43,451 [main]: REQUEST PHASE COMPLETED. 125664086 requests done in 1802 seconds. Requests/second = 69727 mysql> select name, file_size/(1024*1024) as size_in_mb from information_schema.INNODB_SYS_TABLESPACES WHERE name like 'linkdb%'; +-------------------+------------+ | name | size_in_mb | +-------------------+------------+ | linkdb/linktable | 88860.0000 | | linkdb/counttable | 7528.0000 | | linkdb/nodetable | 32208.0000 | +-------------------+------------+ 3 rows in set (0.00 sec) mysql> drop database linkdb; Query OK, 3 rows affected (8.00 sec) With transparent compression: INFO 2015-10-08 12:48:17,163 [main]: REQUEST PHASE COMPLETED. 159619892 requests done in 1802 seconds. Requests/second = 88561 mysql> select name, ((file_size-allocated_size)*100)/(file_size+1) as compressed_pct, allocated_size/(1024*1024) as allocated_size_in_mb, file_size/(1024*1024) as file_size_in_mb from information_schema.INNODB_SYS_TABLESPACES WHERE name like 'linkdb%'; +-------------------+----------------+----------------------+-----------------+ | name | compressed_pct | allocated_size_in_mb | file_size_in_mb | +-------------------+----------------+----------------------+-----------------+ | linkdb/linktable | 46.2316 | 48378.6484 | 89976.0000 | | linkdb/counttable | 40.7257 | 4680.3008 | 7896.0000 | | linkdb/nodetable | 21.6917 | 25425.1406 | 32468.0000 | +-------------------+----------------+----------------------+-----------------+ 3 rows in set (0.00 sec) The Req/s is actually much better with transparent compression enabled. This matches EXT4 BTW. I haven't had time to figure out exactly why. However, drop table does take *much* longer on XFS (compressed). mysql> drop database linkdb; Query OK, 3 rows affected (3 min 35.49 sec) As a comparison, the old compression results are: mysql> select name, file_size/(1024*1024) as size_in_mb from information_schema.INNODB_SYS_TABLESPACES WHERE name like 'linkdb%'; +-------------------+------------+ | name | size_in_mb | +-------------------+------------+ | linkdb/linktable | 43480.0000 | | linkdb/counttable | 4512.0000 | | linkdb/nodetable | 20684.0000 | +-------------------+------------+ 3 rows in set (0.00 sec) mysql> drop database linkdb; Query OK, 3 rows affected (5.12 sec) INFO 2015-10-08 08:47:37,565 [main]: REQUEST PHASE COMPLETED. 33765045 requests done in 1802 seconds. Requests/second = 18735 Conclusion: Old compression performance is 4.5x worse than the transparent page compression Old compression file size only a little bit better than transparent page compression. Block on drop table needs to be fixed. Probably rename the file and use a background thread to unlink that file. Which will help the general case too. A 24hr test will hopefully confirm these results.
[9 Oct 2015 22:04]
Mark Callaghan
The primary problem is unlink() taking too long. You reproduced it.
[13 Oct 2015 4:19]
Sunny Bains
With a background thread the database drop time is now down to: mysql> select name, ((file_size-allocated_size)*100)/file_size as compressed_pct, allocated_size/(1024*1024) as allocated_size_in_mb, file_size/(1024*1024) as file_size_in_mb from information_schema.INNODB_SYS_TABLESPACES WHERE name like 'linkdb%'; +-------------------+----------------+----------------------+-----------------+ | name | compressed_pct | allocated_size_in_mb | file_size_in_mb | +-------------------+----------------+----------------------+-----------------+ | linkdb/linktable | 45.4606 | 44213.9844 | 81068.0000 | | linkdb/counttable | 44.3847 | 3105.5586 | 5584.0000 | | linkdb/nodetable | 24.4819 | 21320.2734 | 28232.0000 | +-------------------+----------------+----------------------+-----------------+ 3 rows in set (0.00 sec) mysql> drop database linkdb; Query OK, 3 rows affected (29.00 sec) Next, figure out where the 29 sec overhead is coming from.
[14 Oct 2015 22:30]
Sunny Bains
ysql> select name, ((file_size-allocated_size)*100)/file_size as compressed_pct, allocated_size/(1024*1024) as allocated_size_in_mb, file_size/(1024*1024) as file_size_in_mb from information_schema.INNODB_SYS_TABLESPACES WHERE name like 'linkdb%'; +-------------------+----------------+----------------------+-----------------+ | name | compressed_pct | allocated_size_in_mb | file_size_in_mb | +-------------------+----------------+----------------------+-----------------+ | linkdb/linktable | 45.4656 | 44199.0742 | 81048.0000 | | linkdb/counttable | 44.3970 | 3107.0977 | 5588.0000 | | linkdb/nodetable | 24.4180 | 21401.8047 | 28316.0000 | +-------------------+----------------+----------------------+-----------------+ 3 rows in set (0.00 sec) mysql> drop database linkdb; Query OK, 3 rows affected (3.06 sec) With the background drop table we can side step the issue.
[19 Oct 2015 6:38]
Sunny Bains
mysql> select name,((file_size-allocated_size)*100)/file_size as compressed_pct, allocated_size/(1024*1024) as allocated_size_in_mb,file_size/(1024*1024) as file_size_in_mb from information_schema.INNODB_SYS_TABLESPACES WHERE name like 'linkdb%'; +-------------------+----------------+----------------------+-----------------+ | name | compressed_pct | allocated_size_in_mb | file_size_in_mb | +-------------------+----------------+----------------------+-----------------+ | linkdb/linktable | 45.3182 | 44143.5625 | 80728.0000 | | linkdb/counttable | 43.9180 | 3109.1875 | 5544.0000 | | linkdb/nodetable | 23.6898 | 16400.5977 | 21492.0000 | +-------------------+----------------+----------------------+-----------------+ 3 rows in set (0.00 sec) mysql> drop database linkdb; Query OK, 3 rows affected (22.09 sec) On NVMFS without any background queue. XFS just happens to be really bad at it.
[19 Oct 2015 15:56]
Mark Callaghan
Clearly this is a problem with XFS and not that InnoDB is misusing a feature. Slow on XFS, slow on ext4, btrfs core team tells me it will be slow there. But we can celebrate that it isn't slow on NVMFS - closed source, not GA, can't even find out where to buy it, not aware of anyone running it.
[19 Oct 2015 22:44]
Sunny Bains
I can't comment on how many users NVMFS has :-). But the driver we got we were told is GA and production ready. I have no information beyond that. The drop time goes something like this: With punch hole, EXT4 is 2x worse and XFS is 3x worse than EXT4. This suggests that XFS needs to be at least as fast as EXT4. EXT4 no compression: mysql> select name, file_size/(1024*1024) as size_in_mb from information_schema.INNO +-------------------+------------+ | name | size_in_mb | +-------------------+------------+ | linkdb/linktable | 80716.0000 | | linkdb/counttable | 5544.0000 | | linkdb/nodetable | 27924.0000 | +-------------------+------------+ 3 rows in set (0.00 sec) mysql> drop database linkdb; Query OK, 3 rows affected (26.13 sec) mysql> select name, ((file_size-allocated_size)*100)/file_size as compressed_pct, allocated_size/(1024*1024) as size_in_mb from information_schema.INNODB_SYS_TABLESPACES WHERE name like 'linkdb%'; +-------------------+----------------+------------+ | name | compressed_pct | size_in_mb | +-------------------+----------------+------------+ | linkdb/linktable | 45.3242 | 44149.6367 | | linkdb/counttable | 43.9315 | 3108.4375 | | linkdb/nodetable | 24.7472 | 21004.5703 | +-------------------+----------------+------------+ 3 rows in set (0.00 sec) mysql> drop database linkdb; Query OK, 3 rows affected (50.54 sec)
[20 Oct 2015 18:55]
Mark Callaghan
Why does XFS need to match ext4 on this? A likely response from the XFS team is that InnoDB is misusing this feature. But it probably is better for you to ask them than to rely on my speculation.
[20 Oct 2015 23:31]
Sunny Bains
Because it demonstrates that the performance can be improved. There is no theoretical limit on this.
[21 Oct 2015 20:39]
Sunny Bains
For completeness, this is the latest SanDisk driver on the same HW and host, this driver is not GA. The results are very good and worth sharing. mysql> select name,((file_size-(allocated_size - 1024))*100)/file_size as compressed_pct, allocated_size/(1024*1024) as allocated_size_in_mb,file_size/(1024*1024) as file_size_in_mb from information_schema.INNODB_SYS_TABLESPACES WHERE name like 'linkdb%'; +-------------------+----------------+----------------------+-----------------+ | name | compressed_pct | allocated_size_in_mb | file_size_in_mb | +-------------------+----------------+----------------------+-----------------+ | linkdb/linktable | 57.1966 | 40005.7769 | 93464.0000 | | linkdb/counttable | 58.5372 | 3729.9961 | 8996.0000 | | linkdb/nodetable | 34.7377 | 21732.3477 | 33300.0000 | +-------------------+----------------+----------------------+-----------------+ 3 rows in set (0.00 sec) mysql> drop database linked; Query OK, 3 rows affected (9.78 sec)
[22 Oct 2015 10:53]
Mark Callaghan
Great, but the only people who talk about using NVMFS with InnoDB are the MariaDB and MySQL core teams.
[7 Feb 2020 13:22]
MySQL Verification Team
Hello Mr. Callaghan, Thank you for the correspondence, so far ... Do you still have issues that you describe ??? Do you need to ask additional questions ?? Hence, do you want to leave this report open or can it be closed ??? Have you tested latest 5.7 and 8.0 ???? Thanks in advance.
[8 Mar 2020 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".