Bug #33480 | (Falcon) deadlock on inserts into a partitioned auto_increment table | |
---|---|---|---
Submitted: | 22 Dec 2007 13:44 | Modified: | 2 May 2008 23:06 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | 6.0.4 | OS: | Any |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
[22 Dec 2007 13:44]
Philip Stoev
[22 Dec 2007 13:45]
Philip Stoev
Gypsy script for reproducing bug #33480
Attachment: bug33480.gypsy (text/plain), 232 bytes.
[22 Dec 2007 13:53]
Philip Stoev
InnoDB with partitioning is also affected; MyISAM is not. No table type is affected without partitioning. Please set the correct category for this bug -- Falcon appears to be affected more, since the deadlock is permanent unless a Falcon timeout value is configured.
[22 Dec 2007 15:17]
Philip Stoev
Falcon with --falcon-lock-timeout=500 allows queries to proceed with "mysql error Lock wait timeout exceeded; try restarting transaction, (1205)", however after a while the deadlock becomes permanent and no further queries can proceed. Kill inside the mysql client does not help and mysqladmin shutdown hangs. By contrast, Falcon with --innodb_lock_wait_timeout=1 continues to function and shutdown is not a problem.
[23 Dec 2007 12:42]
Philip Stoev
I meant that InnoDB with a lock timeout does not exhibit that behavior.
[23 Dec 2007 15:43]
MySQL Verification Team
Philip, use Gypsy's --generate-testcase=1 option to have a testcase.c file written in the current directory. This testcase.c will be a more or less equivalent test case, so the entire Gypsy framework is not needed for the bug report.
[23 Dec 2007 18:03]
Philip Stoev
Test case for bug 33480 -- compile instructions are in file's header
Attachment: bug33480.c (text/plain), 4.83 KiB.
[23 Dec 2007 18:04]
Philip Stoev
Shane, this is awesome! I have attached a .c file generated from the Gypsy test case. The GCC command line used to compile it is in the header of the file (make sure you adjust the include and lib paths). The test case binary produces no output, so please use SHOW PROCESSLIST to observe.
[26 Dec 2007 16:46]
Kevin Lewis
Triage values D2/w3/I3 - The potential deadlock arises when the application uses a combination of NULL autoinc assignments and specified autoinc values. See Bug#33079 for a similar issue with the same test. The first workaround is to set falcon_lock_timeout to a nonzero value. The second workaround is to not partition on an autoinc field, which only randomizes the data across multiple partitions and is unnecessary. The impact is not widespread.
[26 Dec 2007 18:49]
Philip Stoev
Kevin, please note that all assignments in the test case are to specific values; no NULL assignments to the auto_inc column are used.
[26 Dec 2007 19:00]
Kevin Lewis
The main insert statement, INSERT INTO t1 (i1) VALUES (FLOOR(rand()*255)), will produce a value of zero one time in 255, on average. A zero causes the autoinc assignment to happen as if the inserted value were NULL.
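Kevin's point can be checked with a quick simulation (a sketch in Python rather than SQL; math.floor(random.random() * 255) stands in for MySQL's FLOOR(rand()*255)). Roughly one insert in 255 produces a zero, and MySQL treats an explicit 0 in an AUTO_INCREMENT column like NULL unless the NO_AUTO_VALUE_ON_ZERO SQL mode is set:

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is repeatable

N = 255_000
zeros = sum(1 for _ in range(N) if math.floor(random.random() * 255) == 0)

# Expect roughly N / 255 = 1000 zeros. Each such insert behaves like
# inserting NULL, so the engine assigns the next auto_increment value,
# mixing "specified" and "generated" autoinc values in one workload.
print(zeros)
assert 800 < zeros < 1200
```

So even a test case that only inserts "specific values" still exercises the NULL-assignment path about 0.4% of the time, which is what triggers the deadlock.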
[27 Dec 2007 23:44]
Christopher Powers
The deadlock is caused by the mutex in ha_partition::write_row() (see Bug#27405, "Partitioning InnoDB auto_increment"). Transaction 73 (thread 18) is waiting on the mutex in ha_partition::write_row(), which is held by transaction 74. Transaction 74 (thread 17) is waiting for transaction 73, hence the deadlock.

Thread [18] (Suspended)
  15 __lll_mutex_lock_wait() 0x00002adc99b548c8
  14 _L_mutex_lock_946() 0x00002adc99b50a06
  13 pthread_mutex_lock() 0x00002adc99b5080b
  12 ha_partition::write_row() ../libmysqld/ha_partition.cc:2727 0x00000000007a1fc5
  11 handler::ha_write_row() ../libmysqld/handler.cc:4600 0x00000000007966b0
  10 write_record() ../libmysqld/sql_insert.cc:1548 0x0000000000722dbb
  9 mysql_insert() ../libmysqld/sql_insert.cc:803 0x0000000000726849
  8 mysql_execute_command() ../libmysqld/sql_parse.cc:2658 0x000000000069c61c
  7 mysql_parse() ../libmysqld/sql_parse.cc:5410 0x00000000006a1e56
  6 dispatch_command() ../libmysqld/sql_parse.cc:948 0x00000000006a29b6
  5 do_command() ../libmysqld/sql_parse.cc:697 0x00000000006a3940
  4 handle_one_connection() ../libmysqld/sql_connect.cc:1146 0x000000000069435a
  3 start_thread() 0x00002adc99b4e020
  2 clone() 0x00002adc9b005f8d
  1 <symbol is not available> 0x0000000000000000

Thread [17] (Suspended: Breakpoint hit.)
  25 SyncObject::wait() ../storage/falcon/SyncObject.cpp:383 0x000000000085d4c6
  24 SyncObject::lock() ../storage/falcon/SyncObject.cpp:259 0x000000000085d8c1
  23 Sync::lock() ../storage/falcon/Sync.cpp:66 0x000000000085c17f
  22 Transaction::waitForTransaction() ../storage/falcon/Transaction.cpp:974 0x0000000000870500
  21 Transaction::getRelativeState() ../storage/falcon/Transaction.cpp:841 0x0000000000870665
  20 Transaction::getRelativeState() ../storage/falcon/Transaction.cpp:786 0x000000000087072e
  19 Table::checkUniqueRecordVersion() ../storage/falcon/Table.cpp:2463 0x0000000000865331
  18 Table::checkUniqueIndex() ../storage/falcon/Table.cpp:2337 0x00000000008656a1
  17 Table::checkUniqueIndexes() ../storage/falcon/Table.cpp:2311 0x0000000000865813
  16 Table::insert() ../storage/falcon/Table.cpp:2874 0x0000000000867d72
  15 StorageDatabase::insert() ../storage/falcon/StorageDatabase.cpp:226 0x0000000000850633
  14 StorageTable::insert() ../storage/falcon/StorageTable.cpp:122 0x000000000085691b
  13 StorageInterface::write_row() ../storage/falcon/ha_falcon.cpp:977 0x000000000084a099
  12 ha_partition::write_row() ../libmysqld/ha_partition.cc:2763 0x00000000007a20b1
  11 handler::ha_write_row() ../libmysqld/handler.cc:4600 0x00000000007966b0
  10 write_record() ../libmysqld/sql_insert.cc:1548 0x0000000000722dbb
  9 mysql_insert() ../libmysqld/sql_insert.cc:803 0x0000000000726849
  8 mysql_execute_command() ../libmysqld/sql_parse.cc:2658 0x000000000069c61c
  7 mysql_parse() ../libmysqld/sql_parse.cc:5410 0x00000000006a1e56
  6 dispatch_command() ../libmysqld/sql_parse.cc:948 0x00000000006a29b6
  5 do_command() ../libmysqld/sql_parse.cc:697 0x00000000006a3940
  4 handle_one_connection() ../libmysqld/sql_connect.cc:1146 0x000000000069435a
  3 start_thread() 0x00002adc99b4e020
  2 clone() 0x00002adc9b005f8d
  1 <symbol is not available> 0x0000000000000000
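The two traces form a classic circular wait: one thread holds the partition mutex and waits on the other's transaction, which in turn waits on the partition mutex. A minimal Python sketch of that cycle (not Falcon's actual locking code; two threading.Lock objects stand in for the mutex and the transaction wait, and acquire timeouts stand in for falcon_lock_timeout so the demo cannot hang):

```python
import threading

# Two locks standing in for the resources in the traces above.
partition_mutex = threading.Lock()  # the ha_partition::write_row() mutex
txn_wait = threading.Lock()         # "waiting for transaction 73"

start = threading.Barrier(2)  # ensure both first locks are held before either waits
done = threading.Barrier(2)   # hold the first lock until both waits have resolved
timed_out = []

def txn_74():  # thread 17: holds the partition mutex, waits on transaction 73
    with partition_mutex:
        start.wait()
        if not txn_wait.acquire(timeout=0.5):
            timed_out.append(74)
        else:
            txn_wait.release()
        done.wait()

def txn_73():  # thread 18: is being waited on, wants the partition mutex
    with txn_wait:
        start.wait()
        if not partition_mutex.acquire(timeout=0.5):
            timed_out.append(73)
        else:
            partition_mutex.release()
        done.wait()

t17 = threading.Thread(target=txn_74)
t18 = threading.Thread(target=txn_73)
t17.start(); t18.start()
t17.join(); t18.join()

# Both waits time out: a circular wait. With no timeout (Falcon's default
# falcon_lock_timeout=0), both threads would block forever at 0% CPU.
print(sorted(timed_out))  # → [73, 74]
```

This matches Philip's observation that the hang is permanent unless a Falcon timeout value is configured.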
[30 Dec 2007 4:23]
Christopher Powers
Same problem with different scenario in Bug#33158, "Server hangs, no new connections are accepted".
[30 Dec 2007 9:02]
Philip Stoev
Actually the duplicate bug is bug #33518.
[3 Jan 2008 15:12]
Kevin Lewis
Mattias, Mikael R suggested that we need to use the mutex in ha_partition::write_row() (see Bug#27405) only if the storage engine is InnoDB. Falcon needs to handle concurrency itself.
[9 Jan 2008 4:26]
Kevin Lewis
Assigning this bug back to the Falcon engine. Mattias provided a very simple change to ha_partition.cc, ~line 2710, in function ha_partition::write_row():

- if (table_share->tmp_table == NO_TMP_TABLE)
+ if (table_share->tmp_table == NO_TMP_TABLE && m_innodb)

This prevents the serialization in the partition engine for autoincrement fields. But he then found that there are still deadlocks/hangs. I ran the Gypsy test with 5 threads and found that there is indeed an unhandled deadlock in the Falcon engine. It happens in Table::checkUniqueRecordVersion(), where a real transactional deadlock is detected but no exception is thrown. Instead, the call to Table::checkUniqueIndexes() is restarted as if it were just a wait, not a deadlock.
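The second half of the bug (a detected deadlock being retried as if it were an ordinary wait) can be sketched abstractly. All names below are illustrative, not Falcon's actual API: a check loop that treats "deadlock" like "wait" spins forever at 100% CPU, while the fixed version raises so the statement can be aborted and the transaction rolled back:

```python
class DeadlockDetected(Exception):
    """Raised when a transactional deadlock is detected (illustrative name)."""

def check_unique_record_version(states):
    """Sketch of the fixed logic: retry plain waits, raise on deadlock.

    `states` is a pretend sequence of lock-manager outcomes:
    'granted', 'wait', or 'deadlock'.
    """
    for outcome in states:
        if outcome == "granted":
            return "insert ok"
        if outcome == "deadlock":
            # The bug: this branch used to restart the unique-index check
            # just like a plain wait, leaving one thread looping forever.
            # The fix throws instead.
            raise DeadlockDetected("deadlock while checking unique index")
        # outcome == "wait": restart the duplicate check on the next iteration
    return "insert ok"

print(check_unique_record_version(["wait", "granted"]))  # → insert ok
try:
    check_unique_record_version(["wait", "deadlock"])
except DeadlockDetected:
    print("statement aborted; transaction can roll back")
```

The distinction matters because a wait is transient (retrying eventually succeeds) while a deadlock is stable: no amount of retrying breaks the cycle, so one participant must give up.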
[9 Jan 2008 5:22]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/40733

ChangeSet@1.2773, 2008-01-08 23:22:10-06:00, klewis@klewis-mysql. +2 -0
Bug#33480 - Two kinds of hangs/deadlocks are fixed here.
1) In ha_partition.cc, prevent serialization into Falcon of threads writing to autoincrement fields. Only InnoDB needs this. It can cause deadlocks and timeouts in Falcon which appear as 0% CPU utilization when all threads are hung. When this happens for just a few threads, the serial log can grow indefinitely.
2) If a deadlock happens between threads which are checking for duplicates, the code in Table::checkUniqueRecordVersion() was not handling the detected deadlock by throwing an exception. Now it does. This can cause 100% CPU utilization hangs between threads: one is waiting and the other is looping in Table::checkUniqueIndexes(). This can also cause the serial log to grow indefinitely.
[9 Jan 2008 5:40]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/40735

ChangeSet@1.2780, 2008-01-08 23:40:07-06:00, klewis@klewis-mysql. +2 -0
Bug#33480 - Two kinds of hangs/deadlocks are fixed here.
1) In ha_partition.cc, prevent serialization into Falcon of threads writing to autoincrement fields. Only InnoDB needs this. It can cause deadlocks and timeouts in Falcon which appear as 0% CPU utilization when all threads are hung. When this happens for just a few threads, the serial log can grow indefinitely.
2) If a deadlock happens between threads which are checking for duplicates, the code in Table::checkUniqueRecordVersion() was not handling the detected deadlock by throwing an exception. Now it does. This can cause 100% CPU utilization hangs between threads: one is waiting and the other is looping in Table::checkUniqueIndexes(). This can also cause the serial log to grow indefinitely.
[11 Jan 2008 17:04]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/40919

ChangeSet@1.2780, 2008-01-11 11:03:27-06:00, klewis@klewis-mysql. +2 -0
Bug#33480 - Two kinds of hangs/deadlocks are fixed here.
1) In ha_partition.cc, prevent serialization into Falcon of threads writing to autoincrement fields. Only InnoDB needs this. It can cause deadlocks and timeouts in Falcon which appear as 0% CPU utilization when all threads are hung. When this happens for just a few threads, the serial log can grow indefinitely.
2) If a deadlock happens between threads which are checking for duplicates, the code in Table::checkUniqueRecordVersion() was not handling the detected deadlock by throwing an exception. Now it does. This can cause 100% CPU utilization hangs between threads: one is waiting and the other is looping in Table::checkUniqueIndexes(). This can also cause the serial log to grow indefinitely.
[11 Jan 2008 17:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/40924

ChangeSet@1.2781, 2008-01-11 11:22:52-06:00, klewis@klewis-mysql. +2 -0
Bug#33480 - Two kinds of hangs/deadlocks are fixed here.
1) In ha_partition.cc, prevent serialization into Falcon of threads writing to autoincrement fields. Only InnoDB needs this. It can cause deadlocks and timeouts in Falcon which appear as 0% CPU utilization when all threads are hung. When this happens for just a few threads, the serial log can grow indefinitely.
2) If a deadlock happens between threads which are checking for duplicates, the code in Table::checkUniqueRecordVersion() was not handling the detected deadlock by throwing an exception. Now it does. This can cause 100% CPU utilization hangs between threads: one is waiting and the other is looping in Table::checkUniqueIndexes(). This can also cause the serial log to grow indefinitely.
[11 Feb 2008 20:44]
Kevin Lewis
Patch is in mysql-6.0-falcon-team and mysql-6.0-release, version 6.0.4.
[12 Mar 2008 23:02]
Bugs System
Pushed into 6.0.4-alpha
[2 May 2008 23:06]
Paul DuBois
Noted in 6.0.4 changelog. Multiple simultaneous inserts into a partitioned Falcon table could deadlock.