Bug #30410 | "INSERT DELAYED" queries hang during massive "DELETE" operation | ||
---|---|---|---|
Submitted: | 14 Aug 2007 16:13 | Modified: | 23 Nov 2007 16:14 |
Reporter: | Vesselin Bakalov | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Locking | Severity: | S2 (Serious) |
Version: | 5.0.38 | OS: | Linux |
Assigned to: | Kristofer Pettersson | CPU Architecture: | Any |
Tags: | DELAYED, delete, hang, insert, query |
[14 Aug 2007 16:13]
Vesselin Bakalov
[14 Aug 2007 16:16]
Vesselin Bakalov
The source code for the test application. Build with `gcc main.c -lmysqlclient_r`
Attachment: main.c (text/x-csrc), 2.16 KiB.
[14 Aug 2007 16:21]
Vesselin Bakalov
In our production environment, when the "INSERT DELAYED" queries are hanging, the number of "Not_flushed_delayed_rows" never gets higher than 1400.
[14 Aug 2007 21:48]
MySQL Verification Team
Thank you for the bug report. mysql> show processlist\G *************************** 1. row *************************** Id: 3 User: root Host: localhost db: test Command: Query Time: 225 State: waiting for handler insert Info: INSERT DELAYED INTO shortterm (time,poll_id,value) VALUES ('1187127320','6796','11891') *************************** 2. row *************************** Id: 4 User: DELAYED Host: localhost db: test Command: Delayed insert Time: 227 State: upgrading lock Info: *************************** 3. row *************************** Id: 5 User: root Host: localhost db: test Command: Query Time: 0 State: NULL Info: show processlist *************************** 4. row *************************** Id: 6 User: root Host: localhost db: test Command: Query Time: 227 State: updating Info: DELETE FROM shortterm WHERE time < '1187127018' 4 rows in set (0.00 sec)
[20 Aug 2007 13:05]
Konstantin Osipov
Have you checked that the delayed subsystem has not run out of the queue buffers? Please check that your delayed_queue_size is substantially big.
[20 Aug 2007 15:27]
Vesselin Bakalov
In the original situation, we have failed with a queue size of "8,000,000" (also "4,000,000" and "100,000" and the default).
[30 Aug 2007 10:46]
Konstantin Osipov
Kristofer, this is likely not a bug, the delayed thread simply runs out of buffer space. Please re-verify the attached test case.
[28 Sep 2007 10:21]
Kristofer Pettersson
Running the attached test case with a queue size of 1000 items I could conclude that when the incoming data stream has higher volume than what can be absorbed by the IO-subsystem in the storage engine, the server will quickly reach the limit of the delayed_queue_size (1000). Increasing the size of the queue only postpones the time when the limit is reached, but does not remove the limit altogether. When the maximum number of records in the delayed queue is reached, the INSERT DELAYED will behave as a regular INSERT until the IO-subsystem has caught up by being able to consume more records. This explains the effect when DELETE blocks INSERT DELAYED threads almost immediately. Having a large enough delayed_queue_size allows one to accept new INSERTs even if the performance ratio (inserts_per seconds / inserts_written_to_disk) is greater than one. However, if the time of the surplus production is too long, there obviously won't be a delayed_queue_size which is big enough.
[5 Nov 2007 16:15]
Vesselin Bakalov
Hi guys. Fortunately (or unfortunately, as the case may be), in one of our production environments, we are having the locking issue with INSERT DELAYED queries again. It happens every 10 minutes for about 30 seconds. We basically have 25,000,000+ rows in a memory table, and we're trying to do INSERT DELAYED into it very frequently (trimmed every 10 minutes to retain two hours of data--this is when the table locks; we do a DELETE on it). What information do you need from us ("SHOW VARIABLES LIKE ...", etc.) to better understand the issue? We can put a script together that will query the database for a few minutes before the issue until a few minutes after it if that will help.
[6 Nov 2007 6:43]
Kristofer Pettersson
I would be helpful if you produced some relevant graphs plotting the performance and behavior. Then we could more easily agree on the existence of any bugs and how to deal with them.
[12 Nov 2007 23:35]
Matthew Lord
Here is a test that I think shows the crux of the problem which is that WRITE locks are preventing INSERT DELAYED statements from going into the delayed_queue: Open 2 database connections. Connection 1: set global delayed_queue_size=20; use test; CREATE TABLE `ditest` ( `id` int(10) unsigned NOT NULL auto_increment, `name` char(50) default NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM; insert into ditest (name) values ("matt"), ("faith"), ("sid"), ("jerry"), ("xena"); lock table ditest write; Connection 2: use test; insert delayed into ditest (name) values ("xxx"); show processlist; -------- You'll notice that the state for the insert delayed is "waiting for handler open" as it blocks. That means that we're here in delayed_get_table: /* Wait until table is open */ thd->proc_info="waiting for handler open"; while (!di->thd.killed && !di->table && !thd->killed) { pthread_cond_wait(&di->cond_client, &di->mutex); } Why do we not just throw this statement into the delayed queue as we do with the above test if we instead take a READ lock on ditest instead of a WRITE lock? This was tested with 5.0.46 on x86 linux.
[13 Nov 2007 10:50]
Konstantin Osipov
Matt, I can't repeat your test case. The other connection does not hang, the insert goes through, bot on 5.0-bk and 5.1-bk: This is how it looks on 5.0: Database changed mysql> insert delayed into ditest (name) values ("xxx"); Query OK, 1 row affected (0.00 sec) mysql> mysql> show processlist; +----+--------+-----------+------+---------+------+-------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+--------+-----------+------+---------+------+-------+------------------+ | 1 | kostja | localhost | test | Sleep | 11 | | NULL | | 2 | kostja | localhost | test | Query | 0 | NULL | show processlist | +----+--------+-----------+------+---------+------+-------+------------------+ 2 rows in set (0.00 sec) You could contact me on IRC so that we reproduce it step-by step.
[13 Nov 2007 14:45]
Vesselin Bakalov
Konstantin / Kristofer / Matt I can not get Matt's proof to work either, but I definitely have a server that does it in production under normal (albeit heavy) load. We can do a WebEx and you can observe the logs - it happens every 10 minutes on the dot. Please call me at 302-368-5650 and we can get things rolling.
[13 Nov 2007 17:40]
Matthew Lord
I could no longer repeat the issue with my test case using 5.0.50.
[22 Nov 2007 15:36]
Kristofer Pettersson
If the bug can't be reproduced with the latest version, can we degrade this bug to an open-but-not-verified state?
[23 Nov 2007 16:14]
Vesselin Bakalov
The bug is still exhibited with 5.0.50 Enterprise compiled from source. Although I can not give you a concise test to show it at this time, it is most definitely there. If anyone would like to get on a WebEx, I can show you the bug in action.