Bug #30410 "INSERT DELAYED" queries hang during massive "DELETE" operation
Submitted: 14 Aug 2007 16:13 Modified: 23 Nov 2007 16:14
Reporter: Vesselin Bakalov Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Locking Severity:S2 (Serious)
Version:5.0.38 OS:Linux
Assigned to: Kristofer Pettersson CPU Architecture:Any
Tags: DELAYED, delete, hang, insert, query

[14 Aug 2007 16:13] Vesselin Bakalov
Description:
We have some pretty massive memory tables that store shortterm data for a couple of hours before the data is organized and flushed to longterm (disk) storage.

In order to prevent our polling threads from getting hung up due to table locking, we do "INSERT DELAYED" to insert our data into the memory tables.

We've noticed that with HUGE numbers ("HUGE", but very, very common), a delete operation (also on huge amounts of data in the table) will cause the "INSERT IGNORE" queries to *wait* until the delete has completed before continuing.

The reason that we use "INSERT DELAYED" is to *prevent* the calling threads from waiting on mysql; they should fire-and-forget their insert queries.

Every 10 minutes, we trim the data from the shortterm tables.

----

In our production environment, we have the following settings as well:
delayed_insert_limit: 3000
delayed_insert_timeout: 300
delayed_queue_size: 8000000

How to repeat:
I have included a sample application.

Steps:
1. ./a.out --create
2. ./a.out --insert
3. (Wait a long time... get about 2 million rows).
4. Concurrently: ./a.out --delete

You will see that the insertions halt for a little bit while the delete occurs.  This is only noticeable with large amounts of items being deleted and inserted.

----

The "--delete" version deletes all data older than 5 minutes.  So, you'll have to have this guy up for a little while to work.

----

#include <stdlib.h>
#include <stdio.h>
#include <mysql/mysql.h>

enum {
        MODE_DELETE,
        MODE_INSERT,
        MODE_CREATE
};

int main( int parameterCount, char** parameters ) {
        int i = 0;

        int mode = MODE_DELETE;

        MYSQL* link = NULL;
        my_bool trueval = 1;
        char query[1024];

        time_t currentTime;
        time( &currentTime );

        for( i = 1; i < parameterCount; i++ ) {
                char* parameter = parameters[ i ];
                if( strcmp(parameter,"--delete") == 0 ) {
                        mode = MODE_DELETE;
                } else if( strcmp(parameter,"--insert") == 0 ) {
                        mode = MODE_INSERT;
                } else if( strcmp(parameter,"--create") == 0 ) {
                        mode = MODE_CREATE;
                } else {
                        printf(
                                "Usage:\n"
                                "   %s [ { --insert | --delete | --create } ]\n"
                                "\n"
                                "",
                                parameters[0]
                        );
                        exit( 1 );
                }
        }

        if( (link = mysql_init(NULL)) == NULL ) {
                printf("!!! Cannot initialize the MySQL database.\n");
                exit( 1 );
        }

        mysql_options( link, MYSQL_OPT_RECONNECT, &trueval );
        if( !mysql_real_connect(link, "127.0.0.1", "root","", "test", 0, NULL, 0) ) {
                printf("!!! Could not connect to database.\n");
                exit( 1 );
        }
        mysql_select_db( link, "test" );

        if( mode == MODE_DELETE ) {
                sprintf( query,
                        "DELETE FROM shortterm WHERE time < '%lu'"
                        "",
                        currentTime - 300
                );
                printf("<<< %s\n", query );
                mysql_real_query( link, query, strlen(query) );
        } else if( mode == MODE_CREATE ) {
                sprintf( query,
                        "DROP TABLE IF EXISTS shortterm"
                        ""
                );
                printf("<<< %s\n", query );
                mysql_query( link, query );

                sprintf( query,
                        "CREATE TABLE shortterm (time INT, poll_id INT, value BIGINT, PRIMARY KEY(time,poll_id), KEY(poll_id)) "
                        ""
                );
                printf("<<< %s\n", query );
                mysql_real_query( link, query, strlen(query) );
        } else if( mode == MODE_INSERT ) {
                for( i = 0; i < 1000000000; i++ ) {
                        time_t endTime;

                        time( &currentTime );

                        sprintf( query,
                                "INSERT DELAYED INTO shortterm (time,poll_id,value) VALUES ('%lu','%lu','%lu') "
                                "",
                                currentTime, i % 100000, ( currentTime + i ) / 100000
                        );

                        time( &currentTime );
                        printf("<<< %s\n", query );
                        mysql_real_query( link, query, strlen(query) );
                        time( &endTime );
                        printf("--- (%ld seconds)\n", endTime-currentTime );
                }
        }

        return( 0 );
}

----

gcc main.c -lmysqlclient_r

----

Suggested fix:
Delayed inserts should not hang when the table is in use; that's the point of doing delayed inserts.
[14 Aug 2007 16:16] Vesselin Bakalov
The source code for the test application.  Build with `gcc main.c -lmysqlclient_r`

Attachment: main.c (text/x-csrc), 2.16 KiB.

[14 Aug 2007 16:21] Vesselin Bakalov
In our production environment, when the "INSERT DELAYED" queries are hanging, the number of "Not_flushed_delayed_rows" never gets higher than 1400.
[14 Aug 2007 21:48] MySQL Verification Team
Thank you for the bug report.

mysql> show processlist\G
*************************** 1. row ***************************
     Id: 3
   User: root
   Host: localhost
     db: test
Command: Query
   Time: 225
  State: waiting for handler insert
   Info: INSERT DELAYED INTO shortterm (time,poll_id,value) VALUES ('1187127320','6796','11891')
*************************** 2. row ***************************
     Id: 4
   User: DELAYED
   Host: localhost
     db: test
Command: Delayed insert
   Time: 227
  State: upgrading lock
   Info: 
*************************** 3. row ***************************
     Id: 5
   User: root
   Host: localhost
     db: test
Command: Query
   Time: 0
  State: NULL
   Info: show processlist
*************************** 4. row ***************************
     Id: 6
   User: root
   Host: localhost
     db: test
Command: Query
   Time: 227
  State: updating
   Info: DELETE FROM shortterm WHERE time < '1187127018'
4 rows in set (0.00 sec)
[20 Aug 2007 13:05] Konstantin Osipov
Have you checked that the delayed subsystem has not run out of the queue buffers?
Please check that your delayed_queue_size is substantially big.
[20 Aug 2007 15:27] Vesselin Bakalov
In the original situation, we have failed with a queue size of "8,000,000" (also "4,000,000" and "100,000" and the default).
[30 Aug 2007 10:46] Konstantin Osipov
Kristofer, this is likely not a bug, the delayed thread simply runs out of buffer space.
Please re-verify the attached test case.
[28 Sep 2007 10:21] Kristofer Pettersson
Running the attached test case with a queue size of 1000 items I
could conclude that when the incoming data stream has higher
volume than what can be absorbed by the IO-subsystem in the
storage engine, the server will quickly reach the limit of the
delayed_queue_size (1000). Increasing the size  of the queue only
postpones the time when the limit is reached, but does not remove
the limit altogether. When the maximum number of records in the
delayed queue is reached, the INSERT DELAYED will behave as a
regular INSERT until the IO-subsystem has caught up by being able
to consume more records. This explains the effect when DELETE
blocks INSERT DELAYED threads almost immediately. 

Having a large enough delayed_queue_size allows one to accept new INSERTs
even if the performance ratio (inserts_per seconds / 
inserts_written_to_disk) is greater than one. However, if the time
of the surplus production is too long, there obviously won't be a
delayed_queue_size which is big enough.
[5 Nov 2007 16:15] Vesselin Bakalov
Hi guys.

Fortunately (or unfortunately, as the case may be), in one of our production environments, we are having the locking issue with INSERT DELAYED queries again.  It happens every 10 minutes for about 30 seconds.

We basically have 25,000,000+ rows in a memory table, and we're trying to do INSERT DELAYED into it very frequently (trimmed every 10 minutes to retain two hours of data--this is when the table locks; we do a DELETE on it).

What information do you need from us ("SHOW VARIABLES LIKE ...", etc.) to better understand the issue?

We can put a script together that will query the database for a few minutes before the issue until a few minutes after it if that will help.
[6 Nov 2007 6:43] Kristofer Pettersson
I would be helpful if you produced some relevant graphs plotting the performance and behavior. Then we could more easily agree on the existence of any bugs and how to deal with them.
[12 Nov 2007 23:35] Matthew Lord
Here is a test that I think shows the crux of the problem which is that
WRITE locks are preventing INSERT DELAYED statements from going into the
delayed_queue:
Open 2 database connections.

Connection 1:
set global delayed_queue_size=20;

use test;
 CREATE TABLE `ditest` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` char(50) default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM;

insert into ditest (name) values ("matt"), ("faith"), ("sid"), ("jerry"), ("xena");

lock table ditest write;

Connection 2:
use test;
insert delayed into ditest (name) values ("xxx");

show processlist;

--------

You'll notice that the state for the insert delayed is "waiting for handler open" as it blocks.  That means that we're here in delayed_get_table:

      /* Wait until table is open */
      thd->proc_info="waiting for handler open";
      while (!di->thd.killed && !di->table && !thd->killed)
      {
        pthread_cond_wait(&di->cond_client, &di->mutex);
      }

Why do we not just throw this statement into the delayed queue as we do with
the above test if we instead take a READ lock on ditest instead of a WRITE 
lock?

This was tested with 5.0.46 on x86 linux.
[13 Nov 2007 10:50] Konstantin Osipov
Matt,
I can't repeat your test case.
The other connection does not hang, the insert goes through, bot on 5.0-bk and 5.1-bk:

This is how it looks on 5.0:

Database changed
mysql> insert delayed into ditest (name) values ("xxx");
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> show processlist;
+----+--------+-----------+------+---------+------+-------+------------------+
| Id | User   | Host      | db   | Command | Time | State | Info             |
+----+--------+-----------+------+---------+------+-------+------------------+
|  1 | kostja | localhost | test | Sleep   |   11 |       | NULL             | 
|  2 | kostja | localhost | test | Query   |    0 | NULL  | show processlist | 
+----+--------+-----------+------+---------+------+-------+------------------+
2 rows in set (0.00 sec)

You could contact me on IRC so that we reproduce it step-by step.
[13 Nov 2007 14:45] Vesselin Bakalov
Konstantin / Kristofer / Matt

I can not get Matt's proof to work either, but I definitely have a server that does it in production under normal (albeit heavy) load.

We can do a WebEx and you can observe the logs - it happens every 10 minutes on the dot.

Please call me at 302-368-5650 and we can get things rolling.
[13 Nov 2007 17:40] Matthew Lord
I could no longer repeat the issue with my test case using 5.0.50.
[22 Nov 2007 15:36] Kristofer Pettersson
If the bug can't be reproduced with the latest version, can we degrade this bug to an open-but-not-verified state?
[23 Nov 2007 16:14] Vesselin Bakalov
The bug is still exhibited with 5.0.50 Enterprise compiled from source.

Although I can not give you a concise test to show it at this time, it is most definitely there.  If anyone would like to get on a WebEx, I can show you the bug in action.