MySQL Bugs: #44841: purge thread + rollback hang forever waiting on dict_operation

Bug #44841	purge thread + rollback hang forever waiting on dict_operation_lock semaphore
Submitted:	13 May 2009 9:39	Modified:	14 Jan 2013 18:23
Reporter:	Nickolay Vinogradov	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S1 (Critical)
Version:	5.1.34-0, 5.1.43sp1	OS:	Linux (SUSE Linux Enterprise Server 10 SP2 (x86_64))
Assigned to:	Sunny Bains	CPU Architecture:	Any
Tags:	RW-latch dict0dict.c, X-lock

Description:
Every few days MySQL hanged with the same warnings:

InnoDB: Warning: a long semaphore wait:
--Thread 1173395776 has waited at row/row0mysql.c line 1720 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0xc73b20 created in file dict/dict0dict.c line 728
a writer (thread id 1173395776) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file row/row0mysql.c line 1685
Last time write locked in file row/row0mysql.c line 1720
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0

Every time thread waited at "row/row0mysql.c line 1720" or at "row/row0mysql.c line 1685", and every time "S-lock" or "X-lock" on RW-latch at 0xc73b20 created in file dict/dict0dict.c line 728.

How to repeat:
MySQL running under high load.
Uptime: 5322  Threads: 454  Questions: 11475628  Slow queries: 19839  Opens: 1342  Flush tables: 1  Open tables: 1335  Queries per second avg: 2156.262
All tables are InnoDB.

MySQL installed from MySQL-server-community-5.1.34-0.sles10.x86_64.rpm

InnoDB Monitor Output

Attachment: innodb_monitor_output.log (application/octet-stream, text), 117.05 KiB.

Please, send your my.cnf file content and the results of:

df -k

Here we have purge thread in conflict the thread doing rollback of:

INSERT INTO nagios_hoststatus SET instance_id='1', host_object_id='1', status_update_time=FROM_UNIXTIME(1242201087), ...

Please, send the results of SHOW CREATE TABLE and SHOW TABLE STATUS for this nagios_hoststatus table also.

my.cnf

Attachment: my.cnf (application/octet-stream, text), 20.56 KiB.

nickolay@new_db:~> df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             10490040   6880800   3609240  66% /
udev                   8220496       104   8220392   1% /dev
/tmpfs                 1048576         8   1048568   1% /tmpfs
/dev/sda2            419267584 219350508 199917076  53% /mysql

mysql> SHOW TABLE STATUS WHERE name LIKE 'nagios_hoststatus';
+-------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-------------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------------------------+
| Name              | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free   | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment                         |
+-------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-------------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------------------------+
| nagios_hoststatus | InnoDB |      10 | Compact    |    8 |           2048 |       16384 |               0 |        16384 | 22985834496 |         566410 | 2009-03-02 22:53:47 | NULL        | NULL       | utf8_general_ci |     NULL |                | Current host status information |
+-------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-------------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------------------------+

Warning and InnoDB Monitor Output(Previous Hang)

Attachment: innodb_monitor_output2.log (text/x-log), 140.53 KiB.

Valeriy, please look at last attached file. It's from previous hang.
If I understand correctly, now the problem with thread doing rollback of:

INSERT INTO nagios_programstatus SET instance_id='1', status_update_time=FROM_UNIXTIME(1241899442)...

OK, so this is not related to particular table (do you know what is the reason for rollback, by the way?), but more to transaction rate maybe.

Can you, please, try to set innodb_max_purge_lag = 1000000, for example, and check if it will influence the rate of hangs related to purge vs rollback activity. You can set it dynamically, with SET GLOBAL. Read http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_max_purge_lag for the details.

Side note: I had noted memlock in your my.cnf. Please, send the results of:

free

Linux command.

new_db:/home/nickolay # free
             total       used       free     shared    buffers     cached
Mem:      16440996   16346032      94964          0      99936    1825508
-/+ buffers/cache:   14420588    2020408

Not that much of free memory left. Can you try also to remove memlock option and check if this will influence the situation in any way?

MySQL hanged again. :(
But now without "long semaphore error".
Let me to upload innodb monitor output.

Innodb monitor output

Attachment: innodb_monitor_output3.txt (text/plain), 62.52 KiB.

Problem repeated again and again even after we replace hardware.
We are delete memlock, and add innodb_max_purge_lag = 1000000, but it isn't help.
Any idea?

Last innodb monitor output

Attachment: innodb.out.333 (application/octet-stream, text), 62.52 KiB.

Bug #45813 was marked as a duplicate of this one. It contains newer error log and shows that we get assertion failure eventually as a result of a long wait.

I have increase innodb_max_purge_lag to 10000000, and decrease innodb_thread_concurrency from 32 to 16. And problem has occured today.

Valeriy, do you think is the problem with thread concurrency?

Problem happens again!
Even after I have decreased innodb_thread_concurrency to 8.

Valeriy, How are you going to fix this problem?

Nickolay, it's just a symptom of heavy load. It's one of the ways that extreme heavy disk-bound load shows up. In this case the main symptom is the data dictionary operations that involve waiting for disk I/O.dict/dict0dict.c and row/row0mysql.c are those. InnoDB doesn't expect to wait hundreds of seconds for disk I/O so when it does the watchdog process notices and starts printing diagnostic output. Sometimes the wait gets so long that it concludes that InnoDB must be hung and restarts the server.

The most common way to encounter data dictionary operations is creating and deleting temporary tables but renaming and some other operations can also take the global data dictionary mutex.

One way a thread can be waiting for itself is if it takes a lock and then another thread takes a lock. If it then accesses the row again it's possible that the order of requests in the list of threads waiting for locks can cause it to block on itself. It's rare and not easy to fix so it probably won't be fixed in 5.1. In your case it's probably just normal locking and high load, not this bug.

The purge thread is known to cause performance pain when it's busy. That's for two reasons:

1. It uses a lot of disk I/O.
2. innodb_max_purge_lag is usually not set, so it can grow larger than can be cached in the buffer pool and become even more disk-bound.

Setting innodb_max_purge_lag to a smaller value, perhaps 100,000, might help to give the purge thread more of a chance to keep up.

Look at performance tuning and you should be able to solve the problem. You could also try the 5.4 preview release. That has additional tuning settings and some improved default capabilities that improve throughput. It's not officially released for production use yet but those people who do use it tend to have good results.

James Day, MySQL Senior Support Engineer, Sun Microsystems, England.

Hello James,
The problem resolved after downgrading to MySQL 5.0.x.

Thank you for the feedback.

Closed as "Not a Bug" due to reasons which James wrote in his comment. If you upgrade to version 5.1 again and meet this problem even if follow James's suggestions feel free to reopen the report.

old synopsis: MySQL hanged on long semaphore wait (X-lock on RW-latch)
new synopsis: purge thread + rollback hang forever waiting on dict_operation_lock semaphore

Setting as can't repeat.  Older versions were usually too eager to commit suicide during heavy i/o operations.  If this problem is seen again on modern versions, please post an update here.