Bug #43569 MySQL deadlocks
Submitted: 11 Mar 2009 17:21 Modified: 19 Jan 2012 7:30
Reporter: Peteris Krumins
Status: No Feedback
Category: MySQL Server: Locking  Severity: S1 (Critical)
Version: 5.1.32  OS: Linux (x86_64 GNU/Linux)
Assigned to:  CPU Architecture: Any
Tags: deadlock, innodb, lockup

[11 Mar 2009 17:21] Peteris Krumins
Description:
We are having terrible MySQL deadlocks. We are currently using 5.1.32, but the same problem also happened on 5.1.30. The problem happens at random and we do not have a clue what triggers it. The symptoms are that the databases "lock up": no one can access them anymore, but mysqld keeps running, using CPU and memory. If we try, for example, 'use database; select * from db limit 1' from the mysql console, it just waits there, never finishing. During this time connections queue up, and if we do 'show processlist', there are thousands of connections. The only way out is to kill the database the rough way with `kill -9`; it can't be shut down with `mysqladmin shutdown`, as that command never finishes.

I managed to collect some info from when this happens. We currently have one of the servers caught in this state, so I can get more info if necessary; just let me know.

The strangest info is in `show innodb status`: it says "LATEST DETECTED DEADLOCK". So we suspect it's some kind of deadlock related to InnoDB, but the output of that command gives us no clue what causes it.

I'm attaching output of `show innodb status` when the problem has happened.
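For reference, the deadlock section can be isolated from a saved status dump with a small script; the sample text below is illustrative only and is not taken from the actual attachment.

```shell
#!/bin/sh
# Create an illustrative sample of a `SHOW INNODB STATUS` dump (NOT the real
# attachment) so the extraction below can be demonstrated end to end.
cat > /tmp/innodb-status-sample.txt <<'EOF'
------------------------
LATEST DETECTED DEADLOCK
------------------------
090311 12:00:00
*** (1) TRANSACTION:
TRANSACTION 0 12345, ACTIVE 2 sec
------------
TRANSACTIONS
------------
EOF

# Print from the deadlock header up to the next section delimiter
# (the second line of dashes after the header's own underline).
awk '/LATEST DETECTED DEADLOCK/{on=1} on && /^-+$/ && ++d==2 {exit} on' \
    /tmp/innodb-status-sample.txt
```

The same one-liner works on a real dump by pointing it at the saved file instead of the sample.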

How to repeat:
We have not found a way to repeat it, as it happens spontaneously.

Suggested fix:
We do not know how to fix it.
[11 Mar 2009 17:22] Peteris Krumins
output of `show innodb status`

Attachment: show-innodb-status.txt (text/plain), 63.44 KiB.

[11 Mar 2009 18:46] Valeriy Kravchuk
Thank you for the problem report. Please, send your my.cnf file content. How many CPUs/cores do you have on that box? 

That deadlock happened long before you captured INNODB STATUS, so I am not sure it is relevant. I just see many active transactions that have been running for 700+ seconds...
[11 Mar 2009 19:07] Peteris Krumins
Thanks for your quick response.

The machine has 2 CPUs with 4 cores each and 32GB of RAM. Here is the output of /proc/cpuinfo for one of the cores:

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU           E5462  @ 2.80GHz
stepping        : 6
cpu MHz         : 2800.099
cache size      : 6144 KB
physical id     : 1
siblings        : 4
core id         : 3
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips        : 5600.26
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

I can also add that the same problem happens on both XFS and ReiserFS filesystems. 

We have had this problem for more than a month now. We tried recreating the db from a mysqldump (because we thought there might be binary corruption in the database files), but the problem still happened. It has happened on 4 or 5 machines with similar architecture.

Here is the my.cnf:

#----------------------------------

[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
open_files_limit = 16384

[mysqld]
#
# * Basic Settings
#
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /data/mysql
tmpdir = /data/tmp
language = /usr/share/mysql/english
skip-external-locking
old_passwords   = 1

#bind-address = 192.168.0.252
bind-address = 0.0.0.0

#
# * Fine Tuning
#
key_buffer = 16M
max_allowed_packet = 16M
thread_stack = 128K
thread_cache_size = 1024
max_connections = 4096
table_cache = 2048
thread_concurrency = 16

tmp_table_size = 128M
max_heap_table_size = 128M

read_buffer_size = 1M
read_rnd_buffer_size = 1M
join_buffer_size = 1M

#
# * Query Cache Configuration
#
query_cache_limit = 1M
query_cache_size = 32M
query_cache_type = 1

#
# * Logging and Replication
#

#log = /var/log/mysql/mysql.log

#log_slow_queries = /data/logs/slow.log
long_query_time = 1
log-queries-not-using-indexes
log-output=file

server-id = 11

log_bin = /data/logs/db05-bin.log
relay_log = /data/logs/db05-slave.log
log-slave-updates

expire_logs_days = 6
max_binlog_size = 1000M
binlog_cache_size = 32M

###### Dirty hacks
#auto_increment_increment=2
#auto_increment_offset=2
skip-name-resolve
skip-slave-start
slave_exec_mode='IDEMPOTENT' #at least until upgrade to 5.1.28 (5.1.29 even better)

#
# * InnoDB
#

innodb_buffer_pool_size = 23G
innodb_file_per_table = 1
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 0
innodb_open_files = 4096
innodb_log_file_size = 512M
innodb_log_buffer_size = 8M
innodb_support_xa = 1

[mysqldump]
quick
quote-names
max_allowed_packet      = 16M

[mysql]

[isamchk]
key_buffer              = 16M

#----------------------------------------
[12 Mar 2009 4:56] Peteris Krumins
We currently have one machine in this state:

#-------------------------------
pkrumins@db06:/data$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1468266
Server version: 5.1.32-1-log (Ubuntu)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use group002;
Database changed

mysql> select * from foo limit 5;
#-------------------------------

Nothing happens... It just stalls.

I can get all the info you need from this db server. Let me know.
[12 Mar 2009 8:31] Valeriy Kravchuk
While it all hangs, please get two sets of results for:

show full processlist;
show global status;
show innodb status\G

with 60 seconds or so in between. Upload them to the report.
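A sketch of that capture as a small script; it assumes the plain `mysql` client can connect as-is (add credentials if needed), and `MYSQL_CLIENT` is a hypothetical override hook used only for testing, not a MySQL feature:

```shell
#!/bin/sh
# Sketch: capture the three requested outputs in one go. Run `snapshot 1`,
# wait about 60 seconds, then run `snapshot 2`.
snapshot() {
    # $1 = label used in the output filenames
    ${MYSQL_CLIENT:-mysql} -e 'SHOW FULL PROCESSLIST'  > "processlist-$1.txt"
    ${MYSQL_CLIENT:-mysql} -e 'SHOW GLOBAL STATUS'     > "global-status-$1.txt"
    ${MYSQL_CLIENT:-mysql} -e 'SHOW INNODB STATUS\G'   > "innodb-status-$1.txt"
}
# Usage on the hung server:
#   snapshot 1; sleep 60; snapshot 2
```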
[12 Mar 2009 10:51] Peteris Krumins
I have attached the requested files. I want to add that the server gets around 3688 queries per sec on average.
[13 Mar 2009 13:56] Peteris Krumins
Let me know if you need anything else. MySQL is still stalled on that machine, but we can't keep it in that state forever.
[14 Mar 2009 19:24] Peteris Krumins
We just had the same lockup on MySQL 5.1.30 running on a lightly loaded server (100 queries/sec). It has ReiserFS and the same architecture (32 GB RAM, 2 CPUs x 4 cores). Kernel 2.6.24.
[17 Mar 2009 18:54] Peteris Krumins
Same thing just happened on mysql-enterprise-advanced-5.1.32.

I am attaching full processlist, global status and innodb status outputs.
[17 Mar 2009 19:02] Peteris Krumins
show global status on enterprise mysql

Attachment: global-status-enterprise (application/octet-stream, text), 5.82 KiB.

[17 Mar 2009 19:12] Peteris Krumins
We have also been graphing some MySQL statistics: InnoDB bufpgmem, InnoDB I/O, InnoDB insbuf, and MySQL threads. I just noticed that when the lockup happened, the number of InnoDB database file writes skyrocketed for some reason (innodb-io.png), the number of inserts per second dropped to 0 while merges skyrocketed (innodb-insbuffer.png), and the number of cached threads dropped while the number of running threads skyrocketed (mysql-threads.png).

I have attached them.
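Rates like these can also be quantified from two GLOBAL STATUS snapshots taken a known interval apart; this sketch shows the arithmetic, with made-up counter values for illustration:

```shell
#!/bin/sh
# Sketch: per-second rate of a status counter between two GLOBAL STATUS
# snapshots taken a known number of seconds apart. The two sample files
# below use illustrative values, not real server data.
cat > /tmp/status-1.txt <<'EOF'
Innodb_data_writes 1000
EOF
cat > /tmp/status-2.txt <<'EOF'
Innodb_data_writes 61000
EOF

rate() {
    # $1 = counter name, $2/$3 = snapshot files, $4 = seconds between them
    a=$(awk -v k="$1" '$1==k{print $2}' "$2")
    b=$(awk -v k="$1" '$1==k{print $2}' "$3")
    echo $(( (b - a) / $4 ))
}

rate Innodb_data_writes /tmp/status-1.txt /tmp/status-2.txt 60   # prints 1000
```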
[17 Mar 2009 19:13] Peteris Krumins
innodb io graph

Attachment: innodb-io.png (image/png, text), 58.23 KiB.

[17 Mar 2009 19:13] Peteris Krumins
innodb insbuffer graph

Attachment: innodb-insbuffer.png (image/png, text), 50.63 KiB.

[17 Mar 2009 19:14] Peteris Krumins
innodb bufpgmem graph

Attachment: innodb-bufpgmem.png (image/png, text), 68.07 KiB.

[17 Mar 2009 19:14] Peteris Krumins
mysql threads graph

Attachment: mysql-threads.png (image/png, text), 51.06 KiB.

[18 Mar 2009 11:24] Sveta Smirnova
Thank you for the feedback.

Could you also please check resource usage when the hang occurs, using:

df
free
vmstat
iostat

Please also provide your configuration file.

Also, please try to generate a core file when you run kill -9.
[18 Apr 2009 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[19 Dec 2011 7:25] MySQL Verification Team
Please check whether the hangup corresponds with a binlog or relay log rotation. Take a look at the timestamps of those files to check...
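One way to do that is to list the log files newest-first and compare their modification times against the time of the hang; a minimal sketch (the globs in the usage comment come from the my.cnf posted earlier in this report):

```shell
#!/bin/sh
# Sketch: print modification times of log files, newest first, so rotation
# times can be compared against the time of the hang.
log_mtimes() {
    # $@ = log files or globs
    ls -lt "$@" 2>/dev/null | head -n 10
}
# Usage on the affected server (paths from the my.cnf in this report):
#   log_mtimes /data/logs/db05-bin.* /data/logs/db05-slave.*
```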
[19 Dec 2011 7:30] MySQL Verification Team
When a hangup happens, we need stack traces of all threads to diagnose the cause.
Please use gdb to get them, like this:

(echo "set pagination 0"; 
 echo "thread apply all bt"; 
 echo "quit"; cat /dev/zero ) | gdb -p $(pidof mysqld)
[20 Jan 2012 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".