Bug #20857 Deadlocks detected by timeout (S L O W)
Submitted: 5 Jul 2006 1:22 Modified: 15 May 2007 16:42
Reporter: David Sharnoff Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:5.0.32-BK, 5.0.25-BK, 4.1.15 and 5.0.22 OS:Linux (Linux 2.6.15, Linux 2.6.12)
Assigned to: Heikki Tuuri CPU Architecture:Any
Tags: concurrency, deadlock, innodb, lock, performance

[5 Jul 2006 1:22] David Sharnoff
Description:
One of the regression tests for my perl object store runs VERY slowly under mysql 4.1.5 & 5.0.22.  I belive it used to run at a reasonable clip using mysql 4.0.18.  It's more than 20 times slower than PostgreSQL on this test.

This particular test exercises deadlock detection.

I ran my tests with mysql 4.1.5 on Ubuntu 6.06LTS and mysql 5.0.22 on Debian Sid.

How to repeat:
Grab my OOPS module from CPAN.

   http://search.cpan.org/~muir/OOPS-0.1004/lib/OOPS.pod

Unpack it.  Also install Test::MultiFork, Time::HiRes, Data::Compare, Clone::PP, BSD::Resource.  Make sure that Data::Compare is a recent version (not the old stuff in Debian.

Create a test database and mysql user...

Set some environment variables:

OOPSTEST_DSN=DBI::mysql:database=TheTestDatabase;host=TheTestDatabaseHost
OOPSTEST_USER=TheTestUser
OOPSTEST_PASS=TheTestUserPassword
OOPSTEST_SLOW=okay
OOPSTEST_PREFIX=MyTestPrefix

Then, run t/tran5a.t in the OOPS distribution.

When I do this, it's REALLY slow.

On all the other tests mysql is fast.
[5 Jul 2006 18:23] Valeriy Kravchuk
Thank you for a problem report. Sorry, but I've got:

The requested URL /authors/id/M/MU/MUIR/modules/OOPS-0.1004.tar.gz was not found on this server.

when tried to download http://search.cpan.org/CPAN/authors/id/M/MU/MUIR/modules/OOPS-0.1004.tar.gz

Anyway, please, send you my..cnf content.
[5 Jul 2006 19:40] David Sharnoff
http://search.cpan.org/CPAN/authors/id/M/MU/MUIR/modules/OOPS-0.1004.tar.gz works for me.  If it doesn't, just download my OOPS module from any CPAN repository.  Or just say:

 perl -MCPAN -e shell
 install OOPS
[5 Jul 2006 19:41] David Sharnoff
my.cnf from my Ubuntu system (mysql 4.1.15)

Attachment: my.cnf (application/octet-stream, text), 3.44 KiB.

[5 Jul 2006 19:44] David Sharnoff
my.cnf from my Debian system (mysql 5.0.22)

Attachment: my.cnf (application/octet-stream, text), 3.44 KiB.

[5 Jul 2006 19:49] David Sharnoff
You can also get the OOPS distribution from:

  ftp://ftp.idiom.com/users/muir/CPAN/modules/OOPS-0.1004.tar.gz

That's the master copy.
[23 Jul 2006 12:14] Valeriy Kravchuk
I installed your software, but:

openxs@suse:/tmp/OOPS-0.1004> perl t/tran5a.t
# Debugging on
only mysql, PostgreSQL & SQLite supported at /usr/lib/perl5/site_perl/5.8.6/OOPS
.pm line 420.
Compilation failed in require at t/tran5a.t line 16.
BEGIN failed--compilation aborted at t/tran5a.t line 16.
openxs@suse:/tmp/OOPS-0.1004> vi +16 t/tran5a.t

openxs@suse:/tmp/OOPS-0.1004> set | grep OOPS
OOPSTEST_DSN=DBI::mysql:database=test
OOPSTEST_PREFIX=MyTestPrefix
OOPSTEST_SLOW=okay
OOPSTEST_USER=root
PWD=/tmp/OOPS-0.1004

it fails on use OOPS::TestCommon; (although this module exists, it seems). Looks like I made some mistake, but I see no obvious one. All other requeired modules were installed.

Can you, please, just enable general query log on your MySQL server, run this test, and send/upload content of the query log. I need to know what  your test is trying to perform in MySQL.
[24 Jul 2006 1:27] David Sharnoff
You have an extra colon in OOPSTEST_DSN.  It's "DBI:mysql", not "DBI::mysql".

How do I turn on the general query log?   Is it just "log            = /var/log/mysql.log" in my my.cnf?

-Dave
[28 Aug 2006 13:28] Valeriy Kravchuk
After correcting variable, I've got (with 5.0.25-BK):

openxs@suse:/tmp/OOPS-0.1004> perl t/tran5a.t
# Debugging on
do 'DROP TABLE MyTestPrefixobject':Unknown table 'MyTestPrefixobject' at /usr/li
b/perl5/site_perl/5.8.6/OOPS/Setup.pm line 94.
do 'DROP TABLE MyTestPrefixattribute':Unknown table 'MyTestPrefixattribute' at /
usr/lib/perl5/site_perl/5.8.6/OOPS/Setup.pm line 94.
do 'DROP TABLE MyTestPrefixbig':Unknown table 'MyTestPrefixbig' at /usr/lib/perl
5/site_perl/5.8.6/OOPS/Setup.pm line 94.
do 'DROP TABLE MyTestPrefixcounters':Unknown table 'MyTestPrefixcounters' at /us
r/lib/perl5/site_perl/5.8.6/OOPS/Setup.pm line 94.
# mysql is very slow at this test, reducing iterations from 1000 to 50 [a-1]
# mysql is very slow at this test, reducing iterations from 1000 to 50 [b-1]
ok 1  - line72  #  refcount  [ a-1 #1 ]
ok 2  -   #  Skip $OOPS::debug_tiedvars not set [ a-1 #2 ]
ok 3  - line72  #  database is empty [ a-1 #3 ]
FREE ERROR BIND!FREE ERROR FBIND!FREE ERROR BIND!FREE ERROR FBIND!ok 4 - line87
# refcount  [a-1]
not ok 4  - a-1 # result ordering in a-1
ok 5  -   #  Skip $OOPS::debug_tiedvars not set [ a-1 #5 ]
FREE ERROR BIND!FREE ERROR FBIND!FREE ERROR BIND!FREE ERROR FBIND!ok 6 - line89
# refcount  [a-1]
not ok 6  - a-1 # result ordering in a-1
ok 7  -   #  Skip $OOPS::debug_tiedvars not set [ a-1 #7 ]

Bail out!  Timeout in Test::MultiFork
b-1: waiting for default for t/tran5a.t:96 (event status: active)

It worked for several minutes. Is it the same result you got?
[28 Nov 2006 14:51] Valeriy Kravchuk
Same result with 5.0.32-BK:

openxs@suse:/tmp/OOPS-0.1004> set | grep OOP
OOPSTEST_DSN=DBI:mysql:database=test
OOPSTEST_PREFIX=MyTestPrefix
OOPSTEST_SLOW=okay
OOPSTEST_USER=root
PWD=/tmp/OOPS-0.1004
openxs@suse:/tmp/OOPS-0.1004> export OOPSTEST_DSN="DBI:mysql:database=test;host
=127.0.0.1"
openxs@suse:/tmp/OOPS-0.1004> perl t/tran5a.t
# Debugging on
do 'DROP TABLE MyTestPrefixobject':Unknown table 'MyTestPrefixobject' at /usr/li
b/perl5/site_perl/5.8.6/OOPS/Setup.pm line 94.
do 'DROP TABLE MyTestPrefixattribute':Unknown table 'MyTestPrefixattribute' at /
usr/lib/perl5/site_perl/5.8.6/OOPS/Setup.pm line 94.
do 'DROP TABLE MyTestPrefixbig':Unknown table 'MyTestPrefixbig' at /usr/lib/perl
5/site_perl/5.8.6/OOPS/Setup.pm line 94.
do 'DROP TABLE MyTestPrefixcounters':Unknown table 'MyTestPrefixcounters' at /us
r/lib/perl5/site_perl/5.8.6/OOPS/Setup.pm line 94.
# mysql is very slow at this test, reducing iterations from 1000 to 50 [a-1]
# mysql is very slow at this test, reducing iterations from 1000 to 50 [b-1]
ok 1  - line72  #  refcount  [ a-1 #1 ]
ok 2  -   #  Skip $OOPS::debug_tiedvars not set [ a-1 #2 ]
ok 3  - line72  #  database is empty [ a-1 #3 ]
FREE ERROR BIND!FREE ERROR FBIND!FREE ERROR BIND!FREE ERROR FBIND!ok 4 - line87
# refcount  [a-1]
not ok 4  - a-1 # result ordering in a-1
ok 5  -   #  Skip $OOPS::debug_tiedvars not set [ a-1 #5 ]
FREE ERROR BIND!FREE ERROR FBIND!FREE ERROR BIND!FREE ERROR FBIND!ok 6 - line89
# refcount  [a-1]
not ok 6  - a-1 # result ordering in a-1
ok 7  -   #  Skip $OOPS::debug_tiedvars not set [ a-1 #7 ]

Bail out!  Timeout in Test::MultiFork
b-1: waiting for default for t/tran5a.t:96 (event status: active)

Long wait before two final rows... I think we have enough evidence that problem exists. But URL with VmWare image of your setup may be also nice 9as a private comment).

As for enabling general query log: see http://dev.mysql.com/doc/refman/5.0/en/query-log.html.
[17 Jan 2007 19:00] Konstantin Osipov
David,
is it possible to reproduce the deadlock by a simple SQL test that uses multiple connections?
[19 Jan 2007 8:18] Giuseppe Maxia
output of submitted material

Attachment: bug_20857_output.zip (application/zip, text), 39.33 KiB.

[19 Jan 2007 8:21] Giuseppe Maxia
I managed to repeat the steps described by the reporter, resulting in a script that was running for about 40 minutes, using MySQL 5.0.32

The attached file contains the general log with the queries issued, the slow-query log, the my.cnf file and the output of the script. (http://bugs.mysql.com/file.php?id=5434)

Now, I need to ask David:
- it is slow compared to what?
- what is the expected behavior?

About the "FREE ERROR BIND!FREE ERROR  FBIND!" This was a combination of some buggy DBD::mysql releases with recent DBI. (See bug#20559 for more details)
It was fixed not long ago.
[22 Jan 2007 0:23] David Sharnoff
It's more than 10x slower than PostgreSQL.  Basically as it runs it
should be producing output every second rather than every once in a
while.
[22 Jan 2007 10:05] Giuseppe Maxia
second run with additional information

Attachment: bug_20857_output2.zip (application/zip, text), 43.20 KiB.

[22 Jan 2007 10:13] Giuseppe Maxia
I ran the Perl script a second time, after adding to the my.cnf file this line:
innodb_buffer_pool_size = 200M

The noticeable difference is that there were no reported deadlocks.
The execution time is about the same. 
The INNODB STATUS, however, reports some waiting due to locks.

Changed Category to "Innodb storage engine".

Regards

Giuseppe
[22 Jan 2007 10:57] Domas Mituzas
I had pretty fast run with InnoDB, now the PG was of course very fast, as it was doing nothing:

ok 48  - line110  #  Could Not Execute 'DELETE FROM oopstestprefixattribute\n WHERE id = ? AND pkey = ?\n ' with '112 coin1':ERROR:  could not serialize access due to concurrent update\n  at /usr/lib/perl5/site_perl/5.8.5/OOPS/pg.pm line 171\n OOPS::pg::query('OOPS::pg=HASH(0xa4a6c04)', 'deleteattribute', 'execute', 'ARRAY(0xa4a7be8)') called at /usr/lib/perl5/site_perl/5.8.5/OOPS.pm line 1862\n OOPS::delete_attribute('OOPS::pg=HASH(0xa4a6c04)', 112, 'coin1', 25) called at /usr/lib/perl5/site_perl/5.8.5/OOPS.pm line 1311\n OOPS::write_hash('OOPS::pg=HASH(0xa4a6c04)', 'HASH(0xa4a7bb8)', 112, 'HASH(0xa4a6f40)', 'HASH(0xa471a14)') called at /usr/lib/perl5/site_perl/5.8.5/OOPS.pm line 2665\n OOPS::NormalHash::SAVE_SELF('OOPS::NormalHash=ARRAY(0xa49457c)') called at /usr/lib/perl5/site_perl/5.8.5/OOPS.pm line 1181\n OOPS::write_object('OOPS::pg=HASH(0xa4a6c04)', 112) called at /usr/lib/perl5/site_perl/5.8.5/OOPS.pm line 997\n OOPS::save('OOPS::pg=HASH(0xa4a6c04)') called at /usr/lib/perl5/site_perl/5.8.5/OOPS.pm line 943\n OOPS::commit('OOPS::pg=HASH(0xa4a6c04)') called at ./tran5a.t line 108\n eval {...} called at ./tran5a.t line 104 [ b-1 #15 ]

In order to have proper test, it should:
a) Have same sync or async setup (as in, InnoDB syncs every transaction to disk by default, at least in 4.1 or 5.0) 
b) Have same serialization level
c) Have checks for final results and errors.
d) Have equal number of tests, as in this case, MySQL:

ok 868  - b-1 # plan followed
ok 869  - a-1 # plan followed
ok 870  -  # clean shutdown

after 42seconds.

and PG went to loop until ok10000 (and continuing) (probably some reduced count of iterations?).
[22 Jan 2007 11:03] Valeriy Kravchuk
Can you, please, set innodb_lock_wait_timeout=2 explicitely and check if it will influence runtime (it is 50 by default)?

According to files uploaded by Giuseppe, there are lock waits for 20 seconds, but deadlock is detected not after 50 seconds of wait, but almost immediately.

Please, send the results of SHOW INNODB STATUS after your test run.
[22 Jan 2007 11:20] Giuseppe Maxia
innodb status after third run

Attachment: innodb_status_end2.txt (text/plain), 6.00 KiB.

[22 Jan 2007 11:20] Giuseppe Maxia
repeated with innodb_lock_wait_timeout=2. 
Ran in 4 minutes (10x faster)
see attached file for INNODB STATUS
[23 Feb 2007 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[21 Apr 2007 0:18] David Sharnoff
With no override ov innodb_lock_wait_timeout...
1.128u 0.100s 44:13.52 0.0%     0+0k 0+0io 0pf+0w

With innodb_lock_wait_timeout=2
1.108u 0.120s 3:23.75 0.5%      0+0k 0+0io 0pf+0w

Only 13 times faster with a 2-second timeout...
[21 Apr 2007 0:19] David Sharnoff
innodb status after run with timeout = 2

Attachment: x (application/octet-stream, text), 6.16 KiB.

[21 Apr 2007 0:36] Heikki Tuuri
David,

even if you set innodb_lock_wait_timeout=2, it may happen that some thread sleeps 3 or even more seconds. There is a background thread that checks lock wait timeouts every second.

But I do not fully understand why do you want to perform deadlock detection with lock wait timeouts? That is bound to be slow.

Regards,

Heikki
[21 Apr 2007 23:50] David Sharnoff
I don't *want* to detect deadlocks with timeouts!   That's what
this bug is about.  I have an expectation that mysql had a more
sophisticated deadlock detector than timeouts.  It doesn't work 
on my test case.
[22 Apr 2007 0:01] Heikki Tuuri
David,

InnoDB does have an automatic, instantaneous deadlock detection. Are you using MySQL table locks? It is not recommended to use MySQL table locks with InnoDB, as they are not properly transactional.

Regards,

Heikki
[22 Apr 2007 6:32] David Sharnoff
I'm not making any explicit locks.  The only thing I'm
doing out of the ordinary is using a transaction isolation
level of SERIALIZABLE.
[14 May 2007 17:05] Heikki Tuuri
http://bugs.mysql.com/bug.php?id=28029 may be associated.

David, please post SHOW INNODB STATUS\G when a transaction is waiting for a lock.

--Heikki
[14 May 2007 23:52] David Sharnoff
output from "show innodb status" while deadlocked and waiting.

Attachment: innodb-status.txt (text/plain), 3.84 KiB.

[14 May 2007 23:54] David Sharnoff
The following is six seconds or so into a deadlock that wasn't
detected.

Status

=====================================
070514 16:49:27 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 27 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 5, signal count 5
Mutex spin waits 5, rounds 100, OS waits 0
RW-shared spins 6, OS waits 3; RW-excl spins 3, OS waits 2
------------
TRANSACTIONS
------------
Trx id counter 0 13717817
Purge done for trx's n:o < 0 13717790 undo n:o < 0 0
History list length 14
Total number of lock structs in row lock hash table 2
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 31897, OS thread id 1141754800
MySQL thread id 29, query id 312 localhost root
show innodb status
---TRANSACTION 0 13717816, ACTIVE 6 sec, process no 31897, OS thread id 1116265392 starting index re
ad
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 320
MySQL thread id 27, query id 311 localhost muir Updating
UPDATE charm2object \t\t\tSET loadgroup = id
------- TRX HAS BEEN WAITING 6 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 45 n bits 80 index `PRIMARY` of table `rectangle/charm2object` trx i
d 0 13717816 lock_mode X waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 13; compact format; info bits 0
 0: len 8; hex 8000000000000001; asc         ;; 1: len 6; hex 000000d15134; asc     Q4;; 2: len 7; h
ex 00000000c720d1; asc        ;; 3: len 8; hex 8000000000000001; asc         ;; 4: len 4; hex 484153
48; asc HASH;; 5: len 1; hex 48; asc H;; 6: len 1; hex 56; asc V;; 7: len 1; hex 30; asc 0;; 8: len
1; hex 30; asc 0;; 9: len 4; hex 800003ed; asc     ;; 10: len 4; hex 80000001; asc     ;; 11: len 2;
 hex 8002; asc   ;; 12: len 4; hex 80002710; asc   ' ;;

------------------
---TRANSACTION 0 13717815, ACTIVE 6 sec, process no 31897, OS thread id 1116130224
2 lock struct(s), heap size 320
MySQL thread id 25, query id 306 localhost muir
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
144 OS file reads, 31 OS file writes, 26 OS fsyncs
5.33 reads/s, 31886 avg bytes/read, 1.15 writes/s, 0.96 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf for space 0: size 1, free list len 5, seg size 7, is empty
Ibuf for space 0: size 1, free list len 5, seg size 7,
0 inserts, 0 merged recs, 0 merges
Hash table size 34679, used cells 0, node heap has 1 buffer(s)
0.00 hash searches/s, 27.74 non-hash searches/s
---
LOG
---
Log sequence number 10 2404494816
Log flushed up to   10 2404494816
Last checkpoint at  10 2404474255
0 pending log writes, 0 pending chkp writes
28 log i/o's done, 1.04 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 18347710; in additional pool allocated 1048192
Buffer pool size   512
Free buffers       364
Database pages     147
Modified db pages  17
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 146, created 1, written 22
5.41 reads/s, 0.04 creates/s, 0.81 writes/s
Buffer pool hit rate 995 / 1000
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 31897, id 1124703152, state: sleeping
Number of rows inserted 44, updated 4, deleted 11, read 22808
1.63 inserts/s, 0.15 updates/s, 0.41 deletes/s, 844.71 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
[15 May 2007 12:28] Heikki Tuuri
David,

there is no deadlock. This transaction is probably holding some locks:

---TRANSACTION 0 13717815, ACTIVE 6 sec, process no 31897, OS thread id
1116130224
2 lock struct(s), heap size 320
MySQL thread id 25, query id 306 localhost muir

and the other transaction is waiting for the above transaction to commit.

What is the above transaction doing? 6 seconds is quite a long time for a small transaction to be active. Why your application does not commit it?

Regards,

Heikki
[15 May 2007 16:42] David Sharnoff
With your hint and some new debugging features in my application,
I found where process A is trying to modify the database while still
holding an idle connection.  My bug.

My apologies for the time wasted on this.