Bug #12049 MySQL hangs
Submitted: 20 Jul 2005 8:38 Modified: 11 May 2008 6:35
Reporter: NOT_FOUND NOT_FOUND Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:4.1.12, 5.0.9 OS:IBM AIX (AIX 5.2)
Assigned to: CPU Architecture:Any

[20 Jul 2005 8:38] NOT_FOUND NOT_FOUND
Description:
We are running MySQL on IBM RS/6000 S80 18-CPU server. We've tried 2 versions of MySQL:

AIX 5.2 (IBM POWER, IBM VisualAge C++, 64-bit) Standard 4.1.12
AIX 5.2 (IBM POWER, IBM VisualAge C++, 64-bit) Standard 5.0.9-beta

We have 1 db with only 1 innodb type table:

CREATE TABLE table1 (
  id varchar(100) NOT NULL,
  field1 varchar(4000) default NULL,
  field2 varchar(4000) default NULL,
  data datetime default NULL,
  PRIMARY KEY  (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin2

There are about 25 java application servers, that query the database with simple:

select field1, field2 from table1 where id=?

At the application server we use MySQL Connector/J 3.1.10. and we use connection pooling.
About 10% of all queries to the database is DML:

insert into table1(id,field1,field2,data) values (?
,?,?,?) on duplicate key update field1 = ?, field2 = ?, data = ?

Typical load is 200-250 af all queries/second with 150-175 active connections.
When we start test, the table1 is empty, when the test ends after 5 minutes, it has about 5000-7000 records. When MySQL runs fine, there are no slow queries. I run show innodb status and everything looks normal - no locks, no waiting threads. We use pooling, so very few new threads are created during the test.
The problem is that MySQL very suddenly hangs after about 5 minutes of load. Active connections count suddenly rises to about 500, queries start to queue and everything stops. I think 200-250 q/sec is not a heavy load for that hardware, cause IBM machine is almost idle, and I can make the server running ~3000 queries/sec without a problem, but only when the queries are executed on a few connections/threads. When MySQL hungs, we cannot run show status neither show innodb status. When we turn off queries at app servers, the server returns to normal state after 10-20 minutes (we don't kill hunging theads with select and insert queries, and let them execute). When I run show innodb status after server becomes responsive, it shows lots of thread locks & semaphore waits, just like this:

OS WAIT ARRAY INFO: reservation count 120547, signal count 9955
--Thread 98176 has waited at btr0cur.c line 366 for 27.00 seconds the semaphore:
S-lock on RW-latch at 11285e080 created in file dict0dict.c line 3642
a writer (thread id 33668) has reserved it in mode  wait exclusive
number of readers 40, waiters flag 1
Last time read locked in file btr0cur.c line 366
Last time write locked in file btr0cur.c line 359
--Thread 96120 has waited at btr0cur.c line 366 for 21.00 seconds the semaphore:
S-lock on RW-latch at 11285e080 created in file dict0dict.c line 3642
a writer (thread id 33668) has reserved it in mode  wait exclusive
number of readers 40, waiters flag 1
Last time read locked in file btr0cur.c line 424
Last time write locked in file btr0cur.c line 424
--Thread 30584 has waited at btr0cur.c line 366 for 31.00 seconds the semaphore:
S-lock on RW-latch at 11285e080 created in file dict0dict.c line 3642
a writer (thread id 33668) has reserved it in mode  wait exclusive
--Thread 13365 has waited at trx0trx.c line 730 for 1.00 seconds the semaphore:
Mutex at 11284e420 created file trx0rseg.c line 150, lock var 1
waiters flag 1
--Thread 74275 has waited at trx0undo.c line 1682 for 1.00 seconds the semaphore:
Mutex at 11284e420 created file trx0rseg.c line 150, lock var 1
waiters flag 1
--Thread 113339 has waited at trx0roll.c line 916 for 0.00 seconds the semaphore:
Mutex at 11284e420 created file trx0rseg.c line 150, lock var 1
waiters flag 1
--Thread 107171 has waited at trx0roll.c line 916 for 0.00 seconds the semaphore:
Mutex at 11284e420 created file trx0rseg.c line 150, lock var 1
waiters flag 1

We have changed some innodb parameters, which should increase the performance:
innodb_flush_log_at_trx_commit = 0
innodb_thread_concurrency = 36
innodb_doublewrite = OFF
tx_isolation=READ-UNCOMMITTED

but it didn't help at all. We have also tried myISAM type for the table, but with no luck either. The problem looks the same for 4.1.12 and 5.0.9 beta.

You can see all show status dumps and config file at http://akson.sgh.waw.pl/~mpieck/mysql_problem.txt

How to repeat:
Get a S80 server and make some load :)
[25 Jul 2005 17:25] J Jorgenson
Found similar situation:  MYSQL v5.0.3 (Solaris):

CREATE TABLE myTable( 
id INT NOT NULL,
value DECIMAL(9,2),
PRIMARY KEY id
);

INSERT INTO ZmyTable(id, value) VALUES (1, 6.5243)
ON DUPLICATE KEY UPDATE value=VALUES(value);

THIS CAUSES the DB Server to start having record locking, along with other tables appear to start locking.

Problem Researched: 
INSERT INTO myTable(id, value) VALUES (1, 6.5243)
   returns an 'Field Truncated' error message, but still inserts the value.

When you add the ON DUPLICATE KEY, the SQL hangs the DB engine.
-- J Jorgenson --
[29 Aug 2005 21:38] J Jorgenson
A collegue found the cause to my issue... There was a VIEW 'on-top' of the table that was being updated.  Once view had been executed, and appears to take some time (select ?) for it to release the read lock on the table.  Using Algorithm=TEMPTABLE on the 'create view' sql actually caused the view to take longer to produce output.

The update lock issue would 'go-away' when the view was NOT executing/created
-- jason --
[24 Sep 2005 16:13] Valeriy Kravchuk
I think, you should limit the number of concurrently working threads (your application server connections) to 18, the number of processors. Your description, similar behaviour for MyISAN and InnoDB storage engines, and the show innodb status results shows it is a contention issue. It is partly because all you sessions use the same data concurrently. You may try to split the load, using several tables for inserts and MERGE table for selects, in case of MyISAM.

Please, retry with no more that 18 concurrent threads and, possibly, with newer versions of MySQL (4.1.14 and 5.0.12-beta) and inform about the results.
[24 Sep 2005 16:25] Vadim Tkachenko
in 5.0.12 we introduced new option innodb_commit_concurrency, as innodb_thread_concurrency does not protects all structures.
Please try at first innodb_commit_concurrency = 1,2,3,4,5
then 10,20,30... 60.
[24 Oct 2005 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[11 Apr 2008 6:35] Valeriy Kravchuk
If you still have a chance, please, try to repeat with a newer version, 5.0.51a, and inform about the results.
[11 May 2008 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".