Bug #47876 Apparent hang in fsync()
Submitted: 6 Oct 2009 20:14 Modified: 14 Oct 2009 23:11
Reporter: Roel Van de Paar Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: General Severity:S2 (Serious)
Version:5.1.39 OS:Linux (Fedora Linux 2.6.27.5-117.fc10.x86_64)
Assigned to: CPU Architecture:Any

[6 Oct 2009 20:14] Roel Van de Paar
Description:
Resolved stacktrace attached which should show the issue.

New bug from previous discussion in bug #43758 and bug #47768 (Ricardo Gomez) about what originally looked like the same hang as the one described in the second bug (it is not).

How to repeat:
Need more information
[6 Oct 2009 20:17] Roel Van de Paar
Stacktrace

Attachment: bug_43758_resolved_stacktrace_Ricardo_Gomez.txt (text/plain), 24.56 KiB.

[6 Oct 2009 23:48] Roel Van de Paar
Hi Ricardo, Could you please clarify for me:

1. Are the physical disks working correctly? Please check your disks using any available OS tools and/or tools/programs that came with your disk hardware. What sort of I/O system are you using on this server (RAID, SAN, number of disks etc.)?
2. Could you clarify how you establish that mysql is hanging instead of just busy?
3. Please clarify what version you are running by executing SELECT VERSION(); at your mysql prompt.
4. Please upload your my.cnf file

Also, you mentioned having this issue on two different versions of MySQL. Would you mind generating an additional stack trace as per my previous instructions in bug #43758 for each of those versions while the hang is happening?
[7 Oct 2009 23:50] Ricardo Gomez
Hi Roel,

>1. Are the physical disks working correctly? Please check your disks using any available 
>OS tools and/or tools/programs that came with your disk hardware. What sort of I/O system
>are you using on this server (RAID, SAN, number of disks etc.)?

These is an iSCSI implementation using DELL PowerVault MD3000, which have 4 disks using RAID-5. 
The filesystem used is OCFS2, which allows to publish and sync data to several servers, 
althought at this time We have only one server PowerEdge 2950 conected via iSCSI to the Power Vault.
We are expecting to solve this problem to share disk information with a second Server PE2950.

>2. Could you clarify how you establish that mysql is hanging instead of just busy?

We have a owned management system application, We are polling several servers. For every polling 
We update the status for each device in the tables. We have almost 160 INNODB tables.

This problem occurs only in INNODB table, We detected the problem in 3 of our more used tables. 
The workaround for us It was to switch 2 of these tables to MyISAM. For these 2 tables the problem dissapears. 
But the problem remains for the table that we could not to migrate to MyISAM.

Problem appears when tables are being updated or inserted (rate is aproximately 10 per second).

We believed at the beginning that maybe the problem was caused because of more than one update 
are trying to executed at the same table. 
However the problem appears only trying to update or insert one INNODB table and the 
following statements to the same table or a different INNODB table will be in "freeing item" status until the 
first one statement will be finished.

This weird behavior it happens with or without cache. 

>3. Please clarify what version you are running by executing SELECT VERSION(); at your mysql prompt.

We had 5.1.30, when the problem appears, then We upgraded to 5.1.39. Currently We have only 5.1.39
mysql> select  version();
+------------+
| version()  |
+------------+
| 5.1.39-log |
+------------+
1 row in set (0.00 sec)

>4. Please upload your my.cnf file
------------------------------------------------------------------------------------------
[admin@emanagement2 ~]$ more /etc/my.cnf
[mysqld]
#innodb_force_recovery = 4
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=admin
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

# tunning for e-management
max_connections=200
open_files_limit=65535
max_allowed_packet = 256M
slow_query_log=1
log-queries-not-using-indexes
#log-bin = /var/run/mysqld/log-bin.log
#binlog_format='MIXED'
#query_cache_size = 512M
#query_cache_limit = 768M
#table_cache=350
####query_cache_type = 0
thread_cache_size = 32
thread_concurrency=8
read_buffer_size=2M
key_buffer_size=128M
#innodb_file_per_table=1
#innodb_flush_log_at_trx_commit = 2
#innodb_flush_method=O_DIRECT
#innodb_buffer_pool_size=600M
#innodb_adaptive_hash_index=0

# To allow mysqld to connect to a MySQL Cluster management daemon, uncomment
# these lines and adjust the connectstring as needed.
#ndbcluster
#ndb-connectstring="nodeid=4;host=localhost:1186"

# To enable the InnoDB Plugin, uncomment the 2 next lines
#ignore-builtin-innodb
#plugin-load=innodb=ha_innodb_plugin.so

# To enable InnoDB-related INFORMATION_SCHEMA tables
# Join the following options to above directive
  ;innodb_trx=ha_innodb_plugin.so
  ;innodb_locks=ha_innodb_plugin.so
  ;innodb_cmp=ha_innodb_plugin.so
  ;innodb_cmp_reset=ha_innodb_plugin.so
  ;innodb_cmpmem=ha_innodb_plugin.so
  ;innodb_cmpmem_reset=ha_innodb_plugin.so

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[ndbd]
# If you are running a MySQL Cluster storage daemon (ndbd) on this machine,
# adjust its connection to the management daemon here.
# Note: ndbd init script requires this to include nodeid!
connect-string="nodeid=2;host=localhost:1186"

[ndb_mgm]
# connection string for MySQL Cluster management tool
connect-string="host=localhost:1186"
------------------------------------------------------------------------------------------

>Also, you mentioned having this issue on two different versions of MySQL. Would you mind
>generating an additional stack trace as per my previous instructions in bug #43758 for
>each of those versions while the hang is happening?

As We upgraded to 5.1.39 We have not the previous scenario with 5.1.30.

By the way, about your AIO appreciation. We were looking some information and We only see and execute some linux commands that maybe will be useful for you. 

Aditionally attachment 4 files:
Resume of processlist when failed (ResumeLokcsMysql.txt)
backtrace in two different moments when ocurred the fails (backtrace.txt, backtraceOld.txt)
and  AIO info of system

Thanks.
[7 Oct 2009 23:53] Ricardo Gomez
resume processlist

Attachment: ResumeLocksMysql.txt (text/plain), 5.41 KiB.

[7 Oct 2009 23:54] Ricardo Gomez
backtrace

Attachment: backtrace.txt (text/plain), 20.18 KiB.

[7 Oct 2009 23:55] Ricardo Gomez
backtraceold

Attachment: backtraceOld.txt (text/plain), 21.05 KiB.

[7 Oct 2009 23:55] Ricardo Gomez
AIO info system

Attachment: AIO_info.txt (text/plain), 7.44 KiB.

[8 Oct 2009 3:04] Roel Van de Paar
Hi Ricardo, Great - thank you for sending in this information. 

Could you also send me the symbol files (mysqld.sym) for both the new and old versions you are running? Thanks..
[8 Oct 2009 13:25] Ricardo Gomez
Hi Roel, attachment the mysqld.sym file. The  file backtraceOld.txt was created before, 3 minutes , which the backtrace.txt file, but I do not generate 2 files mysqld.sym, I only generated a mysqld.sym file which I am enclosing.
[8 Oct 2009 13:26] Ricardo Gomez
mysqld.sym file

Attachment: mysqld.sym.zip (application/zip, text), 169.81 KiB.

[13 Oct 2009 8:09] Roel Van de Paar
backtrace.txt resolved

Attachment: backtrace_resolved.txt (text/plain), 26.68 KiB.

[13 Oct 2009 8:09] Roel Van de Paar
backtraceold.txt resolved

Attachment: backtraceold_resolved.txt (text/plain), 28.63 KiB.

[13 Oct 2009 14:14] Ricardo Gomez
Hi Roel, excuse me, I have one question: What I do with  the resolved files ?
[13 Oct 2009 22:33] Roel Van de Paar
Hi Ricardo, nothing :) They are to make it easier for the developers to have a look at the stack traces.
[14 Oct 2009 7:52] Marko Mäkelä
The system seems to be completely idle in backtrace_resolved.txt, waiting for input from connections.

In backtraceold_resolved.txt, fsync seems to be the culprit. Two other threads are waiting for the mutex in log_write_up_to() while the master thread is attempting to flush the redo log.

The fsync hang could be a bug in libc or the operating system kernel or drivers, or an error in the disk system firmware or hardware (especially connectors), but not likely in InnoDB or MySQL. In this kind of situations, I would examine the logs (kernel log, syslog, dmesg, S.M.A.R.T. event log on the hard disk) and collect some vmstat output.
[14 Oct 2009 23:11] Roel Van de Paar
Ricardo, I am now closing this as 'not a [MySQL/InnoDB] bug', based on the analysis from Marko:

----
The fsync hang could be a bug in libc or the operating system kernel or drivers, or an error in the disk system firmware or hardware (especially connectors), but not likely in InnoDB or MySQL. In this kind of situations, I would examine the logs (kernel log, syslog, dmesg, S.M.A.R.T. event log on the hard disk) and collect some vmstat output.
----

However, if you have any further information which leads you to believe that this is a bug within MySQL or InnoDB, feel free to report it.
[15 Oct 2009 21:32] Ricardo Gomez
Hi Roel, one question: When I changed the INNODB tables for MYISAM tables the problem does not occur, ¿this information not is relevant to possible bug ?
[26 Oct 2009 9:05] Roel Van de Paar
Hard to say.

Are you able to reproduce the bug on another machine and/or on another Linux OS?