Bug #53322 deadlock with FLUSH TABLES WITH READ LOCK and DROP FUNCTION
Submitted: 30 Apr 2010 16:58
Modified: 20 Jan 2011 19:34
Reporter: Sasha Pachev
Status: Closed
Category: MySQL Server: Locking
Severity: S2 (Serious)
Version: 5.1.35, 5.1, 5.6.99 bzr
OS: Any
Assigned to: Jon Olav Hauglid
CPU Architecture: Any

[30 Apr 2010 16:58] Sasha Pachev
Description:
The following mysqltest test case causes a deadlock:

connect (th1,localhost,root);
connect (th2,localhost,root);
connection th1;
flush tables with read lock;
connection th2;
--send drop function example;
sleep 5;
connection th1;
select example("foo");
connection th2;
reap;

This assumes that a UDF named example, taking one string argument, is already loaded. Any UDF will do.

How to repeat:
See description.

Suggested fix:
I have debugged this with gdb. The problematic code is mysql_drop_function() in sql/sql_udf.cc. In DROP FUNCTION, th2 write-locks THR_LOCK_udf and then waits for the global read lock to be released so that it can update the mysql.func table. Meanwhile th1, which holds the global read lock, is attempting to take a read lock on THR_LOCK_udf, so neither thread can proceed.

A good fix seems to be changing mysql_drop_function() to call open_ltable() on mysql.func before acquiring THR_LOCK_udf. Would that introduce side effects?

The same problem appears to exist in mysql_create_function() as well.
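
The difference between the two lock orders can be checked in a toy model. What follows is a minimal single-threaded C++ sketch, not the server code: ToyRwLock, deadlocks() and the open_table_first flag are hypothetical names introduced for illustration, and the global read lock is modeled as an ordinary reader/writer lock. The model takes no real locks; it only records which requests could not be granted immediately.

```cpp
#include <cassert>

// Toy reader/writer lock: records state and reports whether a request
// could be granted immediately. It never blocks. (Hypothetical helper.)
struct ToyRwLock {
    int readers = 0;
    bool writer = false;

    bool try_read() {
        if (writer) return false;
        ++readers;
        return true;
    }
    bool try_write() {
        if (writer || readers > 0) return false;
        writer = true;
        return true;
    }
};

// Replays the scenario from the test case and returns true if both
// connections end up blocked on each other.
// open_table_first == false: reported order (THR_LOCK_udf, then mysql.func).
// open_table_first == true:  suggested order (mysql.func, then THR_LOCK_udf).
bool deadlocks(bool open_table_first) {
    ToyRwLock global_read_lock;  // stands in for FLUSH TABLES WITH READ LOCK
    ToyRwLock udf_lock;          // stands in for THR_LOCK_udf

    // th1: flush tables with read lock
    global_read_lock.try_read();

    // th2: drop function ...
    if (!open_table_first) {
        // Reported order: write-lock the UDF lock before opening mysql.func.
        udf_lock.try_write();
    }
    // Opening mysql.func for write has to wait for the global read lock.
    // In the suggested order th2 reaches this point holding nothing.
    bool th2_blocked = !global_read_lock.try_write();

    // th1: select udf(...) needs a read lock on the UDF lock.
    bool th1_blocked = !udf_lock.try_read();

    return th1_blocked && th2_blocked;
}
```

In this model deadlocks(false) is true and deadlocks(true) is false. With the suggested order th2 still waits, but only until th1 releases the global read lock, and th1 is never blocked by th2.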
[30 Apr 2010 21:27] Sasha Pachev
In case this is not obvious, the test case is a mysqltest script.
[2 May 2010 14:45] Sveta Smirnova
Thank you for the report.

I cannot repeat the described behavior with the function metaphon from the udf_example library and BZR development sources. Please try with the current version, 5.1.46, and if the problem still exists, provide the code for the UDF you use.
[3 May 2010 18:28] Sasha Pachev
I am able to deadlock 5.1.46 with the following:

connect (th1,localhost,root);
connect (th2,localhost,root);

connection th1;
create function metaphon returns string soname 'udf_example.so';
create function reverse_lookup returns string soname 'udf_example.so';
flush tables with read lock;
connection th2;
--send drop function metaphon;
sleep 5;
connection th1;
select metaphon("foo");
connection th2;
reap;

Removing create function reverse_lookup ... changes the behavior. I get 

query 'select metaphon("foo")' failed: 1305: FUNCTION test.metaphon does not exist

instead.
[4 May 2010 6:26] Sveta Smirnova
Thank you for the feedback.

Verified as described.

Stack trace after timeout error:

The following stack traces are from all threads (so the failing one is
duplicated).
--------------------------
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/users/ssmirnova/blade12/src/mysql-5.1/sql/mysqld --defaults-group-suffix=.1 --'.
Program terminated with signal 6, Aborted.
#0  0x0000003429e0b002 in pthread_kill () from /lib64/libpthread.so.0
#0  0x0000003429e0b002 in pthread_kill () from /lib64/libpthread.so.0
#1  0x0000000000b23fd5 in my_write_core (sig=6) at stacktrace.c:329
#2  0x000000000069625e in handle_segfault (sig=6) at mysqld.cc:2571
#3  <signal handler called>
#4  0x00000034292c6952 in __select_nocancel () from /lib64/libc.so.6
#5  0x000000000069a186 in handle_connections_sockets (arg=0x0) at mysqld.cc:5055
#6  0x00000000006995ad in main (argc=9, argv=0x7fff68d21a78) at mysqld.cc:4539

Thread 4 (process 15336):
#0  0x0000003429e0da78 in do_sigwait () from /lib64/libpthread.so.0
#1  0x0000003429e0db1d in sigwait () from /lib64/libpthread.so.0
#2  0x0000000000696ac1 in signal_hand (arg=0x0) at mysqld.cc:2773
#3  0x0000003429e061b5 in start_thread () from /lib64/libpthread.so.0
#4  0x00000034292cd39d in clone () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 3 (process 15380):
#0  0x0000003429e09a8d in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
#1  0x0000000000842caa in find_udf (name=0xd89a108 "metaphon", length=8, mark_used=false) at sql_udf.cc:329
#2  0x00000000006dbb2b in MYSQLparse (yythd=0xd8b1c78) at sql_yacc.yy:7984
#3  0x00000000006b7942 in parse_sql (thd=0xd8b1c78, parser_state=0x40ac2410, creation_ctx=0x0) at sql_parse.cc:7847
#4  0x00000000006b34df in mysql_parse (thd=0xd8b1c78, inBuf=0xd89a068 "select metaphon(\"foo\")", length=22, found_semicolon=0x40ac2ec0) at sql_parse.cc:5934
#5  0x00000000006a5c91 in dispatch_command (command=COM_QUERY, thd=0xd8b1c78, packet=0xd886a59 "select metaphon(\"foo\")", packet_length=22) at sql_parse.cc:1233
#6  0x00000000006a4c48 in do_command (thd=0xd8b1c78) at sql_parse.cc:874
#7  0x00000000006a2f49 in handle_one_connection (arg=0xd8b1c78) at sql_connect.cc:1127
#8  0x0000003429e061b5 in start_thread () from /lib64/libpthread.so.0
#9  0x00000034292cd39d in clone () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()

Thread 2 (process 15381):
#0  0x0000003429e0a376 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000b2f89a in safe_cond_wait (cond=0x1183660, mp=0x1182ce0, file=0xc27115 "lock.cc", line=1508) at thr_mutex.c:237
#2  0x000000000068fc19 in wait_if_global_read_lock (thd=0xd89c0a8, abort_on_refresh=true, is_not_commit=true) at lock.cc:1508
#3  0x000000000068cf66 in mysql_lock_tables (thd=0xd89c0a8, tables=0x40b023e0, count=1, flags=0, need_reopen=0x40b02307) at lock.cc:224
#4  0x00000000006fb795 in open_ltable (thd=0xd89c0a8, table_list=0x40b02350, lock_type=TL_WRITE, lock_flags=0) at sql_base.cc:4972
#5  0x00000000008439e9 in mysql_drop_function (thd=0xd89c0a8, udf_name=0xd8a6b50) at sql_udf.cc:576
#6  0x00000000006af5ce in mysql_execute_command (thd=0xd89c0a8) at sql_parse.cc:4546
#7  0x00000000006b365c in mysql_parse (thd=0xd89c0a8, inBuf=0xd8a6a88 "drop function metaphon", length=22, found_semicolon=0x40b03ec0) at sql_parse.cc:5971
#8  0x00000000006a5c91 in dispatch_command (command=COM_QUERY, thd=0xd89c0a8, packet=0xd89e9f9 "drop function metaphon;", packet_length=23) at sql_parse.cc:1233
#9  0x00000000006a4c48 in do_command (thd=0xd89c0a8) at sql_parse.cc:874
#10 0x00000000006a2f49 in handle_one_connection (arg=0xd89c0a8) at sql_connect.cc:1127
#11 0x0000003429e061b5 in start_thread () from /lib64/libpthread.so.0
#12 0x00000034292cd39d in clone () from /lib64/libc.so.6
#13 0x0000000000000000 in ?? ()

Thread 1 (process 15334):
#0  0x0000003429e0b002 in pthread_kill () from /lib64/libpthread.so.0
#1  0x0000000000b23fd5 in my_write_core (sig=6) at stacktrace.c:329
#2  0x000000000069625e in handle_segfault (sig=6) at mysqld.cc:2571
#3  <signal handler called>
#4  0x00000034292c6952 in __select_nocancel () from /lib64/libc.so.6
#5  0x000000000069a186 in handle_connections_sockets (arg=0x0) at mysqld.cc:5055
#6  0x00000000006995ad in main (argc=9, argv=0x7fff68d21a78) at mysqld.cc:4539
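
The two blocked threads in the trace can be read as a wait-for graph. Thread 3 (the SELECT) is stopped in find_udf() on pthread_rwlock_rdlock, and Thread 2 (the DROP FUNCTION) is stopped in wait_if_global_read_lock(). Below is a minimal C++ sketch of that reading, not server code: WaitForGraph, reaches_cycle() and the two graph builders are hypothetical names. The edges assume that the rwlock in find_udf() is the one write-locked by the DROP FUNCTION thread, as the reporter's analysis states.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// An entry A -> B means "A is blocked until B releases a lock".
// Each waiter waits on exactly one other thread in this simplified model.
using WaitForGraph = std::map<std::string, std::string>;

// Follows the wait-for edges from `start`; returns true if the walk
// comes back to a thread it has already visited (a deadlock cycle).
bool reaches_cycle(const WaitForGraph& graph, const std::string& start) {
    std::set<std::string> visited;
    std::string current = start;
    for (;;) {
        if (!visited.insert(current).second) return true;   // seen before
        WaitForGraph::const_iterator edge = graph.find(current);
        if (edge == graph.end()) return false;               // not waiting
        current = edge->second;
    }
}

// Edges as read from the stack trace above.
WaitForGraph graph_from_trace() {
    WaitForGraph g;
    g["Thread 3"] = "Thread 2";  // find_udf: waits for the UDF rwlock
    g["Thread 2"] = "Thread 3";  // wait_if_global_read_lock: waits for th1
    return g;
}

// Same scenario if the DROP FUNCTION thread held no UDF lock while waiting.
WaitForGraph graph_without_udf_lock_held() {
    WaitForGraph g;
    g["Thread 2"] = "Thread 3";
    return g;
}
```

reaches_cycle(graph_from_trace(), "Thread 3") reports a cycle, while the second graph has none: Thread 2 simply waits until the global read lock is released.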
[20 Dec 2010 12:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/127297

3450 Jon Olav Hauglid	2010-12-20
      Bug #53322 deadlock with FLUSH TABLES WITH READ LOCK and DROP FUNCTION
      
      This deadlock could occur between two connections if one connection
      first locked the mysql.func table (using either FLUSH TABLES WITH 
      READ LOCK or LOCK TABLE mysql.func WRITE). If the second connection
      then tried to either CREATE or DROP an UDF function, a deadlock would
      occur when the first connection tried to use an UDF function.
      
      The reason for the deadlock was the way the THR_LOCK_udf rwlock was
      used in the UDF handling code. For CREATE or DROP FUNCTION (UDF),
      THR_LOCK_udf was write locked before mysql.func was locked and opened.
      This meant that another connection first locking mysql.func and later
      using an UDF function (and thus locking THR_LOCK_udf), could cause
      a deadlock.
      
      This patch fixes the problem by changing the CREATE FUNCTION (UDF)
      implementation to open mysql.func before locking THR_LOCK_udf. The
      DROP FUNCTION (UDF) implementation is changed so that THR_LOCK_udf
      is unlocked before opening mysql.func.
      
      Test case added to udf.test.
[10 Jan 2011 15:48] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/128326

3479 Jon Olav Hauglid	2011-01-10
      Bug #53322 deadlock with FLUSH TABLES WITH READ LOCK and DROP FUNCTION
      
      This deadlock could occur between two connections if one connection
      first locked the mysql.func table (using either FLUSH TABLES WITH 
      READ LOCK or LOCK TABLE mysql.func WRITE). If the second connection
      then tried to either CREATE or DROP an UDF function, a deadlock would
      occur when the first connection tried to use an UDF function.
      
      The reason for the deadlock was the way the THR_LOCK_udf rwlock was
      used in the UDF handling code. For CREATE or DROP FUNCTION (UDF),
      THR_LOCK_udf was write locked before mysql.func was locked and opened.
      This meant that another connection first locking mysql.func and later
      using an UDF function (and thus locking THR_LOCK_udf), could cause
      a deadlock.
      
      This patch fixes the problem by changing the CREATE and DROP FUNCTION
      (UDF) implementation to open mysql.func before locking THR_LOCK_udf.
      
      Test case added to udf.test.
[10 Jan 2011 16:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/128335

3479 Jon Olav Hauglid	2011-01-10
      Bug #53322 deadlock with FLUSH TABLES WITH READ LOCK and DROP FUNCTION
      
[10 Jan 2011 16:28] Bugs System
Pushed into mysql-trunk 5.6.2 (revid:jon.hauglid@oracle.com-20110110162745-631sf4i0orwehwfr) (version source revid:jon.hauglid@oracle.com-20110110162745-631sf4i0orwehwfr) (merge vers: 5.6.2) (pib:24)
[20 Jan 2011 19:34] Paul DuBois
Noted in 5.6.2 changelog.

If one connection locked the mysql.func table using either FLUSH
TABLES WITH READ LOCK or LOCK TABLE mysql.func WRITE and a second
connection tried to either create or drop a UDF function, a deadlock
occurred when the first connection tried to use a UDF function.