Bug #40552 Race condition around default_directories in load_defaults()
Submitted: 6 Nov 2008 16:10
Modified: 27 Mar 16:26
Reporter: Alexey Kopytov
Status: Closed
Category: C API
Severity: S3 (Non-critical)
Version: 5.0, 5.1, 6.0
OS: Any
Assigned to: Alexey Kopytov
Target Version: 5.1+
Tags: crash, widespread, sporadic, test failure, pushbuild
Triage: Triaged: D1 (Critical)

[6 Nov 2008 16:10] Alexey Kopytov
Description:
Found while analyzing sporadic sysbench crashes on Solaris, but it turned out to be a
general problem in the client library and load_defaults().

load_defaults() uses a global variable (default_directories) to store an array of options
file directories. This results in a race condition, since the client lib calls
load_defaults() for each mysql_real_connect(), and so there may be multiple concurrent
threads executing it. Memory referenced by default_directories may still be in use by
some thread when it has already been freed in free_defaults() by the thread that
allocated it.

This is likely an ancient bug; I could reproduce it on 5.0-bzr, 5.1-bzr and 6.0-bzr. In
debug builds of the client library, an assertion failure in mf_arr_appstr.c sometimes
occurs instead of a segmentation fault.

How to repeat:
Call mysql_real_connect() with high concurrency. Steps to reproduce with sysbench: 

1. Download sysbench: svn co https://sysbench.svn.sourceforge.net/svnroot/sysbench/

2. cd sysbench/trunk; ./autogen.sh; ./configure; make

3. Save the following test file as reconnect.lua:

--- cut ---
function event(thread_id)
   db_connect()
   db_disconnect()
end
--- cut ---

4. sysbench --test=reconnect.lua --num-threads=4 --max-requests=0 run

On a quad-core machine it crashes within a few seconds with the following stack trace:

#0  my_search_option_files (conf_file=0x7f856d183218 "my", argc=<value optimized out>,
argv=<value optimized out>, args_used=<value optimized out>, 
    func=0x7f856d043a60 <handle_default_option>, func_ctx=0x41aab0a0) at default.c:237
#1  0x00007f856d044b0f in load_defaults (conf_file=0x7f856d183218 "my",
groups=0x41aab130, argc=0x41aab14c, argv=0x41aab138) at default.c:442
#2  0x00007f856d0684d2 in mysql_read_default_options (options=0x2171d70,
filename=0x7f856d183218 "my", group=<value optimized out>) at client.c:1003
#3  0x00007f856d069ca0 in mysql_real_connect (mysql=0x21719e0, host=0x2136330
"localhost", user=0x21364f0 "sbtest", passwd=0x0, db=0x2136600 "sbtest", port=3306, 
    unix_socket=0x0, client_flag=65536) at client.c:1851
#4  0x0000000000410ad9 in mysql_drv_connect (sb_conn=<value optimized out>) at
drv_mysql.c:308
#5  0x00000000004096a1 in db_connect (drv=0x647020) at db_driver.c:270
#6  0x000000000040f59a in sb_lua_db_connect (L=0x214ea50) at script_lua.c:568
#7  0x0000000000414e78 in luaD_precall (L=0x214ea50, func=<value optimized out>,
nresults=0) at ldo.c:319
#8  0x0000000000423906 in luaV_execute (L=0x214ea50, nexeccalls=1) at lvm.c:587
#9  0x0000000000415975 in luaD_call (L=0x214ea50, func=0x21953f0, nResults=1) at
ldo.c:377
#10 0x0000000000414a87 in luaD_rawrunprotected (L=0x214ea50, f=0x412100 <f_call>,
ud=0x41aabfd0) at ldo.c:116
#11 0x0000000000414b05 in luaD_pcall (L=0x7f856d183218, func=0x41aab140, u=0x41aaafe0,
old_top=5312, ef=-72340172838076673) at ldo.c:461
#12 0x0000000000411e62 in lua_pcall (L=0x214ea50, nargs=1, nresults=1, errfunc=<value
optimized out>) at lapi.c:817
#13 0x000000000040db26 in sb_lua_op_execute_request (sb_req=<value optimized out>,
thread_id=<value optimized out>) at script_lua.c:278
#14 0x0000000000404c5d in runner_thread (arg=<value optimized out>) at sysbench.c:386
#15 0x00007f856c97c3ea in start_thread () from /lib/libpthread.so.0
#16 0x00007f856be3fc6d in clone () from /lib/libc.so.6
#17 0x0000000000000000 in ?? () 

Suggested fix:
Do not use any global variables pointing to thread-local memory in default.c.
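As a minimal sketch of this suggestion (not the actual default.c code), the directory list can live in a caller-owned structure that is passed explicitly, so no thread ever reads memory that another thread may free. All demo_* names and the directory values are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_DIRS 4
#define DEMO_MAX_THREADS 16

/* Hypothetical per-call context: each caller owns its own directory list. */
struct demo_dirs {
    const char *dirs[DEMO_MAX_DIRS + 1];   /* NULL-terminated */
};

/* Fill a caller-provided list instead of publishing it through a global. */
static void demo_init_dirs(struct demo_dirs *ctx, const char *home)
{
    size_t n = 0;

    assert(ctx != NULL);
    ctx->dirs[n++] = "/etc/";
    ctx->dirs[n++] = "/etc/mysql/";
    if (home != NULL)
        ctx->dirs[n++] = home;
    ctx->dirs[n] = NULL;
}

/* Walk the list that was passed in; no shared mutable state is touched. */
static size_t demo_count_dirs(const struct demo_dirs *ctx)
{
    size_t n = 0;

    while (ctx->dirs[n] != NULL)
        n++;
    return n;
}

/* Each worker builds and reads its own list on its own stack. */
static void *demo_worker(void *arg)
{
    size_t *result = arg;

    for (int i = 0; i < 10000; i++) {
        struct demo_dirs ctx;
        demo_init_dirs(&ctx, "~/");
        *result = demo_count_dirs(&ctx);
    }
    return NULL;
}

/* Run workers concurrently; return 1 if every worker saw three entries. */
int demo_run(int nthreads)
{
    pthread_t tids[DEMO_MAX_THREADS];
    size_t results[DEMO_MAX_THREADS] = {0};
    int started = 0, ok = 1;

    if (nthreads < 1 || nthreads > DEMO_MAX_THREADS)
        return 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, demo_worker, &results[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    for (int i = 0; i < nthreads; i++)
        if (results[i] != 3)
            ok = 0;
    return ok && started == nthreads;
}
```

Calling demo_run(4) from a main() mirrors the four-thread sysbench run above. Because the list never outlives the call that created it, there is nothing for a concurrent free to invalidate.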
[6 Nov 2008 18:23] Alexey Stroganov
I would add that there is a workaround of sorts that may significantly reduce the
probability of the race condition leading to a segfault.

Just add blank my.cnf files in the places where libmysqlclient looks for them, i.e.
/etc/my.cnf and ~/.my.cnf.
[13 Nov 2008 3:36] Vladislav Vaintroub
Removing the mysql_options() call from the sysbench code will fix the problem. I don't
know whether this is an acceptable workaround.
[13 Nov 2008 3:48] Vladislav Vaintroub
In the previous comment I meant, of course, the sysbench problem, not the race in the
client code. This works because options are re-read only in some cases (a non-default
config file or, as in the sysbench case, a non-default group).
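
The condition described above can be paraphrased in a few lines of C. This is only a sketch based on this comment, not a copy of client.c; the struct and function names are hypothetical, while MYSQL_READ_DEFAULT_FILE and MYSQL_READ_DEFAULT_GROUP are the mysql_options() options that select a non-default file or group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the connection options relevant here. */
struct demo_conn_options {
    const char *my_cnf_file;    /* set via MYSQL_READ_DEFAULT_FILE, else NULL */
    const char *my_cnf_group;   /* set via MYSQL_READ_DEFAULT_GROUP, else NULL */
};

/*
 * Returns 1 when a connect with these options would re-read the option
 * files (and so reach the racy code path), 0 when it would skip them.
 */
int demo_connect_reads_option_files(const struct demo_conn_options *opts)
{
    assert(opts != NULL);
    return opts->my_cnf_file != NULL || opts->my_cnf_group != NULL;
}
```

With both fields NULL the function returns 0, which corresponds to sysbench after the mysql_options() call is removed; with the group set to "sysbench" it returns 1.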

A patch against the current sysbench trunk could look like this:

===================================================================
--- drv_mysql.c	(revision 41)
+++ drv_mysql.c	(working copy)
@@ -280,10 +280,8 @@
     hosts_pos = SB_LIST_ITEM_NEXT(hosts_pos);
   host = SB_LIST_ENTRY(hosts_pos, value_t, listitem)->data;
   pthread_mutex_unlock(&hosts_mutex);
-  
-  mysql_options(con, MYSQL_READ_DEFAULT_GROUP, "sysbench");
-  DEBUG("mysql_options(%p, MYSQL_READ_DEFAULT_GROUP, \"sysbench\")", con);
 
+
   if (args.use_ssl)
   {
     ssl_key= "client-key.pem";
[27 Feb 10:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67809

2812 Alexey Kopytov	2009-02-27
      Fix for bug #40552: Race condition around default_directories  
                          in load_defaults() 
      
      load_defaults(), my_search_option_files() and 
      my_print_default_files()  utilized a global variable 
      containing  a pointer to thread local memory. This could lead 
      to race conditions when those functions were called with high 
      concurrency. 
      
      Fixed by changing the interface of the said functions to avoid 
      the necessity for using a global variable.
      
      Since we cannot change load_defaults() prototype for API
      compatibility reasons, it was renamed my_load_defaults().
      Now load_defaults() is a thread-unsafe wrapper around
      a thread-safe version, my_load_defaults().
      modified:
        include/my_sys.h
        mysys/default.c
        server-tools/instance-manager/instance_map.cc
        server-tools/instance-manager/options.cc
        server-tools/instance-manager/options.h
        sql-common/client.c
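
The wrapper arrangement described in this commit message can be sketched as follows. This is an illustration under assumptions, not the committed patch: the demo_* names, the simplified signatures and the malloc-based allocation are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Legacy global, kept only so the old entry point behaves as before. */
static const char **demo_default_directories = NULL;

/*
 * Re-entrant variant: the directory list is handed back through an
 * out-parameter, so the caller owns it and releases it with free().
 * Returns 0 on success, 1 on allocation failure.
 */
int demo_load_defaults_r(const char ***dirs_out)
{
    const char **dirs;

    assert(dirs_out != NULL);
    dirs = malloc(3 * sizeof(*dirs));
    if (dirs == NULL)
        return 1;
    dirs[0] = "/etc/";
    dirs[1] = "~/";
    dirs[2] = NULL;
    *dirs_out = dirs;
    return 0;
}

/*
 * Old prototype preserved for compatibility. It is still not thread-safe,
 * because it frees and overwrites the shared global on every call.
 */
int demo_load_defaults(void)
{
    free(demo_default_directories);
    demo_default_directories = NULL;
    return demo_load_defaults_r(&demo_default_directories);
}
```

Concurrent callers would use demo_load_defaults_r() with their own local pointer, while single-threaded legacy callers can keep calling demo_load_defaults() unchanged.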
[16 Mar 11:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/69272

2840 Alexey Kopytov	2009-03-16 [merge]
      Manual merge of patch for bug #40552 into the team tree.
      Replaced a call to load_defaults() in sql_plugin.cc with 
      its thread-safe version.
      modified:
        include/my_sys.h
        mysys/default.c
        server-tools/instance-manager/instance_map.cc
        server-tools/instance-manager/options.cc
        server-tools/instance-manager/options.h
        sql-common/client.c
        sql/sql_plugin.cc
[18 Mar 14:16] Bugs System
Pushed into 6.0.11-alpha (revid:joro@sun.com-20090318122208-1b5kvg6zeb4hxwp9) (version
source revid:joro@sun.com-20090317133112-41qn6aly7arljtlq) (merge vers: 6.0.11-alpha)
(pib:6)
[19 Mar 4:17] Paul DuBois
Noted in 6.0.11 changelog.

The load_defaults(), my_search_option_files() and
my_print_default_files() functions in the C client library were
subject to a race condition in multi-threaded operation.

Setting report to NDI pending push into 5.1.x.
[27 Mar 15:56] Bugs System
Pushed into 5.1.34 (revid:joro@sun.com-20090327143448-wuuuycetc562ty6o) (version source
revid:leonard@mysql.com-20090316090622-sr8lylqvsl1jrcnv) (merge vers: 5.1.34) (pib:6)
[27 Mar 16:26] Paul DuBois
Noted in 5.1.34 changelog.
[9 May 18:39] Bugs System
Pushed into 5.1.34-ndb-6.2.18 (revid:jonas@mysql.com-20090508185236-p9b3as7qyauybefl)
(version source revid:jonas@mysql.com-20090508185236-p9b3as7qyauybefl) (merge vers:
5.1.34-ndb-6.2.18) (pib:6)
[9 May 19:36] Bugs System
Pushed into 5.1.34-ndb-6.3.25 (revid:jonas@mysql.com-20090509063138-1u3q3v09wnn2txyt)
(version source revid:jonas@mysql.com-20090509063138-1u3q3v09wnn2txyt) (merge vers:
5.1.34-ndb-6.3.25) (pib:6)
[9 May 20:34] Bugs System
Pushed into 5.1.34-ndb-7.0.6 (revid:jonas@mysql.com-20090509154927-im9a7g846c6u1hzc)
(version source revid:jonas@mysql.com-20090509154927-im9a7g846c6u1hzc) (merge vers:
5.1.34-ndb-7.0.6) (pib:6)