Bug #40552 Race condition around default_directories in load_defaults()
Submitted: 6 Nov 2008 15:10
Modified: 27 Mar 2009 15:26
Reporter: Alexey Kopytov
Status: Closed
Category: MySQL Server: C API (client library)
Severity: S3 (Non-critical)
Version: 5.0, 5.1, 6.0
OS: Any
Assigned to: Alexey Kopytov
Tags: crash, pushbuild, sporadic, test failure, widespread
Triage: Triaged: D1 (Critical)

[6 Nov 2008 15:10] Alexey Kopytov
Found while analyzing sporadic sysbench crashes on Solaris, but it turned out to be a general problem in the client library and load_defaults().

load_defaults() uses a global variable (default_directories) to store an array of option file directories. This results in a race condition: the client library calls load_defaults() for each mysql_real_connect(), so multiple threads may be executing it concurrently. Memory referenced by default_directories may still be in use by one thread when it has already been freed in free_defaults() by the thread that allocated it.
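
The hazard described above can be replayed deterministically in a simplified model. The sketch below is not the code from default.c: every name in it (shared_dirs, call_ctx, call_begin and so on) is hypothetical, and two "callers" are stepped through one bad interleaving by hand instead of running as real threads, so nothing is actually read after being freed.

```c
#include <stdlib.h>

/* Simplified model of the hazard, NOT the actual default.c code.
   All names below are hypothetical. */

/* Shared by every caller, like default_directories in the report. */
static const char **shared_dirs = NULL;

/* Per-call state: each caller owns the array it allocates. */
struct call_ctx {
    const char **owned;
};

/* First half of a call: allocate a directory list and publish it
   through the shared global. */
static int call_begin(struct call_ctx *ctx)
{
    ctx->owned = calloc(3, sizeof *ctx->owned); /* 2 entries + NULL end */
    if (ctx->owned == NULL)
        return -1;
    ctx->owned[0] = "/etc/";
    ctx->owned[1] = "~/";
    shared_dirs = ctx->owned; /* unsynchronized write to shared state */
    return 0;
}

/* Nonzero if a read through the global would touch another call's array. */
static int call_reads_foreign_memory(const struct call_ctx *ctx)
{
    return shared_dirs != ctx->owned;
}

/* Second half of a call: release the caller's own array. */
static void call_end(struct call_ctx *ctx)
{
    free(ctx->owned);
    ctx->owned = NULL;
}

/* Replays one bad interleaving of callers A and B, step by step.
   Returns 1 if A ended up looking at B's array, 0 if not, -1 on
   allocation failure. */
int replay_bad_interleaving(void)
{
    struct call_ctx a, b;
    int foreign;

    if (call_begin(&a) != 0)
        return -1;
    if (call_begin(&b) != 0) {
        call_end(&a);
        return -1;
    }
    /* A now walks the list through the global, which B has overwritten. */
    foreign = call_reads_foreign_memory(&a);
    /* In the real race B frees its array here while A is still reading. */
    call_end(&b);
    shared_dirs = NULL; /* do not leave a dangling pointer in the model */
    call_end(&a);
    return foreign;
}
```

In this model replay_bad_interleaving() reports that caller A would be reading caller B's array. With real threads the free in B can land at any point during A's iteration, which would match the sporadic nature of the crashes.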

This is likely an ancient bug; I could reproduce it on 5.0-bzr, 5.1-bzr and 6.0-bzr. For debug builds of the client library, an assertion failure in mf_arr_appstr.c sometimes occurs instead of a segmentation fault.

How to repeat:
Call mysql_real_connect() with high concurrency. Steps to reproduce with sysbench: 

1. Download sysbench: svn co https://sysbench.svn.sourceforge.net/svnroot/sysbench/

2. cd sysbench/trunk; ./autogen.sh; ./configure; make

3. Save the following test file as reconnect.lua:

--- cut ---
function event(thread_id)
--- cut ---

4. sysbench --test=reconnect.lua --num-threads=4 --max-requests=0 run

On a quad-core machine it crashes within a few seconds with the following stack trace:

#0  my_search_option_files (conf_file=0x7f856d183218 "my", argc=<value optimized out>, argv=<value optimized out>, args_used=<value optimized out>, 
    func=0x7f856d043a60 <handle_default_option>, func_ctx=0x41aab0a0) at default.c:237
#1  0x00007f856d044b0f in load_defaults (conf_file=0x7f856d183218 "my", groups=0x41aab130, argc=0x41aab14c, argv=0x41aab138) at default.c:442
#2  0x00007f856d0684d2 in mysql_read_default_options (options=0x2171d70, filename=0x7f856d183218 "my", group=<value optimized out>) at client.c:1003
#3  0x00007f856d069ca0 in mysql_real_connect (mysql=0x21719e0, host=0x2136330 "localhost", user=0x21364f0 "sbtest", passwd=0x0, db=0x2136600 "sbtest", port=3306, 
    unix_socket=0x0, client_flag=65536) at client.c:1851
#4  0x0000000000410ad9 in mysql_drv_connect (sb_conn=<value optimized out>) at drv_mysql.c:308
#5  0x00000000004096a1 in db_connect (drv=0x647020) at db_driver.c:270
#6  0x000000000040f59a in sb_lua_db_connect (L=0x214ea50) at script_lua.c:568
#7  0x0000000000414e78 in luaD_precall (L=0x214ea50, func=<value optimized out>, nresults=0) at ldo.c:319
#8  0x0000000000423906 in luaV_execute (L=0x214ea50, nexeccalls=1) at lvm.c:587
#9  0x0000000000415975 in luaD_call (L=0x214ea50, func=0x21953f0, nResults=1) at ldo.c:377
#10 0x0000000000414a87 in luaD_rawrunprotected (L=0x214ea50, f=0x412100 <f_call>, ud=0x41aabfd0) at ldo.c:116
#11 0x0000000000414b05 in luaD_pcall (L=0x7f856d183218, func=0x41aab140, u=0x41aaafe0, old_top=5312, ef=-72340172838076673) at ldo.c:461
#12 0x0000000000411e62 in lua_pcall (L=0x214ea50, nargs=1, nresults=1, errfunc=<value optimized out>) at lapi.c:817
#13 0x000000000040db26 in sb_lua_op_execute_request (sb_req=<value optimized out>, thread_id=<value optimized out>) at script_lua.c:278
#14 0x0000000000404c5d in runner_thread (arg=<value optimized out>) at sysbench.c:386
#15 0x00007f856c97c3ea in start_thread () from /lib/libpthread.so.0
#16 0x00007f856be3fc6d in clone () from /lib/libc.so.6
#17 0x0000000000000000 in ?? () 

Suggested fix:
Do not use any global variables pointing to thread-local memory in default.c.
[6 Nov 2008 17:23] Alexey Stroganov
I would add that there is a kind of workaround that may significantly decrease the probability of the race condition leading to a segfault.

Just add blank my.cnf files in the places where libmysqlclient looks for them, i.e. /etc/my.cnf and ~/.my.cnf.
[13 Nov 2008 2:36] Vladislav Vaintroub
Removing the mysql_options() call from the sysbench code will fix the problem. I don't know whether this is an acceptable workaround.
[13 Nov 2008 2:48] Vladislav Vaintroub
The previous comment of course refers to the sysbench problem, not to the race in the client code. The workaround works because options are re-read only in some cases (a non-default config file or, as in the sysbench case, a non-default group).

The patch against current sysbench trunk could be like below.

--- drv_mysql.c	(revision 41)
+++ drv_mysql.c	(working copy)
@@ -280,10 +280,8 @@
     hosts_pos = SB_LIST_ITEM_NEXT(hosts_pos);
   host = SB_LIST_ENTRY(hosts_pos, value_t, listitem)->data;
-  mysql_options(con, MYSQL_READ_DEFAULT_GROUP, "sysbench");
-  DEBUG("mysql_options(%p, MYSQL_READ_DEFAULT_GROUP, \"sysbench\")", con);
   if (args.use_ssl)
     ssl_key= "client-key.pem";
[27 Feb 2009 9:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:


2812 Alexey Kopytov	2009-02-27
      Fix for bug #40552: Race condition around default_directories  
                          in load_defaults() 
      load_defaults(), my_search_option_files() and 
      my_print_default_files()  utilized a global variable 
      containing  a pointer to thread local memory. This could lead 
      to race conditions when those functions were called with high 
      concurrency.
      Fixed by changing the interface of the said functions to avoid 
      the necessity for using a global variable.
      Since we cannot change load_defaults() prototype for API
      compatibility reasons, it was renamed my_load_defaults().
      Now load_defaults() is a thread-unsafe wrapper around
      a thread-safe version, my_load_defaults().
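
The shape of this fix can be illustrated with a small self-contained sketch. It is not the committed patch: the function names (sketch_load_dirs_r, sketch_load_dirs, sketch_free_dirs) and the directory entries are hypothetical, and the real my_load_defaults() signature is not reproduced here. The point is only the pattern of moving shared state into a caller-owned out-parameter while keeping a legacy wrapper with the old shape.

```c
#include <stdlib.h>

/* Hypothetical names; a sketch of the interface change,
   not the actual MySQL patch. */

/* Thread-safe variant: the directory list is returned through an
   out-parameter, so every caller gets, and later frees, its own array. */
int sketch_load_dirs_r(const char ***dirs_out)
{
    const char **dirs = calloc(3, sizeof *dirs); /* 2 entries + NULL end */
    if (dirs == NULL)
        return -1;
    dirs[0] = "/etc/";
    dirs[1] = "~/";
    *dirs_out = dirs;
    return 0;
}

/* Releases an array obtained from sketch_load_dirs_r(). */
void sketch_free_dirs(const char **dirs)
{
    free(dirs);
}

/* Legacy entry point with the old shape. It is kept for compatibility
   and still parks the result in a global, so it stays thread-unsafe. */
static const char **sketch_global_dirs = NULL;

int sketch_load_dirs(void)
{
    sketch_free_dirs(sketch_global_dirs); /* free(NULL) is a no-op */
    sketch_global_dirs = NULL;
    return sketch_load_dirs_r(&sketch_global_dirs);
}

/* Read-only accessor for the legacy global, used for inspection. */
const char **sketch_legacy_dirs(void)
{
    return sketch_global_dirs;
}
```

Two callers of sketch_load_dirs_r() receive distinct arrays, so one caller freeing its list cannot invalidate the other's. Callers of the legacy sketch_load_dirs() still share one global and would need external serialization.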
[16 Mar 2009 10:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:


2840 Alexey Kopytov	2009-03-16 [merge]
      Manual merge of patch for bug #40552 into the team tree.
      Replaced a call to load_defaults() in sql_plugin.cc with 
      its thread-safe version.
[18 Mar 2009 13:16] Bugs System
Pushed into 6.0.11-alpha (revid:joro@sun.com-20090318122208-1b5kvg6zeb4hxwp9) (version source revid:joro@sun.com-20090317133112-41qn6aly7arljtlq) (merge vers: 6.0.11-alpha) (pib:6)
[19 Mar 2009 3:17] Paul Dubois
Noted in 6.0.11 changelog.

The load_defaults(), my_search_option_files() and
my_print_default_files() functions in the C client library were
subject to a race condition in multi-threaded operation.

Setting report to NDI pending push into 5.1.x.
[27 Mar 2009 14:56] Bugs System
Pushed into 5.1.34 (revid:joro@sun.com-20090327143448-wuuuycetc562ty6o) (version source revid:leonard@mysql.com-20090316090622-sr8lylqvsl1jrcnv) (merge vers: 5.1.34) (pib:6)
[27 Mar 2009 15:26] Paul Dubois
Noted in 5.1.34 changelog.
[9 May 2009 16:39] Bugs System
Pushed into 5.1.34-ndb-6.2.18 (revid:jonas@mysql.com-20090508185236-p9b3as7qyauybefl) (version source revid:jonas@mysql.com-20090508185236-p9b3as7qyauybefl) (merge vers: 5.1.34-ndb-6.2.18) (pib:6)
[9 May 2009 17:36] Bugs System
Pushed into 5.1.34-ndb-6.3.25 (revid:jonas@mysql.com-20090509063138-1u3q3v09wnn2txyt) (version source revid:jonas@mysql.com-20090509063138-1u3q3v09wnn2txyt) (merge vers: 5.1.34-ndb-6.3.25) (pib:6)
[9 May 2009 18:34] Bugs System
Pushed into 5.1.34-ndb-7.0.6 (revid:jonas@mysql.com-20090509154927-im9a7g846c6u1hzc) (version source revid:jonas@mysql.com-20090509154927-im9a7g846c6u1hzc) (merge vers: 5.1.34-ndb-7.0.6) (pib:6)