Bug #68195 The core dump in my_thread_name function
Submitted: 28 Jan 2013 2:03    Modified: 19 Aug 2013 2:43
Reporter: Nan Xiao
Status: Verified
Category: Connector / ODBC    Severity: S2 (Serious)
Version: 5.2.3    OS: Solaris
Assigned to: Bogdan Degtyariov
Tags: core, my_thread_name, ODBC

[28 Jan 2013 2:03] Nan Xiao
Description:
Hi, all:

	I use the newest Connector/ODBC (5.2.3) and MySQL (5.5.29), and a core dump occurs in some scenarios (I have not been able to find the exact steps to reproduce it).
	
	Below is the call stack:
	#0  0xfe931bbc in my_thread_name () at /data1/susie/tools/mysql-5.5.29/mysys/my_thr_init.c:454
	454       if (!tmp->name[0])
	(gdb) bt
	#0  0xfe931bbc in my_thread_name () at /data1/susie/tools/mysql-5.5.29/mysys/my_thr_init.c:454
	#1  0xfe931954 in my_thread_init () at /data1/susie/tools/mysql-5.5.29/mysys/my_thr_init.c:354
	#2  0xfe8e4d48 in mysql_thread_init () at /data1/susie/tools/mysql-5.5.29/libmysql/libmysql.c:235
	#3  0xfe8c143c in my_SQLAllocConnect (henv=0x648e58, phdbc=0x6beda0)
    	at /data1/susie/tools/mysql-connector-odbc-5.2.3-src/driver/handle.c:192
	#4  0xfe8c2db4 in SQLAllocHandle (HandleType=2, InputHandle=0x648e58, OutputHandlePtr=0x6beda0)
    	at /data1/susie/tools/mysql-connector-odbc-5.2.3-src/driver/handle.c:757
	#5  0xff18f51c in __connect_part_one (connection=0x6be840, 
    	driver_lib=0xfe27a880 "/data1/susie/tools/mysql-connector-odbc-5.2.3-src/lib/libmyodbc5a.so", driver_name=0x0, 
    	warnings=0xfe2790b8) at SQLConnect.c:1501
	#6  0xff191bd4 in SQLConnect (connection_handle=0x6be840, server_name=0xeb7e8 "ASG_HA_DB_4", name_length1=-3, 
    	user_name=0xeb828 "root", name_length2=-3, authentication=0xeb868 "aicent", name_length3=-3) at SQLConnect.c:3932
	#7  0x000a3bc8 in db_connect (pTransID=0xbea68 "ss7_db_thread", pDBServer=0xeb7e8, pdbc=0xfe27adac) at db_module.c:567
	#8  0x000a4554 in db_execute_for_query (pTransID=0xbea68 "ss7_db_thread", p_sqlreq=0xfe27ae28, p_sqlres=0xfe27aee0)
    	at db_module.c:806
	#9  0x0007e478 in select_record_from_db (pTransID=0xbea68 "ss7_db_thread", msg_type=QUEUE_MSG_TYPE_SS7_NORMAL, 
    	p_sql_statement=0xfe27aef8 "select * from asg_rep.asg_ss7_normal_msg_table where NextRetryTime<=1359110803 and ExpireTime>1359110803 and ActionState=0 and ASG_ID='A3' limit 20", p_col_info=0x6a8d38, col_num=30, p_sql_result=0xfe27aee0)
    	at sms_queue.c:943
	#10 0x00081328 in ss7_db_thread_process_get_retry_msg_from_db_event (p_thread_context=0x66c930) at sms_queue.c:2303
	#11 0x00080e34 in thread_ss7_process_db (arg=0x633540) at sms_queue.c:2168
	#12 0xff0c8a28 in _lwp_start () from /usr/lib/libc.so.1
	#13 0xff0c8a28 in _lwp_start () from /usr/lib/libc.so.1
	
	Below is the code of my_thread_name() function:
	{
  		char name_buff[100];
  		struct st_my_thread_var *tmp=my_thread_var;
  		if (!tmp->name[0])
  		{
    			my_thread_id id= my_thread_dbug_id();
    			sprintf(name_buff,"T@%lu", (ulong) id);
    			strmake(tmp->name,name_buff,THREAD_NAME_SIZE);
  		}
  		return tmp->name;
	}
	
	The core dump happens because tmp is NULL. So why is my_thread_var NULL?
	
	Below are the steps I used to build the Connector/ODBC driver:
	
	Build MySQL client:
	cmake -DCMAKE_INSTALL_PREFIX=/usr/local/asg_mysql -DCMAKE_BUILD_TYPE=Debug -DWITHOUT_SERVER=1 -DDISABLE_SHARED=1 -DWITH_PIC=1
	make
	make install
	
	Build MySQL Connector/ODBC:
	export MYSQL_DIR=/usr/local/asg_mysql
	cmake -G "Unix Makefiles" -DWITH_UNIXODBC=1 -DCMAKE_BUILD_TYPE=Debug -DANSI=1
	make
	make install
	
Best Regards
Nan Xiao
	
	

How to repeat:
The actual repeat steps can't be found.
[29 Jan 2013 11:19] Bogdan Degtyariov
my_thread_var is actually mapped to _my_thread_var():

#define my_thread_var (_my_thread_var())

and it calls pthread_getspecific()

I do not have any idea why it might return a NULL pointer.
It looks like this issue needs more specific details.
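
For illustration only (this is not the MySQL source): under POSIX, pthread_getspecific() returns NULL in any thread that has never stored a value under the key, even if another thread already has. The sketch below assumes a hypothetical key, demo_key, standing in for the one behind my_thread_var; all demo_* names are invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the thread key used by mysys. */
static pthread_key_t demo_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;

static void demo_make_key(void)
{
    pthread_key_create(&demo_key, NULL);
}

/* Thread body: records whether this thread sees a value under the key. */
static void *demo_probe(void *out)
{
    *(int *)out = (pthread_getspecific(demo_key) != NULL);
    return NULL;
}

/*
 * Stores a value in the calling thread only, then starts a second thread.
 * Returns 1 if the second thread saw NULL under the same key,
 * 0 if it saw a value, and -1 if the thread could not be started.
 */
int demo_new_thread_sees_null(void)
{
    static int value = 42;
    int child_has_value = -1;
    pthread_t t;

    pthread_once(&demo_key_once, demo_make_key);
    pthread_setspecific(demo_key, &value);

    /* The calling thread reads back what it stored. */
    assert(pthread_getspecific(demo_key) == &value);

    if (pthread_create(&t, NULL, demo_probe, &child_has_value) != 0)
        return -1;
    pthread_join(t, NULL);

    return child_has_value == 0;
}
```

A thread in this state would dereference NULL in code shaped like `tmp->name[0]`, which is the kind of failure shown in frame #0 of the stack.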
[30 Jan 2013 2:05] Nan Xiao
Hi, Bogdan:
    
    Thanks very much for your reply!
    
    Below is the MySQL code:
    my_bool my_thread_init(void)
    {
        ......
        mysql_mutex_unlock(&THR_LOCK_threads);
        tmp->init= 1;
    #ifndef DBUG_OFF
        /* Generate unique name for thread */
        (void) my_thread_name();
    #endif
    }
    I think the only possibility is that, before my_thread_name() is called, some interruption causes my_thread_end() to be called. Is that possible? Thanks very much!
 
 Best Regards
 Nan Xiao
[30 Jan 2013 7:05] Bogdan Degtyariov
Hi Nan,

Connector/ODBC calls the my_thread_end() function when the environment handle is deallocated. However, this has to be done explicitly by calling SQLFreeEnv() from your application code. There are no callback functions for thread attach/detach such as those on Windows, so no asynchronous calls are made at all.

The interruption you mentioned might happen if the client application shares environment and/or connection handles between two or more threads. Does your application do that?
[30 Jan 2013 7:35] Nan Xiao
Hi, Bogdan:
    
    Thanks very much for your reply!
    
    Our application uses the connection pooling feature of unixODBC.
    
    Our application allocates only one environment handle during initialization, and every thread allocates its own connection handle from that environment handle. Is this coding model right? Thanks in advance!
 
 Best Regards
 Nan Xiao
[31 Jan 2013 7:56] Bogdan Degtyariov
Nan,

Thanks for your reply. Your model is right. A crash with the connection pool looks possible to me for the following reasons:

Connector/ODBC does not call my_thread_init() explicitly because the thread initialization is done automatically upon connect.

However, with the connection pool an application thread receives a connection that is already open. As a result, my_thread_init() is not called for that thread and the thread key is not allocated. Naturally, this results in a crash, but only if the thread never opened an actual connection and only re-used pooled ones.

The solution to this problem is to check whether the thread was initialized before each query execution and call my_thread_init() if necessary.
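
A minimal sketch of such a check, assuming POSIX threads. It is not the actual driver fix: fake_thread_init() is a hypothetical stand-in for my_thread_init(), and guard_key, ensure_thread_ready() and guarded_execute() are names invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t guard_key;
static pthread_once_t guard_once = PTHREAD_ONCE_INIT;

/* The key's destructor (free) releases the per-thread state at thread exit. */
static void guard_make_key(void)
{
    pthread_key_create(&guard_key, free);
}

/* Hypothetical stand-in for my_thread_init(): allocates per-thread state.
   Returns 0 on success, non-zero on failure. */
static int fake_thread_init(void)
{
    void *state = calloc(1, 64);

    if (state == NULL)
        return 1;
    if (pthread_setspecific(guard_key, state) != 0) {
        free(state);
        return 1;
    }
    return 0;
}

/* Cheap check that can run at the top of every entry point.
   Initializes the calling thread only if it has no state yet. */
int ensure_thread_ready(void)
{
    pthread_once(&guard_once, guard_make_key);
    if (pthread_getspecific(guard_key) != NULL)
        return 0;                       /* already initialized */
    return fake_thread_init();
}

/* Hypothetical query entry point; returns 0 on success, -1 on failure. */
int guarded_execute(const char *sql)
{
    if (ensure_thread_ready() != 0)
        return -1;
    /* From here on the per-thread state is known to exist. */
    assert(pthread_getspecific(guard_key) != NULL);
    (void)sql;                          /* a real driver would run the query here */
    return 0;
}
```

With this shape, a thread that only ever receives pooled connections still gets its per-thread state on its first query, and repeated calls cost one pthread_getspecific() lookup.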
[31 Jan 2013 9:04] Nan Xiao
Hi, Bogdan:

	Thanks very much for your reply!
	
	But I am very sorry, I can't get a clear idea from your explanation. Below are my questions:
	
	(1) From the function call stack:
	#0  0xfe931bbc in my_thread_name () at /data1/susie/tools/mysql-5.5.29/mysys/my_thr_init.c:454
	454       if (!tmp->name[0])
	(gdb) bt
	#0  0xfe931bbc in my_thread_name () at /data1/susie/tools/mysql-5.5.29/mysys/my_thr_init.c:454
	#1  0xfe931954 in my_thread_init () at /data1/susie/tools/mysql-5.5.29/mysys/my_thr_init.c:354
	......
	
	The thread-specific data has already been allocated in the earlier part of the my_thread_init() function. So personally, I think an interruption occurs that corrupts the thread-specific data. Is this possible? I can't get a clear picture of this from your explanation.
	
	(2) Is it possible that I built a bad dynamic library, which causes the crash?
	
	Thanks very much in advance!

Best Regards
Nan Xiao
[7 Feb 2013 3:07] Bogdan Degtyariov
Hi Nan,

I am just saying that the driver crashes when it tries to execute a query and my_thread_init() has not been called for that particular application thread.

my_thread_init() should be called for any thread that uses MySQL functions. This is usually done by calling my_thread_init() or, implicitly, by connecting. Neither happens when using the connection pool because the connection is allocated and opened in another thread.
[16 Feb 2013 5:48] Nan Xiao
Hi, Bogdan:

	Thanks very much for your reply!
	
	From my understanding of your explanations, the root cause of this issue is the use of the unixODBC connection pool. Our multi-threaded application uses the unixODBC connection pool all the time, so I would expect the issue to be easy to reproduce. But this bug has occurred only once and is not easy to reproduce, so I can't understand why the reproduction rate is so low.

Best Regards
Nan Xiao
[18 Feb 2013 8:03] Nan Xiao
Hi, Bogdan:

	Very sorry for interrupting you again.

	Previously, your colleague told me that I can use thread timeouts to solve the read block issue (http://bugs.mysql.com/68196). But after reading the code, I think it may involve the NO_ALARM macro. I don't know how to use this feature on Solaris. Because I can't get an answer from your colleague, I hope you can give me a hand. Thanks very much in advance!

Best Regards
Nan Xiao
[18 Feb 2013 11:32] Bogdan Degtyariov
Hello Nan,

The reproduction rate for the problem is low because it needs special conditions to trigger. In most situations it does not happen.

Regarding your question to bug 68196:

I shall ask Sinisa to provide more detailed explanations of his idea with the thread timeout in Solaris.

Unfortunately, we do not have the MySQL bugs team anymore and people need to allocate additional time to process bug reports. So, we really appreciate your patience and understanding if delays occur.
[19 Feb 2013 1:17] Nan Xiao
Hi, Bogdan:

	Thanks very much for your reply! I really appreciate your patient explanations!

	Regarding this issue, I think it is a MySQL bug and that you will fix it in the future. Is that right?
	Regarding bug 68196, I will wait for your feedback.

	Thanks very much again! You are a very nice guy!

Best Regards
Nan Xiao
[20 Feb 2013 6:27] Nan Xiao
Hi, Bogdan:

    Very sorry for interrupting you again. Please forgive me.
   
    In my Solaris 10 environments, I find that the same build steps can produce 2 flavors of the MySQL ODBC driver (I don't know why):
   
    One is dynamically linked against the ltdl library and looks like this:
   
    root@192.168.23.236 # ldd /data/nan/mysql-connector-odbc-5.2.4-src/lib/libmyodbc5a.so
        libodbc.so.2 =>  /usr/lib/libodbc.so.2
        libthread.so.1 =>        /usr/lib/libthread.so.1
        libm.so.2 =>     /usr/lib/libm.so.2
        libodbcinst.so.2 =>      /usr/lib/libodbcinst.so.2
        libgcc_s.so.1 =>         /usr/lib/libgcc_s.so.1
        libltdl.so.7 =>  /usr/local/lib/libltdl.so.7
        libiconv.so.2 =>         /usr/local/lib/libiconv.so.2
        libc.so.1 =>     /usr/lib/libc.so.1
        /platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1
       
        The other is not dynamically linked against the ltdl library and looks like this:
       
        bash-3.00# ldd /data1/susie/mysql/tools/test_static_modify/mysql-connector-odbc-5.2.3-src/lib/libmyodbc5a.so
        libodbc.so.2 =>  /usr/local/lib/libodbc.so.2
        libthread.so.1 =>        /usr/lib/libthread.so.1
        libm.so.2 =>     /usr/lib/libm.so.2
        libodbcinst.so.2 =>      /usr/local/lib/libodbcinst.so.2
        libgcc_s.so.1 =>         /usr/local/lib/libgcc_s.so.1
        libiconv.so.2 =>         /usr/local/lib/libiconv.so.2
        libc.so.1 =>     /usr/lib/libc.so.1
        /platform/SUNW,UltraAX-i2/lib/libc_psr.so.1
       
        I find that unixODBC sometimes does not work well with the first MySQL ODBC driver flavor (dynamically linked against the ltdl library); for example, connecting to MySQL fails.
        But unixODBC always works well with the second MySQL ODBC driver flavor.
        
        So my questions are:
        (1) Why can the same build steps produce 2 flavors of the MySQL ODBC driver?
        (2) Does the ltdl library affect the driver's behavior?
       
        Thanks very much in advance!
 
 Best Regards
 Nan Xiao
[20 Feb 2013 10:53] Bogdan Degtyariov
hi Nan,

The driver can be linked against the shared library libltdl.so or the static library archive libltdl.a. When it is linked against the static library, the code from the library is embedded in the driver binary, so ldd does not show libltdl as a dependency.

Otherwise, ldd displays something similar to the output you had:

 libltdl.so.7 =>  /usr/local/lib/libltdl.so.7
[21 Feb 2013 6:55] Nan Xiao
Hi, Bogdan:

	Thanks very much for your patient replies over the past few days!
	
	I have summarized all the questions:
	
	(1) Regarding the core dump in the my_thread_name function, I think it is a MySQL bug and that you will fix it in the future. Is that right?
	(2) Regarding bug 68196, I will wait for your feedback.
	(3) In my Solaris 10 environments, I find the following:
	
	    When unixODBC and the MyODBC driver are dynamically linked with the ltdl library, the application sometimes fails to connect to MySQL.
	    But when unixODBC and the MyODBC driver aren't dynamically linked with the ltdl library, the application always runs well.
	    
	    I am not sure whether this is related to the ltdl library. I have sent the related logs to Nick (the unixODBC lead) and haven't received a response from him yet.
	    Does the way the ltdl library is linked affect the MyODBC driver?
	    
	Thanks very much in advance!

Best Regards
Nan Xiao
[28 Feb 2013 10:02] Bogdan Degtyariov
Hi Nan,

This bug has the status "Verified", which means that we consider it our bug and are going to fix it. However, the fix is planned at a different level than the ODBC driver. It is going to be in the libmysqlclient library, which the driver links statically.

As for libltdl, I do not know how it could affect the driver's ability to connect. In theory it should not create any problems. In any case, we are planning to stop using libltdl in future releases.
[6 Mar 2013 8:48] Nan Xiao
Hi, Bogdan:

    Very sorry for my late response!
    OK, I get your idea. Thanks very much for your patient reply! You can close the issue.

Best Regards
Nan Xiao
[7 Aug 2013 2:06] Nan Xiao
Hi, Bogdan:

	Sorry for interrupting you again.
	
	I find there is no fix for this issue in the recent releases. Because this issue occurs again in our production environment, could you help to provide a temporary patch for it?
	
	Thanks very much in advance!
	
Best Regards
Nan Xiao
[8 Aug 2013 10:38] Bogdan Degtyariov
Hi Nan,

Unfortunately, our connectors group does not own all the code that goes into the Connector/ODBC driver. A large part of the low-level client functionality, such as the transport protocol, encryption and thread storage, is linked in as the MySQL client library, which is part of the server distribution (a separate product, though related).

The server developers do not want to accept that patch only for the connectors' sake.
I will try to request another independent review, but I cannot guarantee anything...
Sorry that it causes you so much inconvenience, but that is all I can do at the moment.
[9 Aug 2013 3:11] Nan Xiao
Hi, Bogdan:

	Firstly, thanks very much for your kind help! You are a very nice engineer!
	
	To be honest, I can't understand the root cause of this issue thoroughly. As I understand it, it is a libmysqlclient bug, not a Connector/ODBC bug, right?

Best Regards
Nan Xiao
[9 Aug 2013 7:26] Bogdan Degtyariov
Hi Nan,

Thank you for your kind words.
Before discussing the problem itself, I want to explain how libmysqlclient is a part of MySQL Connector/ODBC. The libmysqlclient library is part of the MySQL Server distribution, but we use it for building our Connector/ODBC driver.
The code from libmysqlclient is statically linked and embedded into the ODBC driver library file, so for the end user they are tightly integrated with each other.

libmysqlclient exports a set of functions, which we call the MySQL C API.
The thread management issues in libmysqlclient occur because it was not made specifically for the needs of the ODBC driver. The C API assumes that whoever uses it has full control of the client application, including the creation and termination of threads. Connector/ODBC does not have such control. It does not even know when a new thread is created or stopped. Before a new thread can work with any MySQL C API functions, it needs to allocate the thread-specific storage. In most cases this is done automatically when the connection is established. However, with connection pooling the connection is established in one thread but later re-used in another thread, so the crash might occur. Also, when the first thread, which established the connection, finishes, its TLS is not deallocated, which causes memory leaks.

In order to do things right we need access to libmysqlclient internals from the upper level of the ODBC driver. This can be done by exporting some internal functions or by improving the automatic initialization and clean-up routine inside libmysqlclient.
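
To illustrate what automatic clean-up can look like in general, here is a sketch that uses the destructor argument of pthread_key_create(). This is an assumption about one possible mechanism, not the patch under discussion and not libmysqlclient code; all tls_* names are invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define TLS_MAX_WORKERS 16

static pthread_key_t tls_key;
static pthread_once_t tls_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t tls_lock = PTHREAD_MUTEX_INITIALIZER;
static int tls_freed;                   /* number of states released so far */

/* Runs automatically when a thread that stored a non-NULL value exits. */
static void tls_destroy(void *state)
{
    free(state);
    pthread_mutex_lock(&tls_lock);
    tls_freed++;
    pthread_mutex_unlock(&tls_lock);
}

static void tls_make_key(void)
{
    pthread_key_create(&tls_key, tls_destroy);
}

/* Worker: allocates per-thread state and exits without freeing it. */
static void *tls_worker(void *unused)
{
    (void)unused;
    pthread_once(&tls_once, tls_make_key);
    pthread_setspecific(tls_key, malloc(32));
    return NULL;
}

/* Starts n short-lived workers, waits for them, and returns the total
   number of per-thread states released by the destructor so far. */
int tls_run_workers(int n)
{
    pthread_t threads[TLS_MAX_WORKERS];
    int i, started = 0, freed;

    assert(n >= 0 && n <= TLS_MAX_WORKERS);
    for (i = 0; i < n; i++)
        if (pthread_create(&threads[started], NULL, tls_worker, NULL) == 0)
            started++;
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_lock(&tls_lock);
    freed = tls_freed;
    pthread_mutex_unlock(&tls_lock);
    return freed;
}
```

The workers never free their state themselves; the release happens at thread exit, without the layer above knowing when threads start or stop.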

We know what should be done from the programming point of view, but the formal procedure for making such changes is very slow.
[9 Aug 2013 9:23] Nan Xiao
Hi, Bogdan:

	Thanks very much for your reply!
	
	Below are my understandings:
	(1) libmysqlclient only needs to export some APIs; no other code needs to be modified, right?
	(2) The Connector/ODBC code needs to be modified to control libmysqlclient.
	
	If my understanding is right, could you tell me what modifications are needed in libmysqlclient and Connector/ODBC? I think I can build libmysqlclient myself and test it.
	
	Thanks very much in advance!

Best Regards
Nan Xiao
[16 Aug 2013 1:34] Nan Xiao
Hi, Bogdan:

    Any comments for my proposals? Thanks very much!

Best Regards
Nan Xiao
[16 Aug 2013 7:04] Bogdan Degtyariov
Nan,

Sorry for the late reply. Unfortunately, the patch is not 100% ready. I offered the concept to the developers, but I cannot afford to spend working time on developing and testing something without a guarantee of the result.

However, there is good news. My concept and partial patch are being reviewed and discussed again, which gives me hope for a good outcome. The procedure is not fast and many of the people involved in it are on vacation, so it will take some time before the final decision is made.
[16 Aug 2013 7:11] Nan Xiao
Hi, Bogdan:

    Thanks very much for your reply!
    If possible, I'd like to test your patch.

Best Regards
Nan Xiao
[16 Aug 2013 7:42] Bogdan Degtyariov
Nan,

As I said, the patch is not complete and is useless in its current state. About 8 hours of extra work are needed to bring it to an acceptable condition and do the preliminary smoke testing.

Publishing incomplete patches is a violation of company policy. So, we have to wait until the situation is resolved at the upper management level and I have permission to finish my work.
[19 Aug 2013 2:43] Nan Xiao
Hi, Bogdan:

    Thanks very much for your kind explanations!
    I understand the situation and will wait patiently for your patch.

Best Regards
Nan Xiao