Bug #48406 MyODBC crashes application after server gone away
Submitted: 29 Oct 2009 12:08 Modified: 10 Jun 2010 19:39
Reporter: Edwin van Putten Email Updates:
Status: Duplicate Impact on me:
None 
Category:Connector / ODBC Severity:S1 (Critical)
Version:5.1.5GA / 5.1.6 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: Citect, Intersolv, SCADA, SQL_DATABASE_NAME, SQLDisconnect, SQLGetInfoW

[29 Oct 2009 12:08] Edwin van Putten
Description:

Some background: calling application is a SCADA system logging to MySQL. After a period of inactivity, MyODBC loses its TCP/IP connection and subsequently crashes.

The reason for this crash is a stray pointer (dbc->database).

You run into this problem if you happen to call SQLGetInfoW(dbc,SQL_DATABASE_NAME,.....) on a broken connection ("server has gone away").

Before crashing you'll get a "failed to return current catalog" for the SQLGetInfoW call.
 

How to repeat:

The MySQL ODBC connector will always crash if you perform the following steps:

1. start your application which sets up a plain MyODBC connection and does some stuff (doesn't really matter what, as long as its connected)

2. make server go away (stop windows MySQL service, wait for TCP/IP timeout etc.)

3. (re)start server (start windows MySQL service)

4. calling application attempts to do something with the (now broken) connection, it happens to be a call to:

SQLGetInfoW(dbc,SQL_DATABASE_NAME, ....    (there might be other variants too that cause problems)

5. SQLGetInfoW returns error [HY000] [MySQL][ODBC 5.1 Driver]SQLGetInfo() failed to return current catalog. (0) 

6. Upon learning this, calling application decides to close the connection because of this reason (with the intention to reconnect later)

7. calling application proceeds to call SQLDisconnect()

*crash*
(at: connect.c, SQLDisconnect(): x_free(dbc->database); 

Suggested fix:

Fix how function 'reget_current_catalog' treats the dbc->database handle?

It seems to me that the first line from function 'reget_current_catalog' frees the handle but doesn't NULL the pointer?

my_free(dbc->database,MYF(0)); <-- this seems to cause problems and causes dbc->database to become a stray pointer?
[29 Oct 2009 13:24] Tonci Grgin
Hi Edwin and thanks for your report.

I believe that is the exact function of "Auto reconnect" option. Can you please try to test with that option enabled? If it fails, please attach *complete* test case.
[29 Oct 2009 17:03] Edwin van Putten
I just uploaded a file which contains a simple QnD test program I wrote to reproduce the problem...

You can connect to the DB by clicking a button.
After that, you restart the MySQL server.
Then you press the execute query button. You'll see the catalog error I was writing about.

Finally you proceed click the "disconnect" button and it crashes...

It needs a predefined DSN. 

P.S. I already had that AUTO_RECONNECT flag enabled in my DSN options.
[30 Oct 2009 13:57] Tonci Grgin
Edwin, I decided to take another approach and test with odbcte (default MS ODBC client):
  o Description: Make c/ODBC (32bit) system DSN with "Auto reconnect" option checked, connect, issue "SHOW TABLES", shut remote MySQL server down, issue SQLGetInfo(SQL_DATABASE_NAME), after error, issue SQLDisconnect

	Full Connect(Default)

	Env. Attr. SQL_ATTR_ODBC_VERSION set to SQL_OV_ODBC3

	Successfully connected to DSN '5-1-5-on-opensol'.
SQLExecDirect:	In:	hstmt = 0x00135B48, szSqlStr = "", cbSqlStr = -3	Return:	SQL_SUCCESS=0

Get Data All:
"Tables_in_test"
"PrepayMSISDNs"
"Repro"
...
"vbug44199"
"vbug47985"
136 rows fetched from 1 column.

SQLGetInfo:
	In:	ConnectionHandle = 0x00135970, InfoType = SQL_DATABASE_NAME=16, InfoValuePtr = 0x004A4450, BufferLength = 600, StringLengthPtr = 0x0049B3F8, Information Value Type = SQL_C_WCHAR=-8
	Return:	SQL_ERROR=-1
	Out:	*InfoValuePtr = <unmodified>, *StringLengthPtr = <unmodified>
	dbc:	szSqlState = "HY000", *pfNativeError = 0, *pcbErrorMsg = 70, *ColumnNumber = -1, *RowNumber = -1
										MessageText = "[MySQL][ODBC 5.1 Driver]SQLGetInfo() failed to return current catalog."

SQLDisconnect:
	In:	ConnectionHandle = 0x00135970
	Return:	SQL_SUCCESS=0

So, as you can see, nothing weird happened...
[30 Oct 2009 15:30] Edwin van Putten
After stepping into the SQLDisconnect() function

Attachment: debugger.png (image/png, text), 69.52 KiB.

[30 Oct 2009 15:44] Edwin van Putten
Unfortunately i don't have the ODBCte utility. Could you upload it for me?
Seems to be part of some visual studio version I don't have.

Ran the test application today on another machine: see the (debugger) screenshot I uploaded. The HEAP still gets corrupted in SQLDisconnect(). 

Yet it doesn't crash this time until I compile it with 'Debug settings'. 
With 'Release settings', it crashes when I add an extra 'Connect'-again step:

So now, it's like this:
- connect to database
- execute sqlgetinfow / queries
- restart remote server
- execute same queries (you'll now get the 'catalog' error)
- disconnect (returns OK but still corrupts heap)
- connect..
*crash* 

Regards,

-- 
Edwin
[30 Oct 2009 18:20] Tonci Grgin
Edwin, http://msdn.microsoft.com/en-us/library/ms712676(VS.85).aspx.

Now, there is no way of repeating this problem with proper ODBC client... Still I will make stand-alone test case next week to recheck. So far, my suspicion is there's something wrong in your compiler. I see all sorts of ODBC and ADO problems on Embarcadero tools so that would be nothing new.

Can you attach ODBC trace of the crash?
[30 Oct 2009 18:55] Edwin van Putten
Connect after disconnect of broken link, now with ODBC test

Attachment: odbcte32_crash.png (image/png, text), 32.19 KiB.

[30 Oct 2009 19:05] Edwin van Putten
I only created the test program after discovering problems with a commercial SCADA program (Citect, which we blamed first ;-)).

Until I noticed that my test program crashed too, executing the same ODBC functions. And I can crash ODBC test too as you can see on the screen shot, so did I convince you enough now? ;-)

Working on the ODBC trace...
[30 Oct 2009 19:15] Edwin van Putten
ODBCTE32.EXE ODBC trace log, note the absense of the fd8-be4 EXIT SQLDriverConnectW section (due to crash)

Attachment: odbc.log (application/octet-stream, text), 31.28 KiB.

[30 Oct 2009 19:21] Tonci Grgin
Edwin, who knows what's happening on your boxes... but I'll try figuring it out.

However, be aware that if I'm unable to reproduce the crash I have nothing to work on (in sense of fixing). The crash you're seeing should be easily reproducible, and, considering number of people using ODBC, reported at least 50 times in BugsDB already.
Have you checked MD sum? Did you built driver yourself?
[30 Oct 2009 19:53] Edwin van Putten
Hey, how are things going in Croatia?

About your previous post: as you've seen I can repeat it with the MS ODBC SDK TEST tool you suggested (see screenshot). 
So there are now 3 applications which can trigger this crash:
- CitectSCADA 
- ODBCTE32
- My little test program

About boxes/hardware configurations etc:
To me it's not a question of different computers, special configurations or any other magic going on. I've seen these crashes on at least 5 different machines now.

The only reason why not more people noticed it is the "special" sequence of events. In other words: corner case, but it's still a bug ;-)
And my compiler has nothing to do with it. I don't even use their database tools, I'm calling the ODBC functions myself.

Now, the call to SQLGetInfoW plays a major role in 'preparing' the crash by corrupting some memory (freeing some memory and not NULLing it). So the fix to the connector driver should be easy, right?

So if you never call that SQLGetInfoW(...DATABASE_NAME...) function with a broken link then everything is just fine!

But IF you do, then you crash either immediately on SQLDisconnect or the next SQLConnect after that  
[depending on the environment (debugger active or not), but the crash is *guaranteed*]

Then I went through the trouble of downloading the MyODBC connector source code and found a potential problem in the code for you (in one of the functions called by SQLGetInfoW) - see first post(s). 

So even if you could not practically reproduce the problem (which I don't believe BTW) you could refer to the source code at those points and come to a certain conclusion?
[30 Oct 2009 20:03] Tonci Grgin
Edwin, things are just fine here :-)

To repeat my trace from around 15:00:
	Full Connect(Default)

	Env. Attr. SQL_ATTR_ODBC_VERSION set to SQL_OV_ODBC3

	Successfully connected to DSN '5-1-5-on-opensol'.
SQLExecDirect:	In:	hstmt = 0x00135B48, szSqlStr = "", cbSqlStr = -3	Return:	SQL_SUCCESS=0

Get Data All:
"Tables_in_test"
"PrepayMSISDNs"
"Repro"
...
"vbug44199"
"vbug47985"
136 rows fetched from 1 column.

SQLGetInfo:
	In:	ConnectionHandle = 0x00135970, InfoType = SQL_DATABASE_NAME=16, InfoValuePtr =
0x004A4450, BufferLength = 600, StringLengthPtr = 0x0049B3F8, Information Value Type =
SQL_C_WCHAR=-8
	Return:	SQL_ERROR=-1
	Out:	*InfoValuePtr = <unmodified>, *StringLengthPtr = <unmodified>
	dbc:	szSqlState = "HY000", *pfNativeError = 0, *pcbErrorMsg = 70, *ColumnNumber = -1,
*RowNumber = -1
										MessageText = "[MySQL][ODBC 5.1 Driver]SQLGetInfo() failed to return current
catalog."

SQLDisconnect:
	In:	ConnectionHandle = 0x00135970
	Return:	SQL_SUCCESS=0

So, as you can see, nothing weird happened and you better believe this is *not* repeatable on my box. Further more, I specifically avoided using new 5.1.6 driver and tested 5.1.5GA (will have to check if it contains any tweaks though).

However, I am not dismissing this report and in case I'm unable to repeat the crash I'll ask colleagues to try.
[30 Oct 2009 20:11] Edwin van Putten
That test case is almost okay, you just need to do a Connect after your test.
Then you'll see it crash.
[30 Oct 2009 20:21] Tonci Grgin
Edwin, sure I did after you said I should :-) I even toggled start/stop server while issuing final connect, no crash.

As I promised I'll assign colleague if I'm not able to reproduce, I'm passing this to Lawrentiy. Since it's already Friday night, I doubt he'll be able to check sooner than next week.
[30 Oct 2009 20:58] Edwin van Putten
Okay. Thanks. 

BTW: I am also using the very same 5.1.5GA version of the connector.
[2 Nov 2009 13:36] Edwin van Putten
Adj. version
[10 Jun 2010 15:03] Edwin van Putten
Just tried v5.1.6 of the ODBC connector, the problem is still there, as it still uses the same C-connector version as 5.1.5 :

my_bool reget_current_catalog(DBC FAR *dbc)
{
<b>    my_free(dbc->database,MYF(0)); </b>  <--- why??? 
    if ( odbc_stmt(dbc, "select database()") )
    {
        return 1;
    }

I don't understand why it performs the my_free before executing the db statement with odbc_stmt.
When closing the connection, the host program crashes!
Because it tries to free the memory  dcb->database yet another time...
[10 Jun 2010 15:37] Lawrenty Novitsky
Edwin,
I think your problem is cared by the patch for Bug#46910.
As soon as it gets reviewed, it will be pushed to the trunk. it will go to the next release. Actually it's good that you reminded us of this bug. I was sure that patch had already been pushed.

I will mark your report as a duplicate. You can revert that if you disagree.
[10 Jun 2010 17:29] Edwin van Putten
See also #11588 (dated 2005!)

Anyway, the reason I opened this case again was a patch from your side; a diff file called "46910.diff" :

... which has something in the lines of:

=== modified file 'driver/utility.c'
--- driver/utility.c	2009-08-12 04:56:20 +0000
+++ driver/utility.c	2009-10-12 15:27:46 +0000
@@ -2053,6 +2053,7 @@
     my_free(dbc->database,MYF(0));
     if ( odbc_stmt(dbc, "select database()") )
     {
+        dbc->database= NULL;
         return 1;
     }

Doesn't solve the problem I guess. 
I'd expect the =NULL statement right after the "my_free" statement...  
Now the pointer only gets nulled after the select statement works out, but it will never will in our cases where the server is 'away'...

Another one who noticed this has ticket #11588. While we were talking about duplicates ;-)

-- 
Edwin
[10 Jun 2010 18:23] Edwin van Putten
Suggested fix: (Emmanuel KARTMANN)
Do not free memory twice. A simple fix would be to NULLify the pointer after free and
checking if it's not NULL before released again:

  my_bool reget_current_catalog(DBC FAR *dbc)
  {
      if (dbc->database)
      {
        my_free((gptr) dbc->database,MYF(0));
        dbc->database = NULL;
      }
      ...
[10 Jun 2010 19:39] Lawrenty Novitsky
Edwin,

Looks like you agree that bugs have the same reason. looks like you question the correctness of the patch. So why not to keep discussion in one place. And why shouldn't that place be where the patch is posted.

Thus i "re-duplicating" the report. And thank you for pointing to another one. same story - "can't repeat" and it gets out of our view. we have too many of those that we could repeat.

As for the correctness of the patch I'll reply on the Bug#46910 page - in that way reviewer will have more chances to see your and my arguments ;)