Bug #33384 | Crash in net_clear() when calling mysql_stmt_execute() | ||
---|---|---|---|
Submitted: | 19 Dec 2007 22:13 | Modified: | 16 Nov 2009 19:37 |
Reporter: | Olivier Langlois (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: C API (client library) | Severity: | S3 (Non-critical) |
Version: | 5.0.27 and 5.1.22-rc, 5.1.42 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | Contribution |
[19 Dec 2007 22:13]
Olivier Langlois
[3 Jan 2008 16:14]
Susanne Ebrecht
Many thanks for writing a bug report. Please add a test case for your issue. MySQL 5.0.27 is very old, please try newest version (5.0.51) here.
[3 Jan 2008 16:20]
Olivier Langlois
I know that 5.0.27 is old but the problem is also present in 5.1.22-rc and I would guess that is also present in the latest 5.0.x version. I have included a test case in the 'how to repeat' section of the original bug submission but here it is again: This happens if: I prepare statement #1 successfully Preparing statement #2 fails with: Error 2006 (HY000): MySQL server has gone away and I then call mysql_stmt_execute() on statement #1.
[4 Jan 2008 18:31]
Susanne Ebrecht
I can't repeat this behaviour. Neither with CLI nor with C API. I tested on version 5.1.22-rc and 5.0.51. The result is the same for both. Look here: mysql> create table t(id serial, num integer, t text, primary key(id)); Query OK, 0 rows affected (0.01 sec) mysql> insert into t(num, t) values(1,'a'); Query OK, 1 row affected (0.01 sec) mysql> prepare mystmt from 'insert into t(num,t) values(?,?)'; Query OK, 0 rows affected (0.00 sec) Statement prepared mysql> prepare stmt from 'select * from t where num=?'; Query OK, 0 rows affected (0.00 sec) Statement prepared mysql> set @nr=2; Query OK, 0 rows affected (0.00 sec) mysql> set @txt='abc'; Query OK, 0 rows affected (0.00 sec) mysql> set @n=2; Query OK, 0 rows affected (0.00 sec) mysql> execute mystmt using @nr, @txt; Query OK, 1 row affected (0.00 sec) mysql> execute stmt using @n; +----+------+------+ | id | num | t | +----+------+------+ | 2 | 2 | abc | +----+------+------+ 1 row in set (0.01 sec) I dropped the table and tried to use C API. I used the attached C programm. After executing the attached C programm: mysql> select * from t; +----+------+------+ | id | num | t | +----+------+------+ | 1 | 1 | abc | | 2 | 2 | def | | 3 | 2 | NULL | +----+------+------+ 3 rows in set (0.00 sec)
[4 Jan 2008 18:34]
Susanne Ebrecht
test file for testing C API
Attachment: bug33384.c (text/x-csrc), 5.87 KiB.
[4 Jan 2008 19:17]
Olivier Langlois
Hi Susanne, Here is what is missing in your test program: if (mysql_stmt_prepare(stmt2, ins2, strlen(ins2))){ fprintf(stderr, "Prepare 2nd statement: mysql_stmt_prepare(), INSERT failed\n"); fprintf(stderr, "%s\n", mysql_stmt_error(stmt2)); exit(1); } mysql_stmt_prepare for stmt2 must return: Error 2006 (HY000): MySQL server has gone away How you can simulate that behavior? I do not know, you are the MySQL expert but when this is going to happen, if you continue the processing for statement #1, libMySQL will core dump when /* Execute the INSERT statement */ if (mysql_stmt_execute(stmt)){ is called. As I said, the correct way to use the lib would be to free all sql handles as soon as mysql_stmt_prepare returns an error as you do in your test program but my point in this bug report was that the library should be more resilient by not core dumping when a user tries to use a handle that is not valid anymore. Greetings,
[7 Jan 2008 13:48]
Susanne Ebrecht
Olivier, unfortunately, I can't follow you. There is no server gone away, when I executed my test programm. So why should the programm return, the server is gone away, when it still works?
[7 Jan 2008 15:08]
Olivier Langlois
Susanne, You cannot make the server go away from the client code. This happens by an external cause. ie: - A broken TCP connection - The server drops connections - The server is resetted. In our production environment, this is a very rare occurence. We have servers connected 24 hours a day 7 days a week to a DB server and the 'server gone away' error is happenning aproximately once every 3 months on 1 client out of 50. Since this is an extremely rare condition that I could not figure out how to recreate, I went to inspect the library source code to find the bug by visual inspection. In the original post, I provide detailed description of what is going on, you can double check the source code visual inspection with my indications to have an undertstanding of the problem or if you insist on reproducing the problem, you must find a way to generate the 'server gone away' error at the right timing and I would assume that MySQL developpers know how to that. Greetings, Olivier Langlois
[11 Jan 2008 14:50]
Hartmut Holzgraefe
Hello Olivier, i tried to check the library code but found that the line numbers in the stack trace do not match the mysql version numbers you reported. i assume the crash is happening on the while((ready= net_data_is_ready(net->vio->sd)) > 0) line in net_clear(), but in any mysql source version (5.0.27, 5.1.22, 5.0 and 5.1 bitkeeper trees) the net_clear() function only starts around line 260 or later so the net.c:223 line information in the backtrace doesn't really make sense to me ... so i'd like to ask which mysql client library source version your application that created this core/backtrace was actually compiled against
[11 Jan 2008 16:05]
Hartmut Holzgraefe
The code path in cli_stmt_prepare() that calls net_clear() is itself only called if stmt->param_count is greater than zero. After mysql_stmt_init() the param_count is initialized to 0 and a failed mysql_stmt_prepare() does not change this. So i assume that you are reusing the MYSQL_STMT structures returned by mysql_stmt_init() (which is perfectly ok) and in some code path the param_count is not properly reset which then later makes cli_stmt_prepare() try to call net_clear() in an invalid context ... Can you confirm that this is the case?
[11 Jan 2008 18:17]
Olivier Langlois
Hi Hartmut, i assume the crash is happening on the while((ready= net_data_is_ready(net->vio->sd)) > 0) That is my understanding too. What is weird is that I have noticed the line number mismatch too in the core dump stack trace. I have not compiled myself the library so I do not undertstand this but I am calling mysql_get_client_info() and it returns 5.1.22-rc. However even with the line mismatch, the core dump can be understood from 5.1.22-rc source code.
[11 Jan 2008 18:34]
Olivier Langlois
The code path in cli_stmt_prepare() that calls net_clear() is itself only called if stmt->param_count is greater than zero. After mysql_stmt_init() the param_count is initialized to 0 and a failed mysql_stmt_prepare() does not change this. So i assume that you are reusing the MYSQL_STMT structures returned by mysql_stmt_init() (which is perfectly ok) and in some code path the param_count is not properly reset which then later makes cli_stmt_prepare() try to call net_clear() in an invalid context ... Can you confirm that this is the case? Hi Hartmut, That is not the problem. I'll try to rephrase the description of the bug differently to try to make you see the problem: I have 2 prepared statement handles: stmt1 and stmt2. I call mysql_stmt_init and mysql_stmt_prepare successfully on stmt1. I call mysql_stmt_init successfully on stmt2. At that point both statement handle share a pointer on the same MYSQL structure. If a function call on one of these statement handle modifies their MYSQL structure, the other statement handle will be affected, right? Now, I call mysql_stmt_prepare for stmt2 but because of a network error, it fails. In the implementation, the function end_server() is called and net->vio and net->buff are freed and set to NULL. Those pointers will be NULL when they are accessed from stmt1 and stmt2, right? You are going to have core dump after that if you call mysql_stmt_execute() with stmt1 because its vio pointer accessed at: while((ready= net_data_is_ready(net->vio->sd)) > 0) will be NULL and not validation will have been performed before using it. I hope things are clearer now. Again, I will repeat myself the core dump is avoidable by modifying the client code by making sure that all statement handles are closed at the first library function call failing but IMO, the library should perform some more validation on these pointers to not core dump.
[11 Jan 2008 22:27]
Hartmut Holzgraefe
self contained test project
Attachment: bug33384-0.1.tar.gz (application/x-gzip, text), 295.31 KiB.
[11 Jan 2008 22:30]
Hartmut Holzgraefe
Verified with 5.1.22 and the attached test project. How to reproduce: - ulimit -c unlimited - start server - ./configure --with-mysql=/path/to/mysql/installation - make test - when you see the "shut down server, then press [ENTER] to continue" message do what it asks for - see the program crash, the produced core dumps backtrace will look like this: (gdb) bt #0 0x400b49ab in net_clear (net=0xbfcf1854, clear_buffer=1 '\001') at net.c:294 #1 0x400661a7 in cli_stmt_execute (stmt=0x805a250) at libmysql.c:2553 #2 0x40066adb in mysql_stmt_execute (stmt=0x805a250) at libmysql.c:2900 #3 0x0804899e in main (argc=1, argv=0xbfcf1d44) at bug33384.c:84
[11 Jan 2008 22:32]
Hartmut Holzgraefe
The key to this is that the first statement does have "?" parameters while the 2nd doesn't ... without parameters on the first statements the code path that calls net_clear() is never executed
[27 Jul 2008 19:19]
Sveta Smirnova
Bug #38396 was marked as duplicate of this one.
[22 Jan 2009 19:58]
Jeremy Cole
Isn't this a duplicate of 31244? This one has more details, so maybe 31244 should be marked as a duplicate of this one.
[29 Jan 2009 2:56]
Jeremy Cole
Suggested patch for Bug 33384
Attachment: net_vio_bug33384.patch (application/octet-stream, text), 1.05 KiB.
[29 Jan 2009 2:57]
Jeremy Cole
Does the supplied patch solve this problem in a reasonable way? This simple bug has been open far too long.
[17 Mar 2009 6:39]
Trent Apted
Thanks for the patch. I encountered this in a fairly critical app for us. It basically does the following: - Connect to server at 4am and prepare SQL statements - At around midday, try and use each statement in turn (and all in the same thread) The first one returns the "server has gone away" error (because these things time out after 7 hours I think). The second one causes the segmentation fault. I applied the patch and it seemed to fix the problem. I've since changed the code to also bail out completely on the first error, and reconnect, in case this ends up running against an unpatched mysql client library.
[8 Jun 2009 22:54]
liz drachnik
Hello Olivier - In order for us to continue the process of reviewing your contribution to MySQL - We need you to review and sign the Sun|MySQL contributor agreement (the "SCA") The process is explained here: http://forge.mysql.com/wiki/Sun_Contributor_Agreement Getting a signed/approved SCA on file will help us facilitate your contribution-- this one, and others in the future. Thank you ! Liz Drachnik - Program Manager - MySQL
[15 Jun 2009 15:24]
Olivier Langlois
Hello Liz, You should address your comment to Jeremy Cole. He is the author of the suggested patch. I just analyzed/documented the problem.
[16 Jun 2009 0:14]
Davi Arnaut
This could be a duplicate of Bug#43560. Bug verification team, please re-verify.
[16 Jun 2009 1:41]
Liz Drachnik
Hi all - If its determined that this is not a duplicate bug and that you want to proceed with the submitted patch - the contributor of the patch (Jeremy Cole) needs to review and sign the Sun|MySQL contributor agreement (the "SCA") The process is explained here: http://forge.mysql.com/wiki/Sun_Contributor_Agreement Thanks in advance
[16 Nov 2009 19:37]
Sveta Smirnova
Still repeatable on Linux with today bzr sources, so not duplicate.
[16 Nov 2009 19:38]
Sveta Smirnova
Jeremy, please sign SCA if you didn't already. Thanks in advance.
[20 Aug 2014 7:23]
Alexander Petrossian
Bug is still here in MySQL-shared-compat-5.1.34-0.rhel5 (gdb) bt #0 0xf57c5018 in net_clear () from /usr/lib/libmysqlclient_r.so #1 0xf5793944 in cli_stmt_execute () from /usr/lib/libmysqlclient_r.so #2 0xf5794712 in mysql_stmt_execute () from /usr/lib/libmysqlclient_r.so (gdb) disass Dump of assembler code for function net_clear: 0xf57c4ff0 <net_clear+0>: push %ebp 0xf57c4ff1 <net_clear+1>: mov %esp,%ebp 0xf57c4ff3 <net_clear+3>: push %edi 0xf57c4ff4 <net_clear+4>: push %esi 0xf57c4ff5 <net_clear+5>: push %ebx 0xf57c4ff6 <net_clear+6>: sub $0x1c,%esp 0xf57c4ff9 <net_clear+9>: mov 0x8(%ebp),%esi 0xf57c4ffc <net_clear+12>: call 0xf578dec7 <geteuid+251> 0xf57c5001 <net_clear+17>: add $0x15154b,%ebx 0xf57c5007 <net_clear+23>: lea -0x14(%ebp),%edi 0xf57c500a <net_clear+26>: cmpb $0x0,0xc(%ebp) 0xf57c500e <net_clear+30>: je 0xf57c5060 <net_clear+112> 0xf57c5010 <net_clear+32>: mov (%esi),%eax 0xf57c5012 <net_clear+34>: movw $0x3,-0x10(%ebp) 0xf57c5018 <net_clear+40>: mov (%eax),%eax (gdb) f #0 0xf57c5018 in net_clear () from /usr/lib/libmysqlclient_r.so (gdb) p $eax