Bug #33384 Crash in net_clear() when calling mysql_stmt_execute()
Submitted: 19 Dec 2007 22:13 Modified: 16 Nov 2009 19:37
Reporter: Olivier Langlois (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: C API (client library) Severity:S3 (Non-critical)
Version:5.0.27 and 5.1.22-rc, 5.1.42 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution

[19 Dec 2007 22:13] Olivier Langlois
Description:
I get a core dump:

#0  net_clear (net=0xb7550f44, clear_buffer=1 '\001') at net.c:223
#1  0x080a4bb6 in cli_stmt_execute (stmt=0x9335150) at libmysql.c:2553
#2  0x080a75a5 in mysql_stmt_execute (stmt=0x9335150) at libmysql.c:2900

(gdb) p *stmt
$2 = {mem_root = {free = 0x9335410, used = 0x0, pre_alloc = 0x9335410, min_malloc = 32, block_size = 2020, block_num = 4, first_block_usage = 0,
    error_handler = 0}, list = {prev = 0x931ed68, next = 0x0, data = 0x9335150}, mysql = 0xb7550f44, params = 0x9335420, bind = 0x93356a0, fields = 0x0,
  result = {data = 0x0, embedded_info = 0x0, alloc = {free = 0x9337c60, used = 0x0, pre_alloc = 0x9337c60, min_malloc = 12, block_size = 4068,
      block_num = 4, first_block_usage = 0, error_handler = 0}, rows = 0, fields = 0, extension = 0x0}, data_cursor = 0x0,
  read_row_func = 0x80a5047 <stmt_read_row_no_result_set>, affected_rows = 1, insert_id = 0, stmt_id = 1, flags = 0, prefetch_rows = 1,
  server_status = 2, last_errno = 0, param_count = 10, field_count = 0, state = MYSQL_STMT_PREPARE_DONE, last_error = '\0' <repeats 511 times>,
  sqlstate = "00000", send_types_to_server = 1 '\001', bind_param_done = 1 '\001', bind_result_done = 0 '\0', unbuffered_fetch_cancelled = 0 '\0',
  update_max_length = 0 '\0', extension = 0x0}

(gdb) p *stmt->mysql
$3 = {net = {vio = 0x0, buff = 0x0, buff_end = 0x9327140 "", write_pos = 0x9325140 " \r2\txX´", read_pos = 0x9325144 "xX´", fd = 11, remain_in_buf = 0,
    length = 0, buf_length = 52, where_b = 0, max_packet = 8192, max_packet_size = 1073741824, pkt_nr = 0, compress_pkt_nr = 0,
    write_timeout = 31536000, read_timeout = 31536000, retry_count = 1, fcntl = 0, return_status = 0x0, reading_or_writing = 0 '\0', save_char = 0 '\0',
    no_send_ok = 0 '\0', no_send_eof = 0 '\0', compress = 1 '\001', no_send_error = 0 '\0', query_cache_query = 0x0, last_errno = 0, error = 2 '\002',
    report_error = 0 '\0', return_errno = 0 '\0', last_error = "\000ySQL server has gone away", '\0' <repeats 485 times>, sqlstate = "00000",
    extension = 0x0}, connector_fd = 0x0, host = 0x931e298 "Not relevent", user = 0x9324d18 "Not relevent", passwd = 0x931e2f0 "Not relevent",
  unix_socket = 0x0, server_version = 0x931e2b0 "5.0.27-standard-log", host_info = 0x931e280 "xx.xx.xx.xx via TCP/IP", info = 0x0,
  db = 0x931e300 "Not relevent", charset = 0x815f380, fields = 0x0, field_alloc = {free = 0x0, used = 0x0, pre_alloc = 0x0, min_malloc = 32,
    block_size = 8164, block_num = 4, first_block_usage = 0, error_handler = 0}, affected_rows = 18446744073709551615, insert_id = 0, extra_info = 0,
  thread_id = 134407515, packet_length = 0, port = 3306, client_flag = 41645, server_capabilities = 41516, protocol_version = 10, field_count = 0,
  server_status = 2, server_language = 8, warning_count = 0, options = {connect_timeout = 10, read_timeout = 0, write_timeout = 0, port = 0,
    protocol = 0, client_flag = 160, host = 0x0, user = 0x0, password = 0x0, unix_socket = 0x0, db = 0x0, init_commands = 0x0, my_cnf_file = 0x0,
    my_cnf_group = 0x0, charset_dir = 0x0, charset_name = 0x9320928 "latin1", ssl_key = 0x0, ssl_cert = 0x0, ssl_ca = 0x0, ssl_capath = 0x0,
    ssl_cipher = 0x0, shared_memory_base_name = 0x0, max_allowed_packet = 0, use_ssl = 0 '\0', compress = 1 '\001', named_pipe = 0 '\0',
    rpl_probe = 0 '\0', rpl_parse = 0 '\0', no_master_reads = 0 '\0', separate_thread = 0 '\0', methods_to_use = MYSQL_OPT_GUESS_CONNECTION,
    client_ip = 0x0, secure_auth = 0 '\0', report_data_truncation = 1 '\001', local_infile_init = 0, local_infile_read = 0, local_infile_end = 0,
    local_infile_error = 0, local_infile_userdata = 0x0, extension = 0x0}, status = MYSQL_STATUS_READY, free_me = 0 '\0', reconnect = 0 '\0',
  scramble = "+|Q^hlgBPMog:+fD_0qj", rpl_pivot = 1 '\001', master = 0xb7550f44, next_slave = 0xb7550f44, last_used_slave = 0x0,
  last_used_con = 0xb7550f44, stmts = 0x931ed68, methods = 0x8271320, thd = 0x0, unbuffered_fetch_owner = 0x0, info_buffer = 0x0, extension = 0x0}

This happens if:

I prepare statement #1 successfully
Preparing statement #2 fails with:

Error 2006 (HY000): MySQL server has gone away

In that case the function end_server() is called and net->vio and net->buff are freed and set to NULL.

and I then call mysql_stmt_execute() on statement #1.

This is a bug in my client code because I should close all open prepared statement handles as soon as 
mysql_stmt_prepare() fails but I think that libmysql should be more resilient in net_clear() and/or mysql_stmt_execute() by validating that the vio pointer is valid before using it.

Greetings,
Olivier Langlois
http://www.olivierlanglois.net
http://blog.olivierlanglois.net

How to repeat:
This happens if:

I prepare statement #1 successfully
Preparing statement #2 fails with:

Error 2006 (HY000): MySQL server has gone away

In that case the function end_server() is called and net->vio and net->buff are freed and set to NULL.

and I then call mysql_stmt_execute() on statement #1.

This is a bug in my client code because I should close all open prepared statement handles as soon as 
mysql_stmt_prepare() fails but I think that libmysql should be more resilient in net_clear() and/or mysql_stmt_execute() by validating that the vio pointer is valid before using it.

Greetings,
Olivier Langlois
http://www.olivierlanglois.net
http://blog.olivierlanglois.net

Suggested fix:
Validate vio pointer before using it in net_clear() and/or mysql_stmt_execute().
[3 Jan 2008 16:14] Susanne Ebrecht
Many thanks for writing a bug report.

Please add a test case for your issue.

MySQL 5.0.27 is very old, please try newest version (5.0.51) here.
[3 Jan 2008 16:20] Olivier Langlois
I know that 5.0.27 is old but the problem is also present in 5.1.22-rc and I would guess that is also present in the latest 5.0.x version. I have included a test case in the 'how to repeat' section of the original bug submission but here it is again:

This happens if:

I prepare statement #1 successfully
Preparing statement #2 fails with:
Error 2006 (HY000): MySQL server has gone away

and I then call mysql_stmt_execute() on statement #1.
[4 Jan 2008 18:31] Susanne Ebrecht
I can't repeat this behaviour. Neither with CLI nor with C API.

I tested on version 5.1.22-rc and 5.0.51. The result is the same for both.

Look here:
mysql> create table t(id serial, num integer, t text, primary key(id));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t(num, t) values(1,'a');
Query OK, 1 row affected (0.01 sec)

mysql> prepare mystmt from 'insert into t(num,t) values(?,?)';
Query OK, 0 rows affected (0.00 sec)
Statement prepared

mysql> prepare stmt from 'select * from t where num=?';
Query OK, 0 rows affected (0.00 sec)
Statement prepared

mysql> set @nr=2;
Query OK, 0 rows affected (0.00 sec)

mysql> set @txt='abc';
Query OK, 0 rows affected (0.00 sec)

mysql> set @n=2;
Query OK, 0 rows affected (0.00 sec)

mysql> execute mystmt using @nr, @txt;
Query OK, 1 row affected (0.00 sec)

mysql> execute stmt using @n;
+----+------+------+
| id | num  | t    |
+----+------+------+
|  2 |    2 | abc  | 
+----+------+------+
1 row in set (0.01 sec)

I dropped the table and tried to use C API. I used the attached C programm.

After executing the attached C programm:
mysql> select * from t;
+----+------+------+
| id | num  | t    |
+----+------+------+
|  1 |    1 | abc  | 
|  2 |    2 | def  | 
|  3 |    2 | NULL | 
+----+------+------+
3 rows in set (0.00 sec)
[4 Jan 2008 18:34] Susanne Ebrecht
test file for testing C API

Attachment: bug33384.c (text/x-csrc), 5.87 KiB.

[4 Jan 2008 19:17] Olivier Langlois
Hi Susanne,

Here is what is missing in your test program:

  if (mysql_stmt_prepare(stmt2, ins2, strlen(ins2))){
    fprintf(stderr, "Prepare 2nd statement: mysql_stmt_prepare(), INSERT failed\n");
    fprintf(stderr, "%s\n", mysql_stmt_error(stmt2));
    exit(1);
  }

mysql_stmt_prepare for stmt2 must return:

Error 2006 (HY000): MySQL server has gone away

How you can simulate that behavior? I do not know, you are the MySQL expert but when this is going to happen, if you continue the processing for statement #1,

libMySQL will core dump when

  /* Execute the INSERT statement */
  if (mysql_stmt_execute(stmt)){

is called.

As I said, the correct way to use the lib would be to free all sql handles as soon as mysql_stmt_prepare returns an error as you do in your test program but my point in this bug report was that the library should be more resilient by not core dumping when a user tries to use a handle that is not valid anymore.

Greetings,
[7 Jan 2008 13:48] Susanne Ebrecht
Olivier,

unfortunately, I can't follow you. There is no server gone away, when I executed my test programm. So why should the programm return, the server is gone away, when it still works?
[7 Jan 2008 15:08] Olivier Langlois
Susanne,

You cannot make the server go away from the client code. This happens by an external cause. ie:

- A broken TCP connection
- The server drops connections
- The server is resetted.

In our production environment, this is a very rare occurence. We have servers connected 24 hours a day 7 days a week to a DB server and the 'server gone away' error is happenning aproximately once every 3 months on 1 client out of 50.

Since this is an extremely rare condition that I could not figure out how to recreate, I went to inspect the library source code to find the bug by visual inspection.

In the original post, I provide detailed description of what is going on, you can double check the source code visual inspection with my indications to have an undertstanding of the problem or if you insist on reproducing the problem, you must find a way to generate the 'server gone away' error at the right timing and I would assume that MySQL developpers know how to that.

Greetings,
Olivier Langlois
[11 Jan 2008 14:50] Hartmut Holzgraefe
Hello Olivier, 

i tried to check the library code but found that the line
numbers in the stack trace do not match the mysql version
numbers you reported.

i assume the crash is happening on the 

    while((ready= net_data_is_ready(net->vio->sd)) > 0)

line in net_clear(), but in any mysql source version 
(5.0.27, 5.1.22, 5.0 and 5.1 bitkeeper trees) the
net_clear() function only starts around line 260 or
later so the net.c:223 line information in the backtrace
doesn't really make sense to me ... so i'd like to ask
which mysql client library source version your application
that created this core/backtrace was actually compiled against
[11 Jan 2008 16:05] Hartmut Holzgraefe
The code path in cli_stmt_prepare() that calls net_clear() is itself only called if stmt->param_count is greater than zero. After mysql_stmt_init() the param_count is initialized to 0 and a failed mysql_stmt_prepare() does not change this.

So i assume that you are reusing the MYSQL_STMT structures returned by mysql_stmt_init() (which is perfectly ok) and in some code path the param_count is not properly reset which then later makes cli_stmt_prepare() try to call net_clear() in an invalid context ...

Can you confirm that this is the case?
[11 Jan 2008 18:17] Olivier Langlois
Hi Hartmut,

i assume the crash is happening on the

    while((ready= net_data_is_ready(net->vio->sd)) > 0)

That is my understanding too.

What is weird is that I have noticed the line number mismatch too in the core dump stack trace. I have not compiled myself the library so I do not undertstand this but I am calling mysql_get_client_info() and it returns 5.1.22-rc.

However even with the line mismatch, the core dump can be understood from 5.1.22-rc source code.
[11 Jan 2008 18:34] Olivier Langlois
The code path in cli_stmt_prepare() that calls net_clear() is itself only called if stmt->param_count is greater than zero. After
mysql_stmt_init() the param_count is initialized to 0 and a failed
mysql_stmt_prepare() does not change this.

So i assume that you are reusing the MYSQL_STMT structures returned by
mysql_stmt_init() (which is perfectly ok) and in some code path the param_count is not properly reset which then later makes
cli_stmt_prepare() try to call net_clear() in an invalid context ...

Can you confirm that this is the case?

Hi Hartmut,

That is not the problem. I'll try to rephrase the description of the bug differently to try to make you see the problem:

I have 2 prepared statement handles: stmt1 and stmt2.

I call mysql_stmt_init and mysql_stmt_prepare successfully on stmt1.

I call mysql_stmt_init successfully on stmt2. At that point both statement handle share a pointer on the same MYSQL structure. If a function call on one of these statement handle modifies their MYSQL structure, the other statement handle will be affected, right?

Now, I call mysql_stmt_prepare for stmt2 but because of a network error, it fails. In the implementation, the function end_server() is called and net->vio and net->buff are freed and set to NULL. Those pointers will be NULL when they are accessed from stmt1 and stmt2, right?

You are going to have core dump after that if you call mysql_stmt_execute() with stmt1 because its vio pointer accessed at:

    while((ready= net_data_is_ready(net->vio->sd)) > 0)

will be NULL and not validation will have been performed before using it.

I hope things are clearer now. Again, I will repeat myself the core dump is avoidable by modifying the client code by making sure that all statement handles are closed at the first library function call failing but IMO,

the library should perform some more validation on these pointers to not core dump.
[11 Jan 2008 22:27] Hartmut Holzgraefe
self contained test project

Attachment: bug33384-0.1.tar.gz (application/x-gzip, text), 295.31 KiB.

[11 Jan 2008 22:30] Hartmut Holzgraefe
Verified with 5.1.22 and the attached test project.

How to reproduce:

- ulimit -c unlimited
- start server
- ./configure --with-mysql=/path/to/mysql/installation
- make test
- when you see the "shut down server, then press [ENTER] to continue"
  message do what it asks for
- see the program crash, the produced core dumps
  backtrace will look like this:

(gdb) bt
#0  0x400b49ab in net_clear (net=0xbfcf1854, clear_buffer=1 '\001') at net.c:294
#1  0x400661a7 in cli_stmt_execute (stmt=0x805a250) at libmysql.c:2553
#2  0x40066adb in mysql_stmt_execute (stmt=0x805a250) at libmysql.c:2900
#3  0x0804899e in main (argc=1, argv=0xbfcf1d44) at bug33384.c:84
[11 Jan 2008 22:32] Hartmut Holzgraefe
The key to this is that the first statement does have "?" parameters while the 2nd doesn't ... without parameters on the first statements the code path that calls net_clear() is never executed
[27 Jul 2008 19:19] Sveta Smirnova
Bug #38396 was marked as duplicate of this one.
[22 Jan 2009 19:58] Jeremy Cole
Isn't this a duplicate of 31244?  This one has more details, so maybe 31244 should be marked as a duplicate of this one.
[29 Jan 2009 2:56] Jeremy Cole
Suggested patch for Bug 33384

Attachment: net_vio_bug33384.patch (application/octet-stream, text), 1.05 KiB.

[29 Jan 2009 2:57] Jeremy Cole
Does the supplied patch solve this problem in a reasonable way?  This simple bug has been open far too long.
[17 Mar 2009 6:39] Trent Apted
Thanks for the patch.

I encountered this in a fairly critical app for us. It basically does the following:

 - Connect to server at 4am and prepare SQL statements
 - At around midday, try and use each statement in turn (and all in the same thread)

The first one returns the "server has gone away" error (because these things time out after 7 hours I think). The second one causes the segmentation fault.

I applied the patch and it seemed to fix the problem.

I've since changed the code to also bail out completely on the first error, and reconnect, in case this ends up running against an unpatched mysql client library.
[8 Jun 2009 22:54] liz drachnik
Hello Olivier - 

In order for us to continue the process of reviewing your contribution to MySQL - We need you to review and sign the Sun|MySQL contributor agreement (the "SCA")

The process is explained here: 
http://forge.mysql.com/wiki/Sun_Contributor_Agreement

Getting a signed/approved SCA on file will help us facilitate your contribution-- this one, and others in the future.

Thank you ! 

Liz Drachnik  - Program Manager - MySQL
[15 Jun 2009 15:24] Olivier Langlois
Hello Liz,

You should address your comment to Jeremy Cole. He is the author of the suggested patch. I just analyzed/documented the problem.
[16 Jun 2009 0:14] Davi Arnaut
This could be a duplicate of Bug#43560. Bug verification team, please re-verify.
[16 Jun 2009 1:41] Liz Drachnik
Hi all - If its determined that this is not a duplicate bug and that you want to proceed with the submitted patch - the contributor of the patch (Jeremy Cole)  needs to  review and sign the Sun|MySQL contributor agreement (the "SCA")

The process is explained here: 
http://forge.mysql.com/wiki/Sun_Contributor_Agreement

Thanks in advance
[16 Nov 2009 19:37] Sveta Smirnova
Still repeatable on Linux with today bzr sources, so not duplicate.
[16 Nov 2009 19:38] Sveta Smirnova
Jeremy,

please sign SCA if you didn't already. Thanks in advance.
[20 Aug 2014 7:23] Alexander Petrossian
Bug is still here in
MySQL-shared-compat-5.1.34-0.rhel5

(gdb) bt
#0  0xf57c5018 in net_clear () from /usr/lib/libmysqlclient_r.so
#1  0xf5793944 in cli_stmt_execute () from /usr/lib/libmysqlclient_r.so
#2  0xf5794712 in mysql_stmt_execute () from /usr/lib/libmysqlclient_r.so

(gdb) disass
Dump of assembler code for function net_clear:
0xf57c4ff0 <net_clear+0>:	push   %ebp
0xf57c4ff1 <net_clear+1>:	mov    %esp,%ebp
0xf57c4ff3 <net_clear+3>:	push   %edi
0xf57c4ff4 <net_clear+4>:	push   %esi
0xf57c4ff5 <net_clear+5>:	push   %ebx
0xf57c4ff6 <net_clear+6>:	sub    $0x1c,%esp
0xf57c4ff9 <net_clear+9>:	mov    0x8(%ebp),%esi
0xf57c4ffc <net_clear+12>:	call   0xf578dec7 <geteuid+251>
0xf57c5001 <net_clear+17>:	add    $0x15154b,%ebx
0xf57c5007 <net_clear+23>:	lea    -0x14(%ebp),%edi
0xf57c500a <net_clear+26>:	cmpb   $0x0,0xc(%ebp)
0xf57c500e <net_clear+30>:	je     0xf57c5060 <net_clear+112>
0xf57c5010 <net_clear+32>:	mov    (%esi),%eax
0xf57c5012 <net_clear+34>:	movw   $0x3,-0x10(%ebp)
0xf57c5018 <net_clear+40>:	mov    (%eax),%eax

(gdb) f
#0  0xf57c5018 in net_clear () from /usr/lib/libmysqlclient_r.so
(gdb) p $eax