Bug #17107 SIGSEGV on DELETE with left join in prepared statement
Submitted: 3 Feb 2006 21:58 Modified: 11 Sep 2006 17:01
Reporter: Joel Dice Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.0.22 OS:Linux (Debian Etch (testing))
Assigned to: CPU Architecture:Any

[3 Feb 2006 21:58] Joel Dice
Description:
We've been seeing random mysqld crashes when executing a DELETE of the following form:

  delete from t1 using t1 left join t2 on t1.id = t2.ref where t1.id = ? and t2.ref is null;

The expected behavior is that it will delete any rows in t1 such that no rows exist in t2 where t1.id = t2.ref and t1.id = ?.  Usually this works, but sooner or later it causes mysqld to crash with a SIGSEGV (see attached stack trace and valgrind log) when multiple simultaneous clients are involved.

We're running a custom Java application using the latest stable Connector/J driver.  We're using our own connection pooling system, and we're using prepared statements exclusively.  We've seen this behavior using both the current Debian Etch package (5.0.16 + patches), and a stock distribution (5.0.18) built from source.  I can provide additional detail if needed (configure flags, etc.).

How to repeat:
Unfortunately, I've been unable to isolate this as a simple test case, as it seems to be a subtle race condition.  I'll continue to try, but I wanted to enter this as a bug before I went any further.

Suggested fix:
None yet.
[3 Feb 2006 22:01] Joel Dice
gdb session with backtrace

Attachment: gdb-trace.txt (text/plain), 4.42 KiB.

[3 Feb 2006 22:05] Joel Dice
valgrind log

Attachment: valgrind-log.txt (text/plain), 23.48 KiB.

[5 Feb 2006 11:21] Valeriy Kravchuk
Thank you for a problem report. Are that tables created in 5.0.x or you created them in any version before 5.0.x and just upgraded? SHOW CREATE TABLE and SHOW TABLE STATUS results for that tables may be also useful.
[6 Feb 2006 14:10] Joel Dice
Thanks for your response, Valeriy.  In both cases (Debian pkg 5.0.16 and stock 5.0.18), the tables were created from scratch on a fresh installation of 5.0.x (i.e. no upgrade involved).  I will attach the SHOW CREATE TABLE and SHOW TABLE STATUS information as you requested.  When examining that information, please note that the actual query which crashes mysqld is this:

  delete from server_messages using server_messages left join messages_and_servers on server_messages.id = messages_and_servers.message where server_messages.id = ? and messages_and_servers.message is null;
[6 Feb 2006 14:11] Joel Dice
show create tables and show table status info

Attachment: tables.txt (text/plain), 19.76 KiB.

[6 Feb 2006 15:06] Joel Dice
FYI, I've been testing the following query this morning and have not seen any crashes:

  delete from server_messages where id = ? and not exists (select message from messages_and_servers where message = server_messages.id);

This does the same thing as the other query, but uses a subquery instead of USING and LEFT JOIN.  This is an effective workaround for anyone else who may be running in to this problem.  Since a workaround exists, I'll downgrade the severity of the bug.
[10 Feb 2006 15:07] Valeriy Kravchuk
Sorry, but that "messages_and_servers.message is null" condition seems redundant (always FALSE) for your table created as:

mysql> CREATE TABLE `messages_and_servers` (
    ->   `message` bigint(20) NOT NULL,
    ->   `server` int(11) NOT NULL,
    ->   KEY `message` (`message`),
    ->   KEY `server` (`server`),
    ->   CONSTRAINT `messages_and_servers_ibfk_1` FOREIGN KEY (`message`) REFERENCES `server_messages` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
    ->   CONSTRAINT `messages_and_servers_ibfk_2` FOREIGN KEY (`server`) REFERENCES `servers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.02 sec)

Anyway, I was not able to repeat with a single session, so, as you already noted, several concurrent sessions may be needed. Can you give some more suggestions on how to create a repeatable test case?
[10 Feb 2006 15:29] Joel Dice
Actually, the query is valid as written.  Please see the second example on this page for another example:

  http://dev.mysql.com/doc/refman/5.1/en/rewriting-subqueries.html

The point is, a left join will return null entries (reguardless of a 'not null' constraint) when there is no corresponding row in the table being joined.  That's what distinguishes a left join from an inner join, for example.

I'm afraid I've been unable to reproduce this in isolation.  It's easily reproducable when using our application, but it's proprietary and I'm not able to share it.  If there's anything you'd like me to examine and report using GDB, Valgrind, etc., I'm happy to do so.
[19 Feb 2006 15:10] Valeriy Kravchuk
If you can repeat it each and every time with your application, can you, please, send the results of SHOW PROCESSLIST just before the last action that initiates the crash?
[20 Feb 2006 14:35] Joel Dice
I'm not sure how to do that.  When I say that the crash is reproduceable, I mean that mysqld always crashes within two or three seconds of our application starting up.  Exactly when it crashes and which thread causes it depends on how the threads are scheduled and network conditions, both of which are out of my control.  Is there a way to invoke SHOW PROCESSLIST from within the mysqld signal handler, and would it help?
[21 Feb 2006 16:46] Valeriy Kravchuk
OK, let's do it the other way. Can you, please, restart your server with general query log, as described in http://dev.mysql.com/doc/refman/5.0/en/query-log.html, and then send or upload the log file? It should be small, if your application gives that crash so fast.
[21 Feb 2006 22:41] Joel Dice
query log

Attachment: mysql.log (text/x-log), 178.08 KiB.

[21 Feb 2006 22:42] Joel Dice
execution log of mysqld corresponding to mysql.log

Attachment: mysql-exec.txt (text/plain), 2.80 KiB.

[21 Feb 2006 22:43] Joel Dice
I've added the requested query log (mysql.log), plus a log of the crash in case it helps (mysql-exec.txt).
[6 Mar 2006 14:52] Valeriy Kravchuk
Thank you for the additional information. Can you, please, try to filter out queries of session with id 14 from that log you uploaded, and try to execute them all (without preparing) in a single mysql command line session. I want to check, if it leads to the same crash.
[6 Mar 2006 21:09] Joel Dice
I did as you asked, but the server did not crash.  I'll attach a copy of the script I ran, which I produced with

  egrep '14 Query|14 Execute' mysql.log >replay.sql

and then manually removed the id and command columns.  Also, I had to change part of the final insert from "select 6, members.member" to "select last_insert_id(), members.member" since it would fail due to a foreign key constraint otherwise.
[6 Mar 2006 21:10] Joel Dice
script to attempt to reproduce crash

Attachment: replay.sql (text/x-sql), 21.20 KiB.

[6 Apr 2006 15:55] Valeriy Kravchuk
As it is on Debian, please, specify the exact glibc version used. Send the uname -a results also.

Have you tried to repeat with a newer version of MySQL server, 5.0.19?

Please, take a look at bug #18300. It is related to possible memory leaks with prepared statements. Do you see anything similar to your situation?
[24 Apr 2006 16:20] Joel Dice
Sorry it's taken so long to reply - I've been very busy lately.

libc version: 2.3.6-3
uname -a: Linux lnxdb1 2.6.12-1-686-smp #1 SMP Tue Sep 27 13:10:31 JST 2005 i686 GNU/Linux

I haven't had a chance to try version 5.0.19.  I'll give it a try when I have the time, but that may be a few weeks from now.

I don't know if bug #18300 is related, but I haven't seen any evidence of leaks, so I doubt it.  Also, I recall having problems even when server-side prepared statements were disabled.  I'll try to confirm that when I test with 5.0.19.
[29 Apr 2006 13:45] Valeriy Kravchuk
It's new glibc and not AMD64... No ideas about the reason for crash.

Try to repeat with the latest version (5.0.20a now) when you'll have a chance and reopen this bug report with the results.
[2 May 2006 22:10] Joel Dice
5.0.20a also crashes in the same place.  I'll attach a gdb trace.
[2 May 2006 22:11] Joel Dice
crash trace using 5.0.20a

Attachment: gdb-trace-5.0.20a.txt (text/plain), 3.56 KiB.

[21 May 2006 9:57] Valeriy Kravchuk
As your last stack trace contains the following:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1257927760 (LWP 31447)]
0x081af678 in check_table_access (thd=0xb24c6220, want_access=8, 
    tables=0x8bd69f0, no_errors=false) at sql_parse.cc:5190
5190	    if (tables->derived || tables->schema_table || tables->belong_to_view ||
(gdb) bt
#0  0x081af678 in check_table_access (thd=0xb24c6220, want_access=8, 
    tables=0x8bd69f0, no_errors=false) at sql_parse.cc:5190

the reason of crash can be related to some "incorrect" values in mysql database tables. So, please, try to repeat with a fresh installation of MySQL server (5.0.21), with initially "fresh" mysql database, and inform about the results.
[19 Jun 2006 15:23] Joel Dice
It still crashes on the same line using 5.0.22 with fresh tables:

  Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1257067600 (LWP 20145)]
0x081b16f8 in check_table_access (thd=0x8c156d8, want_access=8, 
    tables=0x8ead498, no_errors=false) at sql_parse.cc:5206
5206	    if (tables->derived || tables->schema_table || tables->belong_to_view ||
[15 Jul 2006 14:55] Valeriy Kravchuk
Can you, please, send/upload a (smallest) dump of data that allows to repeat the behaviour you described?
[15 Aug 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[11 Sep 2006 17:01] Joel Dice
Sorry it's taken me so long to respond.  Unfortunately, the servers I had been using to reproduce this are now part of our production system and are no longer accessible to me.  Also, I'm unable provide the data dump you requested due to the proprietary nature of our product.  Since we don't seem likely to make progress on this bug, I'll close it.