Bug #14851 killing a long-running sub-query crashes mysqld
Submitted: 11 Nov 2005 0:40 Modified: 14 Feb 2006 2:55
Reporter: Timothy Smith Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:4.1.16 OS:Any (all)
Assigned to: Oleksandr Byelkin

[11 Nov 2005 0:40] Timothy Smith
Description:
Using data from the user (see private comment for details), the following query is run:

SELECT ID, Text FROM vocez WHERE ID IN (SELECT DISTINCT( ValID) FROM atr WHERE AtrID='A00630');

This query never returns.

When this query is killed from another session, mysqld seg faults in JOIN::join_free().

How to repeat:
I tested this on Linux using a fresh pull from mysql-4.1 tree.  I ran it with --no-defaults, so there are no my.cnf settings required.  See following private comments for where to get user's data.

Back trace is below.

Just copy the data files into the test db, run the server, and then issue the above select query.

If I issue just the inside select statement, it returns ~5800 values in ~60 seconds.

(gdb) run
Starting program: /nfstmp1/tsmith/m/41/m/bin/mysqld --no-defaults --basedir=/users/tsmith/m/41/m --datadir=/users/tsmith/m/41/m/data --tmpdir=/users/tsmith/m/41/m/tmp --log-error=/users/tsmith/m/41/m/data/log.err --socket=mysql.sock --port=33410 --server-id=33410 --log-bin --gdb
[Thread debugging using libthread_db enabled]
[New Thread 1302656 (LWP 3043)]
[New Thread 31341488 (LWP 3046)]
[New Thread 71216048 (LWP 3047)]
[New Thread 50719664 (LWP 3048)]
[New Thread 81705904 (LWP 3049)]
[New Thread 107908016 (LWP 3050)]
[New Thread 92195760 (LWP 3051)]
[New Thread 118537136 (LWP 3052)]
[New Thread 6314928 (LWP 3053)]
[New Thread 6564784 (LWP 16656)]
[Thread 6564784 (zombie) exited]
[New Thread 6564784 (LWP 30209)]
[New Thread 6900656 (LWP 12180)]
[Thread 6900656 (zombie) exited]
[New Thread 6900656 (LWP 12182)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 6564784 (LWP 30209)]
JOIN::join_free (this=0x8b31ce8, full=true) at sql_string.h:146
146 if (alloced)
(gdb) bt full
#0 JOIN::join_free (this=0x8b31ce8, full=true) at sql_string.h:146
tab = (JOIN_TAB *) 0x69137af4
end = (JOIN_TAB *) 0x69137b20
_db_func_ = 0x8aee148 "h\026�\b0\036d"
_db_file_ = 0x8af0fd8 "��0\b"
_db_level_ = 145956072
_db_framep_ = (char **) 0x6420a8
#1 0x081720a0 in JOIN::cleanup (this=0x8b31ce8) at sql_select.cc:1534
_db_func_ = 0x8aee148 "h\026�\b0\036d"
_db_file_ = 0x8af0fd8 "��0\b"
_db_level_ = 145678664
_db_framep_ = (char **) 0x6420e8
#2 0x08172086 in JOIN::cleanup (this=0x8aee148) at sql_select.cc:1530
_db_func_ = 0x8aed260 "\b�0\b"
_db_file_ = 0x8af0fd8 "��0\b"
_db_level_ = 145676072
_db_framep_ = (char **) 0x642138
#3 0x081fb351 in st_select_lex_unit::cleanup (this=0x8aed260)
at sql_union.cc:602
error = 0
_db_func_ = 0x82eaf01 "�\003"
_db_file_ = 0x8ae8a88 '\217' <repeats 200 times>...
_db_level_ = 138038105
_db_framep_ = (char **) 0xec
join = (JOIN *) 0x8aee148
sl = (SELECT_LEX *) 0x8aed148
#4 0x0817214f in JOIN::cleanup (this=0x8aed728) at sql_select.cc:1546
lex_unit = (SELECT_LEX_UNIT *) 0x8aed260
_db_func_ = 0x81722d7 "\203&#65533;0\205&#65533;uw\203&#65533;\fS&#65533;q&#65533;&#65533;&#65533;\203&#65533;\020\205&#65533;ug\213\207L\005"
_db_file_ = 0x8aed728 "h\017&#65533;\b&#65533;\037d"
_db_level_ = 145690820
_db_framep_ = (char **) 0x8aed080
#5 0x08172372 in mysql_select (thd=0x8af0eb0, rref_pointer_array=0x8af10c4,
tables=0x8aed080, wild_num=0, fields=@0x8af1040, conds=0x8aed5a0,
og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0,
select_options=2156153344, result=0x8aed718, unit=0x8af0ef8,
select_lex=0x8af0fd8) at sql_select.cc:1633
err = 6562260
free_join = true
_db_func_ = 0x6421d8 ""
_db_file_ = 0x8af0ec4 "\030\r&#65533;\b&#65533;&#942;\b&#65533;&#942;\b "
_db_level_ = 8
_db_framep_ = (char **) 0x6421d0
join = (JOIN *) 0x8aed728
#6 0x0816eaeb in handle_select (thd=0x8af0eb0, lex=0x8af0eec,
result=0x8aed718) at sql_select.cc:177
res = 145690584
select_lex = (SELECT_LEX *) 0x8af0fd8
_db_func_ = 0x812790d "\203&#65533;\020\211C\004\213]&#65533;&#65533;&#65533;U\211&#65533;S\203&#65533;\020\213]\b&#65533;\003\b&#65533;2\b&#65533;5L&#65533;I\b&#65533;4&#65533;&#65533;\203&#65533;\020\211C\004\213]&#65533;&#65533;&#65533;U\211&#65533;\203&#65533;\f&#65533;u\020&#65533;u\f\213E\b&#65533;p\004&#65533;qO"
_db_file_ = 0x1 <Address 0x1 out of bounds>
_db_level_ = 12
_db_framep_ = (char **) 0xaf0eec
#7 0x0814654d in mysql_execute_command (thd=0x8af0eb0) at sql_parse.cc:2096
result = (class select_result *) 0x8aed718
res = -1
lex = (LEX *) 0x8af0eec
select_lex = (SELECT_LEX *) 0x8af0fd8
tables = (TABLE_LIST *) 0x8aed6b0
unit = (SELECT_LEX_UNIT *) 0x8af0ef8
_db_func_ = 0x84afac0 ""
_db_file_ = 0x642bb0 "&#65533;+d"
_db_level_ = 1
_db_framep_ = (char **) 0x82f4c89
#8 0x0814b5af in mysql_parse (thd=0x8af0eb0,
inBuf=0x8aecef0 "SELECT ID, Text FROM vocez WHERE ID IN (SELECT DISTINCT( ValID) FROM atr WHERE AtrID='A00630')", length=145690348) at sql_parse.cc:4339
lex = (LEX *) 0x8af0eec
_db_func_ = 0x8af0eb0 "h&#65533;2\b\b-&#65533;\b`&#65533;I\bx&#65533;2\b&#65533;&&#65533;\b\030\r&#65533;\b&#65533;&#942;\b&#65533;&#942;\b "
_db_file_ = 0x3 <Address 0x3 out of bounds>
_db_level_ = 145690288
_db_framep_ = (char **) 0x642998
#9 0x081450b0 in dispatch_command (command=COM_QUERY, thd=0x8af0eb0,
packet=0x8ae8eb9 "", packet_length=95) at sql_parse.cc:1499
packet_end = 0x8aecf4e ""
net = (NET *) 0x8af1408
error = false
_db_func_ = 0x4274acd7 "\201&#65533;\035\203"
_db_file_ = 0x4274b090 "]&#65533;\215\001&#65533;G/"
_db_level_ = 6563880
_db_framep_ = (char **) 0x0
#10 0x08144ae2 in do_command (thd=0x8af0eb0) at sql_parse.cc:1312
packet = 0x8ae8eb8 "\001"
old_timeout = 30
packet_length = 95
net = (NET *) 0x8af1408
command = COM_QUERY
_db_func_ = 0x8125c7e "\203&#65533;\020\213]&#65533;&#65533;&#65533;U\211&#65533;VS\213]\b\203&#65533;\fS&#65533;m"
_db_file_ = 0x8af202c "\020&#65533;\b"
_db_level_ = 8192
_db_framep_ = (char **) 0x1000
#11 0x0814408a in handle_one_connection (arg=0xc0c0c07) at sql_parse.cc:1044
error = 145955588
net = (NET *) 0x8af1408
thd = (class THD *) 0x8af0eb0
launch_time = 202116103
set = {__val = {0 <repeats 32 times>}}
#12 0x4274898c in start_thread () from /lib/tls/libpthread.so.0
No symbol table info available.
#13 0x426b67da in clone () from /lib/tls/libc.so.6
No symbol table info available.

This was just after I did 'kill 2' to kill the sub-select query.

Suggested fix:
Unknown.  Perhaps the crash is related to the fact that the query never finishes.
[14 Nov 2005 4:01] Timothy Smith
I've repeated this with a simpler test case:

create table a (id int primary key);
create table b (id int unsigned not null);

insert into a values (1);
set @c = (select count(*) from a); insert into a select id + @c from a;
-- Hit <Up><Enter> enough times to have 4096 rows in table a

insert into b select id from a;

-- This query will take forever; if you kill it from mysqladmin,
-- it will crash mysqld with a seg fault
select id from a where id in (select distinct id from b);

Note that the "distinct" is important.  If that is removed, then the
query still takes forever, but it will not crash the server to kill it.

Regards,

Timothy
[14 Nov 2005 4:17] Timothy Smith
I said "forever" in the previous comment, but that's not true.  It just takes a long time....
[16 Nov 2005 15:13] Oleksandr Byelkin
Thank you for bugreport!
I can't repeat  this bug any more. I tried following mysql-test script:
create table t1 (id int primary key);
create table t2 (id int unsigned not null);

let $1 = 4097;
while ($1)
{
  eval insert into t1 values ($1);
  dec $1;
}
insert into t2 select id from t1;

connect (con1, localhost, root,,);
connection con1;

create table t3 (kill_id int);
insert into t3 values(connection_id());

select id from t1 where id in (select distinct id from t2);

connection default;
select ((@id := kill_id) - kill_id) from t3;
sleep 3;
kill @id;

connection con1;
-- error 2013
reap;

disconnect con1;
connection default;

drop table t1, t2, t3;

Also I tried your short test suite on my linux comp, and even full data test suite on  debx86. server kill connection smoothly. If I lost something in the procedure please reopen bug with some comments.
[16 Nov 2005 22:09] Timothy Smith
I could not repeat this with the mysqltest binary.  However, I
removed the "drop table" statement from the end of the test.
After running the test (and it returned "ok"), I then connected
with the 'mysql' client and ran the select from the test case,
then suspended that mysql and did 'mysqladmin proc' and
'mysqladmin kill <ID>', and it crashed the server.

I have set this up on mysqldev@production so you can use the
exact setup I have.

I just did 'bk pull' in my 4.1 tree, and built it with this
command (as user tsmith):

#################################################

cd ~/m/41/a
~/m/build > log 2>&1
make install

#################################################

Then, as user mysqldev, I did:

#################################################

d=/nfstmp1/tsmith/production/bug14851
rm -rf $d
mkdir $d
cd ~tsmith/m/x/41/bk
./bin/mysql_install_db --no-defaults --basedir="$PWD" --datadir="$d"
./libexec/mysqld --no-defaults --basedir="$PWD" \
  --datadir="$d" --skip-networking --socket=mysql.sock &

./bin/mysqltest --no-defaults --socket="$d/mysql.sock" \
  --user=root test <<'EOF'
drop table if exists t1, t2, t3;

create table t1 (id int primary key);
create table t2 (id int unsigned not null);

let $1 = 4097;
while ($1)
{
  eval insert into t1 values ($1);
  dec $1;
}
insert into t2 select id from t1;

connect (con1, localhost, root,,);
connection con1;

create table t3 (kill_id int);
insert into t3 values(connection_id());

select id from t1 where id in (select distinct id from t2);

connection default;
select ((@id := kill_id) - kill_id) from t3;
sleep 3;
kill @id;

connection con1;
-- error 2013
reap;

disconnect con1;
connection default;

/* drop table t1, t2, t3; */
EOF

./bin/mysql --no-defaults --user=root \
  --socket="$d/mysql.sock" test

#################################################

-- Now at the mysql> prompt...

select id from t1 where id in (select distinct id from t2);

-- Now hit ^Z, and at the shell prompt:

#################################################

./bin/mysqladmin --no-defaults --user=root \
  --socket="$d/mysql.sock" proc

./bin/mysqladmin --no-defaults --user=root \
  --socket="$d/mysql.sock" kill 4

#################################################

Regards,
[16 Nov 2005 22:21] Timothy Smith
log of a session showing the exact commands run as mysqldev@production

Attachment: typescript (application/octet-stream, text), 146.27 KiB.

[16 Nov 2005 22:32] Timothy Smith
By the way, I tried this on 5.0.15 and it does *not* crash there.  It seems that only 4.1 is affected.

Timothy
[21 Nov 2005 15:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/32466
[21 Nov 2005 15:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/32468
[15 Jan 2006 20:09] Konstantin Osipov
Approved with a few comments by email.
[18 Jan 2006 11:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/33363
[18 Jan 2006 11:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/33364
[18 Jan 2006 11:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/33366
[12 Feb 2006 11:51] Oleksandr Byelkin
bugfix pushed into 5.1.7, 5.0.19, 4.1.19

Problem was that killing a longruning query with subqueries may lead to a mysql server crash due to incorrect attempt to free resources in case of kill
[14 Feb 2006 2:55] Paul Dubois
Noted in 4.1.19, 5.0.19, 5.1.7 changelogs.

          Killing a long-running query containing a subquery could cause
          a server crash. (Bug #14851)