Bug #38691 segfault/abort in ``UPDATE ...JOIN'' while ``FLUSH TABLES WITH READ LOCK''
Submitted: 9 Aug 2008 16:08 Modified: 17 Oct 2008 18:22
Reporter: Erik Schoenfelder Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:5.0.67, 5.0 - rev 2646, 5.1 - rev 2673, 6.0 - rev 2664 bzr OS:Any
Assigned to: Gleb Shchepa
Tags: flush tables with read lock, mysqlhotcopy, race, segfault, update join
Triage: Triaged: D1 (Critical)

[9 Aug 2008 16:08] Erik Schoenfelder
Description:
hi,

segfault's of mysqld are observed while running mysqlhotcopy (at
FLUSH TABLES WITH READ LOCK).

with the logged statement (a UPDATE ... INNER JOIN ...) a
testcase with two identical tables with two fields
could be created to repeat the faults:

table t1 and table t2: 

>  knr int(11) unsigned default NULL
>  em varchar(255) default NULL
>  UNIQUE KEY knr
>  KEY em

and the update statement

>  UPDATE t2 INNER JOIN t1 USING(knr) SET knr = NULL WHERE t1.em <> t2.em

stock mysql-5.0.67-linux-i686-glibc23 receives a signal 11, but 
with a compiled ``--with-debug'' 5.0.67 version two different 
failed assertions were flagged:

> mysqld: table.cc:3015: Natural_join_column*
> Field_iterator_table_ref::get_or_create_column_ref(TABLE_LIST*):
> Assertion `!nj_col->table_field || nj_col->table_ref->table
> == nj_col->table_field->table' failed.
> 080809 17:12:05 - mysqld got signal 6 ;

and

> mysqld: sql_base.cc:4672: bool
> store_natural_using_join_columns(THD*, TABLE_LIST*,
> TABLE_LIST*, TABLE_LIST*, List<String>*, uint): Assertion
> `!natural_using_join->join_columns' failed.
> 080809 17:12:07 - mysqld got signal 6 ;

as a result of the restarting mysqld, we observed sometimes table corruption
and always no backup via mysqlhoptcopy of this particular database.

this errors has been observed with different and earlier mysql versions too
and was tested now with the latest version.

the error seems to trigger more often at low system loads.
the testcase triggers an a idle system within a couple of seconds,
but with load upto minutes.

thanks, erik

How to repeat:
we have two shell-scripts to repeat:

script s1.sh creates two identical tables, 
fills in some data and runs the UPDATE ... INNER JOIN in a loop.

script s2.sh simply runs FLUSH TABLES WITH READ LOCK; UNLOCK TABLES
in a loop.

script s1.sh:
------------------------------------------------------------
#!/bin/bash

##
## create table t1 and t2 (identical definition):
##
mysql -B test <<'EOF'
drop table if exists t1;
drop table if exists t2;

CREATE TABLE t1 (
  `knr` int(11) unsigned default NULL,
  `em` varchar(255) default NULL,
  UNIQUE KEY `knr` (`knr`),
  KEY `em` (`em`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german2_ci;

CREATE TABLE t2 like t1;
EOF

##
## fill table with data:
##
max=100
i=0
while [ $i -lt $max ] ; do
  i=$(($i + 1))
  echo "insert into t1 values ($i, 'em$i');"
  echo "insert into t2 values ($i, 'em$i');"
done | mysql -B test 

##
## loop endless updates:
##
## print a dot on every update to have some progress indicator.
##
while true ; do

  mysql test -e "UPDATE t2 INNER JOIN t1 USING(knr) SET knr = NULL WHERE t1.em <> t2.em"
  echo -n .

done
------------------------------------------------------------

script s2.sh:
------------------------------------------------------------
#!/bin/bash

while : ; do
  mysql -B test -e "FLUSH TABLES WITH READ LOCK; UNLOCK TABLES"
done
------------------------------------------------------------

running both script concurrent:

------------------------------------------------------------

$ bash s1.sh
.........................ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
.ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (111)
.ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[...]
....................ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
.ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[...]
------------------------------------------------------------

$ bash s2.sh
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (111)
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (111)
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (111)
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (111)
[...]
------------------------------------------------------------

# cd /tmp/mysql-debug-5.0.67 ; /tmp/mysql-debug-5.0.67/bin/mysqld_safe &
[1] 11041
# Starting mysqld daemon with databases from /tmp/mysql-debug-5.0.67/var
Number of processes running now: 0
080809 17:12:05  mysqld restarted
[...]
Number of processes running now: 0
080809 17:12:07  mysqld restarted
[...]
------------------------------------------------------------

Suggested fix:
unfortunately we have no fix to suggest.
[9 Aug 2008 16:10] Erik Schoenfelder
script s1.sh

Attachment: s1.sh (application/x-sh, text), 805 bytes.

[9 Aug 2008 16:11] Erik Schoenfelder
script s2.sh

Attachment: s2.sh (application/x-sh, text), 96 bytes.

[9 Aug 2008 17:12] Sveta Smirnova
Thank you for the report.

Verified as described.
[9 Aug 2008 20:22] Shane Bester
sveta, do you think this could be related to bug #38499 ?
[12 Aug 2008 6:11] Sveta Smirnova
Yes, looks similar, but because trace is a bit different, I'd leave these bugs separate until one who will fix them decide if they are duplicates.
[16 Sep 2008 18:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/54225

2674 Gleb Shchepa	2008-09-16
      Bug #38691: segfault/abort in ``UPDATE ...JOIN'' while
                  ``FLUSH TABLES WITH READ LOCK''
      
      Concurrent execution of 1) multitable update with a
      NATURAL/USING join and 2) a such query as "FLUSH TABLES
      WITH READ LOCK" or "ALTER TABLE" of updating table led to
      a server crash.
         
         
      The mysql_multi_update_prepare() function call is optimized
      to lock updating tables only, so it postpones locking to the
      last, and if locking fails, it does cleanup of modified 
      syntax structures and repeats a query analysis. However,
      that cleanup procedure was incomplete for NATURAL/USING
      join syntax data: 1) some Field_item items pointed into
      freed table structures, and 2) the TABLE_LIST::join_columns
      fields was not reset.
[6 Oct 2008 22:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55494

2689 Gleb Shchepa	2008-10-07
      Bug #38691: segfault/abort in ``UPDATE ...JOIN'' while
                ``FLUSH TABLES WITH READ LOCK''
      
      Concurrent execution of 1) multitable update with a
      NATURAL/USING join and 2) a such query as "FLUSH TABLES
      WITH READ LOCK" or "ALTER TABLE" of updating table led
      to a server crash.
      
      
      The mysql_multi_update_prepare() function call is optimized
      to lock updating tables only, so it postpones locking to
      the last, and if locking fails, it does cleanup of modified
      syntax structures and repeats a query analysis.  However,
      that cleanup procedure was incomplete for NATURAL/USING join
      syntax data: 1) some Field_item items pointed into freed
      table structures, and 2) the TABLE_LIST::join_columns fields
      was not reset.
      
      Major change:
        short-living Field *Natural_join_column::table_field has
        been replaced with long-living Item*.
[7 Oct 2008 11:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55573

2689 Gleb Shchepa	2008-10-07
      Bug #38691: segfault/abort in ``UPDATE ...JOIN'' while
                ``FLUSH TABLES WITH READ LOCK''
      
      Concurrent execution of 1) multitable update with a
      NATURAL/USING join and 2) a such query as "FLUSH TABLES
      WITH READ LOCK" or "ALTER TABLE" of updating table led
      to a server crash.
      
      
      The mysql_multi_update_prepare() function call is optimized
      to lock updating tables only, so it postpones locking to
      the last, and if locking fails, it does cleanup of modified
      syntax structures and repeats a query analysis.  However,
      that cleanup procedure was incomplete for NATURAL/USING join
      syntax data: 1) some Field_item items pointed into freed
      table structures, and 2) the TABLE_LIST::join_columns fields
      was not reset.
      
      Major change:
        short-living Field *Natural_join_column::table_field has
        been replaced with long-living Item*.
[7 Oct 2008 16:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55611

2689 Gleb Shchepa	2008-10-07
      Bug #38691: segfault/abort in ``UPDATE ...JOIN'' while
                ``FLUSH TABLES WITH READ LOCK''
      
      Concurrent execution of 1) multitable update with a
      NATURAL/USING join and 2) a such query as "FLUSH TABLES
      WITH READ LOCK" or "ALTER TABLE" of updating table led
      to a server crash.
      
      
      The mysql_multi_update_prepare() function call is optimized
      to lock updating tables only, so it postpones locking to
      the last, and if locking fails, it does cleanup of modified
      syntax structures and repeats a query analysis.  However,
      that cleanup procedure was incomplete for NATURAL/USING join
      syntax data: 1) some Field_item items pointed into freed
      table structures, and 2) the TABLE_LIST::join_columns fields
      was not reset.
      
      Major change:
        short-living Field *Natural_join_column::table_field has
        been replaced with long-living Item*.
[7 Oct 2008 18:08] Paul Dubois
Noted in 5.1.29 changelog.

A server crash resulted from concurrent execution of a multiple-table
UPDATE that used a NATURAL ... USING join together with FLUSH TABLES
WITH READ LOCK or ALTER TABLE for the table being updated.

Leaving report status unchanged; this is early documentation of an upcoming push into 5.1.29.
[7 Oct 2008 18:35] Paul Dubois
Revised changelog entry.

A server crash resulted from concurrent execution of a multiple-table
UPDATE that used a NATURAL or USING join together with FLUSH TABLES
WITH READ LOCK or ALTER TABLE for the table being updated.
[7 Oct 2008 21:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55630

2689 Gleb Shchepa	2008-10-08
      Bug #38691: segfault/abort in ``UPDATE ...JOIN'' while
                ``FLUSH TABLES WITH READ LOCK''
      
      Concurrent execution of 1) multitable update with a
      NATURAL/USING join and 2) a such query as "FLUSH TABLES
      WITH READ LOCK" or "ALTER TABLE" of updating table led
      to a server crash.
      
      
      The mysql_multi_update_prepare() function call is optimized
      to lock updating tables only, so it postpones locking to
      the last, and if locking fails, it does cleanup of modified
      syntax structures and repeats a query analysis.  However,
      that cleanup procedure was incomplete for NATURAL/USING join
      syntax data: 1) some Field_item items pointed into freed
      table structures, and 2) the TABLE_LIST::join_columns fields
      was not reset.
      
      Major change:
        short-living Field *Natural_join_column::table_field has
        been replaced with long-living Item*.
[9 Oct 2008 17:25] Bugs System
Pushed into 5.0.72  (revid:gshchepa@mysql.com-20081007213400-0iombn7tl0rf1sok) (version source revid:kgeorge@mysql.com-20081008083624-owy9lio04hq16le1) (pib:4)
[9 Oct 2008 17:47] Bugs System
Pushed into 5.1.30  (revid:gshchepa@mysql.com-20081007213400-0iombn7tl0rf1sok) (version source revid:gshchepa@mysql.com-20081007215249-0yjxbe6boyydv96k) (pib:4)
[9 Oct 2008 18:34] Paul Dubois
Setting report to NDI pending push into 6.0.x.
[9 Oct 2008 18:35] Paul Dubois
Noted in 5.0.72 changelog.
[14 Oct 2008 17:57] Paul Dubois
See also Bug#38823.
[17 Oct 2008 16:42] Bugs System
Pushed into 6.0.8-alpha  (revid:gshchepa@mysql.com-20081007213400-0iombn7tl0rf1sok) (version source revid:kgeorge@mysql.com-20081008092313-4wzrak2duhsz7ylm) (pib:5)
[17 Oct 2008 18:22] Paul Dubois
Noted in 6.0.8 changelog.
[28 Oct 2008 21:05] Bugs System
Pushed into 5.1.29-ndb-6.2.17  (revid:gshchepa@mysql.com-20081007213400-0iombn7tl0rf1sok) (version source revid:tomas.ulin@sun.com-20081028140209-u4emkk1xphi5tkfb) (pib:5)
[28 Oct 2008 22:23] Bugs System
Pushed into 5.1.29-ndb-6.3.19  (revid:gshchepa@mysql.com-20081007213400-0iombn7tl0rf1sok) (version source revid:tomas.ulin@sun.com-20081028194045-0353yg8cvd2c7dd1) (pib:5)
[1 Nov 2008 9:46] Bugs System
Pushed into 5.1.29-ndb-6.4.0  (revid:gshchepa@mysql.com-20081007213400-0iombn7tl0rf1sok) (version source revid:jonas@mysql.com-20081101082305-qx5a1bj0z7i8ueys) (pib:5)