Bug #36639 subselect.test crashes on 64 bit pentium4 when compiled for valgrind
Submitted: 10 May 2008 1:18 Modified: 22 Nov 2010 1:14
Reporter: Sergey Petrunya Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0, 5.1, 6.0-bk OS:Linux (openSUSE 10.3 x86_64)
Assigned to: Sergey Petrunya CPU Architecture:Any

[10 May 2008 1:18] Sergey Petrunya
Description:
subselect.test crashes on 64 bit pentium 4 when compiled for valgrind under openSUSE 10.3 and gcc-4.2.1. 

The problem is not repeatable on any other machine (32/64 bit, intel/amd, suse or other distribution).

The test fails as follows:

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.subselect                 [ fail ]

mysqltest: At line 1891: query 'explain SELECT sql_no_cache t1.a, r.a, r.b FROM t1 LEFT JOIN t2 r 
ON r.a = (SELECT t2.a FROM t2 WHERE t2.c = t1.a AND t2.b <= '359899' 
ORDER BY t2.c DESC, t2.b DESC LIMIT 1) WHERE t1.a = 10' failed: 2013: Lost connection to MySQL server during query

The result from queries just before the failure was:
< snip >
INSERT INTO t1 VALUES("0037", "2", "2006-01-06 12:25:53");
INSERT INTO t1 VALUES("0048", "1", "2006-01-06 12:37:50");
INSERT INTO t1 VALUES("0059", "1", "2006-01-06 12:37:50");
select * from t1 r1 
where (r1.retailerID,(r1.changed)) in 
(SELECT r2.retailerId,(max(changed)) from t1 r2 
group by r2.retailerId);
retailerID      statusID        changed
0026    2       2006-01-06 12:25:53
0037    2       2006-01-06 12:25:53
0048    1       2006-01-06 12:37:50
0059    1       2006-01-06 12:37:50
drop table t1;
create table t1(a int, primary key (a));
insert into t1 values (10);
create table t2 (a int primary key, b varchar(32), c int, unique key b(c, b));
insert into t2(a, c, b) values (1,10,'359'), (2,10,'35988'), (3,10,'35989');
explain SELECT sql_no_cache t1.a, r.a, r.b FROM t1 LEFT JOIN t2 r 
ON r.a = (SELECT t2.a FROM t2 WHERE t2.c = t1.a AND t2.b <= '359899' 
ORDER BY t2.c DESC, t2.b DESC LIMIT 1) WHERE t1.a = 10;

How to repeat:
Difficult, see above notes about this being very platform-specific. Compile with compile-pentium-valgrind-max and run mysql-test-run t/subselect.test.
[10 May 2008 1:19] Sergey Petrunya
Stack trace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x408c3950 (LWP 31500)]
0x0000000000737a1e in ~QUICK_RANGE_SELECT (this=0x109f740) at table.h:665
665         read_set= read_set_arg;
(gdb) wher
#0  0x0000000000737a1e in ~QUICK_RANGE_SELECT (this=0x109f740) at table.h:665
#1  0x000000000072a2aa in QUICK_SELECT_DESC (this=0x10a0630, q=0x109f740, 
    used_key_parts_arg=<value optimized out>, create_err=<value optimized out>)
    at opt_range.cc:8725
#2  0x00000000006bfaae in test_if_skip_sort_order (tab=0x109f370, 
    order=0x1086dc0, select_limit=1, no_changes=false, 
    map=<value optimized out>) at sql_select.cc:15944
#3  0x00000000006bfd65 in create_sort_index (thd=0x10817b0, join=0x109acb0, 
    order=0x1086dc0, filesort_limit=1, select_limit=1, is_order_by=false)
    at sql_select.cc:16027
#4  0x00000000006cd663 in JOIN::exec (this=0x109acb0) at sql_select.cc:2726
#5  0x0000000000607120 in subselect_single_select_engine::exec (this=0x1095d00)
    at item_subselect.cc:2277
#6  0x0000000000605cf4 in Item_subselect::exec (this=0x1095c10)
    at item_subselect.cc:280
#7  0x00000000006065da in Item_singlerow_subselect::val_int (this=0x107cda0)
    at item_subselect.cc:629
#8  0x00000000005b4dff in Item::update_null_value (this=0x107cda0)
    at ../item.h:822
#9  0x000000000060e5d3 in Item_subselect::is_null (this=0x107cda0)
    at item_subselect.h:97
#10 0x00000000006bad35 in make_join_statistics (join=0x1098d80, 
    tables=0xfffffffffffffffd, conds=0x109cbf0, keyuse_array=0x109a748)
---Type <return> to continue, or q <return> to quit---
[10 May 2008 1:36] Sergey Petrunya
Investigation
=============

Here is the debug log:
 
 Breakpoint 1, test_if_skip_sort_order (tab=0x1960cb0, order=0x19518d0, select_limit=1, no_changes=false, map=0x19517d0) at sql_select.cc:15927
(gdb) list
  15922     {
  15923       /*
  15924         Don't reverse the sort order, if it's already done.
  15925         (In some cases test_if_order_by_key() can be called multiple times
  15926       */
  15927       if (!select->quick->reverse_sorted())
  15928       {
  15929         QUICK_SELECT_DESC *tmp;
  15930         bool error= FALSE;
  15931         int quick_type= select->quick->get_type();

psergey> let's look at the quick select

(gdb) p select->quick
  $15 = (QUICK_RANGE_SELECT *) 0x1961080
(gdb) p ((class QUICK_RANGE_SELECT*) 0x1961080)->records
  $16 = 2
(gdb) p ((class QUICK_RANGE_SELECT*) 0x1961080)->head
  $17 = (TABLE *) 0x19580c0

> psergey: everything good so far. let's proceed till QUICK_RANGE_SELECT's copy constructor
(gdb) next
(gdb) next
(gdb) next
(gdb) next
(gdb) step
  operator new (sz=368) at my_new.cc:25
(gdb) next
(gdb) fini
  Run till exit from #0  operator new (sz=368) at my_new.cc:27
  0x00000000006bfa8b in test_if_skip_sort_order (tab=0x1960cb0, order=0x19518d0, select_limit=1, no_changes=false, map=0x19517d0) at 
sql_select.cc:15944
  Value returned is $18 = (void *) 0x1961f70
(gdb) step
  QUICK_SELECT_DESC (this=0x1961f70, q=0x1961080, used_key_parts_arg=2, create_err=0x44086ce4) at opt_range.cc:8706
(gdb) step
(gdb) step
  QUICK_SELECT_I (this=0x1961f70) at opt_range.cc:1075
(gdb) step
(gdb) next
  QUICK_SELECT_DESC (this=0x1961f70, q=0x1961080, used_key_parts_arg=<value optimized out>, create_err=<value optimized out>) at opt_
range.h:349
(gdb) list
  344   #endif
  345   private:
  346     /* Used only by QUICK_SELECT_DESC */
  347     QUICK_RANGE_SELECT(const QUICK_RANGE_SELECT& org) : QUICK_SELECT_I()
  348     {
  349       bcopy(&org, this, sizeof(*this));
  350       /* 
  351         Use default MRR implementation for reverse scans. No table engine
  352         currently can do an MRR scan with output in reverse index order.
  353       */
(gdb) p ((class QUICK_RANGE_SELECT*) 0x1961080)->head
  $19 = (TABLE *) 0x19580c0
psergey> ok we're just about to do bcopy and the original quick select 
psergey> is still ok (I'm referring to it by address because this while 
psergey> function was optimized away by the compiler and the local vars
psergey> are not accessible 
(gdb) n
(gdb) p ((class QUICK_RANGE_SELECT*) 0x1961080)->head
  $20 = (TABLE *) 0x0

psergey> bummer. This is the cause of the crash. The bcopy command somehow managed to damage the data it has been copying *from*.
[10 May 2008 2:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46586

ChangeSet@1.2643, 2008-05-10 06:12:44+04:00, sergefp@mysql.com +2 -0
  BUG#36639: subselect.test crashes on 64 bit pentium4 when compiled for valgrind
  - Use the compiler's default copy constructor for QUICK_RANGE_SELECT. 
    bcopy(this, copy, ...) call caused some odd action on gcc-4.1.2 on x86_64
[28 May 2008 10:01] Bugs System
Pushed into 6.0.6-alpha
[12 Aug 2008 3:41] Artem Russakovskii
I just hit the same bug on 5.1.26rc built from source. It's running on a 32GB RAM box with 2x Intel XEON E5420 procs.

# uname -a
Linux XXXXXX 2.6.22.5-31-default #1 SMP 2007/09/21 22:29:00 UTC x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/SuSE-release 
openSUSE 10.3 (X86-64)
VERSION = 10.3

Unfortunately I didn't run the tests and put started running a slave that I was going to promote to the master. Everything was fine until I ran a few SELECT queries that the production server performs, which immediately crashed the database with something like:

SELECT * FROM SOMETHING WHERE SOMETHING AND SOMETHING ORDER BY SOMETHING DESC, id DESC LIMIT 0,1;

080812  1:03:57 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=67108864
read_buffer_size=4194304
max_used_connections=2
max_threads=1024
threads_connected=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 12658824 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x27c3be50

Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
/usr/local/mysql/libexec/mysqld [0x6f131e]
/usr/local/mysql/libexec/mysqld(handle_segfault+0x327) [0x5bc9f7]
/lib64/libpthread.so.0 [0x2adb3f4d8fb0]
/usr/local/mysql/libexec/mysqld(QUICK_RANGE_SELECT::~QUICK_RANGE_SELECT()+0x82) [0x689092]
/usr/local/mysql/libexec/mysqld [0x628573]
/usr/local/mysql/libexec/mysqld [0x628a7f]
/usr/local/mysql/libexec/mysqld(JOIN::exec()+0x7e9) [0x634c29]
/usr/local/mysql/libexec/mysqld(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x17d) [0x63691d]
/usr/local/mysql/libexec/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x169) [0x6372a9]
/usr/local/mysql/libexec/mysqld [0x5c6aea]
/usr/local/mysql/libexec/mysqld(mysql_execute_command(THD*)+0x44ae) [0x5cedfe]
/usr/local/mysql/libexec/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x1e3) [0x5d2133]
/usr/local/mysql/libexec/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xb43) [0x5d2c83]
/usr/local/mysql/libexec/mysqld(do_command(THD*)+0xe9) [0x5d3609]
/usr/local/mysql/libexec/mysqld(handle_one_connection+0x22e) [0x5c492e]
/lib64/libpthread.so.0 [0x2adb3f4d1020]
/lib64/libc.so.6(clone+0x6d) [0x2adb4056a28d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x27c431b0 = *******************COMMENTED OUT*****************
thd->thread_id=4
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
080812 01:03:57 mysqld_safe Number of processes running now: 0
080812 01:03:57 mysqld_safe mysqld restarted

InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
080812  1:04:01  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Last MySQL binlog file position 0 1418, file name ./mysql-bin.000084
080812  1:04:04  InnoDB: Started; log sequence number 29 533492891
080812  1:04:04 [Note] Recovering after a crash using mysql-bin
080812  1:04:04 [Note] Starting crash recovery...
080812  1:04:04 [Note] Crash recovery finished.
080812  1:04:05 [Note] Event Scheduler: Loaded 0 events
080812  1:04:05 [Note] /usr/local/mysql/libexec/mysqld: ready for connections.
Version: '5.1.26-rc-log'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution

If LIMIT or ORDER BY is removed, the query runs fine. The error message led me to this bug report, which in turn prompted me to run the mentioned test, which failed as described below:

./mysql-test-run.pl t/subselect.test
Logging: ./mysql-test-run.pl t/subselect.test
MySQL Version 5.1.26
Using dynamic switching of binlog format
Skipping ndbcluster, mysqld not compiled with ndbcluster
Skipping SSL, mysqld not compiled with SSL
Using MTR_BUILD_THREAD      = 0
Using MASTER_MYPORT         = 9306
Using MASTER_MYPORT1        = 9307
Using SLAVE_MYPORT          = 9308
Using SLAVE_MYPORT1         = 9309
Using SLAVE_MYPORT2         = 9310
Using IM_PORT               = 9313
Using IM_MYSQLD1_PORT       = 9314
Using IM_MYSQLD2_PORT       = 9315
Killing Possible Leftover Processes
Removing Stale Files
Creating Directories
Installing Master Database
=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.subselect                 [ fail ]

mysqltest: At line 1905: query 'explain SELECT sql_no_cache t1.a, r.a, r.b FROM t1 LEFT JOIN t2 r
ON r.a = (SELECT t2.a FROM t2 WHERE t2.c = t1.a AND t2.b <= '359899'
ORDER BY t2.c DESC, t2.b DESC LIMIT 1) WHERE t1.a = 10' failed: 2013: Lost connection to MySQL server during query

The result from queries just before the failure was:
< snip >
INSERT INTO t1 VALUES("0037", "2", "2006-01-06 12:25:53");
INSERT INTO t1 VALUES("0048", "1", "2006-01-06 12:37:50");
INSERT INTO t1 VALUES("0059", "1", "2006-01-06 12:37:50");
select * from t1 r1
where (r1.retailerID,(r1.changed)) in
(SELECT r2.retailerId,(max(changed)) from t1 r2
group by r2.retailerId);
retailerID      statusID        changed
0026    2       2006-01-06 12:25:53
0037    2       2006-01-06 12:25:53
0048    1       2006-01-06 12:37:50
0059    1       2006-01-06 12:37:50
drop table t1;
create table t1(a int, primary key (a));
insert into t1 values (10);
create table t2 (a int primary key, b varchar(32), c int, unique key b(c, b));
insert into t2(a, c, b) values (1,10,'359'), (2,10,'35988'), (3,10,'35989');
explain SELECT sql_no_cache t1.a, r.a, r.b FROM t1 LEFT JOIN t2 r
ON r.a = (SELECT t2.a FROM t2 WHERE t2.c = t1.a AND t2.b <= '359899'
ORDER BY t2.c DESC, t2.b DESC LIMIT 1) WHERE t1.a = 10;

More results from queries before failure can be found in /tmp/mysql/mysql-test/var/log/subselect.log

Aborting: main.subselect failed in default mode.
To continue, re-run with '--force'.
Stopping All Servers

It seems this doesn't only affect 6.0, but 5.1 too. What is the proposed fix and when will it be available? Is there another solution we can use for now on this architecture?

Thanks, Sergey. You're thorough as always.

Artem,
http://beerpla.net.
[12 Aug 2008 3:45] Artem Russakovskii
Just re-reading your report, my gcc version matches yours:
gcc --version
gcc (GCC) 4.2.1 (SUSE Linux)

Can we expect a patch for the 5.1 branch and then roll it up into 5.1.27?
[12 Aug 2008 3:47] Artem Russakovskii
I'd like to recommend upgrading this from S3 to something more critical, because if the server was untested with mysql tests, this bug can create a DOS opportunity. A server should never crash, as we know.

Thanks.
[13 Aug 2008 17:06] Sergey Petrunya
Confirm, 5.0 and 5.1 are also affected.
[18 Aug 2008 20:54] Artem Russakovskii
Any hope for a 5.1 patch in the nearest future? I'd like to see if I should switch OS or wait a little bit more.
Thanks.
[19 Aug 2008 19:55] Thomas buck
I run opensuse 10.3 and had the same problem. Just now, I ran Online Update, updated everything (including scary glibc update!), which went smoothly. I restarted MySQL... and now the bug's gone. Would love to have the time to install each update one-by-one to find out where the bug lies, but this has fixed it for me, hopefully it'll fix it for you.
[19 Aug 2008 22:25] Artem Russakovskii
I can confirm the bug is gone as well. I saw the following glibc packages being upgraded:
glib2
glib2-32bit
glibc
glibc-32bit
glibc-devel
glibc-i18ndata
glibc-locale
glibc-locale-32bit

Both my select and the subselect.test pass now. I'm still a bit wary about the data corruption if an incorrect memory copy function was used but that's up to Sergey to decide.
[26 Aug 2008 19:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/52594

2677 Sergey Petrunia	2008-08-25
      BUG#36639: subselect.test crashes on 64 bit pentium4 when compiled for valgrind, commit into 5.0
      - Use the compiler's default copy constructor for QUICK_RANGE_SELECT. 
        bcopy(this, copy, ...) call caused some odd action on gcc-4.1.2 on x86_64
[26 Aug 2008 19:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/52595

2726 Sergey Petrunia	2008-08-25 [merge]
      Merge BUG#36639 into 5.1
[26 Aug 2008 19:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/52596

2677 Sergey Petrunia	2008-08-25
      BUG#36639: subselect.test crashes on 64 bit pentium4 when compiled for valgrind, commit into 5.0
      - Use the compiler's default copy constructor for QUICK_RANGE_SELECT. 
        bcopy(this, copy, ...) call caused some odd action on gcc-4.1.2 on x86_64
[26 Aug 2008 21:03] Sergey Petrunya
The original fix was pushed into 6.0.6 alpha. This fact wasn't recorded in the bugs db for some reason.

Since it repeats with earlier versions, too, I've pushed the fix into 5.0-bugteam and 5.1-bugteam trees. The fix will appear in next 5.0 and 5.1 releases.

Artem, thanks for pointing out that 5.x is also affected.
[28 Aug 2008 12:23] Georgi Kodinov
Pushed into 5.0.70 and 5.1.28
[13 Sep 2008 19:41] Bugs System
Pushed into 6.0.6-alpha  (revid:sergefp@mysql.com-20080825170254-j0rl6m0686sgrvq9) (version source revid:sergefp@mysql.com-20080611231653-nmuqmw6dedjra79i) (pib:3)
[18 Sep 2008 7:23] Sergey Petrunya
Bug description for the changelog:
A query which has an ORDER BY DESC which is satisfied with a reverse range scan could crash mysqld on some specific CPU/compiler combinations
[2 Oct 2008 14:56] Paul DuBois
Noted in 5.0.70, 5.1.28, 6.0.6 changelogs.
[16 Aug 2010 6:41] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:18] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)