Bug #36639 | subselect.test crashes on 64 bit pentium4 when compiled for valgrind | ||
---|---|---|---|
Submitted: | 10 May 2008 1:18 | Modified: | 22 Nov 2010 1:14 |
Reporter: | Sergey Petrunya | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.0, 5.1, 6.0-bk | OS: | Linux (openSUSE 10.3 x86_64) |
Assigned to: | Sergey Petrunya | CPU Architecture: | Any |
[10 May 2008 1:18]
Sergey Petrunya
[10 May 2008 1:19]
Sergey Petrunya
Stack trace: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x408c3950 (LWP 31500)] 0x0000000000737a1e in ~QUICK_RANGE_SELECT (this=0x109f740) at table.h:665 665 read_set= read_set_arg; (gdb) wher #0 0x0000000000737a1e in ~QUICK_RANGE_SELECT (this=0x109f740) at table.h:665 #1 0x000000000072a2aa in QUICK_SELECT_DESC (this=0x10a0630, q=0x109f740, used_key_parts_arg=<value optimized out>, create_err=<value optimized out>) at opt_range.cc:8725 #2 0x00000000006bfaae in test_if_skip_sort_order (tab=0x109f370, order=0x1086dc0, select_limit=1, no_changes=false, map=<value optimized out>) at sql_select.cc:15944 #3 0x00000000006bfd65 in create_sort_index (thd=0x10817b0, join=0x109acb0, order=0x1086dc0, filesort_limit=1, select_limit=1, is_order_by=false) at sql_select.cc:16027 #4 0x00000000006cd663 in JOIN::exec (this=0x109acb0) at sql_select.cc:2726 #5 0x0000000000607120 in subselect_single_select_engine::exec (this=0x1095d00) at item_subselect.cc:2277 #6 0x0000000000605cf4 in Item_subselect::exec (this=0x1095c10) at item_subselect.cc:280 #7 0x00000000006065da in Item_singlerow_subselect::val_int (this=0x107cda0) at item_subselect.cc:629 #8 0x00000000005b4dff in Item::update_null_value (this=0x107cda0) at ../item.h:822 #9 0x000000000060e5d3 in Item_subselect::is_null (this=0x107cda0) at item_subselect.h:97 #10 0x00000000006bad35 in make_join_statistics (join=0x1098d80, tables=0xfffffffffffffffd, conds=0x109cbf0, keyuse_array=0x109a748) ---Type <return> to continue, or q <return> to quit---
[10 May 2008 1:36]
Sergey Petrunya
Investigation ============= Here is the debug log: Breakpoint 1, test_if_skip_sort_order (tab=0x1960cb0, order=0x19518d0, select_limit=1, no_changes=false, map=0x19517d0) at sql_select.cc:15927 (gdb) list 15922 { 15923 /* 15924 Don't reverse the sort order, if it's already done. 15925 (In some cases test_if_order_by_key() can be called multiple times 15926 */ 15927 if (!select->quick->reverse_sorted()) 15928 { 15929 QUICK_SELECT_DESC *tmp; 15930 bool error= FALSE; 15931 int quick_type= select->quick->get_type(); psergey> let's look at the quick select (gdb) p select->quick $15 = (QUICK_RANGE_SELECT *) 0x1961080 (gdb) p ((class QUICK_RANGE_SELECT*) 0x1961080)->records $16 = 2 (gdb) p ((class QUICK_RANGE_SELECT*) 0x1961080)->head $17 = (TABLE *) 0x19580c0 > psergey: everything good so far. let's proceed till QUICK_RANGE_SELECT's copy constructor (gdb) next (gdb) next (gdb) next (gdb) next (gdb) step operator new (sz=368) at my_new.cc:25 (gdb) next (gdb) fini Run till exit from #0 operator new (sz=368) at my_new.cc:27 0x00000000006bfa8b in test_if_skip_sort_order (tab=0x1960cb0, order=0x19518d0, select_limit=1, no_changes=false, map=0x19517d0) at sql_select.cc:15944 Value returned is $18 = (void *) 0x1961f70 (gdb) step QUICK_SELECT_DESC (this=0x1961f70, q=0x1961080, used_key_parts_arg=2, create_err=0x44086ce4) at opt_range.cc:8706 (gdb) step (gdb) step QUICK_SELECT_I (this=0x1961f70) at opt_range.cc:1075 (gdb) step (gdb) next QUICK_SELECT_DESC (this=0x1961f70, q=0x1961080, used_key_parts_arg=<value optimized out>, create_err=<value optimized out>) at opt_ range.h:349 (gdb) list 344 #endif 345 private: 346 /* Used only by QUICK_SELECT_DESC */ 347 QUICK_RANGE_SELECT(const QUICK_RANGE_SELECT& org) : QUICK_SELECT_I() 348 { 349 bcopy(&org, this, sizeof(*this)); 350 /* 351 Use default MRR implementation for reverse scans. No table engine 352 currently can do an MRR scan with output in reverse index order. 353 */ (gdb) p ((class QUICK_RANGE_SELECT*) 0x1961080)->head $19 = (TABLE *) 0x19580c0 psergey> ok we're just about to do bcopy and the original quick select psergey> is still ok (I'm referring to it by address because this while psergey> function was optimized away by the compiler and the local vars psergey> are not accessible (gdb) n (gdb) p ((class QUICK_RANGE_SELECT*) 0x1961080)->head $20 = (TABLE *) 0x0 psergey> bummer. This is the cause of the crash. The bcopy command somehow managed to damage the data it has been copying *from*.
[10 May 2008 2:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/46586 ChangeSet@1.2643, 2008-05-10 06:12:44+04:00, sergefp@mysql.com +2 -0 BUG#36639: subselect.test crashes on 64 bit pentium4 when compiled for valgrind - Use the compiler's default copy constructor for QUICK_RANGE_SELECT. bcopy(this, copy, ...) call caused some odd action on gcc-4.1.2 on x86_64
[28 May 2008 10:01]
Bugs System
Pushed into 6.0.6-alpha
[12 Aug 2008 3:41]
Artem Russakovskii
I just hit the same bug on 5.1.26rc built from source. It's running on a 32GB RAM box with 2x Intel XEON E5420 procs. # uname -a Linux XXXXXX 2.6.22.5-31-default #1 SMP 2007/09/21 22:29:00 UTC x86_64 x86_64 x86_64 GNU/Linux # cat /etc/SuSE-release openSUSE 10.3 (X86-64) VERSION = 10.3 Unfortunately I didn't run the tests and put started running a slave that I was going to promote to the master. Everything was fine until I ran a few SELECT queries that the production server performs, which immediately crashed the database with something like: SELECT * FROM SOMETHING WHERE SOMETHING AND SOMETHING ORDER BY SOMETHING DESC, id DESC LIMIT 0,1; 080812 1:03:57 - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=67108864 read_buffer_size=4194304 max_used_connections=2 max_threads=1024 threads_connected=2 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 12658824 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x27c3be50 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... /usr/local/mysql/libexec/mysqld [0x6f131e] /usr/local/mysql/libexec/mysqld(handle_segfault+0x327) [0x5bc9f7] /lib64/libpthread.so.0 [0x2adb3f4d8fb0] /usr/local/mysql/libexec/mysqld(QUICK_RANGE_SELECT::~QUICK_RANGE_SELECT()+0x82) [0x689092] /usr/local/mysql/libexec/mysqld [0x628573] /usr/local/mysql/libexec/mysqld [0x628a7f] /usr/local/mysql/libexec/mysqld(JOIN::exec()+0x7e9) [0x634c29] /usr/local/mysql/libexec/mysqld(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x17d) [0x63691d] /usr/local/mysql/libexec/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x169) [0x6372a9] /usr/local/mysql/libexec/mysqld [0x5c6aea] /usr/local/mysql/libexec/mysqld(mysql_execute_command(THD*)+0x44ae) [0x5cedfe] /usr/local/mysql/libexec/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x1e3) [0x5d2133] /usr/local/mysql/libexec/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xb43) [0x5d2c83] /usr/local/mysql/libexec/mysqld(do_command(THD*)+0xe9) [0x5d3609] /usr/local/mysql/libexec/mysqld(handle_one_connection+0x22e) [0x5c492e] /lib64/libpthread.so.0 [0x2adb3f4d1020] /lib64/libc.so.6(clone+0x6d) [0x2adb4056a28d] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x27c431b0 = *******************COMMENTED OUT***************** thd->thread_id=4 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 080812 01:03:57 mysqld_safe Number of processes running now: 0 080812 01:03:57 mysqld_safe mysqld restarted InnoDB: The log sequence number in ibdata files does not match InnoDB: the log sequence number in the ib_logfiles! 080812 1:04:01 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... InnoDB: Last MySQL binlog file position 0 1418, file name ./mysql-bin.000084 080812 1:04:04 InnoDB: Started; log sequence number 29 533492891 080812 1:04:04 [Note] Recovering after a crash using mysql-bin 080812 1:04:04 [Note] Starting crash recovery... 080812 1:04:04 [Note] Crash recovery finished. 080812 1:04:05 [Note] Event Scheduler: Loaded 0 events 080812 1:04:05 [Note] /usr/local/mysql/libexec/mysqld: ready for connections. Version: '5.1.26-rc-log' socket: '/tmp/mysql.sock' port: 3306 Source distribution If LIMIT or ORDER BY is removed, the query runs fine. The error message led me to this bug report, which in turn prompted me to run the mentioned test, which failed as described below: ./mysql-test-run.pl t/subselect.test Logging: ./mysql-test-run.pl t/subselect.test MySQL Version 5.1.26 Using dynamic switching of binlog format Skipping ndbcluster, mysqld not compiled with ndbcluster Skipping SSL, mysqld not compiled with SSL Using MTR_BUILD_THREAD = 0 Using MASTER_MYPORT = 9306 Using MASTER_MYPORT1 = 9307 Using SLAVE_MYPORT = 9308 Using SLAVE_MYPORT1 = 9309 Using SLAVE_MYPORT2 = 9310 Using IM_PORT = 9313 Using IM_MYSQLD1_PORT = 9314 Using IM_MYSQLD2_PORT = 9315 Killing Possible Leftover Processes Removing Stale Files Creating Directories Installing Master Database ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- main.subselect [ fail ] mysqltest: At line 1905: query 'explain SELECT sql_no_cache t1.a, r.a, r.b FROM t1 LEFT JOIN t2 r ON r.a = (SELECT t2.a FROM t2 WHERE t2.c = t1.a AND t2.b <= '359899' ORDER BY t2.c DESC, t2.b DESC LIMIT 1) WHERE t1.a = 10' failed: 2013: Lost connection to MySQL server during query The result from queries just before the failure was: < snip > INSERT INTO t1 VALUES("0037", "2", "2006-01-06 12:25:53"); INSERT INTO t1 VALUES("0048", "1", "2006-01-06 12:37:50"); INSERT INTO t1 VALUES("0059", "1", "2006-01-06 12:37:50"); select * from t1 r1 where (r1.retailerID,(r1.changed)) in (SELECT r2.retailerId,(max(changed)) from t1 r2 group by r2.retailerId); retailerID statusID changed 0026 2 2006-01-06 12:25:53 0037 2 2006-01-06 12:25:53 0048 1 2006-01-06 12:37:50 0059 1 2006-01-06 12:37:50 drop table t1; create table t1(a int, primary key (a)); insert into t1 values (10); create table t2 (a int primary key, b varchar(32), c int, unique key b(c, b)); insert into t2(a, c, b) values (1,10,'359'), (2,10,'35988'), (3,10,'35989'); explain SELECT sql_no_cache t1.a, r.a, r.b FROM t1 LEFT JOIN t2 r ON r.a = (SELECT t2.a FROM t2 WHERE t2.c = t1.a AND t2.b <= '359899' ORDER BY t2.c DESC, t2.b DESC LIMIT 1) WHERE t1.a = 10; More results from queries before failure can be found in /tmp/mysql/mysql-test/var/log/subselect.log Aborting: main.subselect failed in default mode. To continue, re-run with '--force'. Stopping All Servers It seems this doesn't only affect 6.0, but 5.1 too. What is the proposed fix and when will it be available? Is there another solution we can use for now on this architecture? Thanks, Sergey. You're thorough as always. Artem, http://beerpla.net.
[12 Aug 2008 3:45]
Artem Russakovskii
Just re-reading your report, my gcc version matches yours: gcc --version gcc (GCC) 4.2.1 (SUSE Linux) Can we expect a patch for the 5.1 branch and then roll it up into 5.1.27?
[12 Aug 2008 3:47]
Artem Russakovskii
I'd like to recommend upgrading this from S3 to something more critical, because if the server was untested with mysql tests, this bug can create a DOS opportunity. A server should never crash, as we know. Thanks.
[13 Aug 2008 17:06]
Sergey Petrunya
Confirm, 5.0 and 5.1 are also affected.
[18 Aug 2008 20:54]
Artem Russakovskii
Any hope for a 5.1 patch in the nearest future? I'd like to see if I should switch OS or wait a little bit more. Thanks.
[19 Aug 2008 19:55]
Thomas buck
I run opensuse 10.3 and had the same problem. Just now, I ran Online Update, updated everything (including scary glibc update!), which went smoothly. I restarted MySQL... and now the bug's gone. Would love to have the time to install each update one-by-one to find out where the bug lies, but this has fixed it for me, hopefully it'll fix it for you.
[19 Aug 2008 22:25]
Artem Russakovskii
I can confirm the bug is gone as well. I saw the following glibc packages being upgraded: glib2 glib2-32bit glibc glibc-32bit glibc-devel glibc-i18ndata glibc-locale glibc-locale-32bit Both my select and the subselect.test pass now. I'm still a bit wary about the data corruption if an incorrect memory copy function was used but that's up to Sergey to decide.
[26 Aug 2008 19:11]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/52594 2677 Sergey Petrunia 2008-08-25 BUG#36639: subselect.test crashes on 64 bit pentium4 when compiled for valgrind, commit into 5.0 - Use the compiler's default copy constructor for QUICK_RANGE_SELECT. bcopy(this, copy, ...) call caused some odd action on gcc-4.1.2 on x86_64
[26 Aug 2008 19:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/52595 2726 Sergey Petrunia 2008-08-25 [merge] Merge BUG#36639 into 5.1
[26 Aug 2008 19:28]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/52596 2677 Sergey Petrunia 2008-08-25 BUG#36639: subselect.test crashes on 64 bit pentium4 when compiled for valgrind, commit into 5.0 - Use the compiler's default copy constructor for QUICK_RANGE_SELECT. bcopy(this, copy, ...) call caused some odd action on gcc-4.1.2 on x86_64
[26 Aug 2008 21:03]
Sergey Petrunya
The original fix was pushed into 6.0.6 alpha. This fact wasn't recorded in the bugs db for some reason. Since it repeats with earlier versions, too, I've pushed the fix into 5.0-bugteam and 5.1-bugteam trees. The fix will appear in next 5.0 and 5.1 releases. Artem, thanks for pointing out that 5.x is also affected.
[28 Aug 2008 12:23]
Georgi Kodinov
Pushed into 5.0.70 and 5.1.28
[13 Sep 2008 19:41]
Bugs System
Pushed into 6.0.6-alpha (revid:sergefp@mysql.com-20080825170254-j0rl6m0686sgrvq9) (version source revid:sergefp@mysql.com-20080611231653-nmuqmw6dedjra79i) (pib:3)
[18 Sep 2008 7:23]
Sergey Petrunya
Bug description for the changelog: A query which has an ORDER BY DESC which is satisfied with a reverse range scan could crash mysqld on some specific CPU/compiler combinations
[2 Oct 2008 14:56]
Paul DuBois
Noted in 5.0.70, 5.1.28, 6.0.6 changelogs.
[16 Aug 2010 6:41]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:18]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)