Bug #41710 | MySQL 5.1.30 crashes on the latest OpenSolaris 10 | | |
---|---|---|---|
Submitted: | 23 Dec 2008 11:51 | Modified: | 17 Jul 2009 3:22 |
Reporter: | Vladimir Kolesnikov | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: General | Severity: | S2 (Serious) |
Version: | 5.1.30 | OS: | Solaris |
Assigned to: | Alexey Kopytov | CPU Architecture: | Any |
[23 Dec 2008 11:51]
Vladimir Kolesnikov
[23 Dec 2008 11:52]
Vladimir Kolesnikov
err, a note - it's x86 hardware, vmware machine 1GB vm ram
[23 Dec 2008 23:08]
MySQL Verification Team
Latest bzr source on OpenSolaris 2008.11 ended with the below error:

    ndb.ndb_index [ pass ] 6010
    ndb.ndb_index_ordered [ fail ]

    --- /export/home/miguel/dbs/5.1/mysql-test/suite/ndb/r/ndb_index_ordered.result 2008-12-14 05:24:35.299112198 +0300
    +++ /export/home/miguel/dbs/5.1/mysql-test/suite/ndb/r/ndb_index_ordered.reject 2008-12-24 01:04:32.850211411 +0300
    @@ -643,7 +643,7 @@
     begin;
     select count(*) from t1;
     count(*)
    -2
    +0
     ALTER TABLE t1 ADD COLUMN c int;
     select a from t1 where b = 2;
     a

    mysqltest: Result content mismatch

    Aborting: ndb.ndb_index_ordered failed in default mode.
    To continue, re-run with '--force'.
    Stopping All Servers
    mysql-test-run: WARNING: Forcing kill of process 6721
    miguel@skybr.net:~/dbs/5.1/mysql-test$
[24 Dec 2008 8:23]
Vladimir Kolesnikov
Miguel, a couple of questions:
1. Which compiler are you using, and what were the compilation options?
2. I see that your tests stopped after the first error. What happens if you use the "--force" option for mysql-test-run.pl?
Thanks,
[24 Dec 2008 13:00]
MySQL Verification Team
I compiled the source with the GCC compiler and ran the test to verify that the errors are independent of the compiler, but unfortunately I did not apply the --force option. I will test again with --force.
[29 Dec 2008 15:03]
MySQL Verification Team
Thank you for the bug report. Not repeatable on Linux however:

    Stopping All Servers
    All 1258 tests were successful.
    The servers were restarted 360 times
    Spent 4307.718 of 7067 seconds executing testcases
    miguel@hegel:~/dbs/5.1/mysql-test$
[16 Jan 2009 14:06]
Sveta Smirnova
Re-verified with the Sun Studio compiler. All tests listed in the initial description fail in my case too. I compiled with ./configure --with-plugins=max-no-ndb. Tests in the binary packages which we distribute don't fail.
[16 Jan 2009 14:09]
Sveta Smirnova
To avoid misunderstanding: verified failure of tests main.greedy_optimizer main.insert_notembedded main.join main.join_crash main.join_nested main.join_outer main.kill main.limit main.order_by main.subselect main.type_blob main.user_var with Sun Studio compiler on latest OpenSolaris.
[2 Mar 2009 11:49]
Sveta Smirnova
test logs
Attachment: bug41710.log (application/octet-stream, text), 18.55 KiB.
[2 Mar 2009 11:51]
Sveta Smirnova
Re-tested with 5.1.31. Problem exists. Build/compile logs contain no interesting information: MySQL was configured with options ./configure --prefix=PATH, all compiled fine. Test logs indicate server crashed during tests.
[2 Mar 2009 11:53]
Sveta Smirnova
config.log
Attachment: bug41710.config.log.gz (application/x-gzip, text), 35.81 KiB.
[4 May 2009 11:32]
Sveta Smirnova
Bug #44538 was marked as duplicate of this one.
[10 Jun 2009 17:18]
Kristofer Pettersson
My modest investigation of this bug indicated that this was a problem with the build process. Somehow 32-bit and 64-bit code gets mixed, and explicitly specifying the target by adding CXXFLAGS="-m64" and CFLAGS="-m64" avoids the crash. An example that should work without a crash:

    ./configure CFLAGS="-Xa -m32 -mt" CXXFLAGS="-m32 -mt" --with-plugins=myisam,innobase
[18 Jun 2009 7:55]
Alexey Kopytov
This is a result of a Sun Studio compiler bug. All failing test cases crash with SIGSEGV in prev_record_reads(). Here are the relevant lines of code from there:

    static double
    prev_record_reads(JOIN *join, uint idx, table_map found_ref)
    {
      POSITION *pos_end= join->positions - 1;
      for (POSITION *pos= join->positions + idx - 1; pos != pos_end; pos--)
      {
        if (pos->table->table->map & found_ref)
        ...
      }
    }

CC at the -O3 optimization level unrolls the 'for' loop, but generates broken code: we end up trying to access pos_end->table->table->map, which is memory outside of the join->positions array, and this is where the crash occurs.

A couple of observations:

1. Lowering the optimization level to -O2 results in correct code being generated (in fact, in this case prev_record_reads() is inlined by the compiler without loop unrolling). Our release binaries are built with -O2, so they are not affected by this problem.

2. Applying the following changes to prev_record_reads() results in correct code even with -O3 (apparently CC gets confused by pos_end pointing outside of the array boundaries):

    --- sql_select.cc.old 2009-06-17 23:51:42.016106355 +0400
    +++ sql_select.cc 2009-06-17 23:52:01.652673736 +0400
    @@ -5412,8 +5412,8 @@
     static double prev_record_reads(JOIN *join, uint idx, table_map found_ref)
     {
       double found=1.0;
    -  POSITION *pos_end= join->positions - 1;
    -  for (POSITION *pos= join->positions + idx - 1; pos != pos_end; pos--)
    +  POSITION *pos_end= join->positions;
    +  for (POSITION *pos= join->positions + idx - 1; pos >= pos_end; pos--)
       {
         if (pos->table->table->map & found_ref)
         {

However, we don't know if there is more code in the server affected by this bug. So the only reliable workaround for the time being is to change the configure defaults so that we use -O2 instead of -O3 when building with Sun Studio. One can also override the defaults by specifying CFLAGS="-O2" CXXFLAGS="-O2" explicitly.

I will file a Sun Studio bug later when I reduce the testcase.
[18 Jun 2009 12:59]
Daniel Fischer
Patch looks fine. I agree with the intention and reasoning behind it. I'm assuming the patch was tested thoroughly. A bug report should be filed against Sun Studio.
[18 Jun 2009 13:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/76556

2778 Alexey Kopytov 2009-06-18
Bug #41710: MySQL 5.1.30 crashes on the latest OpenSolaris 10

Change the default optimization level for Sun Studio to "-O1". This is a workaround for a Sun Studio bug (see bug #41710 comments for details):

1. Use $GCC instead of $ac_cv_prog_gcc to check for gcc, since the first one is the only documented way to do it.
2. Use $GXX instead of $ac_cv_prog_cxx_g to check for g++, since the latter is set to "yes" when the C++ compiler accepts "-g" which is the case for both g++ and CC.
3. When building with Sun Studio, set the default values for CFLAGS/CXXFLAGS to "-O1", since unlike GCC, Sun Studio interprets "-O" as "-xO3" (see the manual pages for cc and CC).

@ configure.in
   1. Use $GCC instead of $ac_cv_prog_gcc to check for gcc, since the first one is the only documented way to do it.
   2. Use $GXX instead of $ac_cv_prog_cxx_g to check for g++, since the latter is set to "yes" when the C++ compiler accepts "-g" which is the case for both g++ and CC.
   3. When building with Sun Studio, set the default values for CFLAGS/CXXFLAGS to "-O1", since unlike GCC, Sun Studio interprets "-O" as "-xO3" (see the manual pages for cc and CC).
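As a side note on telling the two compilers apart: the commit does this in configure.in through the $GCC and $GXX shell variables. The sketch below shows a different technique with the same goal, checking predefined macros at compile time. It is not what configure.in does. The function name compiler_family and the returned strings are made up for illustration; the macros __SUNPRO_C, __SUNPRO_CC and __GNUC__ are the ones the respective compilers predefine.

```c
/* Returns a short name for the compiler family that built this file.
   __SUNPRO_C / __SUNPRO_CC are predefined by Sun Studio cc / CC.
   __GNUC__ is predefined by GCC and by compilers that emulate it.
   The function name and return strings are hypothetical. */
const char *compiler_family(void)
{
#if defined(__SUNPRO_C) || defined(__SUNPRO_CC)
  return "sunstudio";
#elif defined(__GNUC__)
  return "gnu";
#else
  return "unknown";
#endif
}
```

Checking the Sun Studio macros first mirrors the point made in the commit message: a test that both compilers pass (such as accepting "-g") cannot distinguish them, so the check has to key on something only one of them provides.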
[20 Jun 2009 6:38]
Alexey Kopytov
My report about the Sun Studio bug has been identified as a new bug and moved from bugs.sun.com to the internal bug tracker (bug id: 6853081). I'm duplicating the test case here, since the bug tracker is not visible from the outside:

---------- BEGIN SOURCE ----------
typedef struct {
  unsigned long long map;
} TABLE;

typedef struct {
  TABLE *table;
} JOIN_TAB;

typedef struct {
  double records_read;
  double unused1;
  JOIN_TAB *table;
  void *unused2;
  unsigned long long ref_depend_map;
} POSITION;

typedef struct {
  POSITION *positions;
} JOIN;

static double
prev_record_reads(JOIN *join, unsigned int idx, unsigned long long found_ref)
{
  double found= 1.0;
  POSITION *pos_end= join->positions - 1;
  for (POSITION *pos= join->positions + idx - 1; pos != pos_end; pos--)
  {
    if (pos->table->table->map & found_ref)
    {
      found_ref|= pos->ref_depend_map;
      if (pos->records_read)
        found*= pos->records_read;
    }
  }
  return found;
}

int main()
{
  TABLE t1 = {1}, t2 = {2}, t3 = {4};
  JOIN_TAB jt1 = {&t1}, jt2 = {&t2}, jt3 = {&t3};
  POSITION positions[3] = {{2, 0, &jt1, 0, 0},
                           {1, 0, &jt2, 0, 1},
                           {1, 0, &jt3, 0, 1}};
  JOIN join = {positions};

  prev_record_reads(&join, 3, 1);

  return 0;
}
---------- END SOURCE ----------

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :

Assuming the attached test case is in testcase.c:

$ CC -O3 -g testcase.c
$ gdb ./a.out
(gdb) r
Starting program: /export/home/kaa/src/bug41710/testcase/a.out

Program received signal SIGSEGV, Segmentation fault.
0x08050f60 in __1cRprev_record_reads6FpnEJOIN_IX_d_ ()
(gdb) disassemble __1cRprev_record_reads6FpnEJOIN_IX_d_
...
0x08050f5d <__1cRprev_record_reads6FpnEJOIN_IX_d_+689>: mov 0xffffffb0(%edx),%eax
0x08050f60 <__1cRprev_record_reads6FpnEJOIN_IX_d_+692>: mov (%eax),%edi
...
(gdb) i r
eax            0x2        2
ecx            0x0        0
edx            0x8047c10  134511632
ebx            0x1        1
esp            0x8047b60  0x8047b60
ebp            0x8047b88  0x8047b88
esi            0x8047bb1  134511537
edi            0x1        1
eip            0x8050f60  0x8050f60
eflags         0x10287    66183
cs             0x43       67
ss             0x4b       75
ds             0x4b       75
es             0x4b       75
fs             0x0        0
gs             0x1c3      451
(gdb) p/x $edx + 0xffffffb0
$3 = 0x8047bc0
(gdb) p join->positions
$2 = (POSITION *) 0x8047bd0
(gdb) q
The program is running. Exit anyway? (y or n) y
$ cc -O3 -g testcase.c
$ ./a.out
$
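The two addresses printed in the gdb session can be tied together with a little arithmetic. The sketch below assumes a 32-bit x86 layout for POSITION from the test case (two 8-byte doubles, then two 4-byte pointers, then one 8-byte integer), which gives a struct size of 32 bytes and puts the table member at offset 16. Those two constants and both function names are assumptions made for illustration; they are not taken from the compiler output.

```c
#include <stdint.h>

/* Assumed ILP32 x86 layout of POSITION from the test case:
   records_read at 0, unused1 at 8, table at 16, unused2 at 20,
   ref_depend_map at 24, total size 32 bytes. */
enum { POSITION_SIZE = 32, TABLE_FIELD_OFFSET = 16 };

/* Effective address of "mov disp(%edx),%eax": a 32-bit add that
   wraps modulo 2^32, so 0xffffffb0 acts as -0x50. */
uint32_t effective_address(uint32_t base, uint32_t disp)
{
  return (uint32_t)(base + disp);
}

/* Address of positions[index].table under the assumed layout.
   A negative index models an element before the array start. */
uint32_t table_field_address(uint32_t positions, int32_t index)
{
  return (uint32_t)(positions + (uint32_t)(index * POSITION_SIZE)
                    + TABLE_FIELD_OFFSET);
}
```

With the values from the session, effective_address(0x8047c10, 0xffffffb0) and table_field_address(0x8047bd0, -1) both come out as 0x8047bc0. Under the stated layout assumption, the faulting sequence is therefore loading the table member of positions[-1], which is the element pos_end points at, consistent with the analysis earlier in this report.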
[7 Jul 2009 7:52]
Bugs System
Pushed into 5.0.84 (revid:joro@sun.com-20090707074938-ksah1ibn0vs92cem) (version source revid:alexey.kopytov@sun.com-20090626135943-5tl682hvhkrno2og) (merge vers: 5.0.84) (pib:11)
[8 Jul 2009 13:30]
Bugs System
Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source revid:alexey.kopytov@sun.com-20090626135952-u5t753l3jt3st14r) (merge vers: 5.1.37) (pib:11)
[9 Jul 2009 7:35]
Bugs System
Pushed into 5.0.84 (revid:joro@sun.com-20090707074938-ksah1ibn0vs92cem) (version source revid:alexey.kopytov@sun.com-20090626135943-5tl682hvhkrno2og) (merge vers: 5.0.84) (pib:11)
[9 Jul 2009 7:37]
Bugs System
Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source revid:alexey.kopytov@sun.com-20090626135952-u5t753l3jt3st14r) (merge vers: 5.1.37) (pib:11)
[10 Jul 2009 11:20]
Bugs System
Pushed into 5.4.4-alpha (revid:anozdrin@bk-internal.mysql.com-20090710111017-bnh2cau84ug1hvei) (version source revid:alexey.kopytov@sun.com-20090626135959-wa4n96u00bw0llt2) (merge vers: 5.4.4-alpha) (pib:11)
[17 Jul 2009 3:22]
Paul DuBois
Noted in 5.0.84, 5.1.37, 5.4.4 changelogs. A workaround for a Sun Studio bug was instituted.
[12 Aug 2009 22:47]
Paul DuBois
Noted in 5.4.2 changelog because next 5.4 version will be 5.4.2 and not 5.4.4.
[15 Aug 2009 2:02]
Paul DuBois
Ignore previous comment about 5.4.2.
[26 Aug 2009 13:45]
Bugs System
Pushed into 5.1.37-ndb-7.0.8 (revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)
[26 Aug 2009 13:46]
Bugs System
Pushed into 5.1.37-ndb-6.3.27 (revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc) (version source revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc) (merge vers: 5.1.37-ndb-6.3.27) (pib:11)
[26 Aug 2009 13:48]
Bugs System
Pushed into 5.1.37-ndb-6.2.19 (revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4) (version source revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4) (merge vers: 5.1.37-ndb-6.2.19) (pib:11)
[27 Aug 2009 16:32]
Bugs System
Pushed into 5.1.35-ndb-7.1.0 (revid:magnus.blaudd@sun.com-20090827163030-6o3kk6r2oua159hr) (version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)
[8 Oct 2009 19:35]
Paul DuBois
The 5.4 fix has been pushed to 5.4.2.