Bug #41710 MySQL 5.1.30 crashes on the latest OpenSolaris 10
Submitted: 23 Dec 2008 11:51 Modified: 17 Jul 2009 3:22
Reporter: Vladimir Kolesnikov
Status: Closed
Category: MySQL Server: General Severity: S2 (Serious)
Version: 5.1.30 OS: Solaris
Assigned to: Alexey Kopytov CPU Architecture: Any

[23 Dec 2008 11:51] Vladimir Kolesnikov
Description:
MySQL 5.1.30 built from sources using the Sun compiler crashes on some tests. There are also result mismatches in tests.

How to repeat:
./configure --prefix=<your-prefix-here> --with-extra-charsets=complex --enable-thread-safe-client --enable-local-infile --enable-assembler --disable-shared
make
make install

./mysql-test-run gives the following summary:

Stopping All Servers
Failed 12/903 tests, 98.67% were successful.

The log files in var/log may give you some hint
of what went wrong.
If you want to report this error, please read first the documentation at
http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html
The servers were restarted 210 times
Spent 824.476 of 1761 seconds executing testcases
mysql-test-run: WARNING: Got errors/warnings while running tests, please examine "/export/home/vkolesn/projects/test/mysql-test/var/log/warnings" for details.

mysql-test-run in default mode: *** Failing the test(s): main.greedy_optimizer main.insert_notembedded main.join main.join_crash main.join_nested main.join_outer main.kill main.limit main.order_by main.subselect main.type_blob main.user_var
mysql-test-run: *** ERROR: there were failing test cases
[23 Dec 2008 11:52] Vladimir Kolesnikov
A note: this is x86 hardware, a VMware virtual machine with 1 GB of RAM.
[23 Dec 2008 23:08] MySQL Verification Team
Latest bzr source on OpenSolaris 2008.11 ended with the below error:

ndb.ndb_index                  [ pass ]           6010
ndb.ndb_index_ordered          [ fail ]

--- /export/home/miguel/dbs/5.1/mysql-test/suite/ndb/r/ndb_index_ordered.result	2008-12-14 05:24:35.299112198 +0300
+++ /export/home/miguel/dbs/5.1/mysql-test/suite/ndb/r/ndb_index_ordered.reject	2008-12-24 01:04:32.850211411 +0300
@@ -643,7 +643,7 @@
 begin;
 select count(*) from t1;
 count(*)
-2
+0
 ALTER TABLE t1 ADD COLUMN c int;
 select a from t1 where b = 2;
 a

mysqltest: Result content mismatch

Aborting: ndb.ndb_index_ordered failed in default mode. 
To continue, re-run with '--force'.
Stopping All Servers
mysql-test-run: WARNING: Forcing kill of process 6721
miguel@skybr.net:~/dbs/5.1/mysql-test$
[24 Dec 2008 8:23] Vladimir Kolesnikov
Miguel,

A couple of questions:
1. Which compiler are you using, and what were the compilation options?
2. I see that your tests stopped after the first error. What happens if you use the "--force" option for mysql-test-run.pl?

Thanks,
[24 Dec 2008 13:00] MySQL Verification Team
I compiled the source with the GCC compiler and ran the test to verify that the errors are independent of the compiler, but unfortunately I did not use the --force option. I will test again with --force.
[29 Dec 2008 15:03] MySQL Verification Team
Thank you for the bug report. Not repeatable on Linux however:

Stopping All Servers
All 1258 tests were successful.
The servers were restarted 360 times
Spent 4307.718 of 7067 seconds executing testcases

miguel@hegel:~/dbs/5.1/mysql-test$
[16 Jan 2009 14:06] Sveta Smirnova
Re-verified with the Sun Studio compiler. All tests listed in the initial description fail in my case too. I compiled with ./configure --with-plugins=max-no-ndb. Tests in the binary packages which we distribute don't fail.
[16 Jan 2009 14:09] Sveta Smirnova
To avoid misunderstanding: verified failure of tests main.greedy_optimizer
main.insert_notembedded main.join main.join_crash main.join_nested main.join_outer
main.kill main.limit main.order_by main.subselect main.type_blob main.user_var with Sun Studio compiler on latest OpenSolaris.
[2 Mar 2009 11:49] Sveta Smirnova
test logs

Attachment: bug41710.log (application/octet-stream, text), 18.55 KiB.

[2 Mar 2009 11:51] Sveta Smirnova
Re-tested with 5.1.31. Problem exists.

Build/compile logs contain no interesting information: MySQL was configured with options ./configure --prefix=PATH, all compiled fine.

Test logs indicate server crashed during tests.
[2 Mar 2009 11:53] Sveta Smirnova
config.log

Attachment: bug41710.config.log.gz (application/x-gzip, text), 35.81 KiB.

[4 May 2009 11:32] Sveta Smirnova
Bug #44538 was marked as duplicate of this one.
[10 Jun 2009 17:18] Kristofer Pettersson
My modest investigation of this bug indicated that this is a problem with the build process. Somehow 32-bit and 64-bit code is mixed, and by explicitly specifying the target, for example by adding CXXFLAGS="-m64" and CFLAGS="-m64", the crash can be avoided.

Example that should work without a crash:
./configure CFLAGS="-Xa -m32 -mt" CXXFLAGS="-m32 -mt" --with-plugins=myisam,innobase
[18 Jun 2009 7:55] Alexey Kopytov
This is the result of a Sun Studio compiler bug. All failing test cases crash with SIGSEGV in prev_record_reads(). Here are the relevant lines of code:

static double 
prev_record_reads(JOIN *join, uint idx, table_map found_ref) 
{
  POSITION *pos_end= join->positions - 1; 
  for (POSITION *pos= join->positions + idx - 1; pos != pos_end; pos--) 
  { 
    if (pos->table->table->map & found_ref) 
      ...
   }
}

At the -O3 optimization level, CC unrolls the 'for' loop but generates broken code: we end up trying to access pos_end->table->table->map, that is, memory outside of the join->positions array, and this is where the crash occurs.

A couple of observations:

1. Lowering the optimization level to -O2 results in correct code being generated (in fact, in this case prev_record_reads() is inlined by the compiler without loop unrolling). Our release binaries are built with -O2, so they are not affected by this problem.

2. Applying the following changes to prev_record_reads() results in correct code even with -O3 (apparently CC gets confused by pos_end pointing outside of the array boundaries):

--- sql_select.cc.old   2009-06-17 23:51:42.016106355 +0400
+++ sql_select.cc       2009-06-17 23:52:01.652673736 +0400
@@ -5412,8 +5412,8 @@ static double
 prev_record_reads(JOIN *join, uint idx, table_map found_ref)
 {
   double found=1.0;
-  POSITION *pos_end= join->positions - 1;
-  for (POSITION *pos= join->positions + idx - 1; pos != pos_end; pos--)
+  POSITION *pos_end= join->positions;
+  for (POSITION *pos= join->positions + idx - 1; pos >= pos_end; pos--)
   {
     if (pos->table->table->map & found_ref)
     {

However, we don't know if there is more code in the server affected by this bug. So the only reliable workaround for the time being is to change the configure defaults so that we use -O2 instead of -O3 when building with Sun Studio. One can also override the defaults by specifying CFLAGS="-O2" CXXFLAGS="-O2" explicitly.

I will file a Sun Studio bug later when I reduce the testcase.
[18 Jun 2009 12:59] Daniel Fischer
Patch looks fine. I agree with the intention and reasoning behind it. I'm assuming the patch was tested thoroughly. 

A bug report should be filed against Sun Studio.
[18 Jun 2009 13:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/76556

2778 Alexey Kopytov	2009-06-18
      Bug #41710: MySQL 5.1.30 crashes on the latest OpenSolaris 10 
       
      Change the default optimization level for Sun Studio to "-O1". 
      This is a workaround for a Sun Studio bug (see bug #41710 
      comments for details): 
       
      1. Use $GCC instead of $ac_cv_prog_gcc to check for gcc, since 
      the first one is the only documented way to do it. 
       
      2. Use $GXX instead of $ac_cv_prog_cxx_g to check for g++, 
      since the latter is set to "yes" when the C++ compiler accepts 
      "-g" which is the case for both g++ and CC. 
       
      3. When building with Sun Studio, set the default values for 
      CFLAGS/CXXFLAGS to "-O1", since unlike GCC, Sun Studio 
      interprets "-O" as "-xO3" (see the manual pages for cc and CC). 
     @ configure.in
        1. Use $GCC instead of $ac_cv_prog_gcc to check for gcc, since 
        the first one is the only documented way to do it. 
         
        2. Use $GXX instead of $ac_cv_prog_cxx_g to check for g++, 
        since the latter is set to "yes" when the C++ compiler accepts 
        "-g" which is the case for both g++ and CC. 
         
        3. When building with Sun Studio, set the default values for 
        CFLAGS/CXXFLAGS to "-O1", since unlike GCC, Sun Studio 
        interprets "-O" as "-xO3" (see the manual pages for cc and CC).
[20 Jun 2009 6:38] Alexey Kopytov
My report about the Sun Studio bug has been identified as a new bug and moved from bugs.sun.com to the internal bug tracker (bug id: 6853081).

I'm duplicating the test case here, since the bug tracker is not visible from the outside:

---------- BEGIN SOURCE ----------
typedef struct {
  unsigned long long map;
} TABLE;

typedef struct {
  TABLE *table;
} JOIN_TAB;

typedef struct {
  double records_read;
  double unused1;
  JOIN_TAB *table;
  void *unused2;
  unsigned long long ref_depend_map;
} POSITION;

typedef struct {
  POSITION *positions;
} JOIN;

static double
prev_record_reads(JOIN *join, unsigned int idx, unsigned long long found_ref)
{
  double found= 1.0;
  POSITION *pos_end= join->positions - 1;

  for (POSITION *pos= join->positions + idx - 1; pos != pos_end; pos--)
  {
    if (pos->table->table->map & found_ref)
    {
      found_ref|= pos->ref_depend_map;
      if (pos->records_read)
        found*= pos->records_read;
    }
  }
  return found;
}

int main()
{
  TABLE t1 = {1}, t2 = {2}, t3 = {4};
  JOIN_TAB jt1 = {&t1}, jt2 = {&t2}, jt3 = {&t3};
  POSITION positions[3] = {{2, 0, &jt1, 0, 0}, {1, 0, &jt2, 0, 1}, {1, 0, &jt3, 0, 1}};
  JOIN join = {positions};

  prev_record_reads(&join, 3, 1);

  return 0;
}
---------- END SOURCE ----------

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Assuming the attached test case is in testcase.c:

$ CC -O3 -g testcase.c
$ gdb ./a.out
(gdb) r
Starting program: /export/home/kaa/src/bug41710/testcase/a.out

  Program received signal SIGSEGV, Segmentation fault.
0x08050f60 in __1cRprev_record_reads6FpnEJOIN_IX_d_ ()

(gdb) disassemble __1cRprev_record_reads6FpnEJOIN_IX_d_
...
0x08050f5d <__1cRprev_record_reads6FpnEJOIN_IX_d_+689>: mov    0xffffffb0(%edx),%eax
0x08050f60 <__1cRprev_record_reads6FpnEJOIN_IX_d_+692>: mov    (%eax),%edi
...
(gdb) i r
eax            0x2      2
ecx            0x0      0
edx            0x8047c10        134511632
ebx            0x1      1
esp            0x8047b60        0x8047b60
ebp            0x8047b88        0x8047b88
esi            0x8047bb1        134511537
edi            0x1      1
eip            0x8050f60        0x8050f60
eflags         0x10287  66183
cs             0x43     67
ss             0x4b     75
ds             0x4b     75
es             0x4b     75
fs             0x0      0
gs             0x1c3    451

(gdb) p/x $edx + 0xffffffb0
$3 = 0x8047bc0

(gdb) p join->positions
$2 = (POSITION *) 0x8047bd0

(gdb) q
The program is running.  Exit anyway? (y or n) y

$ cc -O3 -g testcase.c
$ ./a.out
$
[7 Jul 2009 7:52] Bugs System
Pushed into 5.0.84 (revid:joro@sun.com-20090707074938-ksah1ibn0vs92cem) (version source revid:alexey.kopytov@sun.com-20090626135943-5tl682hvhkrno2og) (merge vers: 5.0.84) (pib:11)
[8 Jul 2009 13:30] Bugs System
Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source revid:alexey.kopytov@sun.com-20090626135952-u5t753l3jt3st14r) (merge vers: 5.1.37) (pib:11)
[9 Jul 2009 7:35] Bugs System
Pushed into 5.0.84 (revid:joro@sun.com-20090707074938-ksah1ibn0vs92cem) (version source revid:alexey.kopytov@sun.com-20090626135943-5tl682hvhkrno2og) (merge vers: 5.0.84) (pib:11)
[9 Jul 2009 7:37] Bugs System
Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source revid:alexey.kopytov@sun.com-20090626135952-u5t753l3jt3st14r) (merge vers: 5.1.37) (pib:11)
[10 Jul 2009 11:20] Bugs System
Pushed into 5.4.4-alpha (revid:anozdrin@bk-internal.mysql.com-20090710111017-bnh2cau84ug1hvei) (version source revid:alexey.kopytov@sun.com-20090626135959-wa4n96u00bw0llt2) (merge vers: 5.4.4-alpha) (pib:11)
[17 Jul 2009 3:22] Paul DuBois
Noted in 5.0.84, 5.1.37, 5.4.4 changelogs.

A workaround for a Sun Studio bug was instituted.
[12 Aug 2009 22:47] Paul DuBois
Noted in 5.4.2 changelog because next 5.4 version will be 5.4.2 and not 5.4.4.
[15 Aug 2009 2:02] Paul DuBois
Ignore previous comment about 5.4.2.
[26 Aug 2009 13:45] Bugs System
Pushed into 5.1.37-ndb-7.0.8 (revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)
[26 Aug 2009 13:46] Bugs System
Pushed into 5.1.37-ndb-6.3.27 (revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc) (version source revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc) (merge vers: 5.1.37-ndb-6.3.27) (pib:11)
[26 Aug 2009 13:48] Bugs System
Pushed into 5.1.37-ndb-6.2.19 (revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4) (version source revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4) (merge vers: 5.1.37-ndb-6.2.19) (pib:11)
[27 Aug 2009 16:32] Bugs System
Pushed into 5.1.35-ndb-7.1.0 (revid:magnus.blaudd@sun.com-20090827163030-6o3kk6r2oua159hr) (version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)
[8 Oct 2009 19:35] Paul DuBois
The 5.4 fix has been pushed to 5.4.2.