Bug #42213 Check for "stack overrun" doesn't work, server crashes
Submitted: 20 Jan 2009 10:42
Modified: 29 Sep 2009 1:15
Reporter: Joerg Bruehe
Status: Closed
Category: MySQL Server: Compiling
Severity: S2 (Serious)
Version: 5.0.72sp1, 5.1.x
OS: HP/UX (11.23 (IA64))
Assigned to: Joerg Bruehe
CPU Architecture: Any

[20 Jan 2009 10:42] Joerg Bruehe
Description:
New bug in this version (5.0.72 worked fine) on this single platform,
optimized build only (debug build works):

=====
sp_notembedded                 [ fail ]

mysqltest: At line 243: query 'call bug10100p(255, @var)' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 1436...

The result from queries just before the failure was:
< snip >
create procedure bug10100pc(level int, lim int)
begin
declare lv int;
declare c cursor for select a from t3;
open c;
if level < lim then
select level;
fetch c into lv;
select lv;
update t3 set a=level+lv;
FLUSH TABLES;
call bug10100pc(level+1, lim);
else
select * from t3;
end if;
close c;
end|
set @@max_sp_recursion_depth=255|
set @var=1|
call bug10100p(255, @var)|

More results from queries before failure can be found in /data/mysqldev/tmp-200901121119-5.0.72sp1-23990/hpux11.23-ia64/test/mysql-test/var/log/sp_notembedded.log

Stopping All Servers
Restoring snapshot of databases
Saving core
Resuming Tests
=====

=====
subselect                      [ fail ]

mysqltest: At line 3057: query '$start $end' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 0...

The result from queries just before the failure was:
< snip >
5
4
3
2
1
26
25
24
23
22
21
20
19
18
17
16
15
14
13
12

More results from queries before failure can be found in /PATH/mysql-test/var/log/subselect.log
=====

This part of the test has not changed since 5.0.72, and none of the code backports should be relevant.
I suspect it is a change in compiler flags:

5.0.72 used
CC=cc CFLAGS="-g +O2 +DD64 +DSitanium2 -mt -AC99 -DPTHREAD_COMPAT_MODE -O1" CXX=aCC CXXFLAGS="-g +O2 +DD64 +DSitanium2 -mt -DPTHREAD_COMPAT_MODE -O1 -Aa" LDFLAGS=+DD64  ./configure --prefix=/usr/local/mysql --localstatedir=/usr/local/mysql/data --libexecdir=/usr/local/mysql/bin --with-comment="MySQL Enterprise Server (Commercial)" --with-server-suffix="-enterprise" --enable-thread-safe-client --enable-local-infile --with-pic --with-client-ldflags="-static" --with-mysqld-ldflags="-static" --with-zlib-dir=bundled --with-big-tables --with-libedit --with-innodb --without-ndbcluster --with-archive-storage-engine --with-blackhole-storage-engine --with-csv-storage-engine --without-example-storage-engine --with-federated-storage-engine --with-extra-charsets=complex

5.0.72sp1 uses
CC=cc CFLAGS="-g +O2 +DD64 +DSitanium2 -mt -AC99" CPPFLAGS=-DPTHREAD_COMPAT_MODE CXX=aCC CXXFLAGS="-g +O2 +DD64 +DSitanium2 -mt -Aa" LDFLAGS=+DD64  ./configure --prefix=/usr/local/mysql --localstatedir=/usr/local/mysql/data --libexecdir=/usr/local/mysql/bin --with-comment="MySQL Enterprise Server (Commercial)" --with-server-suffix="-enterprise" --enable-thread-safe-client --enable-local-infile --with-pic --with-client-ldflags="-static" --with-mysqld-ldflags="-static" --with-zlib-dir=bundled --with-big-tables --with-libedit --with-innodb --without-ndbcluster --with-archive-storage-engine --with-blackhole-storage-engine --with-csv-storage-engine --without-example-storage-engine --with-federated-storage-engine --with-extra-charsets=complex

The difference is that 5.0.72 had "+O2 ... -O1" where 5.0.72sp1 has just "+O2".

How to repeat:
Build on that platform, and run test "subselect".

Suggested fix:
Change the compiler flags back to the value used in 5.0.72 and rebuild.

Add comments to the test that show up in the result, to save the reader from guessing wrongly.
[20 Jan 2009 21:18] Joerg Bruehe
The flags were changed in the scripts we use for release builds.
For my check, I partially reverted this change; that check is still running.

I will grab the bug if that reversal solves the issue, but for now I must prepare for the case it might be a code problem.

None of these scripts are given to users, so in the optimistic case (flag change caused the problem) they should not be affected.

HP-UX does not seem to be a high-frequency platform for us (just guessing, based on support input and custom build requests), and the number of users building from source on HP-UX might be extremely low.
OTOH, a "stack overrun" can easily be produced by overly complicated SQL statements, recursive functions, or other space-consuming actions, so it is essential that this is checked reliably at run time.
[21 Jan 2009 9:19] Joerg Bruehe
Ensuring that the CFLAGS and CXXFLAGS contain "+O2 ... -O1" and then running a rebuild has made these tests pass in the first 2 (of 4) builds.
I have not seen any negative effects yet, so this seems to be the proper fix.

This fix is now pushed into our build tools.

No need to document it:
1) The affected file is used only internally.
2) No affected binary has been published, as we discovered the problem in the build process.
[23 Jul 2009 11:06] Joerg Bruehe
Created work trees to fix it, from 5.0 up.

Progress will partly depend on the accessibility of the machines in Uppsala, where we are currently suffering power problems.
[27 Jul 2009 15:01] Joerg Bruehe
Test on HP-UX 11.23 (IA64) shows:

If using optimization level "+O2" when compiling the test program used in "configure", the compiler creates a binary which doesn't use recursion, and then the stack growth direction is wrongly reported as "to higher addresses".

If using only "+O1", or if preventing such optimization using a pragma, this does not happen, and stack growth direction is reported as "to lower addresses".

The easiest and safest fix would be to add the pragma to the test program, so I am now checking whether that causes any harm on any other host (especially IBM, whose compilers are a bit picky).
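
For readers who want to see the mechanism, here is a minimal sketch of a recursion-based stack direction probe. It is not the test program from "configure"; the names probe, probe_ptr and stack_direction are hypothetical. Instead of the compiler-specific pragma mentioned above, it assumes a portable workaround: calling through a volatile function pointer, so that the optimizer cannot inline the call or turn the recursion into a loop.

```c
#include <stddef.h>
#include <stdint.h>

static int probe(volatile char *outer);

/* Assumption for this sketch: a volatile function pointer forces the
   compiler to load the pointer and emit a real call at run time. */
static int (*volatile probe_ptr)(volatile char *) = probe;

/* First level (outer == NULL): recurse once, passing the address of a
   local variable. Second level: compare its own local with the
   caller's local. */
static int probe(volatile char *outer)
{
    volatile char here = 0;

    if (outer == NULL)
        return probe_ptr(&here);

    /* Compare as integers; relational comparison of pointers to
       unrelated objects is not defined by the C standard. */
    return ((uintptr_t)&here < (uintptr_t)outer) ? -1 : 1;
}

/* Returns -1 if the stack grows to lower addresses, +1 if it grows
   to higher addresses. */
int stack_direction(void)
{
    return probe_ptr(NULL);
}
```

If the compiler removes the recursion, both locals can end up in the same frame, and the comparison then reflects the layout of that frame rather than the direction of stack growth.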
[30 Jul 2009 15:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/79672

2780 Joerg Bruehe	2009-07-30 [merge]
      Merge the fix for bug#42213 into 5.0-build.
[30 Jul 2009 15:26] Joerg Bruehe
Received approval via IRC.
[30 Jul 2009 15:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/79684

2961 Joerg Bruehe	2009-07-30 [merge]
      Merge the fix for bug#42213 up into 5.1-build:
         Check for "stack overrun" doesn't work, server crashes
[30 Jul 2009 17:31] Joerg Bruehe
Patch is queued to the build team trees for 5.0 and 5.1.

Upmerge to higher versions will follow soon.
[4 Aug 2009 13:31] Bugs System
Pushed into 5.0.85 (revid:joerg@mysql.com-20090730150354-h0c0cob2212sjs30) (version source revid:joerg@mysql.com-20090730150354-h0c0cob2212sjs30) (merge vers: 5.0.85) (pib:11)
[4 Aug 2009 13:32] Bugs System
Pushed into 5.1.38 (revid:joerg@mysql.com-20090730152409-ko4up2l6jceuszgf) (version source revid:joerg@mysql.com-20090730152409-ko4up2l6jceuszgf) (merge vers: 5.1.38) (pib:11)
[7 Aug 2009 12:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/80357

2857 Joerg Bruehe	2009-08-07 [merge]
      Upmerge from 5.1 to 5.4 ("azalea"):
      10 merge changesets (no code change) plus
      the fix for bug#42213.
[10 Aug 2009 22:35] Paul DuBois
Noted in 5.0.85, 5.1.38, 5.4.4 changelogs.

A test for stack growth failed on some platforms, leading to server
crashes.
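
To illustrate why a wrongly detected stack growth direction can defeat an overrun check, here is a minimal sketch. It is not the server's actual code; the function names, parameters and the margin value are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical helper: bytes of stack consumed between the address
   recorded at thread start and the current address.
   direction is -1 (stack grows down) or +1 (stack grows up). */
intptr_t stack_used(intptr_t thread_start, intptr_t current, int direction)
{
    return direction < 0 ? thread_start - current
                         : current - thread_start;
}

/* Hypothetical check: returns 1 if fewer than `margin` bytes of the
   thread's stack are left, 0 otherwise. */
int stack_overrun(intptr_t thread_start, intptr_t current,
                  int direction, intptr_t stack_size, intptr_t margin)
{
    return stack_used(thread_start, current, direction)
           >= stack_size - margin;
}
```

With a stack that really grows down, stack_overrun(1000000, 800000, -1, 262144, 65536) reports an overrun, since 200000 bytes are in use. If the direction was wrongly detected as +1, the same call computes a negative usage and never triggers, so deep recursion runs past the end of the stack.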
[24 Aug 2009 13:54] Bugs System
Pushed into 5.4.4-alpha (revid:alik@sun.com-20090824135126-2rngffvth14a8bpj) (version source revid:joerg@mysql.com-20090805185305-g7obi1157h314xk1) (merge vers: 5.4.4-alpha) (pib:11)
[29 Sep 2009 1:15] Paul DuBois
Noted in 5.0.87, 5.1.40, 5.4.3 changelogs.
[1 Oct 2009 5:59] Bugs System
Pushed into 5.1.39-ndb-6.3.28 (revid:jonas@mysql.com-20091001055605-ap2kiaarr7p40mmv) (version source revid:jonas@mysql.com-20091001055605-ap2kiaarr7p40mmv) (merge vers: 5.1.39-ndb-6.3.28) (pib:11)
[1 Oct 2009 7:25] Bugs System
Pushed into 5.1.39-ndb-7.0.9 (revid:jonas@mysql.com-20091001072547-kv17uu06hfjhgjay) (version source revid:jonas@mysql.com-20091001071652-irejtnumzbpsbgk2) (merge vers: 5.1.39-ndb-7.0.9) (pib:11)
[1 Oct 2009 13:25] Bugs System
Pushed into 5.1.39-ndb-7.1.0 (revid:jonas@mysql.com-20091001123013-g9ob2tsyctpw6zs0) (version source revid:jonas@mysql.com-20091001123013-g9ob2tsyctpw6zs0) (merge vers: 5.1.39-ndb-7.1.0) (pib:11)
[5 Oct 2009 10:50] Bugs System
Pushed into 5.1.39-ndb-6.2.19 (revid:jonas@mysql.com-20091005103850-dwij2dojwpvf5hi6) (version source revid:jonas@mysql.com-20090930185117-bhud4ek1y0hsj1nv) (merge vers: 5.1.39-ndb-6.2.19) (pib:11)