MySQL Bugs: #15922: Test case for bug#10100 fails with wrong error on Solaris 8 Sparc 64 bit

Bug #15922	Test case for bug#10100 fails with wrong error on Solaris 8 Sparc 64 bit
Submitted:	22 Dec 2005 2:24	Modified:	12 Oct 2007 18:52
Reporter:	Kent Boortz
Status:	Won't fix
Category:	MySQL Server: Stored Routines	Severity:	S2 (Serious)
Version:	5.0.17 5.0.18-pre 5.1.4	OS:	Solaris (Solaris 8 64-bit)
Assigned to:		CPU Architecture:	Any

Description:
Test case 'sp' fails with

  At line 4738: query 'call bug10100p(255, @var)' failed with wrong errno 2013 instead of 1436...

How to repeat:
Build and run the test case on Solaris 8 64-bit

I got the same error when I compiled it on Solaris 10 x64 (amd64).

Kent, 

Which compiler did you use to build 5.0.18 when the "sp" test failed with the reported error, and what is the status of this bug in later versions of MySQL (after 5.0.18) on the Solaris platform?

Thanks for your information!

In addition, it would be great to know what the 2013 and 1436
error messages are, and why the call is supposed to fail with
exactly 1436.

Here is the source snippet which causes the error:

set @@max_sp_recursion_depth=255|
set @var=1|
#disable log because error about stack overrun contains numbers which
#depend on a system
-- disable_result_log
-- error ER_STACK_OVERRUN_NEED_MORE
call bug10100p(255, @var)|
-- error ER_STACK_OVERRUN_NEED_MORE
call bug10100pt(1,255)|
-- error ER_STACK_OVERRUN_NEED_MORE
call bug10100pv(1,255)|
-- error ER_STACK_OVERRUN_NEED_MORE
call bug10100pd(1,255)|
-- error ER_STACK_OVERRUN_NEED_MORE
call bug10100pc(1,255)|
-- enable_result_log

If I change the value

call bug10100p(255, @var)|

to, say,

call bug10100p(5, @var)|

the call succeeds. The harness does not accept this either, because it
expects the call to fail with exactly errno 1436. Why?

After further investigation, we found the reason for the "mysql" crash in this "sp" test case, as described below:

The SQL query in the sp test is supposed to overflow the small stack
allocated for the thread, and the test expects the corresponding
STACK_OVERRUN error. Instead, mysqld dumps core. Looking into how this
is supposed to work, I found that MySQL uses the following function to
check whether there is still enough stack left:

bool check_stack_overrun(THD *thd, long margin,
                         char *buf __attribute__((unused)))
{
  long stack_used;
  DBUG_ASSERT(thd == current_thd);
  if ((stack_used=used_stack(thd->thread_stack,(char*) &stack_used)) >=
      (long) (thread_stack - margin))
  {
    sprintf(errbuff[0],ER(ER_STACK_OVERRUN_NEED_MORE),
            stack_used,thread_stack,margin);
    my_message(ER_STACK_OVERRUN_NEED_MORE,errbuff[0],MYF(0));
    thd->fatal_error();
    return 1;
  }
#ifndef DBUG_OFF
  max_stack_used= max(max_stack_used, stack_used);
#endif
  return 0;
}

where the used_stack() macro is defined above this function as:

#if STACK_DIRECTION < 0
#define used_stack(A,B) (long) (A - B)
#else
#define used_stack(A,B) (long) (B - A)
#endif
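
To make the effect of a wrong direction concrete, here is a minimal sketch that repeats the same arithmetic with the direction passed at run time. The helper names, the simulated stack buffer, and the sizes are all made up for illustration and are not taken from the MySQL source. With the direction set to 1 on a stack that really grows downward, the computed usage is negative, so the comparison in check_stack_overrun() can never succeed.

```c
#include <assert.h>

/* Same arithmetic as the used_stack() macro, but with the direction passed
   at run time so both settings can be compared. Hypothetical helper. */
static long used_stack_for(int stack_direction,
                           const char *stack_base, const char *current)
{
  return stack_direction < 0 ? (long) (stack_base - current)
                             : (long) (current - stack_base);
}

/* Mirrors the comparison in check_stack_overrun(): nonzero means the
   overrun would be reported. */
static int overrun_detected(int stack_direction, const char *stack_base,
                            const char *current, long thread_stack,
                            long margin)
{
  return used_stack_for(stack_direction, stack_base, current) >=
         thread_stack - margin;
}

/* Simulated thread stack. The sizes are illustrative, not MySQL defaults. */
enum { FAKE_STACK_SIZE = 65536, FAKE_MARGIN = 8192, FAKE_USED = 60000 };
static char fake_stack[FAKE_STACK_SIZE];

/* Runs the check for a thread that has consumed FAKE_USED bytes of a stack
   that really grows toward lower addresses. */
static int simulate(int configured_direction)
{
  const char *base = fake_stack + FAKE_STACK_SIZE;  /* where the stack starts */
  const char *current = base - FAKE_USED;           /* current stack position */
  assert(current >= fake_stack);
  return overrun_detected(configured_direction, base, current,
                          FAKE_STACK_SIZE, FAKE_MARGIN);
}
```

In this sketch simulate(-1) reports the overrun while simulate(1) does not, which matches the observed behavior: the check stays silent and the thread runs off the end of its stack.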

As you can see, used_stack() depends on the value of STACK_DIRECTION,
which the ./configure script writes to config.h. At optimization level
-xO4 and above, with Sun's compiler, STACK_DIRECTION is incorrectly set
to 1 (greater than zero), meaning that the stack grows toward higher
addresses, which is wrong. I looked for the source of the error and
found the following test, which ./configure uses to determine the
direction of stack growth:

 int find_stack_direction ()
 {
   static char *addr = 0;
   auto char dummy;
   if (addr == 0)
     {
       addr = &dummy;
       return find_stack_direction ();
     }
   else
     return (&dummy > addr) ? 1 : -1;
 }

As you can see, the test compares the address of a local variable in
the first call of the function with the address of the same variable in
the recursive call. In principle this gives the correct answer, but at
higher optimization levels our compiler decides to inline the call to
find_stack_direction(), since that should improve performance.

It is strange to me that GCC doesn't inline this call, but I think it probably will some day, and then it will run into the same problem: inlining causes both dummy variables (one for the first call and one for the recursive call) to be allocated in the same stack frame. In that situation no one can be sure how the two variables are placed relative to each other. So, to resolve this problem completely, the way ./configure determines the stack growth direction needs to change.
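
One possible direction, as a sketch only: split the probe into two functions and call the inner one through a volatile function pointer, which makes it very unlikely that the compiler will merge the two frames. The names compare_frames and compare_frames_ptr are invented here. This is not the test shipped by autoconf or by any project mentioned in this report, and it has not been checked against the Sun or HP compilers.

```c
#include <stdint.h>

static int compare_frames(volatile char *outer);

/* The pointer is volatile, so its value has to be loaded at run time and
   the call target is not known to the optimizer. */
static int (*volatile compare_frames_ptr)(volatile char *) = compare_frames;

/* Returns 1 if this frame lies above the caller's local, -1 otherwise. */
static int compare_frames(volatile char *outer)
{
  volatile char inner = 0;
  /* Compare as integers: relational comparison of pointers to unrelated
     objects is undefined behavior in C. */
  return ((uintptr_t) &inner > (uintptr_t) outer) ? 1 : -1;
}

/* Returns 1 if the stack appears to grow upward, -1 if downward. */
int find_stack_direction(void)
{
  volatile char outer = 0;
  return compare_frames_ptr(&outer);
}
```

The result is still only a heuristic, since nothing in the C standard ties the addresses of locals to call depth.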

This is also an issue on hpux11.23/IA64 with aCC and optimization at +O2.

In the Solaris case a

  #pragma no_inline(find_stack_direction)

after the function declaration, before main, solves the
problem. But it is better to find a more permanent solution.

The test used in Ruby configure.in, on the page
http://www.opensource.apple.com/darwinsource/Current/ruby-22.2.2/ruby/configure.in
is confirmed to work for Solaris 8 "Sun C 5.6 2004/07/15"
with or without the "volatile" declaration.

Another solution is to mix static knowledge with
a test, like in "libsigsegv"
http://cl-debian.alioth.debian.org/repository/pvaneynd/libsigsegv-upstream/configure.ac
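
As a rough illustration of mixing static knowledge with a test (this is not the libsigsegv code, and the list of architecture macros is an incomplete assumption), the direction can be hard-coded where it is well known and probed only as a fallback. KNOWN_STACK_DIRECTION and stack_direction() are names made up for this sketch.

```c
/* Architectures with a well-known stack direction. Illustrative and
   incomplete list. */
#if defined(__hppa) || defined(__hppa__)
#  define KNOWN_STACK_DIRECTION 1       /* PA-RISC: stack grows upward */
#elif defined(__sparc) || defined(__sparc__) || defined(__i386__) || \
      defined(__x86_64__) || defined(__amd64__)
#  define KNOWN_STACK_DIRECTION (-1)    /* stack grows downward */
#else
#  define KNOWN_STACK_DIRECTION 0       /* unknown */
#endif

/* Returns the static answer when there is one. Otherwise asks the supplied
   run-time probe, or returns 0 if no probe is given. */
int stack_direction(int (*runtime_probe)(void))
{
  if (KNOWN_STACK_DIRECTION != 0)
    return KNOWN_STACK_DIRECTION;
  return runtime_probe != 0 ? runtime_probe() : 0;
}
```

On the listed architectures the optimizer no longer has any influence on the answer, because no probe is run at all.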

This is no longer relevant.