Bug #25526 mysqld got signal 11 and crashes (0x746f6f72)
Submitted: 10 Jan 2007 16:50 Modified: 20 Mar 2007 16:03
Reporter: Matthias Albert Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:5.0.27 OS:Linux (SLES 9 SP3 (x86_64))
Assigned to: Assigned Account CPU Architecture:Any

[10 Jan 2007 16:50] Matthias Albert
Description:
I have the following problem an several (>20) absolut identic machines all running the same os, same patch level, same hardware, same mysql version and same application.

---begin-mysql-log---
...
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=33554432
read_buffer_size=2093056
max_used_connections=71
max_connections=1000
threads_connected=24
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 10268760 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x38638b0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x44283fd8, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x746f6f72
New value of fp=0x38638b0 failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do 
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x13f310a0 = UPDATE DSGV_Entries SET localHash = 53648072110 WHERE id = 8
thd->thread_id=15742
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.
mysqld: my_new.cc:51: int __cxa_pure_virtual(): Assertion `"Pure virtual method called." == "Aborted"' failed.
Fatal signal 6 while backtracing

Number of processes running now: 0
070102 09:48:22  mysqld restarted
070102  9:48:23  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
...
---end-mysql-log---

---start-mysql-log---
...
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 2372317762; in additional pool allocated 16759808
Buffer pool size   131072
Free buffers       63685
Database pages     64243
Modified db pages  602
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 26312, created 37931, written 24952434
0.00 reads/s, 0.00 creates/s, 14.37 writes/s
Buffer pool hit rate 1000 / 1000
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 11367, id 1140881760, state: sleeping
Number of rows inserted 23512563, updated 15501288, deleted 22108620, read 388854341
11.69 inserts/s, 14.00 updates/s, 11.69 deletes/s, 524.78 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=33554432
read_buffer_size=2093056
max_used_connections=68
max_connections=1000
threads_connected=30
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 10268760 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x2aaab4b9bb60
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x44ca8fd8, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x746f6f72
Stack trace seems successful - bottom reached
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do 
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x2aab5ca3d150  is invalid pointer
thd->thread_id=19657
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.

Number of processes running now: 0
070105 07:11:14  mysqld restarted
070105  7:11:15  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
070105  7:11:15  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 46 3952089023.
InnoDB: Doing recovery: scanned up to log sequence number 46 3952867792
...
---end-mysql-log---

# Some informations:
#-------------------

Server version          5.0.27-standard
Protocol version        10
Connection              Localhost via UNIX socket
UNIX socket             /data/mysql/mysql.sock
Uptime:                 18 hours 23 min 30 sec

Threads: 29  Questions: 14591588  Slow queries: 3  Opens: 149  Flush tables: 1  Open tables: 113  Queries per second avg: 220.383
---
uname -a
Linux sli0000731 2.6.17.7 #1 SMP Thu Aug 3 14:11:16 CEST 2006 x86_64 x86_64 x86_64 GNU/Linux
---
cat /etc/SuSE-release 
SUSE LINUX Enterprise Server 9 (x86_64)
VERSION = 9
PATCHLEVEL = 3
---
rpm -qa |grep -i glibc
glibc-locale-2.3.3-98.73
glibc-2.3.3-98.73
---
free -m
             total       used       free     shared    buffers     cached
Mem:         16000      15936         64          0        158       7528
-/+ buffers/cache:       8249       7751
Swap:          976          0        976
---
df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3              67G  4.9G   59G   8% /
tmpfs                 7.9G  8.0K  7.9G   1% /dev/shm
/dev/sda1              93M   60M   28M  69% /boot
/dev/sdb1             6.3T   44G  6.3T   1% /data
---

I cannot reproduce it exactly but as I wrote, it happens on over 20 machines abnormal 3-4 days, 12 hours and so on.

Another thing what I have tried is this (which is descriped here
http://dev.mysql.com/doc/refman/5.0/en/using-stack-trace.html)

resolve_stack_dump -s mysqld.sym -n mysqld.stack
0x746f6f72 _end + 1941432362

my.cnf will be attached.

If you need more information, let me know it :-).

Cheers from Germany,

 Matthias

resolved_dump...

How to repeat:
The bad news are: At the moment I have no idea how I can reproduce it.
The good news: I have the neeed machines and environment for the tests :)
[10 Jan 2007 16:52] Matthias Albert
my.cnf

Attachment: my.cnf (application/octet-stream, text), 2.12 KiB.

[10 Jan 2007 17:57] Heikki Tuuri
Matthias,

I am afraid that the x86_64 binary does not print a sensible stack trace. Can you compile a debug version of mysqld yourself, and run it inside gdb?

When it crashes, do:

(gdb) bt full

That usually tells where the bug is.

Regards,

Heikki
[16 Jan 2007 9:45] Matthias Albert
Hello Heikki,

thx for your answer and sorry for my delay (I was in holidays for 3 days).

I've compiled a debug version of mysql (from this source rpm: MySQL-5.0.27-0.glibc23.src.rpm and enabled debugging (--with-debug).

After installation of these new packages -> the problem seems to be gone :-). Probably it is a build problem of mysql.

I've build the new package on a machine which is similiar to my productive mysql machines.

So perhaps it is a, SLES 9 ServicePack 1-2-3 problem or so or anything else which isn't compatible with our hardware/os. However, at the moment, I've no machine (>20) which has crashed over the last 4 days.

And now? Closing this bug? Debugging the build process?

Cheers,

 Matthias
[16 Jan 2007 13:58] Heikki Tuuri
Matthias,

the bug might be something that is masked in the debug build. If you build a normal production binary, does it crash?

Regards,

Heikki
[17 Jan 2007 8:54] Matthias Albert
Hello Heikki,

this night I had two more mysql crashes WITH the debug version.

But the output of "resolve_stack_dump" isn't really helpful.

Host A:
resolve_stack_dump -s /usr/lib64/mysql/mysqld.sym -n /tmp/my.stack
0x746f6f72 _end + 1941324714

Host B:
resolve_stack_dump -s /usr/lib64/mysql/mysqld.sym  -n /tmp/my.stack
0x76f6f72 _end + 112608170

I will attach the mysql-error.log file

Any idea?

Cheers,

 Matthias
[17 Jan 2007 9:46] Matthias Albert
Today I will start 2-3 mysql daemons inside gdb so we can get a qualified dump if it crashse the next time.
[17 Jan 2007 13:28] Heikki Tuuri
Matthias,

yes, when it crashes inside gdb, do:

(gdb)bt full

It may help to print also other 'interesting' thread stack traces.

With:

(gdb) info threads

you see the threads. Then:

(gdb) thread 15

(gdb) bt full

prints the stack trace of thread 15.

You can also print the values of some 'interesting' variables  in the crash. With:

(gdb) frame 10

you can go to stack frame 10 of the current thread, and print some values:

(gdb) print *thd

etc. I cannot say beforehand what values are 'interesting' :).

Regards,

Heikki
[18 Jan 2007 9:24] Matthias Albert
Hello Heikki,

I have good news :). Now I have some backtraces from my mysql servers. I hope you can use this to identify the problem. If you need some more information -> let me know it.

Cheers,

 Matthias

---snip---
Host A:

Program received signal SIGUSR1, User defined signal 1.
[Switching to Thread 1154328928 (LWP 15674)]
0x00002adafc97638f in __read_nocancel () from /lib64/tls/libpthread.so.0
(gdb) bt full
#0  0x00002adafc97638f in __read_nocancel () from /lib64/tls/libpthread.so.0
No symbol table info available.
#1  0x00000000007a82e0 in vio_read ()
No symbol table info available.
#2  0x00000000005421e7 in net_realloc ()
No symbol table info available.
#3  0x0000000000542847 in my_net_read ()
No symbol table info available.
#4  0x000000000056d3a5 in do_command ()
No symbol table info available.
#5  0x000000000056de54 in handle_one_connection ()
No symbol table info available.
#6  0x00002adafc971b8f in start_thread () from /lib64/tls/libpthread.so.0
No symbol table info available.
#7  0x00002adafcfe44b3 in clone () from /lib64/tls/libc.so.6
No symbol table info available.
#8  0x0000000000000000 in ?? ()
No symbol table info available.
---snap---

---snip--- 
Host B:

Program received signal SIGUSR1, User defined signal 1.
[Switching to Thread 1147705696 (LWP 10850)]
0x00002b76f33a238f in __read_nocancel () from /lib64/tls/libpthread.so.0
(gdb) bt full
#0  0x00002b76f33a238f in __read_nocancel () from /lib64/tls/libpthread.so.0
No symbol table info available.
#1  0x00000000007a82e0 in vio_read ()
No symbol table info available.
#2  0x00000000005421e7 in net_realloc ()
No symbol table info available.
#3  0x0000000000542847 in my_net_read ()
No symbol table info available.
#4  0x000000000056d3a5 in do_command ()
No symbol table info available.
#5  0x000000000056de54 in handle_one_connection ()
No symbol table info available.
#6  0x00002b76f339db8f in start_thread () from /lib64/tls/libpthread.so.0
No symbol table info available.
#7  0x00002b76f3a104b3 in clone () from /lib64/tls/libc.so.6
No symbol table info available.
#8  0x0000000000000000 in ?? ()
No symbol table info available.
---snap---

Host C:
---snip---
Program received signal SIGUSR1, User defined signal 1.
[Switching to Thread 1154328928 (LWP 15674)]
0x00002adafc97638f in __read_nocancel () from /lib64/tls/libpthread.so.0
(gdb) bt full
#0  0x00002adafc97638f in __read_nocancel () from /lib64/tls/libpthread.so.0
No symbol table info available.
#1  0x00000000007a82e0 in vio_read ()
No symbol table info available.
#2  0x00000000005421e7 in net_realloc ()
No symbol table info available.
#3  0x0000000000542847 in my_net_read ()
No symbol table info available.
#4  0x000000000056d3a5 in do_command ()
No symbol table info available.
#5  0x000000000056de54 in handle_one_connection ()
No symbol table info available.
#6  0x00002adafc971b8f in start_thread () from /lib64/tls/libpthread.so.0
No symbol table info available.
#7  0x00002adafcfe44b3 in clone () from /lib64/tls/libc.so.6
No symbol table info available.
#8  0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) n
Not confirmed.
---snap---
[18 Jan 2007 15:37] Heikki Tuuri
Matthias,

sorry, I should have given you this link:

http://dev.mysql.com/doc/refman/5.0/en/using-gdb-on-mysqld.html

those SIGUSR1 are completely normal events.

"
If you are using gdb 4.17.x or above on Linux, you should install a .gdb file, with the following information, in your current directory: 
set print sevenbit off
handle SIGUSR1 nostop noprint
handle SIGUSR2 nostop noprint
handle SIGWAITING nostop noprint
handle SIGLWP nostop noprint
handle SIGPIPE nostop
handle SIGALRM nostop
handle SIGHUP nostop
handle SIGTERM nostop noprint
"

You need the .gdb file to ignore those unnecessary stops.

You should also compile mysqld with:

CFLAGS="-O3 -g" CXXFLAGS="-O3 -g" ./configure

so that you get the debug info into the mysqld binary, and gdb can show a stack trace with rich information.

Regards,

Heikki
[30 Jan 2007 12:11] Matthias Albert
Hello Heikki, 
today we have catched the mysql with gdb in error-condition.

When i follow your guidance to debug the process, i have some ambiguity here and there.

1. Calling info threads results hundreds of thread-references. 
How can i identify the thread which is relating to our problem? 

2. Calling "where" and "bt full" results the same stack as descriptive as bug-entry at 18 Jan 10:24 from Matthias

3. The mysqld-bin doesn't contain the symbol-table. Instead of, the symbols are listed in /usr/lib64/mysql/mysqld.sym. 
How can i load the symbols table into gdb? "symbol-file" works only on binaries.

many thanks in advance. 
Stefan
[13 Feb 2007 17:14] Heikki Tuuri
Matthias,

I am sorry for the delay in replying.

When you do the info threads command, you will find various threads in various functions. You may find that the 'interesting' ones are stuck in a different function.

Usually, the 'interesting' threads have a deep call stack: 15 frames or more.

The brute force approach is to print all thread stacks.

Regards,

Heikki
[14 Mar 2007 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[15 Mar 2007 19:23] Matthias Albert
GDB backtrace full

Attachment: gdb_mysql_bt_full.txt (text/plain), 55.89 KiB.

[15 Mar 2007 19:38] Matthias Albert
Hello Heikki, 
a long time later we had a crashed mysql again, at last.

I have attached the backtrace to the bug. 
The mysql-process is still in errror-condition with attached gdb.
If required, you can get more traces. 
Just send me some instructions, how to attent upon gdb to get the needed informations. 

Many thanks in advance, 
Stefan
[20 Mar 2007 15:26] MySQL Verification Team
heikki, i wonder if this is related to the new bug #27294
note the crash point is the same, and both cases have srv_locks_unsafe_for_binlog=1.
[20 Mar 2007 15:41] Heikki Tuuri
Shane,

indeed, this looks the same crash.

The bug is that there is no guarantee that prebuilt->trx is a sensible pointer in this function!

                    prebuilt->trx->isolation_level != TRX_ISO_SERIALIZABLE &&
Regards,

Heikki
[20 Mar 2007 16:03] Heikki Tuuri
Duplicate of http://bugs.mysql.com/bug.php?id=27294