MySQL Bugs: #9460: Mysql 4.1.10 on Linux 2.4.9-e.59smp crash

Bug #9460	Mysql 4.1.10 on Linux 2.4.9-e.59smp crash
Submitted:	29 Mar 2005 17:14	Modified:	27 Sep 2005 11:25
Reporter:	Tate Baumrucker	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server	Severity:	S2 (Serious)
Version:	Version: '4.1.10a-standard-log'	OS:	Linux (RH Linux 2.4.9-e.59smp AS 2.1)
Assigned to:		CPU Architecture:	Any

Description:
Having stability issues w/ Mysql 4.1.10 (innodb) on RH Linux 
2.4.9-e.59smp running AS 2.1.  DB crashes randomly during execution of 
various SQL code (all using user variables and temporarly tables and 
mostly simple selects, inserts, joins, etc.).  Interestingly, same problem was occurring w/ MyISAM tables/engine (changed to innodb last week to see if the problem would go away). 

Using standard binaries Version: '4.1.10a-standard-log'

Database crashes about 3-5 times a day and usually recovers, but 
sometimes dies with:

[ERROR] Can't start server: Bind on TCP/IP port: Address already in 
use [ERROR] Do you already have another mysqld server running on port: 3306 ?
[ERROR] Aborting

relevant log is: 

050328 15:39:35  mysqld started
050328 15:39:36  InnoDB: Started; log sequence number 15 2379024139
/data/mysql/bin/mysqld: ready for connections.
Version: '4.1.10a-standard-log'  socket: '/tmp/mysql.sock'  port: 3306 
MySQL Community Edition - Standard (GPL) mysqld got signal 11; This 
could be because you hit a bug. It is also possible that this binary 
or one of the libraries it was linked against is corrupt, improperly 
built, or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help 
diagnose the problem, but since we have already crashed, something is 
definitely wrong and this may fail.

key_buffer_size=402653184
read_buffer_size=2093056
max_used_connections=6
max_connections=100
threads_connected=3
It is possible that mysqld could use up to key_buffer_size + 
(read_buffer_size + sort_buffer_size)*max_connections =
802415 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x96225e90
Attempting backtrace. You can use the following information to find 
out where mysqld died. If you see no messages after this, something 
went terribly wrong...
Cannot determine thread, fp=0xbfe7f458, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x808b193
0x82debe8
0x80dbbf8
0x80b3caf
0x8081c2f
0x808ad8b
0x8098c56
0x82dc39c
0x8305d2a
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html 
and follow instructions on how to resolve the stack trace. Resolved 
stack trace is much more helpful in diagnosing the problem, so please 
do resolve it Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at (nil)  is invalid pointer
thd->thread_id=3549
The manual page at http://www.mysql.com/doc/en/Crashing.html contains 
information that should help you find out what is causing the crash.

Number of processes running now: 0
050328 17:22:04  mysqld restarted
050328 17:22:04  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the 
doublewrite
InnoDB: buffer...
050328 17:22:04  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 15 2433246331.
InnoDB: Doing recovery: scanned up to log sequence number 15 
2435610788
050328 17:22:04  InnoDB: Starting an apply batch of log records to the 
database...
InnoDB: Progress in percents: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 
41 42 43 44
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 
91 92 93 94
95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: Last MySQL binlog file position 0 10782286, file name
./performdb-1-bin.000111
050328 17:22:06  InnoDB: Flushing modified pages from the buffer pool...
050328 17:22:06  InnoDB: Started; log sequence number 15 2435610788
/data/mysql/bin/mysqld: ready for connections.
Version: '4.1.10a-standard-log'  socket: '/tmp/mysql.sock'  port: 3306 
MySQL Community Edition - Standard (GPL)

And the stack trace:

0x808b193 handle_segfault + 423
0x82debe8 pthread_sighandler + 184
0x80dbbf8 write__9MYSQL_LOGP9Log_event + 1564
0x80b3caf close_temporary_tables__FP3THD + 247
0x8081c2f cleanup__3THD + 119
0x808ad8b end_thread__FP3THDb + 19
0x8098c56 handle_one_connection + 950
0x82dc39c pthread_start_thread + 220
0x8305d2a thread_start + 4

Anyone have an idea as the the meaning of the trace?  
Any help greatly appreciated.  What other information can I provide?
Thinking of trying standard binaries ....
Thanks,
Tate

----------snip------------

ADDITIONAL INFO (from Jocelyn Fournier [joc@presence-pc.com]):

Hi,

This crash seems to occur when mysql tries to write in the binlog the DROP /*!40005 TEMPORARY */ TABLE IF EXISTS ... query which is issued when the temporary tables are automatically destroyed when the current thread is ending.

Question for internal :

Just curious, is it possible to have a thd containing both user temporary tables and "internal" temporary tables ? Because in this case in the close_temporary_tables function, found_user_tables will be set to
1 and the DROP /*!40005 TEMPORARY */ TABLE IF EXISTS statement will be written to the binlog with the internal temp table name.

Thanks !
Jocelyn

How to repeat:
Not a consistant crash behaviour.  We run identical SQL queries via cron in 15 minute cycles.  Crashes occur once every couple hours.

Suggested fix:
Potentially a BINLOG problem? I could turn off the binlog, but that will stop replication (this server replicates to a slave).  I could run the master and slave as ships in the night, but worry about data consistancy. Not a great fix.  

Currently, I'm running the server and have a 5 minute cron job to check the mysql process and start it if it doesn't exist.  This seems to solve the problem of a crash w/o a restart ([ERROR] Can't start server: Bind on TCP/IP port: Address already in 
use [ERROR] Do you already have another mysqld server running on port: 3306 ?
[ERROR] Aborting)

Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.mysql.com/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to 'Open'.

Thank you for your interest in MySQL.

Not sure what else I can provide in the way of information.  Looks like this ticket is closed because I can't predictably repeat the problem?  Is the stack trace not enough information to specify the operation causing the server to crash?  Can you provide any insight into that?  Would you like snippets of my binlog last written before the crash?  What information is lacking?

From: Jocelyn Fournier <joc@presence-pc.com>
To: "C. Tate Baumrucker" <tate.baumrucker@callisma.com>
CC: internal@lists.mysql.com
Subject: Re: Mysql 4.1.10 on Linux 2.4.9-e.59smp crash (UNCLASSIFIED)
Date: Tue, 29 Mar 2005 01:26:00 +0000

Hi,

This crash seems to occur when mysql tries to write in the binlog the
DROP /*!40005 TEMPORARY */ TABLE IF EXISTS ... query which is issued
when the temporary tables are automatically destroyed when the current
thread is ending.

Question for internal :

Just curious, is it possible to have a thd containing both user
temporary tables and "internal" temporary tables ? Because in this case  
in the close_temporary_tables function, found_user_tables will be set to
1 and the DROP /*!40005 TEMPORARY */ TABLE IF EXISTS statement will be
written to the binlog with the internal temp table name.

Thanks !
  Jocelyn

Hello Tate,
don't worry, you needn't provide anything more so far. There indeed is something fishy in what you show, even though we cannot reproduce it, and as you know we have another user experiencing it (Jocelyn Fournier). The stack trace is helpful as it says in which function the crash occurs. Just why it occurs, is a mystery. Using another binary is not likely to help much.

Ok. Thanks for the update.  Let me know if I can provide additional information.
Tate

mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=134217728
read_buffer_size=1044480
max_used_connections=69
max_connections=100
threads_connected=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 335471 K
bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

--------------------------
Ver 14.7 Distrib 4.1.11
OS: FreeBSD 5.3 Release

4.1.11-standard-log
Red Hat Linux release 9 (Shrike)
2.4.20-8smp

I've got the same bug. 
100% identical. I tried without/with InnoDB support.
mysqld crashes randomly without any kind of load.
The same problem exists in 15 diferent mysql replication machines, with same OS and indentical configurations  (only server-id changed). 
Master server is free of this problem by now.
It happens 5-6 times per day.

Today's ans yesterday's backtraces:

ey_buffer_size=402653184
read_buffer_size=2093056
max_used_connections=89
max_connections=200
threads_connected=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 1211614 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x5b73bb60
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xbfd1eaa8, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x808d597
0x82e6e08
0x830353a
0x8302c91
0x82b86de
0x82b6b12
0x82841a3
0x80ff747
0x80f87f8
0x80f825d
0x80ccd5a
0x80c082f
0x80c0d20
0x80bde66
0x809db4a
0x80a221f
0x809c672
0x809c034
0x809b6e7
0x82e45bc
0x830de8a
New value of fp=(nil) failed sanity check, terminating stack trace!

key_buffer_size=402653184
read_buffer_size=2093056
max_used_connections=167
max_connections=200
threads_connected=5
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 1211614 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x5d21baa0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xbfbfee28, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x808d597
0x82e6e08
0x8302e25
0x8302c91
0x82b86de
0x8089e82
0x8089888
0x80b8954
0x80b8811
0x809d961
0x80a221f
0x809c672
0x809c034
0x809b6e7
0x82e45bc
0x830de8a
New value of fp=(nil) failed sanity check, terminating stack trace!

key_buffer_size=402653184
read_buffer_size=2093056
max_used_connections=201
max_connections=200
threads_connected=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 1211614 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x5f41baa0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xbfd5f4f8, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x808d597
0x82e6e08
0x8302e25
0x8302c91
0x8092282
0x8092698
0x809adc4
0x809b5a9
0x82e45bc
0x830de8a
New value of fp=(nil) failed sanity check, terminating stack trace!

key_buffer_size=402653184
read_buffer_size=2093056
max_used_connections=101
max_connections=100
threads_connected=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 802415 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x5b80b9c0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xbfe3ed98, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x808d597
0x82e6e08
0x8302e25
0x8302c91
0x82b86de
0x8089e82
0x8089a55
0x80be89d
0x80c0cd4
0x80bde66
0x809db4a
0x80a221f
0x809c672
0x809c034
0x809b6e7
0x82e45bc
0x830de8a
New value of fp=(nil) failed sanity check, terminating stack trace!

I've been able to reduce the frequency of crash by:
1) going to innodb vs. myisam
2) reducing the overall db size (by pruning old data): went from 70GB to 19GB
3) configuring multiple 1GB (20+) innodb_data_file_path files via my.cnf to distribute the db over multiple files
4) configuring innodb_flush_method = O_DSYNC via my.cnf
Not sure what specifically helped, but I've now gone about 3 days w/o a crash (probably just jinxed myself) ;)
We'll see.
Tate

Have you any other information to add? Have you tried to use recent versions of MySQL (4.1.13, 4.1.14)?

Just would like to remind everybody involved about this P1 bug.

The same... no evolution about this.
random crashes ( hour, day, etc, etc, see my last message ). 
Unique solution:  read every minute .err log, and restart mysqld if is necessary. 
This is rude, but is effective.

But good notice:  I have totally uptime 4.1.14-standard log (RHEL4 2.6.9-5 kernel) 100% for 2 days, with the same load level, that 4.1.11-standard-log RH9 2.4.20-smp that causing problems.

Hi all...

in my last message:
"But good notice:  I have totally uptime 4.1.14-standard log (RHEL4 2.6.9-5
kernel) 100% for 2 days, with the same load level, that 4.1.11-standard-log RH9
2.4.20-smp that causing problems."

Today, for one week that runs without problems of any type!!!!!!. 
If it continues working correctly during one week more, i will inform you.
But i begin to think in this moment, that update to 4.1.14 is a solution.

Hi guys!!!!!

run perfect, 100% performance for 2 weeks.
definitively it is the solution. 
I recommend to you update to this version.

Thanks all, and good luck.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

resolved by Issue ID: 6625  

"The problem was in the thread cleanup process. At the end of
this process close_temporary_tables() tries to log DROP temp_table_name
and MYSQL_LOG::write() adds all user variables to the event (just in case).
When doing so the memory is already freed, but the list of user variables is not.
I am guessing that you were seeing this crash when temporary tables were used
during the connection. This information is logged to the binary log for replication.

Best Regards,

Matt Lord, Support Engineer
MySQL AB, www.mysql.com"

Fix will be included in forthcoming 4.1.15 release.  Until then, 
ftp.mysql.com/pub/mysql/download/mysql-standard-4.1.15-nightly-20050919-pc-linux-i686.tar.gz