MySQL Bugs: #10154: Server Crash on Client Connect

Bug #10154	Server Crash on Client Connect
Submitted:	25 Apr 2005 18:34	Modified:	23 Dec 2005 15:50
Reporter:	Sergio Salvatore
Status:	Can't repeat
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	4.1.11	OS:	Linux (Linux/RedHat ES 3)
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
When connecting from a client (in this case, PHP 4.3.10 with a statically compiled mysql library) the server crashes.  An excerpt from the error log follows:

 mysqld got signal 11;
 This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.
 We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.

 key_buffer_size=402653184
 read_buffer_size=2093056
 max_used_connections=252
 max_connections=500
 threads_connected=10
 It is possible that mysqld could use up to
 key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 2439212 K
 bytes of memory
 Hope that's ok; if not, decrease some variables in the equation.

 thd=0x62017260
 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
 Cannot determine thread, fp=0xbfe3f5a8, backtrace may not be correct.
 Stack range sanity check OK, backtrace follows:
 0x808ae43
 0x82def78
 0x808fd9a
 0x8098078
 0x8098839
 0x82dc72c
 0x83060ba
 New value of fp=(nil) failed sanity check, terminating stack trace!
 Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved stack trace is much more helpful in diagnosing the problem, so please do resolve it
 Trying to get some variables.
 Some pointers may be invalid and cause the dump to abort...
 thd->query at (nil) is invalid pointer
 thd->thread_id=10452006
 The manual page at http://www.mysql.com/doc/en/Crashing.html contains information that should help you find out what is causing the crash.

 Number of processes running now: 0
 050424 15:36:39 mysqld restarted
 050424 15:36:39 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use
 050424 15:36:39 [ERROR] Do you already have another mysqld server running on port: 3306 ?
 050424 15:36:39 [ERROR] Aborting

 050424 15:36:39 [Note] /usr/local/mysql/bin/mysqld: Shutdown complete

 050424 15:36:39 mysqld ended

 050424 15:56:44 mysqld started
 050424 15:56:45 InnoDB: Database was not shut down normally!
 InnoDB: Starting crash recovery.
 InnoDB: Reading tablespace information from the .ibd files...
 InnoDB: Restoring possible half-written data pages from the doublewrite
 InnoDB: buffer...
 050424 15:56:45 InnoDB: Starting log scan based on checkpoint at
 InnoDB: log sequence number 18 1742500735.
 InnoDB: Doing recovery: scanned up to log sequence number 18 1742501175
 050424 15:56:45 InnoDB: Starting an apply batch of log records to the database...
 InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
 InnoDB: Apply batch completed
 InnoDB: Last MySQL binlog file position 0 666224384, file name ./Monk-bin.001422
 050424 15:56:46 InnoDB: Flushing modified pages from the buffer pool...
 050424 15:56:46 InnoDB: Started; log sequence number 18 1742501175
 /usr/local/mysql/bin/mysqld: ready for connections.
 Version: '4.1.11-standard-log' socket: '/tmp/mysql.sock' port: 3306 MySQL Community Edition - Standard (GPL)
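
The memory estimate printed in the log excerpt above can be checked by hand. The following is a minimal Python sketch of that arithmetic. key_buffer_size, read_buffer_size and max_connections are taken from the log; sort_buffer_size is not printed there, so the value 2097144 is an assumption, chosen because it makes the formula reproduce the logged total. The function name is hypothetical.

```python
def worst_case_memory_kb(key_buffer_size, read_buffer_size,
                         sort_buffer_size, max_connections):
    """Apply the formula from the crash log and return the result in K (1024 bytes)."""
    total_bytes = key_buffer_size + (read_buffer_size + sort_buffer_size) * max_connections
    return total_bytes // 1024


# Values from the log excerpt, except sort_buffer_size (assumed, not in the log).
estimate = worst_case_memory_kb(
    key_buffer_size=402653184,
    read_buffer_size=2093056,
    sort_buffer_size=2097144,   # assumption: not printed in the log
    max_connections=500,
)
print(estimate, "K")
```

Under that assumption the result agrees with the 2439212 K reported by mysqld, i.e. roughly 2.3 GB of potential memory use on this server.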

Resolving the stack trace yields the following:

0x808ae43 handle_segfault + 423
0x82def78 pthread_sighandler + 184
0x808fd9a ip_to_hostname__FP7in_addrPUi + 474
0x8098078 check_connection__FP3THD + 212
0x8098839 handle_one_connection + 297
0x82dc72c pthread_start_thread + 220
0x83060ba thread_start + 4
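
Resolving a trace like this amounts to finding, for each raw address, the nearest symbol that starts at or below it and reporting the offset from that start. A minimal Python sketch of the idea follows. The symbol start addresses are back-computed from the resolved trace above (address minus offset), not read from the actual binary, and the resolve helper is hypothetical; a real symbol table would normally be extracted from the same mysqld binary with a tool such as nm.

```python
import bisect

# (start address, symbol name), sorted by address.
# Start addresses are derived from the resolved trace: address - offset.
SYMBOLS = sorted([
    (0x808ac9c, "handle_segfault"),
    (0x808fbc0, "ip_to_hostname__FP7in_addrPUi"),
    (0x8097fa4, "check_connection__FP3THD"),
    (0x8098710, "handle_one_connection"),
    (0x82dc650, "pthread_start_thread"),
    (0x82deec0, "pthread_sighandler"),
    (0x83060b6, "thread_start"),
])
STARTS = [start for start, _ in SYMBOLS]


def resolve(addr):
    """Return 'addr symbol + offset' for the nearest symbol at or below addr."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    start, name = SYMBOLS[i]
    return f"{hex(addr)} {name} + {addr - start}"


if __name__ == "__main__":
    for frame in (0x808ae43, 0x82def78, 0x808fd9a, 0x8098078,
                  0x8098839, 0x82dc72c, 0x83060ba):
        print(resolve(frame))
```

With only a partial table, as here, an address belonging to a symbol that is missing from the list would be attributed to the previous symbol, so the full symbol table of the exact binary that crashed is needed for a trustworthy result.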

The server is configured to check the /etc/hosts file before DNS (in /etc/nsswitch.conf) and all client machines are listed in /etc/hosts.  Additionally, the client machines are not available via DNS.
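
To make the lookup order described above concrete, here is a small Python sketch that reads the hosts line from nsswitch.conf-style text and then looks up an address in hosts-file text. The sample file contents, host names and addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical, as are the function names. The code only mimics the ordering on text input; it does not call the system resolver.

```python
import re


def lookup_sources(nsswitch_text):
    """Return the ordered sources on the 'hosts:' line, e.g. ['files', 'dns']."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            rest = re.sub(r"\[[^\]]*\]", " ", rest)  # drop [NOTFOUND=return] style actions
            return rest.split()
    return []


def reverse_from_hosts(hosts_text, ip):
    """Return the canonical name for ip from hosts-file text, or None."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip:
            return fields[1]
    return None


# Hypothetical sample contents, not the reporter's actual files.
SAMPLE_NSSWITCH = "passwd: files\nhosts: files dns\n"
SAMPLE_HOSTS = (
    "127.0.0.1   localhost\n"
    "192.0.2.10  web1.example.com web1   # hypothetical PHP client\n"
)

if __name__ == "__main__":
    print(lookup_sources(SAMPLE_NSSWITCH))
    print(reverse_from_hosts(SAMPLE_HOSTS, "192.0.2.10"))
```

With "files" listed before "dns", a client that appears in /etc/hosts is answered from that file and DNS is never consulted for it, which matches the setup in which the crash occurred.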

How to repeat:
This bug is not exactly repeatable, although it has happened more than once.  We are currently unable to replicate it.

Can you please try to run a debug build of mysqld with '--core-file'
so that we can try to track down where exactly the IP resolution
function fails?

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

I am not able to run a debug build in our production environment, but I've continued to experience this bug at random times.

My current plan is to switch to using DNS rather than a static hosts file on the machine in question.

Thanks.

/sergio

Hi Sergio,

Thank you for your bug report!

Can you tell me if you are using our RHEL 3 RPM binaries or the generic linux ones?  I think this may be related to thread stack size.
This can happen because the new glibc library requires a stack size greater than 128KB for the gethostbyaddr() call. To fix the problem, start mysqld with the --thread-stack=192K option. (Use -O thread_stack=192K before MySQL 4.)  Our static binaries are not able to set this properly, so I wanted to see whether the problem still occurs for you with our RHEL 3 binaries.

Best Regards

Matthew,

Thanks for your response.  We're using the generic Linux binaries.  The advice is good -- fortunately we haven't been having any problems since switching to using DNS on the master server (which I guess circumvents calls to gethostbyaddr()).

In general, do you recommend using the ES 3 rpms that you provide over the generic binaries?  Is there a significant improvement in performance?

Thanks...

/sergio

Hi Sergio,

Yes, we recommend using the more specific ones when available.  There shouldn't
be much difference in performance; the benefit is mainly stability.

Best Regards

Hi,

I'm pretty sure we've narrowed this problem down to calls to gethostbyname() when the entry is in the /etc/hosts file.  We've since switched the problematic server over to using DNS and we haven't seen that crash since...  Perhaps this is RedHat ES 3 specific?  Otherwise, it seems relatively odd.

Sincerely,

Sergio Salvatore

I was unable to repeat this issue on FC3 and SUSE Linux 10.0.