| Bug #48349 | MySQL startup is extremely long after crash | | |
|---|---|---|---|
| Submitted: | 27 Oct 2009 13:58 | Modified: | 9 Jul 2011 18:19 |
| Reporter: | Michael Skulsky | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server | Severity: | S2 (Serious) |
| Version: | 5.1.39 | OS: | Linux (CentOS 4.5) |
| Assigned to: | | CPU Architecture: | Any |
[27 Oct 2009 13:58]
Michael Skulsky
[27 Oct 2009 13:59]
Michael Skulsky
my.cnf from the server in question
Attachment: my.cnf (application/octet-stream, text), 1.91 KiB.
[27 Oct 2009 14:22]
Valeriy Kravchuk
Looks like a duplicate of bug #29847. Please, check.
[27 Oct 2009 14:33]
Michael Skulsky
From what I understand, #29847 is about crash recovery time, while in my case crash recovery is pretty fast (note "ready for connections" in the error log), but connections are still not accepted.
[28 Oct 2009 4:34]
Valeriy Kravchuk
Did you have any transactions running when you killed mysqld and mysqld_safe?
[28 Oct 2009 6:59]
Michael Skulsky
No transactions. There was no load on this server at all.
[18 Nov 2009 19:29]
Valeriy Kravchuk
Please check with a newer version, 5.1.41, and inform us about the results.
[19 Dec 2009 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[9 Feb 2011 15:31]
Vladimir Preobrazhenskiy
I was able to reproduce that buggy behavior; here are my observations.

Summary table:

| Server version | Result |
|---|---|
| MySQL 5.0.45-log (source distribution) | Bug not observed |
| MySQL 5.1.39 | Observed |
| MySQL 5.1.48-community-log | Observed |
| MySQL 5.5.8-log | Observed |
| Percona 5.1.53-rel11.7-log | Not observed |

I took the my.cnf from Michael (2nd message here) and modified only the path settings to point to separate directories for each version. I also had to replace "default-table-type" with "default-storage-engine" for 5.5. In short, I tried to reproduce the same configuration for the different versions and ran all tests on the same machine:

CPU: 2 x AMD Athlon(tm) II X2 240 Processor; RAM: 4 GB

    # uname -srvmpio
    Linux 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

To fill up the storage I used a 1.6 GB dump file that was at hand; nothing too specific. It contains about 2600 tables, all of them InnoDB. Once imported, it generates about 6 GB of table data plus index data, according to "SHOW TABLE STATUS" output. I imported that dump three times under different schema names, thus getting about 18 GB of InnoDB data.

In my experiments I observed about a 2 min 20 s delay between restarting the crashed server and the appearance of the "mysql>" prompt. I also re-ran the test against the 5.5 configuration with the same dump loaded 6 times, and obtained about a 4.5 minute delay. So twice the data gives about twice the time for me, and Michael speaks about 400 GB and 2.5 hours, which seems to be the same order of magnitude of time per gigabyte :)

What's interesting is that MySQL doesn't reject incoming connections during the delay, but just "freezes" them. This behavior appears for both TCP network connections and Unix socket connections.
I also see that the "mysqladmin ping" command (which is used in the /etc/init.d/mysql script) appears stalled as well.

I would also note some strange behavior: if I crash and restart the server, then walk away for several minutes and only then try to connect, I still get the same 2 min 20 s delay as when I reconnect immediately. So it seems that the incoming connection triggers some long process inside MySQL, rather than the restart itself.

I'd be glad to provide more details for the bug. Maybe it would be a sound idea to "kill -11" the server once more in this "stalled" state and obtain a core dump, to see what it's doing...
[11 Feb 2011 12:41]
Vladimir Preobrazhenskiy
I guess I understood what's happening in this scenario. The initial my.cnf file contains this:

    log_output = file,table
    log-warnings = 2
    general_log = 1
    general_log_file = /path/to/somefile.log

By default, MySQL creates the 'mysql.general_log' table with ENGINE=CSV. Then, as a new client connection is requested, MySQL has to write a log record of this event. At that moment it notices that 'mysql.general_log' was not properly closed because of the server crash, and begins to recover this table. I guess recovering a CSV table means that MySQL has to scan (and revalidate?) every single row, one by one, which is a time-consuming task. During this process the client connection stays frozen.

In my experiment I was loading all the data in "general_log = 1" mode, so the physical size of the 'general_log' table was about the same as the total dump file size.

- When I issue a "TRUNCATE TABLE general_log" statement, I get zero post-crash delay.
- When I re-create the 'general_log' table as a MyISAM table, the delay time diminishes significantly.
- When I set 'general_log = 0', I also get zero post-crash delay.

I concluded all of the above by looking through the MySQL trace file. In fact, I'm not good at interpreting it, but this seems to be a good guess. An excerpt from /tmp/mysql.trace is attached (with line numbers).
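The three workarounds above can be sketched as SQL statements. This is a hedged sketch, not an official fix: the statements are standard MySQL administration commands, and note that changing the log table's storage engine requires the general log to be disabled first.

```sql
-- Option 1: discard the crashed CSV log so there is nothing to recover.
TRUNCATE TABLE mysql.general_log;

-- Option 2: switch the log table from CSV to MyISAM.
-- The general log must be disabled while the table is altered.
SET GLOBAL general_log = 'OFF';
ALTER TABLE mysql.general_log ENGINE = MYISAM;
SET GLOBAL general_log = 'ON';

-- Option 3: stop logging to the table entirely.
SET GLOBAL general_log = 'OFF';
```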
[11 Feb 2011 12:42]
Vladimir Preobrazhenskiy
An excerpt from /tmp/mysql.trace, with line numbers
Attachment: mysql_trace_cut.txt (text/plain), 4.42 KiB.
[9 Jul 2011 18:20]
Valeriy Kravchuk
So it looks like this line from my.cnf is essential here: log_output = file,table. Our manual, http://dev.mysql.com/doc/refman/5.1/en/log-destinations.html, says: "Currently, logging to tables incurs significantly more server overhead than logging to files. If you enable the general log or slow query log and require highest performance, you should use file logging, not table logging." This looks like one more case of the "overhead" mentioned. Both the explanation and the workarounds from Vladimir Preobrazhenskiy above look valid. I'd say this is not a bug, but the result of a design decision (to use the CSV storage engine for logging to tables by default) combined with the specific case of a crash with a big general query log in a CSV table.
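Following the manual's advice quoted above, here is a my.cnf sketch that keeps general query logging but writes only to a file, so there is no CSV log table to recover after a crash (the log file path is a placeholder, not from the original configuration):

```ini
[mysqld]
# Write the general log to a file only; with no TABLE destination,
# mysql.general_log stays empty and there is nothing to recover on restart.
log_output       = FILE
general_log      = 1
general_log_file = /var/log/mysql/general.log
```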