Bug #13057 | Server crash after mysqlhotcopy after update from 4.1.13 to 4.1.14 | ||
---|---|---|---|
Submitted: | 8 Sep 2005 8:56 | Modified: | 12 Oct 2005 12:33 |
Reporter: | Beat Vontobel (Silver Quality Contributor) (OCA) | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S1 (Critical) |
Version: | 4.1.14 | OS: | Linux (Linux 2.4.21 (SuSE Professional)) |
Assigned to: | CPU Architecture: | Any |
[8 Sep 2005 8:56]
Beat Vontobel
[8 Sep 2005 8:57]
Beat Vontobel
Switched to S1, as this is a crashing bug.
[24 Sep 2005 11:38]
Valeriy Kravchuk
Thank you for the bug report. I see nothing in the 4.1.14 changes (http://dev.mysql.com/doc/mysql/en/news-4-1-14.html) that may cause such a bug that is never observed on 4.1.13. But, please, look at the changes list for any items similar to your environment, usage profile etc. Please, provide the error log for the appropriate period, your my.cnf file content and hardware description. The definition of a "very busy server" will be also useful for those who will try to repeat this behaviour. How many queries per second, how many simultaneous connections? We really need more information to build a repeatable test case or prove that it is not, say, a bad hardware or configuration used.
[27 Sep 2005 8:34]
Beat Vontobel
Of course I can not prove you, that it's a new bug in 4.1.14. I just never had a crash on our master server with any build before and I didn't have a crash since I switched back to 4.1.13. I switched to 4.1.14 again today and activated the general query log. Let's wait what happens. Your questions: 1. I see the following three bugfixes in the change-log for 4.1.14, that possibly _could_ have introduced a new bug concerning table locking: * FLUSH TABLES WITH READ LOCK combined with LOCK TABLE .. WRITE caused deadlock. (BUG#9459) * When two threads compete for the same table, a deadlock could occur if one thread has also a lock on another table through LOCK TABLES and the thread is attempting to remove the table in some manner and the other thread want locks on both tables. (BUG#10600) * If a thread (connection) has tables locked, the query cache is switched off for that thread. This prevents invalid results where the locking thread inserts values between a second thread connecting and selecting from the table. (BUG#12385) 2. Hardware/OS/Configuration/Usage: Hardware: Dell PowerEdge 2650 CPUs: Dual Intel Xeon 3.06GHz (with HyperThreading turned on) RAM: 2 GB (no swapping occurs) /proc/version: Linux version 2.4.21-273-smp4G (root@i386.suse.de) (gcc version 3.3.1 (SuSE Linux)) #1 SMP Mon Jan 17 13:19:07 UTC 2005 MySQL: mysql-standard-4.1.13-pc-linux-gnu-i686 (dedicated db server, no other processes besides basic os) Database usage: * 22 databases with a total of 220 tables (mostly MyISAM, some InnoDB, some MERGE, few MEMORY) * total database size of about 13 GB * 24h-average of 100 queries/s and 30 concurrent connections (not evenly distributed) * query cache turned on * two slaves (still on 4.1.14, as they don't crash, only master affected) 3. my.cnf/error.log See attached files (error.log since update to 4.1.14, including downgrade to 4.1.13 in the end).
[27 Sep 2005 8:59]
Beat Vontobel
error.log
Attachment: error.log (application/octet-stream, text), 6.31 KiB.
[29 Sep 2005 15:11]
Valeriy Kravchuk
On September 27th you wrote: "I switched to 4.1.14 again today and activated the general query log. Let's wait what happens." Have you noted any new crashes after that switch? Please, inform me as soon as you got something new. Thank you for the error log and my.cnf files. Error log is a bit old, but it contains a crash information. Are you sure you have enough memory for mysqld all the time? Not obvious for me. Please, send the top results for the system without mysql running, when it will be possible. show create table for the table used in that REPLACE statement noted in the log (if it has something to do with the problem) can also give me some insights for the other questions.
[29 Sep 2005 16:14]
Beat Vontobel
No, I'm sorry, no new crashes so far, I will inform you as soon as it happens again. In the attached log you can actually find two crashes, both happend exactly at the end of the hotcopy-backup (over the period of one week). After the second crash I switched immediately back to 4.1.13, that's why the log is a little bit old. Since then (the whole period on 4.1.13) nothing unusual happened. The table definitions for the two tables involved in the REPLACE INTO ... SELECT ... FROM ... statement. First the table SELECTed from (`meteonews`.`metar`): CREATE TABLE `metar` ( `id` varchar(4) collate latin1_general_ci NOT NULL default '', `datetime` datetime NOT NULL default '0000-00-00 00:00:00', `dd` smallint(3) default NULL, `ff` smallint(6) default NULL, `fxx` smallint(6) default NULL, `tt` tinyint(4) default NULL, `td` tinyint(4) default NULL, `rh` tinyint(3) default NULL, `vis` varchar(5) collate latin1_general_ci default NULL, `ppnn` smallint(6) default NULL, `cld_okt` tinyint(1) default NULL, `cld_hgt` smallint(6) default NULL, `ceiling` smallint(6) default NULL, `prcp_type` enum('snow','hail','rain') collate latin1_general_ci default NULL, `prcp_str` tinyint(1) default NULL, `wx` tinyint(3) NOT NULL default '-99', `flags` set('ltg','fog','CAVOK','NSC','SKC','CLR','err') collate latin1_general_ci NOT NULL default '', `metar` text collate latin1_general_ci NOT NULL, `source` enum('unknown','NOAA','SCAA','FSU','MeteoSchweiz','SAWS') collate latin1_general_ci NOT NULL default 'unknown', `inserttime` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP, PRIMARY KEY (`id`), KEY `datetime` (`datetime`), KEY `inserttime` (`inserttime`), KEY `source` (`source`), KEY `flags` (`flags`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci; Then the table REPLACEd INTO (`met_data`.`metar_archive`): CREATE TABLE `metar_archive` ( `id` char(4) collate latin1_general_ci NOT NULL default '', `datetime` datetime NOT NULL default '0000-00-00 00:00:00', `dd` smallint(3) default NULL, `ff` smallint(6) default NULL, `fxx` smallint(6) default NULL, `tt` tinyint(4) default NULL, `td` tinyint(4) default NULL, `rh` tinyint(3) default NULL, `vis` char(5) collate latin1_general_ci default NULL, `ppnn` smallint(6) default NULL, `cld_okt` tinyint(1) default NULL, `cld_hgt` smallint(6) default NULL, `ceiling` smallint(6) default NULL, `prcp_type` enum('snow','hail','rain') collate latin1_general_ci default NULL, `prcp_str` tinyint(1) default NULL, `wx` tinyint(3) NOT NULL default '-99', `flags` set('ltg','fog','CAVOK','NSC','SKC','CLR','err') collate latin1_general_ci NOT NULL default '', PRIMARY KEY (`id`,`datetime`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci; This table (`met_data`.`metar_archive`) is actually part of the following MERGEd table `meteonews`.`metar_archiv`: CREATE TABLE `metar_archiv` ( `id` char(4) collate latin1_general_ci NOT NULL default '', `datetime` datetime NOT NULL default '0000-00-00 00:00:00', `dd` smallint(3) default NULL, `ff` smallint(6) default NULL, `fxx` smallint(6) default NULL, `tt` tinyint(4) default NULL, `td` tinyint(4) default NULL, `rh` tinyint(3) default NULL, `vis` char(5) collate latin1_general_ci default NULL, `ppnn` smallint(6) default NULL, `cld_okt` tinyint(1) default NULL, `cld_hgt` smallint(6) default NULL, `ceiling` smallint(6) default NULL, `prcp_type` enum('snow','hail','rain') collate latin1_general_ci default NULL, `prcp_str` tinyint(1) default NULL, `wx` tinyint(3) NOT NULL default '-99', `flags` set('ltg','fog','CAVOK','NSC','SKC','CLR','err') collate latin1_general_ci NOT NULL default '', PRIMARY KEY (`id`,`datetime`) ) ENGINE=MRG_MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci UNION=(`met_data`.`metar_archive`,`met_archive`.`metar_2004_1`,`met_archive`.`metar_2004_2`,`met_archive`.`metar_2005_1`); All the other tables in this MERGE are compressed MyISAM tables. On memory: I never actually saw this machine do some real swapping, so I think it should be okay. Most of the tables we use for everyday queries are really small (in the range of up to only a few MB). And we never experienced any performance bottlenecks. Unfortunately I can not give you a top output without mysqld running (it's a production server), but see attached top.txt for one including mysqld.
[29 Sep 2005 16:39]
Beat Vontobel
top output
Attachment: top.txt (text/plain), 10.70 KiB.
[12 Oct 2005 12:33]
Beat Vontobel
I'm sorry about this, the crash didn't happen again. I'm actually pretty sure I hit a bug before as the crash happened two times in exactly the same circmstances on a server that runs absolutely rock solid. But now mysqld 4.1.14 is up and running since 15 days without crash. Maybe it was a hidden bug but I'm just not able to reproduce it again, so I closed the bug as "not a bug". I'll still keep the general query log running, though, for the case it would happen again.