Bug #45489 | Endless tables corruptions | ||
---|---|---|---|
Submitted: | 14 Jun 2009 23:16 | Modified: | 28 Jul 2009 5:36 |
Reporter: | Jérôme Blion | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S1 (Critical) |
Version: | 5.1.34-0.dotdeb.1-log and 5.1.35-log | OS: | Linux (Debian Lenny 64b) |
Assigned to: | CPU Architecture: | Any | |
Tags: | corruption, crash, ext4, ext4dev, innodb |
[14 Jun 2009 23:16]
Jérôme Blion
[14 Jun 2009 23:18]
Jérôme Blion
Mysql error log
Attachment: 45489-mysql.error.log.zip (application/zip, text), 368.71 KiB.
[15 Jun 2009 7:45]
Sveta Smirnova
Thank you for the report. Have you run mysql_upgrade when upgraded? have you run CHECK TABLE on problem tables as in error log suggested?
[15 Jun 2009 8:11]
Jérôme Blion
Hello, Thanks for your answer. The database has always been hosted on 5.1 Mysql Servers. # mysql_upgrade Looking for 'mysql' as: mysql Looking for 'mysqlcheck' as: mysqlcheck This installation of MySQL is already upgraded to 5.1.34, use --force if you still need to run mysql_upgrade Will CHECK TABLE have a better result than mysqldump + import of the dump ? (the mysqldump drops the table)
[15 Jun 2009 8:23]
Jérôme Blion
Check table done on phpbb_users (the table which crashed today)
[15 Jun 2009 10:47]
Jérôme Blion
New corruption on tprofile_info and smsgateway ! On check table is running accross the first one (several GB, it will take some times, the server is loaded)
[15 Jun 2009 10:56]
Jérôme Blion
090615 12:21:22 InnoDB: error clustered record for sec rec not found InnoDB: index `fdate` of table `mobiluck`.`tprofile_info` InnoDB: sec index record PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 8; hex 8000124120a8c566; asc A f;; 1: len 4; hex 80032a74; asc *t;; 2: len 5; hex 4130333031; asc A0301;; InnoDB: clust index record PHYSICAL RECORD: n_fields 18; compact format; info bits 0 0: len 4; hex 80040c73; asc s;; 1: len 5; hex 4131313031; asc A1101;; 2: len 6; hex 0000717814b9; asc qx ;; 3: len 7; hex 80000080060fd8; asc ;; 4: len 6; hex 202020202020; asc ;; 5: len 4; hex 800d00b1; asc ;; 6: len 4; hex 80000000; asc ;; 7: len 10; hex 616e6169733030373833; asc anais00783;; 8: len 2; hex 2020; asc ;; 9: len 1; hex 03; asc ;; 10: len 4; hex 80000000; asc ;; 11: len 8; hex 8000000000000000; asc ;; 12: len 4; hex 4718a500; asc G ;; 13: len 8; hex 8000000000000000; asc ;; 14: SQL NULL; 15: SQL NULL; 16: SQL NULL; 17: len 4; hex 00000000; asc ;; TRANSACTION 0 2012146312, ACTIVE 35 sec, process no 14167, OS thread id 1154926928 fetching rows, thread declared inside InnoDB 298 mysql tables in use 1, locked 0 MySQL thread id 76760, query id 19021108 ns60332.ovh.net 91.121.188.151 mobiluck Sending data SELECT userid, stringValue, fdate, moderation_status, moderation_date, moderation_userid, comment FROM tprofile_info WHERE infoTemplateid = 'photo' AND (moderation_status='todo') ORDER BY fdate DESC LIMIT 42, 42 InnoDB: Submit a detailed bug report to http://bugs.mysql.com
[15 Jun 2009 11:19]
Jérôme Blion
20090615 Mysql error log.
Attachment: 45489-20090615-mysql.error.log.txt.zip (application/octet-stream, text), 240.18 KiB.
[15 Jun 2009 11:22]
Jérôme Blion
tloc_link_user which I recreated this night crashed once more ! When trying to do a CHECK TABLE on it, Mysql server went away ! I have to mysqldump it and reimport it. This drives me totally crazy...
[15 Jun 2009 18:26]
Jérôme Blion
Read severity definitions... Increased the severity... Service it totally unusable.
[15 Jun 2009 18:39]
Jérôme Blion
smsgateway crashed one more time, whereas the check table reported nothing wrong...
[15 Jun 2009 18:55]
Jérôme Blion
Packages installed on the Mysql master
Attachment: mysql-dpkg.txt (text/plain), 30.25 KiB.
[15 Jun 2009 18:55]
Jérôme Blion
Packages installed on the Apache server
Attachment: apache-dpkg.txt (text/plain), 47.93 KiB.
[16 Jun 2009 6:33]
Sveta Smirnova
Thank you for the feedback. CHECK TABLE is not required after loading dump, I just wanted to be sure table is OK before load. Can you repeat crash in any case when you run specific query on same table? Anyway please provide dump of one of tables you have repeatable crashes with. Also, please, check with our MySQL binaries accessible from http://dev.mysql.com/downloads and current version 5.1.35 to be sure this is not problem of Debian binaries.
[16 Jun 2009 16:56]
Jérôme Blion
smsgateway crashed one more time today... I didn't repaired it yet. Do you want me to upload the .ibd file too ? I will do that this night, when the server will have less load.
[16 Jun 2009 20:39]
Sveta Smirnova
Thank you for the feedback. > Do you want me to upload the .ibd file too ? Yes, it can be helpful. Please provide *frm file and dump as well.
[17 Jun 2009 0:00]
Jérôme Blion
Mysql error log for 2009-06-16...
Attachment: 45489-20090616-mysql.error.log.zip (application/zip, text), 16.69 KiB.
[18 Jun 2009 22:31]
Jérôme Blion
One more crash on tloc_msg. CHECK TABLE ==> Crash Mysqldump ==> Crash innodb_force_recovery=[1|3|4|5] + mysqldump ==> crash Tried to upgrade to 5.1.35. Same behaviour. ns6702:/backup# mysqldump xxxxx tloc_msg > /backup/tloc_msg.sql mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table `tloc_msg` at row: 2170611 I will post the ibd+frm files... I cannot provide the SQL dump !
[18 Jun 2009 23:20]
Jérôme Blion
# ncftpput ftp.mysql.com /pub/mysql/upload/ 45489-tloc_msg.tar.gz 45489-tloc_msg.tar.gz: 1.45 GB 2.38 MB/s
[20 Jun 2009 13:55]
Jérôme Blion
# ncftpput ftp.mysql.com /pub/mysql/upload/ 45489-20090619-mysql.error.log.txt.gz 45489-20090619-mysql.error.log.txt.gz: 1.81 MB 1.40 MB/s
[21 Jun 2009 2:42]
Jérôme Blion
I upgraded the server to 5.1.35 downloaded from Mysql website. I installed the binary version compiled with gcc. Mysql is still randomly crashing. I dumped all databases which uses InnoDB. I dropped them all. I moved ib* files in mysql datadir root outside of it. Now, I'm importing all my data pieces (22GB of sql file...) Did you find some interesting things in all files I provided you? This is becoming really critical for my activity.
[22 Jun 2009 9:39]
Sveta Smirnova
Thank you for the feedback. We are still analyzing files uploaded. No information from these files. Although I'd recommend to check hard disk also.
[23 Jun 2009 22:56]
Jérôme Blion
After more than a day with no crash, I cloned my master to create a slave. The backup is performed with a LVM snapshot. 2 hours later: - on the master : 090623 19:12:12 [Note] /usr/local/mysql/bin/mysqld: ready for connections. Version: '5.1.35-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL) InnoDB: Database page corruption on disk or a failed InnoDB: file read of page 131198. InnoDB: You may have to recover from a backup. 090623 21:09:24 InnoDB: Page dump in ascii and hex (16384 bytes): - on the slave: 090623 18:01:58 [Note] /usr/local/mysql/bin/mysqld: ready for connections. Version: '5.1.35-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL) 090623 19:08:42 [Note] Slave: received end packet from server, apparent master shutdown: 090623 19:08:42 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000037' at postion 45936090 090623 19:08:42 [ERROR] Slave I/O: error reconnecting to master 'slave_user@ns6511.ovh.net:3306' - retry-time: 60 retries: 86400, Error_code: 2013 090623 19:12:42 [Note] Slave: connected to master 'slave_user@ns6511.ovh.net:3306',replication resumed in log 'mysql-bin.000037' at position 45936090 090623 21:01:25 - mysqld got signal 11 ; At 7PM, I increased innodb_log_file_size to 128MB...
[23 Jun 2009 23:02]
Jérôme Blion
Erratum : The backup has been made à 3AM with no errors. I didn't realize my snapshot just before starting the slave, but many hours ago.
[24 Jun 2009 5:21]
MySQL Verification Team
"InnoDB: Database page corruption on disk or a failed InnoDB: file read of page 131198. " This indicates possible hardware problems. Had you checked disks and memory are okay? Does this corruption happen on more than one machine, or just this machine ?
[24 Jun 2009 6:11]
Jérôme Blion
Hello, This problem happens on 2 machines. On both servers, no errors are reported by the RAID controller neither the syslog. I ran several fsck -f on disks which reported no error some days ago. RAM has been checked on the slave. It reported no error. Corruption came back 2 hours after enabling the slave server ! I shut down the slave. I didn't do any corrective action since yesterday and it didn't crash one more time. Changing master/slave order does not solve the issue. There was no backup between the slave activation and the first crash. It seems this problem is occuring since I upgraded Mysql from 5.1.32 to 5.1.34 and was still present when I upgraded to 5.1.35...
[29 Jun 2009 19:12]
Jérôme Blion
ns6511:/backup/bug# ncftpput ftp.mysql.com /pub/mysql/upload/ 45489-20090627-mobiluck-20090627.sql.gz 45489-20090627-mys ql.error.log.txt.gz 45489-20090627-mobiluck-20090627.sql.gz: 3.05 GB 2.25 MB/s 45489-20090627-mysql.error.log.txt.gz: 285.43 kB 686.89 kB/s
[29 Jun 2009 22:22]
Jérôme Blion
45489-20090627-my.cnf: 5.66 kB 56.21 kB/s 45489-20090627-whole-database.tar.gz: 11.31 GB 2.59 MB/s Now you have the entire database... Hope you will find some interesting things...
[1 Jul 2009 9:59]
Jérôme Blion
I have no slave anymore (performance issue, but that's not the worst) On the master, a "start slave" has been issued yesterday. There was no slave data filled, so slave replication didn't start. But the master crashed one hour later and corrupted the smsgateway table once again ! Had to restore it.
[7 Jul 2009 9:07]
Jérôme Blion
Did you find some interesting things? Could it be linked to a LVM snapshot bug?
[7 Jul 2009 22:02]
Jérôme Blion
(sorry, outputs are in french) # fsck /dev/vg1/lv_home -f fsck 1.41.3 (12-Oct-2008) e2fsck 1.41.3 (12-Oct-2008) Passe 1 : vérification des i-noeuds, des blocs et des tailles Passe 2 : vérification de la structure des répertoires Passe 3 : vérification de la connectivité des répertoires Passe 4 : vérification des compteurs de référence Passe 5 : vérification de l'information du sommaire de groupe /dev/vg1/lv_home : 747/4718592 fichiers (46.3% non contigus), 14389071/18874368 blocs ==> no error ! # dd if=/dev/vg1/lv_home of=/dev/null 150994944+0 enregistrements lus 150994944+0 enregistrements écrits 77309411328 bytes (77 GB) copied, 740,691 s, 104 MB/s ==> no error ! It was a ext4dev filesystem. Mounted with options noatime, nodiratime. Filesytem has been recreated as ext3 with defaults options.
[15 Jul 2009 13:08]
Jérôme Blion
Since the last change I did on filesystem, it did not crash yet.
[16 Jul 2009 8:09]
Sveta Smirnova
Thank you for the feedback. No news from our side so far - your database work fine. Looks like related to ext4dev. Will see what can be checked here.
[28 Jul 2009 5:36]
Sveta Smirnova
Thank you for the feedback. I can not repeat described behavior on ext4 volume. So closed as "Can't repeat". Feel free to reopen the report if problem reoccurs.