Bug #2164 | MYI files being corrupted on slaves. | ||
---|---|---|---|
Submitted: | 18 Dec 2003 13:08 | Modified: | 29 Jan 2004 6:36 |
Reporter: | Chad Clark | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 4.0.15 -> 4.0.17 | OS: | Linux (Linux (Debian 3.0)) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[18 Dec 2003 13:08]
Chad Clark
[19 Dec 2003 9:28]
Dean Ellis
Some questions I didn't ask (or answers didn't make it to the report) when we were talking earlier:

Are you using the official MySQL AB binaries or the Debian binaries (or self-compiled, or...)?
What filing system are the slaves using?
Are the slaves using identical hardware?
Is it possible to add a slave which uses different hardware/kernel to see if the issue appears there as well?
Is this always accompanied by errors such as your failed ALTER TABLE statement?
Is this always accompanied by stack traces?
Are both slaves showing the same stack traces at the same times?
Are you showing other stack traces/crashes which are not accompanied by this corruption?
Do the relay logs on the slaves appear "sane" (check with mysqlbinlog)?
Can you run some hardware tests on the slaves (memory tests, drive tests, etc)?

As we discussed, this sounded very much like a hardware or kernel problem until it appeared on more than one slave, but that is still very possible so we need as much information as we can get (particularly as there is not enough information for us to reproduce this if it is in fact our bug). It does appear to be something specific to your particular environment so far.

Thank you
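Editorial sketch (not part of the original comment): the relay-log sanity check mentioned above could be scripted roughly as follows. It assumes that `mysqlbinlog` is on the PATH and that it exits with a non-zero status when it cannot decode a file. The function name `relay_log_is_readable` and the `runner` parameter are hypothetical, introduced only for illustration.

```python
import subprocess


def relay_log_is_readable(path, runner=subprocess.run):
    """Return True if `mysqlbinlog PATH` exits with status 0.

    Assumption: a relay log that mysqlbinlog cannot decode makes the tool
    exit non-zero. `runner` is injectable only so the function can be
    exercised without mysqlbinlog installed.
    """
    result = runner(
        ["mysqlbinlog", path],
        stdout=subprocess.DEVNULL,  # decoded events are not needed here
        stderr=subprocess.PIPE,     # keep error text out of the terminal
    )
    return result.returncode == 0
```

A clean exit only shows that the log could be decoded; it says nothing about whether the events in it match what the master actually wrote.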
[23 Dec 2003 15:12]
Chad Clark
Our master server froze up within hours of our noticing a problem on the website. It does look like this may have caused a table on both slaves to become corrupt. I do however stress that we have seen corruption which looked the same as what we saw last week without any noted hardware trouble.

Following is a timeline of what happened and a description of the hardware. We are in the middle of adding new hardware and so some of the machines have been swapped around a bit, making this troubleshooting somewhat more difficult.

Right now I don't have any more information (other than some log files, but I'm not sure when the corruption occurred). I think we may have to suspend researching this issue until we observe it again, when we can hopefully get more information as it happens.

Thanks,
/Chad

Trouble on Dec 18th: Timeline

Total slaves: db1, db2, iweb1, iweb2 (see hardware description below).
Slaves running: db1, iweb2

1) At 17:26 (17 Dec 2003) the master server stopped logging apache requests. Also syslogd contains nothing past 17:25.
2) At 18:27 db1 logged: "Slave I/O thread: error reconnecting to master 'repluser@MASTERS_IP:3306': Error: 'Lost connection to MySQL server during query' errno: 2013 retry-time: 60 retries: 86400"
3) At 18:28 the master machine was hard rebooted. Keyboard was not responding.
4) At 18:28 db1 logged: "Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log" followed by "Slave I/O thread exiting".
5) Around 21:00-22:00 attempts were made to resync db1, db2, iweb1. Only db1 and db2 succeeded. iweb1 would not start replication for an unknown reason.
6) Around 03:00 (18 Dec 2003) reports of strange values appearing on the site led to investigations. It was discovered that slaves (not noted which ones) were reporting different values.
7) Around 04:00-04:30 the site was switched to use only the master for all queries.
8) 09:00-12:00 (18 Dec 2003) The master's tables were converted to MyISAM. The slaves' /data directories were cleared and replication was started via "LOAD DATA FROM MASTER".

Slaves running: db1, db2

Hardware descriptions:

Master:
------
note: also runs apache
OS: Debian 3.0
kernel: custom built, vanilla 2.4.21 (since then updated to 2.4.23)
filesystem: ext2 (mounted with noatime on the data/ directory)
mysql: mysql-standard-pc-linux-i686 4.0.15 - mysql binary install (since updated to mysql-standard-pc-linux-i686 4.0.17 - mysql binary install)
Dual Xeon 2.4 GHz w/ hyperthreading
RAM: 5 Gig ECC
ICP Vortex SCSI Raid 5

Slave 1: (db1)
-------
OS: Debian 3.0
kernel: custom built, vanilla 2.4.20 (since then updated to 2.4.23)
filesystem: ext2 (atime is enabled.)
mysql: mysql-standard-pc-linux-i686 4.0.17 - mysql binary install
Dual Athlon MP 2000 Plus
RAM: 1.5 Gig Not ECC
Single IDE drive.

Slave 2: (db2)
-------
OS: Debian 3.0
kernel: custom built, vanilla 2.4.23
filesystem: ext2 (atime is enabled.)
mysql: mysql-standard-pc-linux-i686 4.0.17 - mysql binary install
Dual Xeon 2.4 GHz with hyperthreading
RAM: 2 Gig ECC
Single IDE drive.

Slave 3: (iweb1)
-------
OS: Slackware 9.1
kernel: custom built 2.4.22 (with I2C patches applied)
filesystem: reiserfs (mounted with noatime on the data directory.)
mysql: mysql-standard-pc-linux-i686 4.0.16 - mysql binary install
Single Xeon 2.4 GHz on a Dual Motherboard
RAM: 3 Gig ECC
Single IDE drive.

Slave 4: (iweb2)
-------
OS: Slackware 9.1
kernel: custom built 2.4.22 (with I2C patches applied)
filesystem: reiserfs (mounted with noatime on the data directory.)
mysql: mysql-standard-pc-linux-i686 4.0.16 - mysql binary install
Single Xeon 2.4 GHz on a Dual Motherboard
RAM: 3 Gig ECC
Single IDE drive.
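
Editorial sketch (not part of the original comment): when lining up events across several slaves, the two kinds of slave log message quoted in the timeline can be picked out mechanically. The patterns below are based only on those two quoted messages and may not match the wording of other MySQL versions. The function name `classify_slave_log_line` is hypothetical.

```python
import re

# Patterns derived only from the two messages quoted in the timeline;
# other MySQL versions may word these differently.
RECONNECT_RE = re.compile(
    r"error reconnecting to master '(?P<master>[^']*)':.*?errno: (?P<errno>\d+)"
)
FATAL_RE = re.compile(r"Got fatal error (?P<errno>\d+): '(?P<msg>[^']*)'")


def classify_slave_log_line(line):
    """Return (kind, errno) for a recognised line, or None otherwise."""
    m = RECONNECT_RE.search(line)
    if m:
        return ("reconnect", int(m.group("errno")))
    m = FATAL_RE.search(line)
    if m:
        return ("fatal", int(m.group("errno")))
    return None
```

Running each slave's error log through such a function and sorting the results by timestamp would give a per-slave sequence of reconnect failures and fatal replication errors to compare against the time of the master's reboot.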
[14 Feb 2005 22:54]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".