Bug #46732 | Binary data from popular Wiki system crashes all nodes | ||
---|---|---|---|
Submitted: | 14 Aug 2009 16:41 | Modified: | 21 Aug 2009 17:28 |
Reporter: | Clint Alexander | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | mysql-5.1-ndb-7.0 | OS: | Linux |
Assigned to: | | CPU Architecture: | Any |
Tags: | BINARY, cluster, mediawiki, mysql-5.1.34 ndb-7.0.6, ndbd, objectcache, wiki |
[14 Aug 2009 16:41]
Clint Alexander
[14 Aug 2009 16:43]
Clint Alexander
Binlog data that causes a total cluster crash when imported
Attachment: binary_crash.sql.gz (application/x-gzip, text), 1.15 KiB.
[14 Aug 2009 16:48]
Clint Alexander
Additional note... I created the binlog copy with the following command: mysqlbinlog -v -v --start-position=6972392 --stop-position=6972782 /var/lib/mysql/mysql.000013 > binary_crash.sql
[15 Aug 2009 19:02]
Clint Alexander
Updated severity to Serious since this only happens with certain (uncommon?) binary queries. Otherwise it would be Critical.
[17 Aug 2009 13:12]
Jørgen Austvik
Please attach logs and error logs http://dev.mysql.com/doc/mysql-cluster-excerpt/5.1/en/mysql-cluster-programs-ndb-error-rep...
[17 Aug 2009 14:14]
Clint Alexander
I have since cleaned up with an --initial. I was skipping this table in replication so it wouldn't crash again, but I have turned it back on so that the error occurs in the original configuration under normal operation (instead of just by importing the binary crash file). As soon as it crashes, I'll generate the report and attach it. Stay tuned...
[18 Aug 2009 11:26]
Hartmut Holzgraefe
I can't reproduce the problem with the provided binary_crash.sql file. First of all, the log entries in the file refer to a table `objectcache` in the database `manual`, not `wiki`, even though the comments in the file say `wiki.objectcache`. Since values like 'wiki:messageslock' in the comments also change to e.g. 'manual:messageslock' when decoding the actual BINLOG data strings, I assume you did a 'manual' -> 'wiki' search & replace on the file? After creating the `objectcache` table in the `manual` database I can replay the binlog statements, but none of them seems to have any effect. The table is still empty after replaying the log, even though none of the BINLOG statements produces any warnings or errors, and all nodes are still alive at this point.
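A minimal sketch of why a plain search & replace on a mysqlbinlog dump only touches the human-readable comments: the schema and table names also live inside the base64-encoded BINLOG string, which a text replace does not reach. The payload layout and the dump text below are simplified, made-up stand-ins (not real mysqlbinlog output), and `decode_binlog_payload` is a hypothetical helper.

```python
import base64

# Hypothetical stand-in for the bytes inside a row event: the schema and
# table names are stored in the binary payload itself (layout simplified).
payload = b"\x06manual\x00\x0bobjectcache\x00"

# Simplified imitation of a verbose dump: a base64 BINLOG string followed
# by a human-readable "###" comment. This is NOT real tool output.
dump = (
    "BINLOG '\n"
    + base64.b64encode(payload).decode("ascii")
    + "\n'/*!*/;\n"
    + "### INSERT INTO manual.objectcache\n"
)


def decode_binlog_payload(text):
    """Return the decoded bytes of the first BINLOG '...' string in text."""
    start = text.index("BINLOG '") + len("BINLOG '")
    end = text.index("'", start)
    # b64decode ignores the surrounding newlines by default.
    return base64.b64decode(text[start:end])


# The same kind of edit described in this thread: a plain string replace.
edited = dump.replace("manual", "wiki")

# The comment line now says `wiki`...
print("### INSERT INTO wiki.objectcache" in edited)
# ...but the encoded payload that is actually replayed still names `manual`.
print(b"manual" in decode_binlog_payload(edited))
```

In this toy case the replayed data still targets `manual`, which matches the mismatch between the comments and the decoded BINLOG strings noted above.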
[18 Aug 2009 20:43]
Clint Alexander
I apologize for this. When I was duplicating the error to make sure the import did what I said it would, I did not want it to insert into the original database, so I created a new one and, as you guessed, did a string replace, hoping that would cover it. I'm not a master of the binary logs (yet). However, I'm not sure why this did not reproduce in your testing environment, as it continues to reproduce in mine. I'm still waiting for this to happen again on my network while running under normal activity and configuration. It had been happening at least once every 24 hours, but it has not happened again yet. I am still monitoring and waiting. I'll try to provide a much better recreation method in the next posting, and I apologize for this one not producing what I intended. It's a little embarrassing, but I'll get over it. :) //Clint
[21 Aug 2009 17:28]
Clint Alexander
I have not had this happen since we began the second monitoring session. I am closing this ticket, and if the problem comes up again, we can reopen it (or refer to it) in a subsequent report. Again, I apologize for the open-ended report. //Clint