Bug #91161 | NDB crash on Check createTableVersion == lqhCreateTableVersion || lqhCreateTable | ||
---|---|---|---|
Submitted: | 6 Jun 2018 14:31 | Modified: | 13 Jun 2018 3:45 |
Reporter: | Daniël van Eeden (OCA) | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | 7.6.6 | OS: | Any |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
[6 Jun 2018 14:31]
Daniël van Eeden
[11 Jun 2018 13:20]
MySQL Verification Team
Hi Daniel,

> Happened on startup of a cluster for a 7.6.4 to 7.6.6 upgrade.

How was the upgrade performed? This looks like it happened during "START BACKUP" and not during the upgrade?

all best
Bogdan
[12 Jun 2018 20:18]
Daniël van Eeden
I started to do a rolling upgrade. From memory: I started with node 48, which went ok. Then I did node 47, which crashed (maybe Bug #90606). Then I did node 46, which went ok. Then replication to the cluster stopped (see Bug #91160). Then I did a full cluster shutdown and start. I think that was when this crash happened on node 26. I think I had to start node 47 and one other node (26?) with --initial to get the cluster 100% healthy again.
[13 Jun 2018 3:45]
MySQL Verification Team
Hi,

Thanks for the explanation. The new code has some additional logging, so if this happens again (with newer code) we'll have more info to go on. With the info we have now, there's nothing we can do to find out why it happened, and I can't reproduce this using your flow.

all best
Bogdan
[14 Jun 2018 5:29]
Mikael Ronström
This can happen in the following scenario:
1) Node X stops
2) Drop table
3) Cluster restart including start of node X
4) Create table
5) Crash

The reason is that the drop table isn't performed in node X as part of the cluster restart. This leaves LCP files from the old table behind, and when the table id is reused by a new table we discover an anomaly in the LCP execution code.

We will do two solutions to this problem. The short-term solution is to not crash in this situation, but simply remove the old files and create new ones. The long-term solution is to ensure that cluster restarts properly clean up old dropped tables even when the node was down while the drop table occurred.
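The scenario can be replayed in a toy model to show where the stale state comes from. This is only an illustrative sketch, not NDB code: the `Node` class, the `lcp_files` mapping, the `tolerate_stale` flag and the table id 7 are all hypothetical names chosen for the example. It assumes the anomaly is a mismatch between the create-table version recorded with the leftover LCP files and the version of the new table that reuses the id.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy data node: maps table id -> create-table version of its LCP files."""
    lcp_files: dict = field(default_factory=dict)
    up: bool = True

    def drop_table(self, table_id):
        # A node that is down never sees the drop, so its files stay behind.
        if self.up:
            self.lcp_files.pop(table_id, None)

    def create_table(self, table_id, version, tolerate_stale=True):
        stale = self.lcp_files.get(table_id)
        if stale is not None and stale != version:
            if not tolerate_stale:
                # Stand-in for the failed consistency check (the crash).
                raise RuntimeError(
                    f"table {table_id}: LCP files have version {stale}, "
                    f"new table has version {version}"
                )
            # Short-term fix as described: discard the old files.
            del self.lcp_files[table_id]
        self.lcp_files[table_id] = version


def replay_scenario(tolerate_stale):
    node_x = Node()
    node_x.create_table(7, version=1)   # original table
    node_x.up = False                   # 1) Node X stops
    node_x.drop_table(7)                # 2) Drop table (missed by node X)
    node_x.up = True                    # 3) Cluster restart including node X
    # 4) Create table reusing id 7; 5) crash unless stale files are tolerated
    node_x.create_table(7, version=2, tolerate_stale=tolerate_stale)
    return node_x.lcp_files
```

With `tolerate_stale=False` the replay raises at step 4, mirroring the crash. With `tolerate_stale=True` the leftover entry is replaced, which corresponds to the short-term solution. The long-term solution would instead remove the entry during the restart in step 3.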