Bug #105098 | Node 1 constantly reports error 1204 | ||
---|---|---|---|
Submitted: | 1 Oct 2021 10:12 | Modified: | 19 Nov 2021 15:10 |
Reporter: | Mikael Ronström | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | 8.0.26 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[1 Oct 2021 10:12]
Mikael Ronström
[1 Oct 2021 12:23]
MySQL Verification Team
Hi Mikael, thanks for the report. All the best, Bogdan
[19 Nov 2021 14:23]
Mauritz Sundell
Posted by developer:

Conditions for the bug:
* using 8.0.23 or newer
* using 3 or 4 replicas
* one data node has node ID 1

A rolling restart under load should be enough to hit it. SQL queries will show Warning: Got temporary error 1204 'Temporary failure, distribution changed' from NDB.

Note that not every occurrence of error 1204 indicates that this bug has been hit; the warning typically shows up temporarily during node restarts. When this bug is hit, however, the condition in data node 1 is not temporary, and as a workaround data node 1 can be restarted.

Also note that if no node in the same node group as node ID 1 is down when queries fail, this bug is not the cause of the failed queries.

This bug can also affect any operation whose implementation uses NDB tables, such as DDL, auto-increment, backup, binary logging, and replication. Check whether error 1204 shows up in SHOW WARNINGS after the failed command.

A more proactive workaround is to always restart data node 1 whenever any other node in the same node group has restarted.
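
The conditions above can be turned into a quick triage check. The following is only a sketch: the function names and the inputs (`nodegroup_of`, `down_nodes`, `warnings`) are hypothetical and would have to be filled in by hand from the cluster status and from the SHOW WARNINGS output. The upper version bound assumes the fix shipped in NDB 8.0.28, as stated in the changelog entry in this report.

```python
import re


def parse_version(text):
    """Turn '8.0.26' into (8, 0, 26) so versions compare as tuples."""
    return tuple(int(part) for part in text.split("."))


def has_error_1204(warnings):
    """True if any SHOW WARNINGS message mentions error 1204."""
    return any(re.search(r"\b1204\b", message) for message in warnings)


def bug_105098_possible(version, replicas, nodegroup_of, down_nodes, warnings):
    """Check whether a failed query matches the conditions for this bug.

    nodegroup_of: dict mapping data node id -> node group id (hypothetical input)
    down_nodes:   data node ids that were down when the query failed
    warnings:     message strings copied from SHOW WARNINGS
    """
    # Affected releases: 8.0.23 up to, but not including, 8.0.28 (assumed fix release).
    if not ((8, 0, 23) <= parse_version(version) < (8, 0, 28)):
        return False
    if replicas not in (3, 4):
        return False
    if 1 not in nodegroup_of:
        return False
    if not has_error_1204(warnings):
        return False
    # Some other node in node 1's node group must be down when the query fails.
    peers = {n for n, ng in nodegroup_of.items() if ng == nodegroup_of[1] and n != 1}
    return bool(peers & set(down_nodes))
```

A `True` result only means the failure is consistent with this bug; error 1204 during an ongoing node restart can still be the ordinary temporary condition.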
[19 Nov 2021 14:25]
Mauritz Sundell
Posted by developer:

How the bug works:
* While node 1 is alive, another node in the same node group restarts, which changes the distribution key on fragments.
* Still while node 1 is alive, yet another node in the same node group stops, such that node 1 becomes primary for some fragment.
* A request by key is then made against node 1 on such a fragment.

SQL queries will typically fail with error 1297, and SHOW WARNINGS will reveal error 1204.

Restarting data node 1 makes it fetch the correct distribution keys from the other nodes during start-up.
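
The sequence can be illustrated with a toy model. This is not NDB code and makes simplifying assumptions that are not in the report: a single fragment, one integer distribution key, a `ToyNodeGroup` class invented for illustration, and "lowest alive node ID" as the rule for picking a new primary. It only shows how a stale key on node 1 turns into a persistent 1204 once node 1 becomes primary.

```python
class ToyNodeGroup:
    """Toy model of one fragment in one node group. Not NDB source code."""

    def __init__(self, node_ids, primary):
        self.alive = set(node_ids)
        self.primary = primary
        self.cluster_key = 0                        # key that requests are sent with
        self.local_key = {n: 0 for n in node_ids}   # each node's own copy of the key

    def restart(self, node_id):
        """A node restart bumps the distribution key."""
        self.cluster_key += 1
        for n in self.alive:
            if n != 1:  # the bug: node 1 never receives the update
                self.local_key[n] = self.cluster_key
        # The restarting node gets the current key from the others at start-up.
        self.local_key[node_id] = self.cluster_key
        self.alive.add(node_id)

    def stop(self, node_id):
        """Stop a node; if it was primary, pick a new one (toy rule: lowest id)."""
        self.alive.discard(node_id)
        if node_id == self.primary:
            self.primary = min(self.alive)

    def request_by_key(self):
        """Return 0 on success, 1204 if the primary's key copy is stale."""
        if self.local_key[self.primary] != self.cluster_key:
            return 1204  # 'Temporary failure, distribution changed'
        return 0


group = ToyNodeGroup([1, 2, 3], primary=3)
group.restart(2)                 # key changes, node 1 keeps the old value
first = group.request_by_key()   # node 3 is still primary
group.stop(3)                    # node 1 becomes primary
second = group.request_by_key()  # fails until node 1 is restarted
group.restart(1)                 # node 1 picks up the correct key
third = group.request_by_key()
print(first, second, third)
```

In this toy the failure persists for as long as node 1 stays up with the stale key, which matches the "not a temporary condition" behaviour described for the real bug.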
[19 Nov 2021 15:10]
Jon Stephens
Documented fix as follows in the NDB 8.0.28 changelog:

Following improvements in LDM handling made in NDB 8.0.23, an UPDATE_FRAG_DIST_KEY_ORD signal was never sent when needed to a data node using 1 as its node ID. When running the cluster with 3 or 4 replicas and another node in the same node group restarted, this could result in SQL statements being rejected with error 1297 and, subsequently, SHOW WARNINGS reporting error 1204.

NOTE: Prior to upgrading to this release, you can work around the issue by restarting data node 1 whenever any other node in the same node group has been restarted.

Closed.
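
For clusters that cannot be upgraded yet, the workaround in the note could be scripted along these lines. This is a minimal sketch: `workaround_command` and the `nodegroup_of` mapping are hypothetical names, and the returned command assumes that `ndb_mgm` is on the path and can reach the management server with its default connection settings (otherwise a connection string has to be added).

```python
def workaround_command(restarted_node, nodegroup_of):
    """Return the ndb_mgm invocation that restarts data node 1, or None.

    restarted_node: id of the data node that has just been restarted
    nodegroup_of:   dict mapping data node id -> node group id (hypothetical input)
    """
    # Nothing to do if node 1 itself restarted or there is no node with id 1.
    if restarted_node == 1 or 1 not in nodegroup_of:
        return None
    # Only restarts in node 1's own node group matter.
    if nodegroup_of.get(restarted_node) != nodegroup_of[1]:
        return None
    # "1 RESTART" is the management client command to restart node 1.
    return ["ndb_mgm", "-e", "1 RESTART"]


print(workaround_command(2, {1: 0, 2: 0, 3: 0}))
```

The list form can be passed to `subprocess.run` as is; the sketch deliberately only builds the command so that an operator can review it before anything is executed.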