Bug #65787 rename table not atomic across all nodes.
Submitted: 2 Jul 2012 18:11 Modified: 7 Jul 2012 15:30
Reporter: Jon Duggin Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:5.5.20-ndb-7.2.5-gpl OS:Linux (2.6.32-5-amd64)
Assigned to: CPU Architecture:Any
Tags: ndb, rename table

[2 Jul 2012 18:11] Jon Duggin
Description:
When swapping a staging table into production, the rename is not atomic across all nodes.  The sql node that the query originates from is atomic, however when the rename is broadcasted to all other sql nodes, they are not atomic and cause subsequent lookups to error with "table doesn't exist" during the duration of the table swap.

How to repeat:
This works and is the expected effect, only works when everything is happening on the same node:

# ndb sql node sql01

#1) create tables on the ndb cluster
sql01-mysql> create table foo (bar varchar(100)) engine=ndb;
Query OK, 0 rows affected (0.79 sec)

sql01-mysql> create table foo_stg (bar varchar(100)) engine=ndb;
Query OK, 0 rows affected (0.74 sec)

#2) run on sql01
sql01:~$ while [ true ]; do mysql -h sql01 -e 'select * from foo'; done

#3) in a separate session with the loop continuing to run: swap tables on sql01
sql01-mysql> rename table foo to foo_trash, foo_stg to foo;

# The output from step 2 does not have any errors. Everything is working fine.

As soon as we move the select statements to occur on a second sql node, we run into the problem:
2 ndb sql nodes (sql01 and sql02) running in multiple sessions

#1) create tables on the ndb cluster
sql01-mysql> create table foo (bar varchar(100)) engine=ndb;
Query OK, 0 rows affected (0.79 sec)

sql01-mysql> create table foo_stg (bar varchar(100)) engine=ndb;
Query OK, 0 rows affected (0.74 sec)

#2) run on sql02
sql02:~$ while [ true ]; do mysql -h sql02 -e 'select * from foo'; done

#3) in a separate session with the loop continuing to run: swap tables on sql01
sql01-mysql> rename table foo to foo_trash, foo_stg to foo;

# The output from step 2 shows error messages stating the table no longer exists, for the duration of the table swap. This is the problem.

ERROR 1146 (42S02) at line 1: Table 'foo' doesn't exist
ERROR 1146 (42S02) at line 1: Table 'foo' doesn't exist
ERROR 1146 (42S02) at line 1: Table 'foo' doesn't exist
ERROR 1146 (42S02) at line 1: Table 'foo' doesn't exist
ERROR 1146 (42S02) at line 1: Table 'foo' doesn't exist
[2 Jul 2012 21:23] Hartmut Holzgraefe
Expected behavior aka. known limitation, see "No distributed table locks" on

http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-limitations-multiple-nodes.html