| Bug #175 | RENAME TABLE breaks on replication slave when involved in a MERGE table | ||
|---|---|---|---|
| Submitted: | 21 Mar 2003 8:29 | Modified: | 16 May 2003 14:43 |
| Reporter: | Matt Carothers | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
| Version: | 4.0.12-standard-log | OS: | Linux (Linux 2.4.18-686-smp) |
| Assigned to: | CPU Architecture: | Any | |
[21 Mar 2003 10:46]
Matt Carothers
Update: running a myisamchk repair on the two affected tables corrected the line counts. There was some loss of data on the leases table. # myisamchk -cf leases_20030303 Checking MyISAM file: leases_20030303 Data records: 20030919 Deleted blocks: 0 - check file-size myisamchk: error: Size of datafile is: 596207221 Should be: 620958489 - recovering (with sort) MyISAM-table 'leases_20030303.MYI' Data records: 20030919 - Fixing index 1 - Fixing index 2 - Fixing index 3 - Fixing index 4 Data records: 19232491
[21 Mar 2003 14:34]
Guilhem Bichot
This is simple : the user has a MERGE table and does RENAME TABLE on members of this table. Then, as our manual stronly recommends, he does FLUSH TABLES. So this is fine on the master. But FLUSH TABLES like all FLUSH commands is not replicated to the slave, hence later : rows going to the wrong table, and corruption. Our manual : http://www.mysql.com/documentation/mysql/bychapter/manual_Introduction.html#TODO_MySQL_5.0 "Resolving the issue of RENAME TABLE on a table used in an active MERGE table possibly corrupting the table." Or here : http://www.mysql.com/documentation/mysql/bychapter/manual_Table_types.html#MERGE "You can't do DROP TABLE, ALTER TABLE, DELETE FROM table_name without a WHERE clause, REPAIR TABLE, TRUNCATE TABLE, OPTIMIZE TABLE, or ANALYZE TABLE on any of the table that is mapped by a MERGE table that is "open". If you do this, the MERGE table may still refer to the original table and you will get unexpected results. The easiest way to get around this deficiency is to issue the FLUSH TABLES command, ensuring no MERGE tables remain "open"". So we must either fix RENAME TABLE + MERGE, or propagate FLUSH TABLES to the slave.
[16 May 2003 14:43]
Guilhem Bichot
Thank you for your bug report. This issue has been fixed in the latest
development tree for that product. You can find more information about
accessing our development trees at
http://www.mysql.com/doc/en/Installing_source_tree.html
Fixed in 4.1.1 : starting from this version,
OPTIMIZE, ANALYZE, REPAIR, most FLUSH (including FLUSH TABLES) go into the binlog by default. So the MERGE problem on the slave will disappear.
The master must be 4.1.1 (not released yet) or newer; no version need for the slave.
We fixed it in 4.1, not in 4.0, because this is a behaviour change : in 4.0 and 3.23 such commands never go into the binlog. And 4.0 is frozen : now changes, only bug fixes.
See our manual, section about the changes in release 4.1.1.

Description: We store DHCP lease data in a MyISAM table called "leases". Every 7 days we rotate the data by renaming "leases" to "leases_YYYYMMXX" and creating a new "leases" table. The RENAME and CREATE TABLE commands are executed on a master server (3.23.55) and replicated to a slave (4.0.12). After the commands are executed, more data is written to the (now empty) leases table. On the 4.0.12 slave, the leases table does not show up as empty after it has been recreated. Data written to it seems to go to both the new and old table (they show up as having the same size). The condition persists until we execute a FLUSH TABLES on the slave. At that point, all data written to the leases table since the rotation shows up in the leases_YYYYMMXX table, and the leases table becomes empty. Example: We rotated "leases" to "leases_20030303" last night. More data was written to the new "leases" table after the rotation. On the slave, we see this: mysql> select count(*) from leases; +----------+ | count(*) | +----------+ | 20030919 | +----------+ 1 row in set (0.00 sec) mysql> select count(*) from leases_20030303; +----------+ | count(*) | +----------+ | 20030919 | +----------+ 1 row in set (0.00 sec) After flushing the tables on the slave, we get this: mysql> flush tables; Query OK, 0 rows affected (0.22 sec) mysql> select count(*) from leases_20030303; +----------+ | count(*) | +----------+ | 20030919 | +----------+ 1 row in set (0.00 sec) mysql> select count(*) from leases; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.01 sec) Here's what the correct tables look like on the master: mysql> select count(*) from leases_20030303; +----------+ | count(*) | +----------+ | 19232491 | +----------+ 1 row in set (0.02 sec) mysql> select count(*) from leases; +----------+ | count(*) | +----------+ | 798469 | +----------+ 1 row in set (0.00 sec) How to repeat: This problem is not easily repeatable. It does not occur when only small amounts of data are present in the tables. As you can see from the example in the description, there were 19.2 million lines in the leases table at the time of rotation. 1. Set up a master/slave pair. 2. On the master, create a test table: CREATE TABLE leases ( ip_address_id INT UNSIGNED, cm_mac_id MEDIUMINT UNSIGNED, cpe_mac_id MEDIUMINT UNSIGNED, hostname_id INT UNSIGNED, starts DATETIME, ends DATETIME, KEY (ip_address_id), KEY (cm_mac_id), KEY (cpe_mac_id), UNIQUE KEY (ip_address_id, starts) ) TYPE=MyISAM; 3. Put a lot of data into the table. 4. Simulate our table rotation process with these commands: CREATE TABLE leases_new ( ip_address_id INT UNSIGNED, cm_mac_id MEDIUMINT UNSIGNED, cpe_mac_id MEDIUMINT UNSIGNED, hostname_id INT UNSIGNED, starts DATETIME, ends DATETIME, KEY (ip_address_id), KEY (cm_mac_id), KEY (cpe_mac_id), UNIQUE KEY (ip_address_id, starts) ) TYPE=MyISAM; RENAME TABLE leases TO leases_20030303, leases_new TO leases; FLUSH TABLES; 5. Check the row counts on the slave. Suggested fix: If the caching issue can't be resolved, perhaps the FLUSH TABLES command could be replicated to the slave?