MySQL Bugs: #352: Slave loses temp tables when it is shutdown/restarted

Bug #352	Slave loses temp tables when it is shutdown/restarted
Submitted:	30 Apr 2003 14:31	Modified:	1 Dec 2008 19:53
Reporter:	Guilhem Bichot	Email Updates:
Status:	Won't fix	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	4.0, 5.0.67, 5.1	OS:	Any (all)
Assigned to:	Mats Kindahl	CPU Architecture:	Any

Description:
When the slave is replicating some temp tables, they survive STOP SLAVE and START SLAVE, which is good. But they don't survive if the slave mysqld is shutdown and restarted. They are deleted as soon as mysqld restarts, even before starting the slave threads. This makes replication stop with error.

Our manual is wrong:

The following table is about problems in 3.23 that are fixed in 4.0:
<cut>
* In 3.23 temporary tables are replicated with the exception of the case when you shut down slave server (not just slave thread) when you have some temporary tables open and they are used in subsequent updates. To deal with this problem shutting down the slave, do SLAVE STOP, check Slave_open_temp_tables variable to see if it is 0, then issue mysqladmin shutdown. If the number is not 0, restart the slave thread with SLAVE START and see if you have better luck next time.

This is the present bug in 4.0 and 4.1!

How to repeat:
MASTER> create temporary table zo(a int);
Query OK, 0 rows affected (0.01 sec)

Leave this connection open.

Then start replication on the slave. Then shutdown the slave and restart
it with --skip-slave-start. Check that Slave_open_tables is 0 instead of 1,
which is bad. Check that the temp tables have disappeared from tmpdir, which
is bad too.

Then do

MASTER> insert into zo values(10);
Query OK, 1 row affected (0.00 sec)
and see:
030430 22:08:15 Slave I/O thread: connected to master 'root@localhost:3306', replication started in log 'gbichot-bin.000001' at position 288
030430 22:08:15 next log './gbichot-relay-bin.000003' is currently active
ERROR: 1146 Table 'test.zo' doesn't exist
030430 22:08:25 Slave: error 'Table 'test.zo' doesn't exist' on query 'insert into zo values(10)', error_code=1146
030430 22:08:25 Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'gbichot-bin.000001' position 288

Suggested fix:
mysqld deletes files from tmpdir at startup:
init_server_components() calls table_cache_init() calls mysql_rm_tmp_tables()
which deletes all tmpdir/#sql* files.

General problem: it is probably a bad idea to put the slave thread's temp tables
in tmpdir (like normal temp tables), because the first need to be persistent, while the second needn't. For example tmpdir can be mapped to memory (like tmpfs) for better performance; this is good for temp tables but not for slave temp tables (because they would be lost at machine restart) (note this is not a super strong problem, because mysqld has options by itself to create memory-based tmp tables, so using tmpfs is not really necessary). And mysqld cleans tmpdir when it starts, while it should not clean slave temp tables. Finally, not cautious users could have a cron which periodically empties /tmp.

4 suggestions:

1st suggestion (favorite): keep everything like it is, except that give the slave temp tables such names that mysqld does not purge them at startup. Good but does not protect from cron empties /tmp.

2nd suggestion: make the slave thread create its temp tables in the database's directory. This solves the persistence problem. But we lose the performance benefit of using tmpdir, with its multiple round-robin directories. So we could end up with a fast master and slow late slave.

3rd suggestion: keep using tmpdir, but when the slave mysqld is shutting down,
move the slave temp tables to some safer place (the database's directory). And at restart, move them back again. Of course, it case of brutal shutdown this does not work, but anyway the slave does not cope well with this already (tables and master.info / relay-log.info are not necessarily in sync).

4th suggestion: rename --slave-load-tmpdir to --slave-tmpdir and use it to store these persistent objects:
- slave LOAD DATA INFILE temp files (like now)
- slave temp tables
- not temp files used for filesort (because they needn't be persistent), these should go in tmpdir.
And we should forbid slave-tmpdir == tmpdir because this would enable the bug again. Which means slave-tmpdir would become a mandatory option to start replication (like server-id is). slave-tmpdir could be multiple and round-robin like tmpdir.

I like 1st. If the user has a cron that empties /tmp, it's his problem, he should either disable his cron task or set tmpdir to another directory.

This is harder than just changing tmp_file_prefix for slave tables.
The hardest is that at restart time the slave tables are not
in the table cache anymore, so the slave does not see them, even
if they exist on disk.
The solution must also work for InnoDB temporary tables and BDB.
Let's ask Heikki about temp tables in InnoDB.

This bug will probably not be fixed soon.

Note that we don't even know the name of the table we discover in tmpdir
at slave server restart!!
We don't know that this #sql321.frm file corresponds to a former temp table
called `mytable`.
In the future things will be easier because we will have the CREATE TABLE statement stored in the .frm file.
Maybe we should wait until that moment? Then things will be easy for MyISAM:
open the #sql.frm file, read the name; re-enter it in the table cache. We will however need the master thread id who originally created this table (maybe we can store it in the temporary file's name).
With InnoDB, Heikki has a way to recover a lost temp table in the InnoDB datafile; things will be tougher but possible.

Here is a way to get a lost temp table back in InnoDB
(if mysqld stops without dropping a temp table, this temp table remains in the InnoDB datafile) :

> If you have an orphaned table #sql... inside the tablespace, then by calling
> 
> CREATE TABLE `rsql..._recover_innodb_tmp_table`(...) type = innodb;
> 
> where the table definition is similar to the temporary table, you can make
> InnoDB to rename the orphaned table to `rsql..._recover_innodb_tmp_table`.
> Then you can dump and drop the renamed table. The backquotes around the
> table name are needed because a temporary table name contains the character
> '-'.

So we need the CREATE TABLE statement for the temp table. Again, this will be solved when we write the CREATE TABLE statement to the frm.

Note that this bug is about disk-based tmp tables, not HEAP tables (for which there is no obvious solution).
Two problems:
1- the table files must not be deleted at startup
2- at startup, the slave server must have a way to know which thread on which server id created this table.
1- is easy: we can simply create the table in the database directory. Distinguish it from permanent tables using a prefix.
2- we have to make it fast enough in all cases. First include the original server id and thread id in the file name (easy). Then when a slave thread starts, it could scan all databases' directories in search for tmp tables. But which tmp tables is it looking for? Depends on the log event (think of a binlog which was generated by circular replication: it contains events of different server ids). I suggest this:
* in init_slave() (some code which is executed only at server startup), scan and collect all tmp tables. Put them in a hash of hash like this:
server_id => thread_id => list_of_tables.
Also the "server id" keys have a flag "already_used" with them, which is initially 0.
* When any slave thread reads an event which is marked LOG_EVENT_THREAD_SPECIFIC_F, look in the hash for this event's server id. If it finds it and its "already_used" is 0, grab the tmp tables for itself, and set "already_used" to 1. This way these tables are its, and only its, because it is responsible for this server id (no other slave thread should have events of this server id, or you have a strange multimaster setup prone to conflicts).
In this method we scan the directories only once at startup, which is efficient.
Note: some people have 50,000 databases in a server. So we may need an option --skip-slave-scan-for-tmp-tables because the initial scan may take a long time. In that case, users should always be sure that Slave_open_temp_tables == 0 before shutdown.
Another different approach: as one day we will need to store info about prepared statements to some file (MyISAM table in the mysql database) so that prepared statements survives a slave shutdown, we could also store info about tmp tables the same way, with columns:
database_name
table_name
server_id
thread_id
frm_file_name
This solution has the advantage of being symmetric of the one for prepared statements.

For the moment, this is documented in the "Replication Features and Known Problems" section of our manual; as a workaround you have to check that the Slave_open_tables variable is 0 after doing STOP SLAVE and before shutting down the slave.

Bug still exists in version 5.0.67

Same error with 5.1 if use statement-based replication. With row-based problem is solved.