Bug #1649 Race condition with Temporary tables during Alters
Submitted: 24 Oct 2003 15:21 Modified: 1 Dec 2003 13:37
Reporter: Dathan Pattishall Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.0.15 OS:Linux (Linux ef248.friendfinderinc.com)
Assigned to: CPU Architecture:Any

[24 Oct 2003 15:21] Dathan Pattishall
Description:
*************************** 1. row ***************************
          Master_Host: 10.2.12.224
          Master_User: rep
          Master_Port: 3306
        Connect_retry: 60
      Master_Log_File: ef224-bin.020
  Read_Master_Log_Pos: 409223612
       Relay_Log_File: ef242-relay-bin.016
        Relay_Log_Pos: 330063332
Relay_Master_Log_File: ef224-bin.020
     Slave_IO_Running: Yes
    Slave_SQL_Running: No
      Replicate_do_db: 
  Replicate_ignore_db: 
           Last_errno: 25
           Last_error: Error 'Can't create symlink './ff_recent/#sql-17f7_c.MYI' pointing at '/var/lib/mysql/ff_recent/#sql-17f7_c.MYI' (Error 17)' on query 'ALTER TABLE recentmembers_20 add have_children tinyint unsigned not NULL default 0'. Default database: 'ff_recent'
         Skip_counter: 0
  Exec_master_log_pos: 348632736
      Relay_log_space: 390689457

Perror says 
Error code  17:  File exists
 
I have 2 alters submitted on the master in sequential order. On the slave it executes in the same order but produces error 17. Somehow the tempfile already exists, but when I go to the datadir this temp file is not displayed in the directory. 
 
Performing a slave stop slave start does not work.
Restarting the mysql server on the slave is the only method I found to recover from the error.
 
 
Version of mySQL: 4.0.15
Kernel: Linux ef242.friendfinderinc.com 2.4.18-4smp #1 SMP Sun Apr 14 13:50:35 PDT 2002 i686 unknown 
*PATCHED*

How to repeat:
Apply 2 alters on a master in sequentail order across 53 tables with about 600K rows in each.

Wait for the slave to throw the error that the temporary table exists and can't be symlinked

Suggested fix:

Don't symlink the tables

unlink before next execution

use a larger random temporary file name not based on the table name.
[1 Dec 2003 13:21] Dean Ellis
I cannot repeat this with the latest sources from the 4.0.17 BitKeeper repository.    You might try testing it with a newer release.

Thank you.
[1 Dec 2003 13:37] Guilhem Bichot
Sorry Dean; I discussed this bug on mysql@lists and bugs@lists where it was originally reported, and did not notice that it had been entered here as well.
Below I paste the main part of the discussion. The conclusion can be summed up like this: "it's not a bug, but our doc should clearly state that it's not possible to have synonyms with symlinks" (I have fixed the doc since). Sorry again for making you waste some time :(

From: 	Guilhem Bichot <guilhem@mysql.com>
To: 	Dathan Vance Pattishall <dathan@friendfinderinc.com>
Cc: 	mysql@lists.mysql.com, bugs@lists.mysql.com, dlenev@mysql.com, sinisa@mysql.com
Subject: 	RE: reproducible error 17
Date: 	28 Oct 2003 00:15:04 +0100
> -->Are you using symlinks to place tables or databases at other
> locations
> -->than the datadir?
> Yes, in the db directory under the datadir. I have tables which are one
> name symlinked to the tables which where altered. These fake tables or
> tables which are synonyms to the src tables did not have any actions
> performed on them at this moment in time.
> 
> For example 
> 
> recent_members.frm -> /var/lib/mysql/ffadult_recent/recentmembers.frm
> recent_members.MYI -> /var/lib/mysql/ffadult_recent/recentmembers.MYI
> recent_members.MYD -> /var/lib/mysql/ffadult_recent/recentmembers.MYD
> 
> The tables altered where recentmembers(_*)

I have run some tests and what triggers the problems is this symlinking.
If I understood you correctly, your setup is analogous to mine which is:
- the datadir of my MySQL server is /m/data/4/1/
- in this I have a database called test: /m/data/4/1/test/
- I have this in /m/data/4/1/test/ :
lrwxrwxrwx    1 guilhem  qq             24 Oct 27 23:25 tbl_.frm ->
/m/data/4/1/test/tbl.frm
-rw-rw----    1 guilhem  qq           8620 Oct 27 23:30 tbl.frm
lrwxrwxrwx    1 guilhem  qq             24 Oct 27 23:26 tbl_.MYD ->
/m/data/4/1/test/tbl.MYD
-rw-rw----    1 guilhem  qq             84 Oct 27 23:30 tbl.MYD
lrwxrwxrwx    1 guilhem  qq             24 Oct 27 23:26 tbl_.MYI ->
/m/data/4/1/test/tbl.MYI
-rw-rw----    1 guilhem  qq           1024 Oct 27 23:30 tbl.MYI

("tbl_" is a "synonym" for the real "tbl" table).

On my master (no replication) I got:

MASTER> flush tables;
Query OK, 0 rows affected (0.00 sec)

MASTER> desc tbl;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| a     | int(11) | YES  |     | NULL    |       |
| b     | int(11) | YES  |     | NULL    |       |
| c     | int(11) | YES  |     | NULL    |       |
| d     | int(11) | YES  |     | NULL    |       |
| e     | int(11) | YES  |     | NULL    |       |
| f     | int(11) | YES  |     | NULL    |       |
| g     | int(11) | YES  |     | NULL    |       |
| h     | int(11) | YES  |     | NULL    |       |
| i     | int(11) | YES  |     | NULL    |       |
| k     | int(11) | YES  |     | NULL    |       |
| l     | int(11) | YES  |     | NULL    |       |
| m     | int(11) | YES  |     | NULL    |       |
| n     | int(11) | YES  |     | NULL    |       |
| o     | int(11) | YES  |     | NULL    |       |
| p     | int(11) | YES  |     | NULL    |       |
| q     | int(11) | YES  |     | NULL    |       |
| r     | int(11) | YES  |     | NULL    |       |
| s     | int(11) | YES  |     | NULL    |       |
+-------+---------+------+-----+---------+-------+
18 rows in set (0.00 sec)

MASTER> flush tables;
Query OK, 0 rows affected (0.00 sec)

MASTER> alter table tbl drop s;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

This show that altering the real table is ok.
Now we use the synonym (the symbolic link) instead:

MASTER> desc tbl_;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| a     | int(11) | YES  |     | NULL    |       |
| b     | int(11) | YES  |     | NULL    |       |
| c     | int(11) | YES  |     | NULL    |       |
| d     | int(11) | YES  |     | NULL    |       |
| e     | int(11) | YES  |     | NULL    |       |
| f     | int(11) | YES  |     | NULL    |       |
| g     | int(11) | YES  |     | NULL    |       |
| h     | int(11) | YES  |     | NULL    |       |
| i     | int(11) | YES  |     | NULL    |       |
| k     | int(11) | YES  |     | NULL    |       |
| l     | int(11) | YES  |     | NULL    |       |
| m     | int(11) | YES  |     | NULL    |       |
| n     | int(11) | YES  |     | NULL    |       |
| o     | int(11) | YES  |     | NULL    |       |
| p     | int(11) | YES  |     | NULL    |       |
| q     | int(11) | YES  |     | NULL    |       |
| r     | int(11) | YES  |     | NULL    |       |
+-------+---------+------+-----+---------+-------+
17 rows in set (0.00 sec)

MASTER> alter table tbl_ drop r;
ERROR 25: Can't create symlink './test/#sql-1aa7_4.MYI' pointing at
'/m/data/4/1/test/#sql-1aa7_4.MYI' (Error 17)

MASTER> alter table tbl_ drop r;
ERROR 25: Can't create symlink './test/#sql-1aa7_4.MYI' pointing at
'/m/data/4/1/test/#sql-1aa7_4.MYI' (Error 17)

This is the same error as you: it's impossible to alter the synonym.

What happens is that "ALTER TABLE tbl_ ...":
- Creates temporary files (named #sql*) in the same place as the tbl_
files, which is ./ (which is also /m/data/4/1/test btw). Those temporary
files are the result of the ALTER TABLE (the new frm/MYD/MYI files).
- Notices that the tbl_ files are in fact symlinks to files in
/m/data/4/1/test; so it wants to move the temporary files from ./ to
/m/data/4/1/test (ALTER wants, in some way, to preserve the symbolic
links). As the 2 directories are the same, the move fails.
Note that this error occured often on my system but not always,
depending on which table was accessed first (see below), which could
explain why you got it on slave only (on slave all queries are performed
by one unique thread).

Here is another problem:
MASTER> flush tables;
Query OK, 0 rows affected (0.00 sec)

MASTER> alter table tbl_ add t int;
ERROR 25: Can't create symlink './test/#sql-1aa7_4.MYI' pointing at '/m/data/4/1/test/#sql-1aa7_4.MYI' (Error 17)
MASTER> alter table tbl add t int;
ERROR 25: Can't create symlink './test/#sql-1aa7_4.MYI' pointing at '/m/data/4/1/test/#sql-1aa7_4.MYI' (Error 17)

you see, now even the alter on the real table fails.
This is because in the MySQL table cache, the real and synonym are
considered one same table; as MySQL first opened the synonym, it also
uses the synonym when the real table is invoked ("it's the same table"),
leading to the error for the real table too. If now I empty the table
cache:

MASTER> flush tables;
Query OK, 0 rows affected (0.00 sec)

MASTER> alter table tbl add t int;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

it works.

So the conclusion is: unfortunately, the symlink support in MySQL was
not designed for "synonyming", as far as DDL (Data Definition Language -
CREATE TABLE / DROP TABLE / ALTER TABLE) commands are concerned. It was
designed with the thought that symlinks are to be used to point to a
*different* directory (another partition where there is more room, or
another device to balance disk load). For DDL commands MySQL always
expects a table to exist only once, i.e. to have only one name. Putting,
in the database directory, a symlink and the real table means giving 2
names to one table...

I will add a note about this into our manual soon. I understand this is
is an inconvenience for you; you will be safe if you always do the DDL
commands (ALTER TABLE, in your case) on the real table. It's ok to do
DML commands (INSERT/DELETE/UPDATE/LOADDATA, which fortunately occur
much more often than ALTER TABLE normally) on both tables indifferently.

Guilhem