Description:
On slave SQL thread, a collation retrieved from binlog event from master is used for a database collation if no database is set as a current database. It will cause a collation/charset mismatch for LOAD DATA command on master and slave. IMHO, LOAD DATA should warn it's unsafe if following conditions are met:
* no current database is set.
* binlog format is statement.
* no explicit character set is set.
How to repeat:
* Setup a replication
* Connect to a master without current db
* Create table and populate it, then dump it to a file
mysql> CREATE DATABASE test;
mysql> CREATE TABLE test.t (a SERIAL, b VARCHAR(100)) CHARACTER SET utf8mb4;
mysql> INSERT INTO test.t (b) VALUES('こんにちは');
mysql> SELECT * INTO OUTFILE 't.txt' FROM test.t;
mysql> TRUNCATE test.t;
* Change server collation temporarily
mysql> SET collation_server = cp932_japanese_ci;
mysql> SET collation_database = utf8mb4_general_ci;
I set collation_database just for explanation here. I know it's deprecated. If it's not set, global level collation_database is used instead.
* Set binlog format to statement
mysql> SET binlog_format = statement;
* Execute LOAD DATA
mysql> LOAD DATA INFILE 't.txt' INTO TABLE test.t;
Now, you will find that the slave is stopped due to an charset error.
Suggested fix:
In sql/sql_db.cc:
bool load_db_opt(THD *thd, const char *path, HA_CREATE_INFO *create)
{
File file;
char buf[256];
DBUG_ENTER("load_db_opt");
bool error=1;
uint nbytes;
memset(create, 0, sizeof(*create));
create->default_table_charset= thd->variables.collation_server; <====
This is a source of the problem. Setting collation server as table's default character set isn't follow a usual behavior.