Bug #21583 Federated table returns broken strings.
Submitted: 11 Aug 2006 13:33 Modified: 29 Feb 2008 12:10
Reporter: Tetsuro Ikeda Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Federated storage engine Severity:S3 (Non-critical)
Version:5.0.24/5.0BK OS:Linux (Linux)
Assigned to: Assigned Account CPU Architecture:Any

[11 Aug 2006 13:33] Tetsuro Ikeda
Description:
Federated table returns broken strings if table charset is different from default charset defined by "./configure --with-charset=XXX".

When "--init-connect=SET NAMES XXX" is used or when "--skip-character-set-client-handshake" is used, it may seems working.
But strings from table got broken again when connection got down and 
reconnected in "mysql_real_query".

There is nothing solution for non-latin1 users except making own MySQL binary by compiling with "--with-charset=XXX", and it means non-latin1 users can't use Federated storage engine with MySQL AB binary distributions.

How to repeat:
Get two mysql server installed. Use MySQL AB binary distribution this time.

Let's say "remote-mysqld" and "local-mysqld" for each mysql server.

Create a table like following on remote-mysqld.

CREATE TABLE t_remote (c1 VARCHAR(20)) DEFAULT CHARSET utf8:

You may notice we just created utf8 table on latin1 mysql server.

Insert a row into t_remote like followings by using non-latin1 characters:

INSERT INTO t_remote VALUES ("あああ");

Then, create a table like following on local-mysqld.

CREATE TABLE t_local (c1 VARCHAR(20)) DEFAULT CHARSET utf8 ENGINE = Federated
connection='mysql:foo:bar@hostname:3306/test/t_remote';

Then, call SELECT on t_local table:

SELECT c1 FROM t_local;

You will get broken strings which are replaced with ??? marks:

[test] > select * from t_local;
+------+
| c1   |
+------+
| ???  |
+------+
1 rows in set (0.01 sec)

The reason you get broken strings is the connection between local-mysqld and
remote-mysqld is set by default charset which is latin1, charset at compilation.

character-set-client=latin1
character-set-connection=latin1
character-set-result=latin1

and table's charset is utf8 in this case because utf8 is declared at table creation.

remote-mysqld converts table's string from utf8 to latin1 and send them to
local-mysqld, so the result gets broken.

------------------------------------------------------------------
If you use "--init-connect=SET NAMES utf8" when remote-mysqld booting,
the connection' charset is set to following by remote-mysqld at connect time.

character-set-client=utf8
character-set-connection=utf8
character-set-result=utf8

So, remote-mysqld doesn't convert table data, and the result won't get broken.

But, unfortunatly, mysql_real_query function used by ha_federated breaks 
these variable's setting when connection got down and it cause auto-reconnect
in mysql_real_query with mysql_real_connect, mysql_reconnect functions.

To repeat this, kill the connection between remote-mysqld and local-mysqld by
"KILL #" command on remote-mysqld, then do SELECT again on local-mysqld.

This cause the calling of "mysql_reconnect" in libmysql/client.c and 
this function does "SET NAMES xxx" by using charset name in MYSQL structure.

Suggested fix:
I think there are 2 ways of fixing this problem.

One is to remove following code from mysql_reconnect function in 
libmysql/client.c, line 2203.

  if (mysql_set_character_set(&tmp_mysql, mysql->charset->csname))

By doing this, "--init-connect=XXX" solution works also after auto-reconnect.

About second way, please see a patch I attach.

If this patch is appried, ha_federated::open become to see table's 
charset and put it to MYSQL structure so that mysql_set_character_set works
good both in ha_federated::open and in mysql_reconnect.

And ha_federated::open also become to execute "SET NAMES table_charset"
when opening table, so that we can get non-latin1 strings with federated table
and MySQL AB binary distribution at same time.

This patch will also enhance federatad engine.

By using this patch, we can use federated table in mixed charset environment.
We will be able to use both utf8 and other charsets for federated tables 
at same time, because this way don't depend server variables.

I think the second solution is better. 
Please take a look little bit ;)
[11 Aug 2006 13:34] Tetsuro Ikeda
suggested patch for this report

Attachment: federated-charset.patch (application/octet-stream, text), 694 bytes.

[14 Aug 2006 18:31] MySQL Verification Team
Thank you for the bug report and patch suggestion.

mysql> CREATE TABLE t_local (c1 VARCHAR(20)) DEFAULT CHARSET utf8 ENGINE = Federated connection='mysql://foo:bar@192.168.0.119:3306/test/t_remote';
Query OK, 0 rows affected (0.11 sec)

mysql> SELECT c1 FROM t_local;
+------+
| c1   |
+------+
| ???  |
+------+
1 row in set (0.05 sec)

mysql>
[20 Sep 2006 17:26] Gleb Paharenko
I'd like to add that inserts don't work in  similar way.

Say, we have a table on remote server

mysql> show create table tcp\G;
*************************** 1. row ***************************
       Table: tcp
Create Table: CREATE TABLE `tcp` (
  `a` varchar(60) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=cp1251
1 row in set (0.00 sec)

On the local server we have:
mysql> show create table tcp\G;
*************************** 1. row ***************************
       Table: tcp
Create Table: CREATE TABLE `tcp` (
  `a` varchar(60) default NULL
) ENGINE=FEDERATED DEFAULT CHARSET=cp1251 CONNECTION='mysql://g:*****@p.dev.mtx.kiev.ua:3306/test/tcp'

On all servers:
mysql> show variables like '%char%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | cp1251                           |
| character_set_connection | cp1251                           |
| character_set_database   | cp1251                           |
| character_set_filesystem | binary                           |
| character_set_results    | cp1251                           |
| character_set_server     | cp1251                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/share/mysql/charsets/ |
+--------------------------+----------------------------------+

Operating system is FreeBSD.

SQL query file: aby.sql

[root@newgleb rezina]# cat /tmp/aby.sql  |iconv -f cp1251 -t utf8
insert into tcp1 set a='абу';

This command
 mysql test </tmp/aby.sql

produces ??? in remote table, in case local server was compiled without with_charset=cp1251, and produces 'абу' when local server was compiled with_charset=cp1251

Please, fix this.
[29 Feb 2008 12:10] Ramil Kalimullin
See bug #17044: FEDERATED storage engine is not UTF8 clean
[23 Mar 2008 16:33] Patrick Galbraith
I have a fix for FederatedX storage engine using your patch, with some changes:

On remotedb, port 3307:

mysql> CREATE TABLE t_remote (c1 VARCHAR(20)) DEFAULT CHARSET utf8;
Query OK, 0 rows affected (0.27 sec)

mysql> INSERT INTO t_remote VALUES ("あああ");
Query OK, 1 row affected (0.00 sec)

mysql> select * from t_remote;
+-----------+
| c1        |
+-----------+
| あああ |
+-----------+
1 row in set (0.00 sec)

mysql> CREATE TABLE t_local (c1 VARCHAR(20)) DEFAULT CHARSET utf8 ENGINE=FEDERATEDX connection='mysql://root@localhost:3307/remotedb/t_remote';
Query OK, 0 rows affected (0.06 sec)

mysql> select * from t_local;
+-----------+
| c1        |
+-----------+
| あああ |
+-----------+
1 row in set (0.00 sec)
[23 Mar 2008 16:38] Patrick Galbraith
Might I also add:

insert to federatedx table (I took Japanese chars from slashdot.jp):

mysql> insert into t_local values ("がどこまで巧");
Query OK, 1 row affected (0.00 sec)

mysql> select * from t_local;
+--------------------+
| c1                 |
+--------------------+
| あああ          |
| がどこまで巧 |
+--------------------+
2 rows in set (0.00 sec)

Check on remote table:
mysql> select * from t_remote;
+--------------------+
| c1                 |
+--------------------+
| あああ          |
| がどこまで巧 |
+--------------------+
2 rows in set (0.00 sec)