Bug #51196 | Slave SQL: Got an error writing communication packets, Error_code: 1160 | ||
---|---|---|---|
Submitted: | 15 Feb 2010 17:03 | Modified: | 28 Jan 2012 15:35 |
Reporter: | Alvaro Palacios | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.1.42, 5.1.47, 5.0, 5.1.50 | OS: | Linux (Centos 2.6.15.1) |
Assigned to: | Daogang Qu | CPU Architecture: | Any |
Tags: | federated, replication |
[15 Feb 2010 17:03]
Alvaro Palacios
[15 Feb 2010 17:15]
Sveta Smirnova
Thank you for the report. Which exact minor version do you use?
[17 Feb 2010 9:35]
Alvaro Palacios
The minor versión is 5.1.36 and also we have test in 5.1.42.
[5 Mar 2010 6:37]
Sveta Smirnova
Thank you for the feedback. I the initial description you wrote federation engine should be off while in "how to repeat" section you wrote it should be on. So should it be on or off? Please also send us output of SHOW SLAVE STATUS\G after error occur. Query which caused the problem would be helpful too. You said you had test with version 5.1.42. Can you send this test to us, so we can repeat the problem in our environment?
[5 Mar 2010 9:50]
Alvaro Palacios
Sorry, I want to say that ENGINE FEDERATED may be ON in slave server. We get the error when execute 120k rows COMMAND REPLACE aprox. but i think that is not relevant because we get it in two diferent slave servers, one with various consecutive 5.1 upgrades since three month. Due to work in a production enviroment, we had to make the FEDERATED ENGINE = OFF in slave, so i can not test. Configuration basic: Two servers with Mysql 5.1.x Master - Slave Federated ON High load COMMAND REPLACE in master. Result: Slave SQL thread die
[21 Jul 2010 7:30]
Andy Thörnblom
We experience the same problem. Example output: mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: X.X.X.X Master_User: replication Master_Port: 3306 Connect_Retry: 60 Master_Log_File: XXXX-bin.000091 Read_Master_Log_Pos: 978446181 Relay_Log_File: XXXX-relay.000228 Relay_Log_Pos: 591993295 Relay_Master_Log_File: XXXX-bin.000091 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1160 Last_Error: Error 'Got an error writing communication packets' on query. Default database: 'snapshot_work'. Query: 'DROP TABLE IF EXISTS federate_tables.player_remote' Skip_Counter: 0 Exec_Master_Log_Pos: 591993151 Relay_Log_Space: 978446517 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1160 Last_SQL_Error: Error 'Got an error writing communication packets' on query. Default database: 'snapshot_work'. Query: 'DROP TABLE IF EXISTS federate_tables.player_remote' 1 row in set (0.00 sec) mysql> select version(); +------------------------------------+ | version() | +------------------------------------+ | 5.1.47-enterprise-gpl-advanced-log | +------------------------------------+ 1 row in set (0.00 sec)
[31 Jul 2010 13:33]
Sveta Smirnova
Thank you for the feedback. Problem is replication affected by communication between federated and server federated table connects to. I can not repeat exactly same problem, but communication between master and slave failure can be shown by following test. I used 5.0 here, results for 5.1 will be posted later. 1. Create test for MTR: --source include/master-slave.inc CREATE TABLE `t1` ( `f1` int(11) DEFAULT NULL ) ENGINE=FEDERATED DEFAULT CHARSET=utf8 CONNECTION='mysql://root@127.0.0.1:4040/test/t1'; --sync_slave_with_master select * from t1; connection master; #set binlog_format='row'; create table t2(f1 int); insert into t2 values(1),(2),(3),(4),(5); --sync_slave_with_master --vertical_results show slave status; stop slave; start slave; sleep 2; show slave status; 2. Create table t1 with same definition as federated one on one of local servers. Insert 6 rows into it. 3. Start MySQL Proxy with backend this third server. 4. Run ./mtr --manual-gdb rpl+_bug51196 5. On slave server add breakpoints: b log_event.cc:1922 and mysql_parse 6. When you hit "Breakpoint 2, Query_log_event::exec_event (this=0x36050e8, rli=0x4846098, query_arg=0x36051a1 "insert into t2 values(1),(2),(3),(4),(5)", q_len_arg=40) at log_event.cc:1929 1929 thd->catalog= catalog_len ? (char *) catalog : (char *)"";" kill MySQL Proxy process, then continue execution 7. In the first SHOW SLAVE STATUS OUTPUT you see "Slave_IO_Running No" although no connection to master was interrupted.
[2 Aug 2010 11:03]
Sveta Smirnova
All reporters, I can not repeat it with current 5.1 BZR sources anymore. Please try if this still repeatable with 5.1.49 for you and inform us about results.
[30 Aug 2010 7:56]
Roel Van de Paar
A customer reports that stopping and starting the slave did not resume replication while a start/stop slave did.
[31 Aug 2010 1:18]
Roel Van de Paar
Issue may happen even if FEDERATED tables are not actually replicated (FEDERATED turned ON, but database which contains FEDERATED tables excluded using any of the repliate* or binlog* do/ignore options). All reporters, please add comments if you find the same.
[30 Nov 2010 2:36]
Daogang Qu
Could you please generate a test case to fix the problem?
[7 May 2011 5:20]
MySQL Verification Team
many folks report federated table data is not even being replicated, but the error is still seen. this is strange. perhaps it is some table cache problem? i'd be curious to know if regular "flush tables" helps to reduce the frequency of this breakage?
[11 May 2011 8:18]
Alvaro Palacios
I've seen this error when run REPLACE LOW_PRIORITY in master into the database with federated tables, but not applied on then. ¿It's possible that the federated source server connection has expired when wait_timeout variable it's over and failed to reconnect when the write operation arrive to the database?
[20 Jul 2011 15:30]
Alexey Kopytov
This is likely the same problem with FEDERATED + table cache as in bug #61790, see my analysis there.
[28 Jan 2012 13:59]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at http://dev.mysql.com/doc/en/installing-source.html
[28 Jan 2012 14:16]
Jon Stephens
Fixed in 5.1+. Documented bugfix in the 5.1.62, 5.5.21, and 5.6.5 changelogs, as follows: It was possible on replication slaves where FEDERATED tables were in use to get timeouts on long-running operations, such as Error 1160 -Got an error writing communication packets-. The FEDERATED tables did not need to be replicated for the issue to occur. Closed.
[27 Jul 2012 13:37]
K Vargo
We're seeing intermittent 1160 errors in a related environment: FEDERATED engine is on, but the error is triggered (intermittently, not reliably reproducable) during mysqldump of databases that have, and do not have actual FEDERATED tables. That is, we have FEDERATED enabled, but databases that don't make use of FEDERATED tables also intermittently fail with 1160 type errors (even when setting mysqldump to SOCKETS vice tcp).
[27 Jul 2012 13:38]
K Vargo
Sorry, using: MySQL-client-5.5.25a-1.rhel5.x86_64 MySQL-server-5.5.25a-1.rhel5.x86_64 MySQL-shared-compat-5.5.25a-1.rhel5.x86_64
[20 Aug 2012 14:52]
Sander Smeenk
Running 5.5.24, we see these errors on a setup with FEDERATED enabled too. Happens mostly during the scheduled MySQL dump runs.
[4 Apr 2014 22:23]
Douglas Chadwick Jr.
I experienced the same error (i.e. Got an error writing communication packets, Error_code: 1160) with MySQL 5.6.16 x64 (which is a later version than what you said has the bug fix) when executing stored procedures and functions from the command line or from running a .bat files. The SP selects data from a federated table and I suspect that my temporary nightly export/dump of the database is failing for the same reason. I plan to attach to this bug all the related files right after I post this comment. Can you please reopen this bug and fix it? Thanks!
[4 Apr 2014 22:26]
Douglas Chadwick Jr.
Attachment with more info on our bug has been attached for the developers to see. Thanks!
[4 Apr 2014 22:29]
Douglas Chadwick Jr.
For my bug, MysqlServer is installed on Windows 2008 R2
[4 Apr 2014 22:37]
Douglas Chadwick Jr.
The workaround it try running it multiple times first from double clicking the bat and second from the command line (as shown in the screenshot above by “cd” to the directory in cmd and then running the .bat file there). Then it started working every time for that minute. But then fails again intermittently if you try a week later.
[16 May 2014 11:32]
AmirBehzad Eslami
We have the same issue in our multi-master setup on MySQL Ver 5.6.17 for Linux on x86_64. When the federated engine is queried for the first time, it fetches the rows without a problem. But if we kill the open connection on the remote server, and query the the federated table again, mysql says: ERROR 1158 (08S01): Got an error reading communication packets Shouldn't it re-establish the connection and query the table, instead of just throwing the error? The open-connection is automatically closed due to wait_timeout.
[2 Jun 2014 10:11]
Prasad Gowda
Related to Federated Table : Whenever a query is issued against a federated table, a single/multiple [ depends on Number = table_open_cache_instances] connection will be created from the DB instance hosting the federated table to the DB instance hosting the main table. If those single/mutiple connection on the DB instance hosting main table expires because of [wait_timeout,connect_timeout] or if those gets terminated manually , any new queries on federated table will result in "ERROR 1158 (08S01) at line 1: Got an error reading communication packets" on the DB instance hosting the federated table. Reason : Flushing of the federated table does not happen properly from the table_open_cache_instances , Solution : To avoid this, the temporary solution is to manually/forcefully flush only those federated tables once before every [wait_timeout,connect_timeout] seconds. In our case we put the flush command in event_scheduler since the number of federated tables were very less and we knew exactly which federated tables were rarely getting queried with gap of more than [wait_timeout,connect_timeout] seconds. This used to affect even our MEB backup when it used execute "flush tables with READ lock". flush table <federated_table_name>; -- executed every 4 hours Key : The frequency with which it errors depends on how many table_open_cache_instances is configured on the Instance. Since then, after putting the manual flush of federated tables with a frequence less than [wait_timeout,connect_timeout] , this issue has not showed up. Hope this helps. Thanks Prasad
[2 Jun 2014 15:11]
Douglas Chadwick Jr.
That is a very helpful comment. Thank you! Austin
[30 Oct 2014 0:50]
MySQL Verification Team
http://bugs.mysql.com/bug.php?id=74618 marked as duplicate of this one.