Bug #956 Replication failure after table structure modification on master
Submitted: 30 Jul 2003 7:01 Modified: 30 Sep 2003 14:15
Reporter: Andrei Korobeinik Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:4.0.14 OS:Linux (Linux SuSe)
Assigned to: Guilhem Bichot CPU Architecture:Any

[30 Jul 2003 7:01] Andrei Korobeinik
Description:
After adding a new field to the table on the master server and running an isert query on the modified table, replication fails.

Query on master:

ALTER TABLE `rate`.`friends` ADD `comment` VARCHAR(250); 

Error on slave:

ERROR: 1054  Unknown column 'comment' in 'field list'                                                                         
030730  2:33:55  Slave: error 'Unknown column 'comment' in 'field list'' on query 'UPDATE friends SET comment='Comment goes here.' WHERE user='16' AND friend='7982'', error_code=1054                                                                    
030730  2:33:55  Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE 
START". We stopped at log 'mysql-bin.015' position 15592645

How to repeat:
Probably can occure when you modify a table on master and run a query on that having replication running.

Suggested fix:
It seems that slave server does not copy the table modification (ALTER) and operates with obsolete structure. This modification took long time (about 30 seconds), so the problem may be in a kind of timeout (in that case this can be fixed) or in the specific query (maybe ALTER TABLE statement cause the problem).
[30 Jul 2003 7:25] Guilhem Bichot
We would need a copy of 'mysql-bin.015'. This is so that I know if the ALTER found his way to the master's binlog (if we can assume it should have been in that binlog, not in .014 or older; do you know?).
It would also be helpful to have a copy of the slave's relay log, to know if the ALTER was in the relay log; if the relay log is more than 10MB after compression, then maybe I'll start with mysql-bin.015 first, and ask for the relay log only if really needed.
Could you please make an archive (tar + bzip2) of these 2 files and ftp it as file
'bug_956.tar.bz2' at ftp://support.mysql.com/pub/mysql/secret/ ?
Thanks for your help troubleshooting this!
[30 Jul 2003 10:19] Andrei Korobeinik
Unfortunately, I can not provide you with those files. The server that
experienced the error is extremely busy (more than 4mln page views per day)
so the logs are deleted pretty fast to save the space and now the needed
files are gone. The replicaton was restarted right after the problem
occured.

It's not the first occurence of such a problem, so next time I will try to
save bin-logs for further investigation, upload the requested file to FTP
and answer this letter once again. Hope this may help you.
[31 Jul 2003 7:53] Guilhem Bichot
On Thu, 2003-07-31 at 13:48, Andrei Korobeinik wrote:
Hi there!
> 
> > Ok. Does your master run with some binlog-do-db or binlog-ignore-db
> > options, does your slave run with replicate-do-db, replicate-ignore-db,
> > replicate-do-table, replicate-ignore-table, replicate-wild-do-table,
> > replicate-wild-ignore-table options?
> 
> The following options are in use:
> 
> replicate-do-db=rate
> replicate-ignore-table=rate.votes
> replicate-ignore-table=rate.clicks
> replicate-wild-ignore-table=rate.%cache
> 

The query was:
ALTER TABLE `rate`.`friends` ADD `comment` VARCHAR(250); 

I'm 90% sure that this replicate-do-db option explains the problem.
I have now added more explaination of these replicate-* options to our manual:
http://www.mysql.com/doc/en/Binary_log.html (this one is rather about binlog-do-db and binlog-ignore-db, but is useful to read too),
http://www.mysql.com/doc/en/Replication_Options.html (see the paragraph after the HTML table).

What may have happened is that PhpMyAdmin did not have `rate` as the current database (the one selected by the "USE" or "CONNECT" SQL command) of its MySQL connection (it was another database or it was undefined). Then the replicate-do-db rule caused the slave to ignore (skip) the ALTER query, hence the problem.

We almost always recommend our users and customers to prefer replicate-wild-do-table to replicate-do-db, because it is safer (explained in the links). The latter has the only advantage of being faster; it is great but requires some attention to always set the current database carefully before issuing a query.

I would recommend you ensure to send a "USE rates" before the ALTER. But I don't know how easily this can be done with PhpMyAdmin (the USE must be in the same connection as the ALTER for things to work fine).
Or maybe do not use replicate-do-db.

Regards,
Guilhem