Bug #65617 SQL update statement hangs during replication on slave
Submitted: 14 Jun 2012 12:48   Modified: 14 Jul 2012 15:29
Reporter: James Siddle
Status: No Feedback
Category: MySQL Server: Replication   Severity: S1 (Critical)
Version: 5.5.21   OS: Linux (Ubuntu 11.04)
Assigned to:   CPU Architecture: Any
Tags: hanging, replication, slave, temporary table

[14 Jun 2012 12:48] James Siddle
Description:
Hi,

We're running the Percona version of MySQL, version 5.5.21, and we're hitting an odd replication issue.

We run the following long-running update (values are filled in from shell variables; the temporary-table name is elided here and represented by the placeholder ${TMP_TABLE}) and find that the CREATE TEMPORARY TABLE consistently hangs when replicated on one of our slaves.

-- ${TMP_TABLE} stands in for the (elided) temporary table name
CREATE TEMPORARY TABLE ${TMP_TABLE} AS
(SELECT csid, COUNT(csid) AS ${COUNT_COLUMN}
   FROM ${dataset}.locations AS loc, ${dataset}.entityid_to_csid AS e2c
  WHERE e2c.entityid = loc.entityid
    AND docid >= ${MINDOC} AND docid <= ${MAXDOC}
  GROUP BY csid)
UNION ALL
(SELECT csid, COUNT(csid) AS ${COUNT_COLUMN}
   FROM ${dataset}.images AS img, ${dataset}.entityid_to_csid AS e2c
  WHERE e2c.entityid = img.entityid
    AND docid >= ${MINDOC} AND docid <= ${MAXDOC}
  GROUP BY csid)

(table schema defs below) 

This is running on an Ubuntu 11.04 server.

I've checked the MySQL and Percona release histories for more recent builds, but haven't found anything that exactly matches my issue. A replication bug fixed in MySQL 5.5.22 sounds like it could be related (http://dev.mysql.com/doc/refman/5.5/en/news-5-5-22.html, search for "Replication:"), but the connection is tenuous.

Also worth noting: we see 100% disk I/O on the hanging slave, and a different slave hangs each time. The replication status appears OK, except that the rows-read counter keeps resetting to zero and the process state is reported as "Sending data". When the update replicates correctly, we instead see a state indicating that data is being sent to the temporary table.

Any ideas on how we can resolve this would be greatly appreciated.

Thanks,
Jim from Digital Science

  CREATE TABLE locations (
  id INTEGER NOT NULL AUTO_INCREMENT,
  docid INTEGER NOT NULL,
  entityid INTEGER NOT NULL,
  fieldid INTEGER NOT NULL,
  rank INTEGER NOT NULL,
  startoffset INTEGER NOT NULL,
  endoffset INTEGER NOT NULL,
  isExemplified TINYINT(1) DEFAULT 0,
  PRIMARY KEY(id),
  INDEX(docid,fieldid,rank),
  INDEX(fieldid),
  INDEX(entityid),
  UNIQUE(docid,entityid,fieldid,rank,startoffset),
  FOREIGN KEY (docid) REFERENCES documents(docid)
  ) DEFAULT CHARSET=ascii MAX_ROWS=1000000000 ENGINE=INNODB;

  CREATE TABLE images (
  id INTEGER NOT NULL AUTO_INCREMENT,
  docid INTEGER NOT NULL,
  entityid INTEGER NOT NULL,
  fieldid INTEGER NOT NULL,
  source VARCHAR(500),
  PRIMARY KEY(id),
  INDEX(docid,fieldid),
  INDEX(fieldid),
  INDEX(entityid)
  ) DEFAULT CHARSET=ascii MAX_ROWS=1000000000 ENGINE=INNODB;
  
  CREATE TABLE entityid_to_csid (
  entityid INTEGER NOT NULL,
  csid INTEGER default NULL,
  fragment BOOLEAN default NULL,
  connected BOOLEAN default NULL,
  radical BOOLEAN default NULL,
  element BOOLEAN default 0,
  molweight FLOAT default NULL,
  INDEX(entityid),
  INDEX(csid),
  UNIQUE(entityid,csid)
  ) DEFAULT CHARSET=ascii MAX_ROWS=100000000 ENGINE=INNODB;

How to repeat:
This may be tricky to reproduce. The steps would be:
 - create database as described by schema (see Description field)
 - populate with test data (we may be able to provide a dump)
 - create master and two slaves
 - run update statement on master
 - check slave status
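
The slave-status check in the last step can be done from a mysql client on each slave; a minimal sketch using standard MySQL 5.5 commands (no setup-specific names assumed):

```sql
-- Replication position and errors; if Exec_Master_Log_Pos stops advancing
-- while Slave_SQL_Running is still 'Yes', the SQL thread is stuck on one event.
SHOW SLAVE STATUS\G

-- The statement the replication SQL thread is currently executing,
-- with its State column (e.g. "Sending data").
SHOW FULL PROCESSLIST;
```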
[14 Jun 2012 14:13] Valeriy Kravchuk
Please, send the output of:

show full processlist;

from the slave at the moment when you see the problem. Also, please send the my.cnf file content from this slave.
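
When the hang recurs, the requested output could be captured in one client session on the affected slave; a sketch using the mysql client's tee command (the log path is an arbitrary choice):

```sql
-- Run inside a mysql client session on the hanging slave.
-- 'tee' logs all subsequent statements and their output to the given file.
tee /tmp/slave-diagnostics.txt
SHOW FULL PROCESSLIST;
SHOW SLAVE STATUS\G
-- InnoDB status may also help, given the 100% disk I/O observed during the hang.
SHOW ENGINE INNODB STATUS\G
notee
```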
[14 Jun 2012 15:15] James Siddle
Slave configuration file

Attachment: slave.cnf.txt (text/plain), 1.56 KiB.

[14 Jun 2012 15:18] James Siddle
Hi,

I've just uploaded the slave configuration file, and I'll upload the process list output the next time we hit the issue; this might take a week or two because it's a huge update.

J
[15 Jul 2012 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".