MySQL Bugs: #36757: SQL thread stop

Bug #36757	SQL thread stop
Submitted:	16 May 2008 13:57	Modified:	18 May 2017 3:02
Reporter:	Cyril SCETBON
Status:	Can't repeat
Category:	MySQL Cluster: Replication	Severity:	S1 (Critical)
Version:	mysql-5.1-telco-6.3	OS:	Linux (debian etch)
Assigned to:	MySQL Verification Team	CPU Architecture:	Any
Tags:	5.1.27-ndb-6.3.17-telco, cluster, MySQL, replication

Description:
The SQL_THREAD stops because of the following error:
It was not possible to update the positions of the relay log information: the slave may be in an inconsistent state. Stopped in ./mysqld-relay-bin.000003 position 225463952

We hit this error when we make changes to a disk table. When we start the SQL_THREAD again, there is no error:

start slave SQL_THREAD;
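
For reference, a minimal sketch of how the stop and restart could be checked on the slave. It assumes a standard mysql client session on the slave; the field names in the comments are those printed by SHOW SLAVE STATUS.

```sql
-- Inspect the replication state after the SQL thread has stopped.
-- Fields of interest: Slave_SQL_Running, Last_Errno, Last_Error,
-- Relay_Log_File, Relay_Log_Pos, Relay_Master_Log_File, Exec_Master_Log_Pos.
SHOW SLAVE STATUS\G

-- Restart only the SQL thread; the I/O thread is left untouched.
START SLAVE SQL_THREAD;

-- Confirm that Slave_SQL_Running is back to Yes and Last_Error is empty.
SHOW SLAVE STATUS\G
```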

mysql> desc spp_disk02
    -> ;
+-------+---------------------+------+-----+-------------------+-----------------------------+
| Field | Type                | Null | Key | Default           | Extra                       |
+-------+---------------------+------+-----+-------------------+-----------------------------+
| id    | bigint(20) unsigned | NO   | PRI | NULL              | auto_increment              | 
| ise   | varchar(54)         | NO   | UNI | NULL              |                             | 
| vc01  | varchar(350)        | YES  |     | NULL              |                             | 
| vc02  | varchar(11)         | NO   |     | -1                |                             | 
| vc03  | varchar(11)         | NO   |     | -1                |                             | 
| ch04  | char(10)            | NO   |     |                   |                             | 
| ch05  | char(10)            | NO   |     |                   |                             | 
| ch06  | char(1)             | NO   |     |                   |                             | 
| ch07  | char(3)             | NO   |     |                   |                             | 
| ts    | timestamp           | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP | 
+-------+---------------------+------+-----+-------------------+-----------------------------+

How to repeat:
Generate requests of the following type:

SELECT * FROM spp_disk02 
followed by 
UPDATE spp_disk02 SET vc01 = '360;20070504;999#6065;20070604;999#3655;20070704;999#3393;20070804;999#3370;20070904;999#3564;20071004;999#3317;20071104;999#3379;20071204;999#3354;20080104;999#3339;20080204;999#3290;20080304;999#3389;20080404;999' WHERE ise = 'ID-SPP-100-gciigJW4NGkLJElBlgFh23DhdkKZZIw1IYlaMIeZNAU'

change severity

Can you attach the mysqld and cluster logs spanning the time of the incident to the bug report?

No errors in the cluster log, just local checkpoint messages.

mysqld error log

Attachment: mysqld.err (application/octet-stream, text), 5.49 KiB.

Version upgraded.

We still get the same error:

It was not possible to update the positions of the relay log information: the slave may be in an inconsistent state. Stopped in ./mysqld-relay-bin.000014 position 44473744

Any ideas?
It seems to be correlated with the workload.

We're still getting the same error, but we've noticed that the master's binary log contains something like:

# at posi
# at posi+1
....
# at posi+n
#datei server id ...
...
#datei+n server id ...

When the SQL thread stops at posi, it does not work anymore (even if we restart mysqld). But if we use "CHANGE MASTER" to jump to position posi+n, it works until the next similar error.
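
The workaround described above can be sketched roughly as follows. The binary log file name and both positions are hypothetical placeholders standing in for posi and posi+n, not values from our setup. Note that jumping ahead skips every event between the two positions, so the slave may diverge from the master.

```sql
-- On the master: list the events around the failing position to find
-- the next usable start position (hypothetical file name and position).
SHOW BINLOG EVENTS IN 'mysql-bin.000001' FROM 1000 LIMIT 20;

-- On the slave: replication threads must be stopped before CHANGE MASTER.
STOP SLAVE;

-- Point the slave at the position after the problematic group (posi+n).
-- Hypothetical values; existing relay logs are discarded by this statement.
CHANGE MASTER TO
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS  = 2000;

START SLAVE;

-- Verify that both threads are running and Last_Error is empty.
SHOW SLAVE STATUS\G
```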

Per Martin S. Assigning to Mat's for comments

Assigned to Mat's for comment per Martin S.

Still the same error :(

                   Last_Error: It was not possible to update the positions of the relay log information: the slave may be in an inconsistent state. Stopped in ./replication01-relay-bin.000002 position 28685608

Is there any more information about a workaround or a fix?

Cannot reproduce this with any of the recent releases. Tested:
7.2.29
7.3.17
7.4.15
7.5.6