Bug #36757 SQL thread stop
Submitted: 16 May 2008 15:57 Modified: 5 Feb 15:09
Reporter: Cyril SCETBON
Status: Open
Category:Server: ClusterRep Severity:S1 (Critical)
Version:mysql-5.1-telco-6.3 OS:Linux (debian etch)
Assigned to: Mats Kindahl Target Version:
Tags: 5.1.27-ndb-6.3.17-telco, MySQL, cluster, replication
Triage: Triaged: D2 (Serious) / R6 (Needs Assessment) / E6 (Needs Assessment)

[16 May 2008 15:57] Cyril SCETBON
Description:
The SQL_THREAD stops because of the following error:
It was not possible to update the positions of the relay log information: the slave may be
in an inconsistent state. Stopped in ./mysqld-relay-bin.000003 position 225463952

We hit this error when we make changes to a disk table. When we start the SQL_THREAD
again, there is no error:

start slave SQL_THREAD;

mysql> desc spp_disk02;
+-------+---------------------+------+-----+-------------------+-----------------------------+
| Field | Type                | Null | Key | Default           | Extra                       |
+-------+---------------------+------+-----+-------------------+-----------------------------+
| id    | bigint(20) unsigned | NO   | PRI | NULL              | auto_increment              |
| ise   | varchar(54)         | NO   | UNI | NULL              |                             |
| vc01  | varchar(350)        | YES  |     | NULL              |                             |
| vc02  | varchar(11)         | NO   |     | -1                |                             |
| vc03  | varchar(11)         | NO   |     | -1                |                             |
| ch04  | char(10)            | NO   |     |                   |                             |
| ch05  | char(10)            | NO   |     |                   |                             |
| ch06  | char(1)             | NO   |     |                   |                             |
| ch07  | char(3)             | NO   |     |                   |                             |
| ts    | timestamp           | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------+---------------------+------+-----+-------------------+-----------------------------+

How to repeat:
Generate requests of this type:

SELECT * FROM spp_disk02 
followed by 
UPDATE spp_disk02 SET vc01 =
'360;20070504;999#6065;20070604;999#3655;20070704;999#3393;20070804;999#3370;20070904;999#3564;20071004;999#3317;20071104;999#3379;20071204;999#3354;20080104;999#3339;20080204;999#3290;20080304;999#3389;20080404;999'
WHERE ise = 'ID-SPP-100-gciigJW4NGkLJElBlgFh23DhdkKZZIw1IYlaMIeZNAU'
[16 May 2008 16:01] Cyril SCETBON
change severity
[16 May 2008 16:19] Hartmut Holzgraefe
Can you attach the mysqld and cluster logs spanning the time of the incident to the bug
report?
[16 May 2008 16:35] Cyril SCETBON
No errors in the cluster log, just local checkpoint messages.
[16 May 2008 16:36] Cyril SCETBON
mysqld error log

Attachment: mysqld.err (application/octet-stream, text), 5.49 KiB.

[6 Jun 2008 10:25] Cyril SCETBON
Version upgraded.

We still get the same error:

It was not possible to update the positions of the relay log information: the slave may
be in an inconsistent state. Stopped in ./mysqld-relay-bin.000014 position 44473744
[10 Jun 2008 16:24] Cyril SCETBON
Any ideas?
It seems to be correlated with the workload.
[5 Feb 15:09] Cyril SCETBON
We're still getting the same error, but we've noticed that the binary log of the master
contains something like:

# at posi
# at posi+1
....
# at posi+n
#datei server id ...
...
#datei+n server id ...
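
As a hedged aside, the events at those offsets can also be listed on the master itself instead of reading mysqlbinlog output. A minimal sketch, assuming a binary log named 'master-bin.000001' and a starting offset of 4 (both are placeholders, not values from this report; substitute the real file name and posi):

```sql
-- List the master's binary log files to find the real file name.
SHOW BINARY LOGS;

-- Placeholder file name and offset; replace with the real binary log and posi.
-- The FROM offset has to be the start of an event.
SHOW BINLOG EVENTS IN 'master-bin.000001' FROM 4 LIMIT 20;
```

The Event_type and End_log_pos columns of the result show which events sit between posi and posi+n.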

When the SQL thread stops at posi, it no longer works (even if we restart mysqld). But if
we use "CHANGE MASTER" to jump to position posi+n, it works until the next similar error.
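
For reference, the jump described above corresponds roughly to the following statements, run on the slave. This is only a sketch: the file name and position are placeholders, not values from this report, and jumping from posi to posi+n means the events in between are never applied on the slave.

```sql
-- Run on the slave. File name and position below are placeholders.
STOP SLAVE;

-- Note Relay_Master_Log_File and Exec_Master_Log_Pos before changing anything.
SHOW SLAVE STATUS;

CHANGE MASTER TO
  MASTER_LOG_FILE = 'master-bin.000001',  -- placeholder: master binary log file
  MASTER_LOG_POS  = 12345;                -- placeholder: stands in for posi+n

START SLAVE;
```

Specifying MASTER_LOG_FILE and MASTER_LOG_POS this way also discards the existing relay logs, so the slave fetches events again from the given position.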
[18 May 15:45] Jonathan Miller
Per Martin S., assigning to Mats for comments.
[18 May 15:47] Jonathan Miller
Assigned to Mats for comment, per Martin S.