MySQL Bugs: #72825: WORKER_ID in replication_execute_status_by

Bug #72825	WORKER_ID in replication_execute_status_by_worker is always 1 SQL Not Running
Submitted:	1 Jun 2014 3:47
Reporter:	Jesper wisborg Krogh	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Performance Schema	Severity:	S3 (Non-critical)
Version:	5.7.4	OS:	Linux (OL6)
Assigned to:	Marc ALFF	CPU Architecture:	Any

Description:
The WORKER_ID column in the performance_schema.replication_execute_status_by_worker table is always 1. The WORK_ERID is referenced in performance_schema.replication_execute_status_by_coordinator and SHOW SLAVE STATUS to tell where to find more information for replication errors.

How to repeat:
Set slave_parallel_workers > 0.

slave> SELECT WORKER_ID, THREAD_ID FROM performance_schema.replication_execute_status_by_worker;
+-----------+-----------+
| WORKER_ID | THREAD_ID |
+-----------+-----------+
|         1 |        41 |
|         2 |        42 |
|         3 |        43 |
|         4 |        44 |
|         5 |        45 |
|         6 |        46 |
|         7 |        47 |
|         8 |        48 |
+-----------+-----------+
8 rows in set (0.00 sec)

Suggested fix:
An example from performance_schema.replication_execute_status_by_coordinator:

slave> SELECT * FROM performance_schema.replication_execute_status_by_coordinator\G
*************************** 1. row ***************************                     
           THREAD_ID: NULL                                                         
       SERVICE_STATE: OFF                                                          
   LAST_ERROR_NUMBER: 1062                                                         
  LAST_ERROR_MESSAGE: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '' at master log , end_log_pos 587543. See error log and/or performance_schema.replication_execute_status_by_worker table for more details about this failure or others, if any.                                                                                                                                             
LAST_ERROR_TIMESTAMP: 2014-06-01 13:10:07                                                                                                                                                                                                    
1 row in set (0.01 sec)

Sorry - wrong output from replication_execute_status_by_worker. The issue only exists when the SQL threads are not running:

slave> SELECT WORKER_ID, THREAD_ID, LEFT(LAST_ERROR_MESSAGE, 20) AS 'Last Error' FROM performance_schema.replication_execute_status_by_worker;
+-----------+-----------+----------------------+
| WORKER_ID | THREAD_ID | Last Error           |
+-----------+-----------+----------------------+
|         1 |      NULL |                      |
|         1 |      NULL |                      |
|         1 |      NULL |                      |
|         1 |      NULL |                      |
|         1 |      NULL |                      |
|         1 |      NULL |                      |
|         1 |      NULL |                      |
|         1 |      NULL | Worker 7 failed exec |
+-----------+-----------+----------------------+
8 rows in set (0.00 sec)

But as this will always be the case when one of the SQL threads encountered an error it makes it more difficult to investigate replication errors.

One important side effect of this is that the WORKER_ID displayed in the error message in replication_execute_status_by_coordinator is always 1 no matter which worker actually encountered the error.