MySQL Bugs: #91008: mysql stop for multi-threaded slave gives error code 1756

Bug #91008	mysql stop for multi-threaded slave gives error code 1756
Submitted:	24 May 2018 11:03	Modified:	17 Dec 2024 16:33
Reporter:	Prasad N
Status:	No Feedback
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.7.21	OS:	CentOS (CentOS (2.6.32-696.23.1.el6.x86_64 ))
Assigned to:	MySQL Verification Team	CPU Architecture:	x86
Tags:	MTS, slave

Description:
When stopping a MySQL server that is running a multi-threaded slave, I see the following message in the error log:
[ERROR] Slave SQL for channel '': ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756

How to repeat:
A multi-threaded slave configured with 30 worker threads, doing asynchronous replication.
The master has continuous writes: 30 client threads inserting data into a table with 2 columns.

Stop the slave MySQL server (you may have to repeat the start and stop a few times).

The above error message then appears in the error log.

my.cnf file:

[mysqld]

port=3306
skip_name_resolve=1
bind_address=0.0.0.0
user=mysql
pid_file=/var/run/mysqld/mysqld.pid
socket=/var/lib/mysql/mysql.sock
server_id=137
require_secure_transport=OFF
log_bin=mysql-bin
expire_logs_days=21
sync_binlog=1
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=1
max_connect_errors=1000000
max_allowed_packet=16777216
max_heap_table_size=33554432
max_connections=60
max_user_connections=48
thread_cache_size=50
open_files_limit=65535
table_open_cache=2048
table_definition_cache=2048
relay_log=relay-log-slave
gtid_mode=ON
enforce_gtid_consistency=ON
binlog_format=MIXED
log_slave_updates=true
slave_net_timeout=60
master_info_repository=TABLE
relay_log_info_repository=TABLE
sync_master_info=10000
sync_relay_log=10000
relay_log_recovery=1
slave_parallel_workers=30
slave_preserve_commit_order=1
slave_parallel_type=LOGICAL_CLOCK
relay_log_space_limit=0
max_relay_log_size=0
max_binlog_size=1073741824
datadir=/mysql_data/mysql
general_log=ON
general_log_file=/var/log/mysql/general.log
log_error=/var/log/mysql/mysqld.log
default_storage_engine=innodb
innodb_flush_method=O_DIRECT
innodb_file_per_table=ON
innodb_log_file_size=134217728
innodb_buffer_pool_size=134217728
innodb_io_capacity=200
innodb_adaptive_hash_index=ON
innodb_lock_wait_timeout=50
log_queries_not_using_indexes=OFF
log_slow_admin_statements=OFF
log_throttle_queries_not_using_indexes=0
long_query_time=10
slow_query_log=ON
slow_query_log_file=/var/log/mysql/mysql-slowquery.log
symbolic_links=0
interactive_timeout=28800
div_precision_increment=4
sql_mode="ONLY_FULL_GROUP_BY,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
event_scheduler=OFF

Suggested fix:
Ensure there is no data inconsistency as this is a clean shutdown.

Hi,
I don't believe this is a bug. It would be a bug if a start did not solve the issue, but since the system auto-corrects when you start it, it's a design choice.

If you want consistent data on the slave, you should stop replication before you shut down.

all best
Bogdan
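
The advice above (stop replication before shutting the server down) could be scripted roughly as follows. This is only a minimal sketch: `execute` is a hypothetical helper, not part of any MySQL client library, assumed to send one SQL statement to the slave and return the first result row as a dict. `STOP SLAVE`, `SHOW SLAVE STATUS` and `SHUTDOWN` are the 5.7-era statement names; the checks on `Slave_SQL_Running` and `Last_Errno` are an illustrative choice, not a documented procedure.

```python
from typing import Callable, Dict


def stop_replica_then_shutdown(execute: Callable[[str], Dict[str, str]]) -> None:
    """Stop replication, verify the applier stopped cleanly, then shut down.

    `execute` is a hypothetical helper (an assumption of this sketch): it
    sends one SQL statement to the slave and returns the first result row
    as a dict, or an empty dict when the statement returns no rows.
    """
    # Stop the coordinator and worker threads before the server goes down.
    execute("STOP SLAVE")

    status = execute("SHOW SLAVE STATUS")
    if status.get("Slave_SQL_Running") != "No":
        raise RuntimeError("SQL thread is still running; not shutting down")
    if str(status.get("Last_Errno", "0")) != "0":
        raise RuntimeError("applier stopped with error " + str(status["Last_Errno"]))

    # Only shut down once the applier has stopped without an error.
    execute("SHUTDOWN")
```

Passing the statement runner in as a parameter keeps the ordering logic independent of whichever connector is actually used.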

I am facing the exact same problem when replicating from MySQL 5.7.23 to MySQL 8.0.17.
My GTID is stuck and my relay logs keep rotating.
Slave_IO_State: waiting for handler commit

Just a little color to this. We experienced this error today. The root cause was a mystery to me. I stopped replication and restarted, and the server self-healed.

It turned out someone on our BI team had manually killed a coordinator thread (a replication thread), which is probably what corrupted replication.

Hi.
We have exactly the same issue today.

2024-10-16T10:24:23.319479Z 64 [Warning] [MY-010584] [Repl] Replica SQL for channel '': Worker 1 failed executing transaction 'ANONYMOUS' at source log binlog.000361, end_log_pos 401108027; Error in cleaning up after an event preceding the commit; the group log file/position: binlog.000361 401107180, Error_code: MY-000001
2024-10-16T10:24:23.319598Z 64 [ERROR] [MY-010584] [Repl] Replica SQL for channel '': Worker 1 failed executing transaction 'ANONYMOUS' at source log binlog.000361, end_log_pos 401108027; Error 'Got error 125 - 'Transaction has been rolled back' during COMMIT' on query. Default database: ''. Query: 'COMMIT', Error_code: MY-001180
2024-10-16T10:24:23.319660Z 63 [Warning] [MY-010584] [Repl] Replica SQL for channel '': ... The replica coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: MY-001756

We are running ndbd 8.0.34.
Replication suddenly stopped with this error.

               Slave_IO_State: Waiting for source to send event
                  Master_Host: master_host
                  Master_User: repliaction_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000361
          Read_Master_Log_Pos: 591918710
               Relay_Log_File: relay_file
                Relay_Log_Pos: 39318280
        Relay_Master_Log_File: binlog.000361
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1180
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at source log binlog.000361, end_log_pos 401108027. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 401107180
              Relay_Log_Space: 57718384
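
To read a status dump like the one above at a glance, the `Key: value` lines can be parsed and the gap between what the I/O thread has read and what the SQL thread has applied computed. The following is a minimal sketch; the function names are made up for illustration, and it assumes the 5.7-style field names shown in this report and that both positions refer to the same binlog file.

```python
from typing import Dict, Optional


def parse_vertical_status(text: str) -> Dict[str, str]:
    """Parse 'Key: value' lines of vertical SHOW SLAVE STATUS output."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Split at the first colon only, so values such as Last_Error
        # may themselves contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def applier_backlog_bytes(fields: Dict[str, str]) -> Optional[int]:
    """Bytes read from the master binlog but not yet applied.

    Returns None when the I/O and SQL threads are on different binlog
    files, because the two positions are then not comparable.
    """
    if fields["Master_Log_File"] != fields["Relay_Master_Log_File"]:
        return None
    return int(fields["Read_Master_Log_Pos"]) - int(fields["Exec_Master_Log_Pos"])


# Values copied from the status output in this report.
sample = """
       Master_Log_File: binlog.000361
   Read_Master_Log_Pos: 591918710
 Relay_Master_Log_File: binlog.000361
     Slave_SQL_Running: No
   Exec_Master_Log_Pos: 401107180
"""
status = parse_vertical_status(sample)
print(status["Slave_SQL_Running"], applier_backlog_bytes(status))
# prints: No 190811530
```

With the values from this report, the SQL thread is stopped about 190 MB behind the position the I/O thread has already fetched.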

The only action needed was to execute "start slave;". No skip, nothing; the slave then reached the last position of the master binlog.

BUT.

It looks like mysqld skipped the problematic transaction.
I can see in the binlog that the value was:

# at 401108471
#241016 12:24:23 server id 41  end_log_pos 401108536 CRC32 0x7dc8061e   Write_rows: table id 272
# at 401108536
#241016 12:24:23 server id 31  end_log_pos 401108609 CRC32 0xe20d63df   Write_rows: table id 1008
# at 401108609
#241016 12:24:23 server id 41  end_log_pos 401108743 CRC32 0xa759b900   Write_rows: table id 477
# at 401108743
#241016 12:24:23 server id 41  end_log_pos 401108818 CRC32 0xc1e1b813   Write_rows: table id 479 flags: STMT_END_F
### INSERT INTO `mysql`.`ndb_apply_status`
### SET
###   @1=41
###   @2=77507911831519245
###   @3=''
###   @4=0
###   @5=0
### INSERT INTO `database1`.`table_1`
### SET
###   @1='key_1'
###   @2=38
### INSERT INTO `database2`.`table_2`
### SET
###   @1='val_1'
###   @2='val_2'
###   @3=''
###   @4=1
###   @5=1729074263
###   @6=1
###   @7=1
###   @8='longer value'
### INSERT INTO `database2`.`table_2`

I've checked on the master:

### INSERT INTO `database1`.`table_1`
### SET
###   @1='key_1'
###   @2=38

Value 38 was set for key_1 on the master; however, once the slave had caught up with the master, it had value 36 for key_1.

It looks like mysqld skipped this transaction on the slave side.

Bug #116682 is marked as duplicate of this one

Hi,

> It looks that mysqld has skipped this transaction on slave side.

What version are you reproducing this with?

I just tried 8.0.40 and I cannot reproduce this.

Can you please try to reproduce this with the current version of MySQL?

Thanks

We have ndbd 8.0.34.

Hi Marek,

I understand, but that release is more than a year old, and six bugfix releases have followed it. Even if I can reproduce the problem with that release, the only solution is to upgrade.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".