Bug #116682 replication breaks when update on 2 col table
Submitted: 15 Nov 2024 15:05 Modified: 17 Nov 2024 13:24
Reporter: Marek Kretkiewicz Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Cluster: Replication Severity:S3 (Non-critical)
Version:8.0.34 OS:Red Hat
Assigned to: MySQL Verification Team CPU Architecture:Any

[15 Nov 2024 15:05] Marek Kretkiewicz
Description:
I think that there is something in BUG #91008. I made an update there but it's in closed, so you are not gonna to check it again.

We made very similar problem like there.

In a nut shell:
- master was changing table with only 2 columns,
- replication stopped

More information which I passed to 91008.

2024-10-16T10:24:23.319479Z 64 [Warning] [MY-010584] [Repl] Replica SQL for channel '': Worker 1 failed executing transaction 'ANONYMOUS' at source log binlog.000361, end_l
og_pos 401108027; Error in cleaning up after an event preceding the commit; the group log file/position: binlog.000361 401107180, Error_code: MY-000001
2024-10-16T10:24:23.319598Z 64 [ERROR] [MY-010584] [Repl] Replica SQL for channel '': Worker 1 failed executing transaction 'ANONYMOUS' at source log binlog.000361, end_log
_pos 401108027; Error 'Got error 125 - 'Transaction has been rolled back' during COMMIT' on query. Default database: ''. Query: 'COMMIT', Error_code: MY-001180
2024-10-16T10:24:23.319660Z 63 [Warning] [MY-010584] [Repl] Replica SQL for channel '': ... The replica coordinator and worker threads are stopped, possibly leaving data in
 inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to probl
ems. In such cases you have to examine your data (see documentation for details). Error_code: MY-001756

we do have ndbd 8.0.34.
Suddenly it stopped with this error.

       Slave_IO_State: Waiting for source to send event
                  Master_Host: master_host
                  Master_User: repliaction_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000361
          Read_Master_Log_Pos: 591918710
               Relay_Log_File: relay_file
                Relay_Log_Pos: 39318280
        Relay_Master_Log_File: binlog.000361
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1180
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at source log binlog.000361, end_log_pos 401108027. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 401107180
              Relay_Log_Space: 57718384

and the only action was to execute "start slave;". No skip, nothing, it has reach last position of master binlog.

BUT.

It looks that mysqld has skipped problematic transaction.
I can see in binlog, that value was:

# at 401108471
#241016 12:24:23 server id 41  end_log_pos 401108536 CRC32 0x7dc8061e   Write_rows: table id 272
# at 401108536
#241016 12:24:23 server id 31  end_log_pos 401108609 CRC32 0xe20d63df   Write_rows: table id 1008
# at 401108609
#241016 12:24:23 server id 41  end_log_pos 401108743 CRC32 0xa759b900   Write_rows: table id 477
# at 401108743
#241016 12:24:23 server id 41  end_log_pos 401108818 CRC32 0xc1e1b813   Write_rows: table id 479 flags: STMT_END_F
### INSERT INTO `mysql`.`ndb_apply_status`
### SET
###   @1=41
###   @2=77507911831519245
###   @3=''
###   @4=0
###   @5=0
### INSERT INTO `database1`.`table_1`
### SET
###   @1='key_1'
###   @2=38
### INSERT INTO `database2`.`table_2`
### SET
###   @1='val_1'
###   @2='val_2'
###   @3=''
###   @4=1
###   @5=1729074263
###   @6=1
###   @7=1
###   @8='longer value'
### INSERT INTO `database2`.`table_2`

I've checked on master 

### INSERT INTO `database1`.`table_1`
### SET
###   @1='key_1'
###   @2=38

val 38 was set for key_1, however when slave was synch to master had value 36 for key_1.

It looks that mysqld has skipped this transaction on slave side.

How to repeat:
Pleas try to set up replication, and perform updates on 2 columns table.
[15 Nov 2024 19:38] MySQL Verification Team
Duplicate of Bug #91008
[17 Nov 2024 13:24] Marek Kretkiewicz
It's not completly duplicated problem.
In old one, problem was when user was stopping mysql - your answer was to stop replication first. In my case no server stop was done, replication just stopped suddenly, without any reason, but server working without any problem.

In my opinion something is wrong in mysql, when processing 2 col table. In both cases it stopped, but in my case server was not touched.
[17 Nov 2024 16:31] MySQL Verification Team
Hi,

Yes, I noticed the difference, that is why we mark both the duplicate and original so that devs working on the resolution of the bug look at all the duplicates for additional information.

I believe the 'cause of the issue is the same for both hence marked it duplicate and reopened original.

Thanks
[17 Nov 2024 16:33] MySQL Verification Team
Additionally, as I asked in the original bug, can you reproduce with something more modern, .34 is very old with a ton of bugs fixed since so would be appreciated if you can retest with .40