Bug #97531 | 5.7 replication breakage with syntax error with GRANT management | ||
---|---|---|---|
Submitted: | 7 Nov 2019 11:22 | Modified: | 7 Jan 2020 15:53 |
Reporter: | Simon Mudd (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.7.28, 8.0.18 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | grants, replication, writeset |
[7 Nov 2019 11:22]
Simon Mudd
[7 Nov 2019 16:27]
MySQL Verification Team
Thanks for report Simon, let me try to make a test case.
[7 Nov 2019 17:34]
Simon Mudd
Further investigation showed the reported syntax error on the slave was due to a corrupt statement in the relay logs. (GRANT statements are sent as SBR) and the statement in the relaylog was corrupt. The upstream master's binlog had the same event (e.g. a corrupt statement) yet the statement had been written correctly on the master. So I think for now the problem on the slave should be attributed to an issue on the master which I'm following up on other channels.
[7 Nov 2019 18:20]
MySQL Verification Team
Hi Simon, This was easy to reproduce on 8.0 but I'm having issues reproducing on 5.7 master [localhost:19428] {msandbox} ((none)) > show variables like 'binlog%'; +--------------------------------------------+--------------+ | Variable_name | Value | +--------------------------------------------+--------------+ | binlog_cache_size | 32768 | | binlog_checksum | CRC32 | | binlog_direct_non_transactional_updates | OFF | | binlog_error_action | ABORT_SERVER | | binlog_format | ROW | | binlog_group_commit_sync_delay | 0 | | binlog_group_commit_sync_no_delay_count | 0 | | binlog_gtid_simple_recovery | ON | | binlog_max_flush_queue_time | 0 | | binlog_order_commits | ON | | binlog_row_image | MINIMAL | | binlog_rows_query_log_events | OFF | | binlog_stmt_cache_size | 32768 | | binlog_transaction_dependency_history_size | 25000 | | binlog_transaction_dependency_tracking | WRITESET | +--------------------------------------------+--------------+ 15 rows in set (0.02 sec) master [localhost:19428] {msandbox} ((none)) > master [localhost:19428] {msandbox} ((none)) > show variables like 'gtid%'; +----------------------------------+-----------+ | Variable_name | Value | +----------------------------------+-----------+ | gtid_executed_compression_period | 1000 | | gtid_mode | ON | | gtid_next | AUTOMATIC | | gtid_owned | | | gtid_purged | | +----------------------------------+-----------+ 5 rows in set (0.00 sec) $cat bug.sh #!/bin/bash for (( c=1; c<=5000; c++ )) do /home/arhimed/sandboxes/rsandbox_5_7_27/m -uroot -e "DROP USER IF EXISTS 'bug$c'@'localhost';CREATE USER IF NOT EXISTS 'bug$c'@'localhost';ALTER USER 'bug$c'@'localhost' IDENTIFIED BY 'Pa$$123';GRANT ALL ON bug$c.* TO 'bug$c'@'localhost';\ grant all on db1.* to 'bug$c'@'localhost';\ grant all on db2.* to 'bug$c'@'localhost';\ grant all on db3.* to 'bug$c'@'localhost';\ grant all on db4.* to 'bug$c'@'localhost';\ grant all on db5.* to 'bug$c'@'localhost';\ grant all on db6.* to 'bug$c'@'localhost';\ grant all on db7.* to 'bug$c'@'localhost';\ grant all on db8.* to 'bug$c'@'localhost';\ grant all on db9.* to 'bug$c'@'localhost';\ grant all on db10.* to 'bug$c'@'localhost';\ " done $ I run the script a few dozen times and the replication is not failing. Anything you might add to the test case to make it look more like what you are seeing? thanks bogdan
[7 Nov 2019 18:28]
Simon Mudd
Easy to reproduce in 8.0? Oh. I thought 8.0 was free from this.
[7 Nov 2019 18:32]
Simon Mudd
Note: this bug report actually reports what appear to be 2 different issues. At the time I thought it was a single one. * the SQL syntax error with GRANT statements [ due to relaylog corruption ] * an error with a large CREATE USER statement which created multiple users in one go and was giving an error. So not sure which issue you have been able to repeat.
[7 Nov 2019 19:13]
MySQL Verification Team
Hi Simon, The latest 8 is free from it but the 8.0.16 is not (fixed in 8.0.18). As for 5.7 running the said script does not stop replication, will check the SR too for more data/config and discuss with Shane. Not sure if the config is different or the issue is with the script itself (if there needs to be more grants or bigger grants or..) all best Bogdan
[20 Nov 2019 17:21]
MySQL Verification Team
8.0.18 is also affected
[21 Nov 2019 7:32]
MySQL Verification Team
Here's proof of the binlog corruption from my test system: ------------------------- -- Version: '5.7.28-log' socket: '/home/anon/rpl/slave1/sock' port: 3307 MySQL Community Server (GPL) 2019-11-20T14:43:14.907597Z 2 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=fc30-relay-bin' to avoid this problem. 2019-11-20T14:43:14.925701Z 2 [Note] 'CHANGE MASTER TO FOR CHANNEL '' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='127.0.0.1', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. 2019-11-20T14:43:14.965970Z 4 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information. 2019-11-20T14:43:14.966570Z 5 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'FIRST' at position 0, relay log './fc30-relay-bin.000001' position: 4 2019-11-20T14:43:14.966819Z 4 [Note] Slave I/O thread for channel '': connected to master 'root@127.0.0.1:3306',replication started in log 'FIRST' at position 4 2019-11-20T14:55:38.631289Z 5 [ERROR] Slave SQL for channel '': Query caused different errors on master and slave. Error on master: message (format)='Operation %s failed for %.256s' error code=1396; Error on slave:actual message='You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1', error code=1064. Default database:'test'. Query:'CREATE USER 'user0005'@'%'', Error_code: 3002 2019-11-20T14:55:38.631309Z 5 [ERROR] Slave (additional info): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1 Error_code: 1064 2019-11-20T14:55:38.631314Z 5 [Warning] Slave: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1 Error_code: 1064 2019-11-20T14:55:38.631318Z 5 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000001' position 370941511. -- Master_Log_File: binlog.000001 Read_Master_Log_Pos: 391045951 Relay_Log_File: fc30-relay-bin.000002 Relay_Log_Pos: 370941718 Relay_Master_Log_File: binlog.000001 Last_SQL_Errno: 3002 Last_SQL_Error: Query caused different errors on master and slave. Error on master: message (format)='Operation %s failed for %.256s' error code=1396; Error on slave:actual message='You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1', error code=1064. Default database:'test'. Query:'CREATE USER 'user0005'@'%'' Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Exec_Master_Log_Pos: 370941511 Relay_Log_Space: 391046364 Retrieved_Gtid_Set: 092a1971-0ba4-11ea-b3f4-000d6179f72b:1-1850823 Executed_Gtid_Set: 092a1971-0ba4-11ea-b3f4-000d6179f72b:1-1754855 mysqlbinlog --base64-output=decode-rows ~/rpl/master/data/binlog.000001 --include-gtids=092a1971-0ba4-11ea-b3f4-000d6179f72b:1754854-1754856 --hexdump -vvv |less Looking at the binary log, notice the NULL byte here: "27 00 49" # GTID last_committed=1754855 sequence_number=1754856 rbr_only=no SET @@SESSION.GTID_NEXT= '092a1971-0ba4-11ea-b3f4-000d6179f72b:1754856'/*!*/; # at 370941576 #191120 16:55:38 server id 1 end_log_pos 370941805 CRC32 0xd0c6483f # Position Timestamp Type Master ID Size Master Pos Flags # 161c1e88 ea 53 d5 5d 02 01 00 00 00 e5 00 00 00 6d 1f 1c 16 00 00 # 161c1e9b 0f 00 00 00 00 00 00 00 04 74 05 22 00 00 00 00 |.........t......| # 161c1eab 00 00 01 20 00 a0 55 00 00 00 00 06 03 73 74 64 |......U......std| # 161c1ebb 04 08 00 08 00 08 00 0c 01 6d 79 73 71 6c 00 74 |.........mysql.t| # 161c1ecb 65 73 74 00 43 52 45 41 54 45 20 55 53 45 52 20 |est.CREATE.USER.| # 161c1edb 27 75 73 65 72 30 30 30 35 27 40 27 25 27 00 49 |.user0005......I| <<<<<<<<<<<<<<<<<<<<<< # 161c1eeb 44 45 4e 54 49 46 49 45 44 20 57 49 54 48 20 27 |DENTIFIED.WITH..| # 161c1efb 6d 79 73 71 6c 5f 6e 61 74 69 76 65 5f 70 61 73 |mysql.native.pas| # 161c1f0b 73 77 6f 72 64 27 2c 27 75 73 65 72 30 30 30 35 |sword...user0005| # 161c1f1b 27 40 27 25 27 20 49 44 45 4e 54 49 46 49 45 44 |......IDENTIFIED| # 161c1f2b 20 57 49 54 48 20 27 6d 79 73 71 6c 5f 6e 61 74 |.WITH..mysql.nat| # 161c1f3b 69 76 65 5f 70 61 73 73 77 6f 72 64 27 20 52 45 |ive.password..RE| # 161c1f4b 51 55 49 52 45 20 53 55 42 4a 45 43 54 20 27 3b |QUIRE.SUBJECT...| # 161c1f5b 27 20 41 43 43 4f 55 4e 54 20 4c 4f 43 4b 3f 48 |..ACCOUNT.LOCK.H| # 161c1f6b c6 d0 |..| # Query thread_id=15 exec_time=0 error_code=1396 SET TIMESTAMP=1574261738/*!*/; CREATE USER 'user0005'@'%'^@IDENTIFIED WITH 'mysql_native_password','user0005'@'%' IDENTIFIED WITH 'mysql_native_password' REQUIRE SUBJECT ';' ACCOUNT LOCK /*!*/; --------------------------------
[2 Dec 2019 9:13]
Tatjana Nuernberg
Posted by developer: As previously indicated, SHOW PROCESSLIST works differently from performance schema (threads / events_statements tables), so the S/P is where we might see the issue. There is a mechanism in the server that aims to prevent plaintext passwords from being displayed in logs, monitoring, and so on. Some changes in the handling of CREATE/ALTER USER and GRANTs have led to an unexpected change in the life-cycle of these rewritten queries. As these strings could be inspected from another thread (using SHOW PROCESSLIST), this could result in SHOW PROCESSLIST obtaining information from a thread, and this information subsequently being changed during the execution phase (rather than just after the parsing phase), resulting in SHOW PROCESSLIST using stale information. The above patch therefore modifies the handling of users and grants, and lays some groundwork for the server to be more robust with regard to unexpected use of the rewrite framework.
[7 Jan 2020 15:53]
Margaret Fisher
Posted by developer: Changelog entry added for MySQL 8.0.20 and 5.7.30: While an SQL statement was in the process of being rewritten for the binary log so that sensitive information did not appear in plain text, if a SHOW PROCESSLIST statement was used to inspect the query, the query could become corrupted when it was written to the binary log, causing replication to stop. The process of rewriting the query is now kept private, and the query thread is updated only when rewriting is complete.
[29 Jan 2020 21:27]
Margaret Fisher
Posted by developer: Already changelogged, reclosing.
[29 Feb 2020 17:35]
Michael Finch
Hi Margaret. Has the fix for this bug been published? It has been a big pain for my team so we're wondering if we can cherry pick the fix into our build. Thanks.