Bug #104153 CREATE...SELECT statements break replication when ANSI mode is enabled
Submitted: 29 Jun 2021 14:34 Modified: 1 Sep 2021 21:29
Reporter: Kevin Michel (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:8.0.21, 8.0.25 OS:Any
Assigned to: CPU Architecture:Any
Tags: regression

[29 Jun 2021 14:34] Kevin Michel
Description:
# Error

Error 'Cannot execute statements with implicit commit inside a transaction when @@SESSION.GTID_NEXT == 'UUID:NUMBER' '.' on query.

# Details

Since MySQL 8.0.21, CREATE...SELECT statements are replicated using a single transaction instead of two. To implement that correctly, the binlog contain a CREATE TABLE...START TRANSACTION Query followed by Table_map/Write_rows events to insert the rows found by the SELECT :

'binlog.000001', 1336, 'Gtid', 1, 1415, "SET @@SESSION.GTID_NEXT= 'c8ec3a3b-ce8d-11eb-a313-8c8caaa4a076:6'"
'binlog.000001', 1415, 'Query', 1, 1492, 'BEGIN'
'binlog.000001', 1492, 'Query', 1, 1636, 'use `db_one`; CREATE TABLE "t1" (\n  "1" int NOT NULL DEFAULT \'0\'\n) START TRANSACTION'
'binlog.000001', 1636, 'Table_map', 1, 1686, 'table_id: 111 (db_one.t1)'
'binlog.000001', 1686, 'Write_rows', 1, 1726, 'table_id: 111 flags: STMT_END_F'
'binlog.000001', 1726, 'Xid', 1, 1757, 'COMMIT /* xid=24 */'

This non-standard SQL syntax is correctly recognised as "transactional DDL" and does not generate an implicit commit in the middle of a binlog transaction.

To write the modified query inside the binlog, the same code used to implement SHOW TABLES is reused.

In addition to its documented effects, the ANSI sql mode also alters the behavior of SHOW TABLES to hide MySQL specific syntax (this mode is internally named "foreign_db_mode"). This is useful to create standard SQL dumps.

Because of the two previous points, when ANSI mode is enabled, the generated query inside the binlog does not contain the necessary START TRANSACTION.

Without the START TRANSACTION, the replica trying to execute the binlog will see a normal CREATE TABLE query, which will try to do an implicit commit in the middle of a binlog transaction and fail with the error described above.

To keep the query short, the binlog example at the beginning shows a query which requires tables without primary keys to be allowed, but the problem is the same when creating a table with a primary key, or when selecting rows from an actual table instead of just SELECT 1.

How to repeat:
- Setup a pair of servers with row based replication, allow tables without primary keys and ANSI mode enabled
- Execute "CREATE TABLE t SELECT 1"
- Observe error message in the Replica

Suggested fix:
See PR on GitHub
[29 Jun 2021 14:36] Kevin Michel
GitHub PR is #347
[30 Jun 2021 12:52] MySQL Verification Team
Hello Kevin Michel,

Thank you for the report and feedback.
Verified as described with 8.0.25 build.

regards,
Umesh
[8 Jul 2021 13:38] OCA Admin
Contribution submitted via Github -  #104153  Fix CREATE...SELECT row-based binlog in ANSI mode 
(*) Contribution by Kevin Michel (Github kmichel-aiven, mysql-server/pull/347#issuecomment-876297952): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_680070222.txt (text/plain), 5.95 KiB.

[1 Sep 2021 21:29] Christine Cole
Posted by developer:
 
Fixed as of the upcoming MySQL 8.0.27 release, and here's the proposed changelog entry from the documentation team:

A statement such as CREATE TABLE t SELECT 1; created an InnoDB table that
was written incorrectly to the binary log if the value of binlog_format
was set to ROW and sql_mode was in ANSI mode. As a result, replication of
the statement failed with an error on the replica. Applying the
mysqlbinlog utility to such a binary log could also fail. 

The atomic CREATE...SELECT was implemented by adding a new clause to the CREATE TABLE
called START TRANSACTION. However, this clause was NOT added when ANSI
mode was enabled. This in turn caused the execution of an ordinary
implicit-committing CREATE TABLE in the middle of the transaction and
produced an error in GTID mode if the transaction had an assigned GTID.
The issue is fixed by removing the SQL-mode dependency from the new
clause.

Thank you for the bug report.