Bug #91087 executemany() should work with REPLACE INTO
Submitted: 31 May 2018 7:13 Modified: 25 Jul 2022 17:12
Reporter: hao chen Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / Python Severity:S3 (Non-critical)
Version:2.1.6, 2.1.7 OS:Any
Assigned to: CPU Architecture:Any
Tags: Connector/Python, python

[31 May 2018 7:13] hao chen
Description:
executemany() can`t support REPLACE INTO batch syntax like:
REPLACE INTO t1 VALUES(),(),(),()...;

How to repeat:
Table:

CREATE TABLE `t1` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `first_name` varchar(20) NOT NULL,
  `last_name` varchar(20) NOT NULL,
  `age` int(11) NOT NULL,
  `male` tinyint(4) NOT NULL,
  `other` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `udx_fl_name` (`first_name`,`last_name`),
  UNIQUE KEY `udx_other` (`other`),
  KEY `idx_age` (`age`)
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=utf8mb4

python code: 
import mysql.connector
cnx = mysql.connector.connect(user='xxx', password='xxx',
                              host='10.10.10.12',
                              database='test')
cursor = cnx.cursor()

sql_replace = '''
REPLACE INTO t1
(`id`, `first_name`, `last_name`, `age`, `male`, `other`)
VALUES
(%(id)s, %(first_name)s, %(last_name)s, %(age)s, %(male)s, %(other)s)
'''

values = []
for i in range(20, 25):
    value = {}
    value['id'] = i
    value['first_name'] = i
    value['last_name'] = i
    value['age'] = i
    value['male'] = i
    value['other'] = i
    values.append(value)
    
cursor.executemany(sql_replace, values)
cnx.commit()

cursor.close()
cnx.close()

binlog output:
BEGIN
/*!*/;
# at 3104
#180531 11:23:47 server id 3306137012  end_log_pos 3233 CRC32 0xf6760f60        Rows_query
# REPLACE INTO t1
# (`id`, `first_name`, `last_name`, `age`, `male`, `other`)
# VALUES
# (20, 20, 20, 20, 20, 20)
......
#180531 11:23:47 server id 3306137012  end_log_pos 3475 CRC32 0x7b1a12ea        Rows_query
# REPLACE INTO t1
# (`id`, `first_name`, `last_name`, `age`, `male`, `other`)
# VALUES
# (21, 21, 21, 21, 21, 21)
......
#180531 11:23:47 server id 3306137012  end_log_pos 3717 CRC32 0xffc8053e        Rows_query
# REPLACE INTO t1
# (`id`, `first_name`, `last_name`, `age`, `male`, `other`)
# VALUES
# (22, 22, 22, 22, 22, 22)
......
#180531 11:23:47 server id 3306137012  end_log_pos 3959 CRC32 0xf1ec47e8        Rows_query
# REPLACE INTO t1
# (`id`, `first_name`, `last_name`, `age`, `male`, `other`)
# VALUES
# (23, 23, 23, 23, 23, 23)
......
#180531 11:23:47 server id 3306137012  end_log_pos 4201 CRC32 0x1d24c291        Rows_query
# REPLACE INTO t1
# (`id`, `first_name`, `last_name`, `age`, `male`, `other`)
# VALUES
# (24, 24, 24, 24, 24, 24)
......
#180531 11:23:51 server id 3306137012  end_log_pos 4345 CRC32 0x820b85e3        Xid = 30
COMMIT/*!*/;

Suggested fix:
file: https://github.com/sanpingz/mysql-connector/blob/master/lib/mysql/connector/cursor.py

about line 42 add a variable RE_SQL_REPLACE_STMT:
RE_SQL_REPLACE_STMT = re.compile(
    r"({0}|\s)*REPLACE({0}|\s)*INTO\s+[`'\"]?.+[`'\"]?(?:\.[`'\"]?.+[`'\"]?)"
    r"{{0,2}}\s+VALUES\s*\(.+(?:\s*,.+)*\)".format(SQL_COMMENT),
    re.I | re.M | re.S)

about line 651(executemany function) like:
        if (re.match(RE_SQL_INSERT_STMT, operation) or
            re.match(RE_SQL_REPLACE_STMT, operation)):
            if not seq_params:
                self._rowcount = 0
                return
            stmt = self._batch_insert(operation, seq_params)
            if stmt is not None:
                return self.execute(stmt)
[31 May 2018 7:51] MySQL Verification Team
Hello hao chen,

Thank you for the report.
In order to submit contributions you must first sign the Oracle Contribution Agreement (OCA).
For additional information please check http://www.oracle.com/technetwork/community/oca-486395.html.
If you have any questions, please contact the MySQL community team - https://dev.mysql.com/community/

Thanks,
Umesh
[25 Jul 2022 16:59] Oscar Pacheco
Posted by developer:
 
After analyzing the use case we concluded there is no bug.

In this use case, the REPLACE statement is being used five times by leveraging the executemany() method. The user passes a values list (each item being a dictionary) where each item includes different values for primary and unique fields. 

According to the documentation, when using the REPLACE statement and there is no existing row with the same data regarding the primary or unique keys, a new row is inserted, and that's exactly what is happening (see the log output included in the bug report). From the log output, we observe that five rows were inserted which makes sense since at no time there are values with matching special fields.

For instance, by fixing the ID field (primary key) to a constant integer to force the collision we observe that no new rows are added (but the first one), and the row where the ID match happens is updated successfully.
[25 Jul 2022 16:59] Oscar Pacheco
Posted by developer:
 
After analyzing the use case we concluded there is no bug.

In this use case, the REPLACE statement is being used five times by leveraging the executemany() method. The user passes a values list (each item being a dictionary) where each item includes different values for primary and unique fields. 

According to the documentation, when using the REPLACE statement and there is no existing row with the same data regarding the primary or unique keys, a new row is inserted, and that's exactly what is happening (see the log output included in the bug report). From the log output, we observe that five rows were inserted which makes sense since at no time there are values with matching special fields.

For instance, by fixing the ID field (primary key) to a constant integer to force the collision we observe that no new rows are added (but the first one), and the row where the ID match happens is updated successfully.