The bug was updated successfully. The following people were notified: the MySQL developers, the bug reporter, the assigned lead, interested observers, and nobody else.
Bug #108145 MySQLCursor.executemany() fails to correctly identify BULK data loading ops
Submitted: 16 Aug 2022 8:13 Modified: 21 Oct 2022 20:45
Reporter: Alex Cazacu (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / Python Severity:S2 (Serious)
Version:8.0.30 OS:Any
Assigned to: CPU Architecture:Any

[16 Aug 2022 8:13] Alex Cazacu
Description:
The MySQLCursor.executemany() RE sentinel fails to correctly identify INSERT operations performing BULK data loading and falls back to single row inserts.

The MySQLCursor.executemany() method implements a RE sentinel (RE_SQL_INSERT_STMT) to identify INSERT operations. When invoking the method, the operation parameter is matched against the sentinel. Assuming that the method has been invoked with multiple parameters :
- if a match is found for the operation, a batch_insert will be performed.
- if a match is not found for the operation, each parameter array will be individually executed.

This is the RE sentinel, defined in the file connector/cursor.py
RE_SQL_INSERT_STMT = re.compile(
    rf"({SQL_COMMENT}|\s)*INSERT({SQL_COMMENT}|\s)"
    r"*INTO\s+[`'\"]?.+[`'\"]?(?:\.[`'\"]?.+[`'\"]?)"
    r"{{0,2}}\s+VALUES\s*\(.+(?:\s*,.+)*\)",
    re.I | re.M | re.S,
)

See https://github.com/mysql/mysql-connector-python/blob/7c7b1d9f587b8d8ae570ce101aeda966ffba8....

How to repeat:
1. Create a MySQL Database "test_database" with a table "test_table", containing one field "test_field".

2. Run the following code:

db_conn = mysql.connector.connect(
  database="test_database",
  host="localhost",
  port=3306,
  user="root"
)

cursor = db_conn.cursor(dictionary=True)

cursor.executemany("INSERT INTO 'test_database'.'test_table'" ('test_field') VALUES('%s'), [['test_field_1'], ['test_field_2'], ['test_field_3']])

The expected result is that a single insert statement would be executed using the provided values.

The actual result is that 3, separate inserts statements are executed.

Suggested fix:
The issue appears to be that the RE sentinel fails to correctly identify BULK data loading operations due to what seems to be a typo in the RE sentinel. Line https://github.com/mysql/mysql-connector-python/blob/7c7b1d9f587b8d8ae570ce101aeda966ffba8... should likely be:
r"{0,2}\s+VALUES\s*\(.+(?:\s*,.+)*\)",
[16 Aug 2022 12:54] MySQL Verification Team
Hello Alex Cazacu,

Thank you for the report and feedback.
With the provided test case on VS 2022, Python 3.10 and Connector/Python 8.0.30:

 File "C:\Work\Connectors\Python\Bug108145\Bug108145\Bug108145.py", line 13
    cursor.executemany("INSERT INTO 'test_database'.'test_table'" ('test_field') VALUES('%s'), [['test_field_1'], ['test_field_2'], ['test_field_3']])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: invalid syntax. Perhaps you forgot a comma?
Press any key to continue . . .

-- With changes

cursor.executemany("INSERT INTO test_database.test_table (test_field) VALUES(%s)", [['test_field_1'], ['test_field_2'], ['test_field_3']])

I can confirm the reported issue :

2022-08-16T12:52:52.826605Z	   20 Query	set autocommit=0
2022-08-16T12:52:52.827215Z	   20 Query	INSERT INTO test_database.test_table (test_field) VALUES('test_field_1')
2022-08-16T12:52:52.839012Z	   20 Query	INSERT INTO test_database.test_table (test_field) VALUES('test_field_2')
2022-08-16T12:52:52.839577Z	   20 Query	INSERT INTO test_database.test_table (test_field) VALUES('test_field_3')
2022-08-16T12:52:52.839948Z	   20 Query	commit

This seems to be a regression as I didn't see this with C/Python 8.0.29.

regards,
Umesh
[16 Aug 2022 13:40] Alex Cazacu
Hi Umesh,

Thank you for picking this up!

To provide some more context on the root cause, this is the git blame for the file containing the RE sentinel for executemany BULK data loading operations:

https://github.com/mysql/mysql-connector-python/blame/master/lib/mysql/connector/cursor.py

You can see that the commit that last modified the RE sentinel was:

https://github.com/mysql/mysql-connector-python/commit/ddf6b13704e0e409bb8cac292acc4292850...

Analyzing the previous of the code, you can see that format was invoked on the r statement. Due to this invocation, the {x,y} quantifier in the RE had to be escaped by adding additional, surrounding curly braces: {{x,y}}.

Since the string containing the quantifier is no longer being formatted (the .format invocation was removed), escaping of the RE quantifier is no longer performed and the additional curly braces are seeping into the RE statement itself, making it invalid for its intended purpose.

Removing the wrapping curly braces of the RE quantifier on line https://github.com/mysql/mysql-connector-python/blob/master/lib/mysql/connector/cursor.py#... should fix the reported issue and resolve the regression.
[16 Aug 2022 14:00] Alex Cazacu
Maybe this is more efficient, I have created a PR to resolve the issue: https://github.com/mysql/mysql-connector-python/pull/81

Thank you!

Best,

Alex
[13 Oct 2022 18:03] Oscar Pacheco
Posted by developer:
 
Thanks, Alex Cazacu for the contribution. Your patch (http://github.com/mysql/mysql-connector-python/pull/81) has been reviewed and integrated into the internal main branch. 

Regards.
[21 Oct 2022 20:45] Philip Olson
Posted by developer:
 
Fixed as of the upcoming MySQL Connector/Python 8.0.32 release, and here's the proposed changelog entry from the documentation team:

The MySQLCursor.executemany() method failed to batch insert data since
the regular expression (RE) sentinel did not detect batch cases correctly;
this meant using the one-on-one insert instead, which led to decreased
performance. 

Thanks to Alex Cazacu for the contribution.

Thank you for the bug report.