| Bug #99575 | insert into t (c1,...) values(%s,...) as v on duplicate key update c1=v.c1 slow | ||
|---|---|---|---|
| Submitted: | 14 May 2020 15:26 | Modified: | 9 Feb 2023 20:02 |
| Reporter: | Julian Briggs | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | Connector / Python | Severity: | S3 (Non-critical) |
| Version: | 8.0.20, 8.0.25 | OS: | Ubuntu (20.04) |
| Assigned to: | CPU Architecture: | ARM | |
[14 May 2020 15:32]
Julian Briggs
Python script to demonstrate the bug.
Attachment: mysql_on_dup_bug.py (text/x-python), 4.86 KiB.
[15 May 2020 10:20]
MySQL Verification Team
Hello Julian Briggs, Thank you for the report and test case. regards, Umesh
[9 Feb 2023 20:02]
Philip Olson
Posted by developer: Fixed as of the upcoming MySQL Connector/Python 8.0.33 release, and here's the proposed changelog entry from the documentation team: Executing executemany() with insert statements that used 'on duplicate key update' demonstrated poor performance; improved speed by fixing the associated regular expressions that match insert statements. Thank you for the bug report.

Description: Very slow executemany("insert into t (c1,...) ... on duplicate key update ...", data) data=[(0,1,2,3...),] with >24 columns when using the new row alias syntax: VALUES(%s,%s,%s,...) as v on duplicate key update c1=v.c1, ... (Old syntax on duplicate key update c1=VALUES(c1) is deprecated from 8.0.20) Run time doubles with each extra column n cols 21 Run time: 1.0900123119354248 n cols 22 Run time: 2.0359716415405273 n cols 23 Run time: 4.018187046051025 n cols 24 Run time: 8.130314826965332 n cols 25 Run time: 15.858994007110596 I expected similar runtime (under 1 second) to executemany with statement in old syntax. mysql-connector-python-py3 8.0.20-1ubuntu20.04 all MySQL database driver written in pure Python 3 mysql-client-8.0 8.0.20-0ubuntu0.20.04.1 amd64 MySQL database client binaries mysql-community-server 8.0.20-1ubuntu18.04 amd64 MySQL Server How to repeat: statement="insert into test.mysql_on_dup_bug (id,c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24) values (%s ,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s) as v on duplicate key update c0=v.c0,c1=v.c1,c2=v.c2,c3=v.c3,c4=v.c4,c5=v.c5,c6=v.c6,c7=v.c7,c8=v.c8,c9=v.c9,c10=v.c10,c11=v.c11, c12=v.c12,c13=v.c13,c14=v.c14,c15=v.c15,c16=v.c16,c17=v.c17,c18=v.c18,c19=v.c19,c20=v.c20,c21=v.c21,c22=v.c22,c23=v.c23,c24=v.c24",[(1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)]) cursor.executemany(statement, data) Run time: 15.914669752120972 Suggested fix: Make performance with new alias syntax, eg: c1=v.c1 match that with old VALUES syntax, eg c1=VALUES(c1,...)