Description:
Very slow executemany("insert into t (c1,...) ... on duplicate key update ...", data)
data=[(0,1,2,3...),]
with >24 columns
when using the new row alias syntax:
VALUES(%s,%s,%s,...) as v on duplicate key update c1=v.c1, ...
(Old syntax on duplicate key update c1=VALUES(c1) is deprecated from 8.0.20)
Run time doubles with each extra column
n cols 21
Run time: 1.0900123119354248
n cols 22
Run time: 2.0359716415405273
n cols 23
Run time: 4.018187046051025
n cols 24
Run time: 8.130314826965332
n cols 25
Run time: 15.858994007110596
I expected similar runtime (under 1 second) to executemany with statement in old syntax.
mysql-connector-python-py3 8.0.20-1ubuntu20.04 all MySQL database driver written in pure Python 3
mysql-client-8.0 8.0.20-0ubuntu0.20.04.1 amd64 MySQL database client binaries
mysql-community-server 8.0.20-1ubuntu18.04 amd64 MySQL Server
How to repeat:
statement="insert into test.mysql_on_dup_bug (id,c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24)
values (%s ,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s) as v
on duplicate key update c0=v.c0,c1=v.c1,c2=v.c2,c3=v.c3,c4=v.c4,c5=v.c5,c6=v.c6,c7=v.c7,c8=v.c8,c9=v.c9,c10=v.c10,c11=v.c11,
c12=v.c12,c13=v.c13,c14=v.c14,c15=v.c15,c16=v.c16,c17=v.c17,c18=v.c18,c19=v.c19,c20=v.c20,c21=v.c21,c22=v.c22,c23=v.c23,c24=v.c24",[(1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)])
cursor.executemany(statement, data)
Run time: 15.914669752120972
Suggested fix:
Make performance with new alias syntax, eg:
c1=v.c1
match that with old VALUES syntax, eg
c1=VALUES(c1,...)