MySQL Bugs: #50367: REPLACE has quadradic computational complexity on windows server (but not linux)

Bug #50367	REPLACE has quadradic computational complexity on windows server (but not linux)
Submitted:	15 Jan 2010 14:12	Modified:	18 Mar 2010 10:15
Reporter:	Morten Kloster	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: General	Severity:	S5 (Performance)
Version:	5.1.42, 5.1 bzr	OS:	Any (MS Windows 32 bits, Mac)
Assigned to:		CPU Architecture:	Any
Tags:	performance, REPLACE, windows

Description:
I have found a few different variations of this performance issue.

Case 1:

Table NOW and table HISTORY have identical columns, except that column ownID is primary index in HISTORY and not indexed in NOW, while column itemID is a unique index in NOW but not indexed in HISTORY. Both indices are BTREEs. There are no other indices.

Table NOW is initially empty, while table HISTORY contains >100,000 rows with successive ownID values, but only 3 different itemID values.

The time required for query (from command line)
> replace into NOW select * from HISTORY where (ownID<####) order by ownID;
increases as the square of the number ####. For instance, 500 rows take 0.48 secs, 1000 take 1.86 secs and 2000 take 7.27 secs.

If, however, NOW is not initially empty, but contains one row for each unique value of itemID (i.e., 3 rows), then the same queries take only a fraction of the time, with linear dependence on ####: 500 rows take 0.03 secs, 1000 take 0.06 secs and 2000 take 0.13 secs.

This performance issue is not seen on a 64-bit Mysql 5.1.40 server running on Linux: in both situations the query is fast with a linear dependence on ####.

Case 2:

As case 1, but ownID is primary index also in NOW.

In this case, the time required for the query increases as the square of #### regardless of whether NOW is initially empty or not. Furthermore, if I split up the query into many smaller queries ('where (ownID<#### and ownID>@@@@)'), where ####-@@@@ is constant, then the time required for each successive query increases about linearly at first, up to a limit, if done immediately following one another. If the smaller queries are spaced out in time, however, then each takes the same amount of time.

Again, this performance issue is not seen on the Linux server.

How to repeat:
See description. The details of the tables don't seem to matter, apart from what's specified in the description.

Suggested fix:
Sorry, don't have a clue what's going on...

mysqldump of relevant tables - reduced version

Attachment: dump (text/plain), 84.03 KiB.

Thank you for the report.

Case #2 verified as described. Case #1 doesn't show dramatic performance decrease in my case.