Bug #2257 AWFUL delete performance for HEAP table with WHERE clause
Submitted: 2 Jan 2004 7:44 Modified: 2 Jan 2004 12:04
Reporter: Igor Mendelev Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:4.1.2-nt-alpha OS:Windows (WinXP)
Assigned to: Dean Ellis CPU Architecture:Any

[2 Jan 2004 7:44] Igor Mendelev
Description:
I've created HEAP table genericperf20hp - see
mysql> describe genericperf20hp;
+---------------------+------------+------+-----+---------+-------+
| Field               | Type       | Null | Key | Default | Extra |
+---------------------+------------+------+-----+---------+-------+
| objectId            | bigint(20) |      | MUL | 0       |       |
| namedCharacteristic | tinyint(4) |      |     | 0       |       |
| timeFrom            | int(11)    |      | MUL | 0       |       |
| timeTo              | int(11)    |      | MUL | 0       |       |
| sampleValue         | double     |      |     | 0       |       |
+---------------------+------------+------+-----+---------+-------+

with 3 separate indexes on objectId, timeFrom, timeTo and filled it with about 2 million rows of random data.
SELECT and INSERT are very fast - but simple

DELETE from genericperf20hp where timeTo < timeFrom;

takes FOREVER - it was started 50 min ago and still not done (should have been
well under 1 min - select queries for this table run at almost 1M rows/sec and INSERTS from SELECT at 140K rows/sec). 
Looks like query plan problem to me (quadratic time instead of linear); CPU has
almost 100% utilization and mysqld-nt already spent 50+ min of CPU time on this query (as seen at Windows task manager)

How to repeat:
HARDWARE: P4-2.8/1GB RAM
No other queries on the system

Suggested fix:
DELETE on a single table should NEVER take more than LINEAR time - whether indexes were created or not.
[2 Jan 2004 9:04] Dean Ellis
cannot repeat this:

mysql> select count(*) from ack;
+----------+
| count(*) |
+----------+
|  2097152 |
+----------+
1 row in set (0.00 sec)
 
mysql> delete from ack where timeTo < timeFrom;
Query OK, 1048528 rows affected (6.42 sec)

Unless, of course, I do not actually use HEAP.
[2 Jan 2004 11:56] Dean Ellis
Per discussion elsewhere regarding this bug, it appears to be not so much a bug as simply the nature of HASH indexes with a large number of collisions (hashing 257 unique values for 1.6 million rows).  DELETE is fast when all indexes are BTREE type, or when the hashed columns contain a large number of unique values (with thus fewer collisions).