MySQL Bugs: #44803: Join performance with large tables is way too slow

Bug #44803	Join performance with large tables is way too slow
Submitted:	11 May 2009 20:48	Modified:	20 Dec 2009 17:11
Reporter:	norbert van Nobelen	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server	Severity:	S2 (Serious)
Version:	5.1.31, 6.0.10-apha	OS:	Linux (OpenSuse 10.3)
Assigned to:		CPU Architecture:	Any
Tags:	join, query, slow

Description:
Joins of "large" data sets virtually take forever.
For example: This query joins 200000 rows with 80000 rows on UUID(). I expect performance of this to be in terms of seconds to minutes. This just takes forever to return a result.

select uuid FROM (select a.uuid,a,b
FROM (select UUID() as uuid,1 as A FROM extraval where stockkey=475 and testperiod=30 ) a
LEFT OUTER JOIN (
select UUID() as uuid,0 as b FROM (select DISTINCT stockkey, walletvalue, power,numberofdays,historykey,testperiod from extraval where testperiod=200 and stockkey=30 limit
2000) a
) b ON a.uuid=b.uuid ) c
WHERE b IS NULL
);

How to repeat:
Use query on a table of about 800mln rows. Indexes are used to determine the subsets, so first subset of just 200000 rows is determined pretty fast (seconds).

It can also be reproduced by just joining two tables with 200000 and 80000 records with just 1 record:
create table a (uuid varchar (200));
create table b (uuid varchar (200));

fill table a and b with same uuid values and join them together.

Used both myisam and innodb: No noticeable difference in performance (both take way more time than I am willing to wait: I have to sort out 800mln records, this is just a testset).

Hardware info:
Intel core2 duo processor 1.8Ghz.
3.5 GB Ram
SATA disk 1TB

mysql info:
skip-locking
key_buffer = 32M
max_allowed_packet = 1M
table_cache = 64
sort_buffer_size = 2M
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 16M
query_cache_size=32M
query_cache_limit=32M
max_connections=80
innodb_buffer_pool_size = 128M
innodb_additional_mem_pool_size = 32M

Do you have indexes on uuid column of your tables?

In the case I create the two temp tables to join them: No indexes:
Joining 200000 (all records from table 1)  records against 80000 (all records from table 2) looks to me like full table scans are the most efficient way and indexes should be pretty much redundant.

In the case of just direct join with subqueries used: The main selection criteria are indexed, the uuid is derived from UUID() and is not indexed (I can not create indexes on subquery data).

Please, send the results of EXPLAIN for your queries. Send your my.cnf file content also.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".