Bug #44803 Join performance with large tables is way too slow
Submitted: 11 May 2009 20:48 Modified: 20 Dec 2009 17:11
Reporter: norbert van Nobelen Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.1.31, 6.0.10-apha OS:Linux (OpenSuse 10.3)
Assigned to: CPU Architecture:Any
Tags: join, query, slow

[11 May 2009 20:48] norbert van Nobelen
Description:
Joins of "large" data sets virtually take forever.
For example: This query joins 200000 rows with 80000 rows on UUID(). I expect performance of this to be in terms of seconds to minutes. This just takes forever to return a result.

select uuid FROM (select a.uuid,a,b
FROM (select UUID() as uuid,1 as A FROM extraval where stockkey=475 and testperiod=30 ) a
LEFT OUTER JOIN (
select UUID() as uuid,0 as b FROM (select DISTINCT stockkey, walletvalue, power,numberofdays,historykey,testperiod from extraval where testperiod=200 and stockkey=30 limit
2000) a
) b ON a.uuid=b.uuid ) c
WHERE b IS NULL
);

How to repeat:
Use query on a table of about 800mln rows. Indexes are used to determine the subsets, so first subset of just 200000 rows is determined pretty fast (seconds).

It can also be reproduced by just joining two tables with 200000 and 80000 records with just 1 record:
create table a (uuid varchar (200));
create table b (uuid varchar (200));

fill table a and b with same uuid values and join them together.

Used both myisam and innodb: No noticeable difference in performance (both take way more time than I am willing to wait: I have to sort out 800mln records, this is just a testset).

Hardware info:
Intel core2 duo processor 1.8Ghz.
3.5 GB Ram
SATA disk 1TB

mysql info:
skip-locking
key_buffer = 32M
max_allowed_packet = 1M
table_cache = 64
sort_buffer_size = 2M
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 16M
query_cache_size=32M
query_cache_limit=32M
max_connections=80
innodb_buffer_pool_size = 128M
innodb_additional_mem_pool_size = 32M
[12 May 2009 4:42] Valeriy Kravchuk
Do you have indexes on uuid column of your tables?
[12 May 2009 19:10] norbert van Nobelen
In the case I create the two temp tables to join them: No indexes:
Joining 200000 (all records from table 1)  records against 80000 (all records from table 2) looks to me like full table scans are the most efficient way and indexes should be pretty much redundant.

In the case of just direct join with subqueries used: The main selection criteria are indexed, the uuid is derived from UUID() and is not indexed (I can not create indexes on subquery data).
[20 Nov 2009 17:11] Valeriy Kravchuk
Please, send the results of EXPLAIN for your queries. Send your my.cnf file content also.
[21 Dec 2009 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".