Bug #62147: LEFT JOIN generating HUGE temp_table (very slow of course)
Submitted: 11 Aug 2011 12:05    Modified: 23 Aug 2011 19:19
Reporter: Morg. What?           Status: Not a Bug
Category: MySQL Server: DML     Severity: S2 (Serious)
Version: 5.5.8                  OS: Any
Assigned to:                    CPU Architecture: Any
Tags: left join

[11 Aug 2011 12:05] Morg. What?
Description:
A quick LEFT JOIN on two small tables (5k and 15k rows respectively), both containing some NULLs and very few DISTINCT values.

The query runs for over two minutes, the temporary table grows to 30+ GB, and no result is returned because the temp table explodes:

/* SQL Error: Incorrect key file for table 'E:\tmp\#sql1ec_2_0.MYI'; try to repair it */

So of course I try reducing the joined dataset (err... I know it makes no sense but it looked like it could be lighter) by selecting only the columns I need before joining, and the result is exactly the same: several minutes of querying, with freezes 10-30 seconds long every minute or so, and still no result in the end.

What in the world could take 30+ GB when there are but 5k*15k = 75M rows to generate, containing only one int(8) and one mostly empty varchar(255)? (Is there a specific setting to help the hash stuff in such cases?)

Why is there no limit / control on the HDD space allocated to the tmp_table files? MySQL seems to try to fill the disk until it fails, and then says the table disappeared / is corrupted, which is not very explanatory.
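
For reference, the only related knobs in 5.5 seem to bound when an in-memory temporary table spills to disk and where the on-disk copy goes, not how large it may grow. A sketch, assuming a 5.5 server:

-- sketch: spill threshold for in-memory temporary tables (both must be raised together)
SET SESSION tmp_table_size       = 64*1024*1024;
SET SESSION max_heap_table_size  = 64*1024*1024;
-- the location of on-disk temp tables is the tmpdir startup option, e.g. in my.ini: tmpdir=E:/tmp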

How to repeat:
SELECT COUNT(attach),'LEFT JOIN FAKE NOT IN' AS what FROM
	(
		SELECT  attach FROM
		(
			SELECT 
			ticket.attach, events.id
			FROM ticket 
			LEFT JOIN
			events
			ON
			events.attach=ticket.attach
		) a1
		WHERE id IS NULL
	) a2
;

events.id is unique; attach contains the same DISTINCT values in both tables, with a small number of DISTINCT values (32) and some NULLs; 5k rows in ticket and 15k rows in events.

Both are InnoDB utf8 tables; attach is varchar(255).

And for the 'lighter' version:

SELECT COUNT(attach),'LEFT JOIN FAKE NOT IN' AS what FROM
	(
		SELECT  attach FROM
		(
			SELECT 
			ticket.attach, events.id
			FROM 
			(SELECT attach FROM ticket) ticket
			LEFT JOIN
			(SELECT id,attach FROM events) events
			ON
			events.attach=ticket.attach
		) a1
		WHERE id IS NULL
	) a2
;
[11 Aug 2011 12:34] Valeriy Kravchuk
Please send the results of SHOW CREATE TABLE for all tables used, and the results of EXPLAIN for the problematic query.
[11 Aug 2011 12:58] Morg. What?
CREATE TABLE `events` (
  `id` int(8) NOT NULL AUTO_INCREMENT,
  `description` mediumtext,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `user` varchar(50) NOT NULL,
  `attach` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=35960 DEFAULT CHARSET=utf8

CREATE TABLE `ticket` (
  `id` int(8) NOT NULL AUTO_INCREMENT,
  `detail` mediumtext,
  `priority` smallint(1) unsigned DEFAULT NULL,
  `summary` varchar(100) NOT NULL,
  `attach` varchar(255) DEFAULT NULL,
  `copyto` longtext,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9792 DEFAULT CHARSET=utf8

And the EXPLAIN (where is the query plan in there???)

id,select_type,table,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,ticket,ALL,,,,,4732,100,
1,SIMPLE,events,ALL,,,,,21763,100,
[11 Aug 2011 13:04] Morg. What?
A small clarification: the EXPLAIN I gave is based on the inner LEFT JOIN query, which returns this as an error: /* Cannot retrieve Resultset data */
EXPLAIN on the full query actually runs the query (wtf?) and crashes just like running the query itself does (with no EXPLAIN output).
[11 Aug 2011 15:10] Valeriy Kravchuk
Check http://dev.mysql.com/doc/refman/5.5/en/explain-output.html on how to interpret EXPLAIN results.

This information:

id,select_type,table,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,ticket,ALL,,,,,4732,100,
1,SIMPLE,events,ALL,,,,,21763,100,

means that the ticket table is read first, entirely (ALL), row by row, and then, for each row from it, the entire events table is read to find the corresponding rows where events.attach=ticket.attach. You do not have indexes on the attach column, so a full table scan is hardly avoidable (as MySQL does not yet support merge join or hash join).

Would you mind adding an index on the attach column?

Also, can you show me the execution plan from some other RDBMS for the same tables and data? I doubt that any of them will process this query better using only a nested loop join (the only kind of join we have implemented in 5.5.x).
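
For illustration, the suggested indexes would be something like this (the index names are arbitrary):

CREATE INDEX idx_ticket_attach ON ticket (attach);
CREATE INDEX idx_events_attach ON events (attach);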
[12 Aug 2011 7:12] Morg. What?
Valeriy, I can understand that you support MySQL and it's all respectable.

BUT SERIOUSLY

We're talking about a simple nested loop over two tables (with subqueries to select only the right columns; this implies two tables of one column each, with 15k and 5k records respectively - really not much data).

WHICH ENDS UP TAKING FOREVER AND 30+ GB OF TEMP SPACE !!!

This is completely abnormal behaviour; the fact that indexes would fix the problem is of course expected, but that doesn't change my point: there is no reason for such a failure on a simple nested loop.

On each row of the first seq scan, you do a full seq scan of the second table... ok, you take your results from the inner seq scan, store them in a temp_table, then go on with the outer seq scan.

And in that process, MySQL generates over 30GB of data and takes several minutes?

That makes no sense at all.

And if you believe other DBMSs would crash in the same fashion... well, it's time you tried something other than MySQL / Oracle.

I could definitely do the test in PostgreSQL, but I've done comparable stuff in the past (nested loop joins) quite a few times (yes, I know I'm evil), and the worst case I've ever seen was about 80 seconds to a result, which went down to 0.08 seconds when I installed the next version.

Let's be clear: what I'm doing now is cleaning up an old, disgusting DB from the dreadful times of MyISAM and ASP/PHP4. It makes sense to me that I'm not going to create/delete indexes every two queries, and a bit of slowness *could* be accepted, whereas a total failure cannot.
[12 Aug 2011 7:28] Valeriy Kravchuk
For the query you sent the EXPLAIN for (the inner one):

SELECT 
ticket.attach, events.id
FROM ticket 
LEFT JOIN
events
ON
events.attach=ticket.attach

a temporary table is NOT created (as you can see from the EXPLAIN results). But you put this subquery into the FROM clause and want to use it as a source of rows for the outer SELECT. This "source" (derived table) is materialized as a temporary table (also for EXPLAIN of the bigger query, which is why you cannot run it successfully). Now, this is a LEFT JOIN, so the resulting temporary table, before the final WHERE clause is applied, will contain 15K * 5K = 75M rows, each row taking up to 255 + 50 + 255 + 100 + ... = 660+ bytes (I counted the varchar column sizes only), and this easily gives us up to:

mysql> select 5000*15000*660;
+----------------+
| 5000*15000*660 |
+----------------+
|    49500000000 |
+----------------+
1 row in set (0.00 sec)

that is, 49G or so in size. In reality we have less, as the varchar columns are not filled up entirely...

So I am not surprised by 30G for the temporary table in this case; this is expected, taking into account the way you have written your query and indexed your tables.

I am not saying that this is not a problem. I am saying that this materialization of the derived table (subquery), and the single way ("nested loop join" of a kind) to join tables in MySQL, are known problems. There are feature requests to improve this, and we used to have version 6.0.x (now discontinued) with both problems solved to a large extent. Eventually the code from that 6.0.x will appear in MySQL 5.6+.

I can only repeat my request: show me the execution plan from some other RDBMS, MS SQL or PostgreSQL, or whatever, and either we will see something similar for this kind of query, or some other way to process joins or derived tables. We know about these other, better ways, and we are in the process of implementing them.
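
For illustration only, a sketch of the same count written without the derived tables, so that nothing has to be materialized (assuming the goal is to count ticket rows with no matching event):

SELECT COUNT(ticket.attach), 'LEFT JOIN FAKE NOT IN' AS what
FROM ticket
LEFT JOIN events ON events.attach = ticket.attach
WHERE events.id IS NULL;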
[12 Aug 2011 7:34] Valeriy Kravchuk
Actually, with utf8 each character in a varchar column needs up to 3 bytes... so the row size is bigger than I calculated. Let's take this as compensation for the average number of characters in these columns being smaller than 255.
[12 Aug 2011 7:59] Peter Laursen
Just a question from the sideline! Is this correct: "In reality we have less, as varchar columns are not filled up entirely..."?

In my understanding, compression of varchars (as compared to chars) only happens on disk and not in memory. In memory, varchars are expanded to full length, I think.

Peter
(not a MySQL person)
[12 Aug 2011 8:35] Morg. What?
Valeriy, your argument about 49G does not stand.

Of course this is UTF-8 and thus there is a multiplication FOR NON-ASCII STRINGS; on the other hand there are 32 unique attach values (at most 50 chars long), many empty strings and 5k NULLs in events, AND even joining via the subqueries (second query in my bug report) that select ONLY the required fields behaves exactly the same.

This * makes * no * sense.

I can't even code in C and I could EASILY do it better than MySQL in PHP; this is unacceptable from a DBMS. Let's just see it for what it is, a total epic fail on the part of the engine, and look at ways of fixing it, shall we?

And about UTF-8, not all UTF-8 chars take 3 bytes --

from: http://www.joelonsoftware.com/articles/Unicode.html
"
Thus was invented the brilliant concept of UTF-8. UTF-8 was another system for storing your string of Unicode code points, those magic U+ numbers, in memory using 8 bit bytes. In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, in fact, up to 6 bytes."

And for the varchar comment: no, varchars aren't loaded as full 255-char strings in memory; how much space do you think 255 null chars take when they can be represented as "no chars in this string"?

from next door: http://dev.mysql.com/doc/refman/5.5/en/char.html

"In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes. "

So the REAL dataset used by the pre-select query (i.e. the one where I only select the columns used for the join) IS ridiculously small: we're talking about 32 values of at most 50 one-byte chars each, 5000 + 15000 int(8) values ranging from 0 to 35000, and of course (logically) a ton of pointers if it's done correctly.

And even if done incorrectly (i.e. no pointers), the total weight of the dataset is still really small, and by your own way of calculating it, way below 49G or even 4.9G.
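
A rough back-of-the-envelope in the same style as the earlier calculation, for the columns the join actually needs, assuming (as an illustration) ~50 bytes per attach value and 4 bytes of storage per int(8) id:

mysql> SELECT (5000 + 15000) * 50 + 15000 * 4;
+---------------------------------+
| (5000 + 15000) * 50 + 15000 * 4 |
+---------------------------------+
|                         1060000 |
+---------------------------------+
1 row in set (0.00 sec)

i.e. roughly 1 MB of input data for the join columns.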
[12 Aug 2011 9:30] Valeriy Kravchuk
Peter,

Yes, for InnoDB tables, varchars in memory are still (since 3.23.x) expanded to full length. It is actually a known problem that will be fixed really soon.

Not sure that played any role in this case.
[12 Aug 2011 9:34] Valeriy Kravchuk
No need to guess, just send the results of SHOW TABLE STATUS for these 2 tables with your real data. This way we will see/calculate the average row length and get a better estimate than mine.
[12 Aug 2011 9:45] Morg. What?
Alright, I'll send you that once my other 2+ hour join query has completed (that one will succeed, it doesn't take all the temp space)...
[12 Aug 2011 10:10] Morg. What?
SHOW TABLE STATUS LIKE 'ticket'
Name,Engine,Version,Row_format,Rows,Avg_row_length,Data_length,Max_data_length,Index_length,Data_free,Auto_increment,Create_time,Update_time,Check_time,Collation,Checksum,Create_options,Comment
ticket,InnoDB,10,Compact,6018,438,2637824,0,0,58720256,9792,2011-08-11 16:52:35,,,utf8_general_ci,,,

//different table name, moved stuff around since, table created using like, insert select *

SHOW TABLE STATUS LIKE 'old_events'
Name,Engine,Version,Row_format,Rows,Avg_row_length,Data_length,Max_data_length,Index_length,Data_free,Auto_increment,Create_time,Update_time,Check_time,Collation,Checksum,Create_options,Comment
old_events,InnoDB,10,Compact,20628,178,3686400,0,0,58720256,35960,2011-08-11 16:55:22,,,utf8_general_ci,,,
[12 Aug 2011 10:24] Valeriy Kravchuk
So, we see 438 and 178 as the estimated average row lengths, with that attach varchar(255) column probably being the largest part of the rows in both tables. Let's take 150 as the average length of that attach column, and let's assume that only this column is included in the temporary table created (ignoring all pointers of whatever kind, possible indexes and the id column). We will get:

mysql> select 5000*15000*150;
+----------------+
| 5000*15000*150 |
+----------------+
|    11250000000 |
+----------------+
1 row in set (0.00 sec)

that is, 11G+, as the smallest expected size of that temporary table. The same order of magnitude as the 30G you reported.
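
And note that the ON condition does not necessarily shrink the row count much here: with only 32 distinct attach values, a skewed distribution keeps the output close to the full cross product. Purely as an illustrative assumption, if roughly half the rows in each table carried the same value (e.g. the empty string):

mysql> SELECT 2500*7500;
+-----------+
| 2500*7500 |
+-----------+
|  18750000 |
+-----------+
1 row in set (0.00 sec)

close to 19M rows from that single value alone, before the WHERE id IS NULL filter is applied.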

What was so badly wrong with my initial calculations?

I can only repeat that I see nothing new, unknown, or unexpected reported here. Let's wait for comments from others.
[12 Aug 2011 12:16] Morg. What?
So, in your eyes:
a) it's normal that empty fields take up 150 bytes
b) it's normal that no pointers are used
c) it's acceptable that such a query would fail with 30G of free space when the result can AT MOST be about 1/10th of the 11G you listed

Again, the TEMP TABLE CANNOT BE 11G+ IN SIZE because NOT EVERY COLUMN MATCHES EVERY OTHER COLUMN ON THE OTHER SIDE.

The 11G is not a minimum size; it is, by far, a maximum size, as the case where the join generates the biggest number of rows is ALSO the case where the biggest part of the row is empty.

However, as I said, even the (SELECT attach FROM ticket) version fails, and here are the statistics for the resulting tables:
ticket:

Name,Engine,Version,Row_format,Rows,Avg_row_length,Data_length,Max_data_length,Index_length,Data_free,Auto_increment,Create_time,Update_time,Check_time,Collation,Checksum,Create_options,Comment
ticket_test,InnoDB,10,Compact,4505,36,163840,0,0,54525952,,2011-08-12 13:56:10,,,utf8_general_ci,,,

Same for events (i.e. only SELECT id, attach):

Name,Engine,Version,Row_format,Rows,Avg_row_length,Data_length,Max_data_length,Index_length,Data_free,Auto_increment,Create_time,Update_time,Check_time,Collation,Checksum,Create_options,Comment
events_test,InnoDB,10,Compact,15872,100,1589248,0,0,54525952,,2011-08-12 13:56:23,,,utf8_general_ci,,,

About 136 bytes per row

To get the number of rows, I added an index on attach for both of these tables...

And then I run my simplified query,

which gives the following result with over 11G of free space:

 /* SQL Error: Incorrect key file for table 'C:\WINDOWS\TEMP\#sqle08_2_e.MYI'; try to repair it */

And so I try with 31.2G of free space (and indexes)... like this:

/* SQL Error: Incorrect key file for table 'E:\tmp\#sql3e8_2_0.MYI'; try to repair it */

And so... I ponder how useful it could be to try joining from the big table to the small table (said big table has way more records and NULLs than the small one - who knows, maybe a smaller inner seq scan could help):

i.e. 
events_test LEFT JOIN ticket_test
instead of
ticket_test LEFT JOIN events_test

NOTE: this makes no sense at all, as my objective was to check events_test.id IS NULL afterwards, but... anything for research.

And here is the result, still with 31.2G of free temp space:

/* SQL Error: Incorrect key file for table 'E:\tmp\#sql3e8_2_1.MYI'; try to repair it */
[12 Aug 2011 12:51] Morg. What?
Additional tests:
Inserting into the _test tables (those with only attach or id,attach) using LIMIT x:

With LIMIT 1000 for the insert in both tables:
instant query, result = 1000
With LIMIT 5000:
23 sec query, result = 0
With LIMIT 7500:
146 sec query, result = 0, 11GB used
With LIMIT 10000:
249 sec query, result = 0, space not tracked, took too long to watch over

Obviously going anywhere above that brings us close to the problem, so there is no real point in going there, but it looks like 50-60G of free space could eventually lead to a result (after a long while) - still a bug though.

// following this type of statement:
CREATE TABLE events_test
SELECT id,attach FROM old_events;
CREATE TABLE ticket_test
SELECT attach FROM ticket;
DELETE FROM events_test;
INSERT INTO events_test
(SELECT id,attach FROM old_events LIMIT 5000);
DELETE FROM ticket_test;
INSERT INTO ticket_test
(SELECT attach FROM ticket LIMIT 5000);
CREATE INDEX tmp_index_4 ON ticket_test (attach);
CREATE INDEX tmp_index_5 ON events_test (attach);
SELECT COUNT(attach),'LEFT JOIN FAKE NOT IN' AS what FROM
	(
		SELECT
		ticket_test.attach, events_test.id
		FROM events_test
		LEFT JOIN
		ticket_test
		ON
		events_test.attach=ticket_test.attach
	) a1
	WHERE id IS NULL
[20 Aug 2011 8:07] Sveta Smirnova
Thank you for the feedback.

> So, in your eyes:
> a) it's normal that empty fields take up 150 bytes

Yes, this is how relational databases work.

> b) it's normal that no pointers are used

> c) it's acceptable that such a query would fail with 30G of free space when the result can AT MOST be about 1/10th of the 11G you listed

There are a lot of feature requests about how to improve the optimizer; there is no sense in adding a new one. Currently you must have indexes to be able to run JOINs faster.

Please subscribe to bug #2904 and bug #25402, which both affect your case.

Closing as "Not a Bug".
[20 Aug 2011 11:27] Morg. What?
a) That is not how all RDBMSs work, that's how MySQL works, and we both know it's largely suboptimal, as a varchar ends up as heavy as a char... and it isn't hard to compare the following: "i'm a short string\35" (empty_blocks) and "i'm not\47" (empty_blocks) and "\54" (oo, an empty string!!) - if you really need me to point out the most obvious alternative...

b) I used indexes; it made no difference in this case.

c) The optimizer indeed has *major* problems but this is not the topic right now

d) 31.2G of friggin free space for a 4.1G estimated resultset - what do you fill it with, alternative imaginary values from parallel universes? I didn't ask for that.

And you expect to close this as not a bug???

This is NOT an optimizer issue; this is the MySQL engine doing random useless stuff for no reason (this is the bug), slowly (this is the optimizer issue).
[22 Aug 2011 6:57] Øystein Grøvlen
If I am not mistaken, the given query is unnecessarily complex, and the same result can be achieved with

SELECT count(ticket.attach)
FROM ticket LEFT JOIN events
ON events.attach=ticket.attach
WHERE events.id IS NULL;
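
For what it is worth, an equivalent anti-join formulation would be the following; whether it performs any better in 5.5 depends on the optimizer, so treat it as a sketch:

SELECT COUNT(attach)
FROM ticket
WHERE NOT EXISTS
  (SELECT 1 FROM events WHERE events.attach = ticket.attach);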
[22 Aug 2011 8:40] Morg. What?
Yup... there is indeed useless nesting; however, it doesn't really have any bearing on the bug.