Bug #14070 MySQL hangs on "Copying to tmp table"
Submitted: 17 Oct 2005 11:03 Modified: 24 Jan 2014 13:20
Reporter: Frank Osterberg Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.0.16, 5.0.15, 4.1.14, 5.1 OS:Linux (Linux)
Assigned to: CPU Architecture:Any
Tags: bfsm_2006_11_02

[17 Oct 2005 11:03] Frank Osterberg
Description:
The following happens:
1.) Execute some Query.
2.) Thread Status of executing Query goes to (and stays at) “Copying to tmp table”
Result: Server is at 100% CPU Utilization indefinitely and the Query does not finish (or fail).
Note: once the query hangs there is no significant harddrive activity and the system resources are constant (i.e. free physical memory/cpu)

Searching I found that this happens on both Linux and Windows machines, is not predictable, can sometimes be resolved with a flush tables / server restart / system restart. Apparently resource related although same occurs even on true servers and regardless of number of executing/connected threads.

The problem is NOT that is just takes longer but that it totally hangs (for hours/days/weeks) until it is manually killed.

For further info please see:
http://forums.mysql.com/read.php?24,39066,45979#msg-45979

and:
http://forums.mysql.com/read.php?24,39066,39066#msg-39066

How to repeat:
Not predictable but apparently fails more often at (long) queries using grouping functions such as SUM, AVG, MIN, MAX, STD and using GROUP BY with nested sub queries and/or multiple joins.

However, once a query hangs, i found that it will hang again on re-execution until the server is restarted (for me even flushing the tables does not work)

It’s not that queries just take longer but that they will indefinitely hang.

Suggested fix:
Fix code (I’m rather sure it is a code problem!)
[17 Oct 2005 11:25] Valeriy Kravchuk
Thank you for a problem report. Please, provide the exact queries used, as well as the structure of underlying tables and, at least, number of rows in each of them.

Wihout this information is is absolutely impossible to undertand, what really happened and is it a bug.

As for the discussion you pointed out, I've seen them before... Have you read the final messages in the thread (http://forums.mysql.com/read.php?24,39066,48206#msg-48206, for example). Don't they explain possible problem you encountered?

Can you really show the query that works OK on MySQL 3.x, and "hangs indefinitely" on 4.1.14, on the same data and server configuration?
[24 Oct 2005 1:13] Chip Yamasaki
I can confirm that I am having the same problem.  After submitting http://bugs.mysql.com/bug.php?id=14127, I decided to upgrade our servers to 4.1.14 official binaries.  I have been running into this problem regularly since.  A simple query that joined 2 tables now runs fine in under .2 seconds nearly all of the time, but once in a while (more frequently all the time), it goes into this "Copying to tmp table" state and just hangs there.  Same query that runs in .2 seconds or less.

This DEFINITELY appears to be a MySQL problem.  Please don't confuse this with the post you mentioned, because that person was strayed from the initial compaint in that forum post.  The initial complaint in that thread was not about the query being slow, it was about queries being hung in the "Copying to tmp table" state, which MANY people appear to be seeing on MANY platforms.

Our 4.0 server ran fine for many months with the same query and same data without this problem.
[24 Oct 2005 1:22] Chip Yamasaki
This should be taken from Severity S5 to "critical", since it effectively prevents other queries from running indefinitely.  I'm sure all of the people commenting in the forums about this would concur.
[24 Oct 2005 12:30] Valeriy Kravchuk
What we really need is 100% repetable test case, better - on Linux and simple one.

So, please, send the results of SHOW CREATE TABLE for bd_order and bd_contact tables, inform about the number of rows in both tables, provide the my.cnf content and any unusual messages you got, if any.

Explain me exactly what to do to get this hang.
[28 Oct 2005 22:32] Chip Yamasaki
After doing some more searching, I may have been bitten by a problem similar to the bug verified in http://bugs.mysql.com/bug.php?id=4460.  The question is, how has such an important issue, the proper execution of subqueries, been alowed to stay open 15 months?

It appears that this same behavior also impacts some other queries where the same table is used twice in a single query join, either to self or another table.

This needs serious attention.
[13 Nov 2005 10:55] Valeriy Kravchuk
This report should be in an open state since October 24th really...
[17 Nov 2005 15:57] Thomas Deliduka
I am the author of the original thread. I believe most of the things you're looking for are right in the thread if you have the time to read it.

The query for me that hangs all the time is:

explain SELECT phpbb_topics.topic_title, phpbb_topics.topic_replies, phpbb_topics.topic_views, phpbb_topics.topic_id, phpbb_users.username, phpbb_users.user_realname, phpbb_posts.post_id, phpbb_posts.poster_id, phpbb_posts.post_time FROM ((phpbb_topics INNER JOIN phpbb_posts ON phpbb_topics.topic_last_post_id = phpbb_posts.post_id) INNER JOIN phpbb_users ON phpbb_users.user_id = phpbb_posts.poster_id) INNER JOIN phpbb_forums ON phpbb_forums.forum_id = phpbb_topics.forum_id WHERE (phpbb_forums.auth_view <= 0 ) AND (phpbb_forums.auth_read <= 0 ) AND phpbb_forums.forum_id NOT IN (5, 6, 2, 11) AND phpbb_topics.topic_status <> 2 ORDER BY post_time DESC LIMIT 5;

The explain function says
able type possible_keys key key_len ref rows Extra 
phpbb_forums ALL PRIMARY NULL NULL NULL 11 Using where; Using temporary; Using filesort 
phpbb_topics ref forum_id,topic_last_post_id forum_id 2 phpbb_forums.forum_id 2041 Using where 
phpbb_posts eq_ref PRIMARY,poster_id,p_id PRIMARY 3 phpbb_topics.topic_last_post_id 1 
phpbb_users eq_ref PRIMARY PRIMARY 3 phpbb_posts.poster_id 1 

If you're familiar with phpbb you can get the table struture easily from phpbb.com.

I had this problem originally on 4.0.x that is when I posted the original thread. I made TONS of updates and changes as suggested in the thread (please read the whole thing!) and none of the suggestions has fixed my problem. Then I upgraded to 4.1.14-standard and the problem still exists.

Please give this some serious though. I would also be willing to give the MySQL development team access to my server if it would help.

Currently to keep my server running I have a process run every 7 minutes to check for queries that are hung then flush the tables. Then at 3:30am everyday I restart the mysql server process. That keeps things running. Otherwise without the restart my server would die within 36-48 hours even with the flush tables directive.  Without flushing tables my server lasts about 7-10 hours before it's hung.
[3 Dec 2005 16:34] Valeriy Kravchuk
Bug #15368 is marked as a duplicate of this one. Seed the additional details in that report too. Hanging of phpbb queries on "Copying to tmp table" is noted on Linux 2.6.x, with latest GA versions of MySQL 5 too.
[5 Dec 2005 14:02] Thomas Deliduka
it is not just phpBB though. I have a website, finerkitchens.com that is going down constantly because of a join query that is hanging too.  I only gave the phpBB query as an example because it is the most common.

I am going to post a second time the query that has trouble on finerkitchens.com away from the public view due to proprietary issues and security.
[5 Dec 2005 16:06] Dâniel Fraga
If it doesn't happen only with phpbb then I presume it's only a mysql bug (phpbb developers didn't even answer my bug report because fo this, I think).

Is there anyway to compile mysql with some sort of debug support or to run it thru gdb or some king of debug tool so we can discover what's happening? Thanks.
[6 Dec 2005 20:32] James Harvard
Think I am seeing the same issue (4.1.15 on Mac OS X and Win XP).

I suspect that server resources are a factor (e.g. RAM size - see below for why) which may mean that it is not always possible to reproduce the issue.

I first saw the problem when server I'm developing on (Win XP) bogged down. SHOW PROCESSLIST listed 3 threads that had been running for over 3 hours with status 'copying to tmp table'. This has happened a couple of times since.

The problem queries are variations on a multi-table join involving a couple of large tables (6.5 & 330 MB data files). I've trimmed the query to take out other tables, removing ORDER BY, GROUP BY and SUM() elements.

Strangely, as I have tried slimming the query it has stopped hanging on 'copying to tmp table' and started hanging on 'sending data'.

I dumped a subset of the data from the two tables into a test db (also 4.1.15) on my Mac OS box (n.b. that all the rows actually involved in the query result were included in the dump).

The Mac is not as powerful as the Windows box (1.4 GHz G4 with 516MB versus 3 GHz Pentium with 1GB). A query that was trimmed until it did return on the Windows box (in about 9 seconds) has now been running on my Mac for over 30 minutes and is still going. I think you'll agree that this difference is disproportionate to the difference between the 'puters.

I only have remote access to the Windows box, but on the Mac it looks like MySQL starts taking all the CPU time it can. However there is almost no disc access.

Below are the relevant statements & table structures. I can supply data if required.

If you're wondering what I'm trying to do there's an explanation at the end.

CREATE TABLE `data_eu_at_i` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `is_import` tinyint(1) NOT NULL default '1',
  `comcode_id` mediumint(8) unsigned NOT NULL default '0',
  `date_id` tinyint(3) unsigned NOT NULL default '0',
  `country_id` smallint(5) unsigned NOT NULL default '0',
  `value` bigint(20) unsigned NOT NULL default '0',
  `quantity_1` bigint(20) unsigned NOT NULL default '0',
  `quantity_2` bigint(20) unsigned default NULL,
  `c_value` bigint(20) unsigned NOT NULL default '0',
  `c_quantity_1` bigint(20) unsigned NOT NULL default '0',
  `c_quantity_2` bigint(20) unsigned default NULL,
  PRIMARY KEY  (`id`),
  KEY `comcode_id` (`comcode_id`),
  KEY `date_id` (`date_id`),
  KEY `country_id` (`country_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ROW_FORMAT=FIXED;

CREATE TABLE `comcodes` (
  `comcode_id` mediumint(8) unsigned NOT NULL auto_increment,
  `comcode` varchar(16) NOT NULL default '',
  `impexp` tinyint(3) unsigned NOT NULL default '3',
  `unit_1` varchar(64) NOT NULL default '',
  `unit_2` varchar(64) default NULL,
  `descrip` text NOT NULL,
  `rep_auth_id` tinyint(3) unsigned NOT NULL default '0',
  `year_start` smallint(5) unsigned NOT NULL default '0',
  `year_end` smallint(5) unsigned default NULL,
  PRIMARY KEY  (`comcode_id`),
  UNIQUE KEY `rep_auth_id` (`rep_auth_id`,`impexp`,`comcode`,`year_start`),
  FULLTEXT KEY `descrip` (`descrip`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

# This query returns successfully (275 rows) on the Windows box but stalls on the Mac
# Note that no rows from table alias cc2 are selected
# If the cc2 join is removed the the query returns normally on the Mac
select left(cc.comcode,2) as comcode, d.quantity_1 as mon_qty1
from data_eu_at_i d
inner join comcodes cc on d.comcode_id = cc.comcode_id and cc.comcode regexp '^(10)......' and cc.rep_auth_id = 1
inner join comcodes cc2 on left(cc.comcode,2) = cc2.comcode and cc2.rep_auth_id = 1
where d.date_id = 60;

# This query only has the extra cc2.descrip column but stalls on the Windows box
select left(cc.comcode,2) as comcode, d.quantity_1 as mon_qty1, cc2.descrip
from data_eu_at_i d
inner join comcodes cc on d.comcode_id = cc.comcode_id and cc.comcode regexp '^(10)......' and cc.rep_auth_id = 1
inner join comcodes cc2 on left(cc.comcode,2) = cc2.comcode and cc2.rep_auth_id = 1
where d.date_id = 60;

Here is the result of EXPLAIN SELECT for the two queries above.

Query that returns ok:
+----+-------------+-------+--------+---------------------+-------------+---------+----------------------+--------+--------------------------+
| id | select_type | table | type   | possible_keys       | key         | key_len | ref                  | rows   | Extra                    |
+----+-------------+-------+--------+---------------------+-------------+---------+----------------------+--------+--------------------------+
|  1 | SIMPLE      | d     | ref    | comcode_id,date_id  | date_id     |       1 | const                | 138655 | Using where              |
|  1 | SIMPLE      | cc    | eq_ref | PRIMARY,rep_auth_id | PRIMARY     |       3 | bts_new.d.comcode_id |      1 | Using where              |
|  1 | SIMPLE      | cc2   | ref    | rep_auth_id         | rep_auth_id |       1 | const                |  21226 | Using where; Using index |
+----+-------------+-------+--------+---------------------+-------------+---------+----------------------+--------+--------------------------+

Query that stalls:
+----+-------------+-------+--------+---------------------+---------+---------+----------------------+--------+-------------+
| id | select_type | table | type   | possible_keys       | key     | key_len | ref                  | rows   | Extra       |
+----+-------------+-------+--------+---------------------+---------+---------+----------------------+--------+-------------+
|  1 | SIMPLE      | d     | ref    | comcode_id,date_id  | date_id |       1 | const                | 138655 | Using where |
|  1 | SIMPLE      | cc2   | ALL    | rep_auth_id         | NULL    |    NULL | NULL                 |  33323 | Using where |
|  1 | SIMPLE      | cc    | eq_ref | PRIMARY,rep_auth_id | PRIMARY |       3 | bts_new.d.comcode_id |      1 | Using where |
+----+-------------+-------+--------+---------------------+---------+---------+----------------------+--------+-------------+

Overview:
 * The 'comcodes' table holds classification codes for the data (import/export trade figures).
 * I'm doing a normal inner join to match the data rows to their classification information.
 * For some queries I want to group the data not by the normal full 8-digit classification code, but by the first two digits of the code.
 * I do a second join to the 'comcodes' table to get the description of the 2-digit classification code. That's when I start to see problems (but only sometimes).

(Not sure if this is relevant: I think the test query that stalled on the Windows box may have returned after a very long time. I was running the test query using Lasso web application middleware, which timed out after a few minutes. SHOW PROCESSLIST showed a thread from Lasso as having 'sleep' status, so I tried to kill it using 'KILL n;' The status changed to 'killed', but the thread is still there with 'killed' status after at least an hour. I don't know whether this is a symptom of the MySQL issue (thread stuck in an infinite loop therefore not checking for kill flag?) or whether it might be Lasso's handling of problematic MySQL connections. But a thread killed while it is stuck on 'sending data' goes away.)
[6 Dec 2005 20:37] Thomas Deliduka
My server is running 4GB of RAM with over 250GB of disk space available. When the hang happens I don't see the CPU being maxed out and the RAM levels are fine. In fact, there is no visible sign by looking at the server status that there is a problem.
[9 Dec 2005 14:33] Valeriy Kravchuk
James: 

Thank you for a simplest test case. So, have you seen 'copying to tmp table' with that query:

select left(cc.comcode,2) as comcode, d.quantity_1 as mon_qty1, cc2.descrip
  from data_eu_at_i d inner join comcodes cc on d.comcode_id = cc.comcode_id and cc.comcode regexp '^(10)......' and cc.rep_auth_id = 1 
 inner join comcodes cc2 on left(cc.comcode,2) = cc2.comcode and cc2.rep_auth_id = 1 
 where d.date_id = 60;

If yes, please, try to provide the smallest set of rows for these 2 tables that will demonstrate it. At leaset, inform how many rows needed and provide a couple of examples.

All: have you seen any disk activity during this hang? Send the SHOW STATUS results before the problematic query and after some time hanging. Thank you for your patience.
[11 Dec 2005 15:41] James Harvard
I think I understand where things are going wrong now.

I was RTFMing about EXPLAIN SELECT and how MySQL performs joins, and I remembered noticing that the table order in EXPLAIN SELECT's output was different for my two test queries. Looking at the output now I can see that for some reason MySQL has switched the order of the join so that it is attempting to join cc2 (table alias for second join to 'comcodes') to the data table before cc (alias for first join to 'comcodes').

To quote the manual:
"MySQL reads a row from the first table, then finds a matching row in the second table, then in the third table, and so on. When all tables are processed, MySQL outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows. The next row is read from this table and the process continues with the next table."

Just a reminder of the join:

select left(cc.comcode,2) as comcode,
 d.quantity_1 as mon_qty1, cc2.descrip
from data_eu_at_i d
inner join comcodes cc on d.comcode_id = cc.comcode_id
  and cc.comcode regexp '^(10)......' and cc.rep_auth_id = 1
inner join comcodes cc2 on left(cc.comcode,2) = cc2.comcode
  and cc2.rep_auth_id = 1
where d.date_id = 60;

If my understanding of MySQL's join procedure is correct, and if the query optimizer is ordering the tables for the join as d - cc2 -cc, then MySQL will try to join every row of the 'comcodes' table cc2 to the data table, because cc2 depends on cc and has no direct relationship with the data table in its join condition. Then it will have to join each join product row to 'comcodes' again. Many, many rows.

I think my theory is proved by testing the query with the STRAIGHT JOIN command, which forces the tables to be joined in the order d - cc - cc2. The query then returns normally.

Therefore I don't think my issue (emphasis on 'my') is a stalling bug, but rather the query optimizer performing badly and causing MySQL to perform the join in a very inefficient way. (I think this also explains why some queries performed differently with a sub-set of the data - MySQL was ignoring some indices because they had low cardinality when only a small sub-set of the data was being used for testing.)

Is there enough info here for you to check whether the query optimizer can be improved to avoid this problem, Valeriy?

That said, I'm not sure all the comments here are reporting the same issue that I was seeing. I will also check further to see whether my 'real world' instances of this issue have the same solution.
[12 Dec 2005 20:12] Thomas Deliduka
I don't know if this is a question of query optimization for me anymore. I know that I am the originator of this bug. Now I only found this today so I don't know if my problem is fixed or not. I noticed on one of my servers there was one particular query that was causing all the problems.  This one:

select DISTINCT xno_items.itemid, prod_ref, title, short_title, subtitle, promo, author, maincat, inventory, price, strikeprice, retprice, reteprice, saleperc, onsale, clearance, blockp, saledt, xno_items.added, updated, thumb_image, xno_manufacturers.mfg_name
from xno_items
left join xno_item_cat on xno_items.itemid = xno_item_cat.itemid
left join xno_categories on xno_item_cat.catid = xno_categories.catid
left join xno_manufacturers on xno_items.mfgid = xno_manufacturers.mfgid
LEFT JOIN xno_prod_index ON xno_items.itemid = xno_prod_index.itemid
where xno_items.active = 1 AND ((xno_prod_index.word ='thermometer')) AND xno_categories.active = 1 order by xno_prod_index.quan DESC, xno_prod_index.density DESC, xno_items.ordered DESC, xno_items.title;

I change the join order to this:
LEFT JOIN xno_prod_index ON xno_items.itemid = xno_prod_index.itemid
left join xno_item_cat on xno_items.itemid = xno_item_cat.itemid
left join xno_categories on xno_item_cat.catid = xno_categories.catid
left join xno_manufacturers on xno_items.mfgid = xno_manufacturers.mfgid

and instead of 9 seconds to execute the query, it took 0.14 seconds.

So far today the server has been humming along without any hangs problems.
[14 Dec 2005 1:22] James Harvard
What output do you get from EXPLAIN SELECT {problem query here} ?

(Maybe we should start a new forum / list thread for this.)
[14 Dec 2005 14:40] Thomas Deliduka
Before the switching of the joins:

 id  select_type  table              type    possible_keys   key      key_len  ref                                       rows  Extra                           

  1  SIMPLE       xno_items          ref     PRIMARY,active  active         4  const                                     1647  Using temporary; Using filesort 
  1  SIMPLE       xno_item_cat       index   NULL            PRIMARY        8  NULL                                      5152  Using index                     
  1  SIMPLE       xno_categories     eq_ref  PRIMARY,active  PRIMARY        4  finerkitchens_com.xno_item_cat.catid         1  Using where                     
  1  SIMPLE       xno_manufacturers  eq_ref  PRIMARY         PRIMARY        4  finerkitchens_com.xno_items.mfgid            1                                  
  1  SIMPLE       xno_prod_index     eq_ref  PRIMARY         PRIMARY       64  finerkitchens_com.xno_items.itemid,const     1  Using where; Distinct           

After switching the joins:
 id  select_type  table              type    possible_keys   key      key_len  ref                                       rows  Extra                           

  1  SIMPLE       xno_items          ref     PRIMARY,active  active         4  const                                     1647  Using temporary; Using filesort 
  1  SIMPLE       xno_prod_index     eq_ref  PRIMARY         PRIMARY       64  finerkitchens_com.xno_items.itemid,const     1  Using where                     
  1  SIMPLE       xno_item_cat       index   NULL            PRIMARY        8  NULL                                      5152  Using index                     
  1  SIMPLE       xno_categories     eq_ref  PRIMARY,active  PRIMARY        4  finerkitchens_com.xno_item_cat.catid         1  Using where                     
  1  SIMPLE       xno_manufacturers  eq_ref  PRIMARY         PRIMARY        4  finerkitchens_com.xno_items.mfgid            1                                  

Is there a "cleaner" way of presenting an explain here?
[15 Dec 2005 14:07] Valeriy Kravchuk
Thomas,

> Is there a "cleaner" way of presenting an explain here?

You may try to send your explain using \G:

mysql> explain select * from t\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
        Extra:
1 row in set (0.00 sec)

Or, simply, put the results in file and upload it.
[4 Jan 2006 13:52] Dâniel Fraga
I upgraded to mysql 5.0.18 and phpbb 2.0.19 and, until now, I didn't see this bug happening anymnore. Was it fixed? Anyway, I commented so people upgrade too and test if this bug has gone (or at least it doesn't happen anymore). Thanks.
[10 Jan 2006 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[20 Jan 2006 20:55] Valeriy Kravchuk
Dear reporters,

This bug report contains several reports in one, really. So, please, note, that some related bugs (like bug #4460, about inefficient subqueries optimization), or inefficiant optimization of [NOT] IN  (1, 2, 3), incorrect optimization of some JOINs (not using indexes) etc. are well-known, and shold be fixed later. 

What is common amongh all the reports here is that because of inefficient optimization SELECTs need to use (large) temporary tables and, thus, became extrimely inefficient.

If somebody of you feels that his specific query is still optimized improperly in latest versions, and it has no subquery or [NOT] IN clause, please, submit new bug reports (but check tha tdatabase for existing similar ones).

Thank you for your patience. We really need to split several problems mentioned here.
[20 Jan 2006 21:16] Thomas Deliduka
I am not sure what my problem is. I do know that the suggestion to move to 5.0.19 has helped a little but I still end up having problems with my postnuke/phpbb site (only site running on the 5.x server right now) although the problems are much further apart. Instead of once a day it's like once every 4 days.

I was able to optimize the query as y'all know but I still have a problem when there are multiple search phrases like this:

select DISTINCT xno_items.itemid, prod_ref, title, short_title, subtitle, promo, author, maincat, inventory, price, strikeprice, retprice, reteprice, saleperc, onsale, clearance, blockp, saledt, xno_items.added, updated, thumb_image, (select xno_manufacturers.mfg_name from xno_manufacturers where xno_manufacturers.mfgid = xno_items.mfgid) as mfg_name from xno_items inner join xno_prod_index on xno_items.itemid = xno_prod_index.itemid AND ((xno_prod_index.word ='port') OR (xno_prod_index.word ='glasses')) inner join xno_item_cat on xno_items.itemid = xno_item_cat.itemid inner join xno_categories on xno_item_cat.catid = xno_categories.catid AND xno_categories.active = 1 where xno_items.active = 1 order by xno_prod_index.quan DESC, xno_prod_index.density DESC, xno_items.price DESC, xno_items.title
[20 Jan 2006 21:39] Valeriy Kravchuk
Thomas,

That 

((xno_prod_index.word ='port') OR (xno_prod_index.word ='glasses')) 

can be a problem here. Do you have index on xno_prod_index.word?
[20 Jan 2006 22:23] Thomas Deliduka
There is a key on it, however it's coupled as a primary key with "itemid" should I add an additional index just to that field?
[21 Jan 2006 9:06] Valeriy Kravchuk
If 'word' is not a first column in the primary key, then yes, additional index may help. 

But, please, stop asking questions on this bug report. Especially those, that can be qualified as free support requests. If you think that you had found a bug, just create a new report.
[3 Mar 2006 8:28] Michael Fuita
Just for the developers, this is also happening in MySQL 4.1.18 and has been crashing my site continuously for days and days non stop. I'm using phpBB 2.0.19 and many of the queries run fine and it's just out of the blue that some fast queries will halt at copying to tmp table.
[24 May 2006 15:38] John McNally
I'm seeing the problem of hanging in the "Copying to tmp table" state in version 4.0.15 as well.

The table contains about 100,000 rows and its definition is:

CREATE TABLE FOO
(
        A INTEGER NOT NULL,
        B INTEGER NOT NULL,
        C INTEGER NOT NULL,
        NAME VARCHAR (255) NOT NULL,
        FLAG_1 INTEGER (1) default 1 NOT NULL,
        FLAG_2 INTEGER (1) default 0 NOT NULL,
        ORDER INTEGER default 0 NOT NULL,
        FLAG_3 INTEGER (1) default 0 NOT NULL,
        FLAG_4 INTEGER (1) default 0 NOT NULL,
        FLAG_5 INTEGER (1) default 0 NOT NULL,
    PRIMARY KEY(A,B,C),
    FOREIGN KEY (B) REFERENCES YTBL (B),
    FOREIGN KEY (A) REFERENCES XTBL (A),
    FOREIGN KEY (C) REFERENCES ZTBL (C),
    INDEX IX_NAME (NAME),
    INDEX IX_FLAG2 (FLAG_2),
    INDEX IX_FLAG3 (FLAG_3),
    INDEX IX_CBA (C, B, A),
    INDEX IX_A (A)
);

The query is a very simple grouping sql:

SELECT count(FOO.B), FOO.B
FROM FOO
WHERE FOO.C IN (50 IDs)
AND FOO.B IN (20 IDs)
AND FOO.A IN (300 IDs)
GROUP BY FOO.B

The number of IDs in each of the IN clauses varies and occasionally the query gets stuck in the "Copying to tmp table" state.  This is the most serious part of the problem and seems to be the same problem described in this issue, but there is another problem which I'll briefly mention:

EXPLAIN on the query gives

+-------+-------+---------------------+---------+---------+------+--------+-----------------------------------------------------------+
| table | type  | possible_keys       | key     | key_len | ref  | rows   | Extra                                                     |
+-------+-------+---------------------+---------+---------+------+--------+-----------------------------------------------------------+
| FOO   | range | PRIMARY,IX_CBA,IX_A | PRIMARY |      12 | NULL | 369740 | Using where; Using index; Using temporary; Using filesort |
+-------+-------+---------------------+---------+---------+------+--------+-----------------------------------------------------------+
1 row in set (2.98 sec)

Note that the table only has 100k rows.  However, if I modify the query to force the use of IX_A index:

SELECT count(FOO.B), FOO.B
FROM FOO force index (IX_A)
WHERE FOO.C IN (50 IDs)
AND FOO.B IN (20 IDs)
AND FOO.A IN (300 IDs)
GROUP BY FOO.B

the explain output changes to:

+-------+-------+---------------+---------+---------+------+-------+-------------+
| table | type  | possible_keys | key    | key_len | ref  | rows  | Extra       |
+-------+-------+---------------+--------------+---------+------+-------+-------------+
| FOO   | range | IX_A            | IX_A  |       4 | NULL | 16455 | Using where |
+-------+-------+---------------+---------+---------+------+-------+-------------+
1 row in set (0.00 sec)

The time to execute the query also drops about 2 orders of magnitude.

What would cause the query optimizer to choose to use a temporary table and filesort when it can use an index to avoid that?
This probably isn't the place to ask and answer that question, but just wanted it to be clear that the use of the temporary table itself is not desirable, but the fact that the query optimizer chooses to do so helps to trigger this bug.
[24 May 2006 16:41] John McNally
Great, I messed up when sanitizing the ddl.  The index IX_A should be IX_B (B).  The index is on the same column that is is being grouped.
[1 Sep 2006 14:50] Matt Lavallee
On Windows, 4.1.14, I resolved this issue by lowering thread_concurrency.  I have a dual-dual-core box, but any thread concurrency above 2 would cause infinite "copying to tmp table" statuses on ANY join.
[12 Sep 2006 3:54] Rodolfo Gonzalez
I'm having this very same problem with MySQL 5.0.24a on Slackware Linux i386. I'm a bit worried when I read that this problem has 1+ year without solution (setting the concurrent threads number to 2 didn't help). I have notice that this problem is more frequent when using DISTINCT or GROUP BY.
[3 Nov 2006 9:16] Kjetil Jensen
Hello,

We have a issue with an customer, the server gets very slow, and the cpu isnt being used and not the memory either, so i think its the disk.
We are today running a SQL server with Dual Xeons 3.2Ghz, 2GB Memory, and the MySQL version is: 4.1.21

This is the following thing so hanging the server:
--------------------------------------------------------------------
| 16578 | oxtract_oxtract      | web01.proinet.se:43370 | oxtract_oxtract   | Query   | 483  | Copying to tmp table on disk | SELECT * FROM resultat,persondata,klubbdata
   WHERE
   resultat.klass_id = '30260'
   AND persondata.per |
--------------------------------------------------------------------

And i havent found any fix for this? Is this his fault or is it an bug in MySQL? 
I have warned the customer, but he is also gonna test on his server, but i really want to know how if its possible to fix this or what actions i can do atleast?
[3 Apr 2007 21:57] Michael Ash
I've been here for 14.5 days. I don't mind waiting, but I would like some assurance that the query is moving forward rather than simply hanging.

Here is the explain for a query that I am attempting. As you can see, I am left joining one large table (1.6 million rows of which I am subsetting about 200,000) and one very large table (1 billion rows). The query also involves a SUM and GROUP BY.

--------------
EXPLAIN
SELECT stusab,logrecno,necma,pmsa,msacmsa,ma,county,tract,tract_cob,blkgrp,x,y,arealand,areawatr,SUM(ToxConc) AS ToxConc
FROM sf3.geo FORCE INDEX (x,logrecno)
LEFT JOIN grids2.cell_2003 FORCE INDEX (X) USING (x,y)
WHERE sumlev="150"
GROUP BY stusab,logrecno
--------------

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE geo index NULL logrecno 15 NULL 1594259 Using where
1 SIMPLE cell_2003 ref X X 5 sf3.geo.x 966422338
2 rows in set

Bye

Should I restructure the query to subset the 200,000 from the 1.6 million into a new table first and then join and group the 1 billion?

Any other tips? Again, I don't mind a long wait, but I would like some idea of whether or not the query will run. Thanks.

Here is the current output of 'show processlist' for the query:

Id User Host db Command Time State Info
81343 mash localhost:4347 NULL Query 1248470 Sending data SELECT stusab,logrecno,necma,pmsa,msacmsa,ma,county,tract,tract_cob,blkgrp,x,y,arealand,areawatr,SUM
[2 May 2007 6:17] Jesse Proudman
We are also having this issue.  Please let us know what details we can provide.
[5 May 2007 6:27] Valeriy Kravchuk
All reporters: 

Please, re-read my comment dated [20 Jan 2006 21:55] to this bug report before sending further comments/reporting new similar bugs. 

Many of you may find this interesting also:

http://jcole.us/blog/archives/2007/02/08/progress-in-mysql-process-list/
[14 Nov 2008 7:15] Robert Freeland
This is STILL a bug in the latest version of MySQL. Queries involving complex joins on large tables can still trigger a "copying to tmp table" status that can run for DAYS (or longer) without finishing. The only dependable workaround is to break the query into smaller pieces (e.g. - by filtering on date range).

IMHO, this goes way beyond a query optimization problem. Even allowing that the optimizer chooses to use a temporary table when it doesn't need to, the actual algorithm for USING that temporary table seems almost inconceivably SLOW. I had a process tonight running for 3 hours, and it only managed to write 45MB to the temporary table, all while using almost zero CPU. I can copy that entire table in 15 seconds, so the process isn't CPU bound. What can it possibly be doing??

This bug has been open for THREE YEARS! Time to fix it already.
[14 Nov 2008 7:28] Robert Freeland
I found the bug report at http://bugs.mysql.com/bug.php?id=28257 that seems to be a continuation of this one. It suggests that the bug may be fixed in the alpha v6.0 builds. So just to clarify, when I said above that the bug still exists in the "latest version", I meant the latest RC version, which is 5.1.29.
[24 Jan 2014 13:19] Manyi Lu
Posted by developer:
 
We have made a lot of relevant fixes since 4.x/5.0, including subquery optimizations. If you have similar problems again, please create a new bug report and add a test case. Thanks.