Bug #14070 MySQL hangs on "Copying to tmp table"
Submitted: 17 Oct 2005 13:03 Modified: 20 Jan 2006 21:55
Reporter: Frank Osterberg
Status: To be fixed later
Category:Server: Optimizer Severity:S2 (Serious)
Version:5.0.16, 5.0.15, 4.1.14 OS:Linux (Linux)
Assigned to: Target Version:
Tags: bfsm_2006_11_02

[17 Oct 2005 13:03] Frank Osterberg
Description:
The following happens:
1.) Execute some Query.
2.) Thread Status of executing Query goes to (and stays at) “Copying to tmp table”
Result: Server is at 100% CPU Utilization indefinitely and the Query does not finish (or
fail).
Note: once the query hangs there is no significant harddrive activity and the system
resources are constant (i.e. free physical memory/cpu)

Searching I found that this happens on both Linux and Windows machines, is not
predictable, can sometimes be resolved with a flush tables / server restart / system
restart. Apparently resource related although same occurs even on true servers and
regardless of number of executing/connected threads.

The problem is NOT that is just takes longer but that it totally hangs (for
hours/days/weeks) until it is manually killed.

For further info please see:
http://forums.mysql.com/read.php?24,39066,45979#msg-45979

and:
http://forums.mysql.com/read.php?24,39066,39066#msg-39066

How to repeat:
Not predictable but apparently fails more often at (long) queries using grouping
functions such as SUM, AVG, MIN, MAX, STD and using GROUP BY with nested sub queries
and/or multiple joins.

However, once a query hangs, i found that it will hang again on re-execution until the
server is restarted (for me even flushing the tables does not work)

It’s not that queries just take longer but that they will indefinitely hang.

Suggested fix:
Fix code (I’m rather sure it is a code problem!)
[17 Oct 2005 13:25] Valeriy Kravchuk
Thank you for a problem report. Please, provide the exact queries used, as well as the
structure of underlying tables and, at least, number of rows in each of them.

Wihout this information is is absolutely impossible to undertand, what really happened
and is it a bug.

As for the discussion you pointed out, I've seen them before... Have you read the final
messages in the thread (http://forums.mysql.com/read.php?24,39066,48206#msg-48206, for
example). Don't they explain possible problem you encountered?

Can you really show the query that works OK on MySQL 3.x, and "hangs indefinitely" on
4.1.14, on the same data and server configuration?
[24 Oct 2005 3:13] Chip Yamasaki
I can confirm that I am having the same problem.  After submitting
http://bugs.mysql.com/bug.php?id=14127, I decided to upgrade our servers to 4.1.14
official binaries.  I have been running into this problem regularly since.  A simple
query that joined 2 tables now runs fine in under .2 seconds nearly all of the time, but
once in a while (more frequently all the time), it goes into this "Copying to tmp table"
state and just hangs there.  Same query that runs in .2 seconds or less.

This DEFINITELY appears to be a MySQL problem.  Please don't confuse this with the post
you mentioned, because that person was strayed from the initial compaint in that forum
post.  The initial complaint in that thread was not about the query being slow, it was
about queries being hung in the "Copying to tmp table" state, which MANY people appear to
be seeing on MANY platforms.

Our 4.0 server ran fine for many months with the same query and same data without this
problem.
[24 Oct 2005 3:22] Chip Yamasaki
This should be taken from Severity S5 to "critical", since it effectively prevents other
queries from running indefinitely.  I'm sure all of the people commenting in the forums
about this would concur.
[24 Oct 2005 14:30] Valeriy Kravchuk
What we really need is 100% repetable test case, better - on Linux and simple one.

So, please, send the results of SHOW CREATE TABLE for bd_order and bd_contact tables,
inform about the number of rows in both tables, provide the my.cnf content and any
unusual messages you got, if any.

Explain me exactly what to do to get this hang.
[29 Oct 2005 0:32] Chip Yamasaki
After doing some more searching, I may have been bitten by a problem similar to the bug
verified in http://bugs.mysql.com/bug.php?id=4460.  The question is, how has such an
important issue, the proper execution of subqueries, been alowed to stay open 15 months?

It appears that this same behavior also impacts some other queries where the same table
is used twice in a single query join, either to self or another table.

This needs serious attention.
[13 Nov 2005 11:55] Valeriy Kravchuk
This report should be in an open state since October 24th really...
[17 Nov 2005 16:57] Thomas Deliduka
I am the author of the original thread. I believe most of the things you're looking for
are right in the thread if you have the time to read it.

The query for me that hangs all the time is:

explain SELECT phpbb_topics.topic_title, phpbb_topics.topic_replies,
phpbb_topics.topic_views, phpbb_topics.topic_id, phpbb_users.username,
phpbb_users.user_realname, phpbb_posts.post_id, phpbb_posts.poster_id,
phpbb_posts.post_time FROM ((phpbb_topics INNER JOIN phpbb_posts ON
phpbb_topics.topic_last_post_id = phpbb_posts.post_id) INNER JOIN phpbb_users ON
phpbb_users.user_id = phpbb_posts.poster_id) INNER JOIN phpbb_forums ON
phpbb_forums.forum_id = phpbb_topics.forum_id WHERE (phpbb_forums.auth_view <= 0 ) AND
(phpbb_forums.auth_read <= 0 ) AND phpbb_forums.forum_id NOT IN (5, 6, 2, 11) AND
phpbb_topics.topic_status <> 2 ORDER BY post_time DESC LIMIT 5;

The explain function says
able type possible_keys key key_len ref rows Extra 
phpbb_forums ALL PRIMARY NULL NULL NULL 11 Using where; Using temporary; Using filesort 
phpbb_topics ref forum_id,topic_last_post_id forum_id 2 phpbb_forums.forum_id 2041 Using
where 
phpbb_posts eq_ref PRIMARY,poster_id,p_id PRIMARY 3 phpbb_topics.topic_last_post_id 1 
phpbb_users eq_ref PRIMARY PRIMARY 3 phpbb_posts.poster_id 1 

If you're familiar with phpbb you can get the table struture easily from phpbb.com.

I had this problem originally on 4.0.x that is when I posted the original thread. I made
TONS of updates and changes as suggested in the thread (please read the whole thing!) and
none of the suggestions has fixed my problem. Then I upgraded to 4.1.14-standard and the
problem still exists.

Please give this some serious though. I would also be willing to give the MySQL
development team access to my server if it would help.

Currently to keep my server running I have a process run every 7 minutes to check for
queries that are hung then flush the tables. Then at 3:30am everyday I restart the mysql
server process. That keeps things running. Otherwise without the restart my server would
die within 36-48 hours even with the flush tables directive.  Without flushing tables my
server lasts about 7-10 hours before it's hung.
[3 Dec 2005 17:34] Valeriy Kravchuk
Bug #15368 is marked as a duplicate of this one. Seed the additional details in that
report too. Hanging of phpbb queries on "Copying to tmp table" is noted on Linux 2.6.x,
with latest GA versions of MySQL 5 too.
[5 Dec 2005 15:02] Thomas Deliduka
it is not just phpBB though. I have a website, finerkitchens.com that is going down
constantly because of a join query that is hanging too.  I only gave the phpBB query as
an example because it is the most common.

I am going to post a second time the query that has trouble on finerkitchens.com away
from the public view due to proprietary issues and security.
[5 Dec 2005 17:06] Dâniel Fraga
If it doesn't happen only with phpbb then I presume it's only a mysql bug (phpbb
developers didn't even answer my bug report because fo this, I think).

Is there anyway to compile mysql with some sort of debug support or to run it thru gdb or
some king of debug tool so we can discover what's happening? Thanks.
[6 Dec 2005 21:32] James Harvard
Think I am seeing the same issue (4.1.15 on Mac OS X and Win XP).

I suspect that server resources are a factor (e.g. RAM size - see below for why) which
may mean that it is not always possible to reproduce the issue.

I first saw the problem when server I'm developing on (Win XP) bogged down. SHOW
PROCESSLIST listed 3 threads that had been running for over 3 hours with status 'copying
to tmp table'. This has happened a couple of times since.

The problem queries are variations on a multi-table join involving a couple of large
tables (6.5 & 330 MB data files). I've trimmed the query to take out other tables,
removing ORDER BY, GROUP BY and SUM() elements.

Strangely, as I have tried slimming the query it has stopped hanging on 'copying to tmp
table' and started hanging on 'sending data'.

I dumped a subset of the data from the two tables into a test db (also 4.1.15) on my Mac
OS box (n.b. that all the rows actually involved in the query result were included in the
dump).

The Mac is not as powerful as the Windows box (1.4 GHz G4 with 516MB versus 3 GHz Pentium
with 1GB). A query that was trimmed until it did return on the Windows box (in about 9
seconds) has now been running on my Mac for over 30 minutes and is still going. I think
you'll agree that this difference is disproportionate to the difference between the
'puters.

I only have remote access to the Windows box, but on the Mac it looks like MySQL starts
taking all the CPU time it can. However there is almost no disc access.

Below are the relevant statements & table structures. I can supply data if required.

If you're wondering what I'm trying to do there's an explanation at the end.

CREATE TABLE `data_eu_at_i` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `is_import` tinyint(1) NOT NULL default '1',
  `comcode_id` mediumint(8) unsigned NOT NULL default '0',
  `date_id` tinyint(3) unsigned NOT NULL default '0',
  `country_id` smallint(5) unsigned NOT NULL default '0',
  `value` bigint(20) unsigned NOT NULL default '0',
  `quantity_1` bigint(20) unsigned NOT NULL default '0',
  `quantity_2` bigint(20) unsigned default NULL,
  `c_value` bigint(20) unsigned NOT NULL default '0',
  `c_quantity_1` bigint(20) unsigned NOT NULL default '0',
  `c_quantity_2` bigint(20) unsigned default NULL,
  PRIMARY KEY  (`id`),
  KEY `comcode_id` (`comcode_id`),
  KEY `date_id` (`date_id`),
  KEY `country_id` (`country_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ROW_FORMAT=FIXED;

CREATE TABLE `comcodes` (
  `comcode_id` mediumint(8) unsigned NOT NULL auto_increment,
  `comcode` varchar(16) NOT NULL default '',
  `impexp` tinyint(3) unsigned NOT NULL default '3',
  `unit_1` varchar(64) NOT NULL default '',
  `unit_2` varchar(64) default NULL,
  `descrip` text NOT NULL,
  `rep_auth_id` tinyint(3) unsigned NOT NULL default '0',
  `year_start` smallint(5) unsigned NOT NULL default '0',
  `year_end` smallint(5) unsigned default NULL,
  PRIMARY KEY  (`comcode_id`),
  UNIQUE KEY `rep_auth_id` (`rep_auth_id`,`impexp`,`comcode`,`year_start`),
  FULLTEXT KEY `descrip` (`descrip`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

# This query returns successfully (275 rows) on the Windows box but stalls on the Mac
# Note that no rows from table alias cc2 are selected
# If the cc2 join is removed the the query returns normally on the Mac
select left(cc.comcode,2) as comcode, d.quantity_1 as mon_qty1
from data_eu_at_i d
inner join comcodes cc on d.comcode_id = cc.comcode_id and cc.comcode regexp
'^(10)......' and cc.rep_auth_id = 1
inner join comcodes cc2 on left(cc.comcode,2) = cc2.comcode and cc2.rep_auth_id = 1
where d.date_id = 60;

# This query only has the extra cc2.descrip column but stalls on the Windows box
select left(cc.comcode,2) as comcode, d.quantity_1 as mon_qty1, cc2.descrip
from data_eu_at_i d
inner join comcodes cc on d.comcode_id = cc.comcode_id and cc.comcode regexp
'^(10)......' and cc.rep_auth_id = 1
inner join comcodes cc2 on left(cc.comcode,2) = cc2.comcode and cc2.rep_auth_id = 1
where d.date_id = 60;

Here is the result of EXPLAIN SELECT for the two queries above.

Query that returns ok:
+----+-------------+-------+--------+---------------------+-------------+---------+----------------------+--------+--------------------------+
| id | select_type | table | type   | possible_keys       | key         | key_len | ref  
               | rows   | Extra                    |
+----+-------------+-------+--------+---------------------+-------------+---------+----------------------+--------+--------------------------+
|  1 | SIMPLE      | d     | ref    | comcode_id,date_id  | date_id     |       1 | const
               | 138655 | Using where              |
|  1 | SIMPLE      | cc    | eq_ref | PRIMARY,rep_auth_id | PRIMARY     |       3 |
bts_new.d.comcode_id |      1 | Using where              |
|  1 | SIMPLE      | cc2   | ref    | rep_auth_id         | rep_auth_id |       1 | const
               |  21226 | Using where; Using index |
+----+-------------+-------+--------+---------------------+-------------+---------+----------------------+--------+--------------------------+

Query that stalls:
+----+-------------+-------+--------+---------------------+---------+---------+----------------------+--------+-------------+
| id | select_type | table | type   | possible_keys       | key     | key_len | ref      
           | rows   | Extra       |
+----+-------------+-------+--------+---------------------+---------+---------+----------------------+--------+-------------+
|  1 | SIMPLE      | d     | ref    | comcode_id,date_id  | date_id |       1 | const    
           | 138655 | Using where |
|  1 | SIMPLE      | cc2   | ALL    | rep_auth_id         | NULL    |    NULL | NULL     
           |  33323 | Using where |
|  1 | SIMPLE      | cc    | eq_ref | PRIMARY,rep_auth_id | PRIMARY |       3 |
bts_new.d.comcode_id |      1 | Using where |
+----+-------------+-------+--------+---------------------+---------+---------+----------------------+--------+-------------+

Overview:
 * The 'comcodes' table holds classification codes for the data (import/export trade
figures).
 * I'm doing a normal inner join to match the data rows to their classification
information.
 * For some queries I want to group the data not by the normal full 8-digit
classification code, but by the first two digits of the code.
 * I do a second join to the 'comcodes' table to get the description of the 2-digit
classification code. That's when I start to see problems (but only sometimes).

(Not sure if this is relevant: I think the test query that stalled on the Windows box may
have returned after a very long time. I was running the test query using Lasso web
application middleware, which timed out after a few minutes. SHOW PROCESSLIST showed a
thread from Lasso as having 'sleep' status, so I tried to kill it using 'KILL n;' The
status changed to 'killed', but the thread is still there with 'killed' status after at
least an hour. I don't know whether this is a symptom of the MySQL issue (thread stuck in
an infinite loop therefore not checking for kill flag?) or whether it might be Lasso's
handling of problematic MySQL connections. But a thread killed while it is stuck on
'sending data' goes away.)
[6 Dec 2005 21:37] Thomas Deliduka
My server is running 4GB of RAM with over 250GB of disk space available. When the hang
happens I don't see the CPU being maxed out and the RAM levels are fine. In fact, there
is no visible sign by looking at the server status that there is a problem.
[9 Dec 2005 15:33] Valeriy Kravchuk
James: 

Thank you for a simplest test case. So, have you seen 'copying to tmp table' with that
query:

select left(cc.comcode,2) as comcode, d.quantity_1 as mon_qty1, cc2.descrip
  from data_eu_at_i d inner join comcodes cc on d.comcode_id = cc.comcode_id and
cc.comcode regexp '^(10)......' and cc.rep_auth_id = 1 
 inner join comcodes cc2 on left(cc.comcode,2) = cc2.comcode and cc2.rep_auth_id = 1 
 where d.date_id = 60;

If yes, please, try to provide the smallest set of rows for these 2 tables that will
demonstrate it. At leaset, inform how many rows needed and provide a couple of examples.

All: have you seen any disk activity during this hang? Send the SHOW STATUS results
before the problematic query and after some time hanging. Thank you for your patience.
[11 Dec 2005 16:41] James Harvard
I think I understand where things are going wrong now.

I was RTFMing about EXPLAIN SELECT and how MySQL performs joins, and I remembered
noticing that the table order in EXPLAIN SELECT's output was different for my two test
queries. Looking at the output now I can see that for some reason MySQL has switched the
order of the join so that it is attempting to join cc2 (table alias for second join to
'comcodes') to the data table before cc (alias for first join to 'comcodes').

To quote the manual:
"MySQL reads a row from the first table, then finds a matching row in the second table,
then in the third table, and so on. When all tables are processed, MySQL outputs the
selected columns and backtracks through the table list until a table is found for which
there are more matching rows. The next row is read from this table and the process
continues with the next table."

Just a reminder of the join:

select left(cc.comcode,2) as comcode,
 d.quantity_1 as mon_qty1, cc2.descrip
from data_eu_at_i d
inner join comcodes cc on d.comcode_id = cc.comcode_id
  and cc.comcode regexp '^(10)......' and cc.rep_auth_id = 1
inner join comcodes cc2 on left(cc.comcode,2) = cc2.comcode
  and cc2.rep_auth_id = 1
where d.date_id = 60;

If my understanding of MySQL's join procedure is correct, and if the query optimizer is
ordering the tables for the join as d - cc2 -cc, then MySQL will try to join every row of
the 'comcodes' table cc2 to the data table, because cc2 depends on cc and has no direct
relationship with the data table in its join condition. Then it will have to join each
join product row to 'comcodes' again. Many, many rows.

I think my theory is proved by testing the query with the STRAIGHT JOIN command, which
forces the tables to be joined in the order d - cc - cc2. The query then returns
normally.

Therefore I don't think my issue (emphasis on 'my') is a stalling bug, but rather the
query optimizer performing badly and causing MySQL to perform the join in a very
inefficient way. (I think this also explains why some queries performed differently with
a sub-set of the data - MySQL was ignoring some indices because they had low cardinality
when only a small sub-set of the data was being used for testing.)

Is there enough info here for you to check whether the query optimizer can be improved to
avoid this problem, Valeriy?

That said, I'm not sure all the comments here are reporting the same issue that I was
seeing. I will also check further to see whether my 'real world' instances of this issue
have the same solution.
[12 Dec 2005 21:12] Thomas Deliduka
I don't know if this is a question of query optimization for me anymore. I know that I am
the originator of this bug. Now I only found this today so I don't know if my problem is
fixed or not. I noticed on one of my servers there was one particular query that was
causing all the problems.  This one:

select DISTINCT xno_items.itemid, prod_ref, title, short_title, subtitle, promo, author,
maincat, inventory, price, strikeprice, retprice, reteprice, saleperc, onsale, clearance,
blockp, saledt, xno_items.added, updated, thumb_image, xno_manufacturers.mfg_name
from xno_items
left join xno_item_cat on xno_items.itemid = xno_item_cat.itemid
left join xno_categories on xno_item_cat.catid = xno_categories.catid
left join xno_manufacturers on xno_items.mfgid = xno_manufacturers.mfgid
LEFT JOIN xno_prod_index ON xno_items.itemid = xno_prod_index.itemid
where xno_items.active = 1 AND ((xno_prod_index.word ='thermometer')) AND
xno_categories.active = 1 order by xno_prod_index.quan DESC, xno_prod_index.density DESC,
xno_items.ordered DESC, xno_items.title;

I change the join order to this:
LEFT JOIN xno_prod_index ON xno_items.itemid = xno_prod_index.itemid
left join xno_item_cat on xno_items.itemid = xno_item_cat.itemid
left join xno_categories on xno_item_cat.catid = xno_categories.catid
left join xno_manufacturers on xno_items.mfgid = xno_manufacturers.mfgid

and instead of 9 seconds to execute the query, it took 0.14 seconds.

So far today the server has been humming along without any hangs problems.
[14 Dec 2005 2:22] James Harvard
What output do you get from EXPLAIN SELECT {problem query here} ?

(Maybe we should start a new forum / list thread for this.)
[14 Dec 2005 15:40] Thomas Deliduka
Before the switching of the joins:

 id  select_type  table              type    possible_keys   key      key_len  ref       
                               rows  Extra                           

  1  SIMPLE       xno_items          ref     PRIMARY,active  active         4  const     
                               1647  Using temporary; Using filesort 
  1  SIMPLE       xno_item_cat       index   NULL            PRIMARY        8  NULL      
                               5152  Using index                     
  1  SIMPLE       xno_categories     eq_ref  PRIMARY,active  PRIMARY        4 
finerkitchens_com.xno_item_cat.catid         1  Using where                     
  1  SIMPLE       xno_manufacturers  eq_ref  PRIMARY         PRIMARY        4 
finerkitchens_com.xno_items.mfgid            1                                  
  1  SIMPLE       xno_prod_index     eq_ref  PRIMARY         PRIMARY       64 
finerkitchens_com.xno_items.itemid,const     1  Using where; Distinct           

After switching the joins:
 id  select_type  table              type    possible_keys   key      key_len  ref       
                               rows  Extra                           

  1  SIMPLE       xno_items          ref     PRIMARY,active  active         4  const     
                               1647  Using temporary; Using filesort 
  1  SIMPLE       xno_prod_index     eq_ref  PRIMARY         PRIMARY       64 
finerkitchens_com.xno_items.itemid,const     1  Using where                     
  1  SIMPLE       xno_item_cat       index   NULL            PRIMARY        8  NULL      
                               5152  Using index                     
  1  SIMPLE       xno_categories     eq_ref  PRIMARY,active  PRIMARY        4 
finerkitchens_com.xno_item_cat.catid         1  Using where                     
  1  SIMPLE       xno_manufacturers  eq_ref  PRIMARY         PRIMARY        4 
finerkitchens_com.xno_items.mfgid            1                                  

Is there a "cleaner" way of presenting an explain here?
[15 Dec 2005 15:07] Valeriy Kravchuk
Thomas,

> Is there a "cleaner" way of presenting an explain here?

You may try to send your explain using \G:

mysql> explain select * from t\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
        Extra:
1 row in set (0.00 sec)

Or, simply, put the results in file and upload it.
[4 Jan 2006 14:52] Dâniel Fraga
I upgraded to mysql 5.0.18 and phpbb 2.0.19 and, until now, I didn't see this bug
happening anymnore. Was it fixed? Anyway, I commented so people upgrade too and test if
this bug has gone (or at least it doesn't happen anymore). Thanks.
[10 Jan 2006 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[20 Jan 2006 21:55] Valeriy Kravchuk
Dear reporters,

This bug report contains several reports in one, really. So, please, note, that some
related bugs (like bug #4460, about inefficient subqueries optimization), or inefficiant
optimization of [NOT] IN  (1, 2, 3), incorrect optimization of some JOINs (not using
indexes) etc. are well-known, and shold be fixed later. 

What is common amongh all the reports here is that because of inefficient optimization
SELECTs need to use (large) temporary tables and, thus, became extrimely inefficient.

If somebody of you feels that his specific query is still optimized improperly in latest
versions, and it has no subquery or [NOT] IN clause, please, submit new bug reports (but
check tha tdatabase for existing similar ones).

Thank you for your patience. We really need to split several problems mentioned here.
[20 Jan 2006 22:16] Thomas Deliduka
I am not sure what my problem is. I do know that the suggestion to move to 5.0.19 has
helped a little but I still end up having problems with my postnuke/phpbb site (only site
running on the 5.x server right now) although the problems are much further apart. Instead
of once a day it's like once every 4 days.

I was able to optimize the query as y'all know but I still have a problem when there are
multiple search phrases like this:

select DISTINCT xno_items.itemid, prod_ref, title, short_title, subtitle, promo, author,
maincat, inventory, price, strikeprice, retprice, reteprice, saleperc, onsale, clearance,
blockp, saledt, xno_items.added, updated, thumb_image, (select xno_manufacturers.mfg_name
from xno_manufacturers where xno_manufacturers.mfgid = xno_items.mfgid) as mfg_name from
xno_items inner join xno_prod_index on xno_items.itemid = xno_prod_index.itemid AND
((xno_prod_index.word ='port') OR (xno_prod_index.word ='glasses')) inner join
xno_item_cat on xno_items.itemid = xno_item_cat.itemid inner join xno_categories on
xno_item_cat.catid = xno_categories.catid AND xno_categories.active = 1 where
xno_items.active = 1 order by xno_prod_index.quan DESC, xno_prod_index.density DESC,
xno_items.price DESC, xno_items.title
[20 Jan 2006 22:39] Valeriy Kravchuk
Thomas,

That 

((xno_prod_index.word ='port') OR (xno_prod_index.word ='glasses')) 

can be a problem here. Do you have index on xno_prod_index.word?
[20 Jan 2006 23:23] Thomas Deliduka
There is a key on it, however it's coupled as a primary key with "itemid" should I add an
additional index just to that field?
[21 Jan 2006 10:06] Valeriy Kravchuk
If 'word' is not a first column in the primary key, then yes, additional index may help. 

But, please, stop asking questions on this bug report. Especially those, that can be
qualified as free support requests. If you think that you had found a bug, just create a
new report.
[3 Mar 2006 9:28] Michael Fuita
Just for the developers, this is also happening in MySQL 4.1.18 and has been crashing my
site continuously for days and days non stop. I'm using phpBB 2.0.19 and many of the
queries run fine and it's just out of the blue that some fast queries will halt at
copying to tmp table.
[24 May 2006 17:38] John McNally
I'm seeing the problem of hanging in the "Copying to tmp table" state in version 4.0.15 as
well.

The table contains about 100,000 rows and its definition is:

CREATE TABLE FOO
(
        A INTEGER NOT NULL,
        B INTEGER NOT NULL,
        C INTEGER NOT NULL,
        NAME VARCHAR (255) NOT NULL,
        FLAG_1 INTEGER (1) default 1 NOT NULL,
        FLAG_2 INTEGER (1) default 0 NOT NULL,
        ORDER INTEGER default 0 NOT NULL,
        FLAG_3 INTEGER (1) default 0 NOT NULL,
        FLAG_4 INTEGER (1) default 0 NOT NULL,
        FLAG_5 INTEGER (1) default 0 NOT NULL,
    PRIMARY KEY(A,B,C),
    FOREIGN KEY (B) REFERENCES YTBL (B),
    FOREIGN KEY (A) REFERENCES XTBL (A),
    FOREIGN KEY (C) REFERENCES ZTBL (C),
    INDEX IX_NAME (NAME),
    INDEX IX_FLAG2 (FLAG_2),
    INDEX IX_FLAG3 (FLAG_3),
    INDEX IX_CBA (C, B, A),
    INDEX IX_A (A)
);

The query is a very simple grouping sql:

SELECT count(FOO.B), FOO.B
FROM FOO
WHERE FOO.C IN (50 IDs)
AND FOO.B IN (20 IDs)
AND FOO.A IN (300 IDs)
GROUP BY FOO.B

The number of IDs in each of the IN clauses varies and occasionally the query gets stuck
in the "Copying to tmp table" state.  This is the most serious part of the problem and
seems to be the same problem described in this issue, but there is another problem which
I'll briefly mention:

EXPLAIN on the query gives

+-------+-------+---------------------+---------+---------+------+--------+-----------------------------------------------------------+
| table | type  | possible_keys       | key     | key_len | ref  | rows   | Extra        
                                            |
+-------+-------+---------------------+---------+---------+------+--------+-----------------------------------------------------------+
| FOO   | range | PRIMARY,IX_CBA,IX_A | PRIMARY |      12 | NULL | 369740 | Using where;
Using index; Using temporary; Using filesort |
+-------+-------+---------------------+---------+---------+------+--------+-----------------------------------------------------------+
1 row in set (2.98 sec)

Note that the table only has 100k rows.  However, if I modify the query to force the use
of IX_A index:

SELECT count(FOO.B), FOO.B
FROM FOO force index (IX_A)
WHERE FOO.C IN (50 IDs)
AND FOO.B IN (20 IDs)
AND FOO.A IN (300 IDs)
GROUP BY FOO.B

the explain output changes to:

+-------+-------+---------------+---------+---------+------+-------+-------------+
| table | type  | possible_keys | key    | key_len | ref  | rows  | Extra       |
+-------+-------+---------------+--------------+---------+------+-------+-------------+
| FOO   | range | IX_A            | IX_A  |       4 | NULL | 16455 | Using where |
+-------+-------+---------------+---------+---------+------+-------+-------------+
1 row in set (0.00 sec)

The time to execute the query also drops about 2 orders of magnitude.

What would cause the query optimizer to choose to use a temporary table and filesort when
it can use an index to avoid that?
This probably isn't the place to ask and answer that question, but just wanted it to be
clear that the use of the temporary table itself is not desirable, but the fact that the
query optimizer chooses to do so helps to trigger this bug.
[24 May 2006 18:41] John McNally
Great, I messed up when sanitizing the ddl.  The index IX_A should be IX_B (B).  The index
is on the same column that is is being grouped.
[1 Sep 2006 16:50] Matt Lavallee
On Windows, 4.1.14, I resolved this issue by lowering thread_concurrency.  I have a
dual-dual-core box, but any thread concurrency above 2 would cause infinite "copying to
tmp table" statuses on ANY join.
[12 Sep 2006 5:54] Rodolfo Gonzalez
I'm having this very same problem with MySQL 5.0.24a on Slackware Linux i386. I'm a bit
worried when I read that this problem has 1+ year without solution (setting the
concurrent threads number to 2 didn't help). I have notice that this problem is more
frequent when using DISTINCT or GROUP BY.
[3 Nov 2006 10:16] Kjetil Jensen
Hello,

We have a issue with an customer, the server gets very slow, and the cpu isnt being used
and not the memory either, so i think its the disk.
We are today running a SQL server with Dual Xeons 3.2Ghz, 2GB Memory, and the MySQL
version is: 4.1.21

This is the following thing so hanging the server:
--------------------------------------------------------------------
| 16578 | oxtract_oxtract      | web01.proinet.se:43370 | oxtract_oxtract   | Query   |
483  | Copying to tmp table on disk | SELECT * FROM resultat,persondata,klubbdata
   WHERE
   resultat.klass_id = '30260'
   AND persondata.per |
--------------------------------------------------------------------

And i havent found any fix for this? Is this his fault or is it an bug in MySQL? 
I have warned the customer, but he is also gonna test on his server, but i really want to
know how if its possible to fix this or what actions i can do atleast?
[3 Apr 2007 23:57] Michael Ash
I've been here for 14.5 days. I don't mind waiting, but I would like some assurance that
the query is moving forward rather than simply hanging.

Here is the explain for a query that I am attempting. As you can see, I am left joining
one large table (1.6 million rows of which I am subsetting about 200,000) and one very
large table (1 billion rows). The query also involves a SUM and GROUP BY.

--------------
EXPLAIN
SELECT
stusab,logrecno,necma,pmsa,msacmsa,ma,county,tract,tract_cob,blkgrp,x,y,arealand,areawatr,SUM(ToxConc)
AS ToxConc
FROM sf3.geo FORCE INDEX (x,logrecno)
LEFT JOIN grids2.cell_2003 FORCE INDEX (X) USING (x,y)
WHERE sumlev="150"
GROUP BY stusab,logrecno
--------------

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE geo index NULL logrecno 15 NULL 1594259 Using where
1 SIMPLE cell_2003 ref X X 5 sf3.geo.x 966422338
2 rows in set

Bye

Should I restructure the query to subset the 200,000 from the 1.6 million into a new
table first and then join and group the 1 billion?

Any other tips? Again, I don't mind a long wait, but I would like some idea of whether or
not the query will run. Thanks.

Here is the current output of 'show processlist' for the query:

Id User Host db Command Time State Info
81343 mash localhost:4347 NULL Query 1248470 Sending data SELECT
stusab,logrecno,necma,pmsa,msacmsa,ma,county,tract,tract_cob,blkgrp,x,y,arealand,areawatr,SUM
[2 May 2007 8:17] Jesse Proudman
We are also having this issue.  Please let us know what details we can provide.
[5 May 2007 8:27] Valeriy Kravchuk
All reporters: 

Please, re-read my comment dated [20 Jan 2006 21:55] to this bug report before sending
further comments/reporting new similar bugs. 

Many of you may find this interesting also:

http://jcole.us/blog/archives/2007/02/08/progress-in-mysql-process-list/
[14 Nov 2008 8:15] Robert Freeland
This is STILL a bug in the latest version of MySQL. Queries involving complex joins on
large tables can still trigger a "copying to tmp table" status that can run for DAYS (or
longer) without finishing. The only dependable workaround is to break the query into
smaller pieces (e.g. - by filtering on date range).

IMHO, this goes way beyond a query optimization problem. Even allowing that the optimizer
chooses to use a temporary table when it doesn't need to, the actual algorithm for USING
that temporary table seems almost inconceivably SLOW. I had a process tonight running for
3 hours, and it only managed to write 45MB to the temporary table, all while using almost
zero CPU. I can copy that entire table in 15 seconds, so the process isn't CPU bound.
What can it possibly be doing??

This bug has been open for THREE YEARS! Time to fix it already.
[14 Nov 2008 8:28] Robert Freeland
I found the bug report at http://bugs.mysql.com/bug.php?id=28257 that seems to be a
continuation of this one. It suggests that the bug may be fixed in the alpha v6.0 builds.
So just to clarify, when I said above that the bug still exists in the "latest version", I
meant the latest RC version, which is 5.1.29.