Bug #1363 Not expected behaviour of CREATE TEMPORARY TABLE...SELECT
Submitted: 21 Sep 2003 9:18 Modified: 27 Sep 2008 9:17
Reporter: Apachez Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.0.15a OS:Linux (Linux 2.4.21-SMP-686 (Debian))
Assigned to: CPU Architecture:Any

[21 Sep 2003 9:18] Apachez
Description:
When doing a CREATE TEMPORARY TABLE...SELECT the columntypes that the optimizer chooses to use is not optimized in terms of how the source columns looks like.

In my case I am trying to optimize my queries by in this case merge a "CREATE TEMPORARY TABLE (colum def.)" and "INSERT...SELECT" into one single "CREATE TEMPORARY TABLE...SELECT" query.

This means by (for example) going from:

CREATE TEMPORARY TABLE Tmp_News1 (
`NewsThread` int UNSIGNED NOT NULL DEFAULT '0',
`NewsId` int UNSIGNED NOT NULL DEFAULT '0'
) TYPE=HEAP;

INSERT Tmp_News1 (NewsThread, NewsId)
SELECT NewsThread, MAX(NewsId) AS NewsId
FROM Tbl_News
WHERE NewsDate > 1060000000
GROUP BY NewsThread;

into:

CREATE TEMPORARY TABLE Tmp_News1 TYPE=HEAP
SELECT NewsThread, MAX(NewsId) AS NewsId
FROM Tbl_News
WHERE NewsDate > 1060000000
GROUP BY NewsThread;

If i perform a "DESCRIBE Tmp_News1" in the first case (with two queries) i get the following output, which is expected:

----------------------------------------------------
|Field     |Type            |Null|Key|Default|Extra|
----------------------------------------------------
|NewsThread|int(10) unsigned|    |   |0      |     |
----------------------------------------------------
|NewsId    |int(10) unsigned|    |   |0      |     |
----------------------------------------------------

and in the second case (using one query instead of two) the output is not expected behaviour:

-------------------------------------------------------
|Field     |Type               |Null|Key|Default|Extra|
-------------------------------------------------------
|NewsThread|int(10) unsigned   |    |   |0      |     |
-------------------------------------------------------
|NewsId    |bigint(20) unsigned|YES |   |(NULL) |     |
-------------------------------------------------------

The sourcetable (Tbl_News) have the following "DESCRIBE Tbl_News" output (I strip off not related columns):

----------------------------------------------------
|Field     |Type            |Null|Key|Default|Extra|
----------------------------------------------------
|NewsThread|int(10) unsigned|    |PRI|0      |     |
----------------------------------------------------
|NewsId    |int(10) unsigned|    |PRI|0      |     |
----------------------------------------------------
|NewsDate  |int(10) unsigned|    |MUL|0      |     |
----------------------------------------------------

In my opinion the optimizer should use same columntypes and setup (NOT NULL DEFAULT '0' etc) as the sourcetable which the SELECT is being performed at mainly because going from "INT UNSIGNED NOT NULL" which the source is in into "BIGINT UNSIGNED NULL" will have at least a theoretical (and most likely a practical (visible)) performance decrease.

I can understand if the optimizer chooses a larger columntype if I would do a SUM() but in this case I do a MAX() which cannot be larger than the sourcetable and therefor its my opinion that the optimizer should create the temporary table using the sourcetable columntype and setup for the column in question.

How to repeat:
CREATE TABLE Tbl_News (
`NewsThread` int UNSIGNED NOT NULL DEFAULT '0',
`NewsId` int UNSIGNED NOT NULL DEFAULT '0',
`NewsDate` int UNSIGNED NOT NULL DEFAULT '0'
PRIMARY KEY (`NewsThread`, `NewsId`),
INDEX Ix_News (`NewsDate`)
) TYPE=MYISAM;

Fill Tbl_News with data.

(correct behaviour):

DROP TABLE IF EXISTS Tmp_News1;

CREATE TEMPORARY TABLE Tmp_News1 (
`NewsThread` int UNSIGNED NOT NULL DEFAULT '0',
`NewsId` int UNSIGNED NOT NULL DEFAULT '0'
) TYPE=HEAP;

INSERT Tmp_News1 (NewsThread, NewsId)
SELECT NewsThread, MAX(NewsId) AS NewsId
FROM Tbl_News
WHERE NewsDate > 1060000000
GROUP BY NewsThread;

DESCRIBE Tbl_News;
DESCRIBE Tmp_News1;

(not correct behaviour in my opinion):

DROP TABLE IF EXISTS Tmp_News1;

CREATE TEMPORARY TABLE Tmp_News1 TYPE=HEAP
SELECT NewsThread, MAX(NewsId) AS NewsId
FROM Tbl_News
WHERE NewsDate > 1060000000
GROUP BY NewsThread;

DESCRIBE Tbl_News;
DESCRIBE Tmp_News1;
[23 Sep 2003 4:55] Alexander Keremidarski
Thank you for very detailed description of the problem.

This is deviation from ANSI standard which will be fixed in future MySQL releases.

Side note: 4.1 implements CREATE TABLE t1 LIKE t2; syntax which preserves original structure along with indexes.
[27 Sep 2008 9:17] Konstantin Osipov
Can't repeat this in 5.1.
The bug must have been fixed in scope of some other bug report:
mysql> DESCRIBE Tbl_News;
+------------+------------------+------+-----+---------+-------+
| Field      | Type             | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| NewsThread | int(10) unsigned | NO   | PRI | 0       |       | 
| NewsId     | int(10) unsigned | NO   | PRI | 0       |       | 
| NewsDate   | int(10) unsigned | NO   | MUL | 0       |       | 
+------------+------------------+------+-----+---------+-------+
3 rows in set (0.01 sec)

mysql> DESCRIBE Tmp_News1;
+------------+------------------+------+-----+---------+-------+
| Field      | Type             | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| NewsThread | int(10) unsigned | NO   |     | 0       |       | 
| NewsId     | int(10) unsigned | NO   |     | 0       |       | 
+------------+------------------+------+-----+---------+-------+
2 rows in set (0.00 sec)