Description:
When doing a CREATE TEMPORARY TABLE...SELECT the columntypes that the optimizer chooses to use is not optimized in terms of how the source columns looks like.
In my case I am trying to optimize my queries by in this case merge a "CREATE TEMPORARY TABLE (colum def.)" and "INSERT...SELECT" into one single "CREATE TEMPORARY TABLE...SELECT" query.
This means by (for example) going from:
CREATE TEMPORARY TABLE Tmp_News1 (
`NewsThread` int UNSIGNED NOT NULL DEFAULT '0',
`NewsId` int UNSIGNED NOT NULL DEFAULT '0'
) TYPE=HEAP;
INSERT Tmp_News1 (NewsThread, NewsId)
SELECT NewsThread, MAX(NewsId) AS NewsId
FROM Tbl_News
WHERE NewsDate > 1060000000
GROUP BY NewsThread;
into:
CREATE TEMPORARY TABLE Tmp_News1 TYPE=HEAP
SELECT NewsThread, MAX(NewsId) AS NewsId
FROM Tbl_News
WHERE NewsDate > 1060000000
GROUP BY NewsThread;
If i perform a "DESCRIBE Tmp_News1" in the first case (with two queries) i get the following output, which is expected:
----------------------------------------------------
|Field |Type |Null|Key|Default|Extra|
----------------------------------------------------
|NewsThread|int(10) unsigned| | |0 | |
----------------------------------------------------
|NewsId |int(10) unsigned| | |0 | |
----------------------------------------------------
and in the second case (using one query instead of two) the output is not expected behaviour:
-------------------------------------------------------
|Field |Type |Null|Key|Default|Extra|
-------------------------------------------------------
|NewsThread|int(10) unsigned | | |0 | |
-------------------------------------------------------
|NewsId |bigint(20) unsigned|YES | |(NULL) | |
-------------------------------------------------------
The sourcetable (Tbl_News) have the following "DESCRIBE Tbl_News" output (I strip off not related columns):
----------------------------------------------------
|Field |Type |Null|Key|Default|Extra|
----------------------------------------------------
|NewsThread|int(10) unsigned| |PRI|0 | |
----------------------------------------------------
|NewsId |int(10) unsigned| |PRI|0 | |
----------------------------------------------------
|NewsDate |int(10) unsigned| |MUL|0 | |
----------------------------------------------------
In my opinion the optimizer should use same columntypes and setup (NOT NULL DEFAULT '0' etc) as the sourcetable which the SELECT is being performed at mainly because going from "INT UNSIGNED NOT NULL" which the source is in into "BIGINT UNSIGNED NULL" will have at least a theoretical (and most likely a practical (visible)) performance decrease.
I can understand if the optimizer chooses a larger columntype if I would do a SUM() but in this case I do a MAX() which cannot be larger than the sourcetable and therefor its my opinion that the optimizer should create the temporary table using the sourcetable columntype and setup for the column in question.
How to repeat:
CREATE TABLE Tbl_News (
`NewsThread` int UNSIGNED NOT NULL DEFAULT '0',
`NewsId` int UNSIGNED NOT NULL DEFAULT '0',
`NewsDate` int UNSIGNED NOT NULL DEFAULT '0'
PRIMARY KEY (`NewsThread`, `NewsId`),
INDEX Ix_News (`NewsDate`)
) TYPE=MYISAM;
Fill Tbl_News with data.
(correct behaviour):
DROP TABLE IF EXISTS Tmp_News1;
CREATE TEMPORARY TABLE Tmp_News1 (
`NewsThread` int UNSIGNED NOT NULL DEFAULT '0',
`NewsId` int UNSIGNED NOT NULL DEFAULT '0'
) TYPE=HEAP;
INSERT Tmp_News1 (NewsThread, NewsId)
SELECT NewsThread, MAX(NewsId) AS NewsId
FROM Tbl_News
WHERE NewsDate > 1060000000
GROUP BY NewsThread;
DESCRIBE Tbl_News;
DESCRIBE Tmp_News1;
(not correct behaviour in my opinion):
DROP TABLE IF EXISTS Tmp_News1;
CREATE TEMPORARY TABLE Tmp_News1 TYPE=HEAP
SELECT NewsThread, MAX(NewsId) AS NewsId
FROM Tbl_News
WHERE NewsDate > 1060000000
GROUP BY NewsThread;
DESCRIBE Tbl_News;
DESCRIBE Tmp_News1;