Bug #14303 Internal temporary tables consume too much memory
Submitted: 25 Oct 2005 18:20 Modified: 13 Jun 2007 7:37
Reporter: Andre Timmer Email Updates:
Status: Verified Impact on me:
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:5.0.26-BK, 5.0.18 OS:Any (Solaris)
Assigned to: CPU Architecture:Any
Tags: bfsm_2007_02_15
Triage: Triaged: D5 (Feature request)

[25 Oct 2005 18:20] Andre Timmer
The following statement uses large tables and the result is also large.
When run, it consumed all of our swap space, more than 3 GB.

select * from (
select col1, col2, .. coln from table1
where ...
union all
select col1, col2, .. coln from table1
where ...
) xx
group by col1, col2, .. coln-1

The reason was that the datatypes in table1 were larger than necessary.
Many columns were of datatype varchar(80) and could have been smaller.

How to repeat:
After setting the datatypes to appropriate lengths it was solved.

This should not be necessary; the table was analyzed, so MySQL should have chosen an internal column length accordingly.

Suggested fix:
See "How to repeat".
[26 Oct 2005 7:02] Valeriy Kravchuk
Thank you for the problem report.

Do you really use MySQL 4.1.3? It is quite old, and I recommend you upgrade to 4.1.15. Then please try to repeat the problem and, if it still exists, describe what you expected to get and what you actually got. Please include the SHOW CREATE TABLE results, the number of rows in each table used, and the real statement showing the problem. Your my.cnf content, OS version, and hardware description may be useful too.
[27 Nov 2005 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[4 Sep 2006 17:01] Andre Timmer
No feedback was requested.
[5 Sep 2006 9:39] Valeriy Kravchuk
As I already asked in my previous comment, please upgrade to the latest 4.1.x version (4.1.21 as of now), try to repeat the problem and, if it still exists, describe what you expected to get and what you actually got. Please include the SHOW CREATE TABLE results, the number of rows in each table used, and the real statement showing the problem. Send your my.cnf content, OS version, and a description of your hardware.

This is the feedback I asked you for. We have not received any of it yet.
[5 Sep 2006 12:03] Andre Timmer
My apologies Valeriy.

We upgraded to 5.0.18 and are using Solaris (SunOS boss25 5.8 Generic_117350-28 sun4u sparc SUNW,Ultra-60).

A test table is filled with 10,000 rows.
It has 2 columns of datatype varchar(200).

The maximum length of the data is, however, 16 characters.
MySQL knows this because the table has been analyzed and has not changed since.

select distinct * from (
select col1, col2 from test
union all
select col1, col2 from test
) zz;

Finishes in 2.5s.

When the column length is set to 16, it finishes in 1.3s.


During query execution MySQL seems to reserve 200 bytes per column in its query data structures, not 16.

I don't know whether it's reasonable for the database to use this statistics data. I just want to point this out.

- Script used:

drop table if exists test;

create table test (col1 varchar(200), col2 varchar(200));
-- create table test (col1 varchar(16), col2 varchar(16));

drop procedure if exists p_fill;

delimiter //

create procedure p_fill()
begin
  declare v_cnt int default 0;
  l1: loop
     set v_cnt = v_cnt + 1;
     insert into test (col1, col2) values ('String length 16', 'String length 16');
     if (v_cnt >= 1000*10) then
        leave l1;
     end if;
  end loop;
end//

delimiter ;

call p_fill();

analyze table test;

select count(*) from test;

drop procedure p_fill;

select distinct * from (
select col1, col2 from test
union all
select col1, col2 from test
) zz;
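A back-of-the-envelope estimate (a sketch, assuming the internal temporary table stores every row at the declared maximum column width, which the timings above suggest) illustrates the gap between the two table definitions:

```sql
-- Rough sketch, assuming fixed-width temporary rows sized at the declared
-- column maximum. The numbers are illustrative estimates, not measurements.
-- 10,000 rows selected twice via UNION ALL = 20,000 temporary rows.
select 20000 * (200 + 200) as approx_bytes_varchar200,  -- 8,000,000 bytes
       20000 * (16 + 16)   as approx_bytes_varchar16;   --   640,000 bytes
```

Under that assumption the varchar(200) version needs roughly 12x the space of the varchar(16) version for identical data, which is consistent with the reporter's 3 GB swap exhaustion on the original, much larger tables.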
[2 Oct 2006 13:13] Valeriy Kravchuk
Verified just as described in your last comment, with 5.0.26-BK on Linux. I also got 0.27 sec for varchar(200) vs. 0.12 sec for varchar(16) on a MyISAM table with the same data (and the same data size according to SHOW TABLE STATUS).

It looks like the temporary table used during query execution is allocated with STATIC (fixed-length) rows based on the maximum declared length of the data. Or maybe the reason is different. In any case, it should at least be documented.
[16 Feb 2007 0:55] James Day
I suggest: it is a bug that there isn't a working limit on the size of a temporary table on disk. Bug #4291 is related and should be read as well. If we don't currently have a report requesting the feature of limiting the size of temporary tables on disk, we should have one. I've renamed this bug from "Too much memory consumption" to "Need limit of temporary table size on disk. Too much memory consumption", which may be a duplicate?

See ORDER BY OPTIMIZATION at http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html and use max_length_for_sort_data to limit the row size. It is a useful per-session setting for queries that are known to produce many results with large rows. If it's triggered, only the primary key and the fields to sort by will be stored. Those will be smaller, but could still be very long.
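For example, a session that expects large result rows could lower the threshold before running the query (the value below is illustrative; 1024 is the documented default in 5.0):

```sql
-- Hedged example: lowering max_length_for_sort_data makes the filesort
-- fall back to storing only the sort key plus a row pointer, trading RAM
-- for a second pass to fetch the full rows afterwards.
set session max_length_for_sort_data = 512;
select col1, col2 from test order by col1;
```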

I've asked the documentation team to be more explicit and mention that the buffer size for all rows is the same, so large columns with a lot of rows will cause a lot of RAM use even if a column normally has only a little data in it.
[16 Feb 2007 11:56] Andre Timmer
The initial reason for filing this report was that the statement mentioned first uses huge amounts of memory/disk and could run much faster and more efficiently.

So limiting the size of a temporary table is perhaps a different issue and in any case doesn't solve the 'problem'.
[12 Jun 2007 20:53] Timothy Smith
Andre, other interested parties:

This bug has been misinterpreted a few times, and I want to clarify its current position.

This is a valid feature request: to reduce the memory used for the internal temporary tables created during certain operations, such as GROUP BY, some ORDER BY, queries using VIEWs, etc. Especially when large VARCHAR (or similar) columns are sparsely filled, there can be excessive memory consumption.

However, it's a constraint of the current design which has been present from the beginning, and will not be fixed in the near future.  The internal temporary tables are designed with fixed-length records, and the rest follows from that design.
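Given that constraint, one possible workaround (a sketch, not an official recommendation) is to CAST the oversized columns down to their actual maximum data length inside the derived table, so the fixed-length temporary records are created at the narrower width:

```sql
-- Workaround sketch: shrink the declared width before the temporary table
-- is materialized. The char(16) lengths are assumptions taken from the
-- reporter's test data; adjust to your real maximum data length.
select distinct * from (
  select cast(col1 as char(16)) as col1,
         cast(col2 as char(16)) as col2 from test
  union all
  select cast(col1 as char(16)) as col1,
         cast(col2 as char(16)) as col2 from test
) zz;
```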

Any comments or suggestions regarding limiting the size of temporary tables should be directed to bug #4291, not to this bug.

Best regards,

[13 Jun 2007 7:37] Valeriy Kravchuk
So, this is a verified, reasonable feature request.
[28 Jul 2007 13:14] Sinisa Milivojevic
Timothy, Valeriy, Timour, Sergey,

Please check whether this bug is a duplicate of #29582.
[25 Sep 2007 19:00] Mark Callaghan
Is padding variable length columns to their max length limited to HEAP temporary tables or done for both HEAP and MyISAM?

Assuming an index is always maintained on these temp tables, is this only done for variable length fields used in the index key? Only for fields not used in the index key? Or for both?