Bug #21798 memory leak during query execution with subquery in column list using a function
Submitted: 23 Aug 2006 17:36 Modified: 25 Oct 2006 19:21
Reporter: Harrison Fisk Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: General Severity:S2 (Serious)
Version:5.0.24 OS:Linux (Linux 2.6.15 (Ubuntu 6.06 LTS))
Assigned to: Georgi Kodinov CPU Architecture:Any
Tags: memory leak

[23 Aug 2006 17:36] Harrison Fisk
Description:
When you execute a query that uses a function defined by a user in a subquery in the column list, it causes memory consumption to grow dramatically while the query is being executed.  

An example query which causes this issue is in the format:

SELECT a, (SELECT fn_t1(b, a) FROM t1 i1 WHERE i1.b=t1.a ORDER BY RAND() LIMIT 1) AS a FROM t1;

MySQL shouldn't use arbitrary large amounts of memory during execution.

How to repeat:
I've created a script which creates and populates a table to demostrate the problem:

delimiter ;

DROP TABLE IF EXISTS t1;

CREATE TABLE t1 (
        a VARCHAR(250),
        b INT auto_increment,
        PRIMARY KEY (b)
) ENGINE=MyISAM;

DROP PROCEDURE IF EXISTS pop_table;

DELIMITER $$
CREATE PROCEDURE pop_table (IN rows INT)
BEGIN
        DECLARE id, rand_num, m, n INT DEFAULT 0;

        WHILE m < rows DO
                SET rand_num = FLOOR(rand() * 1000);
                INSERT INTO t1 (a) values (rand_num);
                SET m = m +1;
        END WHILE;

END$$

DROP FUNCTION IF EXISTS fn_t1$$

CREATE FUNCTION fn_t1 (in_val INT, big_char VARCHAR(250)) returns
varchar(250)
BEGIN
  DECLARE foo VARCHAR(250);
  return big_char;
END$$

DELIMITER ;

/* this will take a bit of time to load a million rows into the table, ~45 seconds
 * on my system
 */
call pop_table(1000000);

/* 
 * watch top or similar while this is running, mysqld grew from ~100M to
 * 500M while running this statement before it finished
 * Grows much bigger if the table is larger
 */
SELECT a, (SELECT fn_t1(b, a) FROM t1 i1 WHERE i1.b=t1.a ORDER BY RAND() LIMIT 1) AS a FROM t1;

Suggested fix:
Do not allocate so much memory.

As a workaround, attempt to rewrite the statement to not use the function.
[12 Sep 2006 12:20] Georgi Kodinov
Thank you for taking the time to report a problem.  Unfortunately you are not using a current version of the product you reported a problem with -- the problem might already be fixed. Please download a new version from http://www.mysql.com/downloads/

If you are able to reproduce the bug with one of the latest versions, please change the version on this bug report to the version you tested and change the status back to "Open".  Again, thank you for your continued support of MySQL.

Additional info: 
I have tried the steps to reproduce using the latest 5.0.25 BK. 
During the SELECT mysqld process size remained stable. What has grown however is the mysql process  I was running the SELECT with. It went from 36MB (Virtual size) to ~340 MB on my MacOSX/Intel.
This is due to the fact that mysql tried to cache the results before displaying them.
Turning off the resultset cache (--quick) option of mysql caused both processes (mysql and mysqld) to remain at stable VM sizes.
[12 Sep 2006 13:25] Georgi Kodinov
Moving the bug back to "In Progress" - going to test it with older versions.
[14 Sep 2006 8:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/11918

ChangeSet@1.2261, 2006-09-14 11:20:41+03:00, gkodinov@macbook.gmz +8 -0
  Bug#21798: memory leak during query execution with subquery in column
              list using a function
  When executing dependent subqueries they are re-inited and re-exec() for 
  each row of the outer context.
  Care must be taken not to allocate anything into the thread's memory pool
  while reiniting that.
  All such items mush be cached and reused because the thread's memory pool
  is freed at the end of the whole query.
  Most of the stuff that is allocated in JOIN::exec is taken care of (e.g.
  JOIN::exec_tmp_table2), but there are some that remain.
  Note that they must be cached and reused even for queries that are not 
  otherwise cacheable. 
  So we provide a structure (initialized and maintained as an unit) to 
  store references to the items that need to be cached.
[18 Sep 2006 16:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/12150

ChangeSet@1.2261, 2006-09-18 19:03:21+03:00, gkodinov@macbook.gmz +8 -0
  Bug#21798: memory leak during query execution with subquery in column
              list using a function
  When executing dependent subqueries they are re-inited and re-exec() for 
  each row of the outer context.
  The cause for the bug is that during subquery reinitialization/re-execution,
  the optimizer reallocates JOIN::join_tab, JOIN::table in make_simple_join()
  and the local variable in 'sortorder' in create_sort_index(), which is
  allocated by make_unireg_sortorder().
  Care must be taken not to allocate anything into the thread's memory pool
  while re-initializing query plan structures between subquery re-executions.
  All such items mush be cached and reused because the thread's memory pool
  is freed at the end of the whole query.
  Note that they must be cached and reused even for queries that are not 
  otherwise cacheable because otherwise it will grow the thread's memory 
  pool every time a cacheable query is re-executed. 
  So we provide a structure (initialized and maintained as an unit) to 
  store references to the items that need to be cached.
[16 Oct 2006 12:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13729

ChangeSet@1.2261, 2006-10-16 14:39:11+03:00, gkodinov@macbook.gmz +8 -0
  Bug#21798: memory leak during query execution with subquery in column
              list using a function
  When executing dependent subqueries they are re-inited and re-exec() for 
  each row of the outer context.
  The cause for the bug is that during subquery reinitialization/re-execution,
  the optimizer reallocates JOIN::join_tab, JOIN::table in make_simple_join()
  and the local variable in 'sortorder' in create_sort_index(), which is
  allocated by make_unireg_sortorder().
  Care must be taken not to allocate anything into the thread's memory pool
  while re-initializing query plan structures between subquery re-executions.
  All such items mush be cached and reused because the thread's memory pool
  is freed at the end of the whole query.
  Note that they must be cached and reused even for queries that are not 
  otherwise cacheable because otherwise it will grow the thread's memory 
  pool every time a cacheable query is re-executed. 
  We provide additional members to the JOIN structure to store references 
  to the items that need to be cached.
[17 Oct 2006 13:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13802

ChangeSet@1.2261, 2006-10-17 16:20:26+03:00, gkodinov@macbook.gmz +8 -0
  Bug#21798: memory leak during query execution with subquery in column
              list using a function
  When executing dependent subqueries they are re-inited and re-exec() for 
  each row of the outer context.
  The cause for the bug is that during subquery reinitialization/re-execution,
  the optimizer reallocates JOIN::join_tab, JOIN::table in make_simple_join()
  and the local variable in 'sortorder' in create_sort_index(), which is
  allocated by make_unireg_sortorder().
  Care must be taken not to allocate anything into the thread's memory pool
  while re-initializing query plan structures between subquery re-executions.
  All such items mush be cached and reused because the thread's memory pool
  is freed at the end of the whole query.
  Note that they must be cached and reused even for queries that are not 
  otherwise cacheable because otherwise it will grow the thread's memory 
  pool every time a cacheable query is re-executed. 
  We provide additional members to the JOIN structure to store references 
  to the items that need to be cached.
[21 Oct 2006 9:12] Georgi Kodinov
Pushed in 5.0.27/5.1.13-beta
[25 Oct 2006 19:21] Paul Dubois
Noted in 5.0.30, 5.1.13 changelogs.