Bug #26750 valgrind leak in sp_head
Submitted: 1 Mar 2007 14:52
Modified: 7 Mar 2007 21:37
Reporter: Mads Martin Joergensen
Status: Closed
Category: MySQL Server
Severity: S1 (Critical)
Version: 5.0
OS:
Assigned to: Konstantin Osipov
CPU Architecture: Any

[1 Mar 2007 14:52] Mads Martin Joergensen
Description:
VALGRIND: '3,704 bytes in 1 blocks are indirectly lost in loss record 8 of 9'
    COUNT: 1
    FUNCTION: malloc    FILES:    slave.err
    TESTS:    rpl_replicate_do
    STACK: at 0x4A20859: malloc (vg_replace_malloc.c:149)
             by 0x74CD5B: sp_head::reset_lex(THD*) (sp_head.cc:1753)
             by 0x60EA82: MYSQLparse(void*) (sql_yacc.yy:8164)
             by 0x5EEAAC: mysql_parse(THD*, char*, unsigned) (sql_parse.cc:5897)
             by 0x67542B: Query_log_event::exec_event(st_relay_log_info*, char const*, unsigned) (log_event.cc:1866)
             by 0x7248FE: exec_relay_log_event(THD*, st_relay_log_info*) (slave.cc:3335)
             by 0x7251A2: handle_slave_sql (slave.cc:3926)
             by 0x4E4D192: start_thread (in /lib64/libpthread-2.4.so)
             by 0x53C045C: clone (in /lib64/libc-2.4.so)

VALGRIND: '18,612 bytes in 28 blocks are indirectly lost in loss record 9 of 9'
    COUNT: 1
    FUNCTION: malloc    FILES:    slave.err
    TESTS:    rpl_replicate_do
    STACK: at 0x4A20859: malloc (vg_replace_malloc.c:149)
             by 0x9AD66F: my_malloc (my_malloc.c:34)
             by 0x9B7686: init_dynamic_array (array.c:63)
             by 0x9B5E44: _hash_init (hash.c:58)
             by 0x74C49B: sp_head::sp_head() (sp_head.cc:460)
             by 0x60C485: MYSQLparse(void*) (sql_yacc.yy:9483)
             by 0x5EEAAC: mysql_parse(THD*, char*, unsigned) (sql_parse.cc:5897)
             by 0x67542B: Query_log_event::exec_event(st_relay_log_info*, char const*, unsigned) (log_event.cc:1866)
             by 0x7248FE: exec_relay_log_event(THD*, st_relay_log_info*) (slave.cc:3335)
             by 0x7251A2: handle_slave_sql (slave.cc:3926)
             by 0x4E4D192: start_thread (in /lib64/libpthread-2.4.so)
             by 0x53C045C: clone (in /lib64/libc-2.4.so)

VALGRIND: '23,260 (944 direct, 22,316 indirect) bytes in 1 blocks are definitely lost in loss record 7 of 9'
    COUNT: 1
    FUNCTION: malloc    FILES:    slave.err
    TESTS:    rpl_replicate_do
    STACK: at 0x4A20859: malloc (vg_replace_malloc.c:149)
             by 0x9AD66F: my_malloc (my_malloc.c:34)
             by 0x9AE253: alloc_root (my_alloc.c:154)
             by 0x74B9F9: sp_head::operator new(unsigned long) (sp_head.cc:419)
             by 0x60C47A: MYSQLparse(void*) (sql_yacc.yy:9483)
             by 0x5EEAAC: mysql_parse(THD*, char*, unsigned) (sql_parse.cc:5897)
             by 0x67542B: Query_log_event::exec_event(st_relay_log_info*, char const*, unsigned) (log_event.cc:1866)
             by 0x7248FE: exec_relay_log_event(THD*, st_relay_log_info*) (slave.cc:3335)
             by 0x7251A2: handle_slave_sql (slave.cc:3926)
             by 0x4E4D192: start_thread (in /lib64/libpthread-2.4.so)
             by 0x53C045C: clone (in /lib64/libc-2.4.so)

How to repeat:
mysql-5.0 pushbuild tree.
[1 Mar 2007 15:33] Alexander Barkov
How to reproduce memory leak

Attachment: bug26750.diff (text/x-patch), 949 bytes.

[1 Mar 2007 15:40] Alexander Barkov
Although this bug appeared after the fix for bug#24478,
it does not seem to be related to that code change.
The test case added while fixing bug#24478 merely exposed
this problem.

This is a HOWTO to reproduce the same problem with an older tree,
which does not include a fix for bug#24478:

# Clone the current 5.0 main tree
bk clone username@bk-internal.mysql.com:/home/bk/mysql-5.0

# Make another clone, without fix for bug#24478
bk clone -r1.2422 mysql-5.0 mysql-5.0.b26750
cd mysql-5.0.b26750

# make sure the fix for bug#24478 is not here
bk changes |grep 24478

# compile
./BUILD/compile-pentium-debug-max

#
# Download the test case patch into the tree root directory.
# For example, using wget:
#
wget -O bug26750.diff 'http://bugs.mysql.com/file.php?id=5815'

# apply the patch:
patch -p1 < bug26750.diff

# run the test with valgrind (ignore that it will fail)
# Then see slave.err

cd mysql-test/
./mysql-test-run.pl --valgrind rpl_replicate_do
cat var/log/slave.err

It will report the memory leak:

==21519== 25,212 bytes in 9 blocks are still reachable in loss record 7 of 7
==21519==    at 0x4005400: malloc (vg_replace_malloc.c:149)
==21519==    by 0x855E5DF: _mymalloc (safemalloc.c:137)
==21519==    by 0x855DA86: init_alloc_root (my_alloc.c:62)
==21519==    by 0x83ACA3E: sp_head::operator new(unsigned) (sp_head.cc:418)
==21519==    by 0x825A3FB: MYSQLparse(void*) (sql_yacc.yy:9464)
==21519==    by 0x823B522: mysql_parse(THD*, char*, unsigned) (sql_parse.cc:5872)
==21519==    by 0x82CA9E5: Query_log_event::exec_event(st_relay_log_info*, char const*, unsigned) (log_event.cc:1826)
==21519==    by 0x82CB0A3: Query_log_event::exec_event(st_relay_log_info*) (log_event.cc:1695)
==21519==    by 0x8380AC2: exec_relay_log_event(THD*, st_relay_log_info*) (slave.cc:3333)
==21519==    by 0x8381487: handle_slave_sql (slave.cc:3919)
==21519==    by 0x7153DA: start_thread (in /lib/libpthread-2.5.so)
==21519==    by 0x66F06D: clone (in /lib/libc-2.5.so)
[1 Mar 2007 15:45] Konstantin Osipov
Mads, we have not pushed anything into 5.0 in the last 3 weeks.
I believe the lead assignment is wrong - the bug is in replication (see backtrace: handle_slave_sql, exec_relay_log_event).
Please reassign.
[1 Mar 2007 23:55] Konstantin Osipov
Identified the problem, working on a fix.
[4 Mar 2007 21:15] Konstantin Osipov
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/21101

ChangeSet@1.2438, 2007-03-04 14:44:46+03:00, kostja@bodhi.local +10 -0
  A fix for Bug#26750 "valgrind leak in sp_head"

  The legend: on a replication slave, in case a trigger creation
  was filtered out because of application of replicate-do-table/
  replicate-ignore-table rule, the parsed definition of a trigger was not
  cleaned up properly. LEX::sphead member was left around and leaked
  memory. Until the actual implementation of support of
  replicate-ignore-table rules for triggers by the patch for Bug 24478 it
  was never the case that "case SQLCOM_CREATE_TRIGGER"
  was not executed once a trigger was parsed,
  so the deletion of lex->sphead there worked and the memory did not leak.

  The fix:

  The real cause of the bug is that there are no one or two places where
  we can clean up the main LEX after parse. And the reason we
  cannot have just one or two places where we clean up the LEX is
  the asymmetric behaviour of MYSQLparse in case of success or error.

  One of the root causes of this behaviour is the code in Item::Item()
  constructor. There, a newly created item adds itself to THD::free_list
  - a single-linked list of Items used in a statement. Yuck. This code
  is unaware that we may have more than one statement active at a time,
  and always assumes that the free_list of the current statement is
  located in THD::free_list. One day we need to be able to explicitly
  allocate an item in a given Query_arena.
  Thus, when parsing a definition of a stored procedure, like
  CREATE PROCEDURE p1() BEGIN SELECT a FROM t1; SELECT b FROM t1; END;
  we actually need to reset THD::mem_root, THD::free_list and THD::lex
  to parse the nested procedure statement (SELECT *).
  The actual reset and restore is implemented in semantic actions
  attached to sp_proc_stmt grammar rule.
  The problem is that in case of a parsing error inside a nested statement
  Bison generated parser would abort immediately, without executing the
  restore part of the semantic action. This would leave THD in an
  in-the-middle-of-parsing state.
  This is why we couldn't have had a single place where we clean up the LEX
  after MYSQLparse - in case of an error we needed to do a clean up
  immediately, in case of success a clean up could have been delayed.
  This left the door open for a memory leak.

  The following possibilities were considered when working on a fix:
  - patch the replication logic to do the clean up. Rejected
  as it breaks module borders: replication code should not need to know the
  gory details of the clean up procedure after CREATE TRIGGER.
  - wrap MYSQLparse with a function that would do a clean up.
  Rejected as ideally we should fix the problem when it happens, not
  adjust for it outside of the problematic code.
  - make sure MYSQLparse cleans up after itself by invoking the clean up
  functionality in the appropriate places before return. Implemented in
  this patch.
  - use %destructor rule for sp_proc_stmt to restore THD - cleaner
  than the previous approach, but rejected
  because it needs a careful analysis of the side effects, and this patch is
  for 5.0, and long term we need to use the next alternative anyway
  - make sure that sp_proc_stmt doesn't juggle with THD - this is a
  large piece of work that will affect many modules.

  Cleanup: move main_lex and main_mem_root from Statement to its
  only two descendants Prepared_statement and THD. This ensures that
  when a Statement instance was created for purposes of statement backup,
  we do not involve LEX constructor/destructor, which is fairly expensive.
  In order to track that the transformation produces equivalent
  functionality please check the respective constructors and destructors
  of Statement, Prepared_statement and THD - these members were
  used only there.
  Unrelated to the patch and could be moved to 5.1 if reviewer so requests.
[7 Mar 2007 9:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/21344

ChangeSet@1.2438, 2007-03-07 12:24:46+03:00, kostja@bodhi.local +10 -0
  A fix for Bug#26750 "valgrind leak in sp_head" (and post-review
  fixes).
  
  The legend: on a replication slave, in case a trigger creation
  was filtered out because of application of replicate-do-table/
  replicate-ignore-table rule, the parsed definition of a trigger was not 
  cleaned up properly. LEX::sphead member was left around and leaked 
  memory. Until the actual implementation of support of 
  replicate-ignore-table rules for triggers by the patch for Bug 24478 it 
  was never the case that "case SQLCOM_CREATE_TRIGGER"
  was not executed once a trigger was parsed,
  so the deletion of lex->sphead there worked and the memory did not leak.
  
  The fix: 
  
  The real cause of the bug is that there are no one or two places where
  we can clean up the main LEX after parse. And the reason we
  cannot have just one or two places where we clean up the LEX is
  the asymmetric behaviour of MYSQLparse in case of success or error.
  
  One of the root causes of this behaviour is the code in Item::Item()
  constructor. There, a newly created item adds itself to THD::free_list
  - a single-linked list of Items used in a statement. Yuck. This code
  is unaware that we may have more than one statement active at a time,
  and always assumes that the free_list of the current statement is
  located in THD::free_list. One day we need to be able to explicitly
  allocate an item in a given Query_arena.
  Thus, when parsing a definition of a stored procedure, like
  CREATE PROCEDURE p1() BEGIN SELECT a FROM t1; SELECT b FROM t1; END;
  we actually need to reset THD::mem_root, THD::free_list and THD::lex
  to parse the nested procedure statement (SELECT *).
  The actual reset and restore is implemented in semantic actions
  attached to sp_proc_stmt grammar rule.
  The problem is that in case of a parsing error inside a nested statement
  Bison generated parser would abort immediately, without executing the
  restore part of the semantic action. This would leave THD in an 
  in-the-middle-of-parsing state.
  This is why we couldn't have had a single place where we clean up the LEX
  after MYSQLparse - in case of an error we needed to do a clean up
  immediately, in case of success a clean up could have been delayed.
  This left the door open for a memory leak.
  
  The following possibilities were considered when working on a fix:
  - patch the replication logic to do the clean up. Rejected
  as it breaks module borders: replication code should not need to know the
  gory details of the clean up procedure after CREATE TRIGGER.
  - wrap MYSQLparse with a function that would do a clean up.
  Rejected as ideally we should fix the problem when it happens, not
  adjust for it outside of the problematic code.
  - make sure MYSQLparse cleans up after itself by invoking the clean up
  functionality in the appropriate places before return. Implemented in
  this patch.
  - use %destructor rule for sp_proc_stmt to restore THD - cleaner
  than the previous approach, but rejected
  because it needs a careful analysis of the side effects, and this patch is
  for 5.0, and long term we need to use the next alternative anyway
  - make sure that sp_proc_stmt doesn't juggle with THD - this is a
  large piece of work that will affect many modules.
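
  The leak pattern and the implemented alternative can be sketched in
  isolation. The C++ below is a minimal illustration, not the actual patch:
  Session, ProcHead, parse, run_legacy and run_fixed are hypothetical
  stand-ins for THD, sp_head, MYSQLparse and the code that calls it. It
  assumes that the parser restores the session itself when it aborts, so
  the caller can free the parsed body in one place whether or not the
  statement is executed.

```cpp
// Illustrative sketch only: every name here is hypothetical and none of
// this is MySQL server code.
#include <cassert>
#include <string>

static int live_heads = 0;  // number of ProcHead objects currently allocated

struct ProcHead {           // stand-in for a parsed routine/trigger body
    ProcHead() { ++live_heads; }
    ~ProcHead() { --live_heads; }
};

struct Session {            // stand-in for the per-connection parse state
    int nesting = 0;              // >0 while a nested statement is being parsed
    ProcHead *sphead = nullptr;   // parsed body, owned by the session
};

// Undo the "reset" half of a nested-statement action.
void restore_after_parse(Session &s) { s.nesting = 0; }

// Release the parsed body. Safe to call when nothing was parsed.
void free_parsed_body(Session &s) {
    delete s.sphead;
    s.sphead = nullptr;
}

// Returns true on a syntax error. A '!' in the body simulates an error
// inside a nested statement.
bool parse(Session &s, const std::string &body) {
    s.sphead = new ProcHead();
    for (char c : body) {
        ++s.nesting;                  // "reset" half of the semantic action
        if (c == '!') {
            // The parser aborts here and the "restore" half below is
            // skipped, so the parser cleans up itself before returning.
            restore_after_parse(s);
            return true;
        }
        --s.nesting;                  // "restore" half of the semantic action
    }
    return false;
}

// Old shape: the body is freed only by the execution step, so a statement
// that is parsed but filtered out keeps its body allocated.
void run_legacy(Session &s, const std::string &body, bool filtered_out) {
    if (parse(s, body)) { free_parsed_body(s); return; }
    if (filtered_out) return;         // leak: nobody frees s.sphead
    /* execute the statement here */
    free_parsed_body(s);
}

// Assumed fixed shape: one cleanup point that every path reaches.
void run_fixed(Session &s, const std::string &body, bool filtered_out) {
    bool error = parse(s, body);
    if (!error && !filtered_out) { /* execute the statement here */ }
    free_parsed_body(s);
}
```

  Calling run_legacy with filtered_out set to true leaves one ProcHead
  allocated, which mirrors the filtered CREATE TRIGGER case; run_fixed
  leaves none on the success, filtered and error paths.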
  
  Cleanup: move main_lex and main_mem_root from Statement to its
  only two descendants Prepared_statement and THD. This ensures that
  when a Statement instance was created for purposes of statement backup,
  we do not involve LEX constructor/destructor, which is fairly expensive.
  In order to track that the transformation produces equivalent 
  functionality please check the respective constructors and destructors
  of Statement, Prepared_statement and THD - these members were
  used only there.
  This cleanup is unrelated to the patch.
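
  The effect of the cleanup described above can be illustrated with a
  small sketch. The types below are hypothetical simplifications (Lex,
  Statement, Connection and PreparedStatement here are not the server's
  classes). The sketch only assumes that the base class keeps a pointer to
  a LEX owned by a descendant, so a base-class instance made for backup
  purposes constructs no LEX of its own.

```cpp
// Illustrative sketch only: simplified, hypothetical types.
#include <cassert>

static int lex_constructions = 0;  // how many Lex objects have been built

struct Lex {                       // stand-in for an expensive-to-build object
    Lex() { ++lex_constructions; }
};

// The base class holds only a pointer, so creating one is cheap.
struct Statement {
    Lex *lex = nullptr;
    Statement() = default;
    explicit Statement(Lex *l) : lex(l) {}
};

// The two descendants own the real Lex and point the base at it.
struct Connection : Statement {
    Lex main_lex;
    Connection() { lex = &main_lex; }
    // Copying would leave the copy pointing at the original's Lex.
    Connection(const Connection &) = delete;
    Connection &operator=(const Connection &) = delete;
};

struct PreparedStatement : Statement {
    Lex main_lex;
    PreparedStatement() { lex = &main_lex; }
    PreparedStatement(const PreparedStatement &) = delete;
    PreparedStatement &operator=(const PreparedStatement &) = delete;
};

// A backup is a plain Statement: it records the pointer and builds no Lex.
Statement make_backup(const Statement &s) { return Statement(s.lex); }
```

  With this layout, constructing a Connection builds exactly one Lex, and
  make_backup on it builds none.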
[7 Mar 2007 15:16] Konstantin Osipov
Queued into 5.0-runtime, 5.1-runtime
[7 Mar 2007 21:37] Konstantin Osipov
An internal mem leak. No documentation entry needed.