Bug #18300 Memory leak of st_lex (and maybe sp_pcontext?) structures?
Submitted: 17 Mar 2006 10:45 Modified: 3 Aug 2009 17:35
Reporter: Jeremy Cole (Basic Quality Contributor) (OCA) Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Stored Routines Severity:S1 (Critical)
Version:5.0.19, 4.1.18 OS:Any
Assigned to: CPU Architecture:Any
Tags: bfsm_2007_02_15, memory leak, qc

[17 Mar 2006 10:45] Jeremy Cole
Description:
I believe st_lex structures are leaking somewhere, based on these aggregate statistics from a production mysql server:

+------------------+------+----------+------+
| file             | line | total    | nr   |
+------------------+------+----------+------+
| my_largepage.c   |   65 | 59381760 |    1 |
| sql_cache.cc     | 1566 | 33545976 |    1 |
| mf_keycache.c    |  366 |  7726976 |    1 |
| my_alloc.c       |  197 |  4330728 | 1234 |
| sql_lex.cc       | 1629 |  2380384 |  292 |   <-- [note 1]
| mulalloc.c       |   51 |  1798936 |  254 |
| my_alloc.c       |   64 |   526352 |   65 |
| sp_head.cc       |  469 |   521728 |   64 |
| sp_head.cc       |  468 |   521728 |   64 |
| sql_base.cc      | 1311 |   402456 |  246 |
| mf_iocache.c     |  218 |   155648 |    7 |
| net_serv.cc      |  120 |   114737 |    7 |
| sql_string.h     |  199 |   114688 |    7 |
| my_alloc.c       |  129 |    86240 |   14 |
| sql_acl.cc       | 2169 |    57064 |    7 |
| table.cc         |  502 |    45376 |  131 |
| sql_list.h       |  422 |    42104 |    9 |
| my_file.c        |  125 |    40848 |    1 |
| sp_cache.cc      |  252 |    32608 |    4 |
| derror.cc        |  128 |    26592 |    1 |
| sql_base.cc      |   59 |    16512 |    1 |
| sp_pcontext.cc   |   60 |    15872 |  124 |   <-- [note 2]
| sql_cache.cc     | 1645 |     8192 |    1 |
| sql_cache.cc     | 1634 |     8192 |    1 |
| sql_acl.cc       | 3330 |     8152 |    1 |
| sql_acl.cc       | 3327 |     8152 |    1 |
| sql_acl.cc       | 3324 |     8152 |    1 |
| sql_parse.cc     |  505 |     8000 |    1 |
| sp_pcontext.cc   |   61 |     7936 |  124 |   <-- [note 3]
| sp_pcontext.cc   |   59 |     7936 |  124 |
| sp_pcontext.cc   |   58 |     7936 |  124 |
| sp_pcontext.cc   |   57 |     7936 |  124 |
| mi_open.c        |  673 |     7851 |    9 |
| ha_innodb.cc     | 6730 |     5233 |   23 |
| queues.c         |   54 |     4128 |    1 |
| sp_head.cc       |  486 |     4096 |   64 |
| hash_filo.h      |   82 |     3072 |    2 |
| array.c          |  277 |     1756 |    5 |
| set_var.cc       | 2889 |     1304 |    1 |
| set_var.cc       | 3452 |     1268 |    1 |
| vio.c            |  144 |     1092 |    7 |
| repl_failsafe.cc |  212 |     1024 |    1 |
| sql_class.cc     |  252 |      896 |    7 |
| sql_class.cc     | 1727 |      896 |    7 |
| sql_class.cc     | 1724 |      896 |    7 |
| time.cc          |  593 |      864 |   24 |
| sql_class.cc     | 2052 |      800 |    1 |
| log.cc           | 1651 |      784 |    4 |
| table.cc         |  133 |      669 |   95 |
| sql_base.cc      | 1362 |      519 |  151 |
| sql_class.cc     |  260 |      448 |    7 |
| my_open.c        |  153 |      438 |   19 |
| ft_stopwords.c   |   57 |      344 |    1 |
| sql_udf.cc       |  153 |      256 |    1 |
| ha_innodb.cc     | 1420 |      256 |    1 |
| sql_db.cc        |  116 |      256 |    1 |
| tztime.cc        | 1548 |      208 |    1 |
| my_bitmap.c      |   71 |      192 |    1 |
| handler.cc       |  388 |      164 |    1 |
| tztime.cc        | 1542 |      160 |    1 |
| item_func.cc     | 2978 |      128 |    1 |
| mf_keycaches.c   |  114 |      128 |    1 |
| hostname.cc      |  159 |      119 |    6 |
| sql_parse.cc     |  810 |       78 |    6 |
| my_error.c       |  175 |       48 |    3 |
| sql_parse.cc     | 1027 |       47 |    7 |
| log.cc           |  531 |       45 |    2 |
| mf_cache.c       |   65 |       44 |    4 |
| sql_db.cc        | 1102 |       42 |    6 |
| ha_innodb.cc     | 1282 |       23 |    1 |
| mysqld.cc        | 3046 |       21 |    1 |
| sql_string.cc    |   52 |       16 |    1 |
| mf_cache.c       |   66 |       12 |    4 |
| mysqld.cc        | 7263 |       11 |    1 |
| mf_tempdir.c     |   51 |       11 |    1 |
| set_var.h        |  884 |        8 |    1 |
| mf_tempdir.c     |   32 |        4 |    1 |
| mysqld.cc        | 2696 |        1 |    1 |
| mysqld.cc        | 2690 |        1 |    1 |
+------------------+------+----------+------+

sql/sql_lex.cc:1629 is st_lex::st_lex:

  1626  st_lex::st_lex()
  1627    :result(0), sql_command(SQLCOM_END), query_tables_own_last(0)
  1628  {
  1629    hash_init(&sroutines, system_charset_info, 0, 0, 0, sp_sroutine_key, 0
, 0);
  1630    sroutines_list.empty();
  1631    sroutines_list_own_last= sroutines_list.next;
  1632    sroutines_list_own_elements= 0;
  1633  }

sql/sp_pcontext:57-61 are sp_pcontext::sp_pcontext:

    53  sp_pcontext::sp_pcontext(sp_pcontext *prev)
    54    :Sql_alloc(), m_total_pvars(0), m_csubsize(0), m_hsubsize(0),
    55     m_handlers(0), m_parent(prev), m_pboundary(0)
    56  {
    57    VOID(my_init_dynamic_array(&m_pvar, sizeof(sp_pvar_t *), 16, 8));
    58    VOID(my_init_dynamic_array(&m_case_expr_id_lst, sizeof(int), 16, 8));
    59    VOID(my_init_dynamic_array(&m_cond, sizeof(sp_cond_type_t *), 16, 8));
    60    VOID(my_init_dynamic_array(&m_cursor, sizeof(LEX_STRING), 16, 8));
    61    VOID(my_init_dynamic_array(&m_handler, sizeof(sp_cond_type_t *), 16, 8
));
   [snip]

How to repeat:
Info was collected using a binary with --with-debug=full, and running "mysqladmin debug" occasionally.  It was sent through an awk script and inserted into a table to summarize and format.
[17 Mar 2006 17:49] MySQL Verification Team
Hi Jeremy,

Thank you for the bug report. Could you please provide a reproducible
test case for to try it on our side?

Thanks in advance.
[18 Mar 2006 2:19] Jeremy Cole
If I had a reproducible test case I would have submitted it. :)

This is only happening on a production server, and attempts to reproduce it on a test machine have failed.
[18 Mar 2006 8:26] Per-Erik Martin
Can you tell us something about which features these systems are using?

Stored procedures or functions, or both?
...with cursors and/or exception handlers?
Triggers?
Views?
Which storage engine(s)?
[18 Mar 2006 8:33] Jeremy Cole
Hi PEM!

It's 100% InnoDB, using triggers, stored procedures, and prepared statements (both via PREPARE/EXECUTE commands with SPs and via JDBC).  Highly variable number of concurrent users and load: 25-500 concurrent sessions, and 200-1500 q/s total.  It leaks much faster during peak (reaching 2.5G address space limit in ~35 minutes).

Nothing special otherwise.  It seems to be leaking only from SELECTs, as slaves show no leaks (although they don't run very long -- more replication bugs).
[18 Mar 2006 8:39] Jeremy Cole
You can find the error log containing the complete debug dumps, as well as a log of out of memory errors etc., here:

  http://jcole.us/files/bug18300/gndb1.gaggle.net.err.20060318.gz

A debug trace from about 1 minute of operation here:

  http://jcole.us/files/bug18300/mysqld.trace.gz
[18 Mar 2006 8:42] Jeremy Cole
I didn't state, but maybe it's worth stating:

No cursors, no views, no stored functions.
[24 Mar 2006 15:51] Per-Erik Martin
I assume you are using local variables in your triggers?

Can you say anything about the number of rows you have in tables with triggers?

(I have a feeling it might be BUG#17260)
[28 Mar 2006 23:56] Jeremy Cole
An interesting turn of events... Since the memory leak was so painful, it was decided to roll back to 4.1 instead, since that seemed a sure thing.  Unfortunately, the memory leak is still occurring, and 4.1 is now crashing regularly with "out of memory" at about the same rate as 5.0 was.

That certainly limits the features that could be materially involved in the leak... really the only thing left in their code is prepared statements (via JDBC, and via PREPARE/EXECUTE SQL statements).

Here's a crazy thought:  Is it possible that the OS/threading library is leaking the memory somehow?  It's RHEL3, running the RHEL3 RPMs from MySQL.com.
[29 Mar 2006 9:50] Konstantin Osipov
To check that you're freeing all the allocated prepared statements, you could do the following:
show status like '%stmt%';
Check that the difference between Com_stmt_prepare and Com_stmt_close does not increase while your application runs. This difference would give you an estimation of how many statements there are allocated in the server.
[29 Mar 2006 9:56] Per-Erik Martin
I think it's very likely that at least part of the problem is BUG#17260 (for which a fix exists). It would be interesting to know if this solves your problem.
Do you want to wait for a release with the fix, or try out a separate patch?
[29 Mar 2006 12:02] Jeremy Cole
Konst: OK, now we're getting somewhere.  Is it true that for every allocated statement, Com_stmt_close should always be incremented?  Does it get incremented even if the statement is not explicitly deallocated (i.e. you depend on the cleanup at connection close to free it)?

The difference between Com_stmt_prepare and Com_stmt_close is definitely growing uncontrolled on this system:

$ while [ 1 ]; do echo -n `date` ": "; echo `mysql -N --batch -u root -e 'show status like "Com_stmt_prepare"' | cut -f2` - `mysql -N --batch -u root -e 'show status like "Com_stmt_close"' | cut -f2` | bc; sleep 10; done
Wed Mar 29 03:55:35 PST 2006 : 789706
Wed Mar 29 03:55:45 PST 2006 : 789711
Wed Mar 29 03:55:56 PST 2006 : 789737
Wed Mar 29 03:56:06 PST 2006 : 789746
Wed Mar 29 03:56:16 PST 2006 : 789748
Wed Mar 29 03:56:26 PST 2006 : 789755
Wed Mar 29 03:56:36 PST 2006 : 789763
Wed Mar 29 03:56:46 PST 2006 : 789770
Wed Mar 29 03:56:56 PST 2006 : 789768
Wed Mar 29 03:57:06 PST 2006 : 789772
Wed Mar 29 03:57:16 PST 2006 : 789776
Wed Mar 29 03:57:26 PST 2006 : 789779
Wed Mar 29 03:57:36 PST 2006 : 789782
Wed Mar 29 03:57:46 PST 2006 : 789787
Wed Mar 29 03:57:56 PST 2006 : 789789
Wed Mar 29 03:58:06 PST 2006 : 789794
Wed Mar 29 03:58:16 PST 2006 : 789798
Wed Mar 29 03:58:26 PST 2006 : 789799
Wed Mar 29 03:58:36 PST 2006 : 789803
Wed Mar 29 03:58:46 PST 2006 : 789808
Wed Mar 29 03:58:56 PST 2006 : 789810
Wed Mar 29 03:59:06 PST 2006 : 789819
Wed Mar 29 03:59:17 PST 2006 : 789819
Wed Mar 29 03:59:27 PST 2006 : 789827
Wed Mar 29 03:59:37 PST 2006 : 789826
Wed Mar 29 03:59:47 PST 2006 : 789828
Wed Mar 29 03:59:57 PST 2006 : 789828
Wed Mar 29 04:00:07 PST 2006 : 789833
Wed Mar 29 04:00:17 PST 2006 : 789867
Wed Mar 29 04:00:27 PST 2006 : 789884
Wed Mar 29 04:00:37 PST 2006 : 789881
Wed Mar 29 04:00:47 PST 2006 : 789889
Wed Mar 29 04:00:57 PST 2006 : 789880
Wed Mar 29 04:01:07 PST 2006 : 789883
Wed Mar 29 04:01:17 PST 2006 : 789897
Wed Mar 29 04:01:27 PST 2006 : 789900
Wed Mar 29 04:01:37 PST 2006 : 789889
[29 Mar 2006 12:04] Jeremy Cole
Per-Erik: How could that bug solve the problem, when they are running 4.1 now?  No triggers anymore...
[29 Mar 2006 12:21] Per-Erik Martin
Well, you initially indicated that is was sp structures leaking, and that you were using triggers. 
Both exists in 5.0 only, and if you for instance use local variables in the triggers it will definitely leak memory.
But this does not rule out other leaks as well (in 4.1), of course. :)
[29 Mar 2006 12:24] Konstantin Osipov
COM_STMT_PREPARE is incremented for every attempt to prepare a statement. So if you have statements that have syntax error or failed to prepare for some other reasons,  COM_STMT_PREPARE will differ from the accurate number of prepared statements. Otherwise it gives you the accurate number.
COM_STMT_CLOSE is incremented for every deallocate of a prepared statement. Basically, it's called in the destructor. So it doesn't matter who issued the deallocate call: the server itslef (at connection close) or the client (by calling mysql_stmt_close or DEALLOCATE PREPARE). Normally COM_STMT_PREPARE - COM_STMT_CLOSE would give you the more or less accurate number of prepared statements in the server.
[29 Mar 2006 12:25] Konstantin Osipov
You could try to apply my patch for 4.1 (Bug#16365) and set the hard limit for max_prepared_stmt_count. Then you'll be able to easily see if your application leaks statements, as at some point in time mysql_stmt_prepare will fail.
[30 Mar 2006 23:08] Jeremy Cole
OK, we have some exciting news.  Besides the previous comment that confirmed that it was server-side prepared statements leaking, we have further confirmation.

The programmer found a setting in Connector/J, useServerPrepStmts=false, which forces the driver to always use client-side prepared statements.  That has solved the memory leak, and the server is now steady using 1.1G of memory.

Someone needs to go over the server-side prepared statement code with a fine-toothed comb...
[31 Mar 2006 15:39] Mark Matthews
Jeremy, while "useServerPrepStmts=false" is a workaround that seems to alleviate the problem for you, I would suggest going back and checking _all_ of your code paths to make sure you're actually closing statements in all cases. This can be an issue even with client-side emulated prepared statements, as it can appear to be a memory leak in your Java application, even though it's as-designed, as it's required by the JDBC specification (more on this in a bit).

We've seen this behavior before with other users, and it turned out they missed a PreparedStatement.close() somewhere (or everywhere!).

The reason for this behavior is that the JDBC specification states that users are responsible for explicitly closing statements if they want them "freed" before the physical connection is closed. Because of this, if you're using a pool with long-lived connections, these statements will not be freed for quite some time.

There are some tools to help track down these types of issues. Our JDBC driver has the "usage advisor" (enabled by adding useUsageAdvisor=true"), which will log when statements are leaked (amongst other things), but if your application does a lot of things that it would complain about (statement leaks, prepared statements being prepared but executed 1 or less times, not looking at all rows or columns of a result set, etc), it might be too much logging for something in production.

If you can't run a tool in production, then a tool I use myself and highly recommend for static code analysis is FindBugs, which can catch cases where not all code paths close JDBC resources. http://findbugs.sourceforge.net/

Notice, I'm not saying that your application is definitely not closing statements, but empirical evidence with our userbase says that this is the most common cause, especially when the counts for PREPARE and CLOSE are way out of whack.

In any case, it's something you probably want to look into, as not closing these statements even when client-side emulated can lead to GC issues, as these prepared statement instances turn into tenured instances, and end up in a heap that is specially designed for them, but it much slower to collect. This can limit throughput and stability of your Java application itself.
[31 Mar 2006 16:26] Jeremy Cole
Hi Mark,

I had thought of this already, so some changes were made to try to ensure that things got cleaned up.  The connection pool was changed so that connections were closed/reopened at least every 60 seconds, but it didn't solve the problem.

Shouldn't closing a connection *always* free up all of these resources?
[31 Mar 2006 16:29] Mark Matthews
Jeremy,

Closing the physical connection, should in theory take care of all of that.

What connection pool are you using? Is it something like Apache DBCP, or C3p0, or something home-grown?
[8 Jun 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[16 Feb 2007 0:34] James Day
Jeremy, a number of candidate causes have been fixed and we'd like verification that you still see this problem.
[26 Feb 2007 18:37] Valeriy Kravchuk
Jeremy,

Will it be possible for you to check your application with server-side prepared statements on latest (>5.0.24 and >=4.1.22) versions of MySQL server? I want to check if this is not a duplicate of http://bugs.mysql.com/bug.php?id=21206.
[26 Feb 2007 18:46] Jeremy Cole
Hi Valeriy,

Sorry, it probably won't be possible to test that application again, as it is a production system, and the problems they saw last time were very serious.

Regards,

Jeremy
[11 Aug 2007 12:49] Konstantin Osipov
Closing as per comment from Jeremy.
[3 Aug 2009 17:02] Konstantin Osipov
This was open for 2.5 years without being verified. 
Valeriy, please either close this bug or verify it.
Thanks.
[3 Aug 2009 17:35] Valeriy Kravchuk
I see no way to repeat this on current 5.0.x (other than NOT closing prepared statements properly). 

If anybody has a test case that shows some different source of related memory leak, please, provide it.