MySQL Bugs: #18300: Memory leak of st_lex (and maybe sp

Bug #18300	Memory leak of st_lex (and maybe sp_pcontext?) structures?
Submitted:	17 Mar 2006 10:45	Modified:	3 Aug 2009 17:35
Reporter:	Jeremy Cole (Basic Quality Contributor) (OCA)	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Stored Routines	Severity:	S1 (Critical)
Version:	5.0.19, 4.1.18	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	bfsm_2007_02_15, memory leak, qc

Description:
I believe st_lex structures are leaking somewhere, based on these aggregate statistics from a production mysql server:

+------------------+------+----------+------+
| file             | line | total    | nr   |
+------------------+------+----------+------+
| my_largepage.c   |   65 | 59381760 |    1 |
| sql_cache.cc     | 1566 | 33545976 |    1 |
| mf_keycache.c    |  366 |  7726976 |    1 |
| my_alloc.c       |  197 |  4330728 | 1234 |
| sql_lex.cc       | 1629 |  2380384 |  292 |   <-- [note 1]
| mulalloc.c       |   51 |  1798936 |  254 |
| my_alloc.c       |   64 |   526352 |   65 |
| sp_head.cc       |  469 |   521728 |   64 |
| sp_head.cc       |  468 |   521728 |   64 |
| sql_base.cc      | 1311 |   402456 |  246 |
| mf_iocache.c     |  218 |   155648 |    7 |
| net_serv.cc      |  120 |   114737 |    7 |
| sql_string.h     |  199 |   114688 |    7 |
| my_alloc.c       |  129 |    86240 |   14 |
| sql_acl.cc       | 2169 |    57064 |    7 |
| table.cc         |  502 |    45376 |  131 |
| sql_list.h       |  422 |    42104 |    9 |
| my_file.c        |  125 |    40848 |    1 |
| sp_cache.cc      |  252 |    32608 |    4 |
| derror.cc        |  128 |    26592 |    1 |
| sql_base.cc      |   59 |    16512 |    1 |
| sp_pcontext.cc   |   60 |    15872 |  124 |   <-- [note 2]
| sql_cache.cc     | 1645 |     8192 |    1 |
| sql_cache.cc     | 1634 |     8192 |    1 |
| sql_acl.cc       | 3330 |     8152 |    1 |
| sql_acl.cc       | 3327 |     8152 |    1 |
| sql_acl.cc       | 3324 |     8152 |    1 |
| sql_parse.cc     |  505 |     8000 |    1 |
| sp_pcontext.cc   |   61 |     7936 |  124 |   <-- [note 3]
| sp_pcontext.cc   |   59 |     7936 |  124 |
| sp_pcontext.cc   |   58 |     7936 |  124 |
| sp_pcontext.cc   |   57 |     7936 |  124 |
| mi_open.c        |  673 |     7851 |    9 |
| ha_innodb.cc     | 6730 |     5233 |   23 |
| queues.c         |   54 |     4128 |    1 |
| sp_head.cc       |  486 |     4096 |   64 |
| hash_filo.h      |   82 |     3072 |    2 |
| array.c          |  277 |     1756 |    5 |
| set_var.cc       | 2889 |     1304 |    1 |
| set_var.cc       | 3452 |     1268 |    1 |
| vio.c            |  144 |     1092 |    7 |
| repl_failsafe.cc |  212 |     1024 |    1 |
| sql_class.cc     |  252 |      896 |    7 |
| sql_class.cc     | 1727 |      896 |    7 |
| sql_class.cc     | 1724 |      896 |    7 |
| time.cc          |  593 |      864 |   24 |
| sql_class.cc     | 2052 |      800 |    1 |
| log.cc           | 1651 |      784 |    4 |
| table.cc         |  133 |      669 |   95 |
| sql_base.cc      | 1362 |      519 |  151 |
| sql_class.cc     |  260 |      448 |    7 |
| my_open.c        |  153 |      438 |   19 |
| ft_stopwords.c   |   57 |      344 |    1 |
| sql_udf.cc       |  153 |      256 |    1 |
| ha_innodb.cc     | 1420 |      256 |    1 |
| sql_db.cc        |  116 |      256 |    1 |
| tztime.cc        | 1548 |      208 |    1 |
| my_bitmap.c      |   71 |      192 |    1 |
| handler.cc       |  388 |      164 |    1 |
| tztime.cc        | 1542 |      160 |    1 |
| item_func.cc     | 2978 |      128 |    1 |
| mf_keycaches.c   |  114 |      128 |    1 |
| hostname.cc      |  159 |      119 |    6 |
| sql_parse.cc     |  810 |       78 |    6 |
| my_error.c       |  175 |       48 |    3 |
| sql_parse.cc     | 1027 |       47 |    7 |
| log.cc           |  531 |       45 |    2 |
| mf_cache.c       |   65 |       44 |    4 |
| sql_db.cc        | 1102 |       42 |    6 |
| ha_innodb.cc     | 1282 |       23 |    1 |
| mysqld.cc        | 3046 |       21 |    1 |
| sql_string.cc    |   52 |       16 |    1 |
| mf_cache.c       |   66 |       12 |    4 |
| mysqld.cc        | 7263 |       11 |    1 |
| mf_tempdir.c     |   51 |       11 |    1 |
| set_var.h        |  884 |        8 |    1 |
| mf_tempdir.c     |   32 |        4 |    1 |
| mysqld.cc        | 2696 |        1 |    1 |
| mysqld.cc        | 2690 |        1 |    1 |
+------------------+------+----------+------+

sql/sql_lex.cc:1629 is st_lex::st_lex:

  1626  st_lex::st_lex()
  1627    :result(0), sql_command(SQLCOM_END), query_tables_own_last(0)
  1628  {
  1629    hash_init(&sroutines, system_charset_info, 0, 0, 0, sp_sroutine_key, 0
, 0);
  1630    sroutines_list.empty();
  1631    sroutines_list_own_last= sroutines_list.next;
  1632    sroutines_list_own_elements= 0;
  1633  }

sql/sp_pcontext:57-61 are sp_pcontext::sp_pcontext:

    53  sp_pcontext::sp_pcontext(sp_pcontext *prev)
    54    :Sql_alloc(), m_total_pvars(0), m_csubsize(0), m_hsubsize(0),
    55     m_handlers(0), m_parent(prev), m_pboundary(0)
    56  {
    57    VOID(my_init_dynamic_array(&m_pvar, sizeof(sp_pvar_t *), 16, 8));
    58    VOID(my_init_dynamic_array(&m_case_expr_id_lst, sizeof(int), 16, 8));
    59    VOID(my_init_dynamic_array(&m_cond, sizeof(sp_cond_type_t *), 16, 8));
    60    VOID(my_init_dynamic_array(&m_cursor, sizeof(LEX_STRING), 16, 8));
    61    VOID(my_init_dynamic_array(&m_handler, sizeof(sp_cond_type_t *), 16, 8
));
   [snip]

How to repeat:
Info was collected using a binary with --with-debug=full, and running "mysqladmin debug" occasionally.  It was sent through an awk script and inserted into a table to summarize and format.

Hi Jeremy,

Thank you for the bug report. Could you please provide a reproducible
test case for to try it on our side?

Thanks in advance.

If I had a reproducible test case I would have submitted it. :)

This is only happening on a production server, and attempts to reproduce it on a test machine have failed.

Can you tell us something about which features these systems are using?

Stored procedures or functions, or both?
...with cursors and/or exception handlers?
Triggers?
Views?
Which storage engine(s)?

Hi PEM!

It's 100% InnoDB, using triggers, stored procedures, and prepared statements (both via PREPARE/EXECUTE commands with SPs and via JDBC).  Highly variable number of concurrent users and load: 25-500 concurrent sessions, and 200-1500 q/s total.  It leaks much faster during peak (reaching 2.5G address space limit in ~35 minutes).

Nothing special otherwise.  It seems to be leaking only from SELECTs, as slaves show no leaks (although they don't run very long -- more replication bugs).

You can find the error log containing the complete debug dumps, as well as a log of out of memory errors etc., here:

  http://jcole.us/files/bug18300/gndb1.gaggle.net.err.20060318.gz

A debug trace from about 1 minute of operation here:

  http://jcole.us/files/bug18300/mysqld.trace.gz

I didn't state, but maybe it's worth stating:

No cursors, no views, no stored functions.

I assume you are using local variables in your triggers?

Can you say anything about the number of rows you have in tables with triggers?

(I have a feeling it might be BUG#17260)

An interesting turn of events... Since the memory leak was so painful, it was decided to roll back to 4.1 instead, since that seemed a sure thing.  Unfortunately, the memory leak is still occurring, and 4.1 is now crashing regularly with "out of memory" at about the same rate as 5.0 was.

That certainly limits the features that could be materially involved in the leak... really the only thing left in their code is prepared statements (via JDBC, and via PREPARE/EXECUTE SQL statements).

Here's a crazy thought:  Is it possible that the OS/threading library is leaking the memory somehow?  It's RHEL3, running the RHEL3 RPMs from MySQL.com.

To check that you're freeing all the allocated prepared statements, you could do the following:
show status like '%stmt%';
Check that the difference between Com_stmt_prepare and Com_stmt_close does not increase while your application runs. This difference would give you an estimation of how many statements there are allocated in the server.

I think it's very likely that at least part of the problem is BUG#17260 (for which a fix exists). It would be interesting to know if this solves your problem.
Do you want to wait for a release with the fix, or try out a separate patch?

Konst: OK, now we're getting somewhere.  Is it true that for every allocated statement, Com_stmt_close should always be incremented?  Does it get incremented even if the statement is not explicitly deallocated (i.e. you depend on the cleanup at connection close to free it)?

The difference between Com_stmt_prepare and Com_stmt_close is definitely growing uncontrolled on this system:

$ while [ 1 ]; do echo -n `date` ": "; echo `mysql -N --batch -u root -e 'show status like "Com_stmt_prepare"' | cut -f2` - `mysql -N --batch -u root -e 'show status like "Com_stmt_close"' | cut -f2` | bc; sleep 10; done
Wed Mar 29 03:55:35 PST 2006 : 789706
Wed Mar 29 03:55:45 PST 2006 : 789711
Wed Mar 29 03:55:56 PST 2006 : 789737
Wed Mar 29 03:56:06 PST 2006 : 789746
Wed Mar 29 03:56:16 PST 2006 : 789748
Wed Mar 29 03:56:26 PST 2006 : 789755
Wed Mar 29 03:56:36 PST 2006 : 789763
Wed Mar 29 03:56:46 PST 2006 : 789770
Wed Mar 29 03:56:56 PST 2006 : 789768
Wed Mar 29 03:57:06 PST 2006 : 789772
Wed Mar 29 03:57:16 PST 2006 : 789776
Wed Mar 29 03:57:26 PST 2006 : 789779
Wed Mar 29 03:57:36 PST 2006 : 789782
Wed Mar 29 03:57:46 PST 2006 : 789787
Wed Mar 29 03:57:56 PST 2006 : 789789
Wed Mar 29 03:58:06 PST 2006 : 789794
Wed Mar 29 03:58:16 PST 2006 : 789798
Wed Mar 29 03:58:26 PST 2006 : 789799
Wed Mar 29 03:58:36 PST 2006 : 789803
Wed Mar 29 03:58:46 PST 2006 : 789808
Wed Mar 29 03:58:56 PST 2006 : 789810
Wed Mar 29 03:59:06 PST 2006 : 789819
Wed Mar 29 03:59:17 PST 2006 : 789819
Wed Mar 29 03:59:27 PST 2006 : 789827
Wed Mar 29 03:59:37 PST 2006 : 789826
Wed Mar 29 03:59:47 PST 2006 : 789828
Wed Mar 29 03:59:57 PST 2006 : 789828
Wed Mar 29 04:00:07 PST 2006 : 789833
Wed Mar 29 04:00:17 PST 2006 : 789867
Wed Mar 29 04:00:27 PST 2006 : 789884
Wed Mar 29 04:00:37 PST 2006 : 789881
Wed Mar 29 04:00:47 PST 2006 : 789889
Wed Mar 29 04:00:57 PST 2006 : 789880
Wed Mar 29 04:01:07 PST 2006 : 789883
Wed Mar 29 04:01:17 PST 2006 : 789897
Wed Mar 29 04:01:27 PST 2006 : 789900
Wed Mar 29 04:01:37 PST 2006 : 789889

Per-Erik: How could that bug solve the problem, when they are running 4.1 now?  No triggers anymore...

Well, you initially indicated that is was sp structures leaking, and that you were using triggers. 
Both exists in 5.0 only, and if you for instance use local variables in the triggers it will definitely leak memory.
But this does not rule out other leaks as well (in 4.1), of course. :)

COM_STMT_PREPARE is incremented for every attempt to prepare a statement. So if you have statements that have syntax error or failed to prepare for some other reasons,  COM_STMT_PREPARE will differ from the accurate number of prepared statements. Otherwise it gives you the accurate number.
COM_STMT_CLOSE is incremented for every deallocate of a prepared statement. Basically, it's called in the destructor. So it doesn't matter who issued the deallocate call: the server itslef (at connection close) or the client (by calling mysql_stmt_close or DEALLOCATE PREPARE). Normally COM_STMT_PREPARE - COM_STMT_CLOSE would give you the more or less accurate number of prepared statements in the server.

You could try to apply my patch for 4.1 (Bug#16365) and set the hard limit for max_prepared_stmt_count. Then you'll be able to easily see if your application leaks statements, as at some point in time mysql_stmt_prepare will fail.

OK, we have some exciting news.  Besides the previous comment that confirmed that it was server-side prepared statements leaking, we have further confirmation.

The programmer found a setting in Connector/J, useServerPrepStmts=false, which forces the driver to always use client-side prepared statements.  That has solved the memory leak, and the server is now steady using 1.1G of memory.

Someone needs to go over the server-side prepared statement code with a fine-toothed comb...

Jeremy, while "useServerPrepStmts=false" is a workaround that seems to alleviate the problem for you, I would suggest going back and checking _all_ of your code paths to make sure you're actually closing statements in all cases. This can be an issue even with client-side emulated prepared statements, as it can appear to be a memory leak in your Java application, even though it's as-designed, as it's required by the JDBC specification (more on this in a bit).

We've seen this behavior before with other users, and it turned out they missed a PreparedStatement.close() somewhere (or everywhere!).

The reason for this behavior is that the JDBC specification states that users are responsible for explicitly closing statements if they want them "freed" before the physical connection is closed. Because of this, if you're using a pool with long-lived connections, these statements will not be freed for quite some time.

There are some tools to help track down these types of issues. Our JDBC driver has the "usage advisor" (enabled by adding useUsageAdvisor=true"), which will log when statements are leaked (amongst other things), but if your application does a lot of things that it would complain about (statement leaks, prepared statements being prepared but executed 1 or less times, not looking at all rows or columns of a result set, etc), it might be too much logging for something in production.

If you can't run a tool in production, then a tool I use myself and highly recommend for static code analysis is FindBugs, which can catch cases where not all code paths close JDBC resources. http://findbugs.sourceforge.net/

Notice, I'm not saying that your application is definitely not closing statements, but empirical evidence with our userbase says that this is the most common cause, especially when the counts for PREPARE and CLOSE are way out of whack.

In any case, it's something you probably want to look into, as not closing these statements even when client-side emulated can lead to GC issues, as these prepared statement instances turn into tenured instances, and end up in a heap that is specially designed for them, but it much slower to collect. This can limit throughput and stability of your Java application itself.

Hi Mark,

I had thought of this already, so some changes were made to try to ensure that things got cleaned up.  The connection pool was changed so that connections were closed/reopened at least every 60 seconds, but it didn't solve the problem.

Shouldn't closing a connection *always* free up all of these resources?

Jeremy,

Closing the physical connection, should in theory take care of all of that.

What connection pool are you using? Is it something like Apache DBCP, or C3p0, or something home-grown?

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Jeremy, a number of candidate causes have been fixed and we'd like verification that you still see this problem.

Jeremy,

Will it be possible for you to check your application with server-side prepared statements on latest (>5.0.24 and >=4.1.22) versions of MySQL server? I want to check if this is not a duplicate of http://bugs.mysql.com/bug.php?id=21206.

Hi Valeriy,

Sorry, it probably won't be possible to test that application again, as it is a production system, and the problems they saw last time were very serious.

Regards,

Jeremy

Closing as per comment from Jeremy.

This was open for 2.5 years without being verified. 
Valeriy, please either close this bug or verify it.
Thanks.

I see no way to repeat this on current 5.0.x (other than NOT closing prepared statements properly). 

If anybody has a test case that shows some different source of related memory leak, please, provide it.