Bug #27863 excessive memory usage for many small queries in a multiquery packet
Submitted: 16 Apr 2007 21:57 Modified: 20 May 2010 15:56
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: General Severity:S3 (Non-critical)
Version:5.0.56,5.1.23 OS:Any
Assigned to: Alexander Nozdrin CPU Architecture:Any
Tags: DoS, Memory, multiquery

[16 Apr 2007 21:57] Shane Bester
Description:
A simple multi-query query sent to the server such as this,

$sql= str_repeat("select '1';",95324);

causes mysqld to use between 300MB and 500MB of memory.  4.1 used less, while 5.1 used more as expected.

The problem is the above query can be run by many users simultaneously, and the overall effect can be devastating.  The query fits into 1MB default max_allowed_packet.

How to repeat:
See uploaded php script.

Suggested fix:
Not sure.  Feature request is to reduce the amount of memory used at any given time.
[16 Apr 2007 22:00] MySQL Verification Team
testcase. watch memory usage during execution

Attachment: bug27863.php (application/octet-stream, text), 970 bytes.

[24 Feb 2008 9:01] MySQL Verification Team
This extravagant memory usage still exists in recent versions of 5.1.23 and 5.0.56
5.1.23 needed over 9GB of memory to execute a 1MB packet filled with 'select 1;'!!
[12 Aug 2008 5:41] Sveta Smirnova
Bug #38006 was marked as duplicate of this one.
[18 Feb 2009 7:59] Valeriy Kravchuk
Bug #42946 was marked as a duplicate of this one. Looks like impact of this bug is higher than I3.
[12 Jun 2009 15:17] Guilhem Bichot
Staale proposed that I take it, for 5.4
[29 Jun 2009 9:23] Guilhem Bichot
Here is the analysis.
A 4MB "multi-query statement", made of 10,000 40-byte INSERT queries, is sent to mysqld.
So we call mysql_parse() 10,000 times (in a loop in dispatch_command()).
Each mysql_parse() does this:

  if (query_cache_send_result_to_client(thd, (char*) inBuf, length) <= 0)
  {
    ...
    Parser_state parser_state(thd, inBuf, length);

And the constructor of Parser_state does:
  Parser_state(THD *thd, const char* buff, unsigned int length)
    : m_lip(thd, buff, length), m_yacc()
  {}
and the one of Parser_state::m_lip (which is a Lex_input_stream) does:
  m_cpp_buf= (char*) thd->alloc(length + 1);
For the first mysql_parse(), length==4MB, so this allocates 4MB in the THD's MEM_ROOT. For the second mysql_parse(), length == 4MB-40bytes, and this allocates 40MB - 40bytes. The third, 40MB - 80bytes. Etc, allocates 2GB in total. And the memory is freed only in a final:
  free_root(thd->mem_root,MYF(MY_KEEP_PREALLOC));
at the end of dispatch_command().
[29 Jun 2009 11:42] Guilhem Bichot
correction to my previous post: not 2GB but 200GB (!); however not all of it is really accessed I guess, as true consumption according to "top" is 1GB.
[8 Jul 2009 8:52] Konstantin Osipov
In my opinion we need to distinguish the following problems:

1) Lex_input_stream allocates more memory on its construction than it is
   ever going to need
2) Both Lex_input_stream and alloc_query() allocate a copy of the query,
   whereas only one allocation is sufficient for needs of execution.
3) Memory allocated during execution of a multi-query
   is only freed at the end of execution, i.e. when
   all queries are executed.

3) has been the case all along, since the point multi-queries were
added to the system. We could try to fix it, but that would
be a clean-up and optimization, i.e. will extend beyond the scope
of the bugfix.

1) and 2) are regressions introduced by Marc in scope of the
patch for Bug#11986.
How we use thd->query and lex->m_cpp_buf
========================================

thd->query is used for general log and slow log, and
also for show processlist.
It is assigned in alloc_query().

lex->m_cpp_buf is used to get the SQL of a stored procedure in utf-8
and without version-specific comments, to store in .proc table.
So it's only used during CREATE PROCEDURE/FUNCTION/EVENT/TRIGGER.

Methods to fix
==============

To fix the problem 1), I think, it's sufficient to
introduce Lex_input_stream::start_new_statement() method,
which will reset all Lex_input_stream members without
re-allocating the buffer. The buffer allocated in the constructor
can be reused. This new method will be called inside the
multi-statement loop, and Lex_input_stream construction
will be moved outside the loop.

To fix problem 2), I believe, the proper victim
is alloc_query(). What we need to do, is to remove an
extra allocation there, and start assigning thd->query
to point to lex->m_cpp_buf. But it's a separate large fix.

Problem 3) is of a scale of a worklog task.
There are many interdependencies that need to be solved
to be able to fix it correctly.
[8 Jul 2009 9:17] Guilhem Bichot
I'll implement 1), with details from Konstantin:
<kostja> select 1; select 1; select 1;
<kostja> the length of this thd->query is 27
<kostja> you allocate m_cpp_buf to 27
<kostja> you parse the first query.
<kostja> there are pointers to the first 9 bytes of the cpp buff
<kostja> now you call start_new_statement()
<kostja> you make sure in it, that m_cpp_buf= m_cpp_buf_end
<kostja> so instead of reallocating m_cpp_buff, you just advance it, knowing it is already sufficienty pre-allocated
<kostja> so nobody interferes with anybody else
<kostja> that is not strictly necessary, but will solve your concern that during re-use of Lex_input_stream there is some overlap
[25 Sep 2009 16:59] Guilhem Bichot
not working on it at the moment (rather on Optimizer bugs)
[7 Apr 2010 16:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105171

3144 Alexander Nozdrin	2010-04-07
      A patch for Bug#27863 (excessive memory usage for many small queries
      in a multiquery packet).
      
      This patch fixes issue #1 described in the bug report:
      1) Lex_input_stream allocates more memory on its construction than it is
         ever going to need
      
      The fix is to allocate a buffer (cpp_buffer) once for a multiquery packet.
[28 Apr 2010 16:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/106859

3144 Alexander Nozdrin	2010-04-28
      A patch for Bug#27863 (excessive memory usage for many small queries
      in a multiquery packet).
      
      This patch fixes issue #1 described in the bug report:
      1) Lex_input_stream allocates more memory on its construction than it is
         ever going to need
      
      The fix is to allocate a buffer (cpp_buffer) once for a multiquery packet.
[14 May 2010 18:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/108374

3044 Alexander Nozdrin	2010-05-14
      Patch for Bug#27863 (excessive memory usage for many small queries in a
      multiquery packet).
      
      Background:
      
        - a query can contain multiple SQL statements;
      
        - the server frees resources allocated to process a query when the
          whole query is handled. In other words, resources allocated to process
          one SQL statement from a multi-statement query are freed when all SQL
          statements are handled.
      
      The problem was that the parser allocated a buffer of size of the whole
      query for each SQL statement in a multi-statement query. Thus, if a query
      had many SQL-statements (so, the query was long), but each SQL statement
      was short, ther parser tried to allocate huge amount of memory (number of
      small SQL statements * length of the whole query).
      
      The memory was allocated for a so-called "cpp buffer", which is intended to
      store pre-processed SQL statement -- SQL text without version specific
      comments.
      
      The fix is to allocate memory for the "cpp buffer" once for all SQL
      statements (once for a query).
[14 May 2010 19:21] Alexander Nozdrin
Pushed into trunk-bugfixing.
[17 May 2010 12:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/108443

3045 Alexander Nozdrin	2010-05-17
      Follow-up patch for Bug#27863 (excessive memory usage for many small queries
      in a multiquery packet): fix NDB test failures.
[20 May 2010 10:03] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100520100225-oe4iuu5kuzsx0knq) (version source revid:alik@sun.com-20100520100057-rmn5y3o3ij726bm7) (merge vers: 6.0.14-alpha) (pib:16)
[20 May 2010 10:04] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100520100041-rf8ujv598zf86wjt) (version source revid:alik@sun.com-20100520100041-rf8ujv598zf86wjt) (merge vers: 5.5.5-m3) (pib:16)
[20 May 2010 10:05] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100520100140-5bzrtadw4w419i3m) (version source revid:alik@sun.com-20100520100049-1njm09rkvnhmysnr) (pib:16)
[20 May 2010 15:56] Paul DuBois
Noted in 5.5.5, 6.0.14 changelogs.

The parser allocated too much memory for a query containing multiple
statements.
[20 Jul 2010 18:02] Sveta Smirnova
Bug #55376 was marked as duplicate of this one.
[10 Aug 2011 10:48] Sveta Smirnova
Bug #62117 was marked as duplicate of this one.