Bug #29185 Large IN list crashes mysqld with cluster and condition pushdown
Submitted: 18 Jun 2007 22:21 Modified: 27 Jun 2007 14:24
Reporter: Hartmut Holzgraefe
Status: Closed
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S1 (Critical)
Version:5.1.18 OS:Any
Assigned to: Tomas Ulin CPU Architecture:Any

[18 Jun 2007 22:21] Hartmut Holzgraefe
Description:
A cluster query with a large IN(...) or NOT IN(...) list in the WHERE condition
crashes the mysql server due to a stack overflow when deleting condition objects.

This is due to the Ndb_cond objects maintaining a linked list of themselves and freeing the list members recursively in the destructor:

class Ndb_cond : public Sql_alloc
{
 public:
  Ndb_cond() : ndb_item(NULL), next(NULL), prev(NULL) {};
  ~Ndb_cond()
  {
    if (ndb_item) delete ndb_item;
    ndb_item= NULL;
    if (next) delete next;
    next= prev= NULL;
  };
  Ndb_item *ndb_item;
  Ndb_cond *next;
  Ndb_cond *prev;
};

With an IN(...) list that is long enough, the mysqld server crashes with several hundred ~Ndb_cond() destructor calls in the debugger backtrace ...

How to repeat:
.

Suggested fix:
reimplement destruction in a non-recursive way
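
A minimal sketch of what a non-recursive destructor could look like (not the committed patch). The Ndb_item stub with its destroyed counter and the destroy_long_chain() helper are hypothetical and exist only so the sketch compiles and can be exercised. The Sql_alloc base class is left out, so plain new/delete are assumed here.

```cpp
#include <cstddef>

// Hypothetical stand-in for the real Ndb_item; the counter is
// instrumentation for this sketch only.
struct Ndb_item
{
  static long destroyed;
  ~Ndb_item() { ++destroyed; }
};
long Ndb_item::destroyed= 0;

class Ndb_cond  // the real class derives from Sql_alloc; omitted here
{
 public:
  Ndb_cond() : ndb_item(NULL), next(NULL), prev(NULL) {}
  ~Ndb_cond()
  {
    delete ndb_item;
    ndb_item= NULL;
    // Walk the rest of the list iteratively instead of letting each
    // node delete its successor from inside its own destructor.
    Ndb_cond *curr= next;
    next= prev= NULL;
    while (curr)
    {
      Ndb_cond *following= curr->next;
      curr->next= NULL;  // detach, so curr's destructor has nothing to walk
      delete curr;
      curr= following;
    }
  }
  Ndb_item *ndb_item;
  Ndb_cond *next;
  Ndb_cond *prev;
};

// Hypothetical helper: builds a chain of n conditions, deletes the head,
// and returns how many Ndb_item objects were destroyed as a result.
long destroy_long_chain(long n)
{
  if (n <= 0)
    return 0;
  long before= Ndb_item::destroyed;
  Ndb_cond *head= new Ndb_cond();
  head->ndb_item= new Ndb_item();
  Ndb_cond *tail= head;
  for (long i= 1; i < n; i++)
  {
    Ndb_cond *c= new Ndb_cond();
    c->ndb_item= new Ndb_item();
    c->prev= tail;
    tail->next= c;
    tail= c;
  }
  delete head;  // destructor nesting stays at depth two, whatever n is
  return Ndb_item::destroyed - before;
}
```

Because each node is detached before it is deleted, the stack depth no longer grows with the length of the IN(...) list.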
[19 Jun 2007 5:04] Tomas Ulin
patch

Attachment: tmp.patch (text/x-patch), 700 bytes.

[19 Jun 2007 5:11] Tomas Ulin
patch2

Attachment: tmp.patch (text/x-patch), 700 bytes.

[19 Jun 2007 10:04] Tomas Ulin
patch 2 really

Attachment: tmp.patch (text/x-patch), 727 bytes.

[19 Jun 2007 10:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29086

ChangeSet@1.2509, 2007-06-19 12:14:02+02:00, tomas@whalegate.ndb.mysql.com +1 -0
  Bug #29185 Large IN list crashes mysqld with cluster and condition pushdown
[19 Jun 2007 11:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29098

ChangeSet@1.2510, 2007-06-19 13:56:02+02:00, tomas@whalegate.ndb.mysql.com +1 -0
  Bug #29185 Large IN list crashes mysqld with cluster and condition pushdown
[21 Jun 2007 4:32] Adam Dixon
Tomas, I still get a crash.

mysql> source /tmp/test.sql;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
ERROR: 
Can't connect to the server
[21 Jun 2007 20:15] Bugs System
Pushed into 5.1.20-beta
[23 Jun 2007 7:12] Jon Stephens
1. Given Adam's comments, it looks like the fix isn't one.

2. Since this problem can be reproduced with actions involving SQL/mysqld, it's *not* category NDBAPI.

Reset status to Open, changed category to Cluster.
[26 Jun 2007 23:55] Hartmut Holzgraefe
The mysqld-side patch works for me. However, I'm now getting the ndbd crashes, too, when running on plain 32-bit x86 (I tested on x86_64 before, where the ndbd side seemed to work).

The problem is that the pushed-down conditions are copied into a fixed-size buffer in Dbtup::copyAttrinfo(). Its inBuffer parameter is actually the

  Uint32 cinBuffer[ZATTR_BUFFER_SIZE + 16];

defined in ./src/kernel/blocks/dbtup/Dbtup.hpp,
which is obviously of fixed size. As no
buffer size information is passed into Dbtup::copyAttrinfo(),
it can't protect itself from overflowing the buffer if the
pushdown condition list is long enough ...

Suggested short-term solution: check for the known
hard-coded limit, or make the caller pass size information,
too (this could be an optional parameter defaulting to zero if
unknown), and return an error if it is exceeded (this would probably
require the copyAttrinfo() return type to become non-void).
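
A rough sketch of the short-term idea, assuming a simplified copy routine. copy_attrinfo_checked() and its parameters are hypothetical and do not match the real Dbtup::copyAttrinfo() signature. The Uint32 typedef is a local stand-in for the NDB type.

```cpp
#include <cstddef>
#include <cstring>

typedef unsigned int Uint32;  // local stand-in for the NDB typedef

// Hypothetical bounds-checked variant of the copy step.
// inBufferSize is the capacity of inBuffer in Uint32 words; the default
// of 0 means "capacity unknown", in which case no check can be made.
// Returns true on success and false if the data would not fit.
bool copy_attrinfo_checked(Uint32 *inBuffer,
                           const Uint32 *src,
                           size_t srcLen,
                           size_t inBufferSize= 0)
{
  if (inBufferSize != 0 && srcLen > inBufferSize)
    return false;  // would overflow: report an error instead of copying
  std::memcpy(inBuffer, src, srcLen * sizeof(Uint32));
  return true;
}
```

The caller would then have to turn a false return into a proper error for the operation rather than a node failure.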

Long-term solution: either resize the buffer dynamically as
needed, or set up a maximum limit for pushed-down conditions and
fall back to non-pushdown behavior if this is exceeded.
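
The "limit plus fallback" variant could be sketched on the mysqld side roughly as follows. The limit value, the enum, and choose_scan_plan() are all hypothetical names for illustration, not existing server code.

```cpp
#include <cstddef>

// Hypothetical limit, in Uint32 words, for an encoded pushed-down condition.
const size_t kMaxPushedConditionWords= 1024;

enum ScanPlan
{
  PUSH_DOWN_CONDITION,  // send the condition to the data nodes
  EVALUATE_IN_MYSQLD    // keep the condition in the SQL node
};

// Decide whether a condition is small enough to push down; anything over
// the limit falls back to ordinary evaluation in mysqld.
ScanPlan choose_scan_plan(size_t encodedConditionWords,
                          size_t maxWords= kMaxPushedConditionWords)
{
  return encodedConditionWords <= maxWords ? PUSH_DOWN_CONDITION
                                           : EVALUATE_IN_MYSQLD;
}
```

With a check like this, an oversized IN(...) list would only cost performance instead of reaching the data nodes at all.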

The big problem with the current situation is that all
nodes get the same condition list pushed down to them,
so they all fail at the same time
=> cluster down

Adding an ndbrequire() check here wouldn't help either,
as *all* nodes would still be going down at the same
time, just with a different error message.
[27 Jun 2007 14:24] Hartmut Holzgraefe
Created new Bug #29390 for the ndbd side of the problem; closing this one, as the originally reported mysqld side of the problem is fixed.
[3 Jul 2007 4:35] Jon Stephens
Documented fix in 5.1.20/5.1.19-ndb-6.2.3.

Hartmut, please next time put in Docs status rather than Closed in this type of situation; otherwise it never shows up in my queue for documentation. Thanks!
[10 Jul 2007 13:28] Bugs System
Pushed into 5.0.46