MySQL Bugs: #64239: out of memory bug of cluster 7.2.1

Bug #64239	out of memory bug of cluster 7.2.1
Submitted:	6 Feb 2012 9:49	Modified:	16 Mar 2012 6:56
Reporter:	yuan chaohua	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	dev 7.2.1	OS:	Linux
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	cluster ; out of memory

Description:
Hi,

It seems there is memory issue for cluster 7.2.1 handling union statements.

The problem is:

When lots of union statements( these unions contain 2 or more sub joins ),

mysql will throw a "Out of memory ; please check mysqld ...." randomly.

The sub joins are normally 3 or 4 tables joins.

If i turn off the ndb_join_pushdown , the out of memory error will still appear but the appear number decrease compared with on.

I use profiling for the error statements.

Compared to the lucky ones, these statements are stopped in the statistic state.

The lucky sqls continue on the executing state. but the error ones will close the tmp tables and close the query.

How to repeat:

The sql:

+----+--------------+------------+--------+------------------------------------------+-----------------+---------+--------------------+------+------------------------------------------------------------------+
| id | select_type  | table      | type   | possible_keys                            | key             | key_len | ref                | rows | Extra                                                            |
+----+--------------+------------+--------+------------------------------------------+-----------------+---------+--------------------+------+------------------------------------------------------------------+
|  1 | PRIMARY      | <derived2> | system | NULL                                     | NULL            | NULL    | NULL               |    0 | const row not found                                              |
|  2 | DERIVED      | a          | range  | PRIMARY,id_ownerpidIndex,linetype_inndex | linetype_inndex | 5       | NULL               |   15 | Parent of 3 pushed join@1; Using where with pushed condition     |
|  2 | DERIVED      | b          | ref    | slotPairKey,pidKey                       | slotPairKey     | 5       | sso_pc_dev_g.a.id  |   18 | Child of 'a' in pushed join@1; Using where                       |
|  2 | DERIVED      | c          | ref    | id                                       | id              | 5       | sso_pc_dev_g.b.pid |    1 | Child of 'b' in pushed join@1; Using where                       |
|  3 | UNION        | a          | ref    | PRIMARY,id_ownerpidIndex,linetype_inndex | linetype_inndex | 5       |                    |    3 | Parent of 4 pushed join@1; Using where with pushed condition     |
|  3 | UNION        | b          | ref    | PRIMARY,id_ownerpidIndex,idx_otherteam   | idx_otherteam   | 5       | sso_pc_dev_g.a.id  |   15 | Child of 'a' in pushed join@1; Using where with pushed condition |
|  3 | UNION        | c          | ref    | slotPairKey,pidKey                       | slotPairKey     | 5       | sso_pc_dev_g.b.id  |   18 | Child of 'b' in pushed join@1; Using where with pushed condition |
|  3 | UNION        | d          | ref    | id                                       | id              | 5       | sso_pc_dev_g.c.pid |    1 | Child of 'c' in pushed join@1; Using where                       |
| NULL | UNION RESULT | <union2,3> | ALL    | NULL                                     | NULL            | NULL    | NULL               | NULL |                                                                  |
+----+--------------+------------+--------+------------------------------------------+-----------------+---------+--------------------+------+------------------------------------------------------------------+

you can try 5 tables with 10000+ row and make join between them. Using 100+ threads to execute a union sql. The bug will appear.

The nodes(data,api) all have enough memory while the error happened.

multi join sqls has the same problem!

can you please upload some queries/data so we can try to repeat this ?

sub_str = '''select * from (
            select distinct  match_id
            from tableaaaa s join tablebbbbb as b on b.id = s.match_id
            where pid = %(cid)d and result_time > now()- interval %(time_keep)d day) o
            union
            select * from (
            select distinct match_id
            from tableaaaa s join tablebbbbb as b on b.id = s.match_id
            where pid = %(cid)d and mtype = 2 order by result_time desc limit %(match_keep)d) a
            union
            select * from (
            select distinct match_id
            from tableaaaa s join tablebbbbb as b on b.id = s.match_id
            where pid = %(cid)d and mtype = 3 order by result_time desc limit %(match_keep)d) b
            union
            select * from (
            select distinct match_id
            from tableaaaa s join tablebbbbb as b on b.id = s.match_id
            where pid = %(cid)d and mtype = 4 order by result_time desc limit %(match_keep)d ) c
        '''%{'cid':cid, 'match_keep': 10, 'time_keep': 7}
c = self.DB().cursor()
c.execute(sub_str)

The tables:

| tablebbbbb | CREATE TABLE `tablebbbbb` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `rid` int(11) NOT NULL DEFAULT '0',
  `mtype` int(11) NOT NULL DEFAULT '0',
  `line_id` smallint(5) unsigned DEFAULT NULL,
  `result_time` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idxrid` (`rid`),
  KEY `idx_time` (`result_time`)
) ENGINE=ndbcluster AUTO_INCREMENT=809610 DEFAULT CHARSET=utf8 |

CREATE TABLE `tableaaaa` (
  `pid` int(11) NOT NULL,
  `match_id` int(11) NOT NULL DEFAULT '0',
  `money` int(11) NOT NULL,
  `exp` int(11) NOT NULL,
  KEY `pair` (`pid`,`match_id`),
  KEY `idxmatch` (`match_id`)
) ENGINE=ndbcluster DEFAULT CHARSET=utf8 

I use python to run 100+ threads to run the sql and in 2~5 min the out of memory appeared. These tables has more than 10M rows ,but it seems this problem has relationship with row number. This is just one case in our real app.
Hope can help you. 

I guess the bug is hidden in the join parse/exe cpps. But i am not sure. Currently i just rewriten the sql not using union...

has no relationship with rows.:) sorry

Hi,

1) I see nothing obvious that would lead to out of memory (except that union
   can consume "lots" of memory)

2) Can there be a "ulimit" in play hindering mysqld to allocate more memory
   from OS

3) My guess to why this occurs more frequently with ndb_join_pushdown on
   is that more query latency decreases, so more queries per second are
   executed and probability of getting out of memory increases.

4) the construct

select * from A, B where <condition> and mtype = 1
union
select * from A, B where <condition> and mtype = 2
...

can't that be rewritten to
select * from A, B where <condition> and mtype in (1, 2, ...)

which looks much more efficient...

/Jonas

Hi,

The memory of mysqld and ndb all nodes are less than 50% of total memeory when the bug appears. The limits:
 ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 38912
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 38912
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

For the in and the union performance it may depending on situations. 

I am quit sure the union and multistatement join has critical bugs.
Currently i rewritten all these sqls by noting using union and multistatement in our application. 

There are lots of complex sqls in our app.:(

Its better the next cluster version can fix it.

It just complain out of memory but from client view the memory is enough.
At least tell what memory it wants/cosumed.
And if I do not use union and multistatement the bug will never appear.

we just (yesterday) released 7.2.4
maybe you can try that one ?

/Jonas

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".