Bug #63243 Subquery problem in cluster dev version 7.2.1
Submitted: 14 Nov 2011 8:45 Modified: 14 Nov 2011 9:28
Reporter: yuan chaohua Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:dev 7.2.1 OS:Linux
Assigned to: CPU Architecture:Any
Tags: cluster 7.2.1, SUB QUERY

[14 Nov 2011 8:45] yuan chaohua
Description:
Hi,

I met problem of sub queries in cluster.  For example 

when this query executed in cluster it explained like this:
explain select rid from roomidmapping where gid not in (select id from gatherings) and lineid =20;
+----+--------------------+---------------+-----------------+--------------------------+---------+---------+------+------+------------------------------------+
| id | select_type        | table         | type            | possible_keys            | key     | key_len | ref  | rows | Extra                              |
+----+--------------------+---------------+-----------------+--------------------------+---------+---------+------+------+------------------------------------+
|  1 | PRIMARY            | roomidmapping | ALL             | NULL                     | NULL    | NULL    | NULL | 3220 | Using where                        |
|  2 | DEPENDENT SUBQUERY | gatherings    | unique_subquery | PRIMARY,id_ownerpidIndex | PRIMARY | 4       | func |    1 | Using where; Full scan on NULL key |
+----+--------------------+---------------+-----------------+--------------------------+---------+---------+------+------+------------------------------------+

But the query in innodb is like this :
 explain select rid from roomidmapping where gid not in (select id from gatherings) and lineid =20;
+----+--------------------+---------------+-----------------+--------------------------+---------+---------+------+------+-------------------------------------------------+
| id | select_type        | table         | type            | possible_keys            | key     | key_len | ref  | rows | Extra                                           |
+----+--------------------+---------------+-----------------+--------------------------+---------+---------+------+------+-------------------------------------------------+
|  1 | PRIMARY            | roomidmapping | index           | NULL                     | idxgid  | 5       | NULL |    1 | Using where; Using index                        |
|  2 | DEPENDENT SUBQUERY | gatherings    | unique_subquery | PRIMARY,id_ownerpidIndex | PRIMARY | 4       | func |    1 | Using index; Using where; Full scan on NULL key |
+----+--------------------+---------------+-----------------+--------------------------+---------+---------+------+------+-------------------------------------------------+

With the same data , the same query in innodb is less than 1ms . While in cluster 7.2.1 it cost more than 0.5 S almost time.

It seems all the sub queries in cluster 7.2.1 are slower than if using join.
For the "in" sub query we can use join to replace. But for the " not in " clause until now i can not find a better method to resolve.

I do not know if it is really a bug , but it really influence the usage of the 7.2.1 which is great in handling joins(80%). 

How to repeat:
show create table roomidmapping;
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table         | Create Table                                                                                                                                                                                                    |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| roomidmapping | CREATE TABLE `roomidmapping` (
  `lineid` int(11) NOT NULL,
  `rid` int(11) NOT NULL,
  `gid` int(11) DEFAULT NULL,
  PRIMARY KEY (`rid`,`lineid`),
  KEY `idxgid` (`gid`)
) ENGINE=ndb DEFAULT CHARSET=utf8 |

Create Table                                                                           
| gatherings | CREATE TABLE `gatherings` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `type` varchar(60) DEFAULT NULL,
  `flags` int(11) DEFAULT NULL,
  `state` int(11) DEFAULT NULL,
  `ownerpid` int(11) DEFAULT NULL,
  `hostpid` int(11) DEFAULT NULL,
  `hostcid` int(11) DEFAULT NULL,
  `url` varchar(120) DEFAULT NULL,
  `participationpolicy` int(11) DEFAULT NULL,
  `policyargument` int(10) unsigned DEFAULT NULL,
  `minparticipants` int(11) DEFAULT NULL,
  `maxparticipants` int(11) DEFAULT NULL,
  `description` varchar(60) DEFAULT NULL,
  `start` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `expiration` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `buffer` mediumblob,
  `version1` bit(1) DEFAULT NULL,
  `owner_aid` tinyint(3) unsigned DEFAULT NULL,
  `status` varchar(10) DEFAULT 'Waiting',
  `maxleftparticipants` int(11) DEFAULT NULL,
  `maxrightparticipants` int(11) DEFAULT NULL,
  `openedslots` int(10) DEFAULT '255',
  `dgsid` int(11) DEFAULT '-1',
  `rsid` int(11) DEFAULT '-1',
  `homeTeamKickerID` int(4) DEFAULT '-1',
  `awayTeamKickerID` int(4) DEFAULT '-1',
  `roomtype` int(4) DEFAULT '0',
  `slotstate0` int(4) DEFAULT '0',
  `slotstate1` int(4) DEFAULT '0',
  `slotstate2` int(4) DEFAULT '0',
  `slotstate3` int(4) DEFAULT '0',
  `slotstate4` int(4) DEFAULT '0',
  `slotstate5` int(4) DEFAULT '0',
  `slotstate6` int(4) DEFAULT '0',
  `slotstate7` int(4) DEFAULT '0',
  `posfilter` int(8) DEFAULT '0',
  `otherteamid` int(11) DEFAULT NULL,
  `level_min` int(8) DEFAULT '0',
  `level_max` int(8) DEFAULT '99',
  `password` varchar(20) DEFAULT '',
  `matchtype` int(4) DEFAULT '0',
  `match_id` int(32) DEFAULT '0',
  `quest_id` int(11) DEFAULT '0',
  `stdium` varchar(64) DEFAULT 'Sha_Street01',
  `line_id` int(5) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `id_ownerpidIndex` (`id`,`ownerpid`),
  KEY `idx_ownerpid_flags` (`ownerpid`,`flags`),
  KEY `idx_hostpid_state` (`hostpid`,`state`),
  KEY `dgsid_inndex` (`dgsid`),
  KEY `idx_otherteam` (`otherteamid`),
  KEY `linetype_inndex` (`roomtype`)
) ENGINE=ndbcluster AUTO_INCREMENT=573193 DEFAULT CHARSET=utf8 |

Suggested fix:
The sub query can run the same speed as joins if possible.
[14 Nov 2011 9:28] yuan chaohua
for the not in i can use left join .....