Bug #29011 Query optimizaion error with NDB Engine
Submitted: 11 Jun 2007 2:57 Modified: 28 Dec 2007 11:42
Reporter: Brian Moon Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.0.38, 5.1 OS:Linux (Genroo)
Assigned to: Assigned Account CPU Architecture:Any
Tags: ndb

[11 Jun 2007 2:57] Brian Moon
Description:
I posted in a blog post at http://doughboy.wordpress.com/2007/06/08/mysql-cluster-sql-tips/ about how I had to rework a join to get better performance when using NDB.  Sergey Petrunia of the Query optimizer team asked I post the queries and table structure here as he felt it should not happen.

How to repeat:
CREATE TABLE `pub_articles` (
  `article_id` int(10) unsigned NOT NULL auto_increment,
  `category_id` int(10) unsigned NOT NULL default '0',
  `category2_id` int(10) unsigned NOT NULL default '0',
  `requested_end_time` datetime NOT NULL default '0000-00-00 00:00:00',
  `first_publish_time` datetime NOT NULL default '0000-00-00 00:00:00',
  `latest_publish_time` datetime NOT NULL default '0000-00-00 00:00:00',
  `expiration_time` datetime NOT NULL default '0000-00-00 00:00:00',
  `headline` varchar(150) NOT NULL default '',
  `brief_headline` varchar(50) NOT NULL default '',
  `hot_level` char(1) NOT NULL default '',
  `used` char(1) NOT NULL default 'N',
  `image_id` int(11) NOT NULL default '0',
  `image_price` varchar(50) NOT NULL default '',
  `image_avux_id` int(10) unsigned NOT NULL,
  `large_image_url` varchar(255) NOT NULL default '',
  `large_image_width` smallint(6) NOT NULL default '0',
  `large_image_height` smallint(6) NOT NULL default '0',
  `medium_image_url` varchar(255) NOT NULL default '',
  `medium_image_width` smallint(6) NOT NULL default '0',
  `medium_image_height` smallint(6) NOT NULL default '0',
  `small_image_url` varchar(255) NOT NULL default '',
  `small_image_width` smallint(6) NOT NULL default '0',
  `small_image_height` smallint(6) NOT NULL default '0',
  `price` double(8,2) default NULL,
  `category_name` varchar(100) NOT NULL default '',
  `category_url` varchar(100) NOT NULL default '',
  `category_path` varchar(255) NOT NULL default '',
  `publication_id` int(11) NOT NULL default '0',
  `section_id` int(11) NOT NULL default '0',
  `hotness` decimal(8,4) NOT NULL default '0.0000',
  `rank` tinyint(3) unsigned NOT NULL default '0',
  `billable` tinyint(4) NOT NULL default '0',
  `normal_body` mediumtext NOT NULL,
  `teaser_body` mediumtext NOT NULL,
  `exclusive_body` mediumtext NOT NULL,
  `vendors` mediumtext NOT NULL,
  `urls` mediumtext NOT NULL,
  PRIMARY KEY  (`article_id`,`publication_id`),
  KEY `publication_id` (`publication_id`,`first_publish_time`),
  KEY `publication_id_2` (`publication_id`,`requested_end_time`)
) ENGINE=ndbcluster DEFAULT CHARSET=utf8

CREATE TABLE `article_edition_lookup` (
  `publication_id` int(11) NOT NULL default '0',
  `article_id` int(10) unsigned NOT NULL,
  `publication_date` date NOT NULL default '0000-00-00',
  PRIMARY KEY  (`publication_id`,`publication_date`,`article_id`)
) ENGINE=ndbcluster DEFAULT CHARSET=utf8

Query 1:

select pub_articles.article_id
from pub_articles
inner join article_edition_lookup on
pub_articles.article_id=article_edition_lookup.article_id and
pub_articles.publication_id=article_edition_lookup.publication_id and
publication_date='2007-06-07'
where pub_articles.publication_id=2;

216 rows in set (26.63 sec)

Query 2:

select article_id
from article_edition_lookup
inner join pub_articles using (article_id, publication_id)
where publication_id=2 and publication_date='2007-06-07';

216 rows in set (0.06 sec)

Suggested fix:
Well, here is what Sergey said in his comment:

It is surprising that this happens, the optimizer is expected to convert the two mentioned queries into the same internal representation. Could you please submit this example as a bug at bugs.mysql.com (specify bug category as Optimizer, not NDB)?

There is room for improvement in how MySQL runs joins over NDB tables, but those issues should not result in such effects.
[11 Jun 2007 3:01] Brian Moon
FYI, here is the explain output from the cluster servers:

Query 1:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: pub_articles
         type: ref
possible_keys: PRIMARY,publication_id,publication_id_2
          key: publication_id
      key_len: 4
          ref: const
         rows: 10
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: article_edition_lookup
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 11
          ref: const,const,phewsro.pub_articles.article_id
         rows: 1
        Extra: 

Query 2:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article_edition_lookup
         type: ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 7
          ref: const,const
         rows: 10
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: pub_articles
         type: eq_ref
possible_keys: PRIMARY,publication_id,publication_id_2
          key: PRIMARY
      key_len: 8
          ref: phewsro.article_edition_lookup.article_id,const
         rows: 1
        Extra: 

And here is the explain from the server with the exact same data using InnoDB.  FYI, the queries both run in .01 or less with InnoDB.

Query 1:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article_edition_lookup
         type: ref
possible_keys: PRIMARY,article_id
          key: PRIMARY
      key_len: 7
          ref: const,const
         rows: 215
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: pub_articles
         type: eq_ref
possible_keys: PRIMARY,publication_id,publication_id_2,expiring_deals
          key: PRIMARY
      key_len: 8
          ref: phewsro.article_edition_lookup.article_id,const
         rows: 1
        Extra: Using index

Query 2:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article_edition_lookup
         type: ref
possible_keys: PRIMARY,article_id
          key: PRIMARY
      key_len: 7
          ref: const,const
         rows: 215
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: pub_articles
         type: eq_ref
possible_keys: PRIMARY,publication_id,publication_id_2,expiring_deals
          key: PRIMARY
      key_len: 8
          ref: phewsro.article_edition_lookup.article_id,const
         rows: 1
        Extra: Using index
[11 Jun 2007 10:14] Sveta Smirnova
test case

Attachment: ndb_bug29011.test (application/octet-stream, text), 3.55 KiB.

[11 Jun 2007 10:17] Sveta Smirnova
Thank you for the report.

Verified using attached test.
[26 Jul 2007 16:54] Martin Hansson
This problem arises on ndb when there is a ref on the first keypart and a join.

Consider:

CREATE TABLE t1 (a int, b int, c int, KEY abc( a, b, c )) ENGINE = ndbcluster;
CREATE TABLE t3 (a int, b int, c int, KEY abc( a, b, c ));

INSERT INTO t1 VALUES ( -1, 1, 1 ), ( 0, 1, 2 ), ( 0, 1, 3 );
INSERT INTO t3 VALUES ( -1, 1, 1 ), ( 0, 1, 2 ), ( 0, 1, 3 );

CREATE TABLE t2 (a int, b int, KEY ab( a, b ), KEY a( a )) ENGINE = ndbcluster;
CREATE TABLE t4 (a int, b int, KEY ab( a, b ), KEY a( a ));

INSERT INTO t2 VALUES ( 0, 0 ), ( 0, 0 ), ( 0, 0 ), ( 0, 0 ), ( 0, 1 );
INSERT INTO t4 VALUES ( 0, 0 ), ( 0, 0 ), ( 0, 0 ), ( 0, 0 ), ( 0, 1 );

Table t1 is identical with t2, except for the storage engine. Same goes for t2 and t4.

Four EXPLAINs follow, a join between t1 and t2, then the same query but with t1 and t2 written in different order. Then it repeats for t3 and t4.

mysql> EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.a = t2.a AND t1.b = t2.b WHERE t2.a = 0;
+----+-------------+-------+------+---------------+------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref                 | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+-------------+
|  1 | SIMPLE      | t1    | ref  | abc           | abc  | 5       | const               |   10 | Using where |
|  1 | SIMPLE      | t2    | ref  | ab,a          | ab   | 10      | const,bug29011.t1.b |    1 | Using where |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+-------------+
2 rows in set (0.09 sec)

mysql> EXPLAIN SELECT * FROM t2 JOIN t1 ON t1.a = t2.a AND t1.b = t2.b WHERE t2.a = 0;
+----+-------------+-------+------+---------------+------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref                 | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+-------------+
|  1 | SIMPLE      | t2    | ref  | ab,a          | ab   | 5       | const               |   10 | Using where |
|  1 | SIMPLE      | t1    | ref  | abc           | abc  | 10      | const,bug29011.t2.b |    1 | Using where |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+-------------+
2 rows in set (0.06 sec)

mysql> EXPLAIN SELECT * FROM t3 JOIN t4 ON t3.a = t4.a AND t3.b = t4.b WHERE t4.a = 0;
+----+-------------+-------+------+---------------+------+---------+---------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref                 | rows | Extra                    |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+--------------------------+
|  1 | SIMPLE      | t3    | ref  | abc           | abc  | 5       | const               |    1 | Using where; Using index |
|  1 | SIMPLE      | t4    | ref  | ab,a          | ab   | 10      | const,bug29011.t3.b |    2 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+--------------------------+
2 rows in set (0.05 sec)

mysql> EXPLAIN SELECT * FROM t4 JOIN t3 ON t3.a = t4.a AND t3.b = t4.b WHERE t4.a = 0;
+----+-------------+-------+------+---------------+------+---------+---------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref                 | rows | Extra                    |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+--------------------------+
|  1 | SIMPLE      | t3    | ref  | abc           | abc  | 5       | const               |    1 | Using where; Using index |
|  1 | SIMPLE      | t4    | ref  | ab,a          | ab   | 10      | const,bug29011.t3.b |    2 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+--------------------------+
2 rows in set (0.06 sec)

The column <rows> for ndb is very suspicius. It says that the first table has 
10 rows, whichever table that is.
[27 Jul 2007 16:16] Martin Hansson
Ref access is used whenever there is a condition of the sort ... WHERE ... field = constant in a query. But since there are no statistics how many row this would return when using ndb, the optimizer instead uses range statistcs information, which comes from ha_ndbcluster::records_in_range. But there are no statistics here either, so this method always returns 10. If there are two keys that can be used, and they appear to have the same cost, the given join order is not altered.

According to Pekka of the cluster team, there are some plans to fix range statistics for ndb in 5.1, but not for 5.0.
[27 Jul 2007 16:23] Martin Hansson
Here is an even simpler test case to show the problem. We have two 'equally good' keys in the optimizer's eyes.
The data is only there to make one access obviously cheaper than the other.

CREATE TABLE t1( 
  a int,
  KEY ( a )
) ENGINE = ndbcluster;

INSERT INTO t1 VALUES ( -1 ), ( 0 ), ( 1 );

CREATE TABLE t2( 
  a int,  
  KEY a( a )
) ENGINE = ndbcluster;

INSERT INTO t2 VALUES ( 0 ), ( 0 ), ( 0 ), ( 0 ), ( 1 );

# This tells the engine to use index statistics, but it appears to have no 
# effect
SET NDB_INDEX_STAT_ENABLE = 1;

# Note how access order depends on order in query
EXPLAIN SELECT * FROM t2 JOIN t1 USING( a ) WHERE t2.a = 0;
EXPLAIN SELECT * FROM t1 JOIN t2 USING( a ) WHERE t2.a = 0;
[27 Jul 2007 16:25] Martin Hansson
Here's my output from the two EXPLAINs

EXPLAIN SELECT * FROM t2 JOIN t1 USING( a ) WHERE t2.a = 0;
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	t2	ref	a	a	5	const	10	Using where with pushed condition
1	SIMPLE	t1	ref	a	a	5	const	10	Using where with pushed condition
EXPLAIN SELECT * FROM t1 JOIN t2 USING( a ) WHERE t2.a = 0;
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	t1	ref	a	a	5	const	10	Using where with pushed condition
1	SIMPLE	t2	ref	a	a	5	const	10	Using where with pushed condition
[28 Nov 2007 11:42] Martin Hansson
Brian,
I'm about to write this off as 'Not a bug' as SET ndb_index_stat_enable=1 fixes the problem on 5.1 in the minimized test case. In order to run the original test case, data must be migrated to 5.1. Are you able to do this? Then please try the above in 5.1.
[29 Dec 2007 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".