Bug #42665 Query with multiple joins and an 'or' in the 'where' clause does full scan.
Submitted: 6 Feb 2009 21:06 Modified: 19 Feb 2009 18:44
Reporter: Von Chance Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:5.0.67 OS:Any
Assigned to: CPU Architecture:Any

[6 Feb 2009 21:06] Von Chance
Description:
A query returning hierarchical information was originally designed to return the children of a particular entry.  Using 'explain' on the query, everything in the query was using the proper indexes.  Changing the query to also return its parent by adding an 'or' to the where clause resulted in an unexpected full table scan.  Even issuing an 'or false' to the where clause would result in the full table scan.

How to repeat:
A test schema and sample query is provided that exhibits the behavior.
---------------------------------------------------------------------

CREATE DATABASE IF NOT EXISTS testschema;
USE testschema;

--
-- Definition of table `context`
--

DROP TABLE IF EXISTS `context`;
CREATE TABLE `context` (
  `context_id` int(11) NOT NULL,
  `node_id` int(11) default NULL,
  PRIMARY KEY  (`context_id`),
  KEY `fk_context_node` (`node_id`),
  CONSTRAINT `fk_context_node` FOREIGN KEY (`node_id`) REFERENCES `node` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

--
-- Dumping data for table `context`
--

/*!40000 ALTER TABLE `context` DISABLE KEYS */;
INSERT INTO `context` (`context_id`,`node_id`) VALUES 
 (1,1);
/*!40000 ALTER TABLE `context` ENABLE KEYS */;

--
-- Definition of table `hierarchy`
--

DROP TABLE IF EXISTS `hierarchy`;
CREATE TABLE `hierarchy` (
  `id` int(11) NOT NULL auto_increment,
  `parent_id` int(11) default NULL,
  `node_id` int(11) default NULL,
  PRIMARY KEY  (`id`),
  KEY `fk_hierarchy_hierarchy` (`parent_id`),
  KEY `fk_hierarchy_node` (`node_id`),
  CONSTRAINT `fk_hierarchy_hierarchy` FOREIGN KEY (`parent_id`) REFERENCES `hierarchy` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `fk_hierarchy_node` FOREIGN KEY (`node_id`) REFERENCES `node` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;

--
-- Dumping data for table `hierarchy`
--

/*!40000 ALTER TABLE `hierarchy` DISABLE KEYS */;
INSERT INTO `hierarchy` (`id`,`parent_id`,`node_id`) VALUES 
 (1,NULL,1),
 (2,1,2),
 (3,1,3),
 (4,1,4),
 (5,1,5);
/*!40000 ALTER TABLE `hierarchy` ENABLE KEYS */;

--
-- Definition of table `node`
--

DROP TABLE IF EXISTS `node`;
CREATE TABLE `node` (
  `id` int(11) NOT NULL auto_increment,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;

--
-- Dumping data for table `node`
--

/*!40000 ALTER TABLE `node` DISABLE KEYS */;
INSERT INTO `node` (`id`) VALUES 
 (1),
 (2),
 (3),
 (4),
 (5);
/*!40000 ALTER TABLE `node` ENABLE KEYS */;
------------------------------------------------------------------------------

Sample query:

select * from hierarchy hchild
  left join hierarchy as hparent on hparent.id = hchild.parent_id
  left join node as nchild on nchild.id=hchild.node_id
  left join node as nparent on nparent.id=hparent.node_id
  left join context as ctxchild on ctxchild.node_id=nchild.id
  left join context as ctxparent on ctxparent.node_id=nparent.id
  where ctxparent.context_id=1 or ctxchild.context_id=1;

Output from Explain command:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: hchild
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: hparent
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: testschema.hchild.parent_id
         rows: 1
        Extra:
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: nchild
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: testschema.hchild.node_id
         rows: 1
        Extra: Using index
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: nparent
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: testschema.hparent.node_id
         rows: 1
        Extra: Using index
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE
        table: ctxchild
         type: ref
possible_keys: fk_context_node
          key: fk_context_node
      key_len: 5
          ref: testschema.nchild.id
         rows: 1
        Extra: Using index
*************************** 6. row ***************************
           id: 1
  select_type: SIMPLE
        table: ctxparent
         type: ref
possible_keys: fk_context_node
          key: fk_context_node
      key_len: 5
          ref: testschema.nparent.id
         rows: 1
        Extra: Using where; Using index
6 rows in set (0.00 sec)

Note:  Removing any one of the 'or' operands in the where clause will use the proper index.  Changing the where clause to 'or false' will cause a full table scan.

Suggested fix:
I think that this is not optimized correctly, however I may be incorrect in my assumptions.
[8 Feb 2009 11:13] Valeriy Kravchuk
Thank you for the problem report. The only way to execute your query with OR efficiently is to rewrite it as UNION (as you are actually combining 2 different result sets):

select * from hierarchy hchild
  left join hierarchy as hparent on hparent.id = hchild.parent_id
  left join node as nchild on nchild.id=hchild.node_id
  left join node as nparent on nparent.id=hparent.node_id
  left join context as ctxchild on ctxchild.node_id=nchild.id
  left join context as ctxparent on ctxparent.node_id=nparent.id
  where ctxparent.context_id=1
UNION 
select * from hierarchy hchild
  left join hierarchy as hparent on hparent.id = hchild.parent_id
  left join node as nchild on nchild.id=hchild.node_id
  left join node as nparent on nparent.id=hparent.node_id
  left join context as ctxchild on ctxchild.node_id=nchild.id
  left join context as ctxparent on ctxparent.node_id=nparent.id
  where ctxchild.context_id=1;

Then you will see the desired plans for individual queries. It would be nice for the optimizer to do the query rewrite for you, as well as pre-evaluate and remove OR FALSE conditions in WHERE. Both are resonable feature requests for the query rewrite. I hope they will be implemented eventually.
[19 Feb 2009 18:44] Von Chance
The union query is what we used for our work-around...Thanks!

I just wanted to add that this was tried on Microsoft SQL Server 2000 and the indexes were used appropriately.  When the hierarchy table is used in the where clause even with the "or", the indexes are then used correctly on MySQL.  Just FYI.