Bug #47123 Endless 100% CPU loop with STRAIGHT_JOIN
Submitted: 4 Sep 2009 7:13 Modified: 12 Mar 2010 17:32
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.0, 5.1, 5.4 OS:Any
Assigned to: Alexey Kopytov CPU Architecture:Any
Tags: join_cache_level
Triage: Triaged: D1 (Critical)

[4 Sep 2009 7:13] Philip Stoev
Description:
Under default settings, the following query:

SELECT   table1 . `pk` AS field1 , table1 . `pk` AS field2 , table2 . `int_key` AS field3
FROM ( CC AS table1 STRAIGHT_JOIN ( ( C AS table2 STRAIGHT_JOIN A AS table3 ON (( table3 . `varchar_key` <> table2 . `varchar_key` ) OR ( table3 . `varchar_key` <= table2 . `varchar_key` ) ) ) ) ON (table3 . `varchar_key` <> table2 . `varchar_nokey` ) );

Causes a 100% CPU loop with the following backtrace:

#0  0x00000000007ae8d5 in test_rb_tree (element=0x2249490, parent=0x2249550) at opt_range.cc:7132
#1  0x00000000007ae8c8 in test_rb_tree (element=0x2249550, parent=0x22493d0) at opt_range.cc:7131
#2  0x00000000007ae8b4 in test_rb_tree (element=0x22493d0, parent=0x22490d0) at opt_range.cc:7130
#3  0x00000000007ae8b4 in test_rb_tree (element=0x22490d0, parent=0x22496d0) at opt_range.cc:7130
#4  0x00000000007ae8c8 in test_rb_tree (element=0x22496d0, parent=0x224a2d0) at opt_range.cc:7131
#5  0x00000000007ae8c8 in test_rb_tree (element=0x224a2d0, parent=0x224bad0) at opt_range.cc:7131
#6  0x00000000007ae8c8 in test_rb_tree (element=0x224bad0, parent=0x2248ad0) at opt_range.cc:7131
#7  0x00000000007ae8b4 in test_rb_tree (element=0x2248ad0, parent=0x2242a80) at opt_range.cc:7130
#8  0x00000000007ae8b4 in test_rb_tree (element=0x2242a80, parent=0x2236a80) at opt_range.cc:7130
#9  0x00000000007ae8b4 in test_rb_tree (element=0x2236a80, parent=0x224ead0) at opt_range.cc:7130
#10 0x00000000007ae8c8 in test_rb_tree (element=0x224ead0, parent=0x227ebc0) at opt_range.cc:7131
#11 0x00000000007ae8c8 in test_rb_tree (element=0x227ebc0, parent=0x221e9e0) at opt_range.cc:7131
#12 0x00000000007ae8b4 in test_rb_tree (element=0x221e9e0, parent=0x215e580) at opt_range.cc:7130
#13 0x00000000007ae8b4 in test_rb_tree (element=0x215e580, parent=0x209de70) at opt_range.cc:7130
#14 0x00000000007ae8b4 in test_rb_tree (element=0x209de70, parent=0x0) at opt_range.cc:7130
#15 0x00000000007aeae9 in SEL_ARG::rb_insert (this=0x209de70, leaf=0x23528a0) at opt_range.cc:7020
#16 0x00000000007aec58 in SEL_ARG::insert (this=0x209de70, key=0x23528a0) at opt_range.cc:6815
#17 0x00000000007b0dbc in key_and (param=0x7f9b78015bb0, key1=0x1fc1cc8, key2=0x1fc24d8, clone_flag=3) at opt_range.cc:6460
#18 0x00000000007b1067 in tree_and (param=0x7f9b78015bb0, tree1=0x1fc1a78, tree2=0x1fc1fd8) at opt_range.cc:6071
#19 0x00000000007b32c2 in get_mm_tree (param=0x7f9b78015bb0, cond=0x1f6bee0) at opt_range.cc:5431
#20 0x00000000007b8cd4 in SQL_SELECT::test_quick_select (this=0x1f6c0b8, thd=0x7f9b7cc67178, keys_to_use={map = 32}, prev_tables=0,
    limit=18446744073709551615, force_quick_range=false, ordered_output=false) at opt_range.cc:2407
#21 0x00000000006f8498 in test_if_quick_select (tab=0x1f6b990) at sql_select.cc:17047
#22 0x00000000006f8844 in join_init_quick_read_record (tab=0x1f6b990) at sql_select.cc:17027
#23 0x00000000006fc021 in sub_select (join=0x1f653c0, join_tab=0x1f6b990, end_of_records=false) at sql_select.cc:16257
#24 0x00000000006c832c in JOIN_CACHE::generate_full_extensions (this=0x1f6c470, rec_ptr=0x1f8daf8 "\fЭ\n") at sql_join_cache.cc:1908
#25 0x00000000006c8d1a in JOIN_CACHE_BNL::join_matching_records (this=0x1f6c470, skip_last=false) at sql_join_cache.cc:1801
#26 0x00000000006c70e7 in JOIN_CACHE::join_records (this=0x1f6c470, skip_last=false) at sql_join_cache.cc:1617
#27 0x00000000006fc28f in sub_select_cache (join=0x1f653c0, join_tab=0x1f6b6f0, end_of_records=true) at sql_select.cc:16060
#28 0x00000000006fbf1a in sub_select (join=0x1f653c0, join_tab=0x1f6b450, end_of_records=true) at sql_select.cc:16218
#29 0x0000000000709b4c in do_select (join=0x1f653c0, fields=0x0, table=0x1e4f250, procedure=0x0) at sql_select.cc:15823
#30 0x0000000000722593 in JOIN::exec (this=0x1f653c0) at sql_select.cc:2478
#31 0x000000000071e92a in mysql_select (thd=0x7f9b7cc67178, rref_pointer_array=0x7f9b7cc69358, tables=0x1ef1600, wild_num=0, fields=@0x7f9b7cc69278,
    conds=0x0, og_num=4, order=0x1f632b0, group=0x0, having=0x0, proc_param=0x0, select_options=2147764736, result=0x1f63e78, unit=0x7f9b7cc68ae0,
    select_lex=0x7f9b7cc69170) at sql_select.cc:3091
#32 0x00000000007241c1 in handle_select (thd=0x7f9b7cc67178, lex=0x7f9b7cc68a40, result=0x1f63e78, setup_tables_done_option=0) at sql_select.cc:306
#33 0x000000000067d4e4 in execute_sqlcom_select (thd=0x7f9b7cc67178, all_tables=0x1ef1600) at sql_parse.cc:4935
#34 0x000000000067ed43 in mysql_execute_command (thd=0x7f9b7cc67178) at sql_parse.cc:2112
#35 0x000000000068756a in mysql_parse (thd=0x7f9b7cc67178,
    inBuf=0x1ef08a0 "SELECT   table1 . `pk` AS field1 , table1 . `pk` AS field2 , table2 . `int_key` AS field3 FROM ( CC AS table1 STRAIGHT_JOIN ( ( C AS table2 STRAIGHT_JOIN A AS table3 ON (( table3 . `varchar_key` <> ta"..., length=402, found_semicolon=0x7f9b7801af00) at sql_parse.cc:5950
#36 0x0000000000688185 in dispatch_command (command=COM_QUERY, thd=0x7f9b7cc67178,
    packet=0x7f9b7cca28c9 "SELECT   table1 . `pk` AS field1 , table1 . `pk` AS field2 , table2 . `int_key` AS field3 FROM ( CC AS table1 STRAIGHT_JOIN ( ( C AS table2 STRAIGHT_JOIN A AS table3 ON (( table3 . `varchar_key` <> ta"..., packet_length=402) at sql_parse.cc:1062
#37 0x0000000000689651 in do_command (thd=0x7f9b7cc67178) at sql_parse.cc:744
---Type <return> to continue, or q <return> to quit---
#38 0x000000000067688b in handle_one_connection (arg=0x7f9b7cc67178) at sql_connect.cc:1163
#39 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#40 0x000000315a4e627d in clone () from /lib64/libc.so.6

The loop is in the following code section:

7132      if (count_l >= 0 && count_r >= 0)
(gdb)
7134        if (count_l == count_r)
(gdb)
7135          return count_l+(element->color == SEL_ARG::BLACK);

The query could not be evaluated within 2 hours. Under 5.1, the same issue is observed, with a slightly different stack trace (no join cache).

The EXPLAIN looks like this:

+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------------------------------------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  | rows | Extra                                           |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------------------------------------------+
|  1 | SIMPLE      | table1 | index | NULL          | PRIMARY | 4       | NULL |   20 | Using index                                     |
|  1 | SIMPLE      | table2 | ALL   | C_varchar_key | NULL    | NULL    | NULL |   21 | Using join buffer                               |
|  1 | SIMPLE      | table3 | ALL   | A_varchar_key | NULL    | NULL    | NULL |    0 | Range checked for each record (index map: 0x20) |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------------------------------------------+

Replacing both STRAIGHT_JOINs with JOIN causes the query to return an empty result immediately and EXPLAIN shows "impossible where". Removing the ON conditions also causes swift evaluation.

Unfortunately, due to the nature of this issue, automatic query simplification could not be applied.

How to repeat:
CREATE TABLE `A` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_key` int(11) DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  PRIMARY KEY (`pk`),
  KEY `A_int_key` (`int_key`),
  KEY `A_varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE `CC` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_key` int(11) DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  PRIMARY KEY (`pk`),
  KEY `CC_int_key` (`int_key`),
  KEY `CC_varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MyISAM AUTO_INCREMENT=30 DEFAULT CHARSET=latin1;

INSERT INTO `CC` VALUES (10,8,'v','v'),(11,9,'r','r'),(12,9,'a','a'),(13,186,'m','m'),(14,NULL,'y','y'),(15,2,'j','j'),(16,3,'d','d'),(17,0,'z','z'),(18,133,'e','e'),(19,1,'h','h'),(20,8,'b','b'),(21,5,'s','s'),(22,5,'e','e'),(23,8,'j','j'),(24,6,'e','e'),(25,51,'f','f'),(26,4,'v','v'),(27,7,'x','x'),(28,6,'m','m'),(29,4,'c','c');

CREATE TABLE `C` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_key` int(11) DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  PRIMARY KEY (`pk`),
  KEY `C_int_key` (`int_key`),
  KEY `C_varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MyISAM AUTO_INCREMENT=22 DEFAULT CHARSET=latin1;
INSERT INTO `C` VALUES (1,2,'w','w'),(2,9,'m','m'),(3,3,'m','m'),(4,9,'k','k'),(5,NULL,'r','r'),(6,9,'t','t'),(7,3,'j','j'),(8,8,'u','u'),(9,8,'h','h'),(10,53,'o','o'),(11,0,NULL,NULL),(12,5,'k','k'),(13,166,'e','e'),(14,3,'n','n'),(15,0,'t','t'),(16,1,'c','c'),(17,9,'m','m'),(18,5,'y','y'),(19,6,'f','f'),(20,2,'d','d'),(21,NULL,NULL,NULL);

SELECT   table1 . `pk` AS field1 , table1 . `pk` AS field2 , table2 . `int_key` AS field3
FROM ( CC AS table1 STRAIGHT_JOIN ( ( C AS table2 STRAIGHT_JOIN A AS table3 ON (( table3 . `varchar_key` <> table2 . `varchar_key` ) OR ( table3 . `varchar_key` <= table2 . `varchar_key` ) ) ) ) ON (table3 . `varchar_key` <> table2 . `varchar_nokey` ) );
[4 Sep 2009 8:14] Sveta Smirnova
Thank you for the report.

Verified as described.
[13 Oct 2009 8:52] Alexey Kopytov
A shorter testcase:

create table t1(a int, key(a));
insert into t1 values (1), (NULL);
select * from t1 where (a <> NULL) and (a <> NULL or a <= NULL);
[15 Oct 2009 11:49] Bugs System
Pushed into 5.1.41 (revid:joro@sun.com-20091015114812-i3gm8km6gfruny5x) (version source revid:alexey.kopytov@sun.com-20091015104251-0sjkte1joim4pfns) (merge vers: 5.1.41) (pib:13)
[15 Oct 2009 23:38] Paul Dubois
Noted in 5.1.41 changelog.

Incorrect handling of predicates involving NULL by the range 
optimizer could lead to to an infinite loop during query execution.

Setting report to NDI pending push into 5.5.x+.
[16 Oct 2009 8:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/87079

3175 Georgi Kodinov	2009-10-16
      Revert the fix for bug #47123 until test suite failures are resolved.
[16 Oct 2009 20:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/87197

3181 Alexey Kopytov	2009-10-17
      Bug #47123: Endless 100% CPU loop with STRAIGHT_JOIN 
       
      The problem was in incorrect handling of predicates involving 
      NULL as a constant value by the range optimizer. 
       
      For example, when creating a SEL_ARG node from a condition of 
      the form "field < const" (which would normally result in the 
      "NULL < field < const" SEL_ARG),  the special case when "const" 
      is NULL was not taken into account, so "NULL < field < NULL" 
      was produced for the "field < NULL" condition. 
       
      As a result, SEL_ARG structures of this form could not be 
      further optimized which in turn could lead to incorrectly 
      constructed SEL_ARG trees. In particular, code assuming SEL_ARG 
      structures to always form a sequence of ordered disjoint 
      intervals could enter an infinite loop under some 
      circumstances. 
       
      Fixed by changing get_mm_leaf() so that for any sargable 
      predicate except "<=>" involving NULL as a constant, "empty" 
      SEL_ARG is returned, since such a predicate is always false. 
     @ mysql-test/r/partition_pruning.result
        Fixed a broken test case.
     @ mysql-test/r/range.result
        Added a test case for bug #47123.
     @ mysql-test/r/subselect.result
        Fixed a broken test cases.
     @ mysql-test/t/range.test
        Added a test case for bug #47123.
     @ sql/opt_range.cc
        Fixed get_mm_leaf() so that for any sargable
        predicate except "<=>" involving NULL as a constant, "empty"
        SEL_ARG is returned, since such a predicate is always false.
[22 Oct 2009 6:32] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091022063126-l0qzirh9xyhp0bpc) (version source revid:alik@sun.com-20091019135554-s1pvptt6i750lfhv) (merge vers: 6.0.14-alpha) (pib:13)
[22 Oct 2009 7:05] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091022060553-znkmxm0g0gm6ckvw) (version source revid:alik@sun.com-20091019132831-5j6xs19ld16m7r8j) (merge vers: 5.5.0-beta) (pib:13)
[22 Oct 2009 19:18] Paul Dubois
Noted in 5.5.0, 6.0.14 changelogs.
[4 Nov 2009 9:24] Bugs System
Pushed into 5.1.41 (revid:joro@sun.com-20091104092152-qz96bzlf2o1japwc) (version source revid:alexey.kopytov@sun.com-20091016201951-fsht0wm8xn8vkzsx) (merge vers: 5.1.41) (pib:13)
[11 Nov 2009 6:48] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091110093407-rw5g8dys2baqkt67) (version source revid:alik@sun.com-20091109080109-7dxapd5y5pxlu08w) (merge vers: 6.0.14-alpha) (pib:13)
[11 Nov 2009 6:56] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091109115615-nuohp02h8mdrz8m2) (version source revid:alik@sun.com-20091105090203-cls5j6k3ohu04xpt) (merge vers: 5.5.0-beta) (pib:13)
[1 Dec 2009 5:27] Igor Babaev
Alexey,

After your patch, for your test case, we have:

mysql> EXPLAIN SELECT * FROM t1 WHERE a <> NULL;
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                               |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
1 row in set (10.68 sec)

Yet, if we drop the index we get:

mysql > drop index a on t1;
Query OK, 4 rows affected (0.03 sec)
Records: 4  Duplicates: 0  Warnings: 0

(I've added two extra records to t1 with values '2' and '3').

mysql> EXPLAIN SELECT * FROM t1 WHERE a <> NULL;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |    4 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+

This seems to me to be counter-intuitive. I would expect to get "Impossible where" in this case too.

I think it happens because the null-rejecting predicates returning always UNKNOWN must be caught much earlier, somewhere in optimize_cond().

In this case for the following query:

mysql> EXPLAIN SELECT * FROM t1, t1 t2  WHERE t1.a <> NULL or t2.a <> NULL;
+----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                       |
+----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------+
|  1 | SIMPLE      | t1    | index | a             | a    | 5       | NULL |    4 | Using index                                 |
|  1 | SIMPLE      | t2    | index | a             | a    | 5       | NULL |    4 | Using where; Using index; Using join buffer |
+----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------+
2 rows in set (9.85 sec)

you also would get "Impossible where".
[3 Dec 2009 13:19] Alexey Kopytov
Igor,

I agree that the inconsistency with indexed vs. non-indexed column must be fixed.

However, I think this should be done in addition to the patch for this bug, not instead of it. If we catch null-rejecting predicates only in optimize_cond(), then the following query will still trigger the bug (i.e. the endless loop in the range optimizer):

SELECT * FROM t1, t1 t2 WHERE (t2.a <> t1.a) AND (t2.a <> t1.a or t2.a <= t1.a);

So I think we still must catch such predicates in get_mm_leaf() as done in my patch, and do the same in optimize_cond() to be consistent in the non-indexed column case.

I have a draft version of the patch for optimize_cond() ready. Please let me know if you agree, I will then apply it as a fix for a separate bug.
[7 Dec 2009 12:55] Alexey Kopytov
Reported the inconsistency with indexed vs. non-indexed columns separately, bug #49504.
[7 Dec 2009 16:41] Paul Dubois
Noted in 5.1.40sp1 changelog.
[8 Dec 2009 9:30] Bugs System
Pushed into 5.1.43 (revid:build@mysql.com-20091208092611-pbno5awyb0v38hs7) (version source revid:build@mysql.com-20091208092611-pbno5awyb0v38hs7) (merge vers: 5.1.43) (pib:13)
[16 Dec 2009 8:34] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091216083311-xorsasf5kopjxshf) (version source revid:alik@sun.com-20091215065750-5m04ogppd5l0pol5) (merge vers: 6.0.14-alpha) (pib:14)
[16 Dec 2009 8:42] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091216082430-s0gtzibcgkv4pqul) (version source revid:alik@sun.com-20091211070127-kl8uvlrv9cr11kva) (merge vers: 5.5.0-beta) (pib:14)
[16 Dec 2009 8:48] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20091216083231-rp8ecpnvkkbhtb27) (version source revid:alik@sun.com-20091212203859-fx4rx5uab47wwuzd) (merge vers: 5.6.0-beta) (pib:14)
[18 Dec 2009 10:26] Bugs System
Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20091218102229-64tk47xonu3dv6r6) (version source revid:jonas@mysql.com-20091218095730-26gwjidfsdw45dto) (merge vers: 5.1.41-ndb-7.1.0) (pib:15)
[18 Dec 2009 10:42] Bugs System
Pushed into 5.1.41-ndb-6.2.19 (revid:jonas@mysql.com-20091218100224-vtzr0fahhsuhjsmt) (version source revid:jonas@mysql.com-20091217101452-qwzyaig50w74xmye) (merge vers: 5.1.41-ndb-6.2.19) (pib:15)
[18 Dec 2009 10:57] Bugs System
Pushed into 5.1.41-ndb-6.3.31 (revid:jonas@mysql.com-20091218100616-75d9tek96o6ob6k0) (version source revid:jonas@mysql.com-20091217154335-290no45qdins5bwo) (merge vers: 5.1.41-ndb-6.3.31) (pib:15)
[18 Dec 2009 11:12] Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20091218101303-ga32mrnr15jsa606) (version source revid:jonas@mysql.com-20091218064304-ezreonykd9f4kelk) (merge vers: 5.1.41-ndb-7.0.11) (pib:15)
[12 Mar 2010 14:06] Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100312135944-t0z8s1da2orvl66x) (version source revid:jonas@mysql.com-20100312115609-woou0te4a6s4ae9y) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[12 Mar 2010 14:22] Bugs System
Pushed into 5.1.44-ndb-6.2.19 (revid:jonas@mysql.com-20100312134846-tuqhd9w3tv4xgl3d) (version source revid:jonas@mysql.com-20100312060623-mx6407w2vx76h3by) (merge vers: 5.1.44-ndb-6.2.19) (pib:16)
[12 Mar 2010 14:36] Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100312135724-xcw8vw2lu3mijrhn) (version source revid:jonas@mysql.com-20100312103652-snkltsd197l7q2yg) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)