Bug #42593 cluster join hangs in BKA
Submitted: 4 Feb 2009 14:05
Modified: 23 Nov 2010 2:50
Reporter: Tomas Ulin
Status: Closed
Category: MySQL Server: Optimizer
Severity: S3 (Non-critical)
Version: 6.0
OS: Any
Assigned to: Igor Babaev
CPU Architecture: Any

[4 Feb 2009 14:05] Tomas Ulin
Description:
see below

How to repeat:
create table relation (uid int, fid int, index(uid)) engine ndb;
insert into relation values (1,1);
insert into relation values (1,2);
insert into relation values (1,3);
insert into relation values (1,4);
insert into relation values (2,5);
insert into relation values (2,6);
insert into relation values (2,7);
insert into relation values (2,8);
insert into relation values (3,1);
insert into relation values (3,2);
insert into relation values (3,9);

create table users (uid int primary key, name varchar(128), index(name)) engine ndb;

insert into users values (1, "A");
insert into users values (2, "B");
insert into users values (3, "C");
insert into users values (4, "D");
insert into users values (5, "E");
insert into users values (6, "F");
insert into users values (7, "G");
insert into users values (8, "H");
insert into users values (9, "I");

set join_cache_level=7;
select name from users, relation where relation.uid in (select users.uid from users, relation where relation.uid=1 and users.uid=relation.fid) and users.uid=relation.fid;

kill -6 gives a backtrace showing where it hangs:

(gdb) where
#0  0x0000000000d1ab55 in Ndb::sendPrepTrans (this=0x267ba90, forceSend=1) at Ndbif.cpp:1166
#1  0x0000000000d1b2de in Ndb::sendPollNdb (this=0x267ba90, aMillisecondNumber=360000, minNoOfEventsToWakeup=1, forceSend=1) at Ndbif.cpp:1326
#2  0x0000000000d72393 in NdbTransaction::executeNoBlobs (this=0x2638970, aTypeOfExec=NdbTransaction::NoCommit, abortOption=NdbOperation::AO_IgnoreError, forceSend=1) at NdbTransaction.cpp:539
#3  0x0000000000d726c2 in NdbTransaction::execute (this=0x2638970, aTypeOfExec=NdbTransaction::NoCommit, abortOption=NdbOperation::AO_IgnoreError, forceSend=1) at NdbTransaction.cpp:285
#4  0x00000000009c6050 in execute_no_commit_ie (h=0x27018a0, trans=0x2638970) at ha_ndbcluster.cc:343
#5  0x00000000009b1c2d in ha_ndbcluster::multi_range_start_retrievals (this=0x27018a0, starting_range=9) at ha_ndbcluster.cc:9776
#6  0x00000000009b41d3 in ha_ndbcluster::multi_range_read_next (this=0x27018a0, range_info=0x41497658) at ha_ndbcluster.cc:9983
#7  0x00000000007b73de in JOIN_CACHE_BKA_UNIQUE::join_matching_records (this=0x26e24a8, skip_last=false) at sql_join_cache.cc:3117
#8  0x00000000007b5dbf in JOIN_CACHE::join_records (this=0x26e24a8, skip_last=false) at sql_join_cache.cc:1591
#9  0x00000000007eb582 in sub_select_cache (join=0x271ac40, join_tab=0x26e0f30, end_of_records=true) at sql_select.cc:16029
#10 0x00000000007eb1dd in sub_select (join=0x271ac40, join_tab=0x26e0c90, end_of_records=true) at sql_select.cc:16187
#11 0x00000000007eb5a5 in sub_select_cache (join=0x271ac40, join_tab=0x26e0c90, end_of_records=true) at sql_select.cc:16031
#12 0x00000000007eb1dd in sub_select (join=0x271ac40, join_tab=0x26e09f0, end_of_records=true) at sql_select.cc:16187
#13 0x00000000007eb5a5 in sub_select_cache (join=0x271ac40, join_tab=0x26e09f0, end_of_records=true) at sql_select.cc:16031
#14 0x00000000007eb1dd in sub_select (join=0x271ac40, join_tab=0x26e0750, end_of_records=true) at sql_select.cc:16187
#15 0x00000000007f906f in do_select (join=0x271ac40, fields=0x2208c90, table=0x0, procedure=0x0) at sql_select.cc:15792
#16 0x0000000000812e38 in JOIN::exec (this=0x271ac40) at sql_select.cc:2877
#17 0x000000000080d82f in mysql_select (thd=0x2206cb8, rref_pointer_array=0x2208d70, tables=0x265e590, wild_num=0, fields=@0x2208c90, conds=0x2713fe0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147764736, 
    result=0x2714198, unit=0x2208720, select_lex=0x2208b88) at sql_select.cc:3058
#18 0x0000000000813156 in handle_select (thd=0x2206cb8, lex=0x2208680, result=0x2714198, setup_tables_done_option=0) at sql_select.cc:315
#19 0x000000000076d266 in execute_sqlcom_select (thd=0x2206cb8, all_tables=0x265e590) at sql_parse.cc:4756
#20 0x000000000076e827 in mysql_execute_command (thd=0x2206cb8) at sql_parse.cc:2063
#21 0x00000000007767bb in mysql_parse (thd=0x2206cb8, inBuf=0x265e040 "select name from users, relation where relation.uid in (select users.uid from users, relation where relation.uid=1 and users.uid=relation.fid) and users.uid=relation.fid", length=169, 
    found_semicolon=0x41499b50) at sql_parse.cc:5750
[10 Feb 2009 11:00] Jonas Oreland
Patch that adds DBUG_ASSERT for inconsistency in mrr-interface usage

Attachment: bug42593.patch (text/x-patch), 597 bytes.

[10 Feb 2009 11:03] Jonas Oreland
So the problem is that we get a "multi_range_read_init" call with n_ranges=11,
but we can only call *seq_funcs.next(...) 9 times.

This causes us to enter an endless loop
(unless the patch I attached to this bug report is applied, which
 causes it to assert instead).
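
The mismatch described above can be sketched in isolation. This is only an illustration, not the ha_ndbcluster code: RangeSeq, FetchResult and fetch_ranges are hypothetical names, and the check on an exhausted sequence plays the role that the attached DBUG_ASSERT patch is described as playing. A consumer without that check, one that keeps waiting for the advertised count, would never finish.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the MRR range sequence: hands out one key per
// call to next() and returns false once the sequence is exhausted.
class RangeSeq {
public:
  explicit RangeSeq(const std::vector<int>& keys) : keys_(keys), pos_(0) {}

  bool next(int* key) {
    if (pos_ >= keys_.size()) return false;
    *key = keys_[pos_++];
    return true;
  }

private:
  std::vector<int> keys_;
  std::size_t pos_;
};

enum class FetchResult { Ok, CountMismatch };

// Consumer that is told to expect n_ranges ranges. If the sequence runs dry
// before that many have been delivered, it reports the inconsistency
// instead of waiting for ranges that will never arrive.
FetchResult fetch_ranges(RangeSeq& seq, std::size_t n_ranges,
                         std::vector<int>& out) {
  for (std::size_t i = 0; i < n_ranges; ++i) {
    int key = 0;
    if (!seq.next(&key)) return FetchResult::CountMismatch;
    out.push_back(key);
  }
  return FetchResult::Ok;
}
```

With a sequence of 9 keys and n_ranges=11, fetch_ranges collects 9 keys and then reports CountMismatch; with n_ranges=9 it returns Ok.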
[10 Feb 2009 11:37] Jonas Oreland
info: same behavior in mysql-6.0-telco-6.3 and mysql-6.0-telco (6.4)
[10 Feb 2009 12:27] Jonas Oreland
Another thing: I seem to recall that n_ranges (in its various flavors)
should not be used, yet we use it here and there.

If this argument is "undefined",
can you please remove it from the interfaces
(or at the very minimum document in handler.h when it's undefined)?

/Jonas
[13 Feb 2009 5:56] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66142

2697 Igor Babaev	2009-02-12
      Fixed bug #42593.
      The method JOIN_CACHE_BKA_UNIQUE::join_matching_records
      calls JOIN_CACHE_BKA::init_join_matching_records, which
      calls the handler function multi_range_read_init.
      The code before the fix erroneously always passed the number
      of records in the join buffer as the third parameter of
      the call to multi_range_read_init. Yet this parameter
      must specify the number of ranges passed to the MRR
      interface for processing. If a join cache of the type
      JOIN_CACHE_BKA_UNIQUE is employed, the number of distinct
      keys occurring in the records from the join buffer must
      be passed as the required number of ranges to the function
      multi_range_read_init.
      Currently only the NDB implementation of the
      multi_range_read_init handler function actually uses
      the expected number of ranges.
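
The difference between the two counts can be checked against the test data from this report. The sketch below is an illustration only: BufferedRecord and the helper functions are hypothetical, and it assumes the join buffer holds the rows of `relation` and that the lookup key is `fid`. Under that assumption the 11 buffered records contain 9 distinct keys, which is consistent with the n_ranges=11 versus 9 calls noted earlier in this report.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical in-memory form of a buffered row of `relation`.
struct BufferedRecord {
  int uid;
  int fid;
};

// The rows inserted into `relation` in the "How to repeat" section.
std::vector<BufferedRecord> relation_rows() {
  return {{1, 1}, {1, 2}, {1, 3}, {1, 4}, {2, 5}, {2, 6},
          {2, 7}, {2, 8}, {3, 1}, {3, 2}, {3, 9}};
}

// What the code before the fix is described as passing: one range per record.
std::size_t count_records(const std::vector<BufferedRecord>& buf) {
  return buf.size();
}

// What the fix is described as passing for a cache that sends each key once:
// the number of distinct key values (assumed here to be `fid`).
std::size_t count_distinct_keys(const std::vector<BufferedRecord>& buf) {
  std::set<int> keys;
  for (const BufferedRecord& rec : buf) keys.insert(rec.fid);
  return keys.size();
}
```

For the rows above, count_records returns 11 and count_distinct_keys returns 9.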
[16 Feb 2009 18:08] Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090216180446-dl1xovi02kbd2fgn) (version source revid:igor@mysql.com-20090213055616-8jruyhpluecmv2b8) (merge vers: 6.0.10-alpha) (pib:6)
[24 Feb 2009 0:54] Paul DuBois
Noted in 6.0.10 changelog.

For the batched key access method, the number of records was being
passed to multi_range_read_init where the number of ranges was required.
[16 Aug 2010 6:34] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:22] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 2:50] Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.