Bug #57481 'multi range read' may fail to close completed NdbScanOperations
Submitted: 15 Oct 2010 12:22 Modified: 25 Nov 2010 21:51
Reporter: Ole John Aske
Status: Closed
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:5.1.47-ndb-7.1.9 OS:Any
Assigned to: Ole John Aske CPU Architecture:Any
Triage: Triaged: D3 (Medium) / R6 (Needs Assessment) / E6 (Needs Assessment)

[15 Oct 2010 12:22] Ole John Aske
Description:
Depending on the usage pattern from mysqld, a scan operation from the previous multi-range read (MRR) operation may still be open when another MRR access is started.

ha_ndbcluster::read_multi_range_first() fails to close these, which may eventually lead to situations where all NdbScanOperations, (hupp'ed) NdbTransactions, or lock objects have been consumed by the running operation.

This may surface as unexpected errors such as:

'ERROR 1205 (HY000): Lock wait timeout exceeded...'
 or 
'1297: Got temporary error 4006 'Connect failure - out of connection objects...'
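
The leak described above can be illustrated with a minimal sketch. This is a simplified model, not the NDB API and not the actual patch: ScanPool, MrrHandler, run_join and the pool capacity are hypothetical names and values chosen for illustration. Each start_mrr() call stands in for one read_multi_range_first() call; the close_previous flag switches between the leaking behaviour and the behaviour where the leftover scan is closed first.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a bounded supply of scan objects.
// NOT the NDB API; it only models "a fixed number of scans may be open".
class ScanPool {
public:
    explicit ScanPool(std::size_t capacity) : capacity_(capacity) {}

    void open() {
        if (in_use_ == capacity_)
            throw std::runtime_error("out of scan objects");
        ++in_use_;
    }

    void close() {
        assert(in_use_ > 0);
        --in_use_;
    }

    std::size_t in_use() const { return in_use_; }

private:
    std::size_t capacity_;
    std::size_t in_use_ = 0;
};

// Hypothetical handler: each start_mrr() models one
// read_multi_range_first() call made for one outer row of the join.
class MrrHandler {
public:
    MrrHandler(ScanPool& pool, bool close_previous)
        : pool_(pool), close_previous_(close_previous) {}

    void start_mrr() {
        // Behaviour proposed in this report: release the scan left over
        // from the previous MRR before opening a new one.
        if (close_previous_ && scan_open_) {
            pool_.close();
            scan_open_ = false;
        }
        pool_.open();
        scan_open_ = true;
        // The caller is assumed to stop fetching after the first match,
        // so nothing on this path closes the scan again.
    }

private:
    ScanPool& pool_;
    bool close_previous_;
    bool scan_open_ = false;
};

// Attempts `outer_rows` MRR starts against a pool of `capacity` scans and
// returns how many starts succeeded before the pool ran dry.
std::size_t run_join(std::size_t outer_rows, std::size_t capacity,
                     bool close_previous) {
    ScanPool pool(capacity);
    MrrHandler handler(pool, close_previous);
    std::size_t started = 0;
    try {
        for (std::size_t i = 0; i < outer_rows; ++i) {
            handler.start_mrr();
            ++started;
        }
    } catch (const std::runtime_error&) {
        // Pool exhausted: models the 'out of connection objects' failure.
    }
    return started;
}
```

In this model, run_join(10000, 8, false) stops after 8 starts because every scan stays open, while run_join(10000, 8, true) completes all 10000 starts with at most one scan open at a time. The capacity of 8 is arbitrary; the real limits depend on the cluster configuration.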

How to repeat:
create table t1 (pk int primary key, a int) engine=ndb;
create table t2 (pk int primary key, a int) engine=ndb;

insert into t2 values
   (0,0), (1,1), (2,2), (3,3), (4,4),
   (5,5), (6,6), (7,7), (8,8), (9,9);

##
# 10^4 cross product on t2 creates 10,000 rows:
##
insert into t1
 select
   t1.a + t2.a*10 + t3.a*100 + t4.a*1000, 
   (t1.a + t2.a*10 + t3.a*100 + t4.a*1000) / 1000
from
  t2 as t1, t2 as t2, t2 as t3, t2 as t4;

# Execute a 'scan(t1) join mrr(t2)'
#  - 'DISTINCT t1.pk' will cause optimizer to stop fetching mrr(t2) 
#     when the first matching 't2.a = t1.a' is found.
#  - 'LEFT JOIN' is to ensure that 'Using join buffer' is *not* used
#

SELECT DISTINCT STRAIGHT_JOIN t1.pk FROM 
   t1 LEFT JOIN t2 ON t2.a = t1.a AND t2.pk != 6;

--> Txn timeout or 'out of connection objects..'
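
As a sanity check on the test data, the primary-key expression in the INSERT ... SELECT above can be reproduced outside the server. The sketch below is an illustration only (cross_product_keys is a hypothetical helper name); it covers the first SELECT column, not the second column 'a'.

```cpp
#include <cassert>
#include <set>

// Builds the same key as the first SELECT column:
// d1 + d2*10 + d3*100 + d4*1000, with each digit taken from the
// ten values 0..9 stored in t2.
std::set<int> cross_product_keys() {
    std::set<int> keys;
    for (int d1 = 0; d1 < 10; ++d1)
        for (int d2 = 0; d2 < 10; ++d2)
            for (int d3 = 0; d3 < 10; ++d3)
                for (int d4 = 0; d4 < 10; ++d4)
                    keys.insert(d1 + d2 * 10 + d3 * 100 + d4 * 1000);
    return keys;
}
```

The set holds 10,000 distinct values from 0 to 9999, so the insert into t1 does not collide on the primary key, and the outer scan of t1 drives 10,000 probes into t2.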
[15 Oct 2010 12:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120835

3880 Ole John Aske	2010-10-15
      Proposed patch for bug#57481
      
      ha_ndbcluster::read_multi_range_first() should close any previous completed scan operations.
      
      MTR result file *not* included due to its size.....
[24 Nov 2010 16:39] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.40 (revid:jonas@mysql.com-20101124163758-ylwcnqce1piv1rut) (version source revid:jonas@mysql.com-20101124131609-7eegarkflxjvcxmq) (merge vers: 5.1.51-ndb-6.3.40) (pib:23)
[24 Nov 2010 17:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/124869

3346 Jonas Oreland	2010-11-24
      ndb - bug#57481 - make sure to close previous mrr scans, before starting new
[24 Nov 2010 17:50] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.21 (revid:jonas@mysql.com-20101124174527-2n60gu2e0wx7an60) (version source revid:jonas@mysql.com-20101124174527-2n60gu2e0wx7an60) (merge vers: 5.1.51-ndb-7.0.21) (pib:23)
[24 Nov 2010 17:53] Jonas Oreland
pushed to 6.3.40, 7.0.21 and 7.1.10
[25 Nov 2010 21:51] Jon Stephens
Documented bugfix in the NDB-6.3.40, 7.0.21, and 7.1.10 changelogs as follows:

        In some circumstances, it was possible for mysqld to begin a new
        multi-range read scan without having closed a previous one. This
        could lead to exhaustion of all scan operation objects,
        transaction objects, or lock objects (or some combination of
        these) in NDB, causing queries to fail with such errors as Lock
        wait timeout exceeded or Connect failure - out of connection
        objects.

Closed.