Bug #67596 mysqld critical failure during ordered select from ndbinfo.cluster_operations
Submitted: 15 Nov 2012 8:24 Modified: 12 May 2015 16:25
Reporter: Олег Ненашев Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:7.2.6,7.2.12 OS:Linux (Oracle Enterprise Linux 6.2)
Assigned to: CPU Architecture:Any
Tags: mysqld; ndbinfo

[15 Nov 2012 8:24] Олег Ненашев
Description:
I've tried to request information about pending operations via ndbinfo. 
Due to timeout configuration, there're many locks in database, and select from ndbinfo.cluster_operations returns ~1000 entries. Most of them have same transid (see attachments).

Problem:
"select * from ndbinfo.cluster_operations order by transid" cause critical failure and termination of mysqld. Logs are attached.

Bug is being reproduced on different cluster configurations (1-node, 2-node with replication, 4-node)

How to repeat:
1) Set big TransactionDeadlockDetectionTimeout
2) Create multiple conflicting or deadlocked transactions. Each transaction must have several pending operations
3) Call "select * from ndbinfo.cluster_operations order by transid"

Expected result: mysql returns Ordered list of operations
[15 Nov 2012 8:24] Олег Ненашев
Error log

Attachment: mysqld.error.log (application/octet-stream, text), 4.47 KiB.

[15 Nov 2012 8:25] Олег Ненашев
Result of  "select * from ndbinfo.cluster_operations"

Attachment: cluster_operations_content.csv (application/vnd.ms-excel, text), 53.31 KiB.

[15 Nov 2012 17:54] MySQL Verification Team
what is your sort_buffer_size variable set to?  Does this work better:

set session sort_buffer_size=1024*1024*8;
select * from cluster_operations order by transid LIMIT 0, 1000;

regards,
Shane (Not a cluster person) :)
[21 Nov 2012 9:51] Олег Ненашев
Yes, that works fine. Anyway, I suppose that mysqld shouldn't terminate after such errors, because user can send any request :(
[26 Apr 2013 14:45] MySQL Verification Team
Hello Oleg,

Thank you for the report.
Verified as described on reported and later versions including GA.

Thanks,
Umesh
[26 Apr 2013 14:47] MySQL Verification Team
Repeatable on latest GA 7.2.12

############### Resolved stack trace

stack_bottom = 7fb1cc51fe68 thread_stack 0x40000
mysqld(my_print_stacktrace+0x35)[0x81d315]
mysqld(handle_fatal_signal+0x403)[0x6e94f3]
/lib64/libpthread.so.0[0x3b2560f500]
/lib64/libc.so.6(memcpy+0xd2)[0x3b24e89122]
mysqld(merge_buffers(st_sort_param*, st_io_cache*, st_io_cache*, unsigned char*, st_buffpek*, st_buffpek*, st_buffpek*, int)+0x3fd)[0x6e6a7d]
mysqld(filesort(THD*, TABLE*, st_sort_field*, unsigned int, SQL_SELECT*, unsigned long long, bool, unsigned long long*)+0x1dd0)[0x6e8fd0]
mysqld[0x6200e6]
mysqld(JOIN::exec()+0x80a)[0x62bcba]
mysqld(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*)
P13st_select_lex+0x1c3)[0x627ba3]
mysqld(handle_select(THD*, LEX*, select_result*, unsigned long)+0x179)[0x62d939]
mysqld[0x5eab31]
mysqld(mysql_execute_command(THD*)+0x269b)[0x5f060b]
mysqld(mysql_parse(THD*, char*, unsigned int, Parser_state*)+0x18a)[0x5f27da]
mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x1331)[0x5f4391]
mysqld(do_handle_one_connection(THD*)+0x167)[0x68d957]
mysqld(handle_one_connection+0x54)[0x68d9c4]

############# How to repeat:

*** Set big TransactionDeadlockDetectionTimeout

## MySQL Session <one>

create database if not exists test;
use test;

drop table if exists keyvalue;
CREATE TABLE `keyvalue` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(40),
  PRIMARY KEY (`id`)
) ENGINE=NDB;

set @id:=0;

 insert into `keyvalue` values (@id:=@id+1,md5(rand()*1000000)),(@id:=@id+1,md5(rand()*1000000)),(@id:=@id+1,md5(rand()*1000000)),(@id:=@id+1,md5(rand()*1000000));

 insert into `keyvalue`(`id`,`name`)
   select @id:=@id+1,md5(rand()*1000000) from
    `keyvalue` k1, `keyvalue` k2, `keyvalue` k3, `keyvalue` k4,`keyvalue` k5,`keyvalue` k6, `keyvalue` k7, `keyvalue` k8, `keyvalue` k9,
`keyvalue` k0,`keyvalue` ka, `keyvalue` kb, `keyvalue` kc, `keyvalue` kd limit 5000000;

## MySQL Session <two>

mysql> select version();
+-----------------------------------------------+
| version()                                     |
+-----------------------------------------------+
| 5.5.30-ndb-7.2.12-cluster-commercial-advanced |
+-----------------------------------------------+

mysql> show variables like 'sort_buffer_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| sort_buffer_size | 32768 |
+------------------+-------+

select * from ndbinfo.cluster_operations order by transid LIMIT 0, 1000;

## Workaround as suggested by Shane

set session sort_buffer_size=1024*1024*8;
select * from cluster_operations order by transid LIMIT 0, 1000;
[12 May 2015 16:25] Jon Stephens
Fixed in NDB 7.5.0. Will be documented in the NDB 7.5.0 changelog (when we have one) as follows:

        When using a sufficiently large value for
        TransactionDeadlockDetectionTimeout and the default value for
        sort_buffer_size, executing SELECT * FROM
        ndbinfo.cluster_operations ORDER BY transid with multiple
        concurrent conflicting or deadlocked transactions, each
        transaction having several pending operations, caused the SQL
        node where the query was run to fail.

Closed.