Bug #35137 Query crashed mysqld
Submitted: 7 Mar 2008 9:45 Modified: 9 Jul 2008 7:26
Reporter: Karsten Thygesen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:5.1.23 OS:Any
Assigned to: Frazer Clement CPU Architecture:Any
Tags: segmentation fault

[7 Mar 2008 9:45] Karsten Thygesen
Description:
Hi

When I do a query like this:

select p.id,count(replies.id),max(replies.posted),(select count(*) from post newPosts where newPosts.posted >= now() and newPosts.sequence like concat(p.sequence, '.%')) as newReplies from post p left join post replies on ( replies.sequence = concat(p.sequence, '.%')) where p.id in (371324,376782);

mysqld crashes with a segmentation fault. It happens on 5.1.23 GPL on FreeBSD-6.2 and on 5.1.23 binary from MySQL on Linux as well.

The table is on engine=NDB and the post table is created as:

mysql> show create table post\G
*************************** 1. row ***************************
       Table: post
Create Table: CREATE TABLE `post` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `inreplyto` int(11) DEFAULT NULL,
  `modified` datetime DEFAULT NULL,
  `modifier` varchar(255) DEFAULT NULL,
  `posted` datetime NOT NULL,
  `state` int(11) NOT NULL,
  `topic` int(11) DEFAULT NULL,
  `body` text,
  `sequence` varchar(4352) DEFAULT NULL,
  `title` varchar(255) DEFAULT NULL,
  `owner` int(11) NOT NULL,
  `posteremail` varchar(255) DEFAULT NULL,
  `postername` varchar(255) DEFAULT NULL,
  `posterprofile` int(11) DEFAULT NULL,
  `sendMailOnReply` bit(1) DEFAULT NULL,
  `created_timestamp` datetime DEFAULT NULL,
  `created_by` int(11) DEFAULT NULL,
  `last_modified_timestamp` datetime DEFAULT NULL,
  `last_modified_by` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `post_posted_idx` (`posted`),
  KEY `post_seq_idx` (`sequence`(3072)),
  KEY `FK3498A080E99FFD` (`owner`),
  KEY `FK3498A09EC30BEE` (`topic`),
  KEY `FK3498A032F4559F` (`last_modified_by`),
  KEY `FK3498A08F262483` (`posterprofile`),
  KEY `FK3498A0A8729330` (`inreplyto`),
  KEY `FK3498A04608B149` (`created_by`)
) /*!50100 TABLESPACE ts1 STORAGE DISK */ ENGINE=ndbcluster DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

The query works on an empty table. I have 262 rows of data (will be attached here). If I drop the post_posted_idx index, then the query also succeeds. If I move the table to InnoDB, it also works. No errrors from ndbd.

How to repeat:
create table in NDB, load data and execute query.
[7 Mar 2008 9:50] Karsten Thygesen
dump of table post

Attachment: post.outfile (application/octet-stream, text), 276.10 KiB.

[7 Mar 2008 9:52] Karsten Thygesen
Error log from mysqld:

080307 10:39:32 - mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=9
max_threads=151
threads_connected=9
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 133237 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

080307 10:39:32 mysqld_safe mysqld restarted
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
080307 10:39:32  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Last MySQL binlog file position 0 430894, file name ./mysql-bin.000035
080307 10:39:32  InnoDB: Started; log sequence number 0 33604805
080307 10:39:36 [Note] Starting MySQL Cluster Binlog Thread
080307 10:39:36 [Note] Event Scheduler: Loaded 0 events
080307 10:39:36 [Note] /usr/local/libexec/mysqld: ready for connections.
Version: '5.1.23-rc'  socket: '/tmp/mysql.sock'  port: 3306  FreeBSD port: mysql-server-5.1.23
[17 Mar 2008 12:50] Frazer Clement
Proposed patch

Attachment: bug35137.patch (text/x-patch), 15.81 KiB.

[17 Mar 2008 12:58] Frazer Clement
Proposed fix attached.  
Problem 
The problem was that MRR range iteration assumed that between calls to read_multi_range_next(), the completed NdbOperations would remain accessible.  However, depending on how the first / subsequent range results are processed, there can be further execute() calls on the transaction object which free the completed operations, so that when read_multi_range_next() is called again the next NdbOperation object has been released, and the NdbRecAttr list is NULL etc. 
Fix 
Fix is to process and unpack all primary / unique key operations immediately after execute into MRR buffer.  
A byte per buffer entry is used to indicate whether data was found or not.
An additional fix is made to the partition pruning code in read_multi_range_first() which skips buffer space not skipped when handling SKIP_RANGE in read_multi_range_next().
A new testcase is added to ndb_read_multi_range.

I believe that this fix needs to be propagated onwards from 5.1 via -telco.  6.0 has a fix due to a rewrite, but no testcase.
[19 Mar 2008 12:40] Frazer Clement
Simplified patch after review

Attachment: bug35137-2.patch (text/x-patch), 13.46 KiB.

[19 Mar 2008 14:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/44229

ChangeSet@1.2544, 2008-03-19 14:10:47+00:00, frazer@forth.ndb.mysql.com +5 -0
  Bug#35137 : Query crashed mysqld
  
  Nested MRR reads failed as the second MRR range released the first MRR range's unprocessed operations.
  Fix is to process all outstanding operations upfront before returning first result.
  New testcase added to ndb_read_multi_range
[20 Mar 2008 14:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/44288

ChangeSet@1.2564, 2008-03-20 14:52:07+00:00, frazer@forth.ndb.mysql.com +1 -0
  Bug #35137
  Merge fix
[3 Apr 2008 15:22] Jon Stephens
Documented in the 5.1.23-ndb-6.3.11 changelog as follows:

        Nested multi-range reads failed as the second multi-range read released
        the first read's unprocessed operations, sometimes leading to a SQL node
        crash.

Left status as Patch Approved pending additional merges.
[4 Apr 2008 22:37] Jon Stephens
Also noted fix in the 5.1.23-ndb-6.2.15 changelog.
[9 Jul 2008 7:26] Jon Stephens
Already documented, now closing.
[12 Dec 2008 23:26] Bugs System
Pushed into 6.0.6-alpha  (revid:sp1r-frazer@forth.ndb.mysql.com-20080320145207-62608) (version source revid:sp1r-tomas@poseidon.ndb.mysql.com-20080516085603-30848) (pib:5)