Bug #26449 Deadlock with ndbcluster engine and subqueries
Submitted: 16 Feb 2007 15:49 Modified: 22 Jun 2007 15:16
Reporter: Matteo Brusa
Status: Can't repeat
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S1 (Critical)
Version:5.0.33 OS:Linux (Debian etch (testing))
Assigned to: CPU Architecture:Any
Tags: cluster, deadlock

[16 Feb 2007 15:49] Matteo Brusa
Description:
This problem occurs with both the current binary package (5.0.27) and the source-only release (5.0.33).
I execute a SELECT query containing subqueries on the same machine over 2 different connections. After a few seconds mysql hangs; mysqladmin processlist shows the queries stuck. Every other connection attempt fails.

How to repeat:
I run this command in 2 terminal windows:
while echo <query> | mysql test; do true; done
The actual query is quite long; if needed, I'll try to produce a good example.
[16 Feb 2007 18:00] Hartmut Holzgraefe
Yes, please provide an example; we won't be able to handle this bug without one.
[19 Feb 2007 13:28] Matteo Brusa
The attached crash8.sql file contains a set of data that causes the system to crash.
To replicate the crash, execute the following query in 2 shells:
 while echo "select *,
 (select count(*) from tasksqueue as tq where tq.jobs_id=jobs.id and status=0) as queued
   from jobs left join tasksqueue on tasksqueue.jobs_id=jobs.id" | mysql test ; do true;done
I tried to remove some more uninteresting fields from the "tasksqueue" table, but as soon as I remove any of them the problem disappears. Weird.
As soon as the system hangs, "mysqladmin processlist" shows the "Time" field of the 2 queries increasing.
I had to kill the mysqld processes with -9; they don't respond to a normal kill signal.
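For anyone retrying this, the two-terminal procedure above can be sketched as a single script. This is only a sketch: the function names run_loop and start_loops are made up here, and it assumes a mysql client that can reach the "test" database without extra options, as in the report. Only the query and the loop itself come from the report.

```shell
#!/bin/sh
# Query copied from the report above.
QUERY='select *,
 (select count(*) from tasksqueue as tq where tq.jobs_id=jobs.id and status=0) as queued
   from jobs left join tasksqueue on tasksqueue.jobs_id=jobs.id'

# Hypothetical helper: repeat the query until the mysql client fails,
# which is what the loop in the report does.
run_loop() {
    while echo "$QUERY" | mysql test >/dev/null; do true; done
}

# Hypothetical helper: start N loops in the background (the report uses 2)
# and print the PID of each one so they can be stopped later.
start_loops() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        run_loop &
        echo "$!"
        i=$((i + 1))
    done
}
```

Calling "start_loops 2" mirrors the two terminals. Running "mysqladmin processlist" from another shell then shows whether the "Time" field of both queries keeps increasing.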
[19 Feb 2007 13:28] Matteo Brusa
set of data to replicate the problem

Attachment: crash8.sql (application/octet-stream, text), 3.40 KiB.

[19 Feb 2007 23:05] Hartmut Holzgraefe
I ran the test case with all nodes on the same machine for about an hour without problems; I will now try a distributed setup ...
[20 Feb 2007 0:19] Hartmut Holzgraefe
I haven't been able to reproduce this in a 4-machine setup either.
I was running:

- the management host on machine 1
- one data node each on machine 2 and 3
- mysqld and two shells running the sample loop on machine 4

All machines have 2 dual-core CPUs, so even with mysqld and
2 mysql clients all running on one machine there should still
be full parallelism of these processes.

Could you explain your cluster setup in more detail and provide 
your config.ini so that we might try to reproduce it more closely?
[20 Feb 2007 9:08] Matteo Brusa
Hardware: 
Node 1:
Dual processor Intel(R) Xeon(TM) CPU 2.80GHz family 15 model 4 stepping 3 with hyperthreading, 1 GB memory

Node 2:
Single processor Intel(R) Xeon(R) CPU 5110 @ 1.60GHz family 6 model 15 stepping 6 with hyperthreading, 2 GB memory

Node 3:
Dual processor Intel(R) Xeon(TM) MP CPU 2.00GHz family 15 model 2 stepping 5 with hyperthreading, 2 GB memory

All nodes are running Debian testing (etch).
On node 2, MySQL is installed from the Debian package mysql-server-5.0 (5.0.32-3).
On node 3, MySQL is installed from sources (5.0.33), compiled as:
CFLAGS="-O3" CXX=gcc CXXFLAGS="-O3 -felide-constructors \
            -fno-exceptions -fno-rtti" ./configure \
            --prefix=/usr/local/mysql --enable-assembler \
            --with-mysqld-ldflags=-all-static --with-ndbcluster 

As you can see from config.ini, the two machines are also connected with a crossover cable.
Let me know if you need any more info or clarification.
[20 Feb 2007 9:08] Matteo Brusa
config.ini

Attachment: config.ini (application/octet-stream, text), 660 bytes.

[21 Feb 2007 12:19] Matteo Brusa
To avoid any possible problem, I installed on node 2 the same MySQL as on node 3: 5.0.33 from sources.
The problem persists. If there's any debug output I could provide, or any compilation flag I could test, please let me know.
[15 May 2007 5:18] Tomas Ulin
Matteo,

We're guessing this might be blob related. Can you try changing Blob/Text columns to Varbinary/Varchar and see if you still see the problem?

BR,

Tomas
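The column change suggested above could be tried along these lines. This is a sketch under assumptions: the helper name alter_column_stmt is made up, and the column names and lengths in the usage comments are placeholders, because the real definitions live in crash8.sql and are not shown in this thread. Note that MODIFY replaces the whole column definition, so attributes such as NOT NULL or a default would have to be restated in the type argument.

```shell
#!/bin/sh
# Hypothetical helper: print an ALTER TABLE statement that redefines one column.
# Arguments: table name, column name, new column definition.
alter_column_stmt() {
    printf 'ALTER TABLE %s MODIFY %s %s;\n' "$1" "$2" "$3"
}

# Example usage; "payload" and "rawdata" are placeholder column names,
# and 255 is a placeholder length:
#   alter_column_stmt tasksqueue payload 'VARCHAR(255)'   | mysql test
#   alter_column_stmt tasksqueue rawdata 'VARBINARY(255)' | mysql test
```

Printing the statement first, rather than piping it straight to mysql, makes it easy to review each change before applying it to the test data.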
[18 May 2007 8:06] Matteo Brusa
Hi,
I changed the queries some time ago to avoid all those joins.
The system is now in production, so I cannot test your suggestion.
Thanks anyway.