Bug #26142 Random empty SELECT results on Cluster (between 10 in 10,000 and 2 in 200,000)
Submitted: 7 Feb 2007 11:43 Modified: 26 Feb 2007 18:24
Reporter: Nicola Worthington
Status: Analyzing
Category:Server: Cluster Severity:S2 (Serious)
Version:mysql-5.0 OS:Linux (Linux methone 2.6.9-42.0.3.ELsmp)
Assigned to: Hartmut Holzgraefe Target Version:
Tags: duplicate keys, failures, SELECT, cluster, 5.0.27-max
Triage: Triaged: D2 (Serious) / R6 (Needs Assessment) / E6 (Needs Assessment)

[7 Feb 2007 11:43] Nicola Worthington
Description:
We create a simple database with 26 rows, A through Z (see the example tables in "How to
repeat"), then run a continuous loop that selects A through Z over and over to pull the ID
from each row. Occasionally a SELECT returns an empty result set (no rows) when trying
to extract an ID that we know is there.
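
For readers without the attachments, the loop described above can be sketched roughly as
follows. This is a minimal Python sketch, not the attached test.pl: it uses an in-memory
SQLite database as a stand-in for the NDB tables, and the table name items, the columns
id and name, and both function names are hypothetical.

```python
import sqlite3
import string


def make_test_db():
    """Build a 26-row table, one row per letter a..z.

    Hypothetical schema; the real one is in the attached test.sql.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.executemany(
        "INSERT INTO items (name) VALUES (?)",
        [(letter,) for letter in string.ascii_lowercase],
    )
    conn.commit()
    return conn


def run_select_loop(conn, loops):
    """Select every letter `loops` times and record any empty result."""
    queries = 0
    failures = []  # (loop number, letter) for each SELECT that returned no row
    for loop in range(1, loops + 1):
        for letter in string.ascii_lowercase:
            row = conn.execute(
                "SELECT id FROM items WHERE name = ?", (letter,)
            ).fetchone()
            queries += 1
            if row is None:
                # The row is known to exist, so an empty result is a failure.
                failures.append((loop, letter))
    return queries, failures


conn = make_test_db()
queries, failures = run_select_loop(conn, loops=100)
print(f"queries {queries}, failures {len(failures)}")
```

Against SQLite this loop should never record a failure. On the affected cluster, the
equivalent loop intermittently gets no row back for a key that exists.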

When SELECT failures occur, they come in alternating batches of successes and failures.
For example, B and D will fail but C will succeed.

These errors only happen when the tables use the NDB cluster engine. If the tables are
MyISAM, for example, on the same mysqld instance (still connected to the cluster), the
problem does not happen.

The cluster consists of 3 servers (all dual-core, 2-CPU boxes with 8GB of RAM and fast
disks). Two of those boxes run 2 ndbd nodes each, making a 4 ndbd node system. There is
also 1 mysqld instance on each of those two machines. The 3rd machine is purely a
management node.

Example output from the test.pl script (see "How to repeat") on our system:

now select items that should exist - loop 2377 (queries 185412, failures 10)
now select items that should exist - loop 2378 (queries 185490, failures 10)
now select items that should exist - loop 2379 (queries 185568, failures 10)
now select items that should exist - loop 2380 (queries 185646, failures 10)
FAILED -> m
ERROR #11 - ERROR: cache_exists SELECT m returned no ID - SQL Said: fetch() without execute()
FAILED -> o
ERROR #12 - ERROR: cache_exists SELECT o returned no ID - SQL Said: fetch() without execute()
FAILED -> q
ERROR #13 - ERROR: cache_exists SELECT q returned no ID - SQL Said: fetch() without execute()
now select items that should exist - loop 2381 (queries 185718, failures 13)
now select items that should exist - loop 2382 (queries 185796, failures 13)
now select items that should exist - loop 2383 (queries 185874, failures 13)
now select items that should exist - loop 2384 (queries 185952, failures 13)

How to repeat:
Will attach to this bug:

+ The /etc/my.cnf on the mysqld, management and ndb nodes
+ The config.ini for the cluster
+ Database schema .SQL file
+ test.pl script to demonstrate incorrect empty SELECT results
+ Results from a run of test.pl on our system

[7 Feb 2007 11:44] Nicola Worthington
debug.txt

Attachment: debug.txt (text/plain), 280.00 KiB.

[7 Feb 2007 11:45] Nicola Worthington
Perl script to demonstrate empty SELECT results

Attachment: test.pl (application/x-perl, text), 4.47 KiB.

[7 Feb 2007 11:46] Nicola Worthington
Database schema

Attachment: test.sql (text/x-sql), 937 bytes.

[7 Feb 2007 11:47] Nicola Worthington
Cluster configuration file

Attachment: config.ini (application/octet-stream, text), 1.14 KiB.

[7 Feb 2007 11:48] Nicola Worthington
my.cnf file

Attachment: my.cnf (application/octet-stream, text), 1.82 KiB.

[12 Oct 10:31] Jonas Oreland
can't repeat ?