Bug #48753 Search an ordered index after update sometimes fails
Submitted: 13 Nov 2009 8:27 Modified: 4 Jan 2010 10:50
Reporter: Marco Tusa Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:mysql-5.1-telco-7.0 OS:Linux
Assigned to: CPU Architecture:Any
Tags: 7.0.9a, Orderd index, search, UPDATE

[13 Nov 2009 8:27] Marco Tusa
Description:
We have a cluster installation based on:
2 Machines hosting 2 data nodes each; 
1 ndb_mgmd and 1 MySQLd each.

Using ndbmtd with MaxNoOfExecutionThreads=4

One disk RAID1 15K.RPM
==================================================================
We are performing functional and stress test at client site Using LoadRunner, simulating the concurrent access of 70 users.

The test is performing several actions, one of them is an UPDATE and then a SELECT on the table describe below.

The actions are performed as follow:
BEGIN TRANSACTION
  UPDATE APM_E_SESSION_CAPTIVE_PROCESS_T set RID_PE='?' WHERE IP_ADDRESS = '?' and CAPTIVE = '?' and ORDERING ='?';
COMMIT;

  select * from APM_E_SESSION_CAPTIVE_PROCESS_T WHERE RID_PE='?';

Time elapsed between the two queries 100 ms.
---------------------------------------------------------------------
CREATE TABLE `APM_E_SESSION_CAPTIVE_PROCESS_T` (
  `IP_ADDRESS` varchar(15) NOT NULL,
  `CAPTIVE` varchar(45) NOT NULL,
  `ORDERING` int(11) NOT NULL,
  `CLI` varchar(12) NOT NULL,
  `RID` varchar(255) NOT NULL,
  `RID_PE` varchar(255) DEFAULT NULL,
  `MESSAGE_TYPE` varchar(45) NOT NULL,
  `STATUS` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`IP_ADDRESS`,`CAPTIVE`,`ORDERING`),
  KEY `IDX_RID_PE` (`RID_PE`),
  KEY `captive-process-captive-idx` (`CAPTIVE`),
  KEY `IDX_RID` (`RID`),
  KEY `IDX_CLI_ORDERING` (`CLI`,`ORDERING`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1;

======================================================================== 

What we have experienced is that 1-2% of the time this action is not successfully executed, because the SELECT statement was not returning the value as expected.
The SELECT is querying the table using the RID_PE attribute, which is indexed:
KEY `IDX_RID_PE` (`RID_PE`).

We know that the value is there and if we execute the same select adding the 
IP_ADDRESS attribute which is the left most value of the PK, then the select will return the correct record.

Modifying the table structure changing the IDX_RID_PE to be UNIQUE or to use HASH (and UNIQUE), the problem will not come up anymore.
The selct *always* report the expected record.

We are wondering why the different behavior which is not reporting correct result in the case we use ordered index.

--------------------------------------------------------
We suspect that a delay exists in the Index creation or update when not bound to a parallel HASH index like the UNIQUE.

--------------------------------------------------------
I have asked to the client to provide us the output of the ndb_error_reporter, I will upload it as soon as I have it. 

How to repeat:
Unfortunately the only way to replicate the issue as we had, is at client site, given the concurrent factors that generate it, and the tools use for the tests.

Suggested fix:
Work around:
if possible make it UNIQUE.
[13 Nov 2009 9:03] Jonas Oreland
just checking: how do you make sure that your 70 client access distinct data.
I.e that multiple threads don't use same rows ?

/Jonas
[16 Nov 2009 12:14] Marco Tusa
Jonas, the table is used to temporary store modified values, which are deleted after the update transaction.
The value of the attribute RID_PE is unique in respect of the IP_ADDRESS.

More the application has only one client accessing the value of RID_PE (which is also unique) in that table a time.

So they state that they cannot have more then one client accessing the same information (record) at the same time on this table.
[16 Nov 2009 14:13] Jørgen Austvik
Marco, do you have a support issue ticket number?
[20 Nov 2009 16:25] Marco Tusa
No
[5 Jan 2010 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".