Bug #33534 Bad performance of INSERT's in auto_incremented tables
Submitted: 27 Dec 2007 13:12    Modified: 20 Feb 2008 20:57
Reporter: Mikael Ronström       Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine    Severity: S2 (Serious)
Version: MySQL Cluster CGE 6.3    OS: Any
Assigned to: Martin Skold    CPU Architecture: Any

[27 Dec 2007 13:12] Mikael Ronström
Description:
I've got two cases:
1) A normal auto_increment table
I perform multi-insert statements
When performing them in one thread only each multi-insert takes around
3 milliseconds.
When performing them in 2 threads simultaneously each multi-insert usually
takes 35 milliseconds.
The auto_increment column is set to NULL so that an auto-increment value is
generated.

Occasionally these execute faster; possibly this happens when the connections
are not executing in the same MySQL Server. This observation is based
on the fact that when executing this using the JDBC path:
jdbc:mysql:loadbalance://dator6:3306,dator6:3307/test?...
the performance drop occasionally disappeared, whereas when using the
JDBC path
jdbc:mysql://dator6:3306/test?....
the performance drop is very consistent.
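
A minimal sketch of the case 1 repro in Java follows (the attached TPCW files
are the actual test; the table name, columns, credentials and iteration counts
below are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class Case1Repro
{
  // Single-server URL; swap in the loadbalance URL above to compare.
  static final String URL = "jdbc:mysql://dator6:3306/test";

  public static void main(String[] args) throws Exception
  {
    Runnable worker = () -> {
      try (Connection c = DriverManager.getConnection(URL, "root", "");
           Statement s = c.createStatement())
      {
        // Build a 20-row multi-insert with NULL in the auto_increment
        // column so the server generates the values.
        StringBuilder sql = new StringBuilder("INSERT INTO t (id, payload) VALUES ");
        for (int i = 0; i < 20; i++)
          sql.append(i > 0 ? "," : "").append("(NULL, 1)");
        for (int round = 0; round < 144; round++)
        {
          long start = System.nanoTime();
          s.executeUpdate(sql.toString());
          System.out.println(((System.nanoTime() - start) / 1000000) + " ms");
        }
      }
      catch (Exception e) { e.printStackTrace(); }
    };
    Thread t1 = new Thread(worker), t2 = new Thread(worker);
    t1.start(); t2.start();
    t1.join(); t2.join();
  }
}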

2) An auto_increment table where INSERTs assign values to the auto_increment
column

In this case the auto_increment column was assigned a value.
One thread would e.g. assign values in the range 1-2880 and
the other one in the range 2881-5760.
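
As a sketch, case 2 replaces the worker above with one that supplies explicit
ids from a disjoint per-thread range (again, table and column names are
assumptions):

  // Case 2: thread 0 covers ids 1-2880, thread 1 covers 2881-5760,
  // as 144 multi-inserts of 20 rows each.
  static Runnable rangeWorker(final int threadNo)
  {
    return () -> {
      int base = threadNo * 2880;
      try (Connection c = DriverManager.getConnection(URL, "root", "");
           Statement s = c.createStatement())
      {
        for (int first = base + 1; first <= base + 2880; first += 20)
        {
          StringBuilder sql = new StringBuilder("INSERT INTO t (id, payload) VALUES ");
          for (int i = 0; i < 20; i++)
            sql.append(i > 0 ? "," : "").append("(").append(first + i).append(", 1)");
          s.executeUpdate(sql.toString());
        }
      }
      catch (Exception e) { e.printStackTrace(); }
    };
  }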

The same behaviour was seen here, except that the time per multi-insert
dropped to 300 ms instead of 3 ms, so here the performance drop was of
the order of 100x.

Again, the performance was occasionally OK when using the load-balance
variant of the JDBC driver. So again it seems this behaviour is only
seen when executing in one MySQL Server.

How to repeat:
I will upload the Java files needed for the test. These are
part of a TPC-W benchmark program I put together, based on work
done at the University of Wisconsin.

Suggested fix:
I have no real idea at this moment.
[27 Dec 2007 13:14] Mikael Ronström
main program

Attachment: TPCW_Populate.java (text/x-java), 3.05 KiB.

[27 Dec 2007 13:15] Mikael Ronström
Thread object class

Attachment: TPCW_Populate_Thread.java (text/x-java), 1.14 KiB.

[27 Dec 2007 13:15] Mikael Ronström
File where the real action happens

Attachment: TPCW_Populate_exec.java (text/x-java), 50.73 KiB.

[27 Dec 2007 14:12] Mikael Ronström
Tested it with the latest bk version of CGE 6.3,
with ndb_autoincrement_prefetch_sz set to 32 and to 20
(the multi-inserts insert 20 rows at a time).

The behaviour stays the same very consistently.
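
For re-running this, a sketch of setting the prefetch size per session before
the run (assuming the variable has session scope in this build; connection is
the JDBC connection from the repro sketch above):

  try (Statement s = connection.createStatement())
  {
    // 20 to match the 20-row multi-inserts, or 32 to compare
    s.execute("SET ndb_autoincrement_prefetch_sz = 20");
  }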
[27 Dec 2007 21:26] Mikael Ronström
Hi,
I have now verified where the problem lies:
every time a thread goes off to update the SYSTAB_0 table, it consumes approx. 20 milliseconds.
While this happens, the other threads get blocked fairly quickly on the mutex which is held to ensure
that only one thread at a time can update this counter (why this is needed beats me; aren't the NDB
interpreted program and the fact that we're using locking enough to ensure that the updates are
serialised in a proper order?).

Why does it take 20 milliseconds? I have no clue as to the real reason so far. But I can explain when it
happens.

It happens in the following code (get_auto_increment in ha_ndbcluster.cc):

  for (;;)
  {
    // The guard takes the per-share mutex protecting the cached tuple_id range
    Ndb_tuple_id_range_guard g(m_share);
    // Either read the current value (when the statement supplies its own
    // values) or fetch/advance the cached range
    if (m_skip_auto_increment &&
        ndb->readAutoIncrementValue(m_table, g.range, auto_value) ||
        ndb->getAutoIncrementValue(m_table, g.range, auto_value, cache_size, increment, offset))
    {
      // Retry on temporary NDB errors
      if (--retries &&
          ndb->getNdbError().status == NdbError::TemporaryError)
      {
        my_sleep(retry_sleep);
        continue;
      }
      const NdbError err= ndb->getNdbError();
      sql_print_error("Error %lu in ::get_auto_increment(): %s",
                      (ulong) err.code, err.message);
      *first_value= ~(ulonglong) 0;
      DBUG_VOID_RETURN;
    }
    break;
  }

When this code is executed with one thread only, it executes efficiently and without any issues.
When it executes with 2 threads, one thread at a time executes the query while the other
is blocked on the mutex. For some obscure reason, in this scenario it takes 20 milliseconds for the
thread holding the lock to be awakened after returning from NDB.

A very simple fix is to simply remove the mutex, since I actually have no idea why we need a
mutex here; with the mutex removed, performance was quite OK again, so that is the current
workaround. Unfortunately it doesn't work as a real solution, since the state involved here is shared.

I have also verified that the problem exists on 3 different OS versions.

1) Ubuntu 7.04 using 2.6.20-15
2) Fedora Core 8 using 2.6.23.8-63.fc8
3) CentOS 4.4 using 2.6.9

So it doesn't seem to be random behaviour specific to my machine.
[27 Dec 2007 23:21] Mikael Ronström
I made a final check to see whether the problem would go away by having the parallel threads work
on different tables. This has been shown to solve issues like this before, but it didn't here. So the issue
remains that as soon as there is contention on the Ndb_tuple_id_range_guard on the NDB_SHARE,
performance drops through the floor.
[7 Jan 2008 15:58] Johan Jerrang
We are experiencing the same problem, but with single insert statements.

MySQL version: 5.1.22
OS: RHEL4 x86_64

Table definition:
CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `dummy` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1

Query: 
INSERT INTO t1 (dummy) VALUES($(NUMBER));

The cluster has been up and running for about 3 weeks.

Running against 1 SQL server in a cluster with 2 NDB nodes:
1 thread: total throughput:[792.824] (queries/sec)
2 threads: total throughput:[132.849] (queries/sec)
10 threads: total throughput:[671.762] (queries/sec)

I tried to restart the whole cluster, and that actually seems to make things a bit better (especially on 2 threads):
1 thread: total throughput:[892.383] (queries/sec)
2 threads: total throughput:[1286.23] (queries/sec)
10 threads: total throughput:[1498.13] (queries/sec)

Removing the auto_increment gives the following figures:
1 thread: total throughput:[1397.96] (queries/sec)
2 threads: total throughput:[2384.74] (queries/sec)
10 threads: total throughput:[7387.53] (queries/sec)
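
For context, figures like these can be produced with a harness along the
following lines (a sketch; host name, credentials and run length are
assumptions, not the tool actually used here):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.concurrent.atomic.AtomicLong;

public class InsertThroughput
{
  public static void main(String[] args) throws Exception
  {
    final int threads = Integer.parseInt(args[0]);
    final long runMillis = 30000;
    final AtomicLong queries = new AtomicLong();
    Thread[] pool = new Thread[threads];
    for (int i = 0; i < threads; i++)
    {
      pool[i] = new Thread(() -> {
        try (Connection c = DriverManager.getConnection(
                 "jdbc:mysql://sqlnode:3306/test", "root", "");
             Statement s = c.createStatement())
        {
          long end = System.currentTimeMillis() + runMillis;
          int n = 0;
          while (System.currentTimeMillis() < end)
          {
            s.executeUpdate("INSERT INTO t1 (dummy) VALUES (" + (n++) + ")");
            queries.incrementAndGet();
          }
        }
        catch (Exception e) { e.printStackTrace(); }
      });
      pool[i].start();
    }
    for (Thread t : pool)
      t.join();
    System.out.printf("total throughput:[%.2f] (queries/sec)%n",
                      queries.get() * 1000.0 / runMillis);
  }
}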
[25 Jan 2008 9:40] Martin Skold
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

http://lists.mysql.com/commits/41143

ChangeSet@1.2689, 2008-01-23 11:40:06+01:00, mskold@mysql.com +4 -0
  Ndb.hpp, Ndb.cpp, ha_ndbcluster.cc:
    Add a check whether setting an auto_increment field will change its next value before
retrieving the tuple_id_range lock. This avoids hitting locks when updating auto_increment
values to a lower value than the current maximum. This is useful when loading a table with
auto_increment where one loads the highest-numbered PKs first and then proceeds
backwards to the first. This can then be achieved with the same performance as a normal
insert without auto_increment.
  ndb_restore.result:
    Updated result file
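
The load pattern this optimizes looks like the following sketch: ids supplied
in descending order, so only the first row raises the table's next
auto_increment value and the later rows can skip the tuple_id_range lock
(table and column names are assumptions, matching the repro sketches above):

  try (Connection c = DriverManager.getConnection(URL, "root", "");
       Statement s = c.createStatement())
  {
    // Only id = 5760 changes the next value; the rest stay below the maximum.
    for (int id = 5760; id >= 1; id--)
      s.executeUpdate("INSERT INTO t (id, payload) VALUES (" + id + ", 1)");
  }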
[25 Jan 2008 9:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/41250

ChangeSet@1.2691, 2008-01-25 10:43:30+01:00, mskold@mysql.com +4 -0
  ndb_restore.result, ndb_restore.test:
    Changed to use information_schema to check auto_increment
  Ndb.cpp:
    Bug #33534   Bad performance of INSERT's in auto_incremented tables: Saving highest seen value when setting auto_increment fields
  ndb_auto_increment.result:
    Regenerated result
[12 Feb 2008 14:20] Jon Stephens
Documented in the 5.1.23-ndb-6.3.9 changelog as follows:

        Statements executing multiple inserts performed poorly on NDB
        tables having AUTO_INCREMENT columns.

Left in PQ status pending further merges.
[12 Feb 2008 16:04] Jon Stephens
Also documented for 5.1.23-ndb-6.2.12.
[20 Feb 2008 16:02] Bugs System
Pushed into 5.1.24-rc
[20 Feb 2008 16:03] Bugs System
Pushed into 6.0.5-alpha
[20 Feb 2008 20:57] Jon Stephens
Also documented for 5.1.24 and 6.0.5.