MySQL Bugs: #44927: allow storage engine to maintain auto-increment counter

Bug #44927	allow storage engine to maintain auto-increment counter
Submitted:	18 May 2009 8:58	Modified:	4 Aug 2009 6:08
Reporter:	Bradley Kuszmaul (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Storage Engine API	Severity:	S4 (Feature request)
Version:		OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	auto_increment, Contribution

Description:
The TokuDB storage engine maintains its own auto-increment counter, and does not require that the auto-increment field be the first field in some key. It would be helpful if the storage-engine API provided a way to indicate that we do this.

Rationale: The TokuDB engine implements auto-increment in a way that tries to faithful to the semantics of SQL 2003 sequences.

According to the MySQL manual MyISAM and BDB provide interesting semantics for auto increment. The relevant behavior is:

For MyISAM and BDB tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups.

InnoDB provides simpler semantics. InnoDB requires that the auto incrememnt field be the first field in any key.

To implement the MyISAM behavior seems to require a point query (which would slow down insertions, especially for a fractal-tree storage engine in which insertions are much cheaper than point queries). Two reasonable alternatives present themselves:

1. Do what InnoDB does. There is a precedent (InnoDB) for this behavior being good enough.
2. Allow the auto increment column to appear anywhere, but make its value independent of preceding fields. Simply increment it as a single counter. This approach allows people to design their primary keys without worrying about duplicates and then add an auto increment column to differentiate any duplicates that actually appear.

According to the draft for the SQL:2003 standard (the link for which can be found via http://en.wikipedia.org/wiki/SQL:2003) the standard SQL meaning for a sequence is nothing like MyISAM, and is somewhat similar to InnoDB.

In SQL:2003, every table has a hidden sequence generator. A sequence generator is simply a counter (with some extra information, such as how much to increment the counter by, and what to do when the counter overflows).

The sequence generator does not increment as part of a transaction. If a transaction increments the sequence generator and then the transaction aborts, the counter remains incremented.

There are no restrictions how a auto-generated field can be used as part of key.

The SQL 2003 standard differs from InnoDB's behavior. For example, if we delete the last row, shut down the server, restart the server, and insert a row, the same sequence number gets reused in InnoDB. According to the SQL:2003 standard, the same sequence number should not be reused in this case (because a generator remembers the previously used number).

Note: in SQL:2003 standard, the buzzwords are "sequence generator" and "identity column".

In our implementation, we store the sequence number metadata, instead
of seeking to the end to find the last item and proceeding from there.

We propose to add a virtual method to allow a storage engine to indicate that it can manage the auto increment values.

How to repeat:
This is a feature request to make our storage engine work right.

a patch to the storage engine API to fix this

Attachment: autoincrement-patch (application/octet-stream, text), 1.30 KiB.

I attached a patch for this.

Thank you for the contribution.

Bradley, I am not sure we want to have this auto-increment behavior -
a counter in the middle of the key that does not depend on previous
values. Even less the users would appreciate the confusion of having
it engine-dependent.

What's the problem of doing the MyISAM way ? It'd be something like

void ha_tokudb::get_auto_increment( ... )
{
  if (!table->s->next_number_key_offset)
  {
     // increment and return your sequence number
  }
  else
    handler::get_auto_increment(...);
}

And that would be *exactly* what MyISAM does - when the auto-inc
column is the first in a key, it's a "sequence generator", the values
are not reused (not after restart, not after delete). There's no cost
of extra index lookup either. When the auto-inc column is in the
middle of the key, the number is in a group, depends on a prefix.

This is the familiar behavior that our users find useful.

The MySQL behavior is problematic for us, since it would slow down
insertions a lot. I am not talking about a factor of two, but rather
a factor of 100. In the TokuDB engine, when unique-key checking is
disabled, 99% of insertions do not require a disk seek. But consider this table:

CREATE TABLE e (a BIGINT NOT NULL PRIMARY KEY, b BIGINT NOT NULL AUTO_INCREMENT, key k2(a,b));

If I choose random values of a, then for a bigger-than-memory table,
it will usually require a disk seek. Disk seeks are two
orders-of-magnitude slower than in-memory operations, so the
performance would slowed down by a two orders of magnitude.

This may seem counterintuitive. Much of our intuition about index
performance comes from the B-tree index, in which inserting a record is
about the same cost as looking up the record. In the TokuDB engine,
however, inserting a record is 200x faster than looking up a record,
so we must take some care to avoid "hidden look up" operations. The
MyISAM non-first-key behavior requires a "hidden look up".

If we were to go the MyISAM route, we would have to explain to our
customers the performance implications of using auto-increment keys,
and some would inevitably misuse it in ways that caused mysterious (to
them) performance bugs.

It is true that our behavior is different from the other storage
engines, but it is much easier to observe and explain an unexpected
semantic behavior than an unexpected performance behavior.

The behavior is already storage-engine dependent. In InnoDB if I
delete the highest-numbered auto-increment field, and restart the
server, then the counter gets reused. Most of our customers are using
InnoDB, so the behavior of InnoDB is more relevant than the behavior
of MyISAM.

Since the behavior is storage-engine dependent, it is slightly messy
that handler.cc knows that auto-increment keys with
next_number_keypart==0 are treated differently from other
auto-increment keys. This patch, which changes only one line of code,
allows the storage engine to control the semantics of auto-increment.
Since MySQL puts the semantics of auto-increment into the storage
engine, I think it makes sense for the storage engine to actually have
control.

The behavior of MyISAM and InnoDB are neither of them like SQL:2003,
which allows you to use an autoincremented column without putting it
into any kind of index. That's not necessarily a criticism of MyISAM
or InnoDB, but it does provide a rationale for allowing storage
engines to implement our behavior.

The patch that I attached allows a storage engine to implement
SQL:2003-like behavior, which we have found is much easier to explain
to our customers than either the InnoDB or the MyISAM behavior.

I wouldn't worry much about the performance when auto_increment key is not the first in the key. It's pretty much a borderline case, rarely used.

But if you prefer to avoid the slowdown at all cost - just don't implement this mode, support only auto_increment at the start of the key. It's "good enough", as you wrote earlier.

That InnoDB doesn't remember the auto_inc value after restart is a known InnoDB limitation, not a feature, basically it's a bug that we've heard many complains about.

Sergei,

I originally implemented this feature in the TokuDB storage engine because I saw several customer scenarios that benefited from this feature. The main reason is that this feature allows users more flexibility in selecting a primary key. This increased flexibility leads to better performance. Here is a customer example we have run into numerous times.

We noticed several customers had table schemas of the following nature:

create table foo (id int auto_increment, time date,..., primary key (id), key (date));

Most queries ran would be something like "select * from foo where date < some_value AND date > other_value;". For storage engines with a clustered primary key (like TokuDB and InnoDB), it would be more efficient if the primary key was (date), instead of (id). However, the "date" field was not unique. Therefore, this was not possible.

With this feature, the user can define a primary key of (date, id). The primary key was guaranteed to be unique because an auto increment field was being padded at the end. The queries became faster because they consisted of a range query on a clustered primary table. Also, insertions sometimes became faster because a superflous index of (date) did not need to be maintained.

Although this scenario is a corner case now, I believe if users had the ability to place an auto increment field anywhere in the key without suffering performance degradation, they would take advantage of it.

I understand the worries of having different auto-increment behavior causing user confusion. However, the behaviors are already different. MyISAM is different than InnoDB. And because storage engines can implement functions like "get_auto_increment", auto increment behavior is already storage engine dependent.

Okay, but even if you really need this auto-increment behavior - why do you need the attached patch ? As you wrote above it's up to the engine to implement any auto-inc behavior. And indeed, InnoDB does it your way, if the auto-inc column is the second segment in a key it will get sequential values, independent from the first key part. Well, InnoDB doesn't define HA_AUTO_PART_KEY in the table_flags so MySQL doesn't allow auto-inc column to be second in a key, but I've tried to add this flag to ha_innodb.cc and the auto-inc worked exactly the way you wanted. Am I missing something ?

Suppose one did the following statements:

create table foo ( a int, b int auto_increment, primary key (a,b));
insert into foo (a) values (1),(2),(3),(4),(5),(6);

Without this patch, handler::get_auto_increment gets called six times, each time with nb_desired_values set to 6. It should only be called once, at the beginning of the statement.

With this patch, handler::get_auto_increment is called once at the beginning of the statement, with nb_desired_values set to 6.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

I'm in agreement with Zardosht, and this bug should still be open.
-Bradley

Zardosht, a couple of thoughts:

1. for consistency the new patch should use a table flag not a virtual method.
Like in

   if (!table_flags() & HA_AUTO_INC_IS_SINGLETON ...

2. but then, how it'll be different from HA_AUTO_PART_KEY ? The engine either supports different auto-inc sequences per distinct value in the first key column (like MyISAM) or it implements a straight sequence no matter where in the index the auto-inc column is (InnoDB). The differentiator is HA_AUTO_PART_KEY. And the patch could be simply

   if (!table_flags() & HA_AUTO_PART_KEY &&

I suppose...

Sergei,

A table flag sounds reasonable. However, I do not think we can simply reuse HA_AUTO_PART_KEY. Looking at the comment next to its declaration in handler.h, it states "/* auto-increment in multi-part key */". MyISAM sets this flag. I am not sure what it means, but I think it serves another purpose and cannot be overloaded.

I guess this means that a new flag is needed. That seems to be correct because the new flag will have a different meaning. It is meant to distinguish auto-inc behavior when the auto-inc field is not the first field in the key.

So, I think the right thing to do would be to declare a new flag HA_AUTO_INC_IS_SINGLETON, and make the patch you suggest.

HA_AUTO_PART_KEY used to mean that the engine supported auto-increment column
not only in the first key part. And practically this also meant MyISAM-like
behavior, a sequence generator was reset every time previous key parts changed their values.

I mean, if the engine supports auto increment at all, it can do it in one of
the two possible ways (in regards to multi-part keys) - MyISAM-like (sequence
generator is reset) or InnoDB like (sequence generator is not reset).

The engine was conveying this information to the MySQL via HA_AUTO_PART_KEY. If this flag was unset (InnoDB) MySQL was not allowing auto-increment column to be not the first key part.

Now you want to have InnoDB behavior (sequence generator is not reset) and still allow auto-increment column to be not the first key part. Fine, all that should be done, I suppose, is to remove the check from CREATE TABLE and allow auto-increment column to be not the first key part even if HA_AUTO_PART_KEY is unset.

Yes, we do want InnoDB behavior and still allow auto-increment column to be not the first key part, but we do not want to break existing storage engines. If you "remove the check from CREATE TABLE and allow auto-increment column to be not the first key part even if HA_AUTO_PART_KEY is unset", then InnoDB may break. InnoDB does not set HA_AUTO_PART_KEY, but does only allow auto-inc to be the first column of a key. 

Although I am having trouble finding the code now (and it may have changed recently), but I recall the way they know what autoinc value to start with upon opening a table is by doing an index_last on the appropriate index that has the autoinc field starting the key. If the autoinc field is not longer the start of the key, then index_last will not work.

Once again, the above statement is going off of memory, and it may have changed, or I may be wrong. I cannot find the InnoDB code to confirm.

Nevertheless, I was hoping to make a change that does not affect InnoDB and MyISAM, and I think reusing HA_AUTO_PART_KEY does that.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Still want this fix.

hello?