Bug #47622 the new index is added before the existing ones in MySQL, but after one in SE
Submitted: 24 Sep 2009 16:01 Modified: 14 Oct 2010 13:39
Reporter: Sveta Smirnova Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB Plugin storage engine Severity:S3 (Non-critical)
Version:5.1 OS:Any
Assigned to: Jimmy Yang CPU Architecture:Any

[24 Sep 2009 16:01] Sveta Smirnova
Description:
This is split from bug #44571:

"[12 May 13:10] Marko Mäkelä
Case #2 (error message printed out by ha_innobase::info()) is a different mismatch between
the MySQL and InnoDB data dictionaries. table->s->keys==2, but ib_table->indexes.count==3,
because the first index is the generated clustered index (there is no PRIMARY KEY or
UNIQUE INDEX comprising NOT NULL columns). For some reason,
table->keys[0].name=="ruleacts$r$e$seq$act$a_id" but
ib_table->indexes.start->indexes.next.name=="ruleacts$arg_id". InnoDB assumes that the
secondary indexes are defined in the same order in both MySQL and InnoDB. In case #2, the
new index is added before the existing ones in MySQL, but after existing ones in InnoDB.
This could be fixed in MySQL, by adding the created indexes last in the .frm file. If
UNIQUE indexes must be listed before non-UNIQUE ones in the .frm file, then we must make
InnoDB follow the same order.
"

How to repeat:
See bug #44571
[17 Nov 2009 11:41] Sergey Vojtovich
A comment from sql_table.cc (sort_keys()):

  Sort keys in the following order:
  - PRIMARY KEY
  - UNIQUE keys where all column are NOT NULL
  - UNIQUE keys that don't contain partial segments
  - Other UNIQUE keys
  - Normal keys
  - Fulltext keys

  This will make checking for duplicated keys faster and ensure that
  PRIMARY keys are prioritized.
[26 Nov 2009 8:34] Marko Mäkelä
Bug #49034 is a duplicate of this.
[27 Nov 2009 11:08] Yasufumi Kinoshita
In innodb, PRIMARY KEY always at first and normal INDEX is added to last.
Only unique index is not sorted crrectly.

So, how about the following patch for now?

--- handler/ha_innodb.cc        2009-11-27 16:19:14.000000000 +0900
+++ handler/ha_innodb.cc        2009-11-27 19:49:21.000000000 +0900
@@ -2568,7 +2568,10 @@ innobase_alter_table_flags(
 {
        return(HA_ONLINE_ADD_INDEX_NO_WRITES
                | HA_ONLINE_DROP_INDEX_NO_WRITES
-               | HA_ONLINE_ADD_UNIQUE_INDEX_NO_WRITES
+               /* Current InnoDB doesn't sort unique indexes along mysqld's order
+                  It is dangerous to use index. So it is disabled until
+                  the bug http://bugs.mysql.com/47622 */
+               /* | HA_ONLINE_ADD_UNIQUE_INDEX_NO_WRITES */
                | HA_ONLINE_DROP_UNIQUE_INDEX_NO_WRITES
                | HA_ONLINE_ADD_PK_INDEX_NO_WRITES);
 }
[2 Dec 2009 7:39] Jimmy Yang
We accommodate this request by de-coupling the metadata sequence in cache and those on disk for innodb. Each time we open the table, and load index into memory, we check and make additional effort (if necessary) to sync up the innodb index sequence along with the sequence from mysql.
[3 Dec 2009 9:59] Marko Mäkelä
It seems that sorting the InnoDB data dictionary cache representation of the persistent InnoDB data dictionary records in SYS_INDEXES would generate a large number of problems. To name a few:

* In many places in row/row0*.c, the linked list of dict_table_t::indexes is traversed without any mutex protection.
 * It seems unreasonable to protect each traversal with dict_sys->mutex. That would likely generate a lot of mutex contention.
 * Because the doubly-linked list is never traversed in the backward direction, we could possibly sort the list by adapting a lock-free algorithm for singly-linked lists.
* When InnoDB is started, it can access tables before MySQL tries to access them.
 * trx_rollback_active() will roll back any transactions that were active when InnoDB was last shut down or killed
 * trx_purge() will remove delete-marked records pointed to by undo logs
* The mapping would not be adjusted until MySQL tries to access the InnoDB tables.
 * There could be unexpected consequences if the order of indexes suddenly changes. I am thinking about locking, transaction-level crash recovery and the idempotence of row operations.

All in all, decoupling SYS_INDEXES from the order of indexes in the .frm file will incur a performance penalty every time an InnoDB table is accessed. I would rather solve this bug at CREATE INDEX time.  I see two possible fixes:

(a) Always append the created indexes to the .frm file, no matter what. Adapt the MySQL query optimizer for this. (This may incur a performance penalty for every query execution.)
(b) Have InnoDB refuse to CREATE INDEX if the index would not be appended to the end. Workaround: ALTER TABLE … DROP INDEX …, DROP INDEX … for the indexes that would follow the to-be-created index in the MySQL sort order, and then ALTER TABLE … ADD INDEX …, ADD INDEX …, ADD INDEX ….

Solution (b) would add no complexity or performance penalty for normal operation. It would mean some inconvenience for the DBA, but I would tend to believe that any serious DBA would choose the inconvenience (having to DROP some indexes in order to be able to create an index) over consistently slower performance.

We could fine-tune solution (b) so that whenever there is a suitable gap in INDEX_ID in InnoDB, the index can be created by reusing that INDEX_ID.
[3 Dec 2009 10:31] Jimmy Yang
I concur the conclusion here. Re-ordering existing index metadata for creating new index is not a good idea in many senses. This could either involve work on existing indexes metadata on disk and possibly data reducing concurrency or introduce additional mapping require substantial synchronization maintainance.

We shall not change any existing index metadata or data information for creating new index. And any cached index ordering can be achieved easily in memory.

Thanks
Jimmy
[3 Dec 2009 11:22] Sergei Golubchik
Marko, I don't quite understand why you cannot have a mapping between MySQL index order and InnoDB index order. That is, each side uses its own ordering and the "interface layer" - that is, the handler - takes care of the mapping. Why would that affect the concurrency, internal InnoDB index order will stay the same ?

but if you want to do b) - fine, as it's a fix completely on InnoDB side, and does not require any MySQL changes, it's your call anyway.
[3 Dec 2009 14:02] Marko Mäkelä
ha_innodb.cc could implement the mapping from MySQL key numbers to dict_index_t* without touching the actual InnoDB data dictionary cache. A hash table implementing the mapping could be added to INNOBASE_SHARE.
[21 Jan 2010 14:45] Marko Mäkelä
There is a workaround for this bug:
SET OLD_ALTER_TABLE=ON;
This will enable the old table-copying ALTER TABLE (or CREATE INDEX or DROP INDEX).
[28 Jan 2010 16:10] Jimmy Yang
------------------------------------------------------------------------
r6526 | jyang | 2010-01-28 08:12:40 -0800 (Thu, 28 Jan 2010) | 8 lines

branches/zip: Add index translation table to map mysql index
number to InnoDB index structure directly. Fix Bug #47622:
"the new index is added before the existing ones in MySQL,
but after one in SE".
[6 Feb 2010 16:01] MySQL Verification Team
Can somebody confirm if the patch fixes another testcase?

----
drop table if exists `t1`;
create table `t1`(`a` int,`b` int,`c` int,`d` blob,
key (`d`(10)),key (`a`,`b`,`c`)) engine=innodb pack_keys=0;
alter table `t1` remove partitioning;
alter table `t1` partition by key(`d`)partitions 1;
----

In error log we have:
[ERROR] Index d of test/t1 has 2 columns unique inside InnoDB, but MySQL is asking statistics for 3 columns. <cut>
[8 Feb 2010 1:56] Jimmy Yang
Shane, "BLOB field seems not allowed in partition function":

mysql> create table `t1`(`a` int,`b` int,`c` int,`d` blob,
    -> key (`d`(10)),key (`a`,`b`,`c`)) engine=innodb;
Query OK, 0 rows affected (0.00 sec)

mysql> alter table `t1` remove partitioning;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql>
mysql> alter table `t1` partition by key(`d`)partitions 1;
ERROR 1502 (HY000): A BLOB field is not allowed in partition function
mysql>
mysql>  alter table `t1` partition by key(`d`)partitions 1;
ERROR 1502 (HY000): A BLOB field is not allowed in partition function
mysql> alter table `t1` partition by key(`c`) partitions 1;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0
[8 Feb 2010 5:54] MySQL Verification Team
Jimmy, nevermind the error on the client, that's expected to fail.  In innodb plugin 1.0.6 there is an error in the mysql error log after this test runs.. does it happen with this fix ?
[8 Feb 2010 6:04] Jimmy Yang
Shane, no error is printed with servers coming with the fix, for above scripts.

thanks
[22 Feb 2010 6:34] MySQL Verification Team
Hi Jimmy!

Can you check if another testcase works with the patch?
Problems seen were:
1. valgrind errors
   ==14918== Invalid read of size 4
   ==14918==    at 0x8515323: ha_innobase::open(char const*, int, unsigned int)      (ha_innodb.cc:3230)
2. 2G memory allocation attempted
3. error message:  [ERROR] Table ./test/t5 has a primary key in InnoDB data dictionary, but not in MySQL!

----
set old_alter_table=0;
drop table if exists `t5`;
create table `t5`(
`col1` int not null,
`col2` blob not null,
`col3` time not null)engine=innodb;
create index `idx1`on `t5`(`col1`);
create index `idx2` on `t5`(`col3`,`col1`);
create unique index `idx5` on `t5`(`col1`,`col2`(31));
analyze table t5;
check table t5 extended;
----
[22 Feb 2010 8:16] Marko Mäkelä
The patch does not fix that bug:

100222 10:12:06 [ERROR] Table ./test/t5 has a primary key in InnoDB data dictionary, but not in MySQL!
==17033== Thread 17:
==17033== Invalid read of size 4
==17033==    at 0x8352B3C: ha_innobase::open(char const*, int, unsigned int) (ha_innodb.cc:3510)

ref_length = table->key_info[primary_key].key_length;

We should not dereference table->key_info if primary_key >= MAX_KEY.
[22 Feb 2010 9:37] Jimmy Yang
Shane, this issue is not related to this fix.

In fact, this seems to be a MySQL issue or its behavior change. There creates a unique index (idx5) on all non null column, so InnoDB has following assumption on MySQL:

innobase_create_key_def() {

       /* If there is a UNIQUE INDEX consisting entirely of NOT NULL
        columns, MySQL will treat it as a PRIMARY KEY unless the
        table already has one. */
}

So InnoDB treat this as a primary key.

Unfortunately, MySQL somehow does not treat it as so, and set the primary_key still to be MAX_KEY, which means there is NO primary key defined. Thus the error message:

(gdb)  print  table->s->primary_key
$19 = 64

A simple create table, followed by the create unique index would trigger the problem (no need for other create indexes).

create table `t5`(
`col1` int not null,
`col2` blob not null,
`col3` time not null)engine=innodb;

create unique index `idx5` on `t5`(`col1`,`col2`(31));

Please have MySQL check if above assumption still holds or there is behavior change.

Thanks
[22 Feb 2010 10:46] Marko Mäkelä
Jimmy filed Bug #51378 for the Valgrind error [22 Feb 7:34] Shane Bester
[22 Feb 2010 11:39] Jimmy Yang
Please note, bug #51378 does not address the root cause of Valgrind error [22 Feb 7:34] Shane Bester (see analysis above).

MySQL might need to investigate the real cause of it, ie. - why the unique index was not treated as primary index in the first place
[17 Mar 2010 16:27] Feronia P
I think this problem caused a severe failure for my replication. I added a unique index to a table that already had a pk (auto_increment) and several fks. Didn't notice the error in the log and started to insert data via:

insert into tableA (pk,fk,name)
select null, fk, concat('new ',name)
from tableA
where name like 'something%';

The statement completed without error on both master and slave but the auto_increment pk id that was given to each entry was mixed up on the slave. 
The same id where used just for different entries. This is a rather big problem for me. When will this patch be integrated? (using 5.1.45 + 1.0.6 and still have this problem)
[1 Apr 2010 11:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/104823

3470 Sergey Vojtovich	2010-04-01
      Applying InnoDB snapshot, fixes BUG#47622.
      
      Detailed revision comments:
      
      r6526 | jyang | 2010-01-28 18:12:40 +0200 (Thu, 28 Jan 2010) | 8 lines
      branches/zip: Add index translation table to map mysql index
      number to InnoDB index structure directly. Fix Bug #47622:
      "the new index is added before the existing ones in MySQL,
      but after one in SE".
      
      rb://215, approved by Marko
[6 Apr 2010 7:58] Bugs System
Pushed into 5.1.46 (revid:sergey.glukhov@sun.com-20100405111026-7kz1p8qlzglqgfmu) (version source revid:svoj@sun.com-20100401151005-c6re90vdvutln15d) (merge vers: 5.1.46) (pib:16)
[5 May 2010 15:11] Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[27 May 2010 10:30] Jimmy Yang
The implication of this bug is that there could be mismatch between index order in MySQL and InnoDB metadata, so that wrong index metadata could be returned to MySQL by InnoDB and causing subsequent errors.
[28 May 2010 6:10] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)
[28 May 2010 6:38] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 7:06] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[15 Jun 2010 8:13] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100615080459-smuswd9ooeywcxuc) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (merge vers: 5.1.47) (pib:16)
[15 Jun 2010 8:28] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100615080558-cw01bzdqr1bdmmec) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (pib:16)
[17 Jun 2010 12:15] Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:02] Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:43] Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[13 Jul 2010 16:20] Paul DuBois
Noted in 5.1.46, 5.5.5 changelogs.

When using fast ALTER TABLE, different internal ordering of indexes
in the MySQL optimizer and the InnoDB storage engine could cause
error messages about possibly mixed up .frm files and incorrect index
use.
[19 Jul 2010 14:35] Bugs System
Pushed into 5.1.49 (revid:build@mysql.com-20100719143034-omcma40sblwmay3x) (version source revid:vasil.dimov@oracle.com-20100704071244-3lo4okzels3kvy1p) (merge vers: 5.1.49) (pib:16)
[23 Jul 2010 12:25] Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100723121820-jryu2fuw3pc53q9w) (version source revid:alik@sun.com-20100723121820-jryu2fuw3pc53q9w) (merge vers: 5.5.6-m3) (pib:18)
[23 Jul 2010 12:32] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100723121929-90e9zemk3jkr2ocy) (version source revid:alik@sun.com-20100723121827-3bsh51m5sj6g4oma) (pib:18)
[4 Aug 2010 23:03] Paul DuBois
Already fixed in 5.1.x, 5.5.x.
[10 Aug 2010 12:24] Marko Mäkelä
This introduced Bug #55878 in MySQL 5.5.
[14 Oct 2010 8:26] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (version source revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (merge vers: 5.1.51-ndb-7.0.20) (pib:21)
[14 Oct 2010 8:41] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.39 (revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (version source revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (merge vers: 5.1.51-ndb-6.3.39) (pib:21)
[14 Oct 2010 8:56] Bugs System
Pushed into mysql-5.1-telco-6.2 5.1.51-ndb-6.2.19 (revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (version source revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (merge vers: 5.1.51-ndb-6.2.19) (pib:21)
[14 Oct 2010 13:39] Jon Stephens
Already documented as above; no additional changelog entries required. Set back to Closed state.