Bug #98501 Exchanging partition with tables does not update SDI correctly
Submitted: 6 Feb 2020 14:01 Modified: 27 Feb 2020 1:06
Reporter: Fungo Wang (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Partitions Severity:S2 (Serious)
Version:8.0.19 OS:Any
Assigned to: CPU Architecture:Any
Tags: duplicate, exchange partition, SDI

[6 Feb 2020 14:01] Fungo Wang
Description:
We spotted this issue while preparing backup result, using Percona XtraBackup

You can find the detail here if interested, 
https://jira.percona.com/browse/PXB-2101.

After some investigation, I found the root cause is that different tablespaces contain duplicated info in SDI, which is caused by "Exchange partition" operation.

How to repeat:
Create a normal table and a partition table, the do exchange partition

```
USE test;
CREATE TABLE pt1 (a INT NOT NULL, PRIMARY KEY(a)) engine = InnoDB 
    PARTITION BY RANGE (a) PARTITIONS 3 ( 
        PARTITION p1 VALUES LESS THAN (1000), 
        PARTITION p2 VALUES LESS THAN (2000), 
        PARTITION p3 VALUES LESS THAN (3000));

CREATE TABLE t1 (a INT NOT NULL, PRIMARY KEY(a)) engine = InnoDB;
ALTER TABLE pt1 EXCHANGE PARTITION p1 WITH TABLE t1;
```

Then we can use ibd2sdi to check the SDI info, in tablespaces
1. test/t1.ibd
2. test/pt1#p#p1.ibd

For test/t1.ibd there are multiple table SDI info

{
        "type": 1,
        "id": 347,
        "object":

...
info for the partition table

{
        "type": 1,
        "id": 348,
        "object":
...
info for the normal table 

For test/pt1#p#p1.ibd, there are also multiple table SDI info

{
        "type": 1,
        "id": 347,
        "object":

...
info for the partition table

{
        "type": 1,
        "id": 348,
        "object":
..
info for the normal table 

Suggested fix:
Remove the old SDI info, and keep the SDI consistent with table.
[6 Feb 2020 15:15] MySQL Verification Team
Hi Mr. Wang,

Regarding backup tools, we support MEB, mysqldump and mysqlpump.

Hence, regarding the tool that you mention, we do not support it.

If you manage to reproduce this behaviour with the above mentioned tools, please create a new report.
[6 Feb 2020 15:59] MySQL Verification Team
Hi,

I have tested your test case and it is independent of the backup tool.

I am simply asking what should SDI for the partition look like, in your point of view, and why ?????

Waiting on the feedback.
[7 Feb 2020 0:37] Fungo Wang
Hi Sinisa,

After the exchange partition, use ibd2sdi to parse SDI test/t1.ibd, the JSON result is (removed some info):

ibd2sid ./test/t1.ibd --skip-pretty

```
["ibd2sdi"
,
{
        "type": 1,
        "id": 347,
        "object":
                {"mysqld_version_id":80019,"dd_version":80017,"sdi_version":80019,"dd_object_type":"Table","dd_object":{"name":"pt1","mysql_version_id":80019,"created":20200207002025,"last_altered":20200207002025,"hidden":1,"options":"avg_row_length=0;encrypt_type=N;key_block_size=0;keys_disabled=0;pack_record=0;stats_auto_recalc=0;stats_sample_pages=0;" ... ,"schema_ref":"test",,"indexes":[{"options":"","se_private_data":"id=146;root=4;space_id=4;table_id=1062;trx_id=1545;","index_opx":0,"tablespace_ref":"test/pt1#p#p1"}],"subpartitions":[]},{"name":"p2","parent_partition_id":18446744073709551615,"number":1,"se_private_id":1063,"description_utf8":"2000","engine":"InnoDB","comment":"","options":"","se_private_data":"","values":[{"max_value":false,"null_value":false,"list_num":0,"column_num":0,"value_utf8":"2000"}],"indexes":[{"options":"","se_private_data":"id=147;root=4;space_id=5;table_id=1063;trx_id=1545;","index_opx":0,"tablespace_ref":"test/pt1#p#p2"}],"subpartitions":[]},{"name":"p3","parent_partition_id":18446744073709551615,"number":2,"se_private_id":1064,"description_utf8":"3000","engine":"InnoDB","comment":"","options":"","se_private_data":"","values":[{"max_value":false,"null_value":false,"list_num":0,"column_num":0,"value_utf8":"3000"}],"indexes":[{"options":"","se_private_data":"id=148;root=4;space_id=6;table_id=1064;trx_id=1545;","index_opx":0,"tablespace_ref":"test/pt1#p#p3"}],"subpartitions":[]}],"collation_id":255}}
}
,
{
        "type": 1,
        "id": 348,
        "object":
                {"mysqld_version_id":80019,"dd_version":80017,"sdi_version":80019,"dd_object_type":"Table","dd_object":{"name":"t1","mysql_version_id":80019,"created":20200207002025,"last_altered":20200207002025,"hidden":1,"options":"avg_row_length=0;encrypt_type=N;key_block_size=0;keys_disabled=0;pack_record=0;stats_auto_recalc=0;stats_sample_pages=0;", ... "schema_ref":"test","se_private_id":1062,"engine":"InnoDB","last_checked_for_upgrade_version_id":0,"comment":"","se_private_data":"","row_format":2,"partition_type":0,"partition_expression":"","partition_expression_utf8":"","default_partitioning":0,"subpartition_type":0,"subpartition_expression":"","subpartition_expression_utf8":"","default_subpartitioning":0,"indexes":[{"name":"PRIMARY","hidden":false,"is_generated":false,"ordinal_position":1,"comment":"","options":"flags=0;","se_private_data":"id=146;root=4;space_id=4;table_id=1062;trx_id=1545;","type":1,"algorithm":2,"is_algorithm_explicit":false,"is_visible":true,"engine":"InnoDB","elements":[{"ordinal_position":1,"length":4,"order":2,"hidden":false,"column_opx":0},{"ordinal_position":2,"length":4294967295,"order":2,"hidden":true,"column_opx":1},{"ordinal_position":3,"length":4294967295,"order":2,"hidden":true,"column_opx":2}],"tablespace_ref":"test/t1"}],"foreign_keys":[],"check_constraints":[],"partitions":[],"collation_id":255}}
}
,
{
        "type": 2,
        "id": 9,
...
]
```

There are 2 table type entries, one for the old partition table, as you can find `"tablespace_ref": "test/pt1#p#p1"` in the object field of  ("type": 1,        "id": 347) entry; the other one is for the normal table, as you can find "tablespace_ref": "test/t1" the in object field of entry(  "type": 1,    "id": 348,).

I don't think this is expected. If using DDL such as `OPTIMIZE TABLE t1`, which will update the SDI info, the partition info in SDI is removed.

ibd2sid ./test/t1.ibd --skip-pretty

```
["ibd2sdi"
,
{
        "type": 1,
        "id": 349,
        "object":
                {"mysqld_version_id":80019,"dd_version":80017,"sdi_version":80019,"dd_object_type":"Table","dd_object":{"name":"t1","mysql_version_id":80019,"created":20200207002348,"last_altered":20200207002348,"hidden":1,"options":"avg_row_length=0;encrypt_type=N;key_block_size=0;keys_disabled=0;pack_record=0;stats_auto_recalc=0;stats_sample_pages=0;", ... "schema_ref":"test","se_private_id":1066,"engine":"InnoDB","last_checked_for_upgrade_version_id":0,"comment":"","se_private_data":"","row_format":2,"partition_type":0,"partition_expression":"","partition_expression_utf8":"","default_partitioning":0,"subpartition_type":0,"subpartition_expression":"","subpartition_expression_utf8":"","default_subpartitioning":0,"indexes":[{"name":"PRIMARY","hidden":false,"is_generated":false,"ordinal_position":1,"comment":"","options":"flags=0;","se_private_data":"id=150;root=4;space_id=8;table_id=1066;trx_id=1595;","type":1,"algorithm":2,"is_algorithm_explicit":false,"is_visible":true,"engine":"InnoDB","elements":[{"ordinal_position":1,"length":4,"order":2,"hidden":false,"column_opx":0},{"ordinal_position":2,"length":4294967295,"order":2,"hidden":true,"column_opx":1},{"ordinal_position":3,"length":4294967295,"order":2,"hidden":true,"column_opx":2}],"tablespace_ref":"test/t1"}],"foreign_keys":[],"check_constraints":[],"partitions":[],"collation_id":255}}
}
,
{
        "type": 2,
        "id": 13,
...
]
```

The above also applies to the partition test/pt1#p#p1.ibd.
[7 Feb 2020 12:55] MySQL Verification Team
Hi Mr. Wang,

I have seen this output as I have run your test case, exactly as you submitted it.

However, I do not see what is wrong in that output. I asked you already to explain what is a bug in that output and you failed to reply to me.

I have discussed this issue with developers in charge and they claim that if the .SDI file is readable, then this is not a bug.
[7 Feb 2020 15:42] MySQL Verification Team
Hi,

Actually, it turns out that this behaviour is necessary due to a fix for one bug. To quote:

"
Any ALTER is a DDL. And for any DDL we write the SDI again as part of
committing the transaction. So the SDI will then be stored in whatever is now
the first partition in the m_partitions list.

SDI is are never updated, just written (from the server's perspective. It may
well be implemented as an update in SE if the SDI record is present). So if
the partition previously used to store the SDI is no longer available (in the
DD) the whole SDI is written to the tablespace of what is now the first
partition.
"

This comment means that an exchange partition DDL should cause the SDI to be rewritten in each tablespace exchanged, and the partition tablespace will only be written if it is the first partition. 

I hope that this explains the intended behaviour.
[9 Feb 2020 4:58] Fungo Wang
Hi Sinisa,

Thanks for the explanation.

https://dev.mysql.com/doc/refman/8.0/en/serialized-dictionary-information.html
The documentation of SDI would make the user (at least for me) thought that *a InnoDB tablespace should store the SDI only about the table it contains*. (PXB developers may also has such expectation, and then depends on the SDI info to init innodb dictionary cache during crash recovery).

And that's why I submitted this bug report.

If a tablespace could contain redundant and unrelated SDI info, and this is by design, this bug report should be invalid.

But I would suggest the SDI documentation should mention such unrelated SDI could exist.
[10 Feb 2020 13:22] MySQL Verification Team
Hi Mr. Wang,

I agree, such a small addition to the documentation would not harm.

Verified as a documentation bug.
[14 Feb 2020 13:48] Dyre Tjeldvoll
Posted by developer:
 
Just a couple of clarifications from the dev responsible here:

As everybody probably realizes, EXCHANGE PARTITION just swaps the "partitions". The SDIs are stored in a special Btree in the tablespace/partition, so after the swap you are left with whatever had been stored in that Btree by the other table.

The EXCHANGE PARTITION is itself a DDL operation, so it will write the new (correct) SDI into the Btree. But it will use the table id of the table now owning the partition as the key. So the new SDI will be inserted along with the existing SDI.

For a "normal" DDL operation you will write the SDI with the same key and so the SDI record is updated (like a PK update).

Arguably, we should have cleared the SDI Btree as part of EXCHANGE PARTITION.
[14 Feb 2020 14:00] MySQL Verification Team
Thank you very much,  Mr. Tjeldvoll, for the clarification.
[14 Feb 2020 15:08] Fungo Wang
Hi Dyre, Sinisa,

Thanks for your clarifications.

Just an update.
"Importing InnoDB Tables" scenario could also lead to incorrect SDIs. 
https://dev.mysql.com/doc/refman/8.0/en/innodb-table-import.html

```
session 1> use test
session 1> CREATE TABLE t1 (a INT NOT NULL, PRIMARY KEY(a)) engine = InnoDB;                                                                                                 
session 1> flush table t1 for export;                                                                                                                                        

```

```
session 2> use test2
session 2> CREATE TABLE t1 (a INT NOT NULL, PRIMARY KEY(a)) engine = InnoDB;
session 2> alter table t1 discard tablespace;
```

cp /path/to/test/t1.* /path/to/test2

```
session 2> alter table t1 import tablespace;
```

then ibd2sdi /path/to/test2/t1.ibd 

BTW, if this is unexpected behavior, then it is not a Doc bug and we should change the category.
[26 Feb 2020 8:21] Satya Bodapati
May I know why this is still Documentation bug? Specially after Dyre acknowledged that SDI could have been cleared for EXCHANGE PARTITION.

The SDI is wrong in two places after an exchange. 

1) The t1.ibd (which is old p1.ibd) still has partition info etc
2) The new p1.ibd (which is old t1.ibd) still has old SDI info.

Like the reporter said, this bug also exists for IMPORT TABLESPACE. On import too, old SDI should be erased.

Also SDI from a tablespace should strictly contain SDI about the current tables in it.

Currently SDI is not used anywhere. With the future goal, there will be problems.

IMHO, dict_sdi_get_keys() can be used to get current keys and remove. Or the dict_sdi_drop() should be used to drop entire SDI. The APIs are currently disabled.
[26 Feb 2020 11:59] Satya Bodapati
More thoughts:
dict_sdi_drop, will drop the entire SDI Index. SDI Index has to be recreated in such case. More a like a DDL operation. Should avoid this

Better to use get_keys() and remove entries. This is like DML operation DELETE * FROM..
[26 Feb 2020 13:19] MySQL Verification Team
Hi Satya,

I have analysed this bug again. I have also analysed the code that deals with SDI update.

I am changing both a category and severity of this bug in all relevant places.

Thanks.
[26 Feb 2020 13:22] Satya Bodapati
Thanks Sinisa and Thanks to Dyre for giving honest opinion.
[27 Feb 2020 1:06] Fungo Wang
Thanks for all you guys, I really appreciate all the effort you have done, to make the bug get proper handle!

BTW, as IMPORT TABLESPACE is also affected, which has nothing to do with partitions, maybe "MySQL Server: Partitions" category is a bit misleading and inappropriate?

Best regards!
[27 Feb 2020 12:55] MySQL Verification Team
Hi Mr. Wang,

Both categories are handled by the same team , so .........
[27 May 2020 9:48] tu ming
any progress´╝č
[27 May 2020 13:13] MySQL Verification Team
Hi Mr. Wang,

When this bug is fixed, then this page will be updated. In short, you will know it before I do.