Bug #110352 mysqlsh produces wrong dump in case of complex primary key
Submitted: 14 Mar 2023 1:31 Modified: 8 Jun 2023 16:32
Reporter: Vaskes Valmos Email Updates:
Status: Closed Impact on me:
None 
Category:Shell Dump & Load Severity:S2 (Serious)
Version:8.0.32 OS:CentOS (CentOS Linux release 7.9.2009 (Core))
Assigned to: CPU Architecture:x86
Tags: mysqlsh dumptables chunk mismatch dumpInstance

[14 Mar 2023 1:31] Vaskes Valmos
Description:
Mysql-shell / mysqlsh produces dump of tables or of instance with mismatch rows count between source server and destination dump / MySQL server.

Initially I've found this bug at migration from 5.6 to 8.0 MySQL Server, but to report this bug I've reproduced it on latest stable available version of MySQL.

How to repeat:
There is an initial database with one table:
```
mysql> use t1;
Database changed
mysql> select * from product_service_map;
+--------------+-------------------+------------+
| content_type | service           | product_id |
+--------------+-------------------+------------+
| music_video  | individual        |        255 |
| music        | trial             |        327 |
| ringtone     | trial             |        327 |
| music        | family            |        383 |
| ringtone     | family            |        383 |
| music        | individual        |        384 |
| ringtone     | individual        |        384 |
| music_video  | family            |        385 |
| music_video  | trial             |        386 |
| music        | student           |        392 |
| ringtone     | student           |        392 |
| music_video  | student           |        393 |
| music        | match             |        394 |
| ringtone     | match             |        394 |
| music_video  | match             |        395 |
| music        | individual_annual |        431 |
| ringtone     | individual_annual |        431 |
| music_video  | individual_annual |        432 |
+--------------+-------------------+------------+
18 rows in set (0.00 sec)

mysql> show create table product_service_map\G
*************************** 1. row ***************************
       Table: product_service_map
Create Table: CREATE TABLE `product_service_map` (
  `content_type` enum('music','music_video','ringtone') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  `service` enum('individual','individual_annual','match','student','family','trial') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  `product_id` int unsigned NOT NULL,
  PRIMARY KEY (`content_type`,`service`),
  KEY `product_id` (`product_id`),
  CONSTRAINT `product_service_map_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `foo`.`product` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
1 row in set (0.00 sec)
```

Let's dump it with mysqlsh:
```
$ mysqlsh -S /var/lib/mysql/mysql.sock
MySQL Shell 8.0.32

Copyright (c) 2016, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
Creating a Classic session to 'root@/var%2Flib%2Fmysql%2Fmysql.sock'
Fetching schema names for auto-completion... Press ^C to stop.
Your MySQL connection id is 23
Server version: 8.0.31-23 Percona Server (GPL), Release 23, Revision 71449379
No default schema selected; type \use <schema> to set one.
 MySQL  localhost  JS > util.dumpTables("t1",[ "product_service_map" ],"/tmp/t1_product_service_map",{ threads: 1, chunking: true })
Acquiring global read lock
Global read lock acquired
Initializing - done
1 tables and 0 views will be dumped.
Gathering information - done
All transactions have been started
Locking instance for backup
Global read lock has been released
Writing global DDL files
Running data dump using 1 thread.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
94% (17 rows / ~18 rows), 0.00 rows/s, 0.00 B/s uncompressed, 0.00 B/s compressed
Dump duration: 00:00:00s
Total duration: 00:00:00s
Schemas dumped: 1
Tables dumped: 1
Uncompressed data size: 385 bytes
Compressed data size: 171 bytes
Compression ratio: 2.3
Rows written: 17
Bytes written: 171 bytes
Average uncompressed throughput: 385.00 B/s
Average compressed throughput: 171.00 B/s
 MySQL  localhost  JS >
```

Now we can check an on-disk copy:
```
# ls -al
total 48
drwxr-x---  2 root root 4096 Mar 14 01:14 .
drwxrwxrwt. 9 root root 4096 Mar 14 01:14 ..
-rw-r-----  1 root root  234 Mar 14 01:14 @.done.json
-rw-r-----  1 root root  747 Mar 14 01:14 @.json
-rw-r-----  1 root root  234 Mar 14 01:14 @.post.sql
-rw-r-----  1 root root  234 Mar 14 01:14 @.sql
-rw-r-----  1 root root  255 Mar 14 01:14 t1.json
-rw-r-----  1 root root  171 Mar 14 01:14 t1@product_service_map@@0.tsv.zst
-rw-r-----  1 root root    8 Mar 14 01:14 t1@product_service_map@@0.tsv.zst.idx
-rw-r-----  1 root root  680 Mar 14 01:14 t1@product_service_map.json
-rw-r-----  1 root root 1092 Mar 14 01:14 t1@product_service_map.sql
-rw-r-----  1 root root  417 Mar 14 01:14 t1.sql
# zstdcat t1@product_service_map@@0.tsv.zst | wc -l
17
# zstdcat t1@product_service_map@@0.tsv.zst
music   individual      384
music   individual_annual       431
music   match   394
music   student 392
music   trial   327
music_video     individual      255
music_video     individual_annual       432
music_video     match   395
music_video     student 393
music_video     family  385
music_video     trial   386
ringtone        individual      384
ringtone        individual_annual       431
ringtone        match   394
ringtone        student 392
ringtone        family  383
ringtone        trial   327
$ ```

This is dump of problem table:
```--
-- Table structure for table `product_service_map`
--

DROP TABLE IF EXISTS `product_service_map`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!50503 SET character_set_client = utf8mb4 */;
CREATE TABLE `product_service_map` (
  `content_type` enum('music','music_video','ringtone') CHARACTER SET ascii NOT NULL,
  `service` enum('individual','individual_annual','match','student','family','trial') CHARACTER SET ascii NOT NULL,
  `product_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`content_type`,`service`),
  KEY `product_id` (`product_id`),
  CONSTRAINT `product_service_map_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `foo`.`product` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `product_service_map`
--

LOCK TABLES `product_service_map` WRITE;
/*!40000 ALTER TABLE `product_service_map` DISABLE KEYS */;
INSERT INTO `product_service_map` VALUES ('music_video','individual',255),('music','trial',327),('ringtone','trial',327),('music','family',383),('ringtone','family',383),('music','individual',384),('ringtone','individual',384),('music_video','family',385),('music_video','trial',386),('music','student',392),('ringtone','student',392),('music_video','student',393),('music','match',394),('ringtone','match',394),('music_video','match',395),('music','individual_annual',431),('ringtone','individual_annual',431),('music_video','individual_annual',432);
/*!40000 ALTER TABLE `product_service_map` ENABLE KEYS */;
UNLOCK TABLES;
```

All tables with the same issue has below definition:
```*************************** 1. row ***************************
product_map
CREATE TABLE `product_map` (
  `content_type` enum('sr','music','music_video','ringtone') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  `product_id` int unsigned NOT NULL,
  PRIMARY KEY (`content_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
*************************** 1. row ***************************
product_service_map
CREATE TABLE `product_service_map` (
  `content_type` enum('music','music_video','ringtone') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  `service` enum('individual','individual_annual','match','student','family','trial') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  `product_id` int unsigned NOT NULL,
  PRIMARY KEY (`content_type`,`service`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
*************************** 1. row ***************************
view_identified
CREATE TABLE `view_identified` (
  `owned` enum('entirely','partially','no') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'no',
  `track_id` int unsigned NOT NULL,
  PRIMARY KEY (`owned`,`track_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
*************************** 1. row ***************************
view_identified_date
CREATE TABLE `view_identified_date` (
  `owned` enum('entirely','partially','no') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'no',
  `date` date NOT NULL,
  `track_id` int unsigned NOT NULL,
  PRIMARY KEY (`owned`,`date`,`track_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
*************************** 1. row ***************************
view_identified_licensor
CREATE TABLE `view_identified_licensor` (
  `owned` enum('entirely','partially','no') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'no',
  `licensor_id` int unsigned NOT NULL,
  `track_id` int unsigned NOT NULL,
  PRIMARY KEY (`owned`,`licensor_id`,`track_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
*************************** 1. row ***************************
view_identified_licensor_date
CREATE TABLE `view_identified_licensor_date` (
  `owned` enum('entirely','partially','no') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'no',
  `licensor_id` int unsigned NOT NULL,
  `date` date NOT NULL,
  `track_id` int unsigned NOT NULL,
  PRIMARY KEY (`owned`,`licensor_id`,`date`,`track_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
*************************** 1. row ***************************
view_licensor
CREATE TABLE `view_licensor` (
  `focus` enum('identified','unconfirmed','ignored','owned','claimable','claimed','unclaimable','rightless','disputed','disputed_copublishing','disputed_contribution','looped','licensed_non_exclusively','rightless_distribution','accounted','unaccounted','withheld') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'identified',
  `licensor_id` int unsigned NOT NULL,
  PRIMARY KEY (`focus`,`licensor_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
*************************** 1. row ***************************
view_licensor_date
CREATE TABLE `view_licensor_date` (
  `focus` enum('identified','unconfirmed','ignored','owned','claimable','claimed','unclaimable','rightless','disputed','disputed_copublishing','disputed_contribution','looped','licensed_non_exclusively','rightless_distribution','accounted','unaccounted','withheld') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'identified',
  `licensor_id` int unsigned NOT NULL,
  `date` date NOT NULL,
  PRIMARY KEY (`focus`,`licensor_id`,`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
*************************** 1. row ***************************
view_unclaimable
CREATE TABLE `view_unclaimable` (
  `focus` enum('rightless','disputed','disputed_copublishing','disputed_contribution','looped','licensed_non_exclusively','rightless_distribution') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'rightless',
  `owned` enum('partially','no') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'no',
  `track_id` int unsigned NOT NULL,
  PRIMARY KEY (`focus`,`owned`,`track_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
*************************** 1. row ***************************
view_unclaimable_date
CREATE TABLE `view_unclaimable_date` (
  `focus` enum('rightless','disputed','disputed_copublishing','disputed_contribution','looped','licensed_non_exclusively','rightless_distribution') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'rightless',
  `owned` enum('partially','no') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'no',
  `date` date NOT NULL,
  `track_id` int unsigned NOT NULL,
  PRIMARY KEY (`focus`,`owned`,`date`,`track_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
*************************** 1. row ***************************
view_unclaimable_licensor
CREATE TABLE `view_unclaimable_licensor` (
  `focus` enum('rightless','disputed','disputed_copublishing','disputed_contribution','looped','licensed_non_exclusively','rightless_distribution') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'rightless',
  `owned` enum('partially','no') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'no',
  `licensor_id` int unsigned NOT NULL,
  `track_id` int unsigned NOT NULL,
  PRIMARY KEY (`focus`,`owned`,`licensor_id`,`track_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
*************************** 1. row ***************************
view_unclaimable_licensor_date
CREATE TABLE `view_unclaimable_licensor_date` (
  `focus` enum('rightless','disputed','disputed_copublishing','disputed_contribution','looped','licensed_non_exclusively','rightless_distribution') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'rightless',
  `owned` enum('partially','no') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'no',
  `licensor_id` int unsigned NOT NULL,
  `date` date NOT NULL,
  `track_id` int unsigned NOT NULL,
  PRIMARY KEY (`focus`,`owned`,`licensor_id`,`date`,`track_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
*************************** 1. row ***************************
property_weight
CREATE TABLE `property_weight` (
  `resource_type` enum('release','track','recording','composition','video') CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  `property_id` tinyint unsigned NOT NULL,
  `weight` tinyint unsigned NOT NULL DEFAULT '100',
  PRIMARY KEY (`resource_type`,`property_id`),
  KEY `property_id` (`property_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3```

Suggested fix:
CHECKSUM each table if you migrating your database;
do not use complex primary keys;
probably `chunking: false` could solve the problem; 
wait for fix.
[14 Mar 2023 1:45] Vaskes Valmos
"chunking: false" creates really full dump:

util.dumpTables("t1",[ "product_service_map" ],"/tmp/t1_product_service_map",{ threads: 1, chunking: false })
[14 Mar 2023 11:27] MySQL Verification Team
Hello Vaskes,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[14 Mar 2023 12:28] Vaskes Valmos
For community treasures I prefer to have this ticket open to world.

So hope you will unmark it as private :)
[14 Mar 2023 12:28] Pawel Andruszkiewicz
Posted by developer:
 
This issue affects tables where primary key contains an enum column, which has its values defined in non-alphabetical order. Chunker uses < and > operators to find boundaries of a chunk, and these use lexicographical ordering. As a workaround, fixing the order allows to dump with chunking enabled.
[14 Mar 2023 18:04] Omer Barnir
Reporter's change of label in queries replaced with 'foo'
[8 Jun 2023 16:32] Edward Gilmore
Posted by developer:
 
Added the following note to the MySQL Shell 8.0.34 and 8.1.0 release notes:
	
  If chunking was enabled for a dump of tables, but the primary key or unique index used to chunk the table
  contained an ENUM column, some of the tables rows were not exported to the dump. This occurred if the ENUM
  column's values were not ordered alphabetically.
  As of this release, primary keys or unique indexes which contain one or more ENUM columns are ignored when
  selecting an index for chunking.